200801229 final presentation
DESCRIPTION
Presentation on a textual commitment system for natural language understanding.
TRANSCRIPT
Introduction Structural Features Our Approach Results and Analysis Acknowledgement References
Towards Building a Text Commitment System
Gaurav Arora (200801229)
Supervisor: Prof. Prasenjit Majumder
Dhirubhai Ambani Institute of Information and Communication Technology
Outline
1 Introduction
Problem Definition
Natural Language Understanding
Literature Survey and Usage
Approach Overview
2 Structural Features
3 Our Approach
Generating a Model for Simple Sentences
Extracting Similar POS Patterns and Sentence Generation
4 Results and Analysis
Problem Definition
Textual Commitment
Publicly Held Beliefs
A textual commitment system simplifies a complex sentence into a set of simple sentences, each expressing a publicly held belief conveyed by the complex sentence.
Textual Commitment Origin
Textual commitment was proposed by LCC (Language Computer Corporation) to be used as a core component module for natural language understanding.
Problem Definition
Textual Commitment Example
Example Complex Sentence
Text: "The Extra Girl" (1923) is a story of a small-town girl, Sue Graham (played by Mabel Normand), who comes to Hollywood to be in the pictures. This Mabel Normand vehicle, produced by Mack Sennett, followed earlier films about the film industry and also paved the way for later films about Hollywood, such as King Vidor's "Show People" (1928).
Simplified Sentences
T1. "The Extra Girl" is a story of a small-town girl.
T2. "The Extra Girl" is a story of Sue Graham.
T3. Sue Graham is a small-town girl.
T4. Sue Graham [was] played by Mabel Normand.
T5. Sue Graham comes to Hollywood to be in the pictures.
T6. A Mabel Normand vehicle was produced by Mack Sennett.
Natural Language understanding
Language Understanding Components
By machine reading or understanding of text, we mean the formation of a coherent set of beliefs based on a textual corpus and a background theory.
Textual entailment systems determine whether one sentence is entailed by another.
Language Understanding Features
Noisy
Limited scope
Corpus-wide statistics
Minimal reasoning
Bottom up
General
Very fast!
Literature Survey and Usage
Question Answering to QA4MRE
Question answering (QA) systems have an upper bound of about 60% accuracy in system performance.
Current QA systems place little emphasis on understanding and analyzing text.
To tackle this 60% upper bound, QA4MRE focuses on understanding a single document, with emphasis on components like textual commitment and textual entailment.
Literature Survey and Usage
Textual Entailment
The PASCAL Recognising Textual Entailment (RTE) Challenge has been a reputed evaluation campaign for research in textual entailment for the past 7 years.
Researchers use logic provers to detect entailment and overcome the need for background knowledge, with a performance upper bound of 71%.
LCC's proposed textual commitment obtained a 9% improvement over this upper bound.
Literature Survey and Usage
Textual Entailment classes
Figure: Textual Entailment Classes
Literature Survey and Usage
Textual Commitment Approach
LCC Heuristic Approach
LCC's TC system uses a series of extraction heuristics in order to enumerate a subset of the discourse commitments that are inferable from either the text or the hypothesis.
Statistical Approach for Textual Commitment
Due to the unavailability of these heuristics, we decided to build a textual commitment system using statistical features of language.
Approach Overview
Statistical approach for Textual Commitment
Learning grammatical structural rules of simple sentences (POS tags).
Converting complex sentences into structural elements.
Finding similar rules for generating simple sentences.
Generating simple sentences in natural language based on the rules.
Example: Part-of-Speech Tagging
They-PRP were-VBD easy-JJ as-IN they-PRP levelled-VBD
Feature
The key statistical language feature for textual commitment generation is part-of-speech tagging.
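The word-TAG annotation shown above can be parsed with a few lines of Python. This is a minimal illustrative sketch (the actual system used a standard POS tagger), showing how the tag sequence that feeds the language model is obtained:

```python
def parse_tagged(sentence):
    """Split a 'word-TAG' annotated sentence (as on the slide) into
    (word, tag) pairs, splitting on the last hyphen of each token."""
    pairs = []
    for token in sentence.split():
        word, _, tag = token.rpartition("-")
        pairs.append((word, tag))
    return pairs

pairs = parse_tagged("They-PRP were-VBD easy-JJ as-IN they-PRP levelled-VBD")
# The POS pattern used by the language model is just the tag sequence.
pattern = " ".join(tag for _, tag in pairs)
print(pattern)  # PRP VBD JJ IN PRP VBD
```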
Simple Sentence Distribution
Figure: A distribution of simple sentences in English
Comparison of POS Tags
Figure: A Distribution of POS Tags in simple sentences
Generating Model for simple sentences
Module 1 Block diagram
Generating Model for simple sentences
Basic Components
Tri-gram language model generation on POS tags.
Artificial generation of POS patterns.
Ranking of artificially generated patterns based on the created language model.
Example: Ranked POS-Tag Patterns
-53.7293  DT NN VBD VBN
-54.0778  PRP VBP RB VBN
-54.2327  NNP NN NNP NNP
-54.7982  PRP VBP RB JJ
-55.3234  NNP NNP NN NNP
Total generated rules: 9,606,406
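The ranking step can be sketched as follows. The actual model was trained with SRILM; this is a simplified in-Python trigram model with add-one smoothing over a toy corpus of POS patterns (the corpus and candidates here are illustrative, not the real data):

```python
import math
from collections import Counter

# Toy corpus of POS-tag sequences from simple sentences (illustrative only;
# the real model was trained with SRILM on a much larger corpus).
corpus = [
    "DT NN VBD VBN", "PRP VBP RB VBN", "DT NN VBD JJ",
    "NNP VBD DT NN", "PRP VBD RB JJ",
]

def ngrams(tags, n):
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]

tri, bi = Counter(), Counter()
for line in corpus:
    tags = ["<s>", "<s>"] + line.split() + ["</s>"]
    tri.update(ngrams(tags, 3))
    bi.update(ngrams(tags, 2))

vocab = {t for line in corpus for t in line.split()} | {"</s>"}

def logprob(pattern):
    """Add-one-smoothed trigram log-probability of a POS pattern."""
    tags = ["<s>", "<s>"] + pattern.split() + ["</s>"]
    score = 0.0
    for g in ngrams(tags, 3):
        score += math.log((tri[g] + 1) / (bi[g[:2]] + len(vocab)))
    return score

# Rank artificially generated patterns by model score, as in Module 1.
candidates = ["DT NN VBD VBN", "VBN VBN VBN VBN"]
ranked = sorted(candidates, key=logprob, reverse=True)
```

A pattern attested in the corpus outranks an implausible one, which is how the generated rules are ordered before thresholding.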
Generating Model for simple sentences
Distribution of POSTAG in Simple Sentence Tokens
Figure: Distribution of POS tags in simple sentence tokens
Generating Model for simple sentences
Distribution of Simple Sentence based on LM score
Total rules: 9,606,406
Number of rules categorized by LM score:
Rules > -100: 679,545
Rules > -90: 170,662
Rules > -80: 27,280
Rules > -70: 2,328
Rules > -65: 474
Rules > -60: 76
Rules > -70, word length 5 and 6: 1,594
Rules > -83, word length 7 and 8: 3,110
Total rules considered for matching: 4,704
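A sketch of the thresholding step, assuming the slide's cut-offs apply to patterns of length 5-6 (score > -70) and 7-8 (score > -83), which is consistent with 1,594 + 3,110 = 4,704 rules kept; the example rules below are made up:

```python
# (score, pattern) pairs as produced by the language model ranking;
# these values are illustrative, not the real generated rules.
rules = [
    (-53.7, "DT NN VBD VBN IN"),          # length 5, above -70: kept
    (-69.2, "PRP VBP RB VBN IN NN"),      # length 6, above -70: kept
    (-84.1, "NNP NNP NN NNP IN DT NN"),   # length 7, below -83: dropped
    (-95.0, "DT NN VBD"),                 # too short: dropped
]

def keep(score, pattern):
    """Length-dependent LM-score threshold for candidate rules."""
    n = len(pattern.split())
    if n in (5, 6):
        return score > -70
    if n in (7, 8):
        return score > -83
    return False

selected = [p for s, p in rules if keep(s, p)]
```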
Extracting Similar POS Patterns and sentence generation
Module 2 and 3 Block Diagram
Extracting Similar POS Patterns and sentence generation
Extracting Similar POS Patterns - Basic Components
Extraction of POS tags and chunks from complex sentences.
Chunks are noun phrases: words occurring together that must also occur together in the simple sentences.
POS rules from Module 1 are treated as virtual documents.
Searching for rules/documents similar to the chunks and POS tags of the complex sentence.
Xapian is used for search; phrase search ensures that chunk tags occur together in the matched rules.
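The phrase-search requirement can be mimicked in plain Python. The real system indexes the rules in Xapian; this sketch only illustrates the contiguous-match condition a rule must satisfy for each chunk:

```python
def phrase_match(rule, chunk):
    """True if the chunk's tag sequence occurs contiguously in the rule,
    mimicking a phrase search over POS-rule 'virtual documents'."""
    r, c = rule.split(), chunk.split()
    return any(r[i:i + len(c)] == c for i in range(len(r) - len(c) + 1))

# Candidate rules from Module 1 (first two taken from the slide's example).
rules = [
    "NNP NNP VBD VBN IN DT NN RB",
    "NNP VBD RB VBN IN DT NN RB",
    "DT NN VBD JJ",
]
# Chunk tag sequences extracted from the complex sentence.
chunks = ["NNP NNP", "VBD VBN", "DT NN"]

# Keep rules that contain every chunk as a contiguous phrase.
matching = [r for r in rules if all(phrase_match(r, c) for c in chunks)]
```

In the real system Xapian also scores partial matches (hence the 91% and 86% figures on the next slide); this sketch keeps only the strict all-chunks case.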
Extracting Similar POS Patterns and sentence generation
Extracting Similar POS Patterns - Module I/O
Example
Sentence: A Revenue Cutter, the ship was named for Harriet Lane, niece of President James Buchanan, who served as Buchanan's White House hostess.
Example
Frequency of POS tags and chunks in the complex sentence:
POS tags: WP=1, VBN=1, IN=3, NNP=8, DT=2, VBD=2, ...
Chunks: DT NN NN=1, VBD VBN=1, NNP NNP NNP=1, ...
Example
Extracted patterns from Xapian:
91% NNP NNP VBD VBN IN DT NN RB
86% NNP VBD RB VBN IN DT NN RB
Extracting Similar POS Patterns and sentence generation
Simple Sentence Generation - Basic Components
Replacement of all chunks in the similar POS-tag rules with their chunk values.
Additional rules with different chunk values are added if a chunk maps to more than one value.
After chunk replacement, the remaining POS tags are filled with values.
This module generates many noisy sentences.
Extracting Similar POS Patterns and sentence generation
Simple Sentence Generation - Module I/O
Chunk-value mapping
NNP NNP-1=White House, NNP NNP-0=Harriet Lane, VBD VBN-0=was named, DT NN-0=the ship, RB IN-0=niece of
Example
A Revenue Cutter, the ship was named for Harriet Lane, niece of President James Buchanan, who served as Buchanan's White House hostess.
Simple sentences:
Harriet Lane President James Buchanan niece
Harriet Lane served for hostess
Harriet Lane was for the ship
Buchanan's White House the ship hostess
Harriet Lane was the ship
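The replacement step can be sketched as below. This simplified version maps each chunk tag sequence to a single value (the real system indexes multiple values per chunk, e.g. NNP NNP-0 and NNP NNP-1) and marks unfilled POS slots in brackets:

```python
def realize(rule, chunk_values):
    """Replace chunk tag sequences in a matched POS rule with their surface
    strings; remaining single tags are left as bracketed slots to be
    filled by the later value-filling step."""
    tags = rule.split()
    out, i = [], 0
    while i < len(tags):
        for seq, words in chunk_values.items():
            seq_tags = seq.split()
            if tags[i:i + len(seq_tags)] == seq_tags:
                out.append(words)
                i += len(seq_tags)
                break
        else:  # no chunk starts here: leave an unfilled POS slot
            out.append("[" + tags[i] + "]")
            i += 1
    return " ".join(out)

# One value per chunk type, taken from the slide's mapping.
chunk_values = {
    "NNP NNP": "Harriet Lane",
    "VBD VBN": "was named",
    "DT NN": "the ship",
}
print(realize("NNP NNP VBD VBN IN DT NN RB", chunk_values))
# Harriet Lane was named [IN] the ship [RB]
```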
Recall System
Recall of the system is important, since its output serves as input to textual entailment and other natural language understanding modules.
Recall of our statistical textual commitment system is 0.23.
System recall was calculated on 5 complex sentences.
The recall value shows positive signs for a more sophisticated statistical textual commitment system.
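Recall here is the fraction of gold simple sentences that the system reproduces. A minimal sketch with hypothetical gold commitments (the reported 0.23 was computed over 5 complex sentences; exact string matching below is a simplification of whatever matching was actually used):

```python
def recall(generated, reference):
    """Fraction of reference simple sentences recovered by the system
    (case-insensitive exact match)."""
    gen = {s.lower() for s in generated}
    return sum(1 for s in reference if s.lower() in gen) / len(reference)

# Hypothetical gold commitments for the Harriet Lane example.
reference = [
    "The ship was named for Harriet Lane",
    "Harriet Lane is the niece of President James Buchanan",
    "Harriet Lane served as Buchanan's White House hostess",
    "The ship was a Revenue Cutter",
]
# Hypothetical system output.
generated = ["The ship was named for Harriet Lane", "Harriet Lane was the ship"]
print(recall(generated, reference))  # 0.25
```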
Analysis
Requires an additional module to rank good sentences and remove noisy ones.
A more sophisticated natural language generation module.
Generating simple-sentence patterns from complex sentences rather than artificially generating rules.
Finding a more suitable model: a combination of bi-gram and tri-gram models, or a bi-gram model.
Acknowledgement
I would like to express my sincere thanks to Prof. Prasenjit Majumder for providing the opportunity to work under his esteemed guidance, for helping throughout the project, and for providing valuable critical suggestions on my work. Additionally, I would like to thank the SRILM and Xapian teams for helping me work with their open-source software.
References
Hickl: A discourse commitment-based framework for recognizing textual entailment.
Anselmo Peñas et al.: Overview of QA4MRE at CLEF 2011: Question Answering for Machine Reading Evaluation. Working Notes of CLEF (2011).
L. Bentivogli (FBK-irst) et al.: The Sixth PASCAL Recognizing Textual Entailment Challenge (2010).
Olly Betts: Xapian, version 1.2.9.
Asher Stern and Ido Dagan: A Confidence Model for Syntactically-Motivated Entailment Proofs. In Proceedings of RANLP 2011.
Katrin Kirchhoff et al.: Factored Language Models Tutorial (2008).
(2002) The IEEE website. [Online]. Available: http://www.ieee.org/
(2010) SRILM - Language Modelling Toolkit.