Instant Question Answering System
DESCRIPTION
Instant Question Answering System using machine learning and natural language processing
TRANSCRIPT
Instant Question Answering
Dhwaj Raj
What is Instant Question Answering?
● The user asks a question in text form, and the InstantQA system automatically retrieves or formulates an answer and presents it back to the user instantly.
Why Instant Question Answering?
● In spite of the continuous progress of search engines, many user needs still remain unanswered.
● While Community Question Answering (e.g. the AnA platform) can feature factoid questions, its primary goal is to satisfy needs such as opinion seeking, recommendation, open-ended questions and problem solving.
● In community question answering, the user has to wait for the answers they seek, even if the question is very simple and asks for a mere fact.
● Better user experience: why browse through search result listings or related questions when the information can be served upfront?
Why Instant Question Answering?
● CASE : SHIKSHA.COM
● Top domains being searched, based on both query logs and data availability in listings: fees, duration, seats, application date, application URL, affiliation, approval, entrance exams, placement companies and job salaries.
● A high number of fact-type questions, which can be targeted; we are not targeting opinion-based or open-ended questions.
● 23% of questions in a random sample of 1.15L (115,000) belong to these 10 domains.
Is it similar to the AnA platform?
● Our organization has a discussion forum called the AnA (Ask and Answer) platform.
● InstantQA has no relation whatsoever to, and no direct use case with, the current AnA forum content, as of now.
What kinds of questions do we target?
● What is the price of X?
● When is the last date of Y?
● How much is the fee for W?
● What is the fee for W?
● Which companies hire from campus Q?
● How is the placement at Z?
● Is Z college in Delhi? (transform to where)
● What is the meaning of life, the universe and everything?
● I do not feel like studying, what to do?
● Will I get admission in Z?
● How to improve my career?
● Should I invest in Noida?
● I have purchased X project, should I sell it now or hold?
● Is it beneficial to buy a 2BHK for 30 lacs?
What kinds of questions do we target?
FACTOIDS (targeted):
● What is the price of X?
● When is the last date of Y?
● How much is the fee for W?
● What is the fee for W?
● Which companies hire from campus Q?
● How is the placement at Z?
● Is Z college in Delhi? (transform to where)
Open-ended, not definite (not targeted):
● What is the meaning of life, the universe and everything?
● I do not feel like studying, what to do?
● Will I get admission in Z?
● How to improve my career?
● Should I invest in Noida?
● I have purchased X project, should I sell it now or hold?
● Is it beneficial to buy a 2BHK for 30 lacs?
What is the very basic approach to instant question answering?
● General architecture: question → Question Classification and Analysis → Information Retrieval → Answer Extraction → answer
● e.g. What is Calvados?
– Pattern: /Q is /A, where /Q = “Calvados”
– Query = “Calvados is”
– Text retrieval = “…Calvados is often used in cooking…Calvados is a dry apple brandy made in…”
– /A = “a dry apple brandy”
– Answer (/Q is /A): “Calvados” is “a dry apple brandy”
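A minimal Python sketch of the "/Q is /A" pattern from the Calvados example. The function name `extract_definition`, the restriction to "What is X?" questions, and the cap of three words after the article are illustrative assumptions, not part of the described system.

```python
import re


def extract_definition(question, passages):
    """Answer a 'What is X?' question using the surface pattern '/Q is /A'."""
    match = re.match(r"\s*what\s+is\s+(.+?)\s*\?\s*$", question, re.IGNORECASE)
    if not match:
        return None  # only definition-style questions are handled here
    subject = match.group(1)

    # "/Q is a|an|the /A": capture the article plus up to three words.
    # The three-word cap is an arbitrary assumption of this sketch.
    pattern = re.compile(
        re.escape(subject) + r"\s+is\s+((?:a|an|the)\s+\w+(?:\s+\w+){0,2})",
        re.IGNORECASE,
    )
    for passage in passages:
        found = pattern.search(passage)
        if found:
            return f'"{subject}" is "{found.group(1)}"'
    return None
```

Given the retrieved text from the slide, the first occurrence ("Calvados is often used…") is skipped because no article follows "is", and the second occurrence yields "a dry apple brandy". A real system would need many such patterns and a proper phrase boundary detector.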
If it is so simple, why haven't you done it already?
There are challenges in QA!
● Quality of text data.
● Language variability (paraphrase).
● Knowledge base domain: the answer has to be supported by the collection, not by the current state of the world.
● How to locate the information given the question keywords.
● It is unlikely that a system will have all necessary resources pre-computed.
● The task requires some deduction or extra linguistic knowledge.
● How does a reasoning system find relevant pieces of information?
Do we have any prior research to tackle these challenges?
QA research
● Well established over two decades
● TREC (Text REtrieval Conference)
– Funded by NIST/DARPA since 1992
– QA track 1999–2007, directed at ‘factoids’
● CLEF (Cross Language Evaluation Forum)
– 2001–current
– Information retrieval, language resources
● NTCIR (NII Test Collection for IR Systems)
– 1997–current
– IR, question answering, summarization, extraction
● Our literature survey can be accessed at: http://svn.infoedge.com:8080/Common_Engineering_Projects_Trac/wiki/instant_question_answering#LiteratureSurvey
OK, the investigation is done.
But how do we actually do it?
Knowledge base generation
Knowledge base generation
PHASE 1
Knowledge base generation: Example
PHASE 1
● Fact sentence: The fees for Btech course in IIT D is 24000 INR.
● Tagged sentence: The <<fees>> for <<Btech>> course in <<IIT D>> is <<24000 INR>>.
● Extracted entities: Fees, Btech, IIT D, 24000
● Generated questions:
– What is the fees of Btech course at IIT Delhi?
– How much is the fees for Btech course from IIT Delhi?
– How many INR is the fees of btech from iit delhi?
– What ….........
● Index: Btech, iit d, fees, 24000, INR
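The tagging and indexing steps of this example can be sketched in Python as follows. The `<<…>>` markup is taken from the slide; the function names and the in-memory inverted index are assumptions made for illustration, since the slides do not say which index is used.

```python
import re
from collections import defaultdict

# Entities are marked as <<entity>> in the tagged fact sentence.
TAG = re.compile(r"<<(.+?)>>")


def extract_entities(tagged_sentence):
    """Return the lower-cased entities marked with <<...>>."""
    return [entity.lower() for entity in TAG.findall(tagged_sentence)]


def strip_tags(tagged_sentence):
    """Recover the plain fact sentence from its tagged form."""
    return TAG.sub(r"\1", tagged_sentence)


def build_index(tagged_sentences):
    """Map each entity to the ids of the fact sentences that mention it."""
    index = defaultdict(set)
    facts = []
    for sentence in tagged_sentences:
        fact_id = len(facts)
        facts.append(strip_tags(sentence))
        for entity in extract_entities(sentence):
            index[entity].add(fact_id)
    return index, facts
```

Looking up `index["iit d"]` then returns the ids of every stored fact about that entity, which is the starting point for answer retrieval.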
Answer Retrieval
Answer Retrieval: Example
● Question: How much will I pay for btech from IIT D?
● Tagged question: How much will I <<pay for>> <<btech>> from <<IIT D>>?
● Focus: How much. Object: Pay. Class: quantity to pay, fees.
● Consistency checks
● Candidate answers:
– You should pay 24000 INR for Btech from IIT D.
– The fees for Btech from IITD is 24000 INR.
– 24000 INR should be paid for Btech from IIT D.
● Already indexed knowledge base. Trained once at startup.
● Rank and prune best answer based on collective match.
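One possible reading of "rank and prune based on collective match" is to count how many question terms each candidate covers, drop candidates below a threshold, and keep the top scorer. The sketch below assumes exactly that; the scoring function, the `min_score` parameter and the use of plain substring matching are hypothetical choices, not the system's documented method.

```python
def collective_match(question_terms, candidate):
    """Count how many question terms occur in the candidate (substring match)."""
    text = candidate.lower()
    return sum(1 for term in question_terms if term.lower() in text)


def best_answer(question_terms, candidates, min_score=1):
    """Prune weak candidates, then return the one covering the most terms."""
    scored = [(collective_match(question_terms, c), c) for c in candidates]
    scored = [(score, c) for score, c in scored if score >= min_score]  # prune
    if not scored:
        return None
    # max() keeps the earliest candidate when scores tie.
    return max(scored, key=lambda pair: pair[0])[1]
```

With the terms `["pay", "fees", "btech", "iit d"]` taken from the example question and its class, the first candidate sentence on the slide covers three terms and the other two cover two each, so the first one is returned.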
So many boxes!!
Let us check out the major components in brief.
A.1. Fact phrase generator from structured listings
● Structured listing to factoid text.
● No need to rely only on user-generated sentences.
● Use basic language model techniques to create sentences from templates.
● Input listing: <doc>….. <college_name>iit</college_name> <college_id>13213</college_id> <fee>54000 inr annual</fee> <location>delhi</location>…....</doc>
● Language model output: Fee of iit delhi is 54000 inr annual.
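A rough Python sketch of turning a structured listing into a fact sentence. It uses plain string templates rather than a language model, and the `TEMPLATES` table and function name are assumptions made for this example; only the field names and the output sentence come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical template table: one sentence template per listing attribute.
TEMPLATES = {
    "fee": "Fee of {college_name} {location} is {fee}.",
}


def listing_to_facts(xml_text):
    """Fill every applicable template with the fields of one <doc> listing."""
    doc = ET.fromstring(xml_text)
    fields = {child.tag: (child.text or "").strip() for child in doc}
    facts = []
    for attribute, template in TEMPLATES.items():
        if not fields.get(attribute):
            continue  # the listing has no value for this attribute
        try:
            facts.append(template.format(**fields))
        except KeyError:
            continue  # a field required by the template is missing
    return facts
```

Adding a new fact type is then a matter of adding one template line, which is the main attraction of generating sentences from listings instead of relying on user text.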
A.2. Template Generator
● Start with identifying:
– Answer type
– Entities in focus
– Part-of-speech tags
● With these tags and language grammar rules, a factoid sentence can be converted into all possible question forms (the Question Generation, QG, task).
● Example factoid: Fee of iit delhi is 54000 inr annually.
– Answer type: quantity; focus: fee; entity: iit + delhi; POS tags etc.
– Sentence templates: Fee of <II> <LL> is <$$>. Fees of <II> <LL> is <$$>. Cost of <II> <LL> is <$$>.
– Generated questions: What is the fee of iit delhi annually? What is the fee of iit delhi? How much is the fee of iit delhi? Is fee of iit delhi 54000 inr?
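The question generation step can be illustrated with a small Python sketch. The three question patterns mirror the forms shown on the slide; the function name, its parameters and the rule that "How much" applies only to the quantity answer type are assumptions of this example, standing in for real grammar rules.

```python
def generate_questions(focus, entity, value, answer_type):
    """Produce simple question forms for one analysed factoid sentence."""
    questions = [f"What is the {focus} of {entity}?"]
    if answer_type == "quantity":
        # "How much" only makes sense when the answer is a quantity.
        questions.append(f"How much is the {focus} of {entity}?")
    # Yes/no confirmation form that embeds the known value.
    questions.append(f"Is {focus} of {entity} {value}?")
    return questions
```

Called with focus `fee`, entity `iit delhi`, value `54000 inr` and answer type `quantity`, it reproduces three of the four questions listed on the slide.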
B.1. Text Preprocessing
● Short-forms
– i’m, im, i m → i am
– can’t, cant, can t → can not
● Spelling correction
● Repeated punctuation (!!!, ???, …)
● Smilies
● Salutations (Hi all, Hiya, etc.)
● Names, signature, course codes
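Several of these clean-up steps can be approximated with regular expressions. The Python sketch below covers short-forms, repeated punctuation, smilies and leading salutations; the specific patterns and the function name `preprocess` are assumptions, and spelling correction and signature removal are left out because they need dictionaries or models.

```python
import re

# Short-form expansions listed on the slide (output is lower-case).
SHORT_FORMS = [
    (re.compile(r"\bi\s*['’]?\s*m\b", re.IGNORECASE), "i am"),
    (re.compile(r"\bcan\s*['’]?\s*t\b", re.IGNORECASE), "can not"),
]
# Stand-alone smilies such as :) ;-) :D, delimited by whitespace.
SMILEY = re.compile(r"(?<!\S)[:;=]-?[)(DPp](?!\S)")
# Runs of the same punctuation mark: "!!!" -> "!", "???" -> "?".
REPEATED_PUNCT = re.compile(r"([!?.])\1+")
# Greeting at the very start of the text.
SALUTATION = re.compile(r"^\s*(?:hi all|hiya|hi|hello)\b[\s,!.]*", re.IGNORECASE)


def preprocess(text):
    """Apply the regex-based clean-up steps in a fixed order."""
    text = SMILEY.sub(" ", text)
    text = REPEATED_PUNCT.sub(r"\1", text)
    text = SALUTATION.sub("", text)
    for pattern, expansion in SHORT_FORMS:
        text = pattern.sub(expansion, text)
    return re.sub(r"\s+", " ", text).strip()
```

For example, `preprocess("Hi all!!! im looking for fee :) cant find it???")` yields `i am looking for fee can not find it?`.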
B.2. Entity and POS Tagger
● QER (query entity recognition)
– Names, locations etc.
● Part-of-speech tagger using word sequence patterns
– Sequence (nouns, verbs, auxiliaries, modifiers)
● Phrase chunker
● Dependency parsing: validate tag relationships
B.3. Question Analysis
● Create features to be used during answer extraction
● Identify keywords to be matched in document sentences
● Identify the answer type to match answer candidates. We can create an inventory of questions and expected answer types, and so train a classifier
– Quantity?
– Dates?
– Definition?
● Select a list of useful patterns from a pattern repository
● Identify question relations which may be used for sentence analysis, etc.
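As a stand-in for the trained answer-type classifier, the Python sketch below uses a few hand-written rules and a stopword filter for keyword extraction. The rule set, the stopword list and the function name are hypothetical; a question such as "What is the fee for W?" would be labelled `definition` by these rules, which is precisely the kind of error a classifier trained on a question inventory is meant to avoid.

```python
import re

# Hypothetical rule inventory: (answer type, pattern on the question start).
ANSWER_TYPE_RULES = [
    ("quantity", re.compile(r"^\s*how\s+(?:much|many)\b", re.IGNORECASE)),
    ("date", re.compile(r"^\s*when\b", re.IGNORECASE)),
    ("definition", re.compile(r"^\s*what\s+is\b", re.IGNORECASE)),
]
# Small illustrative stopword list, not a complete one.
STOPWORDS = {"how", "much", "many", "what", "when", "is", "the",
             "a", "an", "of", "for", "at", "in"}


def analyse_question(question):
    """Return the guessed answer type and the keywords to match."""
    answer_type = next(
        (label for label, pattern in ANSWER_TYPE_RULES if pattern.search(question)),
        "unknown",
    )
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    keywords = [token for token in tokens if token not in STOPWORDS]
    return {"answer_type": answer_type, "keywords": keywords}
```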
B.4. Query Formulation
● The question needs to be transformed into a query for the document retrieval system
● Each IR system has its own query language, so we need to perform this mapping
● Identify useful keywords; use the type of answer sought, entities to boost, etc.
● Query creation: ordered terms, combined terms, weighted terms.
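To make the mapping concrete, here is a Python sketch that builds a query string in Lucene-style syntax (quoted phrases, `^` boosts, `field:value` filters). The slides do not name the IR system, so the target syntax, the `qtype` field name and the default boost are assumptions; the sketch also does not escape special query characters.

```python
def formulate_query(keywords, entities, answer_type=None, entity_boost=2.0):
    """Combine boosted entity phrases, plain keywords and an answer-type filter."""
    parts = []
    for entity in entities:
        # Entities are kept as ordered phrases and weighted above keywords.
        parts.append(f'"{entity}"^{entity_boost:g}')
    parts.extend(keywords)
    if answer_type:
        # "qtype" is a hypothetical index field holding the answer type.
        parts.append(f"qtype:{answer_type}")
    return " ".join(parts)
```

For the fee question about IIT Delhi this produces `"iit delhi"^2 fee qtype:quantity`, covering the ordered, combined and weighted term ideas in one string.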
B.5. Answer Candidate Searcher
● Index the <question, qtypes, entities, answer template> tuples of a training corpus
● Retrieve a set of n <question, qtypes, entities, answer template> tuples given a new question
● Based on the scores of the answers returned, decide the best answer to the new question
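A toy Python version of this searcher is sketched below. It keeps the tuples in memory and scores them by Jaccard overlap between question tokens; the class name, the similarity measure and the `threshold` value are assumptions made for illustration, where a production system would use a search index and a tuned scoring function.

```python
import re


def tokenize(text):
    """Lower-case alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class CandidateSearcher:
    def __init__(self, records):
        # Each record is a dict with the keys
        # "question", "qtypes", "entities" and "answer_template".
        self._records = [(tokenize(r["question"]), r) for r in records]

    def search(self, question, n=3):
        """Return the n best (score, record) pairs for a new question."""
        query = tokenize(question)
        scored = [(jaccard(query, tokens), r) for tokens, r in self._records]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:n]

    def best_answer(self, question, n=3, threshold=0.5):
        """Answer template of the top hit, or None if it scores too low."""
        hits = self.search(question, n)
        if not hits or hits[0][0] < threshold:
            return None
        return hits[0][1]["answer_template"]
```

Returning `None` below the threshold matters here: it is better to fall back to ordinary search results than to show an instant answer to a question the knowledge base does not cover.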
Pheww.... !
Where do we need Natural Language Processing?
● Tokenisation (words, numbers, punctuation, whitespace)
● Sentence detection
● Part-of-speech tagging (verbs, nouns, pronouns, etc.)
● Query entity recognition
● Chunking/parsing (noun/verb phrases and relationships)
● Statistical modelling tools
● Dictionaries, word lists, WordNet, VerbNet
● Template generation using grammar rules.
So you are telling me there are ready-made NLP tools?
NLP tools problems
● Training data issues
● Training domains are completely different.
● Local English usage: slang, spelling, localisation
● Sentence detection failures:
– Bad style (capitalisation, punctuation)
– Ellipsis (i tried... it failed... error message...)
● Tokenisation failures:
– Multiple punctuation ???, !!! (student emphasis)
– Abbreviations (im, m.b.a, cant, doesnt, etc.)
● POS errors
– Spelling, grammar
● We need to experiment, modify code and train on our domain data!
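One way to experiment with the tokenisation failures listed here is a small regex tokeniser that keeps emphasis runs, ellipses and dotted abbreviations together as single tokens. The Python sketch below is an assumption about what such a domain-specific tokeniser could look like, not a description of any existing tool.

```python
import re

TOKEN = re.compile(
    r"[A-Za-z](?:\.[A-Za-z])+(?![A-Za-z])\.?"  # dotted abbreviations: m.b.a, m.b.a.
    r"|\.{2,}|…"                               # ellipsis written as dots or as one character
    r"|[!?]+"                                  # emphasis runs: !!!, ???
    r"|[A-Za-z]+(?:['’][A-Za-z]+)?"            # words, including forms like can't
    r"|\d+(?:\.\d+)?"                          # integers and decimals
    r"|\S"                                     # any other single symbol
)


def tokenize(text):
    """Split noisy user text into tokens without breaking up emphasis or abbreviations."""
    return TOKEN.findall(text)
```

Keeping "!!!" or "..." as one token lets later stages normalise or drop them in a single step instead of handling each character separately.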
What are the use cases of instant QA ?
How does it fit in our system?
Interaction
● If users are not writing good English, then try to minimize how much they write. We can focus on capturing user intent with the least amount of typed text.
● This not only helps user experience but also increases the accuracy of language-based statistical systems.
✔ Auto complete
✔ Guidance
✔ Spell check
✔ Auto correct
✔ Manual feedback on conflicts
✔ Make them write good queries
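Auto complete is the simplest of these to sketch. The Python example below suggests stored questions by prefix using binary search over a sorted list; the class name, the `limit` parameter and the choice of a sorted list over a trie are assumptions made to keep the example short.

```python
import bisect


class AutoCompleter:
    def __init__(self, questions):
        # Lower-case, de-duplicate and sort once so lookups can use bisect.
        self._questions = sorted({q.lower() for q in questions})

    def suggest(self, prefix, limit=5):
        """Return up to `limit` stored questions starting with `prefix`."""
        prefix = prefix.lower()
        start = bisect.bisect_left(self._questions, prefix)
        suggestions = []
        for question in self._questions[start:start + limit]:
            if not question.startswith(prefix):
                break  # sorted order: no later entry can match either
            suggestions.append(question)
        return suggestions
```

Feeding it the questions produced by the template generator means every suggestion is a well-formed question the system already knows how to answer.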
Shiksha : main search & cafe search
Shiksha : Integration with main search auto-suggestor
We will already be generating good-quality questions; these could be integrated here.
99acres
● Use cases similar to Shiksha.
● The real estate domain has more open-ended opinion questions and very few factoid questions.
● If a single text box search is introduced in future:
– SRP can cater not only listings but also question answers
– Instant QA would be really helpful for user experience.
And many more use cases…
Plus, some components of this system will be used separately to improve other existing systems.
Thank you.