QUIRK: QUestion Answering = Information Retrieval + Knowledge
Cycorp
IBM
Presenter: Stefano Bertolo (Cycorp)
Project Goals
Break answer-by-retrieval bottleneck
Deep (semantic) understanding of queries and answers
Integration of heterogeneous sources
Formalized knowledge to integrate state-of-the-art IR components with state-of-the-art knowledge bases
Answer by retrieval
Q: Who was the first president of Zambia?
… Kenneth Kaunda, the first president, kept Zambia within the Commonwealth of Nations …
Answer by reasoning
Q: Who sponsored Kai’s attack against Pamina?
…On February 13, Kai detonated the truck in front of Pamina’s HQ…
…On January 25, Kai bought a truckload of fertilizer drawing against account 9999 at MegaBank…
… On January 15, Vitas Bayo deposited $50,000 on account 9999 at MegaBank…
QUIRK strategy
Use Formalized knowledge for:
– Semantic understanding of queries
– Justification of answers
Use Formalized knowledge as:
– Format for data normalization
– ‘Glue’ for data integration of:
  • information extracted from unstructured data
  • SQL queries against structured DBs
  • Cyc’s knowledge
[Architecture diagram. Labels: Blackboard; Query Manager; Answer Manager; Inference Agent; IR Agent; Cyc KB; GuruQA (IBM); DB1, DB2, …, DB-N; Preemptive annotations; Unstructured Documents; Q-Eng, A-Eng; Q-CycL, A-CycL; Q-Guru, A-Guru.]
[Component diagram. Labels: Query Interpreter; GuruQA Assistant; GuruQA (IBM); Cyc English Generator; Cyc Inference Engine; Answer Manager; Query Refiner; Blackboard.]
Blackboard architecture
Add/remove agents without disrupting existing architecture
Test performance/speed with several combinations of agents
Operate asynchronously.
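A minimal sketch of the blackboard idea follows. It is not the QUIRK implementation: the `Blackboard` class and both agent functions are hypothetical, the agents are placeholders that only wrap strings, and delivery is a simple synchronous queue rather than true asynchronous operation. The message kinds reuse the Q-Eng, Q-CycL and A-CycL labels from the architecture diagram.

```python
from collections import defaultdict, deque


class Blackboard:
    """Hypothetical minimal blackboard: agents subscribe to message kinds."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # kind -> list of agent callables
        self.queue = deque()                  # pending (kind, payload) postings
        self.log = []                         # every delivered posting, in order

    def subscribe(self, kind, agent):
        self.subscribers[kind].append(agent)

    def unsubscribe(self, kind, agent):
        self.subscribers[kind].remove(agent)

    def post(self, kind, payload):
        self.queue.append((kind, payload))

    def run(self):
        # Deliver postings until no agent has anything left to add.
        while self.queue:
            kind, payload = self.queue.popleft()
            self.log.append((kind, payload))
            for agent in list(self.subscribers[kind]):
                agent(self, payload)


def query_interpreter(board, english):
    # Placeholder: a real interpreter would produce a CycL query.
    board.post("Q-CycL", f"<CycL for: {english}>")


def inference_agent(board, cycl_query):
    # Placeholder: a real agent would call an inference engine.
    board.post("A-CycL", f"<bindings for: {cycl_query}>")


board = Blackboard()
board.subscribe("Q-Eng", query_interpreter)
board.subscribe("Q-CycL", inference_agent)
board.post("Q-Eng", "Who opposes the WTO?")
board.run()
print([kind for kind, _ in board.log])
```

Because agents only talk to the board, removing `inference_agent` with `unsubscribe` leaves `query_interpreter` untouched, which is the property the slide highlights.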
Query Interpreter
Q: “Who opposes the WTO?”
(and (isa ?WHO Person)
(thereExists ?EVENT
(and (isa ?EVENT ActOfDissent)
(performedBy ?EVENT ?WHO)
(maleficiary ?EVENT WorldTradeOrganization))))
GuruQA Assistant
CycL query =>
PERSON$ oppose(s/d) the WTO
denounce(s/d) the World Trade Organization
attack(s/ed)
…
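One way to picture this expansion step is the sketch below. The `VERBS` and `NAMES` tables and the `expand` function are hypothetical illustrations, not part of Cyc or GuruQA; only the `PERSON$` token and the example verbs and names come from the slide. The inflection rule covers regular verbs only.

```python
from itertools import product

# Hypothetical lexicon: which English verbs can express a CycL event type,
# and which names denote a CycL constant. Not taken from Cyc or GuruQA.
VERBS = {"ActOfDissent": ["oppose", "denounce", "attack"]}
NAMES = {"WorldTradeOrganization": ["the WTO", "the World Trade Organization"]}


def inflections(verb):
    """Return the base, third-person and past forms of a regular verb."""
    if verb.endswith("e"):
        return [verb, verb + "s", verb + "d"]
    return [verb, verb + "s", verb + "ed"]


def expand(event_type, target, answer_type="PERSON$"):
    """Build one search pattern per (verb form, target name) combination."""
    forms = [f for v in VERBS[event_type] for f in inflections(v)]
    return [f"{answer_type} {form} {name}"
            for form, name in product(forms, NAMES[target])]


patterns = expand("ActOfDissent", "WorldTradeOrganization")
print(len(patterns))   # 3 verbs x 3 forms x 2 names
print(patterns[0])
```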
Cyc Inference Engine
CycL Query =>
[(PersonNamedFn “Kai”) JUSTIFICATION-1]
[(PersonNamedFn “Dr. Chen”) JUSTIFICATION-2]
…
[(PersonNamedFn “Kai”) JUSTIFICATION-N]
…
Cyc Justifications
A?
A from [B and C] (source 6743)
B from source 67430
C from source 78539
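The justification above has a tree shape: a conclusion, the source that licenses it, and the premises it rests on. A minimal sketch of such a structure, assuming a hypothetical `Justification` class (Cyc's actual justification format is not shown in the slides):

```python
from dataclasses import dataclass, field


@dataclass
class Justification:
    """A conclusion, the source that licenses it, and its premises."""
    conclusion: str
    source: str
    premises: list = field(default_factory=list)


def sources(j):
    """Collect every source used anywhere in the justification tree."""
    found = [j.source]
    for p in j.premises:
        found.extend(sources(p))
    return found


def explain(j, depth=0):
    """Render the tree as indented lines, one per conclusion."""
    lines = ["  " * depth + f"{j.conclusion} (source {j.source})"]
    for p in j.premises:
        lines.extend(explain(p, depth + 1))
    return lines


# The example from the slide: A follows from B and C.
b = Justification("B", "67430")
c = Justification("C", "78539")
a = Justification("A", "6743", [b, c])
print("\n".join(explain(a)))
```

Walking the tree with `sources(a)` gives every document or assertion a user would need to inspect to verify the answer.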
Sources for Cyc Inference
1.4M+ CycL assertions already in Cyc’s Knowledge Base
Virtual Assertions in DataBases
Unsupervised Textract / CycL annotation of unstructured documents
Data Source Integration
Data Normalization
Data Fusion
Data Normalization
Interpretation
Search
cat chat Katze gato gatto “felis felis”
cat OR chat OR Katze OR gato OR gatto OR “felis felis”
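The two steps, interpretation and search, can be sketched as follows. The `LEXICON` table and the concept name `Cat` are hypothetical; the terms are the ones listed on the slide.

```python
# Hypothetical concept-to-term table; the terms are the ones on the slide.
LEXICON = {
    "Cat": ["cat", "chat", "Katze", "gato", "gatto", "felis felis"],
}


def interpret(term):
    """Map a surface term to the concept it denotes, or None if unknown."""
    for concept, terms in LEXICON.items():
        if term.lower() in (t.lower() for t in terms):
            return concept
    return None


def search_query(concept):
    """Build a disjunctive keyword query; multi-word terms are quoted."""
    quoted = [f'"{t}"' if " " in t else t for t in LEXICON[concept]]
    return " OR ".join(quoted)


print(search_query(interpret("Katze")))
```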
Data Normalization
…Zhang Mei Li, was born on January 1, 1927…
Name            DOB
Zhang Mei Li    01-01-1927
…               …
(birthDate (PersonNamedFn “Zhang Mei Li”) (DayFn 01 (MonthFn January (YearFn 1927))))
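As a rough illustration of normalization, the sketch below turns both the text snippet and the database row into the same structure. All function names are hypothetical, the regular expression handles only this one sentence pattern, and the DOB column is assumed to be stored as MM-DD-YYYY (the slide does not say). The nested tuple mirrors the shape of the CycL `birthDate` form but is not real CycL.

```python
import re
from datetime import datetime

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def birth_date_assertion(name, day, month_name, year):
    """Return a nested tuple mirroring the CycL birthDate form on the slide."""
    return ("birthDate",
            ("PersonNamedFn", name),
            ("DayFn", day, ("MonthFn", month_name, ("YearFn", year))))


def from_text(sentence):
    """Extract 'NAME, was born on MONTH DAY, YEAR' with a toy pattern."""
    m = re.search(r"([A-Z][\w ]+?),? was born on (\w+) (\d{1,2}), (\d{4})",
                  sentence)
    if m is None:
        return None
    name, month, day, year = m.groups()
    return birth_date_assertion(name, int(day), month, int(year))


def from_row(name, dob):
    """Convert a database row; assumes DOB is stored as MM-DD-YYYY."""
    d = datetime.strptime(dob, "%m-%d-%Y")
    return birth_date_assertion(name, d.day, MONTHS[d.month - 1], d.year)


text_fact = from_text("…Zhang Mei Li, was born on January 1, 1927…")
row_fact = from_row("Zhang Mei Li", "01-01-1927")
print(text_fact == row_fact)
```

Once both sources yield the same assertion, downstream inference no longer needs to know which one it came from.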
Data Normalization
Language-independent representation of:
– entities
– concepts
– relationships
CycL contains 100K+ primitives and can compositionally define infinitely many non-atomic terms.
Data Fusion
Dr. Chen lives in Fresno
Zhang Mei Li lives in Oakland
Kai lives in Los Angeles
California is in the Pacific Time Zone
Any two of Dr. Chen, Zhang Mei Li and Kai live in the same time zone
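The conclusion only follows once background knowledge links each city to California. A minimal sketch, in which the three dictionaries and both functions are hypothetical and the city-to-state facts are an assumed piece of background knowledge not stated on the slide:

```python
# Facts from separate sources (as on the slide).
lives_in = {"Dr. Chen": "Fresno", "Zhang Mei Li": "Oakland", "Kai": "Los Angeles"}
state_time_zone = {"California": "Pacific"}

# Assumed background knowledge that connects the sources.
city_in_state = {
    "Fresno": "California",
    "Oakland": "California",
    "Los Angeles": "California",
}


def time_zone(person):
    """Chain person -> city -> state -> time zone; None if any link is missing."""
    city = lives_in.get(person)
    state = city_in_state.get(city)
    return state_time_zone.get(state)


def same_time_zone(a, b):
    """True only when both time zones are known and equal."""
    za, zb = time_zone(a), time_zone(b)
    return za is not None and za == zb


print(same_time_zone("Dr. Chen", "Kai"))
```

Without `city_in_state`, none of the individual facts can be combined, which is the role the slides assign to formalized knowledge.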
Heterogeneous Sources
Q: How old is Dr. Chen’s mother?
…Zhang Mei Li, mother of Pamina’s Dr. Chen…
Name            DOB
Zhang Mei Li    01-01-1927
…               …
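Answering the question takes one fact from the text and one from the table. A small sketch under stated assumptions: the dictionaries and `age_of_mother` are hypothetical, the DOB is read as MM-DD-YYYY, and the reference date is an arbitrary value chosen for illustration.

```python
from datetime import date

mother_of = {"Dr. Chen": "Zhang Mei Li"}      # from the text snippet
dob = {"Zhang Mei Li": date(1927, 1, 1)}      # from the table, read as MM-DD-YYYY


def age_of_mother(person, today):
    """Join the two sources on the mother's name and compute her age."""
    mother = mother_of.get(person)
    born = dob.get(mother)
    if born is None:
        return None
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)


# The reference date is an arbitrary illustrative choice.
print(age_of_mother("Dr. Chen", date(2001, 6, 1)))
```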
Data Fusion
Requires language-independent connections/inferential links among:
– Entities
– Concepts
– Propositions (Facts, Rules)
Cyc’s Ontology
Cyc’s Knowledge Base
Consensus Reality
Formalized Knowledge about ‘Consensus Reality’ = inferentially enabled ‘glue’ for Data Fusion
E.g. “Was Kai implicated in the Munich 1972 attack (when he was a toddler of 2)?”
DBs as ‘virtual assertions’ stores
(birthDate
(PersonNamedFn “Zhang Mei Li”)
?WHEN)
SELECT DOB
FROM PERSONAL_DATA
WHERE NAME = 'Zhang Mei Li'
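The translation from a CycL query with one open variable to a database lookup can be sketched with Python's built-in `sqlite3` module. The `PREDICATE_MAP` table and the `ask` function are hypothetical; the slides do not describe how QUIRK stores this mapping. The table and column names are the ones on the slide.

```python
import sqlite3

# Hypothetical mapping from a CycL predicate to (table, key column, value column).
PREDICATE_MAP = {"birthDate": ("PERSONAL_DATA", "NAME", "DOB")}


def ask(conn, predicate, person_name):
    """Answer (predicate (PersonNamedFn name) ?X) by querying the mapped table."""
    table, key_col, value_col = PREDICATE_MAP[predicate]
    # Identifiers come from the fixed map above; the name is a bound parameter.
    sql = f"SELECT {value_col} FROM {table} WHERE {key_col} = ?"
    return [row[0] for row in conn.execute(sql, (person_name,))]


# In-memory database standing in for an external source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PERSONAL_DATA (NAME TEXT, DOB TEXT)")
conn.execute("INSERT INTO PERSONAL_DATA VALUES (?, ?)",
             ("Zhang Mei Li", "01-01-1927"))
print(ask(conn, "birthDate", "Zhang Mei Li"))
```

The rows stay in the database; they behave like assertions only at the moment a query asks for them.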
Unsupervised Textract / CycL Annotations
IBM Textract relations:
[Cycorp, Inc. : located-in : Austin, TX]
mapped to CycL Assertions:
(objectFoundInLocation
Cycorp CityOfAustinTX)
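A toy version of this mapping is sketched below. Both lookup tables and both functions are hypothetical stand-ins; the real Textract output format and the Cyc mapping rules are not given in the slides beyond this one example.

```python
# Hypothetical lookup tables built from the single example on the slide.
RELATION_TO_PREDICATE = {"located-in": "objectFoundInLocation"}
ENTITY_TO_CONSTANT = {"Cycorp, Inc.": "Cycorp", "Austin, TX": "CityOfAustinTX"}


def parse_relation(text):
    """Split '[subject : relation : object]' into its three parts."""
    subject, relation, obj = (part.strip()
                              for part in text.strip("[]").split(" : "))
    return subject, relation, obj


def to_cycl(text):
    """Map a relation triple to a CycL-style string, or None if unmapped."""
    subject, relation, obj = parse_relation(text)
    try:
        return "({} {} {})".format(RELATION_TO_PREDICATE[relation],
                                   ENTITY_TO_CONSTANT[subject],
                                   ENTITY_TO_CONSTANT[obj])
    except KeyError:
        return None


print(to_cycl("[Cycorp, Inc. : located-in : Austin, TX]"))
```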
Augmenting Textract Annotations
Concept Annotation: “Boston” => { CityOfBostonMA, BostonTheBand, … }
Word Sense Disambiguation: “I went to Boston” => CityOfBostonMA
Analysis of nominal compounds: “leather jacket” =>
(SubcollectionOfWithRelationToTypeFn
Jacket mainConstituent Leather)
Unsupervised CycL Annotations
IBM’s Nominator and Parsers to extract Named Entities and basic syntactic dependencies (SUBJ-VERB, VERB-OBJ)
Map dependencies to CycL event structures.
Cyc-to-English generator
(PersonNamedFn “Dr. Chen”) JUSTIFICATION-N
“Dr. Chen opposes the WTO, because people who demonstrate against organizations oppose them (Cyc KB, assertion 99999) and Dr. Chen demonstrated against the WTO in Seattle (document 12345).”
Year 1 Tasks
Get entire system to run robustly with integration of all the IBM and Cycorp components described
Improve question understanding and refinement
Broaden coverage of English-to-CycL mapping to enable annotation of large document collections
Year 2 Tasks
Add new agents to the blackboard to represent the user and session context
Improve integration of answers obtained from GuruQA and Cyc
Improve integrated IBM and Cycorp modules for unstructured document annotation