quirk: question answering = information retrieval + knowledge cycorp ibm presenter: stefano bertolo...

Post on 28-Dec-2015


QUIRK: QUestion Answering = Information Retrieval + Knowledge

Cycorp

IBM

Presenter: Stefano Bertolo (Cycorp)

Project Goals

Break answer-by-retrieval bottleneck

Deep (semantic) understanding of queries and answers

Integration of heterogeneous sources

Formalized knowledge to integrate state-of-the-art IR components with state-of-the-art knowledge bases

Answer by retrieval

Q: Who was the first president of Zambia?

… Kenneth Kaunda, the first president, kept Zambia within the Commonwealth of Nations …

Answer by reasoning

Q: Who sponsored Kai’s attack against Pamina?

…On February 13, Kai detonated the truck in front of Pamina’s HQ…

…On January 25, Kai bought a truckload of fertilizer drawing against account 9999 at MegaBank…

… On January 15, Vitas Bayo deposited $50,000 on account 9999 at MegaBank…
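No single passage names the sponsor; the answer only follows by chaining the three facts. The following is a minimal sketch of that chain under stated assumptions: the facts are hand-coded tuples, dates are (month, day) pairs because the passages give no year, and the sponsorship rule is a deliberately naive, hypothetical one rather than anything taken from Cyc.

```python
# Facts read off the three passages. Dates are (month, day) pairs because the
# passages give no year; the tuple layouts are illustrative, not CycL.
ATTACKS = [("Kai", "Pamina HQ", (2, 13))]                       # attacker, target, date
PURCHASES = [("Kai", "fertilizer", "9999@MegaBank", (1, 25))]   # buyer, item, account, date
DEPOSITS = [("Vitas Bayo", 50_000, "9999@MegaBank", (1, 15))]   # depositor, amount, account, date


def sponsors(target: str) -> set[str]:
    """Naive rule (an assumption): X sponsored an attack on `target` if X
    deposited money into an account that the attacker later drew on to buy
    supplies before the attack."""
    found: set[str] = set()
    for attacker, attacked, attack_day in ATTACKS:
        if attacked != target:
            continue
        for buyer, _item, account, buy_day in PURCHASES:
            if buyer != attacker or buy_day > attack_day:
                continue
            for depositor, _amount, dep_account, dep_day in DEPOSITS:
                if dep_account == account and dep_day <= buy_day:
                    found.add(depositor)
    return found


print(sponsors("Pamina HQ"))  # {'Vitas Bayo'}
```

The date checks matter: without them, a deposit made after the purchase would also count as sponsorship.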

QUIRK strategy

Use formalized knowledge for:
– semantic understanding of queries;
– justification of answers.

Use formalized knowledge as:
– format for data normalization;
– ‘glue’ for data integration of:
  • information extracted from unstructured data
  • SQL queries against structured DBs
  • Cyc’s knowledge

[Figure: QUIRK architecture. Labels: Blackboard; Query Manager; Answer Manager; Inference Agent; IR Agent; Cyc KB; GuruQA (IBM); databases DB1, DB2 … DB-N; preemptive annotations of unstructured documents; question/answer messages Q-Eng/A-Eng, Q-CycL/A-CycL, Q-Guru/A-Guru.]

[Figure: agents on the Blackboard: Query Interpreter, Query Refiner, GuruQA Assistant, GuruQA (IBM), Cyc Inference Engine, Cyc English Generator, Answer Manager.]

Blackboard architecture

Add/remove agents without disrupting existing architecture

Test performance/speed with several combinations of agents

Operate asynchronously.
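To make the three properties concrete, here is a minimal asyncio sketch of a blackboard. It is not QUIRK's implementation: the entry kinds reuse the Q-Eng/Q-CycL/A-CycL labels from the architecture figure, while the `Blackboard` class and both agent bodies are hypothetical placeholders. Agents only register interest in entry kinds, so adding or removing one does not touch the others, and each reaction runs as its own task.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable

Agent = Callable[..., Awaitable[None]]


class Blackboard:
    """Shared store of (kind, payload) entries. Agents subscribe to entry kinds
    and run as independent asyncio tasks when a matching entry is posted."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, str]] = []
        self.subscribers: defaultdict[str, list[Agent]] = defaultdict(list)
        self.tasks: list[asyncio.Task] = []

    def subscribe(self, kind: str, agent: Agent) -> None:
        # Adding an agent touches no other agent: it only registers interest.
        self.subscribers[kind].append(agent)

    def post(self, kind: str, payload: str) -> None:
        self.entries.append((kind, payload))
        for agent in self.subscribers[kind]:
            self.tasks.append(asyncio.create_task(agent(self, payload)))

    async def settle(self) -> None:
        # Agents may post new entries (spawning new tasks) while running,
        # so keep waiting until every task has finished.
        while pending := [t for t in self.tasks if not t.done()]:
            await asyncio.gather(*pending)


async def query_interpreter(bb: Blackboard, english: str) -> None:
    await asyncio.sleep(0)  # placeholder for real English-to-CycL parsing
    bb.post("Q-CycL", f"(placeholder-cycl {english!r})")


async def inference_agent(bb: Blackboard, cycl: str) -> None:
    await asyncio.sleep(0)  # placeholder for querying the knowledge base
    bb.post("A-CycL", f"(placeholder-answer {cycl})")


async def main() -> list[str]:
    bb = Blackboard()
    bb.subscribe("Q-Eng", query_interpreter)
    bb.subscribe("Q-CycL", inference_agent)
    bb.post("Q-Eng", "Who opposes the WTO?")
    await bb.settle()
    return [kind for kind, _ in bb.entries]


print(asyncio.run(main()))  # ['Q-Eng', 'Q-CycL', 'A-CycL']
```

Testing "several combinations of agents" then amounts to changing which `subscribe` calls `main` makes.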

Query Interpreter

Q: “Who opposes the WTO?”

(and (isa ?WHO Person)
     (thereExists ?EVENT
       (and (isa ?EVENT ActOfDissent)
            (performedBy ?EVENT ?WHO)
            (maleficiary ?EVENT WorldTradeOrganization))))
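The CycL query is a conjunction with an existentially quantified event. The following is a toy evaluator, not the Cyc inference engine, that answers it by backtracking over a small hand-built set of ground assertions. The query is transcribed as nested tuples. The constants DrChen, Kai and Protest-Seattle are hypothetical.

```python
# Toy KB of ground assertions; DrChen, Kai and Protest-Seattle are hypothetical.
KB = [
    ("isa", "DrChen", "Person"),
    ("isa", "Kai", "Person"),
    ("isa", "Protest-Seattle", "ActOfDissent"),
    ("performedBy", "Protest-Seattle", "DrChen"),
    ("maleficiary", "Protest-Seattle", "WorldTradeOrganization"),
]

QUERY = ("and", ("isa", "?WHO", "Person"),
         ("thereExists", "?EVENT",
          ("and", ("isa", "?EVENT", "ActOfDissent"),
                  ("performedBy", "?EVENT", "?WHO"),
                  ("maleficiary", "?EVENT", "WorldTradeOrganization"))))


def is_var(term) -> bool:
    return isinstance(term, str) and term.startswith("?")


def match(literal, fact, env):
    """Extend the bindings in env so that literal equals fact, or return None."""
    if len(literal) != len(fact):
        return None
    env = dict(env)
    for pattern, value in zip(literal, fact):
        if is_var(pattern):
            if env.setdefault(pattern, value) != value:
                return None
        elif pattern != value:
            return None
    return env


def solve(formula, env):
    """Return every binding environment under which formula holds in KB."""
    op = formula[0]
    if op == "and":
        envs = [env]
        for conjunct in formula[1:]:
            envs = [e2 for e in envs for e2 in solve(conjunct, e)]
        return envs
    if op == "thereExists":
        var, body = formula[1], formula[2]
        # The existential variable is bound while proving the body,
        # then dropped from the bindings that are returned.
        return [{k: v for k, v in e.items() if k != var} for e in solve(body, env)]
    results = []
    for fact in KB:
        extended = match(formula, fact, env)
        if extended is not None:
            results.append(extended)
    return results


answers = sorted({env["?WHO"] for env in solve(QUERY, {})})
print(answers)  # ['DrChen']
```

Kai satisfies `(isa ?WHO Person)` but no ActOfDissent links him to the WTO, so only DrChen survives the conjunction.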

GuruQA Assistant

CycL query =>

PERSON$ oppose(s/d) the WTO

denounce(s/d) the World Trade Organization

attack(s/ed)
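The GuruQA Assistant's job, as the slide shows it, is to turn the CycL query into keyword-style queries with paraphrases. The following is a minimal sketch under loud assumptions: the lexicon from CycL terms to English phrases and the table of answer-type tokens (in the style of the slide's PERSON$) are hand-written. The function names are also hypothetical, and the input is the query's key constants rather than parsed CycL.

```python
# Hypothetical lexicon from CycL terms to English surface forms.
LEXICON = {
    "ActOfDissent": ["oppose", "denounce", "attack"],
    "WorldTradeOrganization": ["the WTO", "the World Trade Organization"],
}
# Hypothetical answer-type tokens for the query variable's type, styled after PERSON$.
ANSWER_TOKENS = {"Person": "PERSON$"}


def inflect(verb: str) -> list[str]:
    """Third-person present and past forms: oppose -> opposes, opposed."""
    past = verb + "d" if verb.endswith("e") else verb + "ed"
    return [verb + "s", past]


def guru_queries(answer_type: str, event_type: str, target: str) -> list[str]:
    """Expand the key constants of a CycL query into keyword queries."""
    token = ANSWER_TOKENS[answer_type]
    return [f"{token} {form} {name}"
            for verb in LEXICON[event_type]
            for form in inflect(verb)
            for name in LEXICON[target]]


queries = guru_queries("Person", "ActOfDissent", "WorldTradeOrganization")
print(len(queries), queries[0])  # 12 PERSON$ opposes the WTO
```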

Cyc Inference Engine

CycL Query =>

[(PersonNamedFn “Kai”) JUSTIFICATION-1]

[(PersonNamedFn “Dr. Chen”) JUSTIFICATION-2]

[(PersonNamedFn “Kai”) JUSTIFICATION-N]

Cyc Justifications

A?

A from [B and C] (source 6743)

B from source 67430

C from source 78539
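A justification is a tree: A rests on B and C, and every node records where it came from. The following is a small sketch of that data structure, with a hypothetical `Justification` class rather than Cyc's internal format. It shows how the slide's example can be printed as an indented proof and flattened into the list of sources an answer depends on.

```python
from dataclasses import dataclass, field


@dataclass
class Justification:
    claim: str
    source: str                                        # source ID as on the slide
    premises: list["Justification"] = field(default_factory=list)


def sources(j: Justification) -> list[str]:
    """Every source the claim ultimately rests on, depth-first, no duplicates."""
    seen: list[str] = []

    def walk(node: Justification) -> None:
        if node.source not in seen:
            seen.append(node.source)
        for premise in node.premises:
            walk(premise)

    walk(j)
    return seen


def render(j: Justification, depth: int = 0) -> list[str]:
    """One line per step, premises indented under the claim they support."""
    lines = ["  " * depth + f"{j.claim} (source {j.source})"]
    for premise in j.premises:
        lines.extend(render(premise, depth + 1))
    return lines


a = Justification("A", "6743", [Justification("B", "67430"),
                                Justification("C", "78539")])
print("\n".join(render(a)))
print(sources(a))  # ['6743', '67430', '78539']
```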

Sources for Cyc Inference

1.4M+ CycL assertions already in Cyc’s Knowledge Base

Virtual assertions in databases

Unsupervised Textract / CycL annotation of unstructured documents

Data Source Integration

Data Normalization

Data Fusion

Data Normalization

Interpretation

Search

cat chat Katze gato gatto “felis felis”

cat OR chat OR Katze OR gato OR gatto OR “felis felis”
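Normalizing to one language-independent concept supports both directions on the slide. Interpretation maps any surface form to the concept, and Search expands the concept back into every form. A minimal sketch: the concept name `DomesticCat` and the language tags are hypothetical labels, while the forms themselves are the slide's.

```python
from typing import Optional

# Hypothetical lexicon: one language-independent concept with its surface forms.
LEXICALIZATIONS = {
    "DomesticCat": {
        "en": ["cat"], "fr": ["chat"], "de": ["Katze"],
        "es": ["gato"], "it": ["gatto"], "scientific": ["felis felis"],
    },
}

# Interpretation: index every surface form back to its concept.
FORM_TO_CONCEPT = {form: concept
                   for concept, by_lang in LEXICALIZATIONS.items()
                   for group in by_lang.values()
                   for form in group}


def interpret(word: str) -> Optional[str]:
    return FORM_TO_CONCEPT.get(word)


def search_query(concept: str) -> str:
    """Search: expand a concept into a disjunction of all its surface forms."""
    forms = [f for group in LEXICALIZATIONS[concept].values() for f in group]
    return " OR ".join(f'"{f}"' if " " in f else f for f in forms)


print(interpret("Katze"))                 # DomesticCat
print(search_query(interpret("gatto")))   # cat OR chat OR Katze OR gato OR gatto OR "felis felis"
```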

Data Normalization

…Zhang Mei Li, was born on January 1, 1927…

Name          | DOB
Zhang Mei Li  | 01-01-1927
…             | …

(birthDate (PersonNamedFn “Zhang Mei Li”) (DayFn 01 (MonthFn January (YearFn 1927))))
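The point of this slide is that two differently formatted sources normalize to the same CycL assertion. The following is a minimal sketch that parses both and emits the term with straight ASCII quotes. Two assumptions are made: the DB column is read as MM-DD-YYYY (the table does not say, and 01-01 does not disambiguate), and the text date is found by a simple regex. All function names are hypothetical.

```python
import re
from datetime import date

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def date_from_text(text: str) -> date:
    """Find a date written like 'January 1, 1927' in running text."""
    m = re.search(r"([A-Z][a-z]+) (\d{1,2}), (\d{4})", text)
    if m is None or m.group(1) not in MONTHS:
        raise ValueError(f"no date found in {text!r}")
    return date(int(m.group(3)), MONTHS.index(m.group(1)) + 1, int(m.group(2)))


def date_from_db(cell: str) -> date:
    """Parse a DOB column value, assuming MM-DD-YYYY."""
    month, day, year = (int(x) for x in cell.split("-"))
    return date(year, month, day)


def birth_date_cycl(name: str, d: date) -> str:
    return (f'(birthDate (PersonNamedFn "{name}") '
            f'(DayFn {d.day:02d} (MonthFn {MONTHS[d.month - 1]} (YearFn {d.year}))))')


from_text = birth_date_cycl(
    "Zhang Mei Li", date_from_text("…Zhang Mei Li, was born on January 1, 1927…"))
from_db = birth_date_cycl("Zhang Mei Li", date_from_db("01-01-1927"))
print(from_text == from_db)  # True
print(from_db)
```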

Data Normalization

Language-independent representation of:
- entities
- concepts
- relationships

CycL contains 100K+ primitives and can compositionally define infinitely many non-atomic terms.

Data Fusion

Dr. Chen lives in Fresno

Zhang Mei Li lives in Oakland

Kai lives in Los Angeles

California is in the Pacific Time Zone

(Fresno, Oakland and Los Angeles are all in California.)

=> Dr. Chen, Zhang Mei Li and Kai all live in the same time zone
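The conclusion needs a link that none of the four statements provides: which state each city is in. A minimal sketch of the fused chain follows. The residence and time-zone facts come from the slide. The city-to-state table stands in for background knowledge that Cyc would supply and is entered by hand here.

```python
from itertools import combinations

# Facts from the slide.
LIVES_IN = {"Dr. Chen": "Fresno", "Zhang Mei Li": "Oakland", "Kai": "Los Angeles"}
STATE_TIME_ZONE = {"California": "Pacific Time Zone"}
# Background knowledge the slide leaves implicit, entered by hand here.
CITY_IN_STATE = {"Fresno": "California", "Oakland": "California",
                 "Los Angeles": "California"}


def time_zone(person: str) -> str:
    """Chain lives-in -> city-in-state -> state-time-zone."""
    return STATE_TIME_ZONE[CITY_IN_STATE[LIVES_IN[person]]]


pairs = [(a, b) for a, b in combinations(LIVES_IN, 2)
         if time_zone(a) == time_zone(b)]
print(pairs)  # [('Dr. Chen', 'Zhang Mei Li'), ('Dr. Chen', 'Kai'), ('Zhang Mei Li', 'Kai')]
```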

Heterogeneous Sources

Q: How old is Dr. Chen’s mother?

…Zhang Mei Li, mother of Pamina’s Dr. Chen…

Name          | DOB
Zhang Mei Li  | 01-01-1927
…             | …
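Answering this question needs one fact from each source: the text says who Dr. Chen's mother is, and the table gives her date of birth. A minimal sketch of the join follows. The DOB format is assumed to be MM-DD-YYYY, and the slide gives no "today", so the reference date in the usage line is arbitrary.

```python
from datetime import date

# From the text: "…Zhang Mei Li, mother of Pamina's Dr. Chen…"
MOTHER_OF = {"Dr. Chen": "Zhang Mei Li"}
# From the database table; DOB assumed to be MM-DD-YYYY.
DOB_COLUMN = {"Zhang Mei Li": "01-01-1927"}


def age_on(born: date, today: date) -> int:
    # One less if this year's birthday has not happened yet.
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))


def mothers_age(child: str, today: date) -> int:
    mother = MOTHER_OF[child]                                    # fact from text
    month, day, year = map(int, DOB_COLUMN[mother].split("-"))   # fact from DB
    return age_on(date(year, month, day), today)


# Arbitrary reference date, for illustration only.
print(mothers_age("Dr. Chen", date(2000, 6, 1)))  # 73
```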

Data Fusion

Requires language-independent connections/inferential links among:
- entities
- concepts
- propositions (facts, rules)

Cyc’s Ontology

Cyc’s Knowledge Base

Consensus Reality

Formalized knowledge about ‘Consensus Reality’ = inferentially enabled ‘glue’ for Data Fusion

E.g. “Was Kai implicated in the Munich 1972 attack (when he was a toddler of 2)?”

DBs as ‘virtual assertions’ stores

(birthDate (PersonNamedFn “Zhang Mei Li”) ?WHEN)

SELECT DOB
FROM PERSONAL_DATA
WHERE NAME = 'Zhang Mei Li';
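A "virtual assertion" is answered by running SQL at query time instead of storing the fact in the KB. The following is a minimal sketch using Python's built-in sqlite3 module and an in-memory table that mirrors the slide's PERSONAL_DATA. The predicate-to-table mapping and both function names are hypothetical, not how Cyc's database interface is configured.

```python
import sqlite3

# Hypothetical mapping: which table and columns hold a CycL predicate's extent.
VIRTUAL_PREDICATES = {"birthDate": ("PERSONAL_DATA", "NAME", "DOB")}


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE PERSONAL_DATA (NAME TEXT, DOB TEXT)")
    conn.execute("INSERT INTO PERSONAL_DATA VALUES (?, ?)", ("Zhang Mei Li", "01-01-1927"))
    return conn


def ask(conn: sqlite3.Connection, predicate: str, person: str) -> list[str]:
    """Answer (predicate (PersonNamedFn person) ?WHEN) from the DB rather than
    from stored assertions."""
    table, key_col, value_col = VIRTUAL_PREDICATES[predicate]
    # Identifiers come only from the fixed mapping above; the name is a bound parameter.
    sql = f"SELECT {value_col} FROM {table} WHERE {key_col} = ?"
    return [row[0] for row in conn.execute(sql, (person,))]


conn = make_demo_db()
print(ask(conn, "birthDate", "Zhang Mei Li"))  # ['01-01-1927']
```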

Unsupervised Textract / CycL Annotations

IBM Textract relations:

[Cycorp, Inc. : located-in : Austin, TX]

mapped to CycL Assertions:

(objectFoundInLocation Cycorp CityOfAustinTX)
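The mapping step can be sketched as two lookup tables, one for relations and one for entities, applied to the bracketed triple. The tables and function name here are hypothetical and cover only the slide's example. A triple with any unmapped part yields no assertion rather than a guess.

```python
from typing import Optional

# Hypothetical mapping tables covering only the slide's example.
RELATIONS = {"located-in": "objectFoundInLocation"}
ENTITIES = {"Cycorp, Inc.": "Cycorp", "Austin, TX": "CityOfAustinTX"}


def textract_to_cycl(triple: str) -> Optional[str]:
    """Map '[subject : relation : object]' to a CycL assertion string,
    or None when any part has no CycL counterpart."""
    subject, relation, obj = (part.strip() for part in triple.strip("[]").split(" : "))
    predicate = RELATIONS.get(relation)
    arg1, arg2 = ENTITIES.get(subject), ENTITIES.get(obj)
    if None in (predicate, arg1, arg2):
        return None
    return f"({predicate} {arg1} {arg2})"


print(textract_to_cycl("[Cycorp, Inc. : located-in : Austin, TX]"))
# (objectFoundInLocation Cycorp CityOfAustinTX)
```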

Augmenting Textract Annotations

Concept annotation: “Boston” => { CityOfBostonMA, BostonTheBand, … }

Word sense disambiguation: “I went to Boston” => CityOfBostonMA

Analysis of nominal compounds: “leather jacket” =>
(SubcollectionOfWithRelationToTypeFn Jacket mainConstituent Leather)
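One way the "I went to Boston" case can be resolved is with a type constraint: the destination of "went to" should be a place, and only one candidate sense is. The following is a minimal sketch of that idea. It is not necessarily how QUIRK disambiguated. The collections City, MusicalGroup, Place and Organization, the generalization table and the argument constraint are all hypothetical stand-ins for Cyc's ontology.

```python
# Hypothetical ontology fragment; only CityOfBostonMA and BostonTheBand are from the slide.
SENSES = {"Boston": ["CityOfBostonMA", "BostonTheBand"]}
ISA = {"CityOfBostonMA": "City", "BostonTheBand": "MusicalGroup"}
GENLS = {"City": "Place", "MusicalGroup": "Organization"}  # collection -> more general one
# Assumed constraint: the object of "went to" must be a Place.
ARG_CONSTRAINTS = {"went to": "Place"}


def is_instance(term: str, collection: str) -> bool:
    """Walk up the GENLS chain from the term's type."""
    current = ISA.get(term)
    while current is not None:
        if current == collection:
            return True
        current = GENLS.get(current)
    return False


def disambiguate(sentence: str, word: str) -> list[str]:
    """Keep only senses that satisfy the argument constraint of the context."""
    candidates = SENSES[word]
    for phrase, required in ARG_CONSTRAINTS.items():
        if f"{phrase} {word}" in sentence:
            return [s for s in candidates if is_instance(s, required)]
    return candidates  # no constraint applies: leave the annotation ambiguous


print(disambiguate("I went to Boston", "Boston"))  # ['CityOfBostonMA']
```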

Unsupervised CycL Annotations

IBM’s Nominator and parsers extract named entities and basic syntactic dependencies (SUBJ-VERB, VERB-OBJ)

Map dependencies to CycL event structures.
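The mapping from dependencies to an event structure can be sketched with a verb frame that names the event type and the role each argument fills. This sketch reuses ActOfDissent, performedBy and maleficiary from the Query Interpreter example. The frame table, entity table and input format are hypothetical, and verbs are assumed to arrive already lemmatized.

```python
from typing import Optional

# Hypothetical verb frame: lemma -> (event type, SUBJ role, OBJ role).
VERB_FRAMES = {"denounce": ("ActOfDissent", "performedBy", "maleficiary")}
# Hypothetical mapping from extracted names to CycL terms.
ENTITIES = {"Dr. Chen": '(PersonNamedFn "Dr. Chen")', "WTO": "WorldTradeOrganization"}


def event_structure(deps: list[tuple[str, str, str]]) -> str:
    """deps holds (relation, left, right) with relation SUBJ-VERB or VERB-OBJ."""
    subj: Optional[str] = None
    obj: Optional[str] = None
    verb = ""
    for relation, left, right in deps:
        if relation == "SUBJ-VERB":
            subj, verb = left, right
        elif relation == "VERB-OBJ":
            verb, obj = left, right
    event_type, subj_role, obj_role = VERB_FRAMES[verb]
    clauses = [f"(isa ?EVENT {event_type})"]
    if subj is not None:
        clauses.append(f"({subj_role} ?EVENT {ENTITIES[subj]})")
    if obj is not None:
        clauses.append(f"({obj_role} ?EVENT {ENTITIES[obj]})")
    return "(thereExists ?EVENT (and " + " ".join(clauses) + "))"


deps = [("SUBJ-VERB", "Dr. Chen", "denounce"), ("VERB-OBJ", "denounce", "WTO")]
print(event_structure(deps))
```

The output has the same shape as the body of the Query Interpreter's CycL query, which is what lets annotated text and parsed questions meet in one representation.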

Cyc-to-English generator

(PersonNamedFn “Dr. Chen”) JUSTIFICATION-N

“Dr. Chen opposes the WTO, because people who demonstrate against organizations oppose them (Cyc KB, assertion 99999) and Dr. Chen demonstrated against the WTO in Seattle (document 12345).”
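The shape of the generated answer (conclusion, "because", then each supporting step with its source) can be sketched as simple sentence assembly. In this sketch the English paraphrase of each step is hand-written. A real Cyc-to-English generator would produce those paraphrases from the CycL assertions themselves. The `Step` class and `explain` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    english: str  # hand-written paraphrase of one supporting assertion
    source: str   # where the assertion came from


def explain(conclusion: str, steps: list[Step]) -> str:
    """Render a conclusion plus its justification steps as one sentence."""
    reasons = " and ".join(f"{s.english} ({s.source})" for s in steps)
    return f"{conclusion}, because {reasons}."


steps = [
    Step("people who demonstrate against organizations oppose them", "Cyc KB, assertion 99999"),
    Step("Dr. Chen demonstrated against the WTO in Seattle", "document 12345"),
]
print(explain("Dr. Chen opposes the WTO", steps))
```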

Year 1 Tasks

Get the entire system, integrating all the IBM and Cycorp components described, to run robustly

Improve question understanding and refinement

Broaden coverage of the English-to-CycL mapping to enable annotation of large document collections

Year 2 Tasks

Add new agents to the blackboard to represent the user and session context

Improve integration of answers obtained from GuruQA and Cyc

Improve integrated IBM and Cycorp modules for unstructured document annotation
