
Extracting Rich Knowledge from Text

John D. Prange, President

410-964-0179

[email protected]

www.languagecomputer.com

Our Company

Language Computer Corporation (LCC)
– Human Language Understanding research and development
– Founded 11 years ago in Dallas, Texas; established a second office in Columbia, MD in mid-2006
– ~70 research scientists and engineers
– Research funding primarily from DTO, NSF, AFRL, DARPA and several individual Government agencies
– Technology has been transferred to individual Government organizations, Defense contractors and, more recently, to commercial customers

Outline of Talk

Three Lines of Research & Development within LCC that impact Semantic-Level Understanding

– Information Extraction
  CiceroLite and other Cicero products
– Extracting Rich Knowledge from Text
  Polaris: Semantic Parser
  XWN KB: Extended WordNet Knowledge Base
  Jaguar: Knowledge Extraction from Text
  Contexts and Events: Detection, Recognition & Extraction
– Cogex: Reasoning and Inferencing over Extracted Knowledge
  Semantic Parsing & Logical Forms
  Lexical Chains & On-Demand Axioms
  Logic Prover

Information Extraction
– Given an entire corpus of documents
– Extracting every instance of some particular kind of information
  Named Entity Recognition – extraction of entities such as person, location and organization names
  Event-based Extraction – extraction of real-world events such as bombings, deaths, court cases, etc.

LCC’s Areas of Research

CiceroLite & Cicero-ML: Named Entity Recognition Systems

Two High-Performance NER Systems

CiceroLite
– Accurate and customizable NE recognition for English
– Classifies 8 high-frequency NE classes with over 90% precision and recall
– Currently extended to detect over 150 different NE classes
– Non-deterministic Finite-State Automata (FSA) framework resolves ambiguities in text and performs precise classification

CiceroLite-ML
– Machine learning-based NER for multiple languages
– Statistical machine learning-based framework allows rapid extension to new languages
– Currently deployed for Arabic, German, English, and Spanish
– Arabic: classifies 18 NE classes with an average of nearly 90% F-measure

CiceroLite

Designed specifically for English, CiceroLite categorizes 8 high-frequency NE classes with over 90% precision and recall.

But it is capable of much, much more: as currently deployed, CiceroLite can categorize up to 150 different NE classes, including over 100 beyond those sampled on the slide.

CiceroLite-ML (Arabic)

CiceroLite-ML currently detects a total of 18 different classes of named entities for Arabic with between 80% and 90% F-measure.
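For readers unfamiliar with the task itself, here is a toy illustration of pattern-based named entity tagging in Python. The patterns, class inventory, and example sentence are invented for this sketch; CiceroLite's non-deterministic FSA cascade and gazetteers are far richer than a few regular expressions.

import re

# Toy pattern rules for a few NE classes (illustrative only).
NE_PATTERNS = [
    ("DATE", re.compile(r"\b(?:January|February|March|April|May|June|July|August|"
                        r"September|October|November|December) \d{1,2}, \d{4}\b")),
    ("ORGANIZATION", re.compile(r"\b[A-Z][A-Za-z]+ (?:Corporation|Agency|Institute)\b")),
    ("PERSON", re.compile(r"\b(?:Mr\.|Ms\.|Dr\.) [A-Z][a-z]+ [A-Z][a-z]+\b")),
]

def tag_entities(text):
    """Return (NE class, surface string, character span) for every pattern match."""
    hits = []
    for ne_class, pattern in NE_PATTERNS:
        for match in pattern.finditer(text):
            hits.append((ne_class, match.group(0), match.span()))
    return sorted(hits, key=lambda hit: hit[2])

# Hypothetical example sentence, used only to exercise the toy patterns.
print(tag_entities("Dr. Jane Doe joined the Acme Research Institute on January 13, 1990."))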

Other Cicero Products

CiceroLite-ML (Mandarin Chinese): similar scope and depth to the Arabic version shown on the previous slide

CiceroCustom: user-customizable event extraction system using a variant of supervised learning called "active learning"

TASER (Temporal & Spatial Normalization System): recognizes 8 different types of time expressions and over 50 types of spatial expressions; normalizes time using ISO 8601; exact lat/long for ~8M place names (a normalization sketch follows this list)

Under Contractual Development (with deliveries in 2007)

– CiceroRelation: relation detection based upon ACE 2007 specifications

– CiceroCoref: entity coreference utilizing CiceroLite NER; to include cross-document entity tracking

– CiceroDiscourse: extract discourse structure & topic semantics
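To make the ISO 8601 target concrete, here is a minimal normalization sketch using only Python's standard library. It is not TASER; a real normalizer also handles relative and underspecified time expressions and the spatial side.

from datetime import datetime

# Map a few explicit date formats to ISO 8601 (YYYY-MM-DD); illustrative only.
FORMATS = ["%B %d, %Y", "%d %B %Y", "%m/%d/%Y"]

def normalize_date(expression):
    for fmt in FORMATS:
        try:
            return datetime.strptime(expression, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # relative or underspecified expressions need more context

print(normalize_date("January 13, 1990"))   # -> 1990-01-13
print(normalize_date("8 July 1998"))        # -> 1998-07-08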

Extracting Rich Knowledge from Text
– Explicit knowledge

– Implicit knowledge: implicatures, humor, sarcasm, deceptions, etc.

– Other textual phenomena: negation, modality, quantification, coreference resolution

Lexical Level & Syntax

Semantic Relations

Contexts

Events & Event Properties

Meta-Events

Event Relations

LCC’s Areas of Research


Extracting Rich Knowledge from Text

Innovations

– A rich and flexible representation of textual semantics

– Extract concepts and semantic relations between concepts, rich event structures

– Extract event properties; extend events using event relations

– Handle textual phenomena such as negation and modality

– Mark implicit knowledge and capture meaning suggested by it whenever possible

Four-Layered Representation

Syntax Representation
– Syntactically link words in sentences; apply Word Sense Disambiguation (WSD)

Semantic Relations
– Provide deeper semantic understanding of relations between words

Context Representation
– Place boundaries around knowledge that is not universal

Event Representation
– Detect events, extract their properties, extend using event relations

Hierarchical Representation

Input Text:
  Gilda Flores's kidnapping occurred on January 13, 1990.
  A week before, he had fired the kidnappers.

Lexical Level & Syntax:
  Gilda_Flores_NN(x1) & _human_NE(x1) & _s_POS(x1,x2) & kidnapping_NN(x2) & occur_VB(e1,x2,x3) & on_IN(e1,x4) & _date_NE(x4) & time_TMP(BeginFn(x4),1990,1,13,0,0,0) & time_TMP(EndFn(x4),1990,1,13,23,59,59)
  he_PRP(x1) & fire_VB(e3,x1,x5) & kidnapper_NN(x5) & _date_NE(x6) & time_TMP(BeginFn(x6),1990,1,6,0,0,0) & time_TMP(EndFn(x6),1990,1,6,23,59,59)

Semantic Relations:
  THM_SR(x1,x2) & AGT_SR(x2,e1) & TMP_SR(x4,e1)
  AGT_SR(x1,e3) & THM_SR(x5,e3) & TMP_SR(x6,e3)

Contexts:
  during_TMP(e1,x4)
  during_TMP(e3,x6)

Events & Event Properties:
  event(e2,x2) & THM_EV(x1,e2) & TMP_EV(x4,e2)
  event(e4,e3) & AGT_EV(x5,e2) & AGT_EV(x1,e4) & THM_EV(x5,e4) & TMP_SR(x6,e4)

Event Relations:
  CAUSE_EV(e4,e2), earlier_TMP(e4,e2)

Meta-Events:
  REVENGE
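As a rough sketch of how such a layered analysis might be held in memory, the structure below carries the first sentence of the example as plain Python data. The dict layout is purely illustrative and is not LCC's internal representation.

# Four-layered analysis of "Gilda Flores's kidnapping occurred on January 13, 1990."
analysis = {
    "input_text": "Gilda Flores's kidnapping occurred on January 13, 1990.",
    "lexical_syntax": [
        "Gilda_Flores_NN(x1)", "_human_NE(x1)", "_s_POS(x1,x2)", "kidnapping_NN(x2)",
        "occur_VB(e1,x2,x3)", "on_IN(e1,x4)", "_date_NE(x4)",
        "time_TMP(BeginFn(x4),1990,1,13,0,0,0)", "time_TMP(EndFn(x4),1990,1,13,23,59,59)",
    ],
    "semantic_relations": ["THM_SR(x1,x2)", "AGT_SR(x2,e1)", "TMP_SR(x4,e1)"],
    "contexts": ["during_TMP(e1,x4)"],
    "events": ["event(e2,x2)", "THM_EV(x1,e2)", "TMP_EV(x4,e2)"],
}

# Predicates within a layer are conjoined, so joining with " & " recovers the flat form.
print(" & ".join(analysis["lexical_syntax"]))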

Polaris: Semantic Parser

Polaris Semantic Relations

# Semantic Relation Abbr
1 POSSESSION POS
2 KINSHIP KIN
3 PROPERTY-ATTRIBUTE HOLDER PAH
4 AGENT AGT
5 TEMPORAL TMP
6 DEPICTION DPC
7 PART-WHOLE PW
8 HYPONYMY ISA
9 ENTAIL ENT
10 CAUSE CAU
11 MAKE-PRODUCE MAK
12 INSTRUMENT INS
13 LOCATION-SPACE LOC
14 PURPOSE PRP
15 SOURCE-FROM SRC
16 TOPIC TPC
17 MANNER MNR
18 MEANS MNS
19 ACCOMPANIMENT-COMPANION ACC
20 EXPERIENCER EXP
21 RECIPIENT REC
22 FREQUENCY FRQ
23 INFLUENCE IFL
24 ASSOCIATED-WITH / OTHER OTH
25 MEASURE MEA
26 SYNONYMY-NAME SYN
27 ANTONYMY ANT
28 PROBABILITY-OF-EXISTENCE PRB
29 POSSIBILITY PSB
30 CERTAINTY CRT
31 THEME-PATIENT THM
32 RESULT RSL
33 STIMULUS STI
34 EXTENT EXT
35 PREDICATE PRD
36 BELIEF BLF
37 GOAL GOL
38 MEANING MNG
39 JUSTIFICATION JST
40 EXPLANATION EXN

Propbank vs. Polaris Relations

Question: Who?
– Propbank Relations: AGENT, PATIENT, RECIPROCAL, BENEFICIARY
– Polaris Relations: AGENT, EXPERIENCER, THEME, POSSESSION, RECIPIENT, KINSHIP, ACCOMPANIMENT-COMPANION, MAKE-PRODUCE, SYNONYMY, BELIEF

Question: What?
– Propbank Relations: AGENT, THEME, TOPIC
– Polaris Relations: AGENT, THEME, TOPIC, POSSESSION, STIMULUS, MAKE-PRODUCE, HYPONYMY, RESULT, BELIEF, PART-WHOLE, …

Question: Where?
– Propbank Relations: LOCATION, DIRECTION
– Polaris Relations: LOCATION, SOURCE-FROM, PART-WHOLE

Question: When?
– Propbank Relations: TEMPORAL, CONDITION
– Polaris Relations: TEMPORAL, FREQUENCY

Question: Why?
– Propbank Relations: PURPOSE, CAUSE, PURPOSE-NOT-CAUSE
– Polaris Relations: PURPOSE, CAUSE, INFLUENCE, JUSTIFICATION, GOAL, RESULT, MEANING, EXPLANATION, …

Question: How?
– Propbank Relations: MANNER, INSTRUMENT
– Polaris Relations: MANNER, INSTRUMENT, MEANS, …

Question: How much?
– Propbank Relations: EXTENT, DEGREE
– Polaris Relations: EXTENT, MEASURE

Question: Possible?
– Propbank Relations: CONDITIONAL (?)
– Polaris Relations: POSSIBILITY, CERTAINTY, PROBABILITY

Example: Polaris on Treebank

We're talking about years ago before anyone heard of asbestos having any questionable properties.

Treebank Relations (hand tagged):
TMP(talking, years ago before anyone heard of asbestos having any questionable properties)

Polaris Relations (automatically generated from the Treebank tree):
AGT(talking, We)
TPC(talking, about years ago before anyone heard of asbestos having any questionable properties)
EXP(heard, anyone)
STI(heard, of asbestos having any questionable properties)
AGT(having, asbestos)
THM(having, any questionable properties)
PW(asbestos, any questionable properties)
PAH(properties, questionable)

Propbank Relations (hand tagged):
AGT(hear, anyone)
THM(hear, asbestos having any questionable properties)
AGT(talking, we)
THM(talking, years ago before anyone heard of asbestos having any questionable properties)

XWN KB: Extended WordNet Knowledge Base

XWN Knowledge Base (1/2)

WordNet® - free from Princeton University
– A large lexical database of English, developed by Professor George Miller at Princeton University and now under the direction of Christiane Fellbaum. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations.

eXtended WordNet - free from UTD
– Glosses: parsed, word sense disambiguated, and transformed into logic forms

XWN Knowledge Base - done at LCC
– Glosses: converted into semantic relations (using the Polaris semantic parser)
– Represented in a Knowledge Base that serves as a reasoning tool, axiom generator, and lexical chain facilitator

XWN Knowledge Base (2/2)

Summary: The rich definitional glosses from WordNet are processed through LCC’s Knowledge Acquisition System (Jaguar) to produce a semantically rich upper ontology

The Clusters: Noun glosses are transformed into sets of semantic relations, which are then arranged into individual semantic units called clusters, with one cluster per gloss

The Hierarchy: The clusters (representing one noun synset each) are arranged in a hierarchy similar to that of WordNet

The Knowledge Base: The generated KB has not only the hierarchy of WordNet, but also a rich semantic representation of each entry in the hierarchy (based on the definitional gloss)
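The WordNet raw material that the XWN KB starts from (the definitional glosses and the ISA hierarchy) can be inspected with NLTK. The sketch below shows only that input; it is not LCC's gloss parsing or cluster construction.

# Requires: pip install nltk, then nltk.download("wordnet") once.
from nltk.corpus import wordnet as wn

synset = wn.synset("tennis.n.01")          # the tennis / lawn_tennis synset
print(synset.definition())                 # the definitional gloss that gets parsed
for path in synset.hypernym_paths():       # the ISA hierarchy the clusters hang from
    print(" -> ".join(s.name() for s in path))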

Example: WordNet Gloss

Tennis is a game played with rackets by two or four players who hit a ball back and forth over a net that divides the court

ISA (Tennis, game)

AGT (two or four players, play)

THM (game, play)

INS (rackets, play)

MEA (two or four, players)

AGT (two or four players, hit)

THM (a ball, hit)

MNR (back and forth, hit)

LOC (over a net that divides the court, hit)

AGT (a net, divides)

THM (the court, divides)

Semantic Cluster of a WordNet Gloss

tennis: ISA game
player: MEA two or four
play: AGT player; THM game; INS racket
hit: AGT player; THM ball; MNR back and forth; LOC over a net
divide: AGT net; THM court

Synset ID: 00457626 Name: tennis, lawn_tennis

Hierarchy (as in WordNet)

athletic game
  court game
    tennis
    basketball
    squash
  outdoor game
    golf
    croquet

Jaguar: Knowledge Extraction From Text

Jaguar: Knowledge Extraction

Automatically generate ontologies and structured knowledge bases from text

– Ontologies form the framework or “skeleton” of the knowledge base

– Rich set of semantic relations form the “muscle” that connects concepts in the knowledge base

(Diagram: an ontology fragment in which "passenger train" and "freight train" are IS-A children of "train", and semantic relations link "transport" to X Corp. (AGENT), products (THEME), and freight train (MEANS).)


(Diagram: the joined ontology, where verbs extracted from text, such as carry, conduct, board, ship, transport, arrive, run, and stop, are connected to "train", "passenger train", and "freight train" through AGENT, THEME, and MEANS relations.)

Jaguar: Knowledge Extraction

Automatically Building the Ontology

Jaguar builds an ontology using the following steps (a compressed code sketch follows the list):

1. Seed words selected either manually or automatically

2. Find sentences in the input documents that contain seed words

3. Parse those sentences and extract semantic relations, focusing on selected relations such as IS-A, Part-Whole, Kinship, Locative, and Temporal

4. Integrate the selected semantic relations into the ontology being produced

5. Investigate the noun phrases in the parsed sentences to discover compound nouns, such as "SCUD missile", and store them in the candidate ontology

6. If desired, revisit the unprocessed sentences to see if they contain concepts related to the seed words through other semantic relations

7. Finally, use the hyponymy information found in Extended WordNet to classify all concepts against one another, detecting and correcting classification errors and building an IS-A hierarchy in the process
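A compressed sketch of that loop is shown below, with the semantic parsing step stubbed out. Nothing here is Jaguar's actual code; extract_relations() is a placeholder for a Polaris-style parser, and the relation set is the one named in step 3.

# Skeleton of the seed-driven ontology-building loop described above.
TARGET_RELATIONS = {"ISA", "PW", "KIN", "LOC", "TMP"}

def extract_relations(sentence):
    """Placeholder for the semantic parser: returns (relation, arg1, arg2) triples."""
    return []

def build_ontology(documents, seed_words):
    ontology = []                                        # list of (relation, arg1, arg2)
    for doc in documents:
        for sentence in doc.split("."):
            if not any(seed in sentence for seed in seed_words):
                continue                                 # step 2: keep seed-bearing sentences
            for rel, a1, a2 in extract_relations(sentence):      # step 3
                if rel in TARGET_RELATIONS:
                    ontology.append((rel, a1, a2))       # step 4: integrate selected relations
    return ontology

print(build_ontology(["SCUD missiles are ballistic missiles."], {"missile"}))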

Result: Jaguar Knowledge Base

(Example knowledge base fragment relating "anthrax" to "biological weapon".)

Context & Events: Detection, Classification & Extraction

Types of Context

Temporal

– It rained on July 7th

Spatial

– It rained in Dallas

Report

– John said “It rains”

Belief

– John thinks that it rains

Volitional

– John wants it to rain

Planning

– It is scheduled to rain

Conditional

– If it’s cloudy, it will rain

Possibility

– It might rain

Events in Text

Basic Definition
– X is an Event if X is a possible answer to the question "What happened?"

Applying the Definition to Verbs and Nouns
– Verb V is an Event if the sentence "Someone/something V-ed (someone/something)" is an answer to the question "What happened?"
– Noun N is an Event if the sentence "There was/were (a/an) N" is an answer to the question "What happened?"

Events in Text

Most Adjectives are not potential Events

– Verbal 'adjectives' are treated as verbs, e.g., 'lost', 'admired'

Factatives ('Light' Verbs) are not separate events

– Suffer a loss; take a test; perform an operation

Aspectual Markers Can Combine with a Wide Range of Events

– e.g., Stop, Completion, Start, Continue, Fail, Succeed, Try

Modalities are not separate events

– Possibility, Necessity, Prescription, Suggestion, Optative

Event Detection

Approach for Event Detection

– Annotate WordNet synsets that are Event concepts (annotation completed for the Noun and Verb hierarchies)

– Detect events by lexical lookup for concepts in annotated WordNet
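A minimal sketch of that lookup, assuming some set of WordNet synset IDs has already been annotated as Event concepts. The tiny annotation set below is an invented stand-in for the full annotation of the noun and verb hierarchies.

# Requires NLTK with the WordNet corpus installed.
from nltk.corpus import wordnet as wn

EVENT_SYNSETS = {"kidnapping.n.01", "bombing.n.01"}   # toy stand-in for the annotation

def is_event(word, pos):
    """Detect events by lexical lookup against the annotated synset list."""
    return any(s.name() in EVENT_SYNSETS for s in wn.synsets(word, pos=pos))

print(is_event("kidnapping", wn.NOUN))   # expected True with the toy set above
print(is_event("table", wn.NOUN))        # expected False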

Project Status

– Prototype implemented for Event detection

– Benchmarks run: Precision 93%, Recall 79%

– Currently Tuning Performance
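For reference, assuming the standard balanced F-measure, those benchmark numbers correspond to an F1 of roughly 85%:

F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.93 \times 0.79}{0.93 + 0.79} \approx 0.854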

Event Extraction – Future

Event Structures for Modelling Discourse
– Aspect (Start, Complete, Continue, Succeed, Fail, Try)

– Modality (Possibility, Necessity, Optativity)

– Event Participants (Actors, Undergoers, Instruments)

– Context (Spatial, Temporal, Intensional)

Event Relations (Causation, Partonomy, Similarity, Contrast)
– Event Taxonomy/Classification

– Event Composition

Cogex: Reasoning & Inferencing over Extracted Knowledge

LCC’s Areas of Research

(Architecture diagram: the question/text (Q/T) and answer/hypothesis (A/H) are run through the semantic parser and context module to produce logic forms (Q/T LF, A/H LF); axiom building draws on lexical chains, the XWN knowledge base, world knowledge axioms, linguistic axioms, the semantic calculus, and temporal axioms; the logic prover, with relaxation, produces the answer or entailment ranking and an NL justification.)

Reasoning & Inferences: Example Tasks that Require Both

TREC Question Answering Track

The TREC Question Answering Track has been held annually since its inception at TREC-8 (1999)

Main Task TREC-2006 QA Track

– AQUAINT Corpus of English News Text (http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002T31): newswire text data in English, drawn from three sources: Xinhua News Service (People's Republic of China), New York Times News Service, and Associated Press Worldstream News Service. Roughly 3 GBytes of text; over a million documents.

– Test Set: 75 sets of questions organized around a common target, where the target is a Person, Organization, Event or Thing

– Each series of questions contains 6-9 questions: 4-7 Factoid, 1-2 List, and 1 Other

– Total: 403 Factoid Questions; 89 List Questions; 75 Other Questions

TREC-2006 Question Answering Track

145. Target (Event): John Williams convicted of murder

145.1 Factoid: How many non-white members of the jury were there?

145.2 Factoid: Who was the foreman for the jury?

145.3 Factoid: Where was the trial held?

145.4 Factoid: When was King convicted?

145.5 Factoid: Who was the victim of the murder?

145.6 List What defense and prosecution attorneys participated in the trial?

145.7 Other

Textual Entailment

– Textual Entailment Recognition is a generic task that captures major semantic inference needs across many natural language processing applications, such as Question Answering (QA), Information Retrieval (IR), Information Extraction (IE), and (multi) document summarization.

– Task definition: T entails H, denoted by T → H, if the meaning of H can be inferred from the meaning of T

PASCAL (Pattern Analysis, Statistical Modeling and Computational Learning) RTE (Recognizing Textual Entailment) Challenge – RTE-1 (2004-05); RTE-2 (2005-06) and RTE-3 (2006-07)

– http://www.pascal-network.org/Challenges/RTE/

The Question Answering task can be interpreted as a Textual Entailment task as follows:
– Given a question Q and a possible answer text passage A, the QA task is then one of applying semantic inference to the pair (Q, A) to infer whether or not A contains the answer to Q. (A schematic framing in code follows.)
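Schematically, that reduction is just a thin wrapper around an entailment judge. In the sketch below, entails() is a placeholder for the whole COGEX pipeline, and the T/H pair is taken from the RTE-2 examples on the next slide.

def qa_as_textual_entailment(answer_passage, hypothesis, entails):
    """QA as RTE: the candidate answer passage plays the role of T, and the
    question rewritten as a declarative hypothesis plays the role of H."""
    return entails(answer_passage, hypothesis)

T = ("The EZLN differs from most revolutionary groups by having stopped military "
     "action after the initial uprising in the first two weeks of 1994.")
H = "EZLN is a revolutionary group."
print(qa_as_textual_entailment(T, H, entails=lambda t, h: True))   # placeholder judge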

RTE-2: Example TH Pairs

Entailment? "Yes"
T: Tibone estimated diamond production at four mines operated by Debswana – Botswana's 50-50 joint venture with DeBeers – could reach 33 million carats this year.
H: Botswana is a business partner of DeBeers.

Entailment? "Yes"
T: The EZLN differs from most revolutionary groups by having stopped military action after the initial uprising in the first two weeks of 1994.
H: EZLN is a revolutionary group.

Entailment? "No"
T: Two persons were injured in dynamite attacks perpetrated this evening against two bank branches in this Northwestern Colombian city.
H: Two persons perpetrated dynamite attacks in a Northwestern Colombian city.

Entailment? "No"
T: Such a margin of victory would give Abbas a clear mandate to renew peace talks with Israel, rein in militants and reform the corruption-riddled Palestinian Authority.
H: The new Palestinian president combated corruption and revived the Palestinian economy.

Cogex: Logic Prover

Semantically Enhanced COGEX

(The COGEX architecture diagram from the earlier slide is repeated here: the semantic parser and context module feed the Q/T and A/H logic forms; axiom building supplies lexical chain, world knowledge, linguistic, semantic calculus, and temporal axioms from the XWN knowledge base and other sources; the logic prover with relaxation yields the answer or entailment ranking and an NL justification.)

Output of the Semantic Parser

Question: What is the Muslim Brotherhood's goal?

The output of the semantic parser:

PURPOSE(x, Muslim Brotherhood)

Answer: The Muslim Brotherhood, Egypt's biggest fundamentalist group established in 1928, advocates turning Egypt into a strict Muslim state by political means, setting itself apart from militant groups that took up arms in 1992.

The output of the semantic parser:

AGENT(Muslim Brotherhood, advocate)

PURPOSE(turning Egypt into a strict Muslim state, advocate)

TEMPORAL(1928, establish)

TEMPORAL(1992, took up arms)

PROPERTY(strict, Muslim state)

MEANS(political means, turning Egypt into a strict Muslim state)

SYNONYMY(Muslim Brotherhood, Egypt's biggest fundamentalist group)


Generation of Logical Forms

Question: What is the Muslim Brotherhood's goal?

Question Logical Form (QLF): (exists x0 x1 x2 x3 (Muslim_NN(x0) & Brotherhood_NN(x1) & nn_NNC(x2,x0,x1) & PURPOSE_SR(x3,x2))).

Answer: The Muslim Brotherhood, Egypt's biggest fundamentalist group established in 1928, advocates turning Egypt into a strict Muslim state by political means, setting itself apart from militant groups that took up arms in 1992.

Answer Logical Form (ALF): (exists e1 e2 e3 e4 e5 e6 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 (Muslim_NN(x1) & Brotherhood_NN(x2) & nn_NNC(x3,x1,x2) & Egypt_NN(x4) & _s_POS(x5,x4) & biggest_JJ(x5) & fundamentalist_JJ(x5) & group_NN(x5) & SYNONYMY_SR(x3,x5) & establish_VB(e1,x20,x5) & in_IN(e1,x6) & 1928_CD(x6) & TEMPORAL_SR(x6,e1) & advocate_VB(e2,x5,x21) & AGENT_SR(x5,e2) & PURPOSE_SR(e3,e2) & turn_VB(e3,x5,x7) & Egypt_NN(x7) & into_IN(e3,x8) & strict_JJ(x15,x14) & Muslim_NN(x8) & state_NN(x13) & nn_NNC(x14,x8,x13) & PROPERTY_SR(x15,x14) & by_IN(e3,x9) & political_JJ(x9) & means_NN(x9) & MEANS_SR(x9,e3) & set_VB(e5,x5,x5) & itself_PRP(x5) & apart_RB(e5) & from_IN(e5,x10) & militant_JJ(x10) & group_NN(x10) & take_VB(e6,x10,x12) & up_IN(e6,x11) & arms_NN(x11) & in_IN(e6,x12) & 1992_CD(x12) & TEMPORAL_SR(x12,e6)).


Lexical Chains & Axioms: On Demand Input into Cogex

Lexical Chains from XWN

Lexical chains
– Lexical chains establish connections between semantically related concepts, i.e., WordNet synsets. (Note: concepts, not words, which means Word Sense Disambiguation is necessary.)
– Concepts and relations along the lexical chain explain the semantic connectivity of the end concepts
– Lexical chains start by using WordNet relations (ISA, Part-Whole) and gloss co-occurrence (a weak relation)
– The XWN Knowledge Base then adds more meaningful (precise) relations
  "Tennis is a game played with rackets by two or four players…"
  Prior to XWN-KB: 'tennis' → 'two or four' (gloss co-occurrence)
  With XWN-KB: 'tennis' → 'game' → 'play' → 'player' → 'two or four' (via ISA, AGT, THM, MEA)

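A sketch of chain building over the WordNet relations that ship with NLTK (ISA, part-whole, entailment). The gloss-derived and XWN-KB relations described above are not available in stock WordNet, so this only illustrates the starting point.

from collections import deque
from nltk.corpus import wordnet as wn

def neighbors(synset):
    """One hop along the WordNet relations used to seed lexical chains."""
    return synset.hypernyms() + synset.hyponyms() + synset.entailments() + synset.part_meronyms()

def lexical_chain(word_a, word_b, max_hops=3):
    """Breadth-first search for a short chain of synsets linking two words."""
    targets = set(wn.synsets(word_b))
    start = wn.synsets(word_a)
    queue = deque((s, [s]) for s in start)
    seen = set(start)
    while queue:
        synset, path = queue.popleft()
        if synset in targets:
            return [s.name() for s in path]
        if len(path) <= max_hops:
            for nxt in neighbors(synset):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [nxt]))
    return None

print(lexical_chain("purchase", "acquire"))   # e.g. ['buy.v.01', 'get.v.01'] in WordNet 3.0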

Examples of Lexical Chains

Question: How were biological agents acquired by bin Laden?

Answer: On 8 July 1998 , the Italian newspaper Corriere della Serra indicated that members of The World Front for Fighting Jews and Crusaders , which was founded by Bin Laden , purchased three chemical and biological_agent production facilities in

Lexical Chain: ( V - buy#1, purchase#1 ) – HYPERNYM (V - get#1, acquire#1 )

Question: How did Adolf Hitler die?

Answer: … Adolf Hitler committed suicide …

Lexical Chain: ( N - suicide#1, self-destruction#1, self-annihilation#1 ) – GLOSS ( V - kill#1 ) – GLOSS ( V - die#1, decease#1, perish#1, go#17, exit#3, pass_away#1, expire#2, pass#25 )

Propagating syntactic structures along the chain

The goal is to filter out unacceptable chains, and to improve the ranking of chains when multiple chains can be established

Example 1 (AGENT vs. PATIENT):
Q: Who did Floyd Patterson beat to win the title?
WA: He saw Ingemar Johanson knock down Floyd Patterson seven times there in winning the title.

V - beat#2 – entail V - hit#4 – derivation N - hitting#1,striking#2 – derivation V - strike#2 – hyponym V - knock-down#2

Example 2 (AGENT, THEME, MEASURE):
S1: John bought a cowboy hat for $50.
S2: John paid $50 for a cowboy hat.

V - buy#1 – entail V - pay#1

Axioms on Demand (1/3)

Extract world knowledge, in the form of axioms, from text or other resources automatically and "on demand"
– When the logic prover runs out of rules to use, it can request one from external knowledge sources; it will ask for a rule connecting two concepts
– Generate axioms on the fly from multiple knowledge sources:

WordNet and eXtended WordNet: glosses and lexical chains

Instantiation of NLP rules

Open text from a trusted source (dictionary, encyclopedia, textbook on a relevant topic, etc.)

An automatically-built knowledge base
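Schematically, "on demand" means the prover consults a prioritized list of axiom generators, and caches the result, the first time it needs a rule linking two concepts. The generator names and signatures below are invented placeholders, not LCC's API.

# Each generator maps a pair of concepts to zero or more axioms (plain strings here).
def lexical_chain_axioms(a, b):  return []    # from WordNet / eXtended WordNet
def nlp_rule_axioms(a, b):       return []    # instantiated linguistic rewriting rules
def trusted_text_axioms(a, b):   return []    # mined from a dictionary or encyclopedia

GENERATORS = [lexical_chain_axioms, nlp_rule_axioms, trusted_text_axioms]
_cache = {}

def axioms_on_demand(concept_a, concept_b):
    """Called when the prover runs out of rules connecting two concepts."""
    key = (concept_a, concept_b)
    if key not in _cache:
        _cache[key] = [ax for gen in GENERATORS for ax in gen(concept_a, concept_b)]
    return _cache[key]

print(axioms_on_demand("buy", "acquire"))   # -> [] with the stub generators above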

Axioms on Demand (2/3)

eXtended WordNet axiom generator
– Question: What all can a 'player' do? Look at all contexts with 'player' as AGT:
  Gloss of 'tennis': a 'player' can 'hit' (a ball), 'play' (a game)
  Gloss of 'squash': a 'player' can 'strike' (a ball), etc.
– Connect related concepts:
  kidnap_VB(e1,x1,x2) -> kidnapper_NN(x1)
  asian_JJ(x1,x2) -> asia_NN(x1) & _continent_NE(x1)

World Knowledge axioms
– WordNet glosses, e.g.:
  jungle_cat_NN(x1) -> small_JJ(x2,x1) & Asiatic_JJ(x3,x1) & wildcat_NN(x1)

NLP axioms
– Linguistic rewriting rules, e.g.:
  Gilda_NN(x1) & Flores_NN(x2) & nn_NNC(x3,x1,x2) -> Flores_NN(x3)


Axioms on Demand (3/3)

Semantic Relation Calculus
– Combine two or more local semantic relations to establish broader semantic relations
– Increase the semantic connectivity
– Mike is a rich man → Mike is rich: ISA_SR(Mike,man) & PAH_SR(man,rich) → PAH_SR(Mike,rich)
– John lives in Dallas, Texas → John lives in Texas: LOC(John,Dallas) & PW(Dallas,Texas) -> LOC(John,Texas)

Temporal Axioms
– Time transitivity of events: during_CTMP(e1,e2) & during_CTMP(e2,e3) → during_CTMP(e1,e3)
– Dates entail more general times: October 2000 → year 2000

(A sketch of applying such composition rules in code follows.)
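A sketch of applying such composition rules to a set of relation triples until a fixed point is reached. The rule table holds only the fragment shown on this slide.

# (R1, R2) -> R3 means R1(x,y) & R2(y,z) entails R3(x,z).
COMPOSITIONS = {
    ("ISA", "PAH"): "PAH",                            # ISA(Mike,man) & PAH(man,rich) -> PAH(Mike,rich)
    ("LOC", "PW"): "LOC",                             # LOC(John,Dallas) & PW(Dallas,Texas) -> LOC(John,Texas)
    ("during_CTMP", "during_CTMP"): "during_CTMP",    # temporal transitivity
}

def close_relations(facts):
    """Apply the composition rules to (relation, x, y) triples until nothing new is added."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for (r1, x, y1) in list(facts):
            for (r2, y2, z) in list(facts):
                r3 = COMPOSITIONS.get((r1, r2))
                if r3 and y1 == y2 and (r3, x, z) not in facts:
                    facts.add((r3, x, z))
                    changed = True
    return facts

print(close_relations({("LOC", "John", "Dallas"), ("PW", "Dallas", "Texas")}))
# The closure now also contains ("LOC", "John", "Texas").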


Contextual Knowledge Axioms

Examples
– If someone boards a plane and the flight takes 3 hours, then that person travels for 3 hours
– The person leaves at the same time and arrives at the same time as the traveling plane
– If the departure of a vehicle has a destination and the vehicle arrives at the destination, then the arrival is located at the destination
– If something is exactly located somewhere, then nothing else is exactly located in the same place
– If a Process is located in an area, then all sub-Processes of the Process are located in the same area


Logic Prover: The Heart of Cogex

Logic Prover (1/2)

A first-order logic, resolution-style theorem prover

Inference rule sets are based on hyperresolution and paramodulation

Transform the two text fragments into 4-layered logic forms based upon LCC’s Syntactic, Semantic, Contextual and Event Processing and Analysis

Automatically create "Axioms on Demand" to be used during the proof
– Lexical Chains axioms

– World Knowledge axioms

– Linguistic transformation axioms

– Contextual / Temporal axioms

Logic Prover (2/2)

Load COGEX's SOS (Set of Support) with the candidate answer passage(s) A and the question Q; load its USABLE list of clauses with the generated axioms, including the semantic and temporal axioms

Search for a proof by iteratively removing clauses from the SOS and searching the USABLE list for possible inferences until a refutation is found
– If no contradiction is detected:
  Relax arguments
  Drop entire predicates from H

Compute “Proof Score” for each Candidate

Select best Result & Generate NL Justification
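In outline, the search is a standard set-of-support refutation loop with the relaxation step added when no contradiction is found. In the sketch below, infer() and relax() are placeholders for hyperresolution/paramodulation and for argument relaxation and predicate dropping; the empty list stands for the empty clause.

def prove(sos, usable, infer, relax):
    """Set-of-support refutation search with relaxation (schematic only)."""
    penalty = 0
    while True:
        agenda = list(sos)
        while agenda:
            given = agenda.pop()                   # move a clause out of the set of support
            usable = usable + [given]
            for clause in infer(given, usable):    # hyperresolution / paramodulation stand-in
                if clause == []:                   # empty clause: refutation, i.e. proof found
                    return {"proved": True, "penalty": penalty}
                agenda.append(clause)
        # No contradiction: relax arguments or drop a predicate from H, then retry.
        sos, cost = relax(sos)
        if cost is None:                           # nothing left to relax
            return {"proved": False, "penalty": penalty}
        penalty += cost                            # lowers this candidate's proof score

# Trivial wiring just to show the control flow:
result = prove(sos=[["-answer"]], usable=[["answer"]],
               infer=lambda g, u: [[]] if ["answer"] in u and g == ["-answer"] else [],
               relax=lambda s: (s, None))
print(result)   # {'proved': True, 'penalty': 0}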

Reasoning & Inference: How Well Does LCC Do?

Evaluations: QA (TREC-2006)

LCC's PowerAnswer Question Answering (QA) system finished 1st on Factoid Questions and Overall Combined Score. A second LCC QA system, Chaucer, finished 2nd in both categories in the TREC QA 2006 evaluation.

An LCC QA system has finished 1st every year that the TREC QA Evaluation has been conducted (Annually since TREC-8 in 1999)

(Bar chart: factoid accuracy for the TREC-2006 QA systems, with PowerAnswer and Chaucer ahead of the other teams; top score 57.8%, mean 18.5%.)

Evaluations: PASCAL RTE-2

LCC's Groundhog system finished 1st overall at the Second PASCAL Recognizing Textual Entailment Challenge (RTE-2) and LCC's COGEX system finished 2nd. (http://www.pascal-network.org/Challenges/RTE/)

(Bar chart: RTE-2 accuracy by system, with Groundhog and Cogex ahead of the other teams; best accuracy 75.4%, mean 57.5%.)

Contact Information

Home Office: 1701 N. Collins Boulevard

Suite 2000

Richardson, TX 75080

972-231-0052 (Voice)

972-231-0012 (Fax)

Maryland Office: 6179 Campfire

Columbia, MD 21045

410-715-0777 (Voice)

410-715-0774 (Fax)

443-878-8894 (Cell)

June Sunrise over Kirkwall Bay in the Orkney Islands of Scotland

Your Questions & Comments