Ontology-guided Extraction of Structured Information from Unstructured Text: Identifying and Capturing Complex Relationships

Sushain Pandit

Department of Computer Science, Iowa State University, Ames, Iowa, USA

M.S. Thesis Defense, April 29, 2010

Copyright © Sushain Pandit, 2010.

Outline

Introduction – Motivation; Contributions

Problem – Related Concepts; Definition

Approach – Composite Extraction Framework; Semantic Validation Framework; Representation Framework

Evaluation – SEMANTIXS Architecture; Experimental Results and Analysis

Conclusion – Summary; Further Work

Introduction Motivation

Information Extraction from Text

Process of extracting interesting information from unstructured text

Entities – Persons, Organizations, Locations, etc.

Attributes – Name, Descriptors, Categories, etc.

Events – Company established in 2010

Relationships – Person works for Organization

Co-references – IBM and International Business Machines

…

Our Focus – A subset of complex nested relationships

Introduction Motivation

Motivating Example - A Moderately Complex Sentence

Sports News predicts that Sachin Tendulkar may score a double-hundred with high probability and retire in 2015

Introduction Motivation

Motivating Example – Usual Information Extraction Scene

Sports News predicts that Sachin Tendulkar may score a double-hundred with high probability and retire in 2015

[Figure: the sentence with its spans labeled Entity, Entities, Event, Relationship and Attribute]

Introduction Motivation

Motivating Example – Nested Relationships

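Sports News predicts that Sachin Tendulkar may score a double-hundred with high probability and retire in 2015

[Figure: the sentence split into “Sports News predicts that”, “Sachin Tendulkar may score a double-hundred”, “with high probability” and “and retire in 2015”, with the labels: Clause-level Dependency; Outer Relationship; Dependency; A Qualifying Modifier; Inner Clause subject to the Qualifying Modifier; Conjunction creating dependencies between parts of the sentence; Left part governing the meaning of right part]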

Introduction Motivation

Motivating Example – Domain Description

Sports News predicts that Sachin Tendulkar may score a double-hundred with high probability and retire in 2015

Is “Sachin Tendulkar” the internationally recognized sportsperson, or someone else by the same name?

Introduction Motivation

Motivating Example – Domain Description

Sports News predicts that Sachin Tendulkar may score a double-hundred with high probability and retire in 2015

Domain description in the form of a domain ontology: Sachin_Tendulkar type SportsPerson

This identifies “Sachin Tendulkar” as the internationally recognized sportsperson

Introduction Motivation

Motivating Example – Representation

[Figure: semantic graph built from the labels Sports News, predicts that, Sachin_Tendulkar, scores, double-hundred, high probability, retire and 2015]

Some semantic graph formalism that can capture the structured information

Introduction Motivation

Existing Approaches for Information Extraction

Rule-based Approaches

Laborious but transparent in capturing complex semantic criteria

Best performing systems invariably use hand-crafted rules

Often rely on domain-specific trigger words

Automatic pattern induction (statistical methods)

Co-occurrence – Requires large labeled text corpora

Cluster Analysis – Computationally expensive

Comprehensive surveys

N. Bach and S. Badaskar, 2007; G. Neumann and F. Xu, 2004

Recall: Our Focus – A subset of complex nested relationships

Our Approach – Domain independent rule formulation

Introduction Contributions

Contributions

A modular ontology-based approach for extraction of a subset of nested-complex relationships that decouples domain-specific knowledge from the rules used for information extraction

A framework to semantically represent the extracted relationships in the form of queryable RDF graphs

An open-source implementation of SEMANTIXS, a system for ontology-guided extraction of structured information from text

Results of some experiments to validate the proposed approach

Problem Related Concepts

Parse Trees

Ordered and rooted trees representing the syntactic structure

Penn Treebank notation¹ for tagging the sentence

S: Simple declarative clause

NP: Categorizes all constituents depending on a head noun

VP: Categorizes all constituents headed by a verb

¹ Refer to ftp://ftp.cis.upenn.edu/pub/treebank/doc/manual/notation.tex for a complete list

Example

[Figure: parse tree of “Heart Attack Causes reduced average lifespan”, with nodes S, NP, VP and NP]

Problem Related Concepts

Dependency Graphs

Structures capturing implicit dependencies (sentential semantics) between the tokens of a sentence

Stanford dependency notation² for labeling the graph

² Refer to the handout. URL for a complete list: http://nlp.stanford.edu/software/dependencies_manual.pdf

Example

[Figure: example dependency graph]

Problem Related Concepts

Formal Specifications for Validation and Representation

Domains generally described by concepts, relationships and instances

Need for a formalism to capture the domain description

Need for a suitable representation mechanism

Ontology

An ontology is a structure O = (R, C) such that:

The sets R and C are disjoint and their elements are called relations and concepts respectively

The elements in R induce a strict partial order on the elements in C

O = {{SportsPerson, Person, Number}, {scoredRuns}}

Domain (scoredRuns) = {SportsPerson}, Range (scoredRuns) = Number

Problem Related Concepts

Domain Ontology with Instances

A domain ontology with instances is a structure DOI=(O,I,h) such that:

I is a set, whose elements are called instances

There exists a function $h: I \to P(C)$, where P(C) is the power-set of the set of concepts for the ontology O

Example

O = {{SportsPerson, Person, Number}, {scoredRuns}}

Domain (scoredRuns) = {SportsPerson}, Range (scoredRuns) = Number

I = {John, Steve, Sachin_Tendulkar}

h (Sachin_Tendulkar ) = {SportsPerson, Person}

h (John) = {Person}
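
The two definitions can be made concrete with a small data structure. The following is a minimal Python sketch, not the SEMANTIXS implementation: the class and field names (`Ontology`, `DomainOntologyWithInstances`, `concepts_of`, `range_`) are hypothetical, the strict partial order on concepts is not modeled, and an instance with no listed concepts (Steve in the example) is assumed to map to the empty set.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """O = (R, C): relations and concepts, plus Domain/Range per relation."""
    concepts: set
    relations: set
    domain: dict = field(default_factory=dict)   # relation -> set of concepts
    range_: dict = field(default_factory=dict)   # relation -> set of concepts

    def __post_init__(self):
        # The definition requires R and C to be disjoint.
        if self.concepts & self.relations:
            raise ValueError("R and C must be disjoint")


@dataclass
class DomainOntologyWithInstances:
    """DOI = (O, I, h), with h stored as a dict from instance to concept set."""
    ontology: Ontology
    instances: set
    h: dict = field(default_factory=dict)

    def concepts_of(self, instance):
        # Assumption: an instance without an explicit entry maps to the empty set.
        if instance not in self.instances:
            raise KeyError(instance)
        return self.h.get(instance, set())


# The example from the slide.
O = Ontology(
    concepts={"SportsPerson", "Person", "Number"},
    relations={"scoredRuns"},
    domain={"scoredRuns": {"SportsPerson"}},
    range_={"scoredRuns": {"Number"}},
)
doi = DomainOntologyWithInstances(
    ontology=O,
    instances={"John", "Steve", "Sachin_Tendulkar"},
    h={"Sachin_Tendulkar": {"SportsPerson", "Person"}, "John": {"Person"}},
)
print(doi.concepts_of("Sachin_Tendulkar"))
```

Storing h as a dictionary keeps the lookup h(y) explicit, which is the operation the validation step relies on.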

Problem Related Concepts

Resource Description Framework (RDF)

Resources are described by properties and values in RDF statements

Statements represented as RDF triples, consisting of a subject, predicate and object

Uniform Resource Identifier (URI) for resources

RDF Reification – Special mechanism to make assertions about statements instead of entities

Example – RDF Triple

Sachin Tendulkar scored 200 runs
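
As an illustration of how such a statement looks once every resource has a URI, here is a minimal Python sketch that writes the example as one N-Triples line. The namespace `http://example.org/sports#` and the helper names are assumptions made for this example; the slides do not give the URIs actually used.

```python
# Hypothetical namespace for the example resources (not from the slides).
EX = "http://example.org/sports#"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def uri(local_name):
    """Wrap a local name in the example namespace as an N-Triples URI term."""
    return f"<{EX}{local_name}>"


def int_literal(value):
    """Typed integer literal in N-Triples syntax."""
    return f'"{value}"^^<{XSD_INTEGER}>'


def n_triple(subject, predicate, obj):
    """One statement: subject, predicate, object, terminated by ' .'."""
    return f"{subject} {predicate} {obj} ."


line = n_triple(uri("Sachin_Tendulkar"), uri("scoredRuns"), int_literal(200))
print(line)
```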

Problem Definition

Ontology-guided Structured Information Extraction

Given:

Text fragment consisting of sentences {Ti}

Domain ontology with instances DOI=(O,I,h)

Ontology-guided structured information extraction:

Determines a set TCTR of candidate information constructs using entity and relationship extraction algorithm(s)

Validates TCTR with respect to DOI and finds a set K of validated information constructs

Represents the triples in K using a suitable mechanism

Remainder of the presentation – Details of the above three steps

Approach Composite Extraction Framework

Recall: Motivating Example

Sports News predicts that Sachin Tendulkar may score a double-hundred with high probability and retire in 2015

[Figure: the sentence with the same annotations as in “Motivating Example – Nested Relationships”]

Approach Composite Extraction Framework

Terminology – Extraction Rule

Rule –

“Label(s) from {nn, amod} occur along the edges connected to an nsubj node” → “Group the associated nodes with the nsubj node”

“Labels nsubj & dobj occur along a set of adjacent edges” → “Extract the nodes associated with those edges as information constructs”

Result – Extraction of {{Heart, attack, causes}, reduced, {average, lifespan}} as candidate information construct
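
The two rules can be sketched over a dependency graph stored as labeled edges. This is a minimal Python sketch under stated assumptions: the edge list is hand-written (not parser output) and chosen so that it reproduces the result given above, words are assumed unique within the sentence, and the grouping rule is also applied to the dobj node, which the result implies but the rule text does not state. The names `EDGES`, `group` and `extract` are hypothetical.

```python
# Each edge is (label, governor, dependent); hand-written for illustration.
EDGES = [
    ("nn", "causes", "Heart"),
    ("nn", "causes", "attack"),
    ("nsubj", "reduced", "causes"),
    ("amod", "lifespan", "average"),
    ("dobj", "reduced", "lifespan"),
]
WORD_ORDER = ["Heart", "attack", "causes", "reduced", "average", "lifespan"]
MODIFIER_LABELS = {"nn", "amod"}


def group(head, edges, order):
    """Rule 1: group a head node with its nn/amod dependents, in sentence order."""
    members = {head} | {
        dep for label, gov, dep in edges
        if gov == head and label in MODIFIER_LABELS
    }
    return [word for word in order if word in members]


def extract(edges, order):
    """Rule 2: nsubj and dobj edges sharing a governor yield one construct."""
    subjects = {gov: dep for label, gov, dep in edges if label == "nsubj"}
    objects = {gov: dep for label, gov, dep in edges if label == "dobj"}
    constructs = []
    for verb, subject in subjects.items():
        if verb in objects:  # the two edges are adjacent: both touch the verb
            constructs.append(
                (group(subject, edges, order), verb,
                 group(objects[verb], edges, order))
            )
    return constructs


print(extract(EDGES, WORD_ORDER))
```

Nothing in the two functions mentions a domain term, which is the sense in which the rule formulation is domain independent.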

Approach Composite Extraction Framework

Motivating Example: Identifying Sub-problems

Sports News predicts that Sachin Tendulkar may score a double-hundred

[Figure: this part of the sentence with the labels Clause-level Dependency; Outer Relationship; Dependency; Inner Clause subject to the Qualifying Modifier]

Approach Composite Extraction Framework

Identifying Complex Relationship - Type 1

Relationships with Internal Clauses:

Variants:

That Macs are too cool for its customers, says Microsoft ad

Microsoft ad says: Macs are too cool for its customers

Approach Composite Extraction Framework

Rules for Entity & Relation Extraction – Type 1

Dependency Graph:

Approach Composite Extraction Framework

Rules for Entity & Relation Extraction – Type 1

Expected Extraction Rule Behavior:

Clausal Complement – ccomp

Variants – parataxis

Leave this for later – Recursively reduced to one of the other types
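
One way to read this behavior: a ccomp (or parataxis) edge marks where the inner clause hangs off the outer verb, so the graph can be cut there and the inner part handed to the other rules. The following is a minimal Python sketch of that cut, assuming a simplified, hand-written edge list for part of the motivating sentence (function words omitted); `split_internal_clause` and `descendants` are hypothetical names.

```python
# Edges are (label, governor, dependent); simplified and hand-written.
EDGES = [
    ("nn", "News", "Sports"),
    ("nsubj", "predicts", "News"),
    ("ccomp", "predicts", "score"),
    ("nn", "Tendulkar", "Sachin"),
    ("nsubj", "score", "Tendulkar"),
    ("aux", "score", "may"),
    ("dobj", "score", "double-hundred"),
]
CLAUSE_LABELS = {"ccomp", "parataxis"}


def descendants(root, edges):
    """Nodes reachable from root by following governor -> dependent edges."""
    seen, stack = {root}, [root]
    while stack:
        node = stack.pop()
        for _, gov, dep in edges:
            if gov == node and dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def split_internal_clause(edges):
    """Cut the graph at the first clause-level edge.

    Returns (outer_verb, outer_edges, inner_edges), or None when the
    sentence has no internal clause.
    """
    for label, gov, dep in edges:
        if label in CLAUSE_LABELS:
            inner_nodes = descendants(dep, edges)
            inner = [e for e in edges if e[1] in inner_nodes]
            outer = [
                e for e in edges
                if e[1] not in inner_nodes and e[0] not in CLAUSE_LABELS
            ]
            return gov, outer, inner
    return None


outer_verb, outer_edges, inner_edges = split_internal_clause(EDGES)
print(outer_verb, outer_edges, inner_edges)
```

The inner edge list is itself a dependency graph, so it can be processed again as a simple, Type 2 or Type 3 relationship.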

Approach Composite Extraction Framework

Motivating Example: Identifying Sub-problems

Sachin Tendulkar may score a double-hundred with high probability

[Figure: “with high probability” labeled as A Qualifying Modifier]

Approach Composite Extraction Framework

Identifying Complex Relationship - Type 2

Relationships with Qualifiers:

Variants:

With high probability, Sachin Tendulkar may score a double-hundred.

There is a high probability that Sachin Tendulkar may score a double-hundred

Approach Composite Extraction Framework

Rules for Entity & Relation Extraction – Type 2

Dependency Graph:

Approach Composite Extraction Framework

Rules for Entity & Relation Extraction – Type 2

Expected Extraction Rule Behavior:

Prepositional Modifier – prep

Variants – prep_xxx

Adjectival Modifier – amod

Prep – amod Pattern Identifies this Relationship Type
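
The prep – amod pattern can be sketched as a two-edge match: a prepositional edge into a node that in turn has an adjectival modifier. This minimal Python sketch assumes collapsed dependencies (a single `prep_with` edge rather than separate `prep` and `pobj` edges) and a hand-written edge list; `find_qualifiers` is a hypothetical name.

```python
# Edges are (label, governor, dependent); simplified and hand-written.
EDGES = [
    ("nn", "Tendulkar", "Sachin"),
    ("nsubj", "score", "Tendulkar"),
    ("aux", "score", "may"),
    ("dobj", "score", "double-hundred"),
    ("prep_with", "score", "probability"),
    ("amod", "probability", "high"),
]


def is_prep(label):
    """True for 'prep' and its collapsed variants such as 'prep_with'."""
    return label == "prep" or label.startswith("prep_")


def find_qualifiers(edges):
    """Return (qualified_node, qualifier, modifier) for each prep-amod pair."""
    qualifiers = []
    for label, gov, dep in edges:
        if not is_prep(label):
            continue
        for inner_label, inner_gov, inner_dep in edges:
            if inner_label == "amod" and inner_gov == dep:
                qualifiers.append((gov, dep, inner_dep))
    return qualifiers


print(find_qualifiers(EDGES))
```

With uncollapsed dependencies the preposition would sit between the verb and the qualified noun, and the match would need one more hop.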

Approach Composite Extraction Framework

Motivating Example: Identifying Sub-problems

Sachin Tendulkar may score a double-hundred and retire in 2015

[Figure: the conjunction “and” labeled as creating dependencies between parts of the sentence, with the left part governing the meaning of the right part]

Approach Composite Extraction Framework

Identifying Complex Relationship – Type 3

Relationships with Conjunctions:

Approach Composite Extraction Framework

Rules for Entity & Relation Extraction – Type 3

Conjunctions connect parts having immediate dependencies

Reference resolution required between the parts

Utilize Sentence Parses instead of dependency graphs

Formulation:

If right-part contains Simple Declarative Clause (S), process as a distinct sentence

If right-part contains Verb and Noun Phrases (VP, NP), use the subject of left-part and process as a distinct sentence

If right-part contains only NP, use the subject and object of left-part and process as a distinct sentence
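
The first two cases of this formulation can be sketched as a small dispatch on the phrase labels found in the right part of the parse. This is a minimal Python sketch, not the thesis algorithm: the set of labels is assumed to be given (for example, collected from the parse of the right part), `resolve_right_part` is a hypothetical name, and the NP-only case is deliberately left unimplemented because the slide does not spell out how the borrowed elements are combined.

```python
def resolve_right_part(right_text, right_labels, left_subject):
    """Turn the right part of a conjunction into a stand-alone sentence.

    right_labels: Penn Treebank phrase labels occurring in the right part.
    left_subject: subject text of the left part, used for reference resolution.
    """
    if "S" in right_labels:
        # Already a full clause: process it as a distinct sentence.
        return right_text
    if {"VP", "NP"} <= right_labels:
        # Verb phrase without a subject: borrow the subject of the left part.
        return f"{left_subject} {right_text}"
    # NP-only right part: not covered by this sketch.
    raise NotImplementedError("NP-only right part is not handled here")


sentence = resolve_right_part(
    "retire in 2015", {"VP", "PP", "NP"}, "Sachin Tendulkar"
)
print(sentence)
```

The S check comes first because a full clause also contains VP and NP nodes.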

Approach Composite Extraction Framework

Motivating Example: Identifying Sub-problems

Sachin Tendulkar may score a double-hundred

Approach Composite Extraction Framework

Rules for Entity & Relation Extraction – Simple

Dependency Graph

Approach Composite Extraction Framework

Extraction Algorithm - Illustration

Sports News predicts that

Sachin Tendulkar may score a double-hundred

with high probability

and retire in 2015

Approach Composite Extraction Framework

Extraction Algorithm - Illustration

{Sports News, predicts}

Sachin Tendulkar may score a double-hundred

with high probability

and

retire in 2015

Approach Composite Extraction Framework

Extraction Algorithm – Illustration

Sachin Tendulkar may score a double-hundred

with high probability

Approach Composite Extraction Framework

Extraction Algorithm – Illustration

retire in 2015

Append – Sachin Tendulkar

Sachin Tendulkar retire in 2015

Approach Composite Extraction Framework

Extraction Algorithm - Illustration

{Sports News, predicts, {Sachin Tendulkar, scored, double hundred, probability, high}}

{Sports News, predicts, {Sachin Tendulkar, retire, 2015}}

Approach Validation Framework

Validation using Domain Ontology

Extracted information constructs are matched against the domain description

Instance matches for the subject and object

Relationship match for the predicate

Domain / Range check to ensure validity as per the domain

Validation Rule

Given

Set of sentences {Ti} with word-set W

Set TCTR of candidate constructs extracted by an extraction algorithm

Domain ontology with instances DOI = (O, I, h)

Mapping F from W to R ∪ I

Validation process results in a set K of validated constructs such that:

Approach Validation Framework

Validation Rule (Contd.)

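$$\exists\, y_1, y_2 \in I,\ \exists\, r \in R \;\mid\; \{y_1, r, y_2\} \in K \iff \big(\exists\, w_1, w_2, w_3 \in W,\ c_1, c_2 \in C \;\mid\; \{w_1, w_3, w_2\} \in \mathit{TCTR} \wedge \{w_1, y_1\} \in F \wedge \{w_2, y_2\} \in F \wedge \{w_3, r\} \in F \wedge c_1 \in h(y_1) \wedge c_2 \in h(y_2) \wedge c_1 \in \mathrm{Domain}(r) \wedge c_2 \in \mathrm{Range}(r)\big)$$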

Validation Rule - Illustrated

Ti = “Sachin Tendulkar scored 200 runs”

TCTR = {Sachin, scored, 200}

O = { C = {SportsPerson, Number}, R = {scoredRuns}}

Domain (scoredRuns) = {SportsPerson}, Range (scoredRuns) = Number

I = {Sachin_Tendulkar, 200}

F = {{Sachin, Sachin_Tendulkar }, {scored, scoredRuns}, {200, 200}}

h (Sachin_Tendulkar ) = {SportsPerson}; h(200) = {Number}

Validation with respect to the domain ontology with instances: {Sachin_Tendulkar, scoredRuns, 200} ∈ K holds
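
The validation rule can be checked mechanically on the illustrated data. The following is a minimal Python sketch, not the SEMANTIXS validation module: F is assumed to be a function (a dictionary) rather than a general relation, the instance 200 is represented as the string "200", and the names `validate`, `domain` and `range_` are hypothetical.

```python
def validate(candidates, F, h, domain, range_):
    """Return the set K of validated (y1, r, y2) triples.

    candidates: iterable of (w1, w3, w2) word triples from the extraction step
    F:          word -> relation or instance
    h:          instance -> set of concepts
    domain:     relation -> set of concepts allowed for the subject
    range_:     relation -> set of concepts allowed for the object
    """
    K = set()
    for w1, w3, w2 in candidates:
        y1, r, y2 = F.get(w1), F.get(w3), F.get(w2)
        if y1 not in h or y2 not in h or r not in domain or r not in range_:
            continue  # a word has no instance or relation match
        # Some concept of y1 must lie in Domain(r), some concept of y2 in Range(r).
        if h[y1] & domain[r] and h[y2] & range_[r]:
            K.add((y1, r, y2))
    return K


# Data from the illustration.
candidates = [("Sachin", "scored", "200")]
F = {"Sachin": "Sachin_Tendulkar", "scored": "scoredRuns", "200": "200"}
h = {"Sachin_Tendulkar": {"SportsPerson"}, "200": {"Number"}}
domain = {"scoredRuns": {"SportsPerson"}}
range_ = {"scoredRuns": {"Number"}}

K = validate(candidates, F, h, domain, range_)
print(K)
```

If h(Sachin_Tendulkar) did not contain SportsPerson, the domain check would fail and the candidate would be dropped from K.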

Approach Representation Framework

Simple Relationships - Primitive Transformation

Extracted information is $K_{simple} = \{\{s_i, p_i, o_i\} \mid \{s_i, p_i, o_i\} \in K \wedge |o_i| = 1\}$

$TransformPrimitive(\{s_i, p_i, o_i\}) \to G_{RDF}(\{s_i, o_i\}, \{p_i\})$

The transformation is easily realized using the RDF triple notation

Primitive Transformation – Example

{ Sachin_Tendulkar, scoredRuns, 200 }

Approach Representation Framework

Complex Relationships - Composite Transformation

Extracted information is $K_{complex} = \{\{s_i, p_i, o_i\} \mid \{s_i, p_i, o_i\} \in K \wedge |o_i| > 1\}$

$TransformComposite(\{s_i, p_i, o_i\}) \to \{TransformPrimitive(\{s_i, p_i, o_{i1}\}), \ldots\}$

Transformation realized using RDF reification mechanism

Composite Transformation – Example

{Microsoft ad, says, { Mac_Unit, cool, customers } }
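
One reading of the two transformations, sketched in Python: a simple construct becomes a single triple, and a construct whose object is itself a construct is handled by reifying the inner statement and pointing the outer triple at it. This is a sketch under stated assumptions, not the thesis algorithm: constructs are modeled as nested tuples, blank-node names such as `_:stmt1` and the function names are hypothetical, and only the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object) is taken as given.

```python
from itertools import count


def reify(construct, ids, triples):
    """Describe a construct as an rdf:Statement node and return the node id."""
    s, p, o = construct
    if isinstance(o, tuple):
        # Deeper nesting: reify the inner construct first.
        o = reify(o, ids, triples)
    node = f"_:stmt{next(ids)}"
    triples += [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
    ]
    return node


def transform(construct):
    """Primitive transformation for simple constructs, composite otherwise."""
    s, p, o = construct
    triples = []
    if isinstance(o, tuple):
        # Composite: the object is a statement, so refer to its reified node.
        o = reify(o, count(1), triples)
    triples.append((s, p, o))
    return triples


simple = transform(("Sachin_Tendulkar", "scoredRuns", "200"))
nested = transform(("Microsoft_ad", "says", ("Mac_Unit", "cool", "customers")))
for triple in nested:
    print(triple)
```

In standard RDF reification the inner statement is described but not asserted, which suits reported claims such as what an ad says.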

Approach Representation Algorithm

RDF Graph Generation Algorithm

[Figure: algorithm listing with its Composite Transformation and Primitive Transformation steps marked]

Evaluation SEMANTIXS Architecture

SEMANTIXS

System to extract information from free text in the form of complex (and simple) relationships: https://sourceforge.net/projects/semantixs

Java-based Web Application utilizing:

Jena Semantic Web Toolkit

Stanford Parser Libraries

Google Web Toolkit

SVG Visualizer from HP Labs

Operates in 3 different modes – Trade-off between correctness & coverage

Output conforms to W3C guidelines for RDF – Implicit graph specification

Visualization facility to analyze entity-specific RDF sub-graphs

Evaluation SEMANTIXS Architecture

SEMANTIXS

[Architecture diagram; annotated modules:]

Module implementing the Recursive Validation and Representation Algorithm

Module implementing the Extraction Rules and related Logic

Module implementing the Core Validation Logic


Evaluation Results and Analysis

Experimental Setup

Pre-annotated benchmark data-set unavailable for complex relationships

Gold standard in IE - Message Understanding Conferences (MUC-1 to MUC-7)

Mostly news articles related to military and civil themes

Focused on tasks related to entities, facts, events and attributes

Not rich enough in complex nested relationships

Chosen real-world Text, Ontology and Instances:

Followed suit with MUC – Selected news articles from CBSNews

Queried CBSNews.com [1] for “Dow Jones”

Randomly selected 80 sentences across 4 articles

Utilized DBpedia ontology [2] and a subset of types

[1] Query - http://www.cbsnews.com/1770-5_162-0-4.html?query=Dow+Jones&searchtype=cbsSearch

[2] DBpedia - http://wiki.dbpedia.org/Downloads34#dbpediaontology


Evaluation Results and Analysis

Methodology used in Analyzing Correctness

For complex relations (types 1 & 2), correctness is judged based upon:

Correct structural representation extracted for the complex relationship

Correct semantic representation extracted for the simple relationship within

Correct and complete extraction of all the relations contributes to each individual count

Partially-correct extraction still contributes to the count for the correctly extracted relationship

Experimental Text: Counts of Pos and Neg Instances


Evaluation Results and Analysis

Correctly Classified: Counts of Pos and Neg Instances

Experimental Text: Counts of Pos and Neg Instances


Evaluation Results and Analysis

Simple relationships

False positives & negatives due to shallow syntactic comparisons in validation

For complex (types 1 & 2), correctness is based on structure, so recall is the true measure

For Type 1 (Clause-level)

Most false negatives due to multi-level dependency structures and references

False positives – While validating outer subject and predicate [similar to simple]

For Type 2 (With Qualification)

Most false negatives while validating qualification and value [similar to simple]

Precision, Recall and F-measure
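The measures named on this slide can be computed from raw counts of true positives, false positives, and false negatives. The following is a minimal sketch; the function name `precision_recall_f` and the counts in the usage line are hypothetical and are not the thesis results.

```python
def precision_recall_f(tp: int, fp: int, fn: int, beta: float = 1.0):
    """Return (precision, recall, F_beta) from raw counts.

    A zero denominator yields 0.0 instead of raising an error.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    f_measure = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical counts for one relationship type (illustration only).
p, r, f = precision_recall_f(tp=8, fp=2, fn=2)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

With `beta = 1` the F-measure is the harmonic mean of precision and recall, which is the usual choice when neither is favored.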


Evaluation Results and Analysis

Type 3 (Conjunctions)

Correctness based on the expected construction of left and right fragments

Analysis of individual fragments falls under one of the other relationship types

References

High recall – Due to naïve pronoun resolution methodology

Low precision – Aggressive pronoun resolution leading to many false positives

Other Failing Cases – Algorithm not designed to handle them

Co-references

Negations, Or-conjunctions, etc

Outliers – Relevant instance but unexpected pattern in the dependency graph

Precision, Recall and F-measure


Evaluation Results and Analysis

Querying the Graph

Example Graph

Extracted RDF metadata forms a Semantic Graph

Can be queried using SPARQL to answer complex questions

Performed queries to answer questions for the entity “Dow Jones”


Evaluation Results and Analysis

Example Questions

Finding the subjects of assertions that were made about an entity

Who made any assertions about Dow Jones?

Finding entities based on complex criteria

What are the entities that Dow Jones made qualified statements about?

Finding entities based on relationship participation

Which entity appears in a fact with Dow Jones?

Example Query

SELECT ?s1
WHERE { ?s <type> <#Statement> .
        ?s <#subject> <dbpedia.org/page/Dow_Jones_Industrial_Average> .
        ?s1 ?p1 ?s . }

Look for all those subjects s1 which have a statement s as their object, such that s talks about Dow Jones
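To make the pattern matching in the example query concrete, the sketch below evaluates the same three triple patterns over a small in-memory list of triples, without any RDF library. The triples, the shortened names (`stmt1`, `Analyst`, `says`, and so on), and the helper `who_asserts_about` are hypothetical; SEMANTIXS itself produces RDF and is queried with SPARQL.

```python
# Triples are (subject, predicate, object) tuples with shortened, made-up names.
TYPE, STATEMENT, SUBJECT = "type", "#Statement", "#subject"
DOW = "dbpedia.org/page/Dow_Jones_Industrial_Average"

triples = [
    ("stmt1", TYPE, STATEMENT),
    ("stmt1", SUBJECT, DOW),
    ("stmt1", "#predicate", "fell"),
    ("stmt1", "#object", "points"),
    ("Analyst", "says", "stmt1"),    # an assertion whose object is a statement
    ("stmt2", TYPE, STATEMENT),
    ("stmt2", SUBJECT, "Nasdaq"),
    ("Trader", "reports", "stmt2"),
]


def who_asserts_about(graph, entity):
    """Return subjects s1 that have, as object, a statement s about `entity`."""
    # Pattern 1: ?s <type> <#Statement>
    statements = {s for (s, p, o) in graph if p == TYPE and o == STATEMENT}
    # Pattern 2: ?s <#subject> <entity>
    about = {s for (s, p, o) in graph
             if s in statements and p == SUBJECT and o == entity}
    # Pattern 3: ?s1 ?p1 ?s
    return sorted({s1 for (s1, _p1, o) in graph if o in about})


print(who_asserts_about(triples, DOW))
```

Each set comprehension corresponds to one triple pattern of the query, and the shared variable `?s` becomes a membership test between the sets.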


Conclusion

Summary

Further Work


Outline


Conclusion Summary

Summary

We described a modular ontology-based approach to information extraction for a subset of nested complex relationships

We illustrated a semantic representation of the extracted relationships in the form of query-able (RDF) graphs

We described the system details of SEMANTIXS, a system for ontology-guided extraction and semantic representation of structured information from unstructured text, and reported results to validate the proposed approach


Conclusion Further Work

Further Work

Enhancements to improve the precision and recall of the system

Deep comparisons in validation

Consider synonyms, external resources, etc

Enhance pronoun resolution, co-reference resolution, etc

Complex knowledge discovery and question answering over the extracted semantic graphs

Opinion mining and recommendation systems by creating semantic graphs consisting entirely of opinions / recommendations

Extending the rule-base to capture more relationships, handle negations, Or-conjunctions, etc.

Perform domain analysis and ontology building


Conclusion

Thank You !

Sushain Pandit

[email protected]


Conclusion

Backup Slides


Introduction Motivation

Existing Approaches for Information Extraction

Rule-based Approaches

Laborious but transparent in capturing complex semantic criteria

Best performing systems invariably use hand-crafted rules

Often rely on domain-specific trigger words

Automatic pattern induction (statistical methods)

Co-occurrence – statistically significant associations

Require a lot of labeled text corpora (hard to acquire for complex relations)

Cluster Analysis – similarity measure

Incurs computational cost for feature preparation


Problem Related Concepts

Derivative Structures from Text

Complete lack of semantics necessitates an intermediate representation

Linguistic parsers are used to generate data structures with respect to a formal grammar

Popular parsing libraries

Natural Language Toolkit (NLTK)

Two-Stage Discriminative Parser by McDonald et al.

Stanford Parser

Stanford Parser chosen based on

Flexibility of representation

Accuracy in dependency analysis, parsing, tagging, chunking, etc.

Processing Speed


Approach Composite Extraction Framework

Terminology

$p_i$: the $i$-th condition or premise for a rule (defined below)

$c_j$: the $j$-th action or consequent for a rule, corresponding to a set $\{p_i\}$

$G(V, E)$: A dependency graph with vertex-set $V$ and edge-set $E$

$G_S(V')$: Subgraph of $G$ induced by the vertex-set $V'$

$D$: A set of labels denoting the typed dependency relations

$l: E \to D$: A function that associates labels to the edges in $G$

Extraction Rule

For a dependency graph $G$, we define an extraction rule as:

$r_k: \{p_i\} \to \{c_j\}$, meaning: if $\{p_i\}$ holds, perform $\{c_j\}$
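One way to read the rule definition is as a list of premise checks guarding a list of actions over a labeled edge list. The sketch below is only an illustration under that reading: the `Rule` class, the helpers `has_label` and `collect_dependents`, and the `(governor, dependent, label)` edge orientation are all assumptions, not the SEMANTIXS implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

# An edge of G with its label l(e): (governor, dependent, label). Assumed layout.
Edge = Tuple[str, str, str]


@dataclass
class Rule:
    """r_k: {p_i} -> {c_j}. If every premise holds, perform every consequent."""
    name: str
    premises: List[Callable[[List[Edge]], bool]]
    consequents: List[Callable[[List[Edge], Dict[str, Set[str]]], None]]

    def apply(self, edges: List[Edge], result: Dict[str, Set[str]]) -> bool:
        if all(premise(edges) for premise in self.premises):
            for consequent in self.consequents:
                consequent(edges, result)
            return True
        return False


def has_label(label: str):
    """Premise: some edge in the graph carries the given dependency label."""
    return lambda edges: any(l == label for (_, _, l) in edges)


def collect_dependents(label: str, slot: str):
    """Consequent: store the dependents of all edges with `label` under `slot`."""
    def action(edges, result):
        result.setdefault(slot, set()).update(
            d for (_, d, l) in edges if l == label)
    return action


# Toy dependency edges (hypothetical parse).
edges = [("says", "ad", "nsubj"), ("ad", "Microsoft", "nn")]
rule = Rule("toy", [has_label("nsubj")], [collect_dependents("nsubj", "sub1")])
result: Dict[str, Set[str]] = {}
fired = rule.apply(edges, result)
print(fired, result)
```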


Approach Composite Extraction Framework

Rules for Entity & Relation Extraction – Type 1

Forming Extraction Rule

pred1 = {Node with two outgoing edges with labels “nsubj” and “ccomp”}

sub1 = {Node (node1) that is connected to pred1 by edge with label “nsubj”, Node connected to node1 by an edge with label “nn” or “quantmod”}

Formalized Extraction Rule:

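$r_{RIC1}$: $\{\exists\, u, v, w \in V,\ \exists\, e_1(u, v), e_2(v, w) \in E \mid l(e_1) = \text{“nsubj”} \wedge l(e_2) \in \{\text{“ccomp”}, \text{“parataxis”}\}\} \to \{pred_1 = \{v\},\ sub_1 = \{u\}\}$

$r_{RIC2}$: $\{\exists\, u, v, w, t \in V,\ \exists\, e_1(u, v), e_2(v, w), e_3(u, t) \in E \mid l(e_1) = \text{“nsubj”} \wedge l(e_2) \in \{\text{“ccomp”}, \text{“parataxis”}\} \wedge l(e_3) \in \{\text{“nn”}, \text{“quantmod”}\}\} \to \{sub_1 = sub_1 \cup \{t\}\}$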
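The two Type 1 rules can be sketched over an edge list as follows. This is an illustrative reading, not the thesis code: edges are assumed to be `(governor, dependent, label)` tuples, the function name `apply_type1_rules` is hypothetical, and the sample edges are a hand-written parse rather than Stanford Parser output.

```python
CLAUSAL = {"ccomp", "parataxis"}     # labels allowed for e2
SUBJECT_MODS = {"nn", "quantmod"}    # labels allowed for e3


def apply_type1_rules(edges):
    """Return one {pred1, sub1} match per node that satisfies the Type 1 premises."""
    outgoing = {}  # governor -> list of (dependent, label)
    for gov, dep, label in edges:
        outgoing.setdefault(gov, []).append((dep, label))

    matches = []
    for v, deps in outgoing.items():
        subjects = [d for d, l in deps if l == "nsubj"]
        has_clause = any(l in CLAUSAL for _, l in deps)
        if not (subjects and has_clause):
            continue
        for u in subjects:
            # First rule: pred1 = {v}, sub1 = {u}
            sub1 = {u}
            # Second rule: add every t linked to u by an "nn" or "quantmod" edge
            sub1.update(t for t, l in outgoing.get(u, []) if l in SUBJECT_MODS)
            matches.append({"pred1": {v}, "sub1": sub1})
    return matches


# Hypothetical edges for a sentence like "Microsoft ad says Mac_Unit ...".
edges = [
    ("says", "ad", "nsubj"),
    ("says", "cool", "ccomp"),
    ("ad", "Microsoft", "nn"),
    ("cool", "Mac_Unit", "nsubj"),
]
print(apply_type1_rules(edges))
```

The inner clause headed by `cool` does not match here because it has no clausal complement of its own; in the described approach such clauses are handled recursively.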


Approach Composite Extraction Framework

Rules for Entity & Relation Extraction – Type 2

Forming Extraction Rule

pred1 = {Node with two outgoing edges with labels “nsubj” and “dobj”}

sub1 = {Node (node1) that is connected to pred1 by edge with label “nsubj”, Node connected to node1 by an edge with label “nn” or “quantmod”}

obj1 = {Node (node2) that is connected to pred1 by edge with label “dobj”, Node connected to node2 by an edge with label “nn” or “quantmod”}

qual1 = {Node with two edges labeled “prep” and “amod”}

val1 = {Node that is connected to qual1 by the edge with label “amod”}


Approach Composite Extraction Framework

Identifying Simple Relationship Type

Simple Relationships:

At most one subject and object each

No clause-level dependencies, conjunctions, or a clausal subject

Only noun-compound, or adjectival modifiers

In terms of Stanford dependencies, this implies:

At most one dependency of type nsubj

At most one dependency from the set {dobj, pobj}

No dependencies from the set {ccomp, xcomp, acomp, compl, conj, etc.}.

Only *mod = {amod, quantmod, nn} as modifiers
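The dependency-level conditions above translate into a simple check over the multiset of edge labels in a sentence. The sketch below is a hedged illustration: `is_simple` is a hypothetical helper, the excluded set contains only the labels named on the slide (which ends in "etc."), and treating any label ending in "mod" as a modifier is an assumption.

```python
from collections import Counter

# Labels that rule out a simple relationship; the slide's list is open-ended.
COMPLEX_LABELS = {"ccomp", "xcomp", "acomp", "compl", "conj"}
ALLOWED_MODS = {"amod", "quantmod", "nn"}


def is_simple(labels):
    """Check the simple-relationship conditions on a sentence's dependency labels."""
    counts = Counter(labels)
    if counts["nsubj"] > 1:
        return False                      # at most one subject
    if counts["dobj"] + counts["pobj"] > 1:
        return False                      # at most one object
    # Collapsed labels such as "conj_and" are reduced to their prefix.
    if any(label.split("_")[0] in COMPLEX_LABELS for label in counts):
        return False
    # Assumption: any label ending in "mod" is a modifier dependency.
    modifiers = {label for label in counts if label.endswith("mod")}
    return modifiers <= ALLOWED_MODS


print(is_simple(["nsubj", "dobj", "nn", "amod"]))
print(is_simple(["nsubj", "ccomp", "nsubj", "dobj"]))
```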


Approach Composite Extraction Framework

Rules for Entity & Relation Extraction – Simple

Dependency Graph

Forming Extraction Rule:

pred1 = {Node with two outgoing edges with labels “nsubj” and “dobj”}

sub1 = {Node (node1) that is connected to pred1 by edge with label “nsubj”, Node connected to node1 by an edge with label “nn” or “*mod”}

obj1 = {Node (node2) that is connected to pred1 by edge with label “dobj”, Node connected to node2 by an edge with label “nn” or “*mod”}
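As a rough sketch of the simple-relationship rule, the function below finds a node with outgoing “nsubj” and “dobj” edges and gathers the subject and object together with their allowed modifiers. The name `extract_simple`, the `(governor, dependent, label)` edge layout, and the sample parse are assumptions for illustration only.

```python
MODS = {"nn", "amod", "quantmod"}  # the *mod set given for simple relationships


def extract_simple(edges):
    """Return {pred1, sub1, obj1} for the first predicate found, or None."""
    outgoing = {}  # governor -> list of (dependent, label)
    for gov, dep, label in edges:
        outgoing.setdefault(gov, []).append((dep, label))

    def with_modifiers(node):
        # The node itself plus any node attached to it by an allowed modifier edge.
        return {node} | {d for d, l in outgoing.get(node, []) if l in MODS}

    for pred, deps in outgoing.items():
        subjects = [d for d, l in deps if l == "nsubj"]
        objects = [d for d, l in deps if l == "dobj"]
        if subjects and objects:
            return {
                "pred1": {pred},
                "sub1": with_modifiers(subjects[0]),
                "obj1": with_modifiers(objects[0]),
            }
    return None


# Hypothetical edges for a sentence like "Dow Jones gained record points".
edges = [
    ("gained", "Jones", "nsubj"),
    ("Jones", "Dow", "nn"),
    ("gained", "points", "dobj"),
    ("points", "record", "amod"),
]
print(extract_simple(edges))
```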


Approach Composite Extraction Framework

Rule-based Entity and Relationship Extraction Algorithm

[Algorithm figure; annotated steps:]

Handle Clausal Relationships recursively

Handle Conjunctions by Analyzing the Structure of the Sentence Parse

Apply Extraction Rules on the Input Dependency Graph

Store Information Constructs for Pronoun Resolution


Approach Composite Extraction Framework

Rule-based Entity and Relationship Extraction Algorithm

[Algorithm figure, continued; annotated steps:]

Handle Qualified Relationships using Enrichments

Utilize Stored Information Constructs for Forward Reference Resolution

Handle Simple Relationships


Approach Overall Algorithm

Overall Algorithm to Extract Information from Text

[Algorithm figure; annotated steps:]

Extract All Candidate Information Constructs for the sentence

Validate and Represent the Extracted Information Constructs


Approach Discussion

Claims Based on the Described Frameworks

Claim 1: The resulting graphs from TransformPrimitive and TransformComposite are valid RDF fragments [Follows from the definitions of TransformPrimitive and TransformComposite]

Claim 2: There always exists a transformation from a valid (syntactically and w.r.t. domain definition) natural language sentence containing at least one of the relationship types identified by us, to a graph formalism such that the underlying information expressed in the relationship is captured in a query-able form in the graph [Follows from the algorithms in the Composite Extraction and Semantic Validation frameworks and Claim 1]


Approach Representation Framework

Transforming Validated Constructs into Graph(s)

Seek a transformation from the set $K = \{\{s_i, p_i, o_i\}\}$ of validated constructs to an (RDF) graph $G_{RDF}(V, E)$ such that

the transformation is able to represent all types of validated constructs for all relationship types

the resulting graph(s) conform to valid RDF specification

the transformation for complex relationship types is easily realized using either simple triple notation or the RDF reification mechanism


Approach Representation Framework

Complex Relationships - Composite Transformation

Extracted information is $K_{complex} = \{\{s_i, p_i, o_i\} \mid \{s_i, p_i, o_i\} \in K \wedge |o_i| > 1\}$

$\mathrm{TransformComposite}(\{s_i, p_i, o_i\}) \to$

$\mathrm{TransformPrimitive}(\{s_i, p_i, t\})$, $\mathrm{TransformPrimitive}(\{t, obj, o_{o_i}\})$,

$\mathrm{TransformPrimitive}(\{t, pred, p_{o_i}\})$, $\mathrm{TransformPrimitive}(\{t, sub, s_{o_i}\})$,

$\mathrm{TransformPrimitive}(\{t, stmt, id\})$
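The composite transformation can be sketched as a function that replaces a nested object with a fresh node t and describes t through sub, pred, obj, and stmt triples, in the spirit of RDF reification. The sketch below assumes that a nested construct is a Python 3-tuple, that t is named like a blank node, and that the final triple tags t with a numeric statement id; these choices and the function names are illustrative, not the thesis implementation.

```python
import itertools


def transform_primitive(s, p, o):
    """A primitive transformation yields a single (subject, predicate, object) triple."""
    return [(s, p, o)]


def transform_composite(s, p, o, counter=None):
    """Expand a construct whose object may itself be a (sub, pred, obj) construct."""
    counter = counter if counter is not None else itertools.count(1)
    if not isinstance(o, tuple):
        return transform_primitive(s, p, o)   # simple case: nothing to reify
    s_o, p_o, o_o = o
    ident = next(counter)
    t = f"_:t{ident}"                         # fresh node standing for the inner statement
    triples = transform_primitive(s, p, t)
    triples += transform_primitive(t, "sub", s_o)
    triples += transform_primitive(t, "pred", p_o)
    triples += transform_composite(t, "obj", o_o, counter)  # handles deeper nesting
    triples += transform_primitive(t, "stmt", ident)
    return triples


# The composite example from the slides: {Microsoft ad, says, {Mac_Unit, cool, customers}}
for triple in transform_composite("Microsoft ad", "says",
                                  ("Mac_Unit", "cool", "customers")):
    print(triple)
```

Passing the counter through the recursion keeps node names unique within one call, so a doubly nested construct produces two distinct statement nodes.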


Approach Representation Framework

Complex Type 2 – Composite Transformation

Relationships with qualifications represented as a set of Primitive Transformations

Inputs for the Primitive Transformations created by Enrichments module


Evaluation Results and Analysis

Correctly Classified: Counts of Pos and Neg Instances

Confusion Matrices
