Algorithm to populate Telecom domain OWL-DL ontology with A-box object properties derived from Technical Support Documents

Kouznetsov A (1), Shoebottom B (2), Baker CJO (1)

(1) Department of Computer Science and Applied Statistics, University of New Brunswick, Saint John, Canada
(2) Innovatia, Inc, Saint John, Canada

Post on 19-Dec-2015




Motivation: Why Ontology-Centric?

• Problem: To respond to information requests in a timely manner, contact center workers need to search through many types of knowledge resources

• Challenge: increasing quality of service while decreasing contact center costs

• Solution: using an ontology-centric platform
– less escalation to more experienced workers
– less time spent resolving cases
– greatly reduced training time


Motivation: Why Text Mining?

• Problem: Highly educated experts spend significant time populating the ontology.

• Challenge: Reduce the workload

• Solution: Apply text mining, a semi-automatic method for extracting information (specifically named entities and their relations) from texts and populating a domain ontology.


Focus

• We are focused on the problem of accurately extracting and populating relations between the named entities and presenting them as object properties between A-box individuals in an OWL-DL ontology.


Populate A-box Object Property. Single Property

[Diagram]
• T-Box: Domain Class (Man), Object Property (hasSister), Range Class (Woman)
• A-Box: Domain Instance (Samuel), hasSister ?, Range Instance (Mary)


Populate A-box Object Property. Multi-properties

[Diagram]
• T-Box: Domain Class (Man), Range Class (Woman), Object Properties (hasSister, hasMother)
• A-Box: Domain Instance (Samuel), Range Instance (Mary), hasSister ?, hasMother ?


A more complicated case…

[Diagram]
• A-Box: Domain Instance (Samuel), Range Instance (Mary)
• Candidate object properties: hasSister ?, hasMother ?, hasSameLastName ?
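The question marks in the diagrams above can be read as a list of A-box candidates: every pair of individuals whose classes match the domain and range of a T-box object property. The sketch below is only an illustration of that reading, written in Python for brevity (the deck names Java, GATE/JAPE and OWL API as the actual tools). The class `ObjectProperty` and the function `abox_candidates` are hypothetical names, and matching by exact class name (no subclass reasoning) is an assumption.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ObjectProperty:
    """A T-box object property with its domain and range class (hypothetical type)."""
    name: str
    domain_class: str
    range_class: str


def abox_candidates(properties, individuals):
    """Enumerate (domain instance, property, range instance) candidates.

    `individuals` maps an individual name to its class name. Classes are
    matched by exact name only, which is a simplifying assumption.
    """
    candidates = []
    for prop in properties:
        domains = [i for i, c in individuals.items() if c == prop.domain_class]
        ranges = [i for i, c in individuals.items() if c == prop.range_class]
        for d, r in product(domains, ranges):
            if d != r:
                candidates.append((d, prop.name, r))
    return candidates


tbox = [
    ObjectProperty("hasSister", "Man", "Woman"),
    ObjectProperty("hasMother", "Man", "Woman"),
]
abox = {"Samuel": "Man", "Mary": "Woman"}
candidates = abox_candidates(tbox, abox)
```

Each candidate is still only a question; whether it should be asserted is what the scoring described later in the deck tries to decide.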


Methodology

• Ontology-based information retrieval applies natural language processing (NLP) to link text segments, named entities and relations between named entities to existing ontologies.

• The algorithm leverages a customized gazetteer list, including lists specific to object property synonyms.

• A-box property candidates are scored using functions of the distance between co-occurring terms.

• A-box property prediction and population are based on these scores (thresholds, fuzzy approach).


Main Implementation tools

Java

GATE/JAPE

OWLAPI


Semi-Automatic Ontology populating pipeline

[Pipeline diagram] Stages and artifacts shown:
• Source Documents (XML)
• Preprocessing; Synonyms Lists
• Text Segments Processing: Text Segments Separation into Sentences, Tables, Other Text Segments
• Ontology unpopulated (OWL); Term List (Excel)
• Ontology Population: Named Entities, Single Relations, Multi Relations
• Populated Ontology
• Using Ontology: Reasoning, Visualizing, Visual Queries, Connecting Resources


Populating Ontology

[Diagram] Components shown:
• Scoring Framework: Co-occurrence Based Scores generator
• Relation Framework for A-box candidates extraction: Candidate
• Decision Framework: Decision module
• Reasoning; Ontology; Scores; Labelled Data Tres
• One part of the diagram is marked "Focus"


Co-occurrence Based Scores generator (Light version)

[Diagram] Components shown:
• Input: A-box Candidate, all related content
• Relations Framework: Relation Object
• Tokenizer
• Gazetteer; Synonyms List
• Score calculator: Integrator, Fragments Processor
• Output: Scores


Generation of Scores

• Relation Collection: framework to process Relation objects

• Relation Object: integrates an object property with
– all types of related text fragments
– ontology objects
– intermediate and final results of score processing

Identified as: Domain Class : Domain Instance : Object Property : Range Class : Range Instance
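A minimal sketch of how such a Relation object and its collection might be modeled, assuming the five-part identifier is simply the colon-joined names. The class and field names (`Relation`, `RelationCollection`, `fragments`, `sentence_scores`, `final_score`) are hypothetical and do not come from the authors' code; the example identifier is the one shown on a later slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Relation:
    """One A-box object property candidate with its evidence and scores."""
    domain_class: str
    domain_instance: str
    object_property: str
    range_class: str
    range_instance: str
    fragments: List[str] = field(default_factory=list)           # related text fragments
    sentence_scores: List[float] = field(default_factory=list)   # intermediate results
    final_score: float = 0.0                                     # final result

    @property
    def key(self) -> str:
        """Identifier: Domain Class:Domain Instance:Object Property:Range Class:Range Instance."""
        return ":".join([
            self.domain_class, self.domain_instance, self.object_property,
            self.range_class, self.range_instance,
        ])


class RelationCollection:
    """Container that stores Relation objects by their identifier."""

    def __init__(self) -> None:
        self._relations: Dict[str, Relation] = {}

    def add(self, relation: Relation) -> None:
        self._relations[relation.key] = relation

    def get(self, key: str) -> Relation:
        return self._relations[key]

    def __len__(self) -> int:
        return len(self._relations)


relation = Relation(
    "Telecommunications_Chassis", "8010co_Chassis",
    "hasChassis_Shipping_Accessories",
    "Telecommunications_Chassis_Screws", "Screws",
)
collection = RelationCollection()
collection.add(relation)
```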


Scores Generator: Details

Score Calculator:

• Calculates scores for the text fragments associated with the Relation.

• The current version is based on the distance between co-occurring entities and the number of text fragments with co-occurrence.

• Includes the Text Fragments Processor and the Integrator.


2-terms and 3-terms scoring system

[Diagram] Components shown:
• Tokenizer (produces the tokenized sentence)
• Score Gazetteer
• Score Processor (produces the sentence score)
• Domain Synonyms list, Range Synonyms list, Object Property Synonyms list

Legend: Legacy (2 terms) system; Modified/Added in the new (3 terms) system
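To make the tokenizer and gazetteer steps concrete, here is a small sketch that looks up domain, range and object property synonyms in a tokenized sentence and measures the gap between a domain and a range mention. It is written in Python for brevity rather than GATE/JAPE. The helper names (`tokenize`, `find_spans`, `min_distance`) are hypothetical, and defining distance as the number of tokens strictly between the two mentions is an assumption, chosen because it agrees with the "Device X does not support Device Y" example later in the deck.

```python
import re


def tokenize(text):
    """Lower-case word tokenizer (a stand-in for the real tokenizer)."""
    return re.findall(r"\w+", text.lower())


def find_spans(tokens, synonyms):
    """Return sorted (start, end) token spans, end exclusive, of any synonym."""
    spans = set()
    for synonym in synonyms:
        syn_tokens = tokenize(synonym)
        n = len(syn_tokens)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == syn_tokens:
                spans.add((i, i + n))
    return sorted(spans)


def min_distance(domain_spans, range_spans):
    """Smallest number of tokens between a domain and a range mention.

    Returns None when no non-overlapping pair exists (no co-occurrence).
    """
    best = None
    for d_start, d_end in domain_spans:
        for r_start, r_end in range_spans:
            if d_end <= r_start:
                gap = r_start - d_end
            elif r_end <= d_start:
                gap = d_start - r_end
            else:
                continue  # overlapping mentions are ignored
            if best is None or gap < best:
                best = gap
    return best


tokens = tokenize("Device X does not support Device Y")
domain_spans = find_spans(tokens, ["Device X"])
range_spans = find_spans(tokens, ["Device Y"])
property_spans = find_spans(tokens, ["support", "not support"])
distance = min_distance(domain_spans, range_spans)
```

In the legacy two-term setting only the domain and range spans would be needed; the property spans are what the three-term variant adds.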


Multiple Formats Score Generation

Technical documentation contains knowledge displayed in multiple formats, each requiring different processing subroutines:

• Table Processing
• Sentence Processing
• Other segments


Extensible Data Model

[Diagram] Hierarchy shown:
• Corpus
– Document (Doc ID)
– Document Segment
– Table Segment: Data Cell (ID, Content), Row Header (ID, Content), Column Header (ID, Content), Table Header (ID, Content)
– Text Segment: Sentence (ID, Content)

Options: Sections, Paragraphs, Bullet lists, Headings
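One way to read this data model is as a small set of nested record types. The sketch below is an assumption about the shape of the model, not the authors' schema; all class and field names (`Item`, `TextSegment`, `TableSegment`, `Document`, `Corpus`) are hypothetical, and the sample contents are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Item:
    """Anything that carries an ID and Content in the diagram."""
    id: str
    content: str


@dataclass
class TextSegment:
    sentence: Item


@dataclass
class TableSegment:
    data_cell: Item
    row_header: Item
    column_header: Item
    table_header: Item


Segment = Union[TextSegment, TableSegment]


@dataclass
class Document:
    doc_id: str
    segments: List[Segment] = field(default_factory=list)


@dataclass
class Corpus:
    documents: List[Document] = field(default_factory=list)

    def segments_of(self, kind):
        """All segments of a given type across the corpus."""
        return [s for d in self.documents for s in d.segments if isinstance(s, kind)]


doc = Document("doc-1", [
    TextSegment(Item("s1", "The chassis has two power supplies.")),
    TableSegment(
        data_cell=Item("c1", "2"),
        row_header=Item("r1", "Power supplies"),
        column_header=Item("h1", "Quantity"),
        table_header=Item("t1", "Chassis components"),
    ),
])
corpus = Corpus([doc])
```

New segment kinds such as sections, paragraphs or bullet lists could be added as further classes in the `Segment` union, which is presumably what "extensible" refers to.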


A-Box Prop. Population

[Diagram] Counts shown:
• Ontology processing: T-Box Obj. Properties (102)
• A-Box property candidates list: A-Box Obj. Properties (399)
• Text Mining (corpus, Gazetteer List):
– Properties with occurrence of domain or range Individuals (256)
– Properties with co-occurrence of domain and range Individuals (143)


A-Box scoring

[Diagram] Evidences for A-box Obj. Property candidates. For the Current A-box Object Property Candidate, evidence is collected in two groups:
• Evidences for Current A-box (co-occurrence of Domain and Range)
• Evidences for Current A-box (occurrence of Domain or Range)

Each group consists of Text Segments (Sentence: ID, Content) and Table Segments (Data Cell, Row Header, Column Header, Table Header; each with ID, Content).


Table Segments: Primary Scoring

[Diagram] A-Box scoring: the Current A-box Object Property Candidate (Domain, Property, Range) shown against a Table Segment with Data Cell, Row Header, Column Header and Table Header (each with ID, Content).


Table Segments: Secondary Scoring

[Diagram] A-Box scoring: the Current A-box Object Property Candidate (Domain, Property, Range) shown against a Table Segment with Data Cell, Row Header, Column Header and Table Header (each with ID, Content).


Sentence Scoring

• A-box Object property Score for a sentence: SentenceScore = 1/(distance+1) + Bonus

• Integrated Object property Score over all related sentences: IntegratedScore = SUM(SentenceScore)

• Sum the Integrated Score with the Table Scores

• Normalized Object property Score: NormalizedScore = IntegratedScore/Norm
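The integration and normalization steps can be sketched as follows. The slide does not say whether table scores are added before or after normalization; adding them before dividing by Norm is an assumption here, as are the function names and the example numbers.

```python
from typing import Iterable


def integrated_score(sentence_scores: Iterable[float]) -> float:
    """IntegratedScore = SUM(SentenceScore) over all related sentences."""
    return sum(sentence_scores)


def normalized_score(sentence_scores, table_scores, norm: float) -> float:
    """Add table scores to the integrated score, then divide by Norm.

    Adding the table scores before normalization is an assumption.
    """
    if norm <= 0:
        raise ValueError("norm must be positive")
    total = integrated_score(sentence_scores) + sum(table_scores)
    return total / norm


# Hypothetical inputs: two sentence scores, one table score, Norm = 2.0
example = normalized_score([0.2, 10.2], [1.0], norm=2.0)
```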


Sentence scoring: Score = 1/(distance+1) + Bonus

Legend: D = Domain Synonym, R = Range Synonym, P = Object Property Synonym; numbers are other tokens

• Type 1: D R. Distance: 1000, Bonus = 0, Score = 1/(1000+1)+0 = 0.000999
• Type 2: D 1 2 3 4 R. Distance: 4, Bonus = 0, Score = 1/(4+1)+0 = 0.2
• Type 3: D 1 2 3 4 R 6 P. Distance: 6, Bonus = 3, Score = 1/(6+1)+3 = 3.14
• Type 4: D 1 2 P 4 R. Distance: 4, Bonus = 10, Score = 1/(4+1)+10 = 10.2
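The four worked scores on this slide follow directly from the formula and can be checked with a few lines of Python. The function name `sentence_score` is hypothetical; the distance and bonus values are the ones given above.

```python
def sentence_score(distance: float, bonus: float = 0.0) -> float:
    """Score = 1/(distance+1) + Bonus."""
    return 1.0 / (distance + 1.0) + bonus


# (distance, bonus) pairs taken from the slide
examples = [(1000, 0), (4, 0), (6, 3), (4, 10)]
scores = [sentence_score(d, b) for d, b in examples]
```

Because the distance term never exceeds 1, any non-zero bonus of 3 or 10 dominates the score, so the bonus effectively ranks sentences with a property synonym above those without one.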


Example Sentence Type 1

Pattern: D R

Distance: 1000, Bonus = 0, Score = 1/(1000+1)+0 = 0.000999

sentence before cleaning: ["<Paragraph></Action> <Figure Numbered="Unnumbered" Position="Inline" TextSize="medium" Width="column" frame="all" id="DLM-11334063" xml:lang="en"><image border-style="none" border-width="medium" xml:lang="en" href="ERGNN46205-301Loosening_screws_on_the_SDM_FW4_8010co_chassis33b.png"/></Figure></Step><Step xml:lang="en"><Action><Paragraph xml:lang="en">Rotate the insert/extractlevers to eject the 8660 SDM from the chassis.] Final Score=9.99000999000999E-4 Best Bonus=0.0 Final Distance=1000.0

Telecommunications_Chassis:8010co_Chassis:hasChassis_Shipping_Accessories:Telecommunications_Chassis_Screws:Screws

Property Synonyms:
• need
• have
• require
• has

Domain Synonyms:
• 8010co chassis
• 8010co Chassis
• 8010 CO chassis
• 8010co
• 8010CO chassis

Range Synonyms:
• Screws
• screws


Example Sentence Type 2

sentence after cleaning: In a chassis that includes two power supplies in a non redundant power configuration, you must start both restrictions dual power supplies power supply units within 2 seconds of each other.

Final Score=0.05 Best Bonus=0.0 Final Distance=19

Telecommunications_Chassis:Chassis:hasChassis_Components:Telecommunications_Chassis_Power_Supply:Power_Supply

Property Synonyms:
• have
• has

Domain Synonyms:
• chassis
• switch chassis
• 8000 series
• Chassis
• CO chassis

Range Synonyms:
• Power Supply
• transformer
• power supply
• power module
• Power supply

Pattern (Type 2): D 1 2 3 4 R


Example Sentence Type 4

sentence after cleaning: In a chassis that includes two power supplies in a non redundant power configuration, you must start both restrictions dual power supplies power supply units within 2 seconds of each other.

Final Score=10.05 Best Bonus=10.0 Final Distance=19

Telecommunications_Chassis_Power_Supply:Power_Supply:isPart_of_Chassis:Telecommunications_Chassis:Chassis

Property Synonyms:
• used in
• include

Domain Synonyms:
• Power Supply
• transformer
• power supply
• power module
• Power supply

Range Synonyms:
• chassis
• switch chassis
• 8000 series
• Chassis
• CO chassis

Pattern (Type 4): D 1 2 P 4 R


Bonus Calculation

Bonus = Bonus Constant * Number of tokens in property

[Diagram: two token patterns with D, R and P, in which the property synonym P spans two tokens and one token respectively]

• Distance: 6, Bonus Constant = 10, Tokens in Property = 2, Score = 1/(6+1)+2*10 = 20.14
• Distance: 6, Bonus Constant = 10, Tokens in Property = 1, Score = 1/(6+1)+1*10 = 10.14

Sentence Example: Device X does not support Device Y

Object Property, Tokens Number, Obtained Score:
• Support, 1, 1/(3+1)+1*10 = 10.25
• Not Support, 2, 1/(3+1)+2*10 = 20.25 (marked V on the slide)
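The effect of the token-count bonus on the "Device X does not support Device Y" example can be reproduced with the sketch below. The bonus constant of 10 and the distance of 3 come from the slide; the function names and the rule of picking the highest-scoring property are assumptions made for illustration.

```python
BONUS_CONSTANT = 10.0  # value used in the slide's examples


def bonus(property_synonym: str, bonus_constant: float = BONUS_CONSTANT) -> float:
    """Bonus = Bonus Constant * Number of tokens in property."""
    return bonus_constant * len(property_synonym.split())


def score_properties(distance: float, matched_synonyms):
    """Score each matched property synonym and return (best synonym, all scores)."""
    scored = {s: 1.0 / (distance + 1.0) + bonus(s) for s in matched_synonyms}
    best = max(scored, key=scored.get)
    return best, scored


# Both "support" and "not support" match the example sentence; distance is 3.
best, scored = score_properties(3, ["support", "not support"])
```

Weighting by token count lets the longer, more specific match ("not support") outrank the shorter one ("support") that it contains, which matters here because the two have opposite meanings.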


Normalization

• Norm coefficient for A-box object property:

Norm = Log(1.0 + (NSD + 1.0/Cd) * (NSR + 1.0/Cr))

NSD – Number of sentences in which the Domain occurred
Cd – Domain Synonyms List cardinality
NSR – Number of sentences in which the Range occurred
Cr – Range Synonyms List cardinality
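A direct transcription of the norm coefficient, as a sketch. The slide writes only "Log"; the natural logarithm is assumed here (it is what Java's `Math.log` computes, and Java is the stated implementation language). The function name and the sample counts are hypothetical.

```python
import math


def norm_coefficient(nsd: int, cd: int, nsr: int, cr: int) -> float:
    """Norm = log(1.0 + (NSD + 1.0/Cd) * (NSR + 1.0/Cr)), natural log assumed.

    nsd: number of sentences in which the domain occurred
    cd:  cardinality of the domain synonyms list (must be > 0)
    nsr: number of sentences in which the range occurred
    cr:  cardinality of the range synonyms list (must be > 0)
    """
    if cd <= 0 or cr <= 0:
        raise ValueError("synonym list cardinalities must be positive")
    return math.log(1.0 + (nsd + 1.0 / cd) * (nsr + 1.0 / cr))


# Hypothetical counts: domain seen in 3 sentences (5 synonyms),
# range seen in 4 sentences (2 synonyms).
norm = norm_coefficient(nsd=3, cd=5, nsr=4, cr=2)
```

Since the norm grows with how often the domain and range appear at all, dividing by it damps the scores of very frequent terms, which would otherwise accumulate co-occurrence evidence by chance.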


Gold Standard and Evaluation Framework

[Diagram] The ontology population pipeline extended with an evaluation loop. Components shown:
• Source Documents (XML); Preprocessing; Synonyms Lists
• Text Segments Processing: Text Segments Separation into Sentences, Tables, Bullet Lists
• Ontology unpopulated (OWL); Term List (Excel)
• Ontology Population: Named Entities, Single Relations, Multi Relations
• Populated Ontology; Using Ontology: Reasoning, Visualizing, Visual Queries, Connecting Resources
• A-Box Ontology; T-Box Ontology; Labels; Evaluation Report
• Prediction evaluation Framework: Populate Ontology, Evaluate predicted Properties / Update DB, Golden Standard Database, Import labels, Knowledge Engineer


Thresholds: Decision Boundary

All scores for each A-box property candidate are summed over the eligible sources of evidence for the A-box in question.

The threshold in use sets the trade-off: Recall vs. Precision.
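The trade-off can be illustrated with a toy decision rule: accept every candidate whose total score reaches the threshold, then compare against gold-standard labels. The candidate names, scores and labels below are invented for illustration only, and the function names are hypothetical.

```python
def predict(scores, threshold):
    """Accept every candidate whose summed score reaches the threshold."""
    return {name for name, score in scores.items() if score >= threshold}


def precision_recall(predicted, gold_positive):
    """Positive-class precision and recall; precision is 0.0 if nothing is predicted."""
    tp = len(predicted & gold_positive)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold_positive) if gold_positive else 0.0
    return precision, recall


# Invented scores and labels, for illustration only
scores = {"a": 20.1, "b": 10.2, "c": 0.2, "d": 0.5}
gold = {"a", "b", "c"}

strict = precision_recall(predict(scores, threshold=10.0), gold)
loose = precision_recall(predict(scores, threshold=0.1), gold)
```

With the strict threshold only "a" and "b" are accepted, so precision is perfect but "c" is missed; with the loose threshold every true candidate is found at the cost of also accepting "d".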


Results for Tables: Baseline result

Focus on Positive class Recall and Positive class Precision

Class of interest (Positive class) Recall =0.80 Precision=0.85


Results for Tables: Continued

Focus on Positive class Precision

Class of interest (Positive class) Recall =0.25 Precision=1.0


Results for Tables: Continued

Focus on Positive class Recall

Class of interest (Positive class) Recall =1.0 Precision=0.775


Results for Sentences

Focus on Positive class Precision

Class of interest (Positive class) Recall =0.14 Precision=1.0


Results for Sentences and Tables

Focus on Positive class Precision

Class of interest (Positive class) Recall =0.4 Precision=1.0

Synergetic effect of using Sentences and Tables (wrt Precision=1.0):
• Recall (sentences) = 0.14
• Recall (tables) = 0.25
• Recall (sentences & tables) = 0.4


Advantages

• Improve Quality of Knowledge Base
– Managing the argumentation process KB vs KE
– Iterative improvement of accuracy

• Tier 1 doing Tier 2 task (improve service)
– Tier 1 (high precision): KB query
– Tier 2 (high recall): knowledge integration
– Facilitate information processing without KE

• Reduce workload (saving)


Improve Quality of Knowledge Base

• Offline task by Knowledge Engineer

• Disambiguation
– Expert can pay special attention to any significant inconsistency between human and machine outputs, such as highly scored A-box candidates labeled as negatives

• Human Expert & Machine Committee vs. single human expert


Real Time Integration of New Evidence

• Online, by call centre worker, at knowledge use stage
– Extracting additional object properties from new documents for emergency case
– High Positive Precision focused scenario

• Offline, by Senior call centre worker, at knowledge use stage
– Extracting additional object properties from new documents for questions not answered online
– High Positive Recall focused scenario


Reduce Workload

• Online and Offline
• Automatically Extracted Evidence
• Ranked Solutions with an indicated level of confidence


Gold Standard Corpus and Evaluation Framework

[Diagram] The ontology population pipeline extended with an evaluation loop. Components shown:
• Source Documents (XML); Preprocessing; Synonyms Lists
• Text Segments Processing: Text Segments Separation into Sentences, Tables, Bullet Lists
• Ontology unpopulated (OWL); Term List (Excel)
• Ontology Population: Named Entities, Single Relations, Multi Relations
• Populated Ontology; Using Ontology: Reasoning, Visualizing, Visual Queries, Connecting Resources
• A-Box Ontology; T-Box Ontology; Labels; Evaluation Report
• Prediction evaluation Framework: Populate Ontology, Evaluate predicted Properties / Update DB, Golden Standard Database, Import labels, Knowledge Engineer


Future Work: Extend Literature Scheme

• Sections
• Paragraphs
• Bullet Lists
• Connect with Headings and Topics