Modular Ontology architecture for using human defined sets of concepts Presentation by OntologyStream Inc Paul Stephen Prueitt, PhD Ontology Tutorial 5, copyright, Paul S Prueitt 2005



The best example of an ontology is the set of positive integers

[Figure: the set of positive integers, linked to instances in the world where the concept of a counting number is essential: mathematical models of natural systems, the arrow of time, geographical positions, accounting, quantitative measurement.]

Integers are used without explicitly invoking the concepts that define them; we count two things, for example, without first consulting a definition of “two-ness”.

The existence of this set of concepts allows a great diversity of human activities.

The “ontology standard” is enforced by the correctness of the concepts and by the ease with which new applications can be found. The standard is ultra-stable and resilient because the concepts are correct. The standard is not owned by anyone.

Modular ontology is used to measure the properties of events with sets of concepts.

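Notation

Events e(i) arise from processes. The measurement of an event has a weakly structured part w(i) and a structured part s(i):

e(i) = w(i)/s(i)

[Figure: the set of events { e(i) } divided into weakly structured parts { w(i) }, handled by semantic extraction, and structured parts { s(i) }, handled by discrete analysis.]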
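A minimal sketch of the notation, assuming an event record can be split into free text, standing for the weakly structured part w(i), and a dictionary of named fields, standing for the structured part s(i). The class, function, and field names are hypothetical and do not come from any product.

```python
from dataclasses import dataclass, field


@dataclass
class EventMeasurement:
    """One measured event e(i) = w(i)/s(i); all names here are illustrative."""
    event_id: int
    w: str                                 # weakly structured part (free text)
    s: dict = field(default_factory=dict)  # structured part (named fields)


def split_record(event_id, record, text_fields):
    """Route free-text fields into w(i) and every other field into s(i)."""
    w = " ".join(str(record[k]) for k in sorted(text_fields) if k in record)
    s = {k: v for k, v in record.items() if k not in text_fields}
    return EventMeasurement(event_id, w, s)


# Hypothetical record built from the Nautilus transaction described later on.
record = {
    "description": "long-range dive boat embarking passengers in San Diego",
    "flag": "Canada",
    "length_ft": 116,
}
e1 = split_record(1, record, text_fields={"description"})
print(e1.w)  # candidate input for semantic extraction
print(e1.s)  # candidate input for discrete analysis
```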

Events occur in the real world as part of complex processes. Largely because events are seen as having patterns and structure, software engineers can build relational databases or XML repositories to help us understand and interact with information that is situation-specific.

With ontology, human communities will be able to reveal a set of concepts and define regular relationships between those concepts. We call this “Ontology mediation of information flow”. The formal representations of the concepts are used to organize data and to move data from one place to another. This still has to be demonstrated.

We will illustrate Ontology mediation of information flow using the development and use of Harmonized Tariff Schedule (HTS) Administrative Rulings. An HTS Administrative Ruling is a short public document that ties together a commodity description and the code used to determine duties on imported or exported commodities.

A second example treats Selectivity and Targeting reports as measurements of selectivity and targeting events by Customs and Border Protection.

[Figure: processes; semantic extraction]

A framework holds a higher level abstraction representing an analysis of how things follow each other.

Example: the event-Structure Ontology Framework (e-SOF) has 18 cells developed from the cross product of three dimensions:

{ past, present, future }; { people, places, things }; { how, why }

Example: the risk/gains Ontology Framework (rg-OF) has 40 cells developed from the cross product of three dimensions:

{ Risk, Gain }; { Anomaly, Trend }; { measurement, assessment, name, group, event, context, rule, policy, component, function/behavior }
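The cell counts follow from multiplying the sizes of the dimensions: 3 × 3 × 2 = 18 for the e-SOF and 2 × 2 × 10 = 40 for the rg-OF, taking the ten concepts of its third dimension as listed on the later gain/risk slide. A short sketch that enumerates the cells, with the dimension lists copied from the text:

```python
from itertools import product

# Dimensions copied from the two frameworks described above.
E_SOF = [
    ["past", "present", "future"],
    ["people", "places", "things"],
    ["how", "why"],
]
RG_OF = [
    ["Risk", "Gain"],
    ["Anomaly", "Trend"],
    ["measurement", "assessment", "name", "group", "event", "context",
     "rule", "policy", "component", "function/behavior"],
]


def cells(dimensions):
    """Return every cell of a framework: the cross product of its dimensions."""
    return list(product(*dimensions))


print(len(cells(E_SOF)))  # 18
print(len(cells(RG_OF)))  # 40
print(cells(E_SOF)[0])    # ('past', 'people', 'how')
```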

Ontology Framework

[Figure: processes; explicit ontology such as OWL DL]

By aligning the internal (implicit) set of concepts in a semantic extraction computation with the explicit form of concept representation provided by the OWL DL standard, one is able to organize information expressed as concepts in free-form text.

One is able to use lookup tables, lists, controlled vocabularies and taxonomies to expand the statement of these conceptual expressions so that each expression is as clear, complete and consistent as possible.

One is able to move the information from a single event into a computational space where specific structure is available to bring relevant information to the report development process.

One is able to create, after the fact, a better report about an event, such as an administrative ruling or a selectivity and targeting action.

One is able to develop long-term trend and analogy detection using specific information about how things are related to each other in the real world.

Ontology Framework

A modular ontology management infrastructure provides various services in the context of field reporting over transactions

[Figure: modular ontology management infrastructure. Labels: upper level ontologies; “other” upper level ontology; Law governing US Customs; Advanced Trade Data; Economic Supply Chain Data; Findings ontology; Entities ontology; Gain/Risk ontology; sources of data; Location ontology; later application areas; HTS Ontology; written reports; Structural Event Ontology Framework.]

In our work, human knowledge is captured separately in two computable forms: implicit (the semantic extraction ontology) and explicit (an OWL DL ontology).

Gain / Risk Ontology Framework

{ who, where, what, how, why } × { past, present, future }

Structural Event Ontology Framework

The six classical interrogatives, in use since Greek times, are partitioned into three parts: { people, places, things } + { event structure with causality } + time.

{ people, places, things } event structure

18 questions from frames:

(past, who, how),
(past, who, why),
(present, who, how),
(present, who, why),
(future, who, how),
(future, who, why),
etc.

event Structure Ontology Framework (e-SOF) **

** e-SOF was “discovered” by Dr. Paul S. Prueitt while thinking about a US Customs ontology prototype in March 2005
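One way to make the partition concrete, assuming that “when” is the interrogative absorbed into time, which is consistent with the { who, where, what, how, why } × { past, present, future } slide. The sketch generates the 18 frames in the same order as the list of questions above; the variable names are illustrative.

```python
from itertools import product

# Partition of the six classical interrogatives, as read from this slide:
# who/where/what -> {people, places, things}; how/why -> event structure
# with causality; when -> time {past, present, future}.
PARTITION = {
    "who": "people",
    "where": "places",
    "what": "things",
    "how": "event structure",
    "why": "event structure",
    "when": "time",
}
TIME = ["past", "present", "future"]
SUBJECTS = [q for q, part in PARTITION.items() if part in ("people", "places", "things")]
CAUSAL = [q for q, part in PARTITION.items() if part == "event structure"]


def frames():
    """Yield (time, subject, causal) frames in the order used in the list above."""
    for x in SUBJECTS:
        for y, z in product(TIME, CAUSAL):
            yield (y, x, z)


questions = list(frames())
print(len(questions))  # 18
print(questions[:2])   # [('past', 'who', 'how'), ('past', 'who', 'why')]
```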

Ontology Framework

Ontology Reasoner

Scoped Ontology Individuals

Knowledge Management visualization

Knowledge Engineer visualization

By internally adjusting the rules within any one of the commercially available semantic extraction (implicit) ontologies, we measure text, or the structured data in a single record, using a three-element frame

( y, x, z)

where x is from the set { people, places, things }, y is from the set { past, present, future }, and z is from the set { how, why }.

There are 3 × 3 × 2 = 18 of these three-element frames, each of which can be seen to ask a question. The measurement uses linguistic and structural knowledge to answer those questions that can be answered; those that cannot be answered are left blank.
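A sketch of a measurement result under these rules: a table with one slot per frame, where unanswered questions stay blank (None). The answers below are hand-assigned for illustration; no extraction tool is being called, and the function names are hypothetical.

```python
from itertools import product

TIME = ["past", "present", "future"]
ENTITY = ["people", "places", "things"]
CAUSAL = ["how", "why"]


def empty_frame_table():
    """One slot per (y, x, z) frame; None marks a question left blank."""
    return {(y, x, z): None for y, x, z in product(TIME, ENTITY, CAUSAL)}


def fill(table, answers):
    """Record answers for the frames a measurement was able to answer.

    `answers` maps (y, x, z) to extracted text; producing those answers is the
    job of a semantic extraction tool and is not modeled here.
    """
    for frame, text in answers.items():
        if frame not in table:
            raise KeyError(f"not an e-SOF frame: {frame}")
        table[frame] = text
    return table


# Hand-assigned answers, for illustration only.
table = fill(empty_frame_table(), {
    ("present", "people", "how"): "Nautilus owns and operates the vessel",
    ("future", "places", "why"): "three days of diving in Mexican waters",
})
blank = [frame for frame, text in table.items() if text is None]
print(len(table), len(blank))  # 18 16
```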

Other semantic extraction tools can be similarly manipulated to produce an alignment between their internal ontology (seldom OWL) and an external OWL DL ontology (which is our standard).

Ontology Framework with Differential Ontology Expressions

[Figure labels: High Risk Ontology Expression (Bio-systems, Weapon-systems); Commodity history analysis; Entry Reports and Findings; { concepts }; Shipping manifests; Entity histories; arrows labeled “informs” and “aligns”.]

Ontology expression about the risks measured from historical analysis of commodities

US Customs cultural viewpoints expressed as sets of concepts

Rapid knowledge acquisition and reporting about a transaction

[Figure labels: High Risk Ontology Expression (Bio-systems, Weapon-systems); Commodity history analysis; Entry Reports and Findings; { concepts }.]

Ontology expression about the risks measured from historical analysis of commodities

US Customs cultural viewpoints expressed as sets of concepts

A transaction: Nautilus Explorer (“Nautilus”) owns and operates the M/V NAUTILUS EXPLORER, a 116-foot Canadian-flagged long-range dive boat. Nautilus would like to embark passengers in San Diego, California, on two separate occasions, for three days of diving in Mexican waters before returning to San Diego. The passengers would be embarked and disembarked at the same location in San Diego.

Semi-automated generation of Reports

We take the first two dimensions of a framework to be

{ Anomaly, Trend } union { Gain, Risk }

and the other dimension to be

{ measurement, assessment, name, group, event, context, rule, policy, component, function/behavior }

Then, in the cross product, we have four sets of ten concepts. In fact, the ten concepts are five sets of two concepts, each pair having an interesting “oppositional scale type” relationship.


** This Gain/Risk Ontology Framework was “discovered” by Dr Prueitt in March 2005 while thinking about possible US Customs Selectivity and Targeting enhancements. Dr Peter Stephenson and Dr Prueitt are extending this in the context of Cyber Security ontology mediation data analysis.

gain/risk Ontology Framework (rg-OF) **
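The arithmetic behind “four sets of ten concepts” can be checked in a few lines. The sketch forms the four sets as combinations of { Anomaly, Trend } with { Gain, Risk }, consistent with the 40-cell count given for the rg-OF, and it assumes, since the slides do not spell out the pairing, that each oppositional pair consists of two adjacent concepts in the list.

```python
from itertools import product

CONCEPTS = ["measurement", "assessment", "name", "group", "event", "context",
            "rule", "policy", "component", "function/behavior"]

# Assumption: each oppositional pair is two adjacent concepts in the list.
PAIRS = list(zip(CONCEPTS[0::2], CONCEPTS[1::2]))

# Four sets of ten concepts, one per combination of the first two dimensions.
SETS = {quadrant: list(CONCEPTS)
        for quadrant in product(["Anomaly", "Trend"], ["Gain", "Risk"])}

print(len(PAIRS), len(SETS))               # 5 4
print(sum(len(c) for c in SETS.values()))  # 40
print(PAIRS[0])                            # ('measurement', 'assessment')
```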

Semantic Extraction

Link Analysis

Pattern recognition

Ontology Tools

Statistics

Advanced Trade Data

Harmonized Tariff Schedule

Detailed work with tools over available data

Practical problem: provide the three Cs (clarity, consistency, and completeness) in EACH judicial review of a commodity passing across national borders.

Integrated collection of reified ontologies with some specific inferences and some information organization and retrieval

Possible deployment as a U.S. Customs Total Information Awareness (TIA) capability

Data Transfer Object

(SOI) Scoped Ontology Individual

Transactions

Findings

Entry

Entry Summary

Script

SOI pushes information

Portal pulls information

databases

Script pulls information

Ontology Individuals have a subsumption relationship to upper abstract ontologies
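A toy illustration of subsumption, assuming a single-inheritance concept tree whose names are borrowed loosely from the labels on this slide. An OWL DL reasoner derives subsumption from class axioms; this sketch only walks parent links, so it is a simplification, not a reasoner.

```python
# Hypothetical concept tree (child -> parent); None marks the top concept.
PARENT = {
    "Thing": None,
    "Event": "Thing",
    "Transaction": "Event",
    "Entry": "Transaction",
    "EntrySummary": "Transaction",
    "Finding": "Event",
}


def subsumed_by(concept, ancestor):
    """True if `ancestor` subsumes `concept`, found by walking parent links."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT[concept]
    return False


# An individual typed as an Entry is thereby also a Transaction, Event and Thing.
print(subsumed_by("Entry", "Event"))          # True
print(subsumed_by("Finding", "Transaction"))  # False
```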

Ontology Framework

Ontology Reasoner

Scoped Ontology Individuals

Human machine interface

Knowledge Management visualization

Knowledge Engineer visualization

client visualization

An event

SOI design by-passes the critical “visualization” choke point

Scoped Ontology Individuals

Human machine interface

[Figure: a stack of five SOIs]

Stack of SOIs supporting analysis of analysis

Ontology Framework

Ontology reasoning

The mental event is the model for the Scoped Ontology Individual (SOI). The SOI is a minimal formal ontology (defined in OWL DL) that binds together the concepts and data about a single event.

The Framework’s small number of concepts organizes everything that is known about the data elements that occur in a Harmonized Tariff Schedule administrative ruling.

Once the data elements have been used as the initial conditions for SOI formation, additional SQL queries, or additional ontology subsetting, may be performed to bring new information, or information that was not initially known, “into the visualized frame”.
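A minimal sketch of the binding idea, assuming an SOI can be approximated in memory as a set of concepts plus the data attached to each concept for one event. The source defines SOIs in OWL DL; the class, method, and identifier names here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ScopedOntologyIndividual:
    """Hypothetical in-memory stand-in for an SOI (the source defines SOIs in OWL DL)."""
    event_id: str
    concepts: set = field(default_factory=set)
    data: dict = field(default_factory=dict)

    def bring_into_frame(self, concept, facts):
        """Bind a concept, and the facts a later query found for it, to this event."""
        self.concepts.add(concept)
        self.data.setdefault(concept, {}).update(facts)


# Initial conditions: data elements taken from a single (hypothetical) ruling.
soi = ScopedOntologyIndividual(
    "ruling-001",
    concepts={"Commodity"},
    data={"Commodity": {"description": "long-range dive boat"}},
)
# A later SQL query or ontology subsetting step adds information.
soi.bring_into_frame("Location", {"port": "San Diego"})
print(sorted(soi.concepts))  # ['Commodity', 'Location']
```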

Scoped Ontology Individuals

Human machine interface

[Figure: a stack of five SOIs]

Stack of SOIs supporting analysis of analysis

Ontology Framework

Ontology reasoning

Visualization of ontology: the concept of a Scoped Ontology Individual (SOI) opens up a visualization paradigm that has not been presented before (it is an original concept based on decades of work in cognitive neuroscience).

SOI design by-passes the critical “visualization” choke point that occurs when Ontology Systems are built on the relational data base model (as is done in our ontology augmentation of rule engines).

This by-pass is created when the data elements in a report are used to subset upper ontologies and domain ontologies, producing the minimal set of “concepts” needed to frame the data.

If a Framework Ontology is being used, then this subsetting process has an expansion/contraction cycle that produces very small SOI objects (see previous slide).
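The expansion/contraction cycle is not specified in detail here, so the following is one possible reading, labeled as an assumption: expansion pulls in every concept triggered by a data element plus all of its ancestors, and contraction keeps only the most specific concepts, since the ancestors are implied by subsumption. The ontology, trigger terms, and function names are hypothetical.

```python
# Hypothetical upper/domain ontology: concept -> (parent, trigger terms).
ONTOLOGY = {
    "Thing": (None, set()),
    "Place": ("Thing", set()),
    "Port": ("Place", {"san diego"}),
    "Vessel": ("Thing", {"boat", "vessel"}),
    "DiveBoat": ("Vessel", {"dive boat"}),
    "Aircraft": ("Thing", {"aircraft"}),
}


def ancestors(concept):
    """All concepts above `concept` in the hierarchy."""
    found = set()
    parent = ONTOLOGY[concept][0]
    while parent is not None:
        found.add(parent)
        parent = ONTOLOGY[parent][0]
    return found


def expand(data_elements):
    """Expansion: concepts whose trigger terms occur in the data, plus their ancestors."""
    text = " ".join(data_elements).lower()
    hits = {c for c, (_, terms) in ONTOLOGY.items() if any(t in text for t in terms)}
    return hits.union(*(ancestors(c) for c in hits))


def contract(concepts):
    """Contraction: keep only the most specific concepts (drop implied ancestors)."""
    implied = set().union(*(ancestors(c) for c in concepts))
    return concepts - implied


expanded = expand(["long-range dive boat", "San Diego"])
print(sorted(expanded))            # ['DiveBoat', 'Place', 'Port', 'Thing', 'Vessel']
print(sorted(contract(expanded)))  # ['DiveBoat', 'Port']
```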

Ontology Framework

Ontology Reasoner

Scoped Ontology Individuals

Knowledge Management visualization

Readware

MITi Inc and InOrb Technologies have teamed to develop a demonstration capability that uses the Readware internal ontology API to create text elements that populate the 18 cells of the e-SOF.

We use the triple:

( y, x, z)

where x is from the set { people, places, things }, y is from the set { past, present, future }, and z is from the set { how, why }

This involves three steps:

1) Coding eight probes that use the internal Readware stem-based text understanding computations to find information and classify this information as answers to people, places, things, past, present, future, how or why questions.

2) There are some options, but the one we are investigating first is to run the People, Places and Things probes before the others. This is the well-known “named entity extraction” approach.

3) When one of these three probes “finds” something, the local neighborhood (in the Readware stem structure) is examined to see whether one or more of the 18 questions can be answered.
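The three steps can be sketched with plain keyword probes in place of Readware's stem-based computations, whose API is not shown here. The probe word lists, the function names, and the choice of the containing sentence as the “local neighborhood” are all assumptions made for illustration; the input is a shortened version of the Nautilus transaction quoted earlier.

```python
import re

# Hypothetical probe vocabularies. They stand in for Readware's stem-based
# probes, whose internals and API are not reproduced here.
ENTITY_PROBES = {
    "people": {"nautilus", "passengers"},
    "places": {"san diego", "california", "mexican waters"},
    "things": {"boat", "vessel"},
}
CONTEXT_PROBES = {
    "past": {"was", "had", "previously"},
    "present": {"owns", "is"},
    "future": {"would", "will"},
    "how": {"operates", "embark"},
    "why": {"diving", "because"},
}


def fires(terms, text):
    """True if any probe term occurs in `text` as a whole word or phrase."""
    return any(re.search(r"\b" + re.escape(t) + r"\b", text) for t in terms)


def measure(text):
    """Step 2: run the entity probes first; step 3: examine each hit's neighborhood.

    The "neighborhood" is approximated here as the sentence containing the hit.
    """
    answers = {}
    for sentence in re.split(r"(?<=[.!?])\s+", text.lower()):
        for x, terms in ENTITY_PROBES.items():
            if not fires(terms, sentence):
                continue
            for y in ("past", "present", "future"):
                for z in ("how", "why"):
                    if fires(CONTEXT_PROBES[y], sentence) and fires(CONTEXT_PROBES[z], sentence):
                        answers.setdefault((y, x, z), []).append(sentence)
    return answers


transaction = (
    "Nautilus owns and operates the M/V NAUTILUS EXPLORER, a 116-foot "
    "Canadian-flagged long-range dive boat. Nautilus would like to embark "
    "passengers in San Diego, California, for three days of diving in "
    "Mexican waters."
)
answers = measure(transaction)
print(sorted(answers))
```

On this input the sketch fills six of the 18 frames, for example (present, things, how) from the first sentence and (future, places, why) from the second; the remaining frames stay blank.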

Customs analyst

The other choke point is dependency on a relational database

Data Transfer Object

(SOI) Scoped Ontology Individual

Transactions

Findings

Entry

Entry Summary

Script

databases

Script pulls information

ILOG Rulebase Reasoner

An event

Ontology Augmentation of a rule based engine

For complex reasons, demonstrations of how to use ontology have often used a fixed data set with doctored data, as if scalability issues had been solved or were not relevant. These demonstrations fall far short of correctness and hide specific known weaknesses of classical IT architecture.

The scalability issue comes from the need to extend ontologies or XML, that is, to add, delete or modify concepts. These extension requirements come from many different origins and communities of practice, and arise as circumstances change. Extensibility is the key contribution that XML has brought.

For example, without a common data encoding paradigm, the scalability issue creates a second choke point, because a relational database must have a fixed data schema. Work on such a solution is under way in the XML MetaData Repository standards process:

http://hpcrd.lbl.gov/SDM/XMDR/arch/

XMDR, RDF or OWL DL may, or may not, solve this problem. Modular ontology helps, but the principles developed in differential ontology, formative ontology and Framework Ontology seem essential to solving the whole problem as completely as possible. With these approaches, we find by-passes to technology problems that the XMDR standards committee currently sees as unsolvable. The definition of an event-specific Scoped Ontology Individual is one of those by-passes.
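A small illustration of the fixed-schema problem, using Python's built-in sqlite3 module: adding a new kind of fact to a relational table fails until the schema changes, while a generic (subject, predicate, object) triple layout, the pattern RDF is built on, absorbs it as another row. This sketch is not taken from the XMDR work; the table names, property names, and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Fixed relational schema: a new kind of fact needs a schema change first.
conn.execute("CREATE TABLE entry (id TEXT PRIMARY KEY, hts_code TEXT)")
conn.execute("INSERT INTO entry VALUES ('E1', 'hypothetical-code')")
schema_blocked = False
try:
    conn.execute("UPDATE entry SET risk_level = 'high' WHERE id = 'E1'")
except sqlite3.OperationalError:
    schema_blocked = True  # the column does not exist until ALTER TABLE runs

# Triple layout (subject, predicate, object): new properties are just new rows.
conn.execute("CREATE TABLE triple (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [("E1", "hts_code", "hypothetical-code"),
     ("E1", "risk_level", "high")],  # added with no schema change
)
rows = conn.execute(
    "SELECT predicate, object FROM triple WHERE subject = 'E1' ORDER BY predicate"
).fetchall()
print(schema_blocked, rows)
```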

On the relational database dependency

There are some existing software products (Convera, AeroText, MITi, Semagix, Autonomy and others) where a common data encoding solution exists.

• A data encoding solution is generally protected by patents and is used to provide computational efficiency. One of the best examples is PriMentia's Hilbert engine, where a key-less hash table type of data encoding allows contextual search in the most natural fashion. Autonomy also has the technology that Michael Lynch developed in the Autonomy spin-off N-Corp. Semagix, Applied Technical Systems, and 15 or 20 others have excellent data encoding solutions.

• If a government agency selected the two or three best technologies, communication between their internal representations would be required. This may or may not be easy, depending on the specific technologies.

In summary: these software products integrate classically understood methods using a common data encoding. Each COTS product uses a different internal data representation, so using more than one COTS product will create binding issues.

A modular ontology management architecture can be used to integrate technologies like:

• semantic extraction and related knowledge discovery in data technology (implicit ontology)

• ontology development and editing (explicit ontology)

• advanced algorithms related to risk definition and decision support

• visualization technology
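A sketch of the binding step, assuming each product's output is mapped through its own table into one shared set of concepts (here the e-SOF's people, places, things). The vendor output formats and labels are invented for illustration and do not reflect any named product's actual API; mapping organizations to “people” is likewise an assumption.

```python
# Hypothetical outputs from two extraction products with different internal labels.
VENDOR_A = [{"type": "ORG", "text": "Nautilus"}, {"type": "GPE", "text": "San Diego"}]
VENDOR_B = [("Location", "San Diego"), ("Vehicle", "M/V NAUTILUS EXPLORER")]

# One mapping table per product into a shared set of concepts (assumed mappings).
A_TO_SHARED = {"PERSON": "people", "ORG": "people", "GPE": "places"}
B_TO_SHARED = {"Organization": "people", "Location": "places", "Vehicle": "things"}


def bind_a(results):
    """Translate product A's records into (shared concept, text) pairs."""
    return {(A_TO_SHARED[r["type"]], r["text"]) for r in results if r["type"] in A_TO_SHARED}


def bind_b(results):
    """Translate product B's (label, text) tuples into (shared concept, text) pairs."""
    return {(B_TO_SHARED[label], text) for label, text in results if label in B_TO_SHARED}


merged = bind_a(VENDOR_A) | bind_b(VENDOR_B)  # duplicates collapse in the shared layer
print(sorted(merged))
```

Because both products report San Diego, the shared layer holds it once; adding a third product would mean writing one more mapping table rather than pairwise translations.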

So government agencies really have two solution paths:

1) Choose one or two vendors, after actually understanding what each vendor provides, and create a complete solution with that tool set. This requires an integration architecture.

2) Learn from a Trade Study process which methods make COTS semantic extraction work, work around the patents and other IP, and develop a unique application that is specific to that government agency.

In either case, the greater challenge is the technology transition challenge.

If the technology is not a LOT better than the current beta sites and doctored demonstrations, then the transition effort will fail.

But, leaving transition issues aside, let us look more closely at these two options.

High level view of integration architecture

So we have two solution paths:

1) Choose one vendor, after actually understanding what each vendor provides, and create a complete solution with that tool set. But how does one select?

CoreSystem CoreOntology first takes on the underlying stability issue by advancing a design-time iconic language that may revolutionize how society uses computers.

Current generation best-of-breed technology

The list of possible qualified candidates for offering a complete solution might include fewer than 20 companies. In many cases, these companies are highly capitalized and would provide stability for some period of time. However, the underlying XML and ontology standards are not stable.

One would expect that better global solutions will exist within five years. So one needs to know that the sets of concepts can be exported and transformed as the market matures.
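A sketch of what exportability can mean in practice, assuming a concept hierarchy is kept as concept-to-parent links: the set is written to plain JSON with the standard json module and read back unchanged, so a later tool can transform it. The function names and the JSON layout are hypothetical.

```python
import json


def export_concepts(parent):
    """Write a concept hierarchy (concept -> parent) as vendor-neutral JSON."""
    items = [{"id": c, "parent": p} for c, p in sorted(parent.items())]
    return json.dumps({"concepts": items}, indent=2)


def import_concepts(text):
    """Read the JSON back into a concept -> parent dictionary."""
    return {item["id"]: item["parent"] for item in json.loads(text)["concepts"]}


hierarchy = {"Thing": None, "Event": "Thing", "Finding": "Event"}
dumped = export_concepts(hierarchy)
print(import_concepts(dumped) == hierarchy)  # True
```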

Current generation may not solve all problems in an optimal fashion

Next generation tools are not yet ready to produce systems

So we have two solution paths:

2) Learn from a Trade Study process which methods make COTS semantic extraction work, work around the patents and other IP, and develop a unique application that is specific to Customs.

These two diagrams are from OntologyStream Inc.

There is no suggestion that this non-capitalized small company has the management skills required to build out an application specifically designed from the principles discussed by Prueitt and his colleagues. So we have sought support and guidance from SAIC or IBM to bring together a small team to develop a government-owned system based on these principles at the smallest possible cost.

Summary

• Current contractors almost always treat ontology and XML technology as if they were the same as relational database technology.

• Current contractors are gaming the contracts so that maximum Time and Materials resources can be expended.

• Ontology and XML standards committees struggle with the issues of private intellectual property and hidden agendas.

• Ontology visualization by users is required to find optimal solutions consistent with cultural expectations.

• Ontology and XML standards have not been able to address ontology visualization or the process models that place Ontology and XML into complex workflows.

• A single payer entity is needed to bind together the best technology and to resolve IP and philosophical differences.