september 2001

06/27/22 SKC Synopsis September 2001 Gio Wiederhold, Shrish Agarwal, Stefan Decker, Jan Janninck, Prasenjit Mitra, et al. Stanford University, CSD S K C Scalable Knowledge Composition



Page 1: September 2001

04/19/23 SKC Synopsis

September 2001

Gio Wiederhold, Shrish Agarwal, Stefan Decker, Jan Janninck, Prasenjit Mitra, et al.

Stanford University, CSD

S K C

Scalable Knowledge Composition

Page 2: September 2001


Data + Knowledge → Information

• Apply relevant Knowledge to relevant Data

Observations; Quality Filters

Analyses

• to obtain: Information for decision-making

Aggregation of instances

Selection

SKC focus: Composition of source information

Page 3: September 2001


Many sources, disciplines, people

Extraction of actionable information, so that future benefits can accrue,

requires broad-based knowledge

Areas make deep progress in isolation

Benefits are possible when
– solid results are available
– the results are Integrated or Composed

Broad base leads to heterogeneity and inconsistency of terminologies

Page 4: September 2001


Language differences inhibit integration

• An essential feature of science
– autonomy of fields
– differing granularity and scope of focus
– growth of fields requires new terms

• A feature of technological process
– standards require stability
– yesterday’s innovations are today’s infrastructure
– today’s innovations are tomorrow’s infrastructure

• Must be dealt with explicitly
– sharing, integration, and aggregation are essential
– large quantities of data require precision

Page 5: September 2001


Semantic Mismatches

Autonomous sources in all domains have

• Differing viewpoints (by source)
– differing terms for similar items: { lorry, truck }
– same terms for dissimilar items: trunk (luggage, car)
– differing coverage: vehicles (DMV, police, AIA)
– differing granularity: trucks (shipper, manuf.)
– different scope: student (museum fee, Stanford)
– different hierarchical structures: supplier vs. usage

• Hinders use of information from disjoint sources
– missed linkages: loss of information, opportunities
– irrelevant linkages: overload on user or application program

• Poor precision when merged

OK for web browsing, poor for business & science
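The two failure modes named above, missed linkages and irrelevant linkages, can be illustrated with a minimal Python sketch. The two term lists and their sense annotations are hypothetical, chosen only to mirror the { lorry, truck } and trunk examples; matching on identical spelling stands in for any merge that ignores semantics.

```python
# Hypothetical term lists; the sense strings are annotations for this example only.
british_fleet = {"lorry": "road freight vehicle", "trunk": "luggage"}
us_dealer = {"truck": "road freight vehicle", "trunk": "car storage compartment"}


def naive_links(a, b):
    """Link terms whose labels are spelled identically."""
    return sorted(set(a) & set(b))


def missed_links(a, b):
    """Pairs with the same sense but different labels (never linked by spelling)."""
    return sorted(
        (ta, tb)
        for ta, sa in a.items()
        for tb, sb in b.items()
        if sa == sb and ta != tb
    )


def irrelevant_links(a, b):
    """Identically spelled labels whose senses differ (wrongly linked)."""
    return [t for t in naive_links(a, b) if a[t] != b[t]]


print(naive_links(british_fleet, us_dealer))
print(missed_links(british_fleet, us_dealer))
print(irrelevant_links(british_fleet, us_dealer))
```

In this toy case the only link found by spelling is the wrong one (trunk), while the useful one (lorry, truck) is missed.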

Page 6: September 2001


Heterogeneity among Domains is natural

Interoperation creates mismatch
• Autonomy conflicts with consistency,
– Local Needs have Priority,
– Outside uses are a Byproduct

Heterogeneity must be addressed
• Platform and Operating Systems
• Data Representation and Access Conventions
• Metadata: Naming and Ontology
– needed to share data from distinct sources

Page 7: September 2001


Two Mismatch Solutions

1. A Single, Globally consistent Ontology ( Your Hope )
– wonderful for users and their programs
– too many interacting sources
– long time to achieve: 2 sources ( UAL, LH ), 3 (+ trucks), 4, … all ?
– costly maintenance, since all sources evolve
– no world-wide authority to dictate conformance

2. Domain-specific ontologies ( XML DTD assumption )
– Small, focused, cooperating groups
– high quality, some examples: arthritis, Shakespeare plays
– allows sharable, formal tools
– ongoing, local maintenance affecting users: annual updates
– poor interoperation, users still face inter-domain mismatches

Page 8: September 2001


Our approach (SKC project)

1. Define Terminology in a domain precisely
– Schemas, XML DTDs Ontologies

2. Develop methods to permit interoperation among differing domains (not integration)
– Articulation: support the limited interoperation needed to solve problems in an application domain
– Ontology Algebra: enable scalability to as many sources as are needed to support applications

3. Develop tools to support the methods
– Ontology matching

Page 9: September 2001


An Ontology Algebra

A knowledge-based algebra for ontologies

The Articulation Ontology (AO) consists of matching rules that link domain ontologies

• Intersection: create a subset ontology; keep sharable entries
• Union: create a joint ontology; merge entries
• Difference: create a distinct ontology; remove shared entries

The glue that holds the bricks together
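The three operations can be sketched in Python under strong simplifying assumptions: an ontology is reduced to a set of term names, a matching rule to a pair of terms, and the `1:` / `2:` prefixes are a hypothetical way of keeping unmatched terms from different sources apart. The airline-style terms are made up for illustration; this is not the SKC implementation.

```python
from typing import FrozenSet, Set, Tuple

Rule = Tuple[str, str]  # (term in ontology 1, term in ontology 2)


def intersection(o1: Set[str], o2: Set[str], rules: Set[Rule]) -> Set[Rule]:
    """Keep sharable entries: the rules whose two ends exist in the sources."""
    return {(a, b) for a, b in rules if a in o1 and b in o2}


def union(o1: Set[str], o2: Set[str], rules: Set[Rule]) -> Set[FrozenSet[str]]:
    """Merge entries: linked terms become one entry, all others stay separate."""
    linked = intersection(o1, o2, rules)
    merged = {frozenset({f"1:{a}", f"2:{b}"}) for a, b in linked}
    merged |= {frozenset({f"1:{t}"}) for t in o1 - {a for a, _ in linked}}
    merged |= {frozenset({f"2:{t}"}) for t in o2 - {b for _, b in linked}}
    return merged


def difference(o1: Set[str], o2: Set[str], rules: Set[Rule]) -> Set[str]:
    """Remove shared entries: what remains purely local to the first ontology."""
    return o1 - {a for a, _ in intersection(o1, o2, rules)}


# Hypothetical source ontologies and matching rules.
carrier_1 = {"flight", "fare", "crew"}
carrier_2 = {"flight_leg", "tariff", "hangar"}
rules = {("flight", "flight_leg"), ("fare", "tariff"), ("seat", "cabin")}

print(intersection(carrier_1, carrier_2, rules))
print(len(union(carrier_1, carrier_2, rules)))
print(difference(carrier_1, carrier_2, rules))
```

The rule ("seat", "cabin") is dropped because neither term occurs in the sources, which shows why the rules, not the spellings, decide what counts as shared.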

Page 10: September 2001


Sample Operation: INTERSECTION

Source Domain 1: Owned and maintained by Store

Result contains shared terms; articulates the two domains

Source Domain 2: Owned and maintained by Factory

Terms useful for purchasing

Page 11: September 2001


Sample Intersections

Shoe Store
• Shoes { . . . }
• Customers { . . . }
• Employees { . . . }

Shoe Factory
• Material inventory { . . . }
• Employees { . . . }
• Machinery { . . . }
• Processes { . . . }
• Shoes { . . . }

Anatomy { . . . }

Hardware

Department Store

Articulation ontology matching rules:
size = size
color = table(colcode)
style = style
foot = foot
Employees   Employees
Nail (toe, foot)   Nail (fastener)
. . .
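A matching rule such as color = table(colcode) implies that a rule may carry a conversion, not just an equality. The sketch below is one hedged reading: it assumes the rule maps a factory color code to a store color name, and the table contents, the record, and the function name `to_store_terms` are all hypothetical.

```python
# Hypothetical color-code table; the slide only names it "table(colcode)".
COLOR_TABLE = {"BK": "black", "BR": "brown"}

# Each rule derives one store attribute from a factory record (assumed direction).
RULES = {
    "size": lambda rec: rec["size"],                    # size = size
    "color": lambda rec: COLOR_TABLE[rec["colcode"]],   # color = table(colcode)
    "style": lambda rec: rec["style"],                  # style = style
}


def to_store_terms(factory_record):
    """Express a factory record using only the articulated store terms."""
    return {attr: rule(factory_record) for attr, rule in RULES.items()}


# Hypothetical factory record; "machine" has no rule, so it stays local.
factory_shoe = {"size": 9, "colcode": "BR", "style": "loafer", "machine": "M-17"}
print(to_store_terms(factory_shoe))
```

Attributes without a rule never cross the boundary, which matches the idea that the intersection holds only the terms useful to both sides.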

Page 12: September 2001


Within a Domain Terms have clear Meanings

• a domain will contain many objects

• the object configuration is consistent

• within a domain all terms are consistent &

• relationships among objects are consistent

• context is implicit

No committee is needed to forge compromises * within a domain

* Compromises hide valuable details

Domain Ontology

Page 13: September 2001


SKC grounded definition

• Ontology: a set of terms and their relationships
• Term: a reference to real-world and abstract objects
• Relationship: a named and typed set of links between objects
• Reference: a label that names objects
• Abstract object: a concept which refers to other objects
• Real-world object: an entity instance with a physical manifestation (or its representation in a factual database)
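The definitions above map fairly directly onto data types. The following Python sketch is one possible encoding, not the project's own: the field names (`record_id`, `refers_to`, `kind`, `links`) and the helper `is_grounded` are assumptions introduced here, and the helper assumes references between abstract objects contain no cycles.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class RealWorldObject:
    """Entity instance with a physical manifestation (or its database record)."""
    record_id: str


@dataclass(frozen=True)
class AbstractObject:
    """A concept which refers to other objects."""
    refers_to: tuple  # of RealWorldObject or AbstractObject


@dataclass(frozen=True)
class Term:
    label: str      # the Reference: a label that names objects
    objects: tuple  # the real-world and abstract objects referred to


@dataclass(frozen=True)
class Relationship:
    name: str
    kind: str       # the relationship's type
    links: tuple    # of (object, object) pairs


@dataclass
class Ontology:
    terms: List[Term] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)


def is_grounded(obj) -> bool:
    """True if obj reaches at least one real-world object (assumes no cycles)."""
    if isinstance(obj, RealWorldObject):
        return True
    return any(is_grounded(o) for o in obj.refers_to)


shoe = RealWorldObject("sku-1")
footwear = AbstractObject((shoe,))
apparel = AbstractObject((footwear,))
dangling = AbstractObject(())
onto = Ontology(terms=[Term("footwear", (footwear,))])
print(is_grounded(apparel), is_grounded(dangling))
```

A concept like `dangling`, which refers to nothing, is exactly the kind of term that could not be translated into code.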

Page 14: September 2001


Grounding enables implementation

• We use many abstract terms in our work
– Needed because we are dealing with many objects
– Human thinking is limited to short-term memory

• Someone must be able to translate them into code reliably
– Each abstract term must have a path to reality
– One must provide that path for students and coders

• Without a clear path that is not possible
– Not automatically at all: machines need specs
– Not reliably by human programmers: failures occur

• Without implementation there is no benefit

Page 15: September 2001


INTERSECTION support

Store Ontology

Articulation ontology

Matching rules that use terms from the 2 source domains

Factory Ontology

Terms useful for purchasing

Page 16: September 2001


Other Basic Operations

typically prior intersections

UNION: merging entire ontologies

DIFFERENCE: material fully under local control

Articulation ontology

Page 17: September 2001


Features of an algebra

The record of past operations can be kept and reused

(experience: 3 months → 1 week for Webster's annual update, 2 weeks for OED (6 x size) [Jannink:01])

Maintenance is enabled by using
1. remote, deep domain expertise
2. rapid recomposition for application domain

Expect also that
• Operations can be composed
• Operations can be rearranged
• Alternate arrangements can be evaluated
• Optimization is enabled
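Keeping the record of operations amounts to storing an expression rather than its result, so it can be replayed when a source changes. The Python sketch below makes that concrete under a deliberate simplification: ontologies are plain sets of already-aligned term names and the operators are ordinary set operations, so the rearrangement shown is valid for sets but is not claimed here for the rule-based SKC operators. All names and data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Set, Union


@dataclass(frozen=True)
class Source:
    name: str


@dataclass(frozen=True)
class Op:
    kind: str      # "union", "intersection" or "difference"
    left: "Expr"
    right: "Expr"


Expr = Union[Source, Op]


def evaluate(expr: Expr, sources: Dict[str, Set[str]]) -> Set[str]:
    """Replay a recorded expression against the current source contents."""
    if isinstance(expr, Source):
        return set(sources[expr.name])
    left = evaluate(expr.left, sources)
    right = evaluate(expr.right, sources)
    if expr.kind == "union":
        return left | right
    if expr.kind == "intersection":
        return left & right
    if expr.kind == "difference":
        return left - right
    raise ValueError(f"unknown operation: {expr.kind}")


A, B, C = Source("A"), Source("B"), Source("C")

# The recorded composition, and an alternate arrangement of the same operations.
recorded = Op("union", Op("intersection", A, B), Op("intersection", B, C))
rearranged = Op("intersection", B, Op("union", A, C))

v1 = {
    "A": {"nation", "capital"},
    "B": {"nation", "member", "year"},
    "C": {"member", "year", "seat"},
}
print(evaluate(recorded, v1) == evaluate(rearranged, v1))

# Source B is updated by its owner; the stored record is simply replayed.
v2 = dict(v1, B=v1["B"] | {"capital"})
print(sorted(evaluate(recorded, v2)))
```

Because two arrangements can give the same result at different cost, an optimizer could choose between them; the sketch only checks that they agree.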

Page 18: September 2001


Sample Processing in the DARPA HPKB challenge

What is the most recent year an OPEC member nation was on the UN security council (SC)?

– SKC resolves 3 Sources
• CIA Factbook ‘96 (nation)
• OPEC (members, dates)
• UN (SC members, years)

– SKC obtains the Correct Answer
• 1996 (Indonesia)

– Other groups obtained more, but factually wrong answers; they relied on one global source, the CIA factbook.

– Problems resolved by SKC
* Factbook, a secondary source, has out of date OPEC & UN SC lists
– Indonesia not listed
– Gabon (left OPEC 1994)
* different country names
– Gambia => The Gambia
* historical country names
– Yugoslavia
* UN lists future security council members
– Gabon 1999

needed ancillary data
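The query can be sketched as a small join over separately maintained sources. The rows below are limited to the facts stated on this slide; everything else is an assumption made for illustration: the evaluation year `AS_OF = 1998`, the absence of joining dates (membership is assumed to start before the period of interest), and all function names.

```python
# Toy rows limited to facts stated on the slide; None means "no exit recorded".
OPEC_LEFT = {"Indonesia": None, "Gabon": 1994}   # nation -> year it left OPEC
UN_SC = [("Indonesia", 1996), ("Gabon", 1999)]   # (nation, year on the SC)
ALIASES = {"Gambia": "The Gambia"}               # differing country names
AS_OF = 1998  # ASSUMPTION: evaluation year, used to drop future SC listings


def canonical(name):
    """Map a source-specific country name to one agreed spelling."""
    return ALIASES.get(name, name)


def was_opec_member(nation, year):
    """Membership check using the OPEC source's own dates (the ancillary data)."""
    if nation not in OPEC_LEFT:
        return False
    left = OPEC_LEFT[nation]
    return left is None or year < left


def most_recent_year(as_of=AS_OF):
    """Latest (year, nation) where an OPEC member sat on the Security Council."""
    hits = [
        (year, canonical(nation))
        for nation, year in UN_SC
        if year <= as_of and was_opec_member(canonical(nation), year)
    ]
    return max(hits) if hits else None


print(most_recent_year())
```

Gabon's 1999 seat is rejected twice over: it lies after the assumed evaluation year, and Gabon had already left OPEC.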

Page 19: September 2001


Interoperation via Articulation

Process phases:

At application definition time
– Match relevant ontologies where needed
– Establish articulation rules among them.
– Record the process

At execution time
– Perform query rewriting to get to sources
– Optimize based on the ontology algebra.

For maintenance
– Regenerate rules using the stored formulation
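The execution-time step, query rewriting, can be pictured as splitting the terms of an application query into one sub-query per source. This is a minimal sketch under assumed names: the rule table, the source names, and the source-side terms are hypothetical, and a real rewriter would also carry predicates and join conditions.

```python
# Hypothetical articulation rules: application term -> [(source, source term)]
ARTICULATION = {
    "nation": [("factbook", "country"), ("un", "member_state")],
    "sc_year": [("un", "term_year")],
    "opec_member": [("opec", "member")],
}


def rewrite(query_terms):
    """Group the requested application terms into one sub-query per source."""
    plan = {}
    for term in query_terms:
        for source, source_term in ARTICULATION[term]:
            plan.setdefault(source, []).append(source_term)
    return plan


print(rewrite(["nation", "sc_year"]))
```

Each sub-query is phrased in the source's own terms, so it can be answered by that source's own engine.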

Page 20: September 2001


Generation of the articulation rules

Provide library of automatic match heuristics
• Lexical Methods
– spelling similarity (commonly used by others)
• Structural Methods
– relative graph position
• Reasoning-based Methods
• Nexus: a graph we derive from the OED / Webster's
– links terms based on definitions, not lexical similarity
• Hybrid Methods
– Iteratively, with an expert in control

GUI tool to
- display matches and
- verify generated matches using the human expert
- expert can also supply matching rules
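The lexical heuristic and the expert-in-control loop can be sketched with Python's standard `difflib`. The threshold of 0.8, the term lists, and the stand-in `expert_accepts` callback are assumptions for illustration; the GUI is replaced by a plain function call.

```python
from difflib import SequenceMatcher


def lexical_candidates(terms1, terms2, threshold=0.8):
    """Propose (term1, term2, score) pairs whose spelling similarity >= threshold."""
    out = []
    for a in terms1:
        for b in terms2:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                out.append((a, b, round(score, 2)))
    return sorted(out, key=lambda c: -c[2])


def review(candidates, expert_accepts, expert_rules=()):
    """Keep the candidates the expert accepts, plus rules the expert supplies."""
    accepted = [(a, b) for a, b, _ in candidates if expert_accepts(a, b)]
    return accepted + list(expert_rules)


# Hypothetical term lists from two source ontologies.
anatomy_and_store = ["Employees", "Nail", "Colour"]
hardware_and_factory = ["employees", "nail", "color"]

candidates = lexical_candidates(anatomy_and_store, hardware_and_factory)

# Stand-in for the human expert: Nail (toe) is not Nail (fastener).
rules = review(
    candidates,
    expert_accepts=lambda a, b: a != "Nail",
    expert_rules=[("lorry", "truck")],
)
print(rules)
```

Spelling alone proposes Nail = nail, which the expert rejects, and can never propose lorry = truck, which the expert has to supply; that gap is what the definition-based and hybrid methods are meant to narrow.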

Page 21: September 2001


Articulation Generator

Driver

Ont1 Ont2

Phrase Relator

Thesaurus

Structural Matcher

Semantic Network (Nexus)

Context-based Word Relator

OntA

Human Expert

Being built by Prasenjit Mitra

Page 22: September 2001


Principle of Knowledge Composition

[Figure: knowledge resources A, B, C, D, and E, with articulation knowledge for the intersections (A ∩ B), (B ∩ C), (C ∩ D), and (C ∩ E).]

Articulation knowledge for (A ∩ B) ∪ (B ∩ C) ∪ (C ∩ E)

Composed knowledge for applications using A, B, C, E

Legend: ∪ : union; ∩ : intersection

Page 23: September 2001


Exploiting the result (future plans)

Processing & query evaluation is best performed within Source Domains & by their engines

Result has links to source

Avoid n² problem of interpreter mapping [Swartout HPKB year 1]

Page 24: September 2001


Support Domain Specialization

• Knowledge Acquisition (20% effort) &
• Knowledge Maintenance (80% effort *)

to be performed by
• Domain specialists (SMEs)
• Professional organizations
• Field teams of modest size

Empowerment: autonomously maintainable

* based on experience with software

Page 25: September 2001


Domain-specific Expertise .

Knowledge needed is huge

• Partition into natural domains

• Determine domain responsibility and authority

• Empower domain owners

• Exploit domain-specific expertise

• Provide computer-science tools

Consider interaction

Society of specialists

Our Ontology

Page 26: September 2001


SKC Project Synopsis

• Research Objective:
– Precise information for applications from heterogeneous, imperfect, scalably many data sources

• Sources for Ontologies used currently:
– General: CIA World Factbook ‘96, www.UN, www.OPEC, Webster’s Dictionary, Thesaurus, Oxford English Dictionary
– Topical: NATO, BattleSpace Sensors, Logistics Servers

• Theory:
– Domain autonomy and exploitation
– Rule-based algebra over ontologies
– Translation & Composition primitives

• Sponsors and collaboration
– AFOSR; DARPA DAML program; W3C; Stanford KSL and SMI; Univ. of Karlsruhe, Germany; others.