ontologies in data and application integration – an update


Page 1: Ontologies in Data and Application Integration – an Update

Ontologies in Data and Application Integration – an Update

Kai Lin

Bertram Ludäscher

Knowledge-Based Information Systems Lab

Data and Knowledge Systems (DAKS)
San Diego Supercomputer Center
University of California San Diego

http://www.geongrid.org


GEON PI Meeting, VTech March 21—23rd 2004 2

Outline

1. Motivation

2. Ontology Cheat Sheet

3. Ontology-enabled Prototypes and Tools

4. Data & Service Registration (Structural + Semantic)

5. Scientific Workflows


Ontology Cheat Sheet (1/2)

• What is an ontology? An ontology usually …
– specifies a theory (a set of models) by …
– defining and relating …
– concepts representing features of a domain of interest

• Also an overloaded (sometimes sloppy) term for:
– Controlled vocabularies
– Database schema (relational, XML, …)
– Conceptual schema (ER, UML, …)
– Thesauri (synonyms, broader term/narrower term)
– Taxonomies
– Informal/semi-formal representations
  • “Concept spaces”, “concept maps”
  • Labeled graphs / semantic networks (RDF)
– Formal ontologies, e.g., in [Description] Logic (OWL)
  • “formalization of a specification” constrains possible interpretation of terms


A Multi-Hierarchical Rock Classification “Ontology” (GSC)

Composition

Genesis

Fabric

Texture

Ontology Cheat Sheet (2/2)

• What are ontologies used for?
– Conceptual models of a domain or application (communication means, system design, …)
– Classification of …
  • concepts (taxonomy) and
  • data/object instances through classes
– Analysis of ontologies, e.g.,
  • Graph queries (reachability, path queries, …)
  • Reasoning (concept subsumption, consistency checking, …)
– Targets for semantic data registration
– Conceptual indexes and views for
  • searching,
  • browsing,
  • querying, and
  • integration of registered data
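As a minimal sketch of the "graph queries" and "concept subsumption" uses listed above, a taxonomy can be treated as a directed graph and queried by reachability. The rock concepts, the dataset names, and the function names below are hypothetical illustrations, not taken from the GEON ontologies.

```python
from collections import deque

# Hypothetical is-a edges (concept -> direct parents); not taken from the slides.
IS_A = {
    "Granite": {"PlutonicRock"},
    "Basalt": {"VolcanicRock"},
    "PlutonicRock": {"IgneousRock"},
    "VolcanicRock": {"IgneousRock"},
    "IgneousRock": {"Rock"},
    "Rock": set(),
}

# Hypothetical registrations (dataset name -> concept it is registered to).
REGISTRATIONS = {
    "nevada_map": "Granite",
    "idaho_map": "Basalt",
    "fossil_sites": "Rock",
}


def ancestors(concept, is_a=IS_A):
    """Return every concept reachable from `concept` via is-a edges."""
    seen = set()
    queue = deque(is_a.get(concept, ()))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(is_a.get(current, ()))
    return seen


def is_subsumed_by(concept, candidate, is_a=IS_A):
    """True if `concept` equals `candidate` or is a transitive sub-concept of it."""
    return concept == candidate or candidate in ancestors(concept, is_a)


def find_registered(registrations, concept, is_a=IS_A):
    """Names of datasets registered to `concept` or to any of its sub-concepts."""
    return sorted(
        name for name, registered in registrations.items()
        if is_subsumed_by(registered, concept, is_a)
    )
```

Calling find_registered(REGISTRATIONS, "IgneousRock") finds the datasets registered to Granite and Basalt but not the one registered only to Rock, which is the kind of conceptual index the last bullet describes.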

Application Example: Geologic Map Integration

[Figure: geologic map integration example for Nevada, drawing on domain knowledge and knowledge representation ("Ontologies!?")]

GEON Metamorphism Equation: Geoscientists + Computer Scientists → Igneous Geoinformaticists (+/- energy, +/- a few hundred million years)


Geologic Map Integration in the Portal

• After registering datasets, ontologies (here: “classes”), and an application (“OMI”), the datasets can be searched and displayed in an integrated way.


Concept-Based Queries and Analysis

• After registering a source with one or more ontologies, concept-based queries and analysis can be launched

• Here: light-weight client-side processing (SVG)

Ontologies and Data Management

• Where do ontologies fit within data management architectures?

• Several answers, specifically:
– An ontology is similar to a schema or conceptual model (if one exists), but is
  – developed independently of a particular application
  – probably given in a different language
  – inherently more general
  – usually not a very good schema (weak structure)

Ontologies and Data Management (watch out for Semantic Data Registration later)

[Figure: diagram relating Data, Metadata, Schema, Conceptual Model, Design Artifact, and Ontology; edge label: "use concepts from (explicitly or implicitly)"]

Creating and Sharing Concept Maps (here: Seismology concept map & Cmap tool)

• Lock up scientists for 2+ days
• Add CS/KRDB types
• Create concept maps
• Refine
• Iterate from napkin drawings, to concept maps, to ontologies


Graph (RDF) Queries on Ontologies

visualisation

RQL Query: Show all “products”

Query Results


Community-Based Ontology Development

• Draft of a geochemistry ontology developed by scientists

Current concept maps and emerging ontologies:
1. Igneous Rocks/Plutons
2. Seismology
3. Geochemistry


Protégé (… not so ezOWL yet…)


Sparrow (a poor man’s OWL tool …)

Simple ASCII-based RDF and OWL entry and manipulation

Semantic Data Registration (joint work w/ Shawn Bowers)

What is Data/Ontology/… Registration?

• A mechanism by which data sources, ontologies, services, …
• … are published in a repository/registry
• for the purpose of “smart” discovery, querying, integration

Things to Register

• Data files (individual files)
– Shapefile as a blob (+ file type)
• Collections (of files; nested; e.g., satellite data)
• Databases (have a schema and can be queried)
– Shapefile with schema registered
• Ontologies
• Services (web + grid services)
• Other/external applications

Connecting Datasets to Ontologies

Dataset

Date        Site  Transect  SP_Code  Count
2000-09-08  CARP  1         CRGI     0
2000-09-08  CARP  4         LOCH     0
2000-09-08  CARP  7         MUCA     1
2000-09-22  NAPL  7         LOCH     1
2000-09-18  NAPL  1         PAPA     5
2000-09-28  BULL  1         CYOS     57

Ontology (snippet)

DataCollectionEvent ⊑ contains.Measurement
Measurement ⊑ measureOf.MeasurableItem ⊓ hasContext.MeasurementContext
MeasurementContext ⊑ hasTime.DateTime ⊓ hasLocation.Location
MeasurableItem ⊑ hasUnit.Unit ⊓ hasValue.UnitValue
SpeciesCount ⊑ MeasurableItem ⊓ hasSpecies.Species ⊓ hasUnit.RatioUnit
…
SpeciesAbundance ⊑ Measurement ⊓ measureOf.SpeciesCount
AbundanceCollectionEvent ⊑ DataCollectionEvent ⊓ contains.SpeciesAbundance
Location ⊑ position.Coordinate
LTERSite ⊑ Location
SBLTERSite ⊑ LTERSite ⊓ position.SBLTERCoordinate
{naples,…} ⊑ SBLTERSite

How can we “register” the dataset to concepts in the Ontology?

Step 1: Selecting Relevant Concepts

Dataset

Date        Site  Transect  SP_Code  Count
2000-09-08  CARP  1         CRGI     0
2000-09-08  CARP  4         LOCH     0
2000-09-08  CARP  7         MUCA     1
2000-09-22  NAPL  7         LOCH     1
2000-09-18  NAPL  1         PAPA     5
2000-09-28  BULL  1         CYOS     57

Concepts from an Ontology

• DataCollectionEvent
  • AbundanceCollectionEvent
• Measurement
  • Abundance
    • SpeciesAbundance
• MeasurableItem
  • SpeciesCount
• Location
  • LTERSite
    • SBLTERSite
      • naples
• Species
  • …
• MeasurementContext
  • …


Step 2: Generate Object Model

[Figure: object model linking AbundanceCollectionEvent, SpeciesAbundance, SpeciesCount, Species, RatioUnit, RatioValue, DateTime, and SBLTERSite via the properties contains, measureOf, hasSpecies, hasUnit, hasValue, hasTime, and hasLoc]

Concepts from an Ontology

• DataCollectionEvent
  • AbundanceCollectionEvent
• Measurement
  • Abundance
    • SpeciesAbundance
• MeasurableItem
  • SpeciesCount
• Location
  • LTERSite
    • SBLTERSite
      • naples
• Species
  • …
• MeasurementContext
  • …
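The two steps can be sketched in code as follows. This is only an illustration under stated assumptions: the class and field names mirror the concepts and properties named on the slides, but attaching hasTime and hasLoc to the collection event, the unit string "individuals", and the function register_row are guesses made for the example. The Transect column is left unmapped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeciesCount:
    """A MeasurableItem: which species was counted, in which unit, with what value."""
    has_species: str
    has_unit: str
    has_value: int


@dataclass(frozen=True)
class SpeciesAbundance:
    """A Measurement whose measureOf target is a SpeciesCount."""
    measure_of: SpeciesCount


@dataclass(frozen=True)
class AbundanceCollectionEvent:
    """A DataCollectionEvent that contains SpeciesAbundance measurements."""
    has_time: str   # assumption: time is attached to the event
    has_loc: str    # assumption: location is attached to the event
    contains: tuple


def register_row(row):
    """Map one dataset row onto instances of the selected concepts."""
    count = SpeciesCount(
        has_species=row["SP_Code"],
        has_unit="individuals",  # hypothetical unit; the dataset does not state one
        has_value=int(row["Count"]),
    )
    return AbundanceCollectionEvent(
        has_time=row["Date"],
        has_loc=row["Site"],
        contains=(SpeciesAbundance(measure_of=count),),
    )


# One row of the dataset shown on the slides.
ROW = {"Date": "2000-09-18", "Site": "NAPL", "Transect": "1",
       "SP_Code": "PAPA", "Count": "5"}
event = register_row(ROW)
```

Once rows are expressed as concept instances like this, a query phrased against the ontology (for example, all SpeciesAbundance measurements at an SBLTERSite) no longer depends on the column names of any one dataset.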


Applications of Semantic Registration

• Mentioned before:
– Smart data discovery, integration, etc.
• New application:
– Generating data transformations semi-automatically for chaining together computational services

Problem: Service Reusability

• Unless “designed to fit,” independent services are structurally incompatible
• Generally, the source output type will not be a subtype of the target input type

[Figure: desired connection from output port Ps of the source service to input port Pt of the target service; StructuralType Ps ⋠ StructuralType Pt (incompatible)]

Service Reusability

• A data transformation mapping is required to connect the services … artificially creating subtype compatibility
• If such a mapping exists, the services are “structurally feasible”

[Figure: desired connection from Ps (source service) to Pt (target service); StructuralType Ps ⋠ StructuralType Pt (incompatible), but the mapping applied to Ps yields a structural subtype of Pt (≺)]

Service Reusability

• Idea:
– annotate services with semantic types (concept expressions), primarily for discovery of services

[Figure: ports Ps and Pt are annotated with semantic types from ontologies (OWL); SemanticType Ps ⊑ SemanticType Pt (compatible)]

Service Reusability

• Services can be semantically compatible, but structurally incompatible

[Figure: for the desired connection from Ps to Pt, SemanticType Ps ⊑ SemanticType Pt (compatible, via ontologies in OWL) while StructuralType Ps ⋠ StructuralType Pt (incompatible); the transformed Ps is a structural subtype of Pt (≺)]
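A small sketch of the two compatibility checks, under simplifying assumptions: structural types are modeled as flat records (field name to base type), semantic types as named concepts in a single-parent subclass hierarchy, and the transformation as a plain field renaming. All names, types, and the renaming table are hypothetical.

```python
# Hypothetical flat record types: field name -> base type.
SOURCE_OUTPUT = {"cnt": "int", "lsp": "string"}    # structural type of Ps
TARGET_INPUT = {"obs": "int", "phase": "string"}   # structural type of Pt

# Hypothetical semantic annotations and subclass edges (concept -> parent).
SEMANTIC_TYPE = {"Ps": "SpeciesAbundance", "Pt": "Measurement"}
SUBCLASS_OF = {"SpeciesAbundance": "Measurement", "Measurement": None}

# Hypothetical transformation: rename source fields to the target's names.
RENAMES = {"cnt": "obs", "lsp": "phase"}


def is_structural_subtype(source, target):
    """Record subtyping: the source must offer every target field with the same type."""
    return all(source.get(field) == ftype for field, ftype in target.items())


def is_semantic_subtype(concept, candidate, subclass_of=SUBCLASS_OF):
    """Walk up the subclass hierarchy from `concept` looking for `candidate`."""
    while concept is not None:
        if concept == candidate:
            return True
        concept = subclass_of.get(concept)
    return False


def apply_mapping(source, renames):
    """Apply a renaming to a record type, leaving unmapped fields untouched."""
    return {renames.get(field, field): ftype for field, ftype in source.items()}
```

Here the ports are semantically compatible, since is_semantic_subtype("SpeciesAbundance", "Measurement") holds, but structurally incompatible until apply_mapping(SOURCE_OUTPUT, RENAMES) is used, which is the situation the slide describes.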

The Ontology-Driven Framework (work w/ Shawn Bowers, SEEK)

[Figure: registration mappings (output and input) link the structural types of Ps and Pt to their semantic types in ontologies (OWL); the semantic types are compatible (⊑); a correspondence between the structural types is used to generate the transformation applied to Ps for the desired connection]

Example Generated Data Transformation (in XQuery)

• Based on the structural correspondences and certain assumptions, we derive the transformation query:

<cohortTable>
{ for $s in /population/sample
  return
    <measurement>
      { for $c in $s/meas/cnt return <obs>{$c/text()}</obs> }
      { for $l in $s/lsp return <phase>{$l/text()}</phase> }
    </measurement> }
</cohortTable>
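For readers without an XQuery engine at hand, the same restructuring can be sketched with Python's standard xml.etree.ElementTree module. The function name and the sample input document are assumptions made for illustration; only the element names come from the query above.

```python
import xml.etree.ElementTree as ET


def to_cohort_table(population):
    """Rebuild /population/sample elements as cohortTable/measurement elements."""
    table = ET.Element("cohortTable")
    for sample in population.findall("sample"):
        measurement = ET.SubElement(table, "measurement")
        # Each meas/cnt value becomes an <obs> element.
        for cnt in sample.findall("meas/cnt"):
            ET.SubElement(measurement, "obs").text = cnt.text
        # Each lsp value becomes a <phase> element.
        for lsp in sample.findall("lsp"):
            ET.SubElement(measurement, "phase").text = lsp.text
    return table


# Hypothetical input document matching the paths used in the query.
SAMPLE_XML = (
    "<population>"
    "<sample><meas><cnt>5</cnt></meas><lsp>adult</lsp></sample>"
    "</population>"
)

result = ET.tostring(to_cohort_table(ET.fromstring(SAMPLE_XML)), encoding="unicode")
```

For the sample input, result is a cohortTable holding one measurement with an obs of 5 and a phase of adult.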

Scientific Workflows (Efrat Jaeger et al.)


Reverse Engineering a Scientific Workflow using the KEPLER Tool (Efrat Jaeger)


A Scientific Workflow in Kepler

Extract mineral composition for row Id.

Igneous Rock Diagrams information.

Rock Name.


Reverse-Engineered the Geological Map Integration in Kepler


DataMapper Sub-Workflow


Result launched via the BrowserUI actor

KEPLER and YOU

• Kepler …
– is a community-based, cross-project, open source collaboration
– for “minute made” application integration
– using web (grid) services as basic building blocks
– has a joint CVS repository, mailing lists, web site, …
– is gaining momentum thanks to contributors and contributions
• BSD-style license allows commercial spin-offs
– a pre-packaged, shrink-wrapped version (“Kepler-to-GO”) coming soon to a place near you…


F I N – Questions?


Additional Material


The KEPLER GUI (Vergil from Ptolemy II)

Drag and drop utilities, director and actor libraries.


Running the workflow

Distributed Workflows in KEPLER

• Web and Grid Service plug-ins
– WSDL
– ProxyInit, GlobusGridJob, GridFTP, DataAccessWizard
– SRB
– SSH, SCP
• Web Service Harvester
– Imports all the operations of a specific WS (or of all the WSs in a UDDI repository) as Kepler actors
• XSLT and XQuery transformers to link non-fitting services together
• Web Service Deployment (…ongoing work…)


A Generic Web Service Actor

Given a WSDL and the name of an operation of a web service, dynamically customizes itself to implement and execute that method.

Configure - select service operation


Set Parameters and Commit

Set parameters and commit


WS Actor after Instantiation

Web Service Harvester

• Imports the web services in a repository into the actor library.
• Has the capability to search for web services based on a keyword.

Composing 3rd-Party WSs

[Figure: output of previous web service → user interaction & transformations → input of next web service]

Providing DB Access through Kepler

• Database connection actor:
– Opening a database connection and passing it to all actors accessing this database.
• Database query actor:
– A generic actor that queries a database and provides its result.
• DBConnection type and DBConnectionToken:
– A new IOPort type and a token to distinguish a database connection from any general type.

Database Connection Actor

OpenDBConnection actor:

• Input: database connection information.
• Output: A DBConnectionToken, a reference to a database connection instance, through a DBConnection output port.


Database Query Actor

Database Query actor:

Input: A query string (SQL) and a database connection reference.

Parameters: output type (XML, Record, or String); whether to output each row separately or all rows at once.

Process: Execute query. Produce results according to parameters.
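The division of labor between the two actors can be sketched outside Kepler with Python's sqlite3 module. This is not the Kepler (Java) implementation: the function names, the in-memory database, and the table contents are assumptions for illustration; only the token idea and the output-type and row-handling parameters come from the slides.

```python
import sqlite3
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass
class DBConnectionToken:
    """Reference to an open connection, passed from the connection actor to query actors."""
    connection: sqlite3.Connection


def open_db_connection(database):
    """Connection actor: connection information in, token out."""
    return DBConnectionToken(sqlite3.connect(database))


def database_query(token, sql, output_type="record", row_by_row=True):
    """Query actor: run `sql` and format rows as records, XML strings, or plain strings."""
    cursor = token.connection.execute(sql)
    columns = [column[0] for column in cursor.description]
    records = [dict(zip(columns, row)) for row in cursor.fetchall()]
    if output_type == "xml":
        results = [
            "<row>"
            + "".join(f"<{c}>{escape(str(v))}</{c}>" for c, v in record.items())
            + "</row>"
            for record in records
        ]
    elif output_type == "string":
        results = [", ".join(str(v) for v in record.values()) for record in records]
    else:
        results = records
    # Row by row: one output item per row; otherwise a single item holding all rows.
    return results if row_by_row else [results]


# Hypothetical in-memory database holding two rows of the species-count data.
token = open_db_connection(":memory:")
token.connection.execute("CREATE TABLE counts (site TEXT, sp_code TEXT, cnt INTEGER)")
token.connection.executemany(
    "INSERT INTO counts VALUES (?, ?, ?)",
    [("NAPL", "PAPA", 5), ("BULL", "CYOS", 57)],
)
xml_rows = database_query(token, "SELECT site, cnt FROM counts ORDER BY cnt", output_type="xml")
all_at_once = database_query(
    token, "SELECT sp_code FROM counts ORDER BY cnt", output_type="string", row_by_row=False
)
```

Passing the token rather than the connection details means several query actors can share one connection, which is the reason given for the separate DBConnection port type.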


Querying Example

Resource Description Framework (RDF)

Simple data model that consists of
– Resources (uniquely identified via URIs)
– Properties
– Values (resources or character strings)

Data organized into triples (subject, property, value)

[Figure: SonomaRegion (subject, a resource) --locatedIn (property, a resource)--> CaliforniaRegion (value, a resource)]

locatedIn(SonomaRegion, CaliforniaRegion)
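The triple model is small enough to sketch with plain Python tuples. The namespace URI, the label triple, and the helper function are hypothetical; a real application would more likely use an RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str   # a resource, identified by a URI
    prop: str      # a property, also a resource
    value: str     # a resource URI or a character string


BASE = "http://example.org/regions#"  # hypothetical namespace

GRAPH = {
    Triple(BASE + "SonomaRegion", BASE + "locatedIn", BASE + "CaliforniaRegion"),
    Triple(BASE + "SonomaRegion", BASE + "label", "Sonoma"),  # value as a character string
}


def values_of(graph, subject, prop):
    """All values v such that (subject, prop, v) is in the graph."""
    return {t.value for t in graph if t.subject == subject and t.prop == prop}
```

With this, values_of(GRAPH, BASE + "SonomaRegion", BASE + "locatedIn") returns the set containing the CaliforniaRegion URI.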

RDF Schema

Adds a set of pre-defined properties to define classes and properties

Allows instances to be connected to classes

Sub-class and sub-property (is-a) relationships

[Figure: SonomaRegion --locatedIn--> CaliforniaRegion; both are connected to Region via rdf:type]

Region is a class
locatedIn is a property
locatedIn connects Regions
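One way to read "locatedIn connects Regions" is as rdfs:domain and rdfs:range declarations, from which the rdf:type edges follow by the standard RDFS entailment rules. The sketch below assumes that reading; the graph encoding with prefixed names as plain strings and the function infer_types are illustrative only.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"

# Assumption: "locatedIn connects Regions" is expressed via domain and range.
GRAPH = {
    ("Region", RDF_TYPE, "rdfs:Class"),
    ("locatedIn", RDF_TYPE, "rdf:Property"),
    ("locatedIn", RDFS_DOMAIN, "Region"),
    ("locatedIn", RDFS_RANGE, "Region"),
    ("SonomaRegion", "locatedIn", "CaliforniaRegion"),
}


def infer_types(graph):
    """Add rdf:type triples implied by rdfs:domain and rdfs:range declarations."""
    domains = [(s, o) for s, p, o in graph if p == RDFS_DOMAIN]
    ranges = [(s, o) for s, p, o in graph if p == RDFS_RANGE]
    inferred = set(graph)
    for subject, prop, value in graph:
        for declared_prop, cls in domains:
            if prop == declared_prop:
                inferred.add((subject, RDF_TYPE, cls))   # subject is in the domain class
        for declared_prop, cls in ranges:
            if prop == declared_prop:
                inferred.add((value, RDF_TYPE, cls))     # value is in the range class
    return inferred


closure = infer_types(GRAPH)
```

After inference, closure contains the rdf:type Region triples for both SonomaRegion and CaliforniaRegion, matching the two rdf:type edges drawn on the slide.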

OWL

Adds additional pre-defined properties to further constrain an ontology (see http://www.w3.org/TR/owl-guide/)

Note: RDF(S) and OWL use XML
Some graphical tools exist (e.g., Protégé)

<owl:Class rdf:ID="Vintage">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasVintageYear"/>
      <owl:cardinality>1</owl:cardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>

A Vintage is a class that is a subclass of an unnamed class whose instances always have exactly one hasVintageYear property.

Note the uglified XML syntax… The good news: meant for parsers, not humans!
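To illustrate the "meant for parsers" point, the snippet can be read with Python's standard XML parser once it is wrapped in an rdf:RDF element that declares the namespaces. The wrapper, the variable names, and the tuple layout of the result are assumptions for this sketch; a dedicated RDF/OWL library would normally be used instead.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}

# The class definition from the slide, wrapped so the prefixes are declared.
OWL_XML = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="Vintage">
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#hasVintageYear"/>
        <owl:cardinality>1</owl:cardinality>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>
"""

RDF_ID = "{%s}ID" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Collect (class name, restricted property, cardinality) for each restriction.
restrictions = []
root = ET.fromstring(OWL_XML)
for owl_class in root.findall("owl:Class", NS):
    for restriction in owl_class.findall("rdfs:subClassOf/owl:Restriction", NS):
        on_property = restriction.find("owl:onProperty", NS).get(RDF_RESOURCE)
        cardinality = int(restriction.find("owl:cardinality", NS).text)
        restrictions.append((owl_class.get(RDF_ID), on_property, cardinality))
```

The extracted tuple says that Vintage is restricted on #hasVintageYear with cardinality 1, which is the machine-readable form of the sentence above.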