ontologies in data and application integration – an update

Ontologies in Data and Application Integration – an Update

Kai Lin

Bertram Ludäscher

Knowledge-Based Information Systems Lab

Data and Knowledge Systems (DAKS)
San Diego Supercomputer Center
University of California San Diego

http://www.geongrid.org

GEON PI Meeting, VTech, March 21-23, 2004

Outline

1. Motivation

2. Ontology Cheat Sheet

3. Ontology-enabled Prototypes and Tools

4. Data & Service Registration (Structural + Semantic)

5. Scientific Workflows


Ontology Cheat Sheet (1/2)

• What is an ontology? An ontology usually …
– specifies a theory (a set of models) by …
– defining and relating …
– concepts representing features of a domain of interest

• Also an overloaded (sometimes sloppy) term for:
– Controlled vocabularies
– Database schema (relational, XML, …)
– Conceptual schema (ER, UML, …)
– Thesauri (synonyms, broader term/narrower term)
– Taxonomies
– Informal/semi-formal representations
  • “Concept spaces”, “concept maps”
  • Labeled graphs / semantic networks (RDF)
– Formal ontologies, e.g., in [Description] Logic (OWL)
  • “formalization of a specification” constrains possible interpretation of terms


A Multi-Hierarchical Rock Classification “Ontology” (GSC)

Composition

Genesis

Fabric

Texture


Ontology Cheat Sheet (2/2)

• What are ontologies used for?
– Conceptual models of a domain or application (communication means, system design, …)
– Classification of …
  • concepts (taxonomy) and
  • data/object instances through classes
– Analysis of ontologies, e.g.
  • Graph queries (reachability, path queries, …)
  • Reasoning (concept subsumption, consistency checking, …)
– Targets for semantic data registration
– Conceptual indexes and views for
  • searching,
  • browsing,
  • querying, and
  • integration of registered data
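The two kinds of analysis listed above, graph queries and subsumption reasoning, can be sketched on a toy concept graph. This is an illustration only: the `SUBCLASS_OF` dictionary and its rock names are invented for the example and are not taken from any GEON ontology, and the subsumption test covers a plain taxonomy rather than full description-logic reasoning.

```python
from collections import deque

# Hypothetical is-a edges (child -> parents); invented for illustration.
SUBCLASS_OF = {
    "Granite": ["PlutonicRock"],
    "Basalt": ["VolcanicRock"],
    "PlutonicRock": ["IgneousRock"],
    "VolcanicRock": ["IgneousRock"],
    "IgneousRock": ["Rock"],
}


def ancestors(concept):
    """Reachability query: all concepts reachable from `concept` via is-a edges."""
    seen, queue = set(), deque([concept])
    while queue:
        for parent in SUBCLASS_OF.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def is_subsumed_by(sub, sup):
    """Concept subsumption over a plain taxonomy (no description logic)."""
    return sub == sup or sup in ancestors(sub)
```

For example, `is_subsumed_by("Basalt", "IgneousRock")` holds, while `is_subsumed_by("Granite", "VolcanicRock")` does not.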

Application Example: Geologic Map Integration

[Figure: geologic maps (Nevada) combined with domain knowledge; knowledge representation: Ontologies!?]

GEON Metamorphism Equation:

Geoscientists + Computer Scientists +/- Energy +/- a few hundred million years → Igneous Geoinformaticists


Geologic Map Integration in the Portal

• After registering datasets, ontologies (here: “classes”), and an application (“OMI”), the datasets can be searched and displayed in an integrated way.


Concept-Based Queries and Analysis

• After registering a source with one or more ontologies, concept-based queries and analysis can be launched

• Here: light-weight client-side processing (SVG)


Ontologies and Data Management

• Where do ontologies fit within data management architectures?

• Several answers, specifically:
– An ontology is similar to a schema or conceptual model if one exists, but is
– Developed independently of a particular application
– Probably given in a different language
– Inherently more general
– Usually not a very good schema (weak structure)


Ontologies and Data Management (watch out for Semantic Data Registration later)

[Diagram: Schemas, Conceptual Models, and an Ontology, with layers labeled Data, Metadata, and Design Artifact; arrows labeled “use concepts from (explicitly or implicitly)”]


Creating and Sharing Concept Maps (here: Seismology concept map & Cmap tool)

• Lock up scientists for 2+ days
• Add CS/KRDB types
• Create concept maps
• Refine
• Iterate from napkin drawings, to concept maps, to ontologies


Graph (RDF) Queries on Ontologies

visualisation

RQL Query: Show all “products”

Query Results


Community-Based Ontology Development

• Draft of a geochemistry ontology developed by scientists

Current concept maps and emerging ontologies:
1. Igneous Rocks/Plutons
2. Seismology
3. Geochemistry


Protégé (… not so ezOWL yet…)


Sparrow (a poor man’s OWL tool …)

Simple ASCII-based RDF and OWL entry and manipulation

Semantic Data Registration (joint work w/ Shawn Bowers)


What is Data/Ontology/… Registration?

• A mechanism by which data sources, ontologies, services, …
• … are published in a repository/registry
• for the purpose of “smart” discovery, querying, integration


Things to Register

• Data files (individual files)
– Shapefile as a blob (+ file type)
• Collections (of files; nested; e.g., satellite data)
• Databases (has schema and can be queried)
– Shapefile with schema registered
• Ontologies
• Services (web + grid services)
• Other/external applications


Connecting Datasets to Ontologies

Dataset

Date        Site  Transect  SP_Code  Count
2000-09-08  CARP  1         CRGI     0
2000-09-08  CARP  4         LOCH     0
2000-09-08  CARP  7         MUCA     1
2000-09-22  NAPL  7         LOCH     1
2000-09-18  NAPL  1         PAPA     5
2000-09-28  BULL  1         CYOS     57

Ontology (snippet)

DataCollectionEvent ⊑ contains.Measurement
Measurement ⊑ measureOf.MeasurableItem ⊓ hasContext.MeasurementContext
MeasurementContext ⊑ hasTime.DateTime ⊓ hasLocation.Location
MeasurableItem ⊑ hasUnit.Unit ⊓ hasValue.UnitValue
SpeciesCount ⊑ MeasurableItem ⊓ hasSpecies.Species ⊓ hasUnit.RatioUnit
…
SpeciesAbundance ⊑ Measurement ⊓ measureOf.SpeciesCount
AbundanceCollectionEvent ⊑ DataCollectionEvent ⊓ contains.SpeciesAbundance
Location ⊑ position.Coordinate
LTERSite ⊑ Location
SBLTERSite ⊑ LTERSite ⊓ position.SBLTERCoordinate
{naples,…} ⊑ SBLTERSite

How can we “register” the dataset to concepts in the ontology?


Step 1: Selecting Relevant Concepts

Dataset

Date        Site  Transect  SP_Code  Count
2000-09-08  CARP  1         CRGI     0
2000-09-08  CARP  4         LOCH     0
2000-09-08  CARP  7         MUCA     1
2000-09-22  NAPL  7         LOCH     1
2000-09-18  NAPL  1         PAPA     5
2000-09-28  BULL  1         CYOS     57

Concepts from an Ontology

• DataCollectionEvent
• AbundanceCollectionEvent
• Measurement
• Abundance
• SpeciesAbundance
• MeasurableItem
• SpeciesCount
• Location
• LTERSite
• SBLTERSite
• naples
• Species
• …
• MeasurementContext
• …


Step 2: Generate Object Model

[Diagram: object model generated from the selected ontology concepts, linking AbundanceCollectionEvent, SpeciesAbundance, SpeciesCount, Species, RatioUnit, RatioValue, DateTime, and SBLTERSite through the properties contains, measureOf, hasSpecies, hasUnit, hasValue, hasTime, and hasLoc]
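One way to picture the outcome of these two steps is as a mapping from dataset columns to properties of the object model. The sketch below is a guess at such a mapping, not the registration format used in GEON: the `REGISTRATION` dictionary, its column-to-property assignments, and the `register_row` helper are all assumptions made for illustration. Only the column names, concept names, and property names come from the slides.

```python
# Hypothetical column -> (concept, property) registration; assumed, not from the source.
REGISTRATION = {
    "Date": ("AbundanceCollectionEvent", "hasTime"),
    "Site": ("AbundanceCollectionEvent", "hasLoc"),
    "SP_Code": ("SpeciesCount", "hasSpecies"),
    "Count": ("SpeciesCount", "hasValue"),
}


def register_row(row):
    """Group one dataset row's values by the concept each column is registered to."""
    objects = {}
    for column, value in row.items():
        if column in REGISTRATION:  # unregistered columns (here: Transect) are skipped
            concept, prop = REGISTRATION[column]
            objects.setdefault(concept, {})[prop] = value
    return objects


# One row of the example dataset.
row = {"Date": "2000-09-22", "Site": "NAPL", "Transect": 7, "SP_Code": "LOCH", "Count": 1}
print(register_row(row))
```

A real registration would also have to say how coded values such as NAPL relate to ontology individuals such as naples; the sketch leaves that out.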


Applications of Semantic Registration

• Mentioned before:
– Smart data discovery, integration etc.
• New application:
– Generating data transformation semi-automatically for chaining together computational services


Problem: Service Reusability

• Unless “designed to fit,” independent services are structurally incompatible

• Generally, the source output type will not be a subtype of the target input type

[Diagram: SourceService port Ps and TargetService port Pt with a desired connection between them; StructuralType Ps and StructuralType Pt are incompatible (⋠)]


Service Reusability

• A data transformation mapping is required to connect the services … artificially creating subtype compatibility

• If such a mapping exists, the services are “structurally feasible”

[Diagram: SourceService port Ps and TargetService port Pt with a desired connection. StructuralType Ps and StructuralType Pt are incompatible (⋠), but the mapping applied to Ps yields a type that is compatible (≺) with StructuralType Pt.]


Service Reusability

• Idea:
– annotate services with semantic types (concept expressions), primarily for discovery of services

[Diagram: SourceService port Ps and TargetService port Pt annotated with SemanticType Ps and SemanticType Pt drawn from ontologies (OWL); the desired connection is semantically compatible (⊑)]


Service Reusability

• Services can be semantically compatible, but structurally incompatible

[Diagram: SourceService port Ps and TargetService port Pt with a desired connection. The semantic types of Ps and Pt, drawn from ontologies (OWL), are compatible (⊑); the structural types are incompatible (⋠) until a mapping applied to Ps makes them compatible (≺).]
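The gap between the two notions of compatibility can be shown with two small checks. This is a simplified sketch under stated assumptions: structural types are reduced to flat records, semantic types to single named concepts in an is-a chain, and all names in `SOURCE_STRUCT`, `TARGET_STRUCT`, and `SUBCLASS_OF` are chosen for illustration (the field and concept names are borrowed from other slides in this deck).

```python
# Hypothetical structural types: field name -> value type (flat records).
SOURCE_STRUCT = {"cnt": "int", "lsp": "str"}
TARGET_STRUCT = {"obs": "int", "phase": "str"}

# Hypothetical semantic annotations and a single-parent is-a hierarchy.
SUBCLASS_OF = {"SpeciesAbundance": "Measurement"}
SOURCE_SEM, TARGET_SEM = "SpeciesAbundance", "Measurement"


def structurally_compatible(source, target):
    """Record subtyping: the source must supply every target field with the same type."""
    return all(source.get(field) == typ for field, typ in target.items())


def semantically_compatible(sub, sup):
    """Walk up the is-a chain from `sub` until `sup` is found or the chain ends."""
    while sub is not None:
        if sub == sup:
            return True
        sub = SUBCLASS_OF.get(sub)
    return False


print(structurally_compatible(SOURCE_STRUCT, TARGET_STRUCT))
print(semantically_compatible(SOURCE_SEM, TARGET_SEM))
```

Here the services pass the semantic check but fail the structural one, which is the situation where a generated data transformation is needed.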


The Ontology-Driven Framework (work w/ Shawn Bowers, SEEK)

[Diagram: the SourceService output Ps and the TargetService input Pt each carry a structural type and a semantic type, linked by a registration mapping (output and input, respectively) to ontologies (OWL). The semantic types are compatible (⊑). A correspondence between the structural types is used to generate the transformation applied to Ps for the desired connection.]


Example Generated Data Transformation (in XQuery)

• Based on the structural correspondences and certain assumptions, we derive the transformation query:

<cohortTable> {
  for $s in /population/sample
  return
    <measurement>
      { for $c in $s/meas/cnt return <obs>{$c/text()}</obs> }
      { for $l in $s/lsp return <phase>{$l/text()}</phase> }
    </measurement>
} </cohortTable>
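To make the effect of the generated query concrete, the same restructuring can be written with Python's standard `xml.etree.ElementTree` module. This is a hand-written equivalent for illustration, not output of the framework; the `SOURCE_XML` document and its values are invented, and only the element names come from the query above.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; only the element names come from the query above.
SOURCE_XML = """
<population>
  <sample><meas><cnt>5</cnt><cnt>7</cnt></meas><lsp>larva</lsp></sample>
  <sample><meas><cnt>2</cnt></meas><lsp>adult</lsp></sample>
</population>
""".strip()


def transform(population):
    """Rebuild each <sample> as a <measurement> holding <obs> and <phase> children."""
    table = ET.Element("cohortTable")
    for sample in population.findall("sample"):
        measurement = ET.SubElement(table, "measurement")
        for cnt in sample.findall("meas/cnt"):
            ET.SubElement(measurement, "obs").text = cnt.text
        for lsp in sample.findall("lsp"):
            ET.SubElement(measurement, "phase").text = lsp.text
    return table


result = transform(ET.fromstring(SOURCE_XML))
print(ET.tostring(result, encoding="unicode"))
```

Each `cnt` becomes an `obs` and each `lsp` becomes a `phase`, grouped per sample, mirroring the two nested `for` clauses of the query.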

Scientific Workflows (Efrat Jaeger et al.)


Reverse Engineering a Scientific Workflow using the KEPLER Tool (Efrat Jaeger)


A Scientific Workflow in Kepler

Extract mineral composition for row Id.

Igneous Rock Diagrams information.

Rock Name.


Reverse-Engineered the Geological Map Integration in Kepler


DataMapper Sub-Workflow


Result launched via the BrowserUI actor


KEPLER and YOU

• Kepler …
– is a community-based, cross-project, open source collaboration
– for “minute made” application integration
– using web (grid) services as basic building blocks
– has a joint CVS repository, mailing lists, web site, …
– is gaining momentum thanks to contributors and contributions
• BSD-style license allows commercial spin-offs
– a pre-packaged, shrink-wrapped version (“Kepler-to-GO”) coming soon to a place near you…

F I N – Questions?

Additional Material


The KEPLER GUI (Vergil from Ptolemy II)

Drag and drop utilities, director and actor libraries.


Running the workflow


Distributed Workflows in KEPLER

• Web and Grid Service plug-ins
– WSDL
– ProxyInit, GlobusGridJob, GridFTP, DataAccessWizard
– SRB
– SSH, SCP
• Web Service Harvester
– Imports all the operations of a specific WS (or of all the WSs in a UDDI repository) as Kepler actors
• XSLT and XQuery transformers to link non-fitting services together
• Web Service Deployment (…ongoing work…)


A Generic Web Service Actor

Given a WSDL and the name of an operation of a web service, dynamically customizes itself to implement and execute that method.

Configure - select service operation


Set Parameters and Commit


WS Actor after Instantiation


Web Service Harvester

• Imports the web services in a repository into the actor library.
• Has the capability to search for web services based on a keyword.


Composing 3rd-Party WSs

Output of previous web service

User interaction & transformations

Input of next web service

Providing DB Access through Kepler

• Database connection actor:
– Opening a database connection and passing it to all actors accessing this database.
• Database query actor:
– A generic actor that queries a database and provides its result.
• DBConnection type and DBConnectionToken:
– A new IOPort type and a token to distinguish a database connection from any general type.

Database Connection Actor

OpenDBConnection actor:

• Input: database connection information.
• Output: A DBConnectionToken, a reference to a database connection instance, through a DBConnection output port.

Database Query Actor

Database Query actor:

Input: A query string (SQL) and a database connection reference.

Parameters: output type (XML, Record or String); whether to output each row separately or all at once.

Process: Execute query. Produce results according to parameters.
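The division of labour between the two actors can be imitated in a few lines of Python with the standard `sqlite3` module. This is not Kepler code (Kepler actors are written against Ptolemy II); the function names `open_db_connection` and `database_query`, the in-memory database, and the `rocks` table are all assumptions made for illustration, and the XML output type is left out.

```python
import sqlite3


def open_db_connection(database):
    """Plays the role of the OpenDBConnection actor: returns a shareable connection."""
    return sqlite3.connect(database)


def database_query(connection, sql, output_type="record", per_row=False):
    """Plays the role of the Database Query actor.

    output_type: "record" (dict per row) or "string" (comma-joined text per row).
    per_row: hand rows out one at a time (an iterator) instead of all at once (a list).
    """
    cursor = connection.execute(sql)
    columns = [d[0] for d in cursor.description]

    def convert(row):
        if output_type == "string":
            return ",".join(str(v) for v in row)
        return dict(zip(columns, row))

    rows = (convert(r) for r in cursor)
    return rows if per_row else list(rows)


conn = open_db_connection(":memory:")  # hypothetical in-memory database
conn.execute("CREATE TABLE rocks (id INTEGER, name TEXT)")
conn.execute("INSERT INTO rocks VALUES (1, 'granite'), (2, 'basalt')")
print(database_query(conn, "SELECT id, name FROM rocks ORDER BY id"))
```

Passing the connection object as an argument corresponds to passing a DBConnectionToken between actors: the query step never needs the connection details themselves.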

Querying Example


Resource Description Framework (RDF)

Simple data model that consists of
– Resources (uniquely identified via URIs)
– Properties
– Values (resources or character strings)

Data organized into triples (subject, property, value)

[Diagram: SonomaRegion (subject, a resource) is connected by the property locatedIn (a resource) to CaliforniaRegion (value, a resource)]

locatedIn(SonomaRegion, California)
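The triple model is simple enough to sketch with plain Python tuples. The first triple follows the slide's example; the other two, and the `match` helper, are hypothetical additions for illustration, and short names stand in for full URIs.

```python
# Triples as (subject, property, value); short names stand in for URIs.
TRIPLES = {
    ("SonomaRegion", "locatedIn", "CaliforniaRegion"),
    ("NapaRegion", "locatedIn", "CaliforniaRegion"),  # hypothetical extra triple
    ("CaliforniaRegion", "locatedIn", "USRegion"),    # hypothetical extra triple
}


def match(subject=None, prop=None, value=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {
        (s, p, v)
        for (s, p, v) in TRIPLES
        if subject in (None, s) and prop in (None, p) and value in (None, v)
    }


print(match(value="CaliforniaRegion"))
```

Pattern matching over triples of this kind is the basic building block of RDF query languages.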


RDF Schema

Adds a set of pre-defined properties to define classes and properties

Allows instances to be connected to classes

Sub-class and sub-property (is-a) relationships

[Diagram: SonomaRegion locatedIn CaliforniaRegion; both have rdf:type Region]

Region is a class
locatedIn is a property
locatedIn connects Regions
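What the schema buys can be sketched as a small type-inference routine: if locatedIn connects Regions (its domain and range are Region), then anything appearing on either side of locatedIn can be typed as a Region, and types propagate up sub-class links. The code below is a simplified illustration of RDFS-style inference, not a complete implementation; `WineRegion` and the `infer_types` helper are hypothetical.

```python
# Schema triples; "WineRegion" is a hypothetical subclass added for illustration.
SCHEMA = {
    ("locatedIn", "rdfs:domain", "Region"),
    ("locatedIn", "rdfs:range", "Region"),
    ("WineRegion", "rdfs:subClassOf", "Region"),
}
DATA = {
    ("SonomaRegion", "locatedIn", "CaliforniaRegion"),
    ("SonomaRegion", "rdf:type", "WineRegion"),
}


def infer_types(schema, data):
    """Derive (instance, class) pairs from domain, range, and subclass declarations."""
    types = {(s, o) for (s, p, o) in data if p == "rdf:type"}
    for (s, p, o) in data:
        for (prop, kind, cls) in schema:
            if prop == p and kind == "rdfs:domain":
                types.add((s, cls))
            elif prop == p and kind == "rdfs:range":
                types.add((o, cls))
    changed = True
    while changed:  # propagate types up the subclass hierarchy until nothing new appears
        changed = False
        for (inst, cls) in list(types):
            for (sub, kind, sup) in schema:
                if kind == "rdfs:subClassOf" and sub == cls and (inst, sup) not in types:
                    types.add((inst, sup))
                    changed = True
    return types


print(sorted(infer_types(SCHEMA, DATA)))
```

Here CaliforniaRegion is typed as a Region purely because it is the value of a locatedIn triple.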


OWL

Adds additional pre-defined properties to further constrain an ontology
(See http://www.w3.org/TR/owl-guide/)

Note: RDF(S) and OWL use XML
Some graphic tools exist (e.g., Protégé)

<owl:Class rdf:ID="Vintage">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasVintageYear"/>
      <owl:cardinality>1</owl:cardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>

A Vintage is a class that is a subclass of an unnamed class whose instances always have one hasVintageYear property.

Note the uglified XML syntax… The good news: meant for parsers, not humans!
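Read as a data-quality rule, the cardinality restriction says that every Vintage has exactly one hasVintageYear value. The sketch below checks that rule over a list of triples. It is a closed-world simplification: an OWL reasoner works under the open-world assumption and would not treat a missing value as a violation. The instance names are hypothetical; only Vintage and hasVintageYear come from the example.

```python
def violates_cardinality(instance, prop, triples, required=1):
    """True if `instance` does not have exactly `required` distinct values for `prop`."""
    values = {v for (s, p, v) in triples if s == instance and p == prop}
    return len(values) != required


# Hypothetical instance data; only the property name comes from the slide.
TRIPLES = [
    ("Merlot1998", "hasVintageYear", "1998"),
    ("MysteryWine", "hasVintageYear", "1998"),
    ("MysteryWine", "hasVintageYear", "1999"),
]

for wine in ("Merlot1998", "MysteryWine", "NoYearWine"):
    print(wine, violates_cardinality(wine, "hasVintageYear", TRIPLES))
```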
