Scientific Data Integration with Model-Based Mediation: Databases Meets* Knowledge Representation

Page 1: Scientific Data Integration with Model-Based Mediation: Databases Meets* Knowledge Representation, Bertram Ludäscher, LUDAESCH@SDSC.EDU

Scientific Data Integration with Model-Based Mediation:

Databases Meets* Knowledge Representation

Bertram Ludäscher, LUDAESCH@SDSC.EDU

Knowledge-Based Integration Lab

Data and Knowledge Systems

San Diego Supercomputer Center

U.C. San Diego

* or rather rediscovers

Page 2

Integration Example from the Database Community

User: “Where can I get the cheapest copy (including shipping cost) of Wittgenstein’s Tractatus Logico-Philosophicus within a week?”

Diagram: information integration through the addall.com mediator (“One-World” Mediation) over amazon.com, A1books.com, half.com, and barnes&noble.com.
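To make the one-world case concrete, here is a minimal Python sketch of what a mediator like addall.com has to compute. It assumes each bookstore wrapper already returns offers in one shared schema; the `Offer` fields, the prices, and the `cheapest_offer` helper are hypothetical, not taken from any real store's API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Offer:
    store: str
    price: float        # item price in USD
    shipping: float     # shipping cost in USD
    delivery_days: int  # promised delivery time in days

    @property
    def total(self) -> float:
        return self.price + self.shipping


def cheapest_offer(sources: Iterable[List[Offer]], max_days: int = 7) -> Optional[Offer]:
    """One-world mediation: every source exports the same Offer schema, so the
    mediator can union the offers, filter by delivery time, and take the minimum."""
    candidates = [o for src in sources for o in src if o.delivery_days <= max_days]
    return min(candidates, key=lambda o: o.total, default=None)


# Hypothetical wrapper outputs for the four stores named on the slide.
amazon = [Offer("amazon.com", 12.50, 3.99, 5)]
a1books = [Offer("A1books.com", 6.00, 3.00, 10)]
half = [Offer("half.com", 8.75, 3.49, 6)]
bn = [Offer("barnes&noble.com", 11.00, 0.00, 4)]

best = cheapest_offer([amazon, a1books, half, bn])
print(best.store, best.total)  # the A1books offer has the lowest total but is too slow
```

Because all sources share one schema, integration reduces to a union followed by a selection and an aggregate; no semantic reconciliation is needed.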

Page 3

Another Well-Known Data Integration Example

What houses for sale under $500k have at least 2 bathrooms, 2 bedrooms, a nearby school ranking in the upper third, in a neighborhood with below-average crime rate and diverse population?

Diagram: information integration (“Multiple-Worlds” Mediation) over Realtor, Demographics, School Rankings, and Crime Stats sources.
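In the multiple-worlds case the sources describe different things, and the mediator must join them on a shared key. A minimal sketch, assuming that key is a neighborhood name and that each source's answer has already been fetched; all data, the thresholds `UPPER_THIRD` and `MIN_DIVERSITY`, and the helper names are hypothetical.

```python
from typing import List

# Hypothetical extracts from four independent sources ("worlds").
listings = [  # Realtor: (listing_id, price_usd, bedrooms, bathrooms, neighborhood)
    ("h1", 450_000, 3, 2, "Northpark"),
    ("h2", 480_000, 2, 2, "Eastview"),
    ("h3", 520_000, 4, 3, "Northpark"),
    ("h4", 390_000, 2, 1, "Westgate"),
]
school_percentile = {"Northpark": 80, "Eastview": 70, "Westgate": 90}      # School Rankings
crime_rate = {"Northpark": 2.1, "Eastview": 4.0, "Westgate": 1.5}          # Crime Stats
diversity_index = {"Northpark": 0.62, "Eastview": 0.55, "Westgate": 0.30}  # Demographics

UPPER_THIRD = 100 * 2 / 3  # percentile cutoff for "upper third"
MIN_DIVERSITY = 0.5        # assumed threshold for "diverse population"


def matching_houses() -> List[str]:
    avg_crime = sum(crime_rate.values()) / len(crime_rate)
    result = []
    for hid, price, beds, baths, hood in listings:
        if price >= 500_000 or beds < 2 or baths < 2:
            continue  # constraints answerable by the Realtor source alone
        # The neighborhood name is the only shared key ("glue") across the worlds.
        if (school_percentile.get(hood, 0) >= UPPER_THIRD
                and crime_rate.get(hood, float("inf")) < avg_crime
                and diversity_index.get(hood, 0.0) >= MIN_DIVERSITY):
            result.append(hid)
    return result


print(matching_houses())
```

The only glue here is string equality on neighborhood names; if the sources named or bounded neighborhoods differently, a join like this would silently miss matches.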

Page 4

National Partnership for Advanced Computational Infrastructure, San Diego Supercomputer Center

Information Integration from a DB Perspective

• Information Integration Challenge
  – Given: data sources S_1, ..., S_k (DBMS, web sites, ...) and user questions Q_1, ..., Q_n that can be answered using the S_i
  – Find: the answers to Q_1, ..., Q_n

• The Database Perspective: source = “database”
  – S_i has a schema (relational, XML, OO, ...)
  – S_i can be queried
  – define virtual (or materialized) integrated views V over S_1, ..., S_k using database query languages
  – questions become queries Q_i against V(S_1, ..., S_k)

• Why a Database Perspective?
  – scalability, efficiency, reusability (declarative queries), ...
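The database perspective can be illustrated with an in-memory SQLite database standing in for two sources. This is a sketch under simplifying assumptions: both "sources" live in one DBMS, and the table names, columns, and rows are hypothetical. The integrated view V reconciles attribute names and units, and the user question becomes a query Q over V.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Two hypothetical sources with different schemas and units.
CREATE TABLE s1_books (isbn TEXT, title TEXT, price REAL);
CREATE TABLE s2_items (item_isbn TEXT, item_title TEXT, cost_cents INTEGER);
INSERT INTO s1_books VALUES ('111', 'Tractatus', 12.5), ('222', 'Investigations', 20.0);
INSERT INTO s2_items VALUES ('111', 'Tractatus', 1099), ('333', 'On Certainty', 850);

-- Virtual integrated view V(S_1, S_2): reconciles names and units; nothing is copied.
CREATE VIEW v_offers AS
  SELECT isbn, title, price, 'S_1' AS source FROM s1_books
  UNION ALL
  SELECT item_isbn, item_title, cost_cents / 100.0, 'S_2' FROM s2_items;
""")

# A user question becomes a query Q against V; SQLite evaluates Q o V(S_1, S_2).
Q = "SELECT title, MIN(price) FROM v_offers GROUP BY isbn, title ORDER BY title"
rows = con.execute(Q).fetchall()
for title, price in rows:
    print(title, price)
```

Replacing `CREATE VIEW` with `CREATE TABLE ... AS SELECT` would give the materialized variant, trading freshness for query speed.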

Page 5

Abstract (XML-Based) Mediator Architecture

Diagram: the USER/Client exchanges XML queries and results with the MEDIATOR. Each source S_1, S_2, ..., S_k sits behind a wrapper that exports an XML view. The mediator holds the integrated view definition IVD(S_1, ..., S_k), which defines the integrated XML view V; a user query Q is answered as Q o V(S_1, ..., S_k).

Page 6

XMAS: XML Matching And Structuring language

Integrated View Definition: “Find publications from amazon.com and DBLP, join on author, group by authors and title”

```
CONSTRUCT <books>
            <book>
              $a1 $t
              <pubs> $p {$p} </pubs>
            </book> {$a1, $t}
          </books>
WHERE <books.book> $a1: <author/> $t: <title/> </> IN WRAP("amazon.com")
  AND <authors.author> $a2: <author/> <pubs> $p: <pub/> </> </> IN WRAP("www...DBLP…")
  AND value($a1) = value($a2)
```

Diagram: XMAS → XMAS Algebra
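The XMAS view above can be emulated with Python's standard `xml.etree.ElementTree` to show what the join and grouping produce. This is an illustrative sketch, not the XMAS implementation or its algebra: the source documents are hypothetical, and grouping is done on string values rather than on element identity.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical wrapper outputs shaped like the XMAS patterns above.
AMAZON = """<books>
  <book><author>Smith</author><title>Data Integration</title></book>
  <book><author>Jones</author><title>Ontologies</title></book>
</books>"""
DBLP = """<authors>
  <author><author>Smith</author><pubs><pub>P1</pub><pub>P2</pub></pubs></author>
  <author><author>Lee</author><pubs><pub>P3</pub></pubs></author>
</authors>"""


def integrated_view(amazon_xml: str, dblp_xml: str) -> ET.Element:
    amazon, dblp = ET.fromstring(amazon_xml), ET.fromstring(dblp_xml)

    # <authors.author> $a2:<author/> <pubs> $p:<pub/> </> </>: index $p by value($a2).
    pubs_by_author = defaultdict(list)
    for entry in dblp.findall("author"):
        a2 = entry.findtext("author")
        pubs_by_author[a2].extend(p.text for p in entry.findall("pubs/pub"))

    # <books.book> $a1:<author/> $t:<title/> </>, joined on value($a1) = value($a2),
    # then grouped by {$a1, $t} while collecting {$p}.
    groups = defaultdict(list)
    for book in amazon.findall("book"):
        a1, t = book.findtext("author"), book.findtext("title")
        if a1 in pubs_by_author:
            groups[(a1, t)].extend(pubs_by_author[a1])

    # CONSTRUCT <books> <book> $a1 $t <pubs> $p {$p} </pubs> </book> {$a1, $t} </books>
    books = ET.Element("books")
    for (a1, t), pubs in groups.items():
        book = ET.SubElement(books, "book")
        ET.SubElement(book, "author").text = a1
        ET.SubElement(book, "title").text = t
        pubs_el = ET.SubElement(book, "pubs")
        for p in pubs:
            ET.SubElement(pubs_el, "pub").text = p
    return books


print(ET.tostring(integrated_view(AMAZON, DBLP), encoding="unicode"))
```

Only the Smith book survives: Jones has no DBLP entry and Lee has no amazon.com entry, which is exactly the inner-join behavior of the `value($a1) = value($a2)` condition.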

Page 7

Information Integration & Mediation for Scientific Data

... a different set of problems (reality) came our way ...

Page 8

A Neuroscientist’s Information Integration Problem

What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?

Diagram: information integration (“Complex Multiple-Worlds” Mediation) over protein localization (NCMIR), neurotransmission (SENSELAB), sequence info (CaPROT), and morphometry (SYNAPSE).

Page 9

A Geoscientist’s Information Integration Problem

What are the distribution and U/Pb zircon ages of A-type plutons in VA? How about their 3-D geometry? How do they relate to host rock structures?

Diagram: information integration (“Complex Multiple-Worlds” Mediation) over Geologic Map (Virginia), GeoChemical, GeoPhysical (gravity contours), GeoChronologic (Concordia), and Foliation Map (structure DB).

Page 10

Information Integration Landscape

Diagram: integration scenarios plotted by conceptual distance (one-world to multiple-worlds) against conceptual complexity/depth (low to high), with regions for DB mediation techniques, Ontologies/KR formalisms, and Model-Based Mediation. Examples shown: addall book-buyer, home-buyer, 24x7 consumer, BLAST, EcoCyc, Cyc, WordNet, GO, NCBI, UMLS, MIA, Entrez, RiboWeb, Tambis; application areas: Bioinformatics, Geoinformatics.

Page 11

What’s the Problem with XML & Complex Multiple-Worlds?

• XML is Syntax
  – DTDs talk about element nesting
  – XML Schema schemas give you data types
  – need anything else? => write comments!

• Domain Semantics is complex:
  – implicit assumptions, hidden semantics
  – sources seem unrelated to the non-expert

• Need Structure and Semantics beyond XML trees!
  – employ richer OO models (UML, EER, ...)
  – make domain semantics and “glue knowledge” explicit
  – use ontologies to fix terminology and conceptualization
  – avoid ambiguities by using formal semantics

Page 12

XML-Based vs. Model-Based Mediation

Diagram comparing the two approaches (both start from raw data):
– XML-based mediation: XML elements (XML models) with structural constraints (DTDs) such as parent, child, sibling, ..., e.g. A = (B*|C),D; B = ...; no domain constraints; integrated view: Integrated-DTD := XML-QL(Src1-DTD,...)
– Model-based mediation: (XML) objects with conceptual models (classes, relations, is-a, has-a, ...; e.g. classes C1, C2, C3 and relation R), logical domain constraints (IF ... THEN ...), and glue maps (DMs, PMs); integrated view: Integrated-CM := CM-QL(Src1-CM,...)

CM ~ {Descr. Logic, ER, UML, RDF/XML(-Schema), …}
CM-QL ~ {F-Logic, DAML+OIL, …}

Page 13

What’s the Glue? What’s in a Link?

• Syntactic Joins (equality)
  – (X,Y) := X.SSN = Y.SSN
  – (X,Y) := X.UMLS-ID = Y.UID

• “Speciality” Joins (similarity)
  – (X,Y,Score) := BLAST(X,Y,Score)

• Semantic/Rule-Based Joins
  – homology, lub: (X,Y,C) := X isa C, Y isa C, BLAST(X,Y,S), S > 0.8
  – rule-based: (X,Y,[produces,B,increased_in]) := X produces B, B increased_in Y.
    e.g., X=-secretase, B=beta amyloid, Y=Alzheimer’s disease
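The join kinds on this slide can be sketched as Python functions over toy data. Everything here is a hypothetical stand-in: the class hierarchy `PARENT`, the `BLAST_SCORE` lookup table (used instead of actually running BLAST), and placeholder identifiers such as `secretase`. The `lub` helper assumes a tree-shaped hierarchy, so the least upper bound is the first ancestor of X that is also an ancestor of Y.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical class hierarchy (child -> parent) and similarity scores.
PARENT = {
    "p1": "calcium_binding_protein",
    "p2": "ncs_family",
    "ncs_family": "calcium_binding_protein",
    "calcium_binding_protein": "protein",
    "p3": "kinase",
    "kinase": "protein",
}
BLAST_SCORE = {("p1", "p2"): 0.93, ("p1", "p3"): 0.40}  # stand-in for BLAST(X, Y, S)


def syntactic_join(xs: List[Dict], ys: List[Dict], kx: str, ky: str) -> List[Tuple[Dict, Dict]]:
    """(X,Y) := X.kx = Y.ky, a plain equality join."""
    return [(x, y) for x in xs for y in ys if x[kx] == y[ky]]


def similarity_join(pairs):
    """(X,Y,Score) := BLAST(X,Y,Score); a lookup table replaces the BLAST call."""
    return [(x, y, BLAST_SCORE[(x, y)]) for (x, y) in pairs if (x, y) in BLAST_SCORE]


def ancestors(c: str) -> List[str]:
    chain = []
    while c in PARENT:
        c = PARENT[c]
        chain.append(c)
    return chain


def lub(x: str, y: str) -> Optional[str]:
    """First ancestor of x that is also an ancestor of y (tree-shaped hierarchy)."""
    ys = set(ancestors(y))
    return next((c for c in ancestors(x) if c in ys), None)


def homology_join(pairs, threshold: float = 0.8):
    """(X,Y,C) := X isa C, Y isa C, BLAST(X,Y,S), S > threshold, with C = lub(X, Y)."""
    return [(x, y, lub(x, y)) for (x, y, s) in similarity_join(pairs)
            if s > threshold and lub(x, y) is not None]


def rule_join(produces, increased_in):
    """(X,Y,[produces,B,increased_in]) := X produces B, B increased_in Y."""
    return [(x, y, ["produces", b, "increased_in"])
            for (x, b) in produces for (b2, y) in increased_in if b == b2]


print(homology_join([("p1", "p2"), ("p1", "p3")]))
print(rule_join([("secretase", "beta_amyloid")], [("beta_amyloid", "alzheimers_disease")]))
```

The rule-based join returns the connecting path as part of the answer, which is what makes such links explainable rather than opaque.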

Page 14

Model-Based Mediation Methodology ...

• Lift sources to export Conceptual Models (CMs): CM(S) = OM(S) + KB(S) + CON(S)

• Object Model OM(S):
  – complex objects (frames), class hierarchy, OO constraints

• Knowledge Base KB(S):
  – explicit representation of (“hidden”) source semantics
  – logic rules over OM(S)

• Contextualization CON(S):
  – situate OM(S) data using “glue maps” (GMs):
    – domain maps DMs (ontology) = terminological knowledge: concepts + roles
    – process maps PMs = “procedural knowledge”: states + transitions
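One way to picture CM(S) = OM(S) + KB(S) + CON(S) is as a small data structure. A minimal sketch, assuming facts are (object, attribute, value) triples, KB(S) rules are Python functions applied until a fixpoint, and CON(S) is a plain mapping from source classes to domain-map concepts; the class names, the stain rule, and the data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Fact = Tuple[str, str, str]  # (object, attribute or relation, value)
Rule = Callable[[Set[Fact]], Set[Fact]]


@dataclass
class ConceptualModel:
    om: Set[Fact]                                      # OM(S): objects and their attributes
    kb: List[Rule] = field(default_factory=list)       # KB(S): logic rules over OM(S)
    con: Dict[str, str] = field(default_factory=dict)  # CON(S): source class -> domain-map concept

    def export(self) -> Set[Fact]:
        facts = set(self.om)
        while True:  # naive fixpoint: apply KB(S) until nothing new is derived
            new = set().union(*(rule(facts) for rule in self.kb)) - facts
            if not new:
                break
            facts |= new
        # CON(S): situate each instance at the domain-map concept its class maps to.
        facts |= {(o, "anchored_at", self.con[c])
                  for (o, a, c) in facts if a == "instance_of" and c in self.con}
        return facts


def stain_rule(facts: Set[Fact]) -> Set[Fact]:
    """Hypothetical 'hidden' source semantics: stain 'anti-P' means the image shows protein P."""
    return {(o, "shows_protein", v[len("anti-"):])
            for (o, a, v) in facts if a == "stain" and v.startswith("anti-")}


cm = ConceptualModel(
    om={("img27", "instance_of", "pcl_region_image"), ("img27", "stain", "anti-NCS1")},
    kb=[stain_rule],
    con={"pcl_region_image": "Purkinje_cell_layer"},
)
for fact in sorted(cm.export()):
    print(fact)
```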

Page 15

... Model-Based Mediation Methodology

• Integrated View Definition (IVD)
  – declarative (logic) rules with object-oriented features
  – defined over CM(S), domain maps, process maps
  – needs “mediation engineers” = domain + KRDB experts

• Knowledge-Based Querying and Browsing (runtime):
  – mediator composes the user query Q with the IVD
    ... rewrites (Q o IVD), sends subqueries to sources
    ... post-processes returned results (e.g., situate in context)
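The runtime steps above (compose, rewrite, send subqueries, post-process) can be sketched for a two-source IVD. The sketch assumes one particular rewriting strategy, pushing the user's selection into the more selective source and passing its bindings to the other (a bind join); the relations, data, and function names are hypothetical, and this is not necessarily how the KIND prototype plans queries.

```python
from typing import List, Set, Tuple

# Hypothetical relations behind two wrappers.
S1_LABELS = [  # S1: label(image, protein, structure)
    ("img1", "NCS1", "Purkinje_cell_layer"),
    ("img2", "NCS1", "CA1"),
    ("img3", "calbindin", "Molecular_layer"),
]
S2_LOCATED_IN = [  # S2: located_in(structure, region)
    ("Purkinje_cell_layer", "cerebellum"),
    ("Molecular_layer", "cerebellum"),
    ("CA1", "hippocampus"),
]


def s2_subquery(region: str) -> Set[str]:
    """Subquery shipped to S2, with the user's selection on Region pushed down."""
    return {s for (s, r) in S2_LOCATED_IN if r == region}


def s1_subquery(structures: Set[str]) -> List[Tuple[str, str, str]]:
    """Subquery shipped to S1, parameterized by bindings from S2 (a bind join)."""
    return [row for row in S1_LABELS if row[2] in structures]


def answer(region: str) -> List[Tuple[str, str, str]]:
    """Evaluate Q o IVD, where
         IVD(Protein, Structure, Region) := label(_, Protein, Structure), located_in(Structure, Region)
         Q(P, S) := IVD(P, S, region) for a fixed region."""
    structures = s2_subquery(region)   # 1. most selective subquery first
    if not structures:
        return []                      #    no need to contact S1 at all
    rows = s1_subquery(structures)     # 2. bind join with S1
    # 3. post-process: project onto the IVD head and attach the region as context
    return sorted({(protein, structure, region) for (_, protein, structure) in rows})


print(answer("cerebellum"))
```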

Page 16

Model-Based Mediator Architecture

Diagram: the USER/Client exchanges CM queries and results (exchanged in XML) with the Mediator Engine (FL rule proc., LP rule proc., graph proc., DDB engine), which evaluates the Integrated View Definition (IVD) defining the integrated CM view. Sources S1, S2, and S3 each sit behind an XML-Wrapper and a CM-Wrapper that exports a source conceptual model (GCM, CM S1/S2/S3), where CM(S) = OM(S) + KB(S) + CON(S). The mediator uses “glue” maps (GMs), namely domain maps (DMs) and process maps (PMs), which provide the semantic context CON(S).

First results: KIND prototype, formal DM semantics, PMs [SSDBM00] [VLDB00] [ICDE01] [NIH-HB01] [EDBT02], ...; BIRN-CC, ...

Page 17

Domain Maps & Ontologies as “Glue Knowledge Sources”

• Domain Map / Ontology
  – conceptualization of relevant entities and relationships
  – formal representation of terminological knowledge

• Use in Model-Based Mediation
  – (derived) concepts as “drop points”, “anchor points”, “context” for source classes
  – compile-time use: view definition, subsumption, classification, ...
  – runtime use: querying/deduction, path queries, ...

• KR Formalisms:
  – Semantic nets, Thesauri, Frame-Logic, Description Logics, ...

Page 18

Domain Experts’ “Glue Knowledge”

Diagram: a domain expert’s concept graph relating Sources 1, 2, and 3 to the concepts Cerebellum, Cerebellar Cortex, Granule Cell Layer, Purkinje Cell Layer, Molecular Layer, Purkinje Neuron, Purkinje Cell Dendrite, Dendritic spines, Dendritic shaft, and Endoplasmic reticulum, linked by “has a” edges.

Page 19

NCMIR ANATOM Domain Map:

• concepts
• relations
• logic rules

Page 20

Formalizing Glue Knowledge: Domain Map for SYNAPSE and NCMIR

• Domain Map = labeled graph with concepts ("classes") and roles ("associations")
• additional semantics: expressed as logic rules (F-logic)

Domain Expert Knowledge: Purkinje cells and Pyramidal cells have dendrites that have higher-order branches that contain spines. Dendritic spines are ion (calcium) regulating components. Spines have ion binding proteins. Neurotransmission involves ionic activity (release). Ion-binding proteins control ion activity (propagation) in a cell. Ion-regulating components of cells affect ionic activity (release).

Figure panels: Domain Map (DM); DM in Description Logic.
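The domain map above can be approximated as a set of labeled edges plus rules evaluated to a fixpoint, in the spirit of the F-logic rules mentioned on the slide. A minimal Python sketch, assuming a hand encoding of the expert statements and three illustrative rules (R1 to R3); the edge labels and rule set are assumptions, not the actual SYNAPSE/NCMIR domain map.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# A hypothetical hand encoding of the expert statements above as labeled edges.
DM: Set[Triple] = {
    ("Purkinje_cell", "has", "dendrite"),
    ("Pyramidal_cell", "has", "dendrite"),
    ("dendrite", "has", "higher_order_branch"),
    ("higher_order_branch", "contains", "spine"),
    ("spine", "isa", "ion_regulating_component"),
    ("spine", "has", "ion_binding_protein"),
    ("neurotransmission", "involves", "ionic_activity"),
    ("ion_binding_protein", "controls", "ionic_activity"),
    ("ion_regulating_component", "affects", "ionic_activity"),
}


def closure(edges: Set[Triple]) -> Set[Triple]:
    """Naive fixpoint of three assumed rules:
    R1  X has Y or X contains Y      => X has_part Y
    R2  X has_part Y, Y has_part Z   => X has_part Z   (transitivity)
    R3  X isa C, C R Z               => X R Z          (inheritance along isa)
    """
    facts = set(edges)
    while True:
        new = {(x, "has_part", y) for (x, r, y) in facts if r in ("has", "contains")}
        for (x, r1, y) in facts:
            for (y2, r2, z) in facts:
                if y != y2:
                    continue
                if r1 == r2 == "has_part":
                    new.add((x, "has_part", z))
                if r1 == "isa":
                    new.add((x, r2, z))
        if new <= facts:
            return facts
        facts |= new


def parts_affecting_ions(facts: Set[Triple]) -> List[str]:
    """Which concepts have a part that affects ionic activity?"""
    return sorted({x for (x, r, y) in facts
                   if r == "has_part" and (y, "affects", "ionic_activity") in facts})


print(parts_affecting_ions(closure(DM)))
```

Neither cell type is directly linked to ionic activity in the edge set; the answer only appears after R3 lets spines inherit `affects` from ion-regulating components and R2 chains the partonomy, which is the kind of "glue" the domain map contributes.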

Page 21

Source Contextualization & DM Refinement

In addition to registering (“hanging off”) data relative to existing concepts, a source may also refine the mediator’s domain map ...

... sources can register new concepts at the mediator ...

Page 22

Query Processing “Demo”

Screenshots: query results in context; contextualization CON(Result) wrt. ANATOM.

Integrated View Definition:

```
DERIVE protein_distribution(Protein, Organism, Brain_region, Feature_name, Anatom, Value)
IF  I:protein_label_image[
        proteins ->> {Protein};
        organism -> Organism;
        anatomical_structures ->> {AS:anatomical_structure[name -> Anatom]}
    ],                                        % from PROLAB
    NAE:neuro_anatomic_entity[
        name -> Anatom;                       % from ANATOM
        located_in ->> {Brain_region}
    ],
    AS..segments..features[name -> Feature_name; value -> Value].
```

• provided by the domain expert and mediation engineer
• deductive OO language (here: F-logic)
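To show what the protein_distribution rule computes, here is a Python sketch that evaluates the same join over nested dictionaries. It assumes PROLAB and ANATOM objects arrive as dicts whose keys mirror the F-logic attribute names; the data is hypothetical, and the nested loops only mimic the rule's semantics, not how an F-logic engine evaluates it.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical PROLAB objects (protein_label_image) as nested dicts.
PROLAB = [
    {"proteins": {"NCS1"}, "organism": "rat",
     "anatomical_structures": [
         {"name": "Purkinje cell layer",
          "segments": [{"features": [{"name": "area", "value": 12.5}]},
                       {"features": [{"name": "volume", "value": 3.0}]}]},
         {"name": "unmapped structure", "segments": []},
     ]},
]
# Hypothetical ANATOM objects (neuro_anatomic_entity).
ANATOM = [{"name": "Purkinje cell layer", "located_in": {"cerebellum"}}]


def protein_distribution(images: List[Dict], anatom: List[Dict]) -> Set[Tuple]:
    nae_by_name = {nae["name"]: nae for nae in anatom}  # index ANATOM on name -> Anatom
    rows = set()
    for img in images:                                   # I:protein_label_image[...]
        for protein in img["proteins"]:                  #   proteins ->> {Protein}
            for struct in img["anatomical_structures"]:  #   anatomical_structures ->> {AS}
                nae = nae_by_name.get(struct["name"])    # join: both have name -> Anatom
                if nae is None:
                    continue
                for region in nae["located_in"]:         # located_in ->> {Brain_region}
                    for seg in struct["segments"]:       # AS..segments..features
                        for feat in seg["features"]:
                            rows.add((protein, img["organism"], region,
                                      feat["name"], struct["name"], feat["value"]))
    return rows


for row in sorted(protein_distribution(PROLAB, ANATOM)):
    print(row)
```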

Page 23

Process Maps with Abstractions and Elaborations: => From Terminological to “Procedural Glue”

• nodes ~ states
• edges ~ processes, transitions
• blue/red edges: processes in Src1/Src2
• general form of edges:

Figure callout: how about these?
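A process map can be treated as a directed graph whose edges carry a process name and the source that reported it. A minimal sketch, assuming an abstract transition is elaborated by any chain of other edges connecting the same two states; the states, process names, and `elaborate` helper are hypothetical.

```python
from collections import deque
from typing import List, Optional, Tuple

Edge = Tuple[str, str, str, str]  # (from_state, process, to_state, source)

# Hypothetical process map: Src1 records one coarse transition S0 -> S3,
# while finer-grained steps are spread over Src1 and Src2.
EDGES: List[Edge] = [
    ("S0", "p_abstract", "S3", "Src1"),
    ("S0", "p1", "S1", "Src1"),
    ("S1", "p2", "S2", "Src2"),
    ("S2", "p3", "S3", "Src2"),
]


def elaborate(edge: Edge, edges: List[Edge]) -> Optional[List[Edge]]:
    """Breadth-first search for a shortest chain of other edges leading from the
    edge's start state to its end state, i.e. one candidate elaboration of it."""
    start, _, goal, _ = edge
    others = [e for e in edges if e != edge]
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, chain = queue.popleft()
        if state == goal and chain:
            return chain
        for e in others:
            if e[0] == state and e[2] not in seen:
                seen.add(e[2])
                queue.append((e[2], chain + [e]))
    return None


chain = elaborate(EDGES[0], EDGES)
print([p for (_, p, _, _) in chain], sorted({src for (*_, src) in chain}))
```

The elaboration found here mixes Src1 and Src2 edges, which is the situation process maps are meant to glue together. Collapsing such a chain back into a single edge gives the corresponding abstraction.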

Page 24

What’s in an Answer? (What’s in a Link? revisited)

• Semantic/Rule-Based Joins
  – rule-based: (X,Y,[produces,B,increased_in]) := X produces B, B increased_in Y.
    e.g., X=-secretase, B=beta amyloid, Y=Alzheimer’s disease

• What is the Erdős number of person P?
  – 3

• Really? Why?
  – authority based: <VIP> said so
  – faith based: don’t know but believe firmly
  – query statement Q = ... derived it from DB
  – query Q = ... derived it from DB and KB using derivation D
  => logic-based systems often “come with explanations”
  => ultimate goal: “computations as proofs”, “explanation-based computing”
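The Erdős example can be made concrete with a breadth-first search that returns not only the number but also the chain of coauthorship facts used, a simple form of the derivation D that explains the answer. The coauthorship data and the `erdos_number` helper are hypothetical.

```python
from collections import deque
from typing import List, Optional, Tuple

# Hypothetical coauthorship facts: (author, coauthor, paper).
COAUTHORED = [
    ("Erdos", "A", "paper1"),
    ("A", "B", "paper2"),
    ("B", "P", "paper3"),
    ("Erdos", "C", "paper4"),
]


def erdos_number(person: str) -> Tuple[Optional[int], List[Tuple[str, str, str]]]:
    """Breadth-first search from Erdos; returns (number, derivation), where the
    derivation lists the coauthorship facts that justify the number."""
    graph = {}
    for x, y, paper in COAUTHORED:  # coauthorship is symmetric
        graph.setdefault(x, []).append((y, paper))
        graph.setdefault(y, []).append((x, paper))
    queue = deque([("Erdos", [])])
    seen = {"Erdos"}
    while queue:
        who, proof = queue.popleft()
        if who == person:
            return len(proof), proof
        for other, paper in graph.get(who, []):
            if other not in seen:
                seen.add(other)
                queue.append((other, proof + [(who, other, paper)]))
    return None, []


number, derivation = erdos_number("P")
print(number)
for step in derivation:
    print(step)
```

The derivation list is what separates this answer from an "authority based" or "faith based" one: each step can be checked against the fact base.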

Page 25

Summary: Mediation Scenarios & Techniques

| Federated Databases | XML-Based Mediation | Model-Based Mediation |
|---|---|---|
| One-World | One-/Multiple-Worlds | Complex Multiple-Worlds |
| Common Schema | Mediated Schema | Common Glue Maps |
| SQL, rules | XML query languages | DOOD query languages |
| Schema Transformations | Syntax-Aware Mappings | Semantics-Aware Mappings |
| Syntactic Joins | Syntactic Joins | “Semantic” Joins via Glue Maps |
| DB expert | DB expert | KRDB + domain expert |

Page 26

Technical Issues and Challenges

• Integration Method and Architecture
  – federated DBs, warehouse/wrapper-mediator approach, GAV/LAV, Grid infrastructure, ...

• Suitable KRDB Formalisms and Frameworks
  – XML, DTDs, XML Schema, XPath, XQuery, ...
  – RDF(S), Ontologies, Description Logics, DAML+OIL, ...
  – querying, deduction, subsumption, classification, ...

• Algorithms and Implementation
  – query composition, rewriting, reasoning, source capabilities, ...

• Information Integration Scenario and Scope
  – simple/complex, single/multiple worlds, ...

Page 27

The Larger Infrastructure / Interoperability Picture

• The Grids
  – Data-Grid (SRB, ...), Computational-Grid (Globus, ...), “Knowledge-Grid”, ...

• The Webs
  – W3C: HTML, XML, Semantic Web (RDF(S), DAML+OIL, ...)

• Service & Protocol-Oriented Architectures
  – WSDL, SOAP, CORBA, EJB, ...

• The Application Level
  – applications (computations + KRDB mediation) are chained together to form analytical “Knowledge” Pipelines:
    • NIH BIRN: LONI, NSF GriPhyN, DOE SciDAC, PDB, ASC, AVIRIS, ...

=> Data => Computations => Analysis => Knowledge =>

Page 28

Thank You!

Questions? Queries?

Page 29

Models and Formal Approaches: Relating Theory to the World

©2000 by John F. Sowa, http://www.jfsowa.com/krbook/, Knowledge Representation: Logical, Philosophical, and Computational Foundations, Brooks/Cole, Pacific Grove, CA.

All models are wrong, but some are useful!

Page 30

Ontologies

• So what is an Ontology?
  – definition of things that are relevant to your application
  – representation of terminological knowledge (“TBox”)
  – explicit specification of a conceptualization
  – concept hierarchy (“is-a”)
  – further semantic relationships between concepts
  – abstractions of relational schemas, (E)ER, UML classes, XML Schemas

• Examples:
  – NCMIR ANATOM
  – GO (Gene Ontology)
  – UMLS (Unified Medical Language System)
  – CYC

Page 31

Description Logics

• Terminological Knowledge (TBox)
  – Concept Definition (naming of concepts):
  – Axiom (constraining of concepts):
  => a mediator’s “glue knowledge source”

• Assertional Knowledge (ABox)
  – the marked neuron in image 27
  => the concrete instances/individuals of the concepts/classes that your sources export
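A TBox/ABox split can be sketched in a few lines if the TBox is restricted to atomic inclusion axioms (A ⊑ B). This is a deliberate simplification: real description logic reasoners such as FaCT also handle concept constructors (conjunction, existential and universal restrictions) that this sketch ignores. The concept and individual names are hypothetical.

```python
from typing import List, Set

# TBox: hypothetical atomic inclusion axioms, concept -> its direct subsumers (A ⊑ B).
TBOX = {
    "Purkinje_neuron": {"Neuron"},
    "Granule_cell": {"Neuron"},
    "Neuron": {"Cell"},
}
# ABox: concept assertions C(a) about individuals that sources export.
ABOX = {("marked_neuron_img27", "Purkinje_neuron"), ("cell_42", "Granule_cell")}


def subsumers(concept: str) -> Set[str]:
    """All named concepts subsuming `concept` (reflexive-transitive closure of ⊑)."""
    result, stack = {concept}, [concept]
    while stack:
        for sup in TBOX.get(stack.pop(), ()):
            if sup not in result:
                result.add(sup)
                stack.append(sup)
    return result


def instances(concept: str) -> List[str]:
    """Instance retrieval: individuals a with C(a) in the ABox and C ⊑ concept."""
    return sorted(a for (a, c) in ABOX if concept in subsumers(c))


print(instances("Neuron"))
```

Asking for `Neuron` returns both individuals even though neither is asserted to be a `Neuron` directly; the TBox supplies that inference, which is why it can serve as the mediator's glue knowledge.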

Page 32

Description Logic

• DL definition of “Happy Father” (Example from Ian Horrocks, U Manchester, UK)

Page 33

Some Open Database & Knowledge Representation Issues

• Mix of Query Processing and Reasoning
  – FaCT description logic reasoner for DMs?
  – or reconciliation of DMs via argumentation frameworks (“games”) using well-founded and stable models of logic programs [ICDT97, PODS97, TCS00]

• Modeling “Process Knowledge” => Process Maps
  – formal semantics? (dynamic/temporal/Kripke models?)
  – executable semantics? (Statelog?)

• Graph Queries over DMs and PMs
  – expressible in F-logic [InfSystem98]
  – scalability? (UMLS Domain Map has millions of entries)

• ...