Semantic Mediation and Scientific Workflows, Bertram Ludäscher, Data and Knowledge Systems, San Diego Supercomputer Center, University of California, San Diego



SEEK Kansas 11/02

Page 1: Semantic Mediation and Scientific Workflows

Bertram Ludäscher

Data and Knowledge Systems

San Diego Supercomputer Center

University of California, San Diego

Page 2: Some BIRNing Data Integration Questions

Biomedical Informatics Research Network (BIRN)

• Data Integration Approaches:
– Let’s just share data, e.g., link everything from a web page!
– ... or better, put everything into a relational or XML database
– ... and do remote access using the Grid
– ... or just use Web services!

• Nice try. But:
– “Find the files where the amygdala was segmented.”
– “Which other structures were segmented in the same files?”
– “Did the volume of any of those structures differ much from normal?”
– “What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?”


Page 4: XML-Based (or Relational) vs. Semantic Mediation

• XML-based (or relational) mediation:
– Layers: Raw Data, (XML) Objects, XML Elements, XML Models
– Structural Constraints (DTDs), Parent, Child, Sibling, ..., e.g., A = (B*|C),D; B = ...
– No Domain Constraints
– Integrated-DTD := XML-QL(Src1-DTD,...)

• Semantic mediation:
– Layers: Raw Data, Conceptual Models (concepts C1, C2, C3; relation R)
– Classes, Relations, is-a, has-a, ...
– Logical Domain Constraints (IF THEN)
– “Glue Maps” = Domain & Process Maps (ontologies)
– Integrated-CM := CM-QL(Src1-CM,...)

• CM ~ {Descr. Logic, ER, UML, RDF/XML(-Schema), …}
• CM-QL ~ {F-Logic, DAML+OIL, …}
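The difference between the two columns can be illustrated with a small sketch. It is not the mediator's actual implementation: the `DomainMap` class, its method names, and the anatomy terms below are hypothetical, chosen only to show how is-a and has-a edges in a "glue map" let a query for one concept reach data registered under related concepts, which a purely structural (DTD-level) view cannot express.

```python
from collections import defaultdict


class DomainMap:
    """Toy 'glue map': concepts linked by is-a and has-a edges (hypothetical API)."""

    def __init__(self):
        self.isa = defaultdict(set)  # child concept -> parent concepts
        self.has = defaultdict(set)  # whole -> direct parts

    def add_isa(self, child, parent):
        self.isa[child].add(parent)

    def add_has(self, whole, part):
        self.has[whole].add(part)

    def _closure(self, edges, start):
        """Everything reachable from `start` along `edges` (transitive closure)."""
        seen, stack = set(), [start]
        while stack:
            for nxt in edges[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def ancestors(self, concept):
        return self._closure(self.isa, concept)

    def parts(self, concept):
        return self._closure(self.has, concept)


# Illustrative domain knowledge (assumed, not taken from the slides).
dm = DomainMap()
dm.add_has("cerebellum", "cerebellar_cortex")
dm.add_has("cerebellar_cortex", "purkinje_cell_layer")
dm.add_isa("purkinje_cell", "neuron")

# Source records tagged with the concept they were registered under.
records = [("src1", "purkinje_cell_layer"), ("src2", "amygdala")]

# A query for "cerebellum" is expanded to all of its parts before matching.
scope = {"cerebellum"} | dm.parts("cerebellum")
hits = [rec for rec in records if rec[1] in scope]
```

Neither source mentions "cerebellum" explicitly; the match for `src1` comes only from the has-a chain in the domain map.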

Page 5: Making the SM System “Understand” Your Data: Source Contextualization via Ontology Refinement

In addition to registering (“hanging off”) data relative to existing concepts, a source may also refine the mediator’s domain map ...

Sources can register new concepts at the mediator ...
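A minimal sketch of the two registration modes described on this slide, under stated assumptions: the `Mediator` class, its methods, and the concept and source names are all hypothetical, and the domain map is reduced to a single-parent is-a tree for brevity.

```python
class Mediator:
    """Toy mediator holding a domain map as a concept -> parent tree (hypothetical)."""

    def __init__(self, concepts):
        self.parent = dict(concepts)  # concept -> parent concept (None for a root)
        self.anchors = {}             # concept -> names of sources registered there

    def register_data(self, source, concept):
        """'Hang off' a source's data at an existing concept."""
        if concept not in self.parent:
            raise KeyError(f"unknown concept: {concept}")
        self.anchors.setdefault(concept, set()).add(source)

    def refine(self, new_concept, parent):
        """Let a source add a new, more specific concept below an existing one."""
        if parent not in self.parent:
            raise KeyError(f"unknown parent concept: {parent}")
        if new_concept in self.parent:
            raise ValueError(f"concept already registered: {new_concept}")
        self.parent[new_concept] = parent

    def sources_under(self, concept):
        """Sources registered at `concept` or at any concept below it."""
        found = set()
        for anchor, sources in self.anchors.items():
            node = anchor
            while node is not None:  # walk up towards the root
                if node == concept:
                    found |= sources
                    break
                node = self.parent[node]
        return found


m = Mediator({"neuron": None, "spiny_neuron": "neuron"})
m.register_data("SRC_B", "neuron")                  # data relative to an existing concept
m.refine("medium_spiny_neuron", "spiny_neuron")     # source refines the domain map
m.register_data("SRC_A", "medium_spiny_neuron")     # and registers data at the new concept
```

After the refinement, a query at `spiny_neuron` finds `SRC_A` even though that concept existed before the source joined.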

Page 6: Query Processing Demo

Query results in context

Contextualization CON(Result) wrt. ANATOM.

Mediator View Definition

DERIVE
  protein_distribution(Protein, Organism, Brain_region, Feature_name, Anatom, Value)
WHERE
  I:protein_label_image[ proteins ->> {Protein};
                         organism -> Organism;
                         anatomical_structures ->> {AS:anatomical_structure[name->Anatom]}],   % from PROLAB
  NAE:neuro_anatomic_entity[ name->Anatom;                                                  % from ANATOM
                             located_in->>{Brain_region}],
  AS..segments..features[name->Feature_name; value->Value].

• provided by the domain expert and mediation engineer
• deductive OO language (here: F-logic)
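Read procedurally, the view joins PROLAB images with ANATOM entities on the shared structure name `Anatom`. The sketch below mimics that join in Python for readers unfamiliar with F-logic. It is an illustration, not the mediator's evaluation strategy: the dictionary layout of the two sources and all sample values are assumptions made for this example.

```python
def protein_distribution(prolab_images, anatom_entities):
    """Yield (Protein, Organism, Brain_region, Feature_name, Anatom, Value) tuples."""
    # ANATOM side: structure name -> regions it is located in (set-valued, like ->>).
    located_in = {}
    for entity in anatom_entities:
        located_in.setdefault(entity["name"], set()).update(entity["located_in"])

    # PROLAB side: walk image -> anatomical structure -> segments -> features.
    for image in prolab_images:
        for structure in image["anatomical_structures"]:
            anatom = structure["name"]
            # Join condition: the structure name must also be known to ANATOM.
            for region in sorted(located_in.get(anatom, ())):
                for segment in structure["segments"]:
                    for feature in segment["features"]:
                        for protein in image["proteins"]:
                            yield (protein, image["organism"], region,
                                   feature["name"], anatom, feature["value"])


# Hypothetical sample data; names and numbers are placeholders, not real results.
prolab_images = [{
    "proteins": ["protein_X"],
    "organism": "rat",
    "anatomical_structures": [{
        "name": "purkinje_cell",
        "segments": [{"features": [{"name": "feature_A", "value": 0.5}]}],
    }],
}]
anatom_entities = [{"name": "purkinje_cell", "located_in": ["cerebellum"]}]

rows = list(protein_distribution(prolab_images, anatom_entities))
```

Structures that PROLAB mentions but ANATOM does not know produce no rows, which mirrors the conjunctive WHERE clause of the view.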

Page 7: A Scientific Workflow: Promoter Identification

[Workflow diagram; Matthew Coleman, LLNL, 2002]

• Start: gi#’s from clusfavor (cDNA gi#, Gene name)

• Steps and associated data labels:
– blast human: Genomic gi#, Chr #, Gene location
– blast other species: Genomic gi#, Chr #, Gene location
– TRANSFAC: TAF’s, Location on Genomic gi#’s, Probabilities of match, Probabilities of random match
– GRAIL: GC Island location, Exon/intron location, Repeats location, Promoter location
– Validates polII promoter location
– CLUSTAL: Consensus sequences
– Data Consolidation: promoter location, Shared TAF’s across cluster, Common consensus sequence

• Questions: Are chr#’s in common? Are chr#’s locations in common? Are there conserved upstream sequences? Are gene locations conserved across species?

• Questions: RNA POLII promoter? GpC Island present? Are there common TAF’s across genomic gi#?

• Questions: Are there other common genes?
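A workflow of this shape can be modelled as a dependency graph and ordered before execution. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The step names come from the slide, but the dependency edges are an assumption: the extracted text does not preserve the arrows of the original diagram.

```python
from graphlib import TopologicalSorter

# step -> set of steps it depends on (edges are assumed for illustration).
dependencies = {
    "blast_human": {"clusfavor_gi_list"},
    "blast_other_species": {"clusfavor_gi_list"},
    "transfac": {"blast_human"},
    "grail": {"blast_human"},
    "clustal": {"blast_human", "blast_other_species"},
    "data_consolidation": {"transfac", "grail", "clustal"},
}

# static_order() yields every step after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
position = {step: i for i, step in enumerate(order)}

for step in order:
    print(step)
```

Several valid orders exist (for example, `transfac` and `grail` may run in either order or in parallel), so only the dependency constraints are guaranteed, not one specific sequence.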

Page 8: SDM Demo & Architecture

Translation Approach: Abstract Workflow (AWF) => Executable Workflow (EWF)
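One way to picture the AWF => EWF translation is as a binding step that maps each abstract task to a concrete service. The slide does not describe the actual mechanism, so this is only a sketch: the `AbstractStep` and `ExecutableStep` classes, the `translate` function, and every task and service name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractStep:
    name: str            # step identifier within the workflow
    task: str            # abstract task, independent of any tool
    inputs: tuple = ()   # names of upstream steps


@dataclass(frozen=True)
class ExecutableStep:
    name: str
    service: str         # concrete service bound to the abstract task
    inputs: tuple = ()


def translate(awf, bindings):
    """Turn an abstract workflow into an executable one, step by step."""
    ewf = []
    for step in awf:
        if step.task not in bindings:
            raise LookupError(f"no executable binding for task {step.task!r}")
        ewf.append(ExecutableStep(step.name, bindings[step.task], step.inputs))
    return ewf


awf = [
    AbstractStep("find_genomic_hits", "similarity_search"),
    AbstractStep("align_hits", "multiple_alignment", inputs=("find_genomic_hits",)),
]
# Placeholder service names; a real deployment would bind actual tools here.
bindings = {
    "similarity_search": "local_blast_service",
    "multiple_alignment": "local_clustal_service",
}
ewf = translate(awf, bindings)
```

Keeping the bindings in a separate table means the same abstract workflow can be retargeted to different services without editing its steps.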