
Page 1

Brain Data & Knowledge Grid
(or: Towards Services for Knowledge-Based Mediation of Neuroscience Information Sources)

National Center for Microscopy and Imaging Research (NCMIR)
Mark Ellisman, Maryann Martone, Steve Peltier, Steve Lamont, ...

Data-Intensive Computing Environments, San Diego Supercomputer Center (SDSC)
Reagan Moore, Chaitan Baru, Amarnath Gupta, Bertram Ludäscher, Richard Marciano, Arcot Rajasekar, Ilya Zaslavsky, ...

University of California, San Diego

Page 2

Infrastructure for Sharing Neuroscience Data

[Figure: example data from CCB, Montana SU; surface atlas, Van Essen Lab; NCMIR, UCSD; stereotaxic atlas, LONI; MCell, CNL, Salk]

SOURCES:
• NCMIR, U.C. San Diego
• Caltech Neuroimaging
• Center for Imaging Science, Johns Hopkins
• Center for Computational Biology, Montana State
• Laboratory of Neuro Imaging (LONI), UCLA
• Computational Neurobiology Laboratory, Salk Inst.
• Van Essen Laboratory, Washington University
• …

Data Management Infrastructure (DICE/NPACI)
• MIX: Mediation of Information using XML
• MCAT: information discovery
• SRB: data handling
• HPSS: storage
• ...

Knowledge-based GRID infrastructure ???

Data Management Infrastructure (“Data Grid”): GTOMO, Telemicroscopy, Globus, SRB/MCAT, HPSS

Page 3

Sharing Resources on the Brain Data Grid

• Scientific groups ...
  – create data products (e.g., text data, images, simulation data, …)
  – put them in collections
  – add metadata (who created it, what the data is about, …)
  – make them available for sharing (on the web, in data caches, in HPSS, …)

• Technical challenges ...
  – size & packaging of data
  – heterogeneity: data types, storage technologies, transport mechanisms, authentication, ...
  – access levels: collection, object, fragment; data-specific functions (“data blades”)

• Data Grid technologies can help ...
  – distributed data management, e.g., Storage Resource Broker/Metadata Catalog (SRB/MCAT), computing (Globus), ...
  – focus is on resource sharing (data, networks, cycles)

Page 4

Integration Issue: Semantic Integration/Mediation

??? SEMANTIC INTEGRATION ???

SYNTACTIC/STRUCTURAL Integration (MIX: Mediation of Information using XML)
• Integrated Views (Src-XML => Intgr-XML)
• Schema Integration (DTD => DTD)
• Wrapping, Data Extraction (Text => XML)

SYSTEM INTEGRATION
• storage, query capabilities; protocols & services
• TCP/IP, grid-ftp, HTTP, Globus, JDBC, DOM, CORBA

[Diagram side labels: SRB/MCAT; Distributed Query Processing]

Page 5

Standard Mediator/Wrapper Architecture

[Diagram, top to bottom:]
• Client/User-Query
• INTEGRATED VIEW: integration logic; domain semantics ???
• XML Q/A between the integrated view and the wrappers
• Wrapper, Wrapper, Wrapper: protocol translation; structure, transport, syntax, storage (SRB/MCAT, DOM, X(ML)Query)
• (Neuro)Science (Re)Sources: Lab1, Lab2, Lab3 (DB, Files, WWW)
• GRID federation services ???
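The layering on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Wrapper` and `Mediator` classes, the attribute-equality query form, and the three toy sources are hypothetical and are not taken from MIX or SRB/MCAT.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


@dataclass
class Wrapper:
    """Hides one lab's storage and protocol behind a common record interface."""
    name: str
    fetch: Callable[[], Iterable[Record]]  # hypothetical source-specific access

    def query(self, **equals: object) -> List[Record]:
        # Translate an attribute=value query into a scan over the source.
        return [
            dict(row, source=self.name)
            for row in self.fetch()
            if all(row.get(key) == value for key, value in equals.items())
        ]


class Mediator:
    """Integrated view: sends a client query to every wrapper and merges the answers."""

    def __init__(self, wrappers: List[Wrapper]) -> None:
        self.wrappers = wrappers

    def query(self, **equals: object) -> List[Record]:
        results: List[Record] = []
        for wrapper in self.wrappers:
            results.extend(wrapper.query(**equals))
        return results


# Toy sources standing in for a database, a file collection, and a web page.
lab1 = Wrapper("Lab1", lambda: [{"cell": "purkinje", "protein": "RyR"}])
lab2 = Wrapper("Lab2", lambda: [{"cell": "purkinje", "spines": 12}])
lab3 = Wrapper("Lab3", lambda: [{"cell": "granule", "protein": "NCS-1"}])

mediator = Mediator([lab1, lab2, lab3])
hits = mediator.query(cell="purkinje")
```

Note that the merge is purely structural: the mediator can only combine records that already share an attribute name and value, which is the gap marked "domain semantics ???" on the slide.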

Page 6

The Need for Semantic Integration

Example query: What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?

[Diagram:]
• Sources, each behind a Wrapper: protein localization, morphometry, neurotransmission, Web (CaBP, Expasy)
• ??? Mediator ???, ??? Integrated View ???, ??? Integrated View Definition ???
• Data, relationships, constraints are modeled (CMs)
• Cross-source relationships are modeled
• Semantic (knowledge-based) mediation services
• Cross-source queries

Page 7

Hidden Semantics: Protein Localization

<protein_localization>
  <neuron type="purkinje cell" />
  <protein channel="red">
    <name>RyR</name>
    ….
  </protein>
  <region h_grid_pos="1" v_grid_pos="A">
    <density>
      <structure fraction="0.8">
        <name>spine</name>
        <amount name="RyR">0</amount>
      </structure>
      <structure fraction="0.2">
        <name>branchlet</name>
        <amount name="RyR">30</amount>
      </structure>

[Figure labels: Molecular layer of Cerebellar Cortex; Purkinje Cell layer of Cerebellar Cortex; Fragment of dendrite]

Page 8

Hidden Semantics: Morphometry

<neuron name="purkinje cell">
  <branch level="10">
    <shaft>
      …
    </shaft>
    <spine number="1">
      <attachment x="5.3" y="-3.2" z="8.7" />
      <length>12.348</length>
      <min_section>1.93</min_section>
      <max_section>4.47</max_section>
      <surface_area>9.884</surface_area>
      <volume>7.930</volume>
      <head>
        <width>4.47</width>
        <length>1.79</length>
      </head>
    </spine>
    …

Hidden semantics (annotations on the fragment):
• Branch level beyond 4 is a branchlet
• Must be dendritic because Purkinje cells don’t have somatic spines
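The two annotations are exactly the kind of knowledge that is missing from the XML itself. A minimal sketch of making them explicit as rules, assuming hypothetical function names and the label "main branch" for levels up to 4 (the slide only names the branchlet case); the numeric values are the ones shown in the two XML fragments:

```python
# Rule from the slide: a branch level beyond 4 is a branchlet.
BRANCHLET_MIN_LEVEL = 5


def classify_branch(level: int) -> str:
    """Name the anatomical structure implied by a morphometry branch level."""
    # "main branch" is an assumed label for the non-branchlet case.
    return "branchlet" if level >= BRANCHLET_MIN_LEVEL else "main branch"


def spine_compartment(neuron: str) -> str:
    """Infer the compartment that a spine is attached to."""
    # Rule from the slide: Purkinje cells have no somatic spines,
    # so any spine recorded for them must be dendritic.
    if neuron == "purkinje cell":
        return "dendrite"
    return "unknown"


# Morphometry source: <neuron name="purkinje cell"><branch level="10">
neuron_name = "purkinje cell"
branch_level = 10

# Protein localization source: RyR amount per structure name.
ryr_amount = {"spine": 0, "branchlet": 30}

# The derived term "branchlet" is what lets the two sources be joined.
structure = classify_branch(branch_level)
compartment = spine_compartment(neuron_name)
ryr_on_this_branch = ryr_amount.get(structure)
```

Without the first rule there is no shared value to join on: one source says `level="10"`, the other says `branchlet`.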

Page 9

Knowledge-Based (Semantic) Mediation

• Multiple Worlds Integration Problem:
  – compatible terms not directly joinable
  – complex, indirect associations among attributes
  – unstated integrity constraints

• Approach:
  – a “theory” under which terms can be “semantically joined”
  => lift mediation to the level of conceptual models (CMs)
  => formalize domain knowledge, ICs become rules over CMs
  => Knowledge-Based/Model-Based (Semantic) Mediation

Page 10

XML-Based vs. Model-Based Mediation

XML-based mediation:
• Raw Data => XML Elements => XML Models
• Structural Constraints (DTDs): Parent, Child, Sibling, ...; e.g., A = (B*|C),D  B = ...
• No Domain Constraints
• Integrated-DTD := XML-QL(Src1-DTD,...)

Model-based mediation:
• Raw Data => (XML) Objects => Conceptual Models
• Classes, Relations, is-a, has-a, ... (e.g., classes C1, C2, C3 and relation R)
• Logical Domain Constraints (IF ... THEN ...)
• DOMAIN MAP
• Integrated-CM := CM-QL(Src1-CM,...)

CM ~ {Descr. Logic, ER, UML, RDF/XML(-Schema), …}
CM-QL ~ {F-Logic, OIL, DAML, …}

Page 11

Knowledge-Based Mediator Prototype

[Architecture diagram, top to bottom:]
• USER/Client
• CM (Integrated View)
• Integrated View Definition (IVD); Domain Map (DM)
• Mediator Engine: FL rule proc., LP rule proc., Graph proc., XSB Engine
• Logic API (capabilities); CM Plug-In
• CM Queries & Results (exchanged in XML)
• GCM: CM S1, CM S2, CM S3
• For each source S1, S2, S3: CM-Wrapper and XML-Wrapper

Page 12

Mediation Services: Source Registration (System Issues)

Source
• Data Type: table, tree, file
• Access Protocol: SRB, HTTP, JDBC
• Query Capability: SQL, XMLQL, DOOD, ARC; Selections, SPJ
• Result Delivery: Tuple-at-a-time, Set-at-a-time, Stream, Binary for Viewer
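One way to picture a registration entry is as a small record that the mediator consults before sending a query to a source. The sketch below is an assumption-laden illustration: the `SourceRegistration` class, the `can_answer` check, and the two registered sources are hypothetical; only the field names and example values follow the slide.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class SourceRegistration:
    """System-level facts a source declares when it registers."""
    name: str
    data_type: str                       # "table", "tree", or "file"
    access_protocol: str                 # "SRB", "HTTP", or "JDBC"
    query_capabilities: FrozenSet[str]   # e.g. {"selection", "SPJ"}
    result_delivery: str                 # e.g. "tuple-at-a-time", "stream"


def can_answer(source: SourceRegistration, needed: FrozenSet[str]) -> bool:
    """A source qualifies only if it supports every capability the query needs."""
    return needed <= source.query_capabilities


# Hypothetical registry with two sources of different strength.
registry: List[SourceRegistration] = [
    SourceRegistration("lab_db", "table", "JDBC",
                       frozenset({"selection", "SPJ"}), "set-at-a-time"),
    SourceRegistration("image_files", "file", "SRB",
                       frozenset({"selection"}), "stream"),
]


def sources_for(needed: FrozenSet[str]) -> List[str]:
    """Names of registered sources that can evaluate a query needing `needed`."""
    return [s.name for s in registry if can_answer(s, needed)]


join_capable = sources_for(frozenset({"selection", "SPJ"}))
select_capable = sources_for(frozenset({"selection"}))
```

For a source that cannot evaluate a join itself, the mediator would have to fetch selections and join them on its own side.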

Page 13

Mediation Services: Source Registration (Semantics Issues)

• Domain Map Registration
  – provide concept space/ontology
    • … as a private object (“myANATOM”)
    • … merge with others (give “semantic bridges”)
    • … and check for conflicts

• Conceptual Model Registration
  – schema: classes, associations, attributes
  – domain constraints
  – “put data into context” (linking data to the domain map)

Page 14

ANATOM Domain Map

[Figure: ANATOM domain map]

Page 15

Senselab (Yale) and NCMIR (UCSD) “Semantic Bridge”

anatom_dom(X) :- ( ucsd_has_a(X,_) ; ucsd_has_a(_,X) ; ucsd_isa(X,_) ; ucsd_isa(_,X) ).
senselab_dom(X) :- ( sl_has_a(X,_) ; sl_has_a(_,X) ; sl_isa(X,_) ; sl_isa(_,X) ).

% map Senselab anatom terms to equivalent UCSD ANATOM
sl2ucsd(X,X) :- senselab_dom(X), anatom_dom(X).
sl2ucsd('A',axon).
sl2ucsd('AH',axon).
sl2ucsd('Dad',spiny_branchlet).  % should map to a PATH not just the end of the path
sl2ucsd('Dam',main_branches).    % some of the main_branches based on the branch level
sl2ucsd('Dap',main_branches).
sl2ucsd('Dbd',spiny_branchlet).
sl2ucsd('Dbm',main_branches).
sl2ucsd('Dbp',main_branches).
sl2ucsd('Ded',spiny_branchlet).
sl2ucsd('Dem',main_branches).
sl2ucsd('Dep',main_branches).
sl2ucsd('T',axon).

% keep has_a edge if at least one node is known from UCSD
has_a(X,Y) :- sl2ucsd(_,X), ucsd_has_a(X,Y).
has_a(X,Y) :- sl2ucsd(_,Y), ucsd_has_a(X,Y).
% keep all and only UCSD is_a rels
isa(X,Y) :- ucsd_isa(X,Y).
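For readers less used to logic programs, the same bridge can be read as set operations. The following Python sketch mirrors the rules on this slide; the `ucsd_*` and `sl_*` edge sets are small made-up fragments for illustration only, while the explicit term mapping is copied from the slide.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]

# Hypothetical toy fragments of the two sources' relations.
ucsd_has_a: Set[Edge] = {
    ("purkinje_cell", "axon"),
    ("purkinje_cell", "dendrite"),
    ("dendrite", "main_branches"),
    ("dendrite", "spiny_branchlet"),
    ("spine", "spine_head"),
}
ucsd_isa: Set[Edge] = {("purkinje_cell", "neuron")}
sl_has_a: Set[Edge] = {("purkinje_cell", "A"), ("purkinje_cell", "Dad")}
sl_isa: Set[Edge] = set()


def domain(*relations: Set[Edge]) -> Set[str]:
    """All terms that occur on either side of any edge (anatom_dom / senselab_dom)."""
    return {term for relation in relations for edge in relation for term in edge}


anatom_dom = domain(ucsd_has_a, ucsd_isa)
senselab_dom = domain(sl_has_a, sl_isa)

# Explicit Senselab -> UCSD term mapping, as listed on the slide.
EXPLICIT: Dict[str, str] = {
    "A": "axon", "AH": "axon", "T": "axon",
    "Dad": "spiny_branchlet", "Dbd": "spiny_branchlet", "Ded": "spiny_branchlet",
    "Dam": "main_branches", "Dap": "main_branches", "Dbm": "main_branches",
    "Dbp": "main_branches", "Dem": "main_branches", "Dep": "main_branches",
}


def sl2ucsd() -> Set[Edge]:
    """Identity pairs for shared terms, plus the explicit mapping."""
    pairs = {(term, term) for term in senselab_dom & anatom_dom}
    pairs |= set(EXPLICIT.items())
    return pairs


# Keep a UCSD has_a edge if at least one endpoint is the target of a mapping.
mapped_terms = {ucsd_term for _, ucsd_term in sl2ucsd()}
has_a = {(x, y) for (x, y) in ucsd_has_a if x in mapped_terms or y in mapped_terms}
# Keep all and only the UCSD is_a edges.
isa = set(ucsd_isa)
```

In this toy data the edge `("spine", "spine_head")` is dropped from the merged `has_a` relation because neither endpoint is reachable through the bridge.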

Page 16

Refinement of a Domain Map (Ontology): Putting Data in Context via Registration of new Classes & Relationships

[Domain map diagram. Concept nodes: Neuron, Spiny Neuron, Medium Spiny Neuron, MyNeuron, Compartment, Axon, Soma, Dendrite, MyDendrite, Neurotransmitter, GABA, Substance P, Dopamine R, Neostriatum, Substantia Nigra Pc, Substantia Nigra Pr, Globus Pallidus Int., Globus Pallidus Ext. Edge and connective labels: ALL:has, exp, AND, OR, =]

Page 17

Mediation Services: Integrated View Definition

DERIVE
  protein_distribution(Protein, Organism, Brain_region, Feature_name, Anatom, Value)
FROM
  I:protein_label_image[ proteins ->> {Protein}; organism -> Organism;
      anatomical_structures ->> {AS:anatomical_structure[name -> Anatom]} ],   % from PROLAB
  NAE:neuro_anatomic_entity[ name -> Anatom;                                   % from ANATOM
      located_in ->> {Brain_region} ],
  AS..segments..features[ name -> Feature_name; value -> Value ].

• provided by the domain expert and mediation engineer
• declarative language (here: Frame-logic)
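Read procedurally, the view definition is a join between PROLAB image objects and ANATOM entities on the shared structure name (`Anatom`). A rough Python equivalent is sketched below; the nested-dictionary layout and the sample records are assumptions made for illustration, and only the attribute names follow the F-logic view.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical PROLAB objects, shaped after the attributes used in the view.
prolab_images: List[dict] = [
    {
        "proteins": ["RyR"],
        "organism": "rat",
        "anatomical_structures": [
            {"name": "spiny_branchlet",
             "segments": [{"features": [{"name": "protein_amount", "value": 30}]}]},
            {"name": "unregistered_structure",
             "segments": [{"features": [{"name": "protein_amount", "value": 7}]}]},
        ],
    },
]

# Hypothetical ANATOM entities.
anatom_entities: List[dict] = [
    {"name": "spiny_branchlet", "located_in": ["cerebellum"]},
]

Row = Tuple[str, str, str, str, str, object]


def protein_distribution(images: List[dict], entities: List[dict]) -> Iterator[Row]:
    """Yield (Protein, Organism, Brain_region, Feature_name, Anatom, Value) tuples."""
    # Index ANATOM by name; this sketch assumes entity names are unique.
    by_name: Dict[str, dict] = {entity["name"]: entity for entity in entities}
    for image in images:
        for structure in image["anatomical_structures"]:
            entity = by_name.get(structure["name"])
            if entity is None:
                continue  # no ANATOM entity with this name, so the join fails
            features = [f for seg in structure["segments"] for f in seg["features"]]
            for protein in image["proteins"]:
                for region in entity["located_in"]:
                    for feature in features:
                        yield (protein, image["organism"], region,
                               feature["name"], structure["name"], feature["value"])


rows = list(protein_distribution(prolab_images, anatom_entities))
```

The declarative version leaves the evaluation order to the mediator engine; the loop nesting here is just one possible order.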

Page 18

Example Query Evaluation (I)

• Example: protein_distribution
  – given: organism, protein, brain_region
  – Use DOMAIN-KNOWLEDGE-BASE:
    • recursively traverse the has_a_star paths under brain_region and collect all anatomical_entities
  – Source PROLAB:
    • join with anatomical structures and collect the value of attribute “image.segments.features.feature.protein_amount” where “image.segments.features.feature.protein_name” = protein and “study_db.study.animal.name” = organism
  – Mediator:
    • aggregate over all parents up to brain_region
    • report distribution
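The three steps of this plan can be traced on toy data. In the sketch below the `HAS_A` hierarchy, the `PROLAB` rows, and the choice of summation as the aggregate are all assumptions for illustration; the slide only says that values are aggregated over parents up to brain_region.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Toy domain map (hypothetical has_a edges, parent -> children).
HAS_A: Dict[str, List[str]] = {
    "cerebellum": ["cerebellar_cortex"],
    "cerebellar_cortex": ["molecular_layer", "purkinje_cell_layer"],
    "molecular_layer": ["spiny_branchlet"],
}


def has_a_star(entity: str) -> Set[str]:
    """The entity plus everything reachable from it via has_a edges."""
    seen: Set[str] = set()
    stack = [entity]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(HAS_A.get(node, []))
    return seen


# Toy PROLAB rows: (protein_name, organism, anatomical_structure, protein_amount).
PROLAB: List[Tuple[str, str, str, float]] = [
    ("RyR", "rat", "spiny_branchlet", 30.0),
    ("RyR", "rat", "purkinje_cell_layer", 12.0),
    ("RyR", "mouse", "spiny_branchlet", 99.0),
    ("RyR", "rat", "hippocampus", 5.0),
]


def protein_distribution(organism: str, protein: str, brain_region: str) -> Dict[str, float]:
    # Step 1 (domain knowledge base): collect all entities under brain_region.
    entities = has_a_star(brain_region)
    # Step 2 (source PROLAB): select matching rows and join on structure name.
    direct: Dict[str, float] = defaultdict(float)
    for name, animal, structure, amount in PROLAB:
        if name == protein and animal == organism and structure in entities:
            direct[structure] += amount
    # Step 3 (mediator): roll each amount up to every ancestor (sum is assumed).
    totals: Dict[str, float] = {}
    for entity in entities:
        total = sum(direct.get(part, 0.0) for part in has_a_star(entity))
        if total > 0:
            totals[entity] = total
    return totals


distribution = protein_distribution("rat", "RyR", "cerebellum")
```

The hippocampus row is filtered out in step 2 because it is not under the requested region, and the mouse row because the organism does not match.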

Page 19

Example Query Evaluation (II)

"How does the parallel fiber output (Yale/SENSELAB) relate to the distribution of Ryanodine Receptors (UCSD/NCMIR)?"

@SENSELAB: X1 := select output from parallel fiber;
@MEDIATOR: X2 := “hang off” X1 from Domain Map;
@MEDIATOR: X3 := subregion-closure(X2);
@NCMIR: X4 := select PROT-data(X3, Ryanodine Receptors);
@MEDIATOR: X5 := compute aggregate(X4);

Page 20

Mediation Services: Client Registration

[Diagram: taxonomy of clients]
• Client types: Update Client, Query Client, Fat Result Viewer, Thin Result Viewer
• Update options: CheckData, MergeBeforeInsert, DeriveBeforeInsert
• Query capability: Navigate/Ad-hoc, Query on Schema
• Result viewing options: Client-side Buffer, Client-side Processing, Send Full Data, Server-side Buffer, Context Sensitive, Server-Push/Client-Pull

Page 21

Example Client: Query Formulation and Result Display

• combination of ad hoc and navigational queries
• client-side visualization (left)
• results are shown in semantic context (right)

Page 22

Mediation Services: Semantic Annotation Tools

line drawing ==annotation==> (spatial) database for mediation

Page 23

Mediator Architecture Blueprint

Mediation Services
• Query formulation: user query; integrated view definition
• Source registration: domain knowledge; model & schema; query & computation capabilities
• Query processing: view unfolding; semantic optimization; capability-based rewriting
• Source model lifting: domain knowledge reconciliation; model transformation

Mediator Layer
• Optimizer
• Model Reasoner
• Deductive Engine

Result delivery interface (up API):
• SDLIP, SOAP, ...
• pull (tuple/set-at-a-time, DOM) vs. push (stream)
• synchronous/asynchronous
• direct data/data reference

Query interface (down API):
• SDLIP, SOAP, ...
• (subsets of) SQL, X(ML)-Query, CPL, ...
• DOM
• SRB-based access

Wrapper Layer
• XML Sources, RDB Sources, File Sources, HTML Sources, Spatial Sources, Digital Libraries (Collections)

[Diagram labels: Boston Univ.; NCMIR, UCSD; Yale Univ.; Montana Univ.; SDLIP; ARCIMS]

Page 24

Coming up: Knowledge-Based/Semantic Mediation of Brain Data

[Diagram: Knowledge-Based Mediation over the sources CCB, Montana SU; surface atlas, Van Essen Lab; NCMIR, UCSD; stereotaxic atlas, LONI; MCell, CNL, Salk. Other labels: ANATOM; PROTLOC; Result (VML/SVG); Result (XML/XSLT)]

Page 25

Some Open Issues

• Data/Knowledge Modeling
  – Extensibility: how to handle a source with new data types and operations?
    • Temporal data: instrument readings, video microscopy
    • Spatial data: integrating with spatial database systems
    • Image database systems
  – Conflict Management
    • Grades of certainty
    • Alternate hypotheses

• Integrating Services
  – Registration and warping of my image slice to a reference

• Integrating into Larger Applications
  – MCell simulation
  – Telemicroscopy
  – Visualization

Page 26

References

• Model-Based Mediation with Domain Maps, Bertram Ludäscher, Amarnath Gupta, Maryann Martone, Intl. Conference on Data Engineering (ICDE), Heidelberg, 2001.

• Knowledge-Based Mediation of Heterogeneous Neuroscience Information Sources, Amarnath Gupta, Bertram Ludäscher, Maryann Martone, Intl. Conference on Scientific and Statistical Databases (SSDBM), Berlin, 2000.

• Model-Based Information Integration in a Neuroscience Mediator System, Bertram Ludäscher, Amarnath Gupta, Maryann Martone, Intl. Conference on Very Large Data Bases (VLDB), Cairo, 2000.