Brain Data & Knowledge Grid (or: Towards Services for Knowledge-Based Mediation of Neuroscience Information Sources)
National Center for Microscopy and Imaging Research (NCMIR)
Mark Ellisman, Maryann Martone, Steve Peltier, Steve Lamont, ...
Data-Intensive Computing Environments, San Diego Supercomputer Center (SDSC)
Reagan Moore, Chaitan Baru, Amarnath Gupta, Bertram Ludäscher, Richard Marciano, Arcot Rajasekar, Ilya Zaslavsky, ...
University of California, San Diego
Infrastructure for Sharing Neuroscience Data
[Figure labels: CCB, Montana SU; surface atlas, Van Essen Lab; NCMIR, UCSD; stereotaxic atlas, LONI; MCell, CNL, Salk]
SOURCES:
• NCMIR, U.C. San Diego
• Caltech Neuroimaging
• Center for Imaging Science, Johns Hopkins
• Center for Computational Biology, Montana State
• Laboratory of Neuro Imaging (LONI), UCLA
• Computational Neurobiology Laboratory, Salk Inst.
• Van Essen Laboratory, Washington University
• …
Data Management Infrastructure (DICE/NPACI)
• MIX Mediation in XML
• MCAT information discovery
• SRB data handling
• HPSS storage
• ...
Knowledge-based GRID infrastructure: ???
Data Management Infrastructure (“Data Grid”): GTOMO, Telemicroscopy, Globus, SRB/MCAT, HPSS
Sharing Resources on the Brain Data Grid
• Scientific groups ...
– create data products (e.g., text data, images, simulation data, …)
– put them in collections
– add metadata (who created it, what the data is about, …)
– make them available for sharing (on the web, in data caches, in HPSS, …)
• Technical challenges ...
– size & packaging of data
– heterogeneity: data types, storage technologies, transport mechanisms, authentication, ...
– access levels: collection, object, fragment; data-specific functions (“data blades”)
• Data Grid technologies can help ...
– distributed data management, e.g., Storage Resource Broker/Metadata Catalog (SRB/MCAT), computing (Globus), ...
– focus is on resource sharing (data, networks, cycles)
Integration Issue: Semantic Integration/Mediation
[Figure: integration layers, top to bottom]
??? SEMANTIC INTEGRATION ???
SYNTACTIC/STRUCTURAL Integration: MIX (Mediation of Information using XML)
• Integrated Views (Src-XML => Intgr-XML)
• Schema Integration (DTD => DTD)
• Wrapping, Data Extraction (Text => XML)
SYSTEM INTEGRATION: SRB/MCAT, Distributed Query Processing
• storage, query capabilities
• protocols & services: TCP/IP, grid-ftp, HTTP, Globus, JDBC, DOM, CORBA
Standard Mediator/Wrapper Architecture
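[Figure: standard mediator/wrapper architecture, top to bottom]
• Client/User-Query
• INTEGRATED VIEW (mediator: integration logic; GRID federation services ???)
• XML Q/A between mediator and wrappers (SRB/MCAT, DOM, X(ML)Query)
• Wrapper, Wrapper, Wrapper (protocol translation)
• (Neuro)Science (Re)Sources: Lab1, Lab2, Lab3 (DB, Files, WWW)
Levels handled: structure, transport, syntax, storage. Open question: domain semantics ???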
The Need for Semantic Integration
Example query: What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?
[Figure: wrapped sources (protein localization, morphometry, neurotransmission, Web: CaBP, Expasy) below a mediator marked "???", with "??? Integrated View ???" and "??? Integrated View Definition ???"]
• Data, relationships, constraints are modeled (CMs)
• Cross-source relationships are modeled
• Semantic (knowledge-based) mediation services
• Cross-source queries
Hidden Semantics: Protein Localization
<protein_localization>
  <neuron type="purkinje cell" />
  <protein channel="red">
    <name>RyR</name>
    ….
  </protein>
  <region h_grid_pos="1" v_grid_pos="A">
    <density>
      <structure fraction="0.8">
        <name>spine</name>
        <amount name="RyR">0</amount>
      </structure>
      <structure fraction="0.2">
        <name>branchlet</name>
        <amount name="RyR">30</amount>
      </structure>
[Figure labels: Molecular layer of Cerebellar Cortex; Purkinje Cell layer of Cerebellar Cortex; Fragment of dendrite]
Hidden Semantics: Morphometry
<neuron name="purkinje cell">
  <branch level="10">
    <shaft>
      …
    </shaft>
    <spine number="1">
      <attachment x="5.3" y="-3.2" z="8.7" />
      <length>12.348</length>
      <min_section>1.93</min_section>
      <max_section>4.47</max_section>
      <surface_area>9.884</surface_area>
      <volume>7.930</volume>
      <head>
        <width>4.47</width>
        <length>1.79</length>
      </head>
    </spine>
    …
Hidden semantics in this fragment:
• Branch level beyond 4 is a branchlet
• Must be dendritic because Purkinje cells don’t have somatic spines
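The two implicit facts noted on this slide can be written down as explicit rules. The following is a minimal sketch in Python, not the prototype's rule language; the names `Spine`, `anatomical_context`, and `BRANCHLET_MIN_LEVEL` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass

# Hypothetical constant encoding "branch level beyond 4 is a branchlet".
BRANCHLET_MIN_LEVEL = 5


@dataclass
class Spine:
    neuron: str        # value of <neuron name="...">
    branch_level: int  # value of <branch level="...">


def anatomical_context(spine: Spine) -> dict:
    """Derive the anatomical terms that the XML leaves implicit."""
    context = {"is_branchlet": spine.branch_level >= BRANCHLET_MIN_LEVEL}
    # Purkinje cells have no somatic spines, so a spine must be dendritic.
    if spine.neuron == "purkinje cell":
        context["compartment"] = "dendrite"
    return context


print(anatomical_context(Spine(neuron="purkinje cell", branch_level=10)))
```

Once derived, terms such as "branchlet" and "dendrite" give the morphometry data something to join on with the protein localization source, which uses those terms directly.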
Knowledge-Based (Semantic) Mediation
• Multiple Worlds Integration Problem:
– compatible terms not directly joinable
– complex, indirect associations among attributes
– unstated integrity constraints
• Approach:
– a “theory” under which terms can be “semantically joined”
=> lift mediation to the level of conceptual models (CMs)
=> formalize domain knowledge, ICs become rules over CMs
=> Knowledge-Based/Model-Based (Semantic) Mediation
XML-Based vs. Model-Based Mediation
XML-based mediation:
• Raw data => XML elements => XML models
• Structural constraints (DTDs): parent, child, sibling, ...; e.g., A = (B*|C),D  B = ...
• No domain constraints
• Integrated-DTD := XML-QL(Src1-DTD, ...)
Model-based mediation:
• Raw data => (XML) objects => conceptual models
• Classes, relations, is-a, has-a, ... (figure: classes C1, C2, C3 and relation R)
• DOMAIN MAP
• Logical domain constraints (IF ... THEN ... rules)
• Integrated-CM := CM-QL(Src1-CM, ...)
CM ~ {Descr. Logic, ER, UML, RDF/XML(-Schema), …}
CM-QL ~ {F-Logic, OIL, DAML, …}
Knowledge-Based Mediator Prototype
[Figure: knowledge-based mediator prototype]
• USER/Client: CM queries & results (exchanged in XML)
• Mediator engine: FL rule proc., LP rule proc., graph proc., XSB engine
• Mediator inputs: Domain Map (DM), Integrated View Definition (IVD); output: CM (Integrated View)
• Logic API (capabilities); CM plug-in
• Sources S1, S2, S3: each with an XML-Wrapper and a CM-Wrapper, exporting CM S1, CM S2, CM S3 (GCM)
Mediation Services: Source Registration (System Issues)
A registered source is described by:
• Data Type: table, tree, file
• Access Protocol: SRB, HTTP, JDBC
• Query Capability: SQL, XMLQL, DOOD, ARC, selections, SPJ
• Result Delivery: tuple-at-a-time, set-at-a-time, stream, binary for viewer
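A source registration along these four dimensions could be recorded as a small descriptor that the mediator consults before sending a query. This is a sketch under assumptions: the class names, the field names, and the two example sources ("lab1", "lab2") are hypothetical and do not come from the prototype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRegistration:
    name: str
    data_type: str                 # "table", "tree", or "file"
    access_protocol: str           # "SRB", "HTTP", or "JDBC"
    query_capabilities: frozenset  # e.g. {"SQL", "SPJ"}
    result_delivery: str           # e.g. "tuple-at-a-time", "stream"


class Registry:
    """Keeps registrations and answers simple capability lookups."""

    def __init__(self):
        self._sources = {}

    def register(self, source: SourceRegistration) -> None:
        self._sources[source.name] = source

    def sources_supporting(self, capability: str) -> list:
        return sorted(
            s.name for s in self._sources.values()
            if capability in s.query_capabilities
        )


registry = Registry()
# Hypothetical example sources, for illustration only.
registry.register(SourceRegistration(
    "lab1", "table", "JDBC", frozenset({"SQL", "SPJ"}), "tuple-at-a-time"))
registry.register(SourceRegistration(
    "lab2", "tree", "HTTP", frozenset({"XMLQL"}), "set-at-a-time"))

print(registry.sources_supporting("SQL"))
```

Keeping capabilities as data rather than code is one way to let the mediator decide which parts of a query a source can evaluate itself.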
Mediation Services: Source Registration (Semantics Issues)
• Domain Map Registration
– provide concept space/ontology
  • … as a private object (“myANATOM”)
  • … merge with others (give “semantic bridges”)
  • … and check for conflicts
• Conceptual Model Registration
– schema: classes, associations, attributes
– domain constraints
– “put data into context” (linking data to the domain map)
anatom_dom(X) :- (ucsd_has_a(X,_) ; ucsd_has_a(_,X) ; ucsd_isa(X,_) ; ucsd_isa(_,X)).
senselab_dom(X) :- (sl_has_a(X,_) ; sl_has_a(_,X) ; sl_isa(X,_) ; sl_isa(_,X)).
% map Senselab anatom terms to equivalent UCSD ANATOM
sl2ucsd(X,X) :- senselab_dom(X), anatom_dom(X).
sl2ucsd('A',axon).
sl2ucsd('AH',axon).
sl2ucsd('Dad',spiny_branchlet). % should map to a PATH not just the end of the path
sl2ucsd('Dam',main_branches). % some of the main_branches based on the branch level
sl2ucsd('Dap',main_branches).
sl2ucsd('Dbd',spiny_branchlet).
sl2ucsd('Dbm',main_branches).
sl2ucsd('Dbp',main_branches).
sl2ucsd('Ded',spiny_branchlet).
sl2ucsd('Dem',main_branches).
sl2ucsd('Dep',main_branches).
sl2ucsd('T',axon).
% keep has_a edge if at least one node is known from UCSD
has_a(X,Y) :- sl2ucsd(_,X), ucsd_has_a(X,Y).
has_a(X,Y) :- sl2ucsd(_,Y), ucsd_has_a(X,Y).
% keep all and only UCSD is_a rels
isa(X,Y) :- ucsd_isa(X,Y).
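For readers less used to logic programming, the effect of the two `has_a` rules can be mimicked with plain set operations. The sketch below is an illustration in Python, not the XSB program: the term mapping is copied from the facts above, the identity rule `sl2ucsd(X,X)` is left out, and the `UCSD_HAS_A` edges are made-up toy facts because the slide does not list the ANATOM edges.

```python
# Senselab term -> UCSD ANATOM term, copied from the sl2ucsd facts above.
SL2UCSD = {
    "A": "axon", "AH": "axon", "T": "axon",
    "Dad": "spiny_branchlet", "Dbd": "spiny_branchlet", "Ded": "spiny_branchlet",
    "Dam": "main_branches", "Dap": "main_branches", "Dbm": "main_branches",
    "Dbp": "main_branches", "Dem": "main_branches", "Dep": "main_branches",
}

# Toy has_a edges (parent, child); hypothetical, for illustration only.
UCSD_HAS_A = {
    ("dendrite", "main_branches"),
    ("main_branches", "spiny_branchlet"),
    ("spiny_branchlet", "spine"),
    ("neuron", "soma"),
}


def bridged_has_a(mapping, has_a_edges):
    """Keep a has_a edge if at least one endpoint is the image of a Senselab term."""
    known = set(mapping.values())
    return {(x, y) for (x, y) in has_a_edges if x in known or y in known}


bridge = bridged_has_a(SL2UCSD, UCSD_HAS_A)
print(sorted(bridge))
```

With these toy facts the edge between `neuron` and `soma` is dropped, because neither endpoint corresponds to any Senselab term.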
Senselab (Yale) and NCMIR (UCSD) “Semantic Bridge”
[Figure: domain map fragment. Nodes: Neuron, Spiny Neuron, Medium Spiny Neuron, MyNeuron, Compartment, Axon, Soma, Dendrite, MyDendrite, Neurotransmitter, GABA, Substance P, Dopamine R, Neostriatum, Substantia Nigra Pc, Substantia Nigra Pr, Globus Pallidus Int., Globus Pallidus Ext. Edge labels: OR, AND, ALL:has, =, exp]
Refinement of a Domain Map (Ontology): Putting Data in Context via Registration of new Classes & Relationships
Mediation Services: Integrated View Definition
DERIVE
  protein_distribution(Protein, Organism, Brain_region, Feature_name, Anatom, Value)
FROM
  I:protein_label_image[ proteins ->> {Protein}; organism -> Organism;
    anatomical_structures ->> {AS:anatomical_structure[name->Anatom]}],   % from PROLAB
  NAE:neuro_anatomic_entity[name->Anatom;                                  % from ANATOM
    located_in->>{Brain_region}],
  AS..segments..features[name->Feature_name; value->Value].
• provided by the domain expert and mediation engineer
• declarative language (here: Frame-logic)
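Read operationally, the view joins PROLAB images with ANATOM entities on the shared anatomical name and then flattens the nested segment features. The sketch below spells that out in Python under assumptions: the dictionary layout and the sample records are hypothetical stand-ins for the two sources, and the F-logic engine would evaluate the view declaratively rather than with nested loops.

```python
def protein_distribution(images, anatom_entities):
    """Yield (protein, organism, brain_region, feature_name, anatom, value) tuples."""
    # Index ANATOM: anatomical name -> set of brain regions it is located in.
    located_in = {}
    for entity in anatom_entities:
        located_in.setdefault(entity["name"], set()).update(entity["located_in"])

    for image in images:
        for structure in image["anatomical_structures"]:
            anatom = structure["name"]
            # Join condition: same name in PROLAB and ANATOM.
            for region in sorted(located_in.get(anatom, ())):
                for segment in structure["segments"]:
                    for feature in segment["features"]:
                        for protein in image["proteins"]:
                            yield (protein, image["organism"], region,
                                   feature["name"], anatom, feature["value"])


# Hypothetical sample records, for illustration only.
prolab_images = [{
    "proteins": ["RyR"],
    "organism": "rat",
    "anatomical_structures": [{
        "name": "spiny_branchlet",
        "segments": [{"features": [{"name": "protein_amount", "value": 30}]}],
    }],
}]
anatom = [{"name": "spiny_branchlet", "located_in": ["cerebellar_cortex"]}]

rows = list(protein_distribution(prolab_images, anatom))
print(rows)
```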
Example Query Evaluation (I)
• Example: protein_distribution
– given: organism, protein, brain_region
– Use DOMAIN-KNOWLEDGE-BASE:
  • recursively traverse the has_a_star paths under brain_region and collect all anatomical_entities
– Source PROLAB:
  • join with anatomical structures and collect the value of attribute “image.segments.features.feature.protein_amount” where “image.segments.features.feature.protein_name” = protein and “study_db.study.animal.name” = organism
– Mediator:
  • aggregate over all parents up to brain_region
  • report distribution
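The three steps above (traverse the domain map, select matching PROLAB values, aggregate upward) can be sketched as follows. This is an illustration under assumptions, not the mediator's implementation: the function names, the `HAS_A` edges, and the `ROWS` records are hypothetical, and "aggregate over all parents" is interpreted here as summing each entity's own amount with those of its has_a descendants.

```python
from collections import defaultdict


def has_a_star(has_a, root):
    """Return root plus every entity reachable from it via has_a edges."""
    children = defaultdict(list)
    for parent, child in has_a:
        children[parent].append(child)
    seen, stack = {root}, [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def rolled_up_distribution(has_a, rows, organism, protein, brain_region):
    """Sum protein amounts per entity, including all has_a descendants."""
    entities = has_a_star(has_a, brain_region)
    own = defaultdict(int)
    for row in rows:
        if (row["organism"], row["protein"]) == (organism, protein) \
                and row["anatom"] in entities:
            own[row["anatom"]] += row["amount"]
    return {e: sum(own[d] for d in has_a_star(has_a, e)) for e in entities}


# Hypothetical toy data, for illustration only.
HAS_A = [("cerebellar_cortex", "purkinje_cell"),
         ("purkinje_cell", "dendrite"),
         ("dendrite", "spiny_branchlet")]
ROWS = [{"organism": "rat", "protein": "RyR", "anatom": "spiny_branchlet", "amount": 30},
        {"organism": "rat", "protein": "RyR", "anatom": "dendrite", "amount": 5},
        {"organism": "mouse", "protein": "RyR", "anatom": "dendrite", "amount": 7}]

result = rolled_up_distribution(HAS_A, ROWS, "rat", "RyR", "cerebellar_cortex")
print(result)
```

In this sketch the traversal runs at the mediator while the row filter stands in for the query shipped to PROLAB; in a real deployment only the matching rows would cross the network.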
Example Query Evaluation (II)
Query: "How does the parallel fiber output (Yale/SENSELAB) relate to the distribution of Ryanodine Receptors (UCSD/NCMIR)?"
@SENSELAB: X1 := select output from parallel fiber;
@MEDIATOR: X2 := “hang off” X1 from Domain Map;
@MEDIATOR: X3 := subregion-closure(X2);
@NCMIR: X4 := select PROT-data(X3, Ryanodine Receptors);
@MEDIATOR: X5 := compute aggregate(X4);
Mediation Services: Client Registration
[Figure: client registration taxonomy]
• Client types: Update Client, Query Client, Fat Result Viewer, Thin Result Viewer
• Update options: CheckData, MergeBeforeInsert, DeriveBeforeInsert
• Query capability: navigate/ad hoc, query on schema
• Result viewing: client-side buffer, client-side processing, send full data; server-side buffer, context sensitive, server-push/client-pull
Example Client: Query Formulation and Result Display
• combination of ad hoc and navigational queries
• client-side visualization (left)
• results are shown in semantic context (right)
Mediation Services: Semantic Annotation Tools
line drawing ==annotation==> (spatial) database for mediation
Mediator Architecture Blueprint
Sources: XML sources, RDB sources, file sources, HTML sources, spatial sources, digital libraries (collections)
Wrapper Layer
• Query interface (down API):
  – SDLIP, SOAP, ...
  – (subsets of) SQL, X(ML)-Query, CPL, ...
  – DOM
  – SRB-based access
• Result delivery interface (up API):
  – SDLIP, SOAP, ...
  – pull (tuple/set-at-a-time, DOM) vs. push (stream)
  – synchronous/asynchronous
  – direct data/data reference
Mediator Layer (Mediation Services): Optimizer, Model Reasoner, Deductive Engine
• Source registration: domain knowledge; model & schema; query & computation capabilities
• Source model lifting: domain knowledge reconciliation; model transformation
• Query formulation: user query; integrated view definition
• Query processing: view unfolding; semantic optimization; capability-based rewriting
[Figure labels: Boston Univ.; NCMIR, UCSD; Yale Univ.; Montana Univ.; SDLIP; ARCIMS]
Coming up: Knowledge-Based/Semantic Mediation of Brain Data
[Figure: sources (CCB, Montana SU; surface atlas, Van Essen Lab; NCMIR, UCSD; stereotaxic atlas, LONI; MCell, CNL, Salk); Knowledge-Based Mediation (ANATOM, PROTLOC); results rendered as VML/SVG and XML/XSLT]
Some Open Issues
• Data/Knowledge Modeling
– Extensibility: how to handle a source with new data types and operations?
  • Temporal Data: instrument readings, video microscopy
  • Spatial Data: Integrating with spatial database systems
  • Image database systems
– Conflict Management
  • Grades of certainty
  • Alternate Hypothesis
• Integrating Services
– Registration and warping of my image slice to a reference
• Integrating into Larger Applications
– M-Cell simulation
– Telemicroscopy
– Visualization
References
• Model-Based Mediation with Domain Maps, Bertram Ludäscher, Amarnath Gupta, Maryann Martone, Intl. Conference on Data Engineering (ICDE), Heidelberg, 2001.
• Knowledge-Based Mediation of Heterogeneous Neuroscience Information Sources, Amarnath Gupta, Bertram Ludäscher, Maryann Martone, Intl. Conference on Scientific and Statistical Databases (SSDBM), Berlin, 2000.
• Model-Based Information Integration in a Neuroscience Mediator System, Bertram Ludäscher, Amarnath Gupta, Maryann Martone, Intl. Conference on Very Large Data Bases (VLDB), Cairo, 2000.