Describing the Quality of Knowledge Contained in Biological and Medical
Knowledge Bases
Mor Peleg
Department of Management Information Systems
University of Haifa
The Value of Information in Networked Contexts
February 3, 2004
Knowledge Base (KB)
• KB: a centralized repository for information, stored in machine-readable format, usually online
• A KB system contains:
– A knowledge model
– Tools for information collection, organization, and retrieval
Components of a biological model
• Sequence components: DB entries
• Cellular location
• Gene products: normal/mutated
• Biological process: proteolysis, transport, gene regulation
• Molecular function
• Clinical phenotypes
Goals
• Piece together biological data
– Of various types, sources, and quality
• Develop a qualitative model at first
– Data is noisy and incomplete
• Create a quantitative model eventually
• Store knowledge to allow
– systematic evaluation by scientists
– input for computer algorithms
Evaluation of other models
• We evaluated 11 models from biology, business, and software engineering
• We wanted to combine the best aspects of two of the models
Workflow Model + Biomedical Concept Model (TAMBIS+UMLS) → Biological Process Model
• Framework developed using Protégé-2000
Peleg, Yeh, and Altman, Bioinformatics, vol. 18, pp. 825-837, 2002
Mapping business workflow to biological systems
Business Workflow model:
– Process model
– Organizational model: Organizational Unit (Medical School), with member Human in Role (Dean)
Biological Process Model:
– Process model
– Structural model: Biomolecular complex (Replication complex), with member Biopolymer (Helicase) in Role (DNA unwinding)
Graphic dynamic & functional model
Queries:
• All links supported only by speculation
• All processes that are inhibited by neuraminidase and occur in the membrane
• All mutations that cause malfunctioning processes and have a clinical phenotype
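The first two queries above can be sketched over a toy knowledge base. This is only an illustration, not the framework's actual query mechanism: the `Link` and `Process` classes, their field names, the evidence labels, and the placeholder records (`process_A` and so on) are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Link:
    """A hypothetical KB link between two processes, tagged with its evidence."""
    source: str
    target: str
    evidence: frozenset  # e.g. {"experiment"}, {"speculation"}


@dataclass
class Process:
    """A hypothetical KB process record."""
    name: str
    location: str
    inhibitors: set = field(default_factory=set)


# Placeholder records, invented purely to exercise the queries.
links = [
    Link("process_A", "process_B", frozenset({"speculation"})),
    Link("process_B", "process_C", frozenset({"experiment", "speculation"})),
    Link("process_C", "process_D", frozenset({"experiment"})),
]
processes = [
    Process("process_A", "membrane", {"neuraminidase"}),
    Process("process_B", "cytoplasm", {"neuraminidase"}),
    Process("process_C", "membrane"),
]


def links_only_speculative(links):
    """Links whose only supporting evidence is speculation."""
    return [l for l in links if l.evidence == frozenset({"speculation"})]


def processes_inhibited_in(processes, inhibitor, location):
    """Names of processes inhibited by `inhibitor` that occur in `location`."""
    return [
        p.name
        for p in processes
        if inhibitor in p.inhibitors and p.location == location
    ]


speculative = links_only_speculative(links)
membrane_hits = processes_inhibited_in(processes, "neuraminidase", "membrane")
```

Storing the evidence type on each link is what makes the "quality" query possible: a link that also has experimental support is excluded from the speculation-only result.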
Mapping to Petri Nets
• Explicitly represent states
• Verification of properties – liveness, boundedness
• Reasoning on dynamics
– without t1, can we reach P4 from P1?
• Simulation
• Which of 2 models is correct?
[Figure: Petri net with places P1 to P4, transitions t1 to t4, and two AND nodes]
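The reachability question ("without t1, can we reach P4 from P1?") can be answered by a breadth-first search over markings. The sketch below is a minimal illustration under stated assumptions: the arcs of the net in the slide are not recoverable from the transcript, so the topology in `NET` is hypothetical, and `can_reach` is a helper written for this example rather than part of any Petri net tool.

```python
from collections import deque

# Hypothetical topology (the slide's actual arcs are not known):
# transition name -> (input places, output places), all arc weights 1.
NET = {
    "t1": ({"P1"}, {"P2"}),
    "t2": ({"P2"}, {"P3"}),
    "t3": ({"P3"}, {"P4"}),
    "t4": ({"P1"}, {"P3"}),  # assumed bypass from P1 to P3
}


def enabled(marking, inputs):
    """A transition is enabled when every input place holds a token."""
    return all(marking.get(p, 0) >= 1 for p in inputs)


def fire(marking, inputs, outputs):
    """Return the marking obtained by consuming inputs and producing outputs."""
    new = dict(marking)
    for p in inputs:
        new[p] -= 1
    for p in outputs:
        new[p] = new.get(p, 0) + 1
    return {p: n for p, n in new.items() if n > 0}


def can_reach(net, initial, target_place, disabled=frozenset(), max_markings=10_000):
    """Breadth-first search: can any reachable marking put a token in target_place?"""
    start = frozenset(initial.items())
    seen = {start}
    queue = deque([start])
    while queue:
        current = dict(queue.popleft())
        if current.get(target_place, 0) > 0:
            return True
        for name, (inputs, outputs) in net.items():
            if name in disabled or not enabled(current, inputs):
                continue
            nxt = frozenset(fire(current, inputs, outputs).items())
            if nxt not in seen:
                if len(seen) >= max_markings:
                    raise RuntimeError("state space too large (net may be unbounded)")
                seen.add(nxt)
                queue.append(nxt)
    return False


start = {"P1": 1}
with_all = can_reach(NET, start, "P4")
without_t1 = can_reach(NET, start, "P4", disabled={"t1"})
without_t1_t4 = can_reach(NET, start, "P4", disabled={"t1", "t4"})
```

In this assumed net, removing t1 still leaves the path through t4 and t3, while removing both t1 and t4 makes P4 unreachable. The `max_markings` guard is there because an unbounded net has infinitely many markings, which is also why boundedness is listed as a property to verify.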
Conclusion
• Our framework
– pieces together qualitative biological data of various types, sources, and quality
– supports identification of the quality of data in the KB
» Useful for identifying areas for more experimentation
» Useful to decide which conflicting facts are more credible
– supports reasoning (queries)
– enables verification of dynamic properties and simulation (prediction)
Do other models possess the desired properties?
Columns: Model, graphical, nesting, static, function, dynamic, bio info, verify, simulation tools, computational model, query types*
GO + + - 1,3
TAMBIS + + DL 1,3
EcoCyc + + + + frames 1,2,3
Rzhetsky + + + + frames 1,2,3
State-charts + + C + statechart
OMT/UML + + + + C +/- statechart
OPM + + + + I +/- Semi-formal
PIF/PSL + I KIF
BPML + C XML
Workflow + + + + I + + Petri Nets
Petri Net + + I In some + + Petri Nets DMD: 1,2,4,5; Meta: 2,3
our model + + + + I + + + Petri Nets 1-5
C = components, I = integrated
* We considered models that can represent biology-specific information
Participant-Role Diagrams
• Participants: Individual molecule, Complex, Collection
• Roles: Functional role, Disease role
• Relations: role (<role>), Complex-subunit, Collection-participant, Molecule-domain, specialization
Peleg, Gabashvili, and Altman. P IEEE 90(12): 1875-1886, 2002
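The participant and relation types of a Participant-Role Diagram might be represented as plain data classes. This is a sketch only: the class names, field names, and the `all_roles` helper are assumptions for illustration, and treating helicase as an individual molecule inside the replication complex is a simplification of the earlier Helicase / Replication complex / DNA unwinding example.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    name: str
    kind: str  # "functional" or "disease"


@dataclass
class Participant:
    name: str
    roles: list = field(default_factory=list)  # role relation


@dataclass
class IndividualMolecule(Participant):
    domains: list = field(default_factory=list)  # Molecule-domain relation


@dataclass
class Complex(Participant):
    subunits: list = field(default_factory=list)  # Complex-subunit relation


@dataclass
class Collection(Participant):
    participants: list = field(default_factory=list)  # Collection-participant relation


def all_roles(participant):
    """Collect the roles of a participant and, recursively, of its members."""
    found = list(participant.roles)
    members = getattr(participant, "subunits", []) + getattr(participant, "participants", [])
    for member in members:
        found.extend(all_roles(member))
    return found


helicase = IndividualMolecule("Helicase", roles=[Role("DNA unwinding", "functional")])
replication = Complex("Replication complex", subunits=[helicase])
role_names = [r.name for r in all_roles(replication)]
```

The specialization relation is expressed here through subclassing of `Participant`; a fuller model would probably make it an explicit relation as well.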