Auditing Redundant Import in Reuse of a Top Level Ontology for the Drug Discovery Investigations Ontology (DDI)

Zhe He 1, Christopher Ochs 1, Larisa Soldatova 2, Yehoshua Perl 1, Sivaram Arabandi 3, James Geller 1

1 New Jersey Institute of Technology, 2 Brunel University, 3 Ontopro LLC.

ICBO 2013 Workshop on Vaccine and Drug Ontology Studies



Outline

• Introduction

– Environment

– Motivation

– Ontology for Drug Discovery Investigations (DDI)

– Abstraction Networks & Partial Area Taxonomy

• Algorithm Hide

– Hiding Redundant BFO (Basic Formal Ontology) classes from DDI

• Future work

• Conclusions


Environment

• BioPortal: a large repository of over 340 biomedical ontologies covering a wide range of domains.

• Many ontologies in BioPortal are released in OWL or OBO format.

• OWL (Web Ontology Language): based on Description Logic, maintained by a working group of W3C.

• OBO (Open Biological and Biomedical Ontologies) Foundry: a collaborative experiment involving developers of ontologies who are establishing a set of principles for ontology development.


Motivation

• Using a top-level ontology as a template for a domain ontology is recommended.

• OBO Foundry recommends importing BFO (Basic Formal Ontology).

• The top-domain ontologies OGMS (Ontology for General Medical Science) and BioTop (Beisswanger et al. 2008) reuse BFO.

• Some domain ontologies reuse OGMS, thereby indirectly reusing BFO.


Motivation (cont.)

• Ontologies need to go through Quality Assurance before being put to use.

– Discovering modeling errors and inconsistencies in the design

– Unused imported top-level classes diminish the usability of the ontology.

– Currently, there is no mechanism to remove unused imported classes.

– Redundant imported top-level classes should be hidden.


Ontology for Drug Discovery Investigations

• DDI was developed to support automatic drug discovery investigations run by a Robot Scientist “Eve” (Qi et al. 2010).

• DDI is used for reasoning with data about the biological activity of compounds with regard to various drug targets.

• DDI uses BFO (Basic Formal Ontology) and RO (Relation Ontology) as design templates and extends BFO and OBI (Ontology for Biomedical Investigations).

• Some imported BFO classes were left unused in DDI.

– connected_temporal_region

– temporal_instant

– temporal_interval


Abstraction Networks

• An abstraction network is a secondary network that provides a compact view of the structure and content of the primary ontology.

• Abstraction of an ontology is the process by which subsets of classes are each replaced by a higher-level conceptual entity (node).

[Figure: an ontology and its abstraction network; each node of the abstraction network models a subset of classes.]


Partial Area Taxonomy

• A partial area taxonomy is an abstraction network, developed by our research group, that summarizes sets of structurally and semantically similar classes.

• Partial area taxonomies have been derived for

– SNOMED CT (Wang et al. 2007)

– Ontology of Clinical Research (OCRe) (Ochs et al. 2012)

– Sleep Domain Ontology (SDO) (Ochs et al. 2013)

– Cancer Chemoprevention Ontology (CanCo) (He et al. 2013)

– etc.


Area Taxonomy

Area: Set of all classes that are explicitly defined or inferred as being in exactly the domain of a given set of object properties.
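
As a rough illustration of the area definition, the sketch below groups classes by the exact set of object properties for which they are in the domain. The class names, property names, and the `domains` mapping are invented for the example and are not taken from DDI; the sketch also assumes that inferred (inherited) properties have already been added to each class.

```python
from collections import defaultdict


def areas(domains):
    """Group classes by the exact set of object properties they are in the domain of."""
    grouped = defaultdict(set)
    for cls, props in domains.items():
        # frozenset makes the property set usable as a dictionary key.
        grouped[frozenset(props)].add(cls)
    return dict(grouped)


# Hypothetical input: class -> object properties (explicit or inferred).
domains = {
    "entity": set(),
    "continuant": set(),
    "assay": {"has_input"},
    "screening_assay": {"has_input"},
    "compound_library": {"has_input", "has_part"},
}

result = areas(domains)
```

Here the classes with no object properties form one area, and each distinct property set forms another, so `result` holds three areas.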


Partial Area Taxonomy

Root: Class with no superclasses in area

Partial area: Root + all descendants in area
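
The root and partial-area definitions can be sketched as follows. This is only an illustration under stated assumptions: `area` is a set of class names, `parents` maps each class to its direct superclasses, and all names in the example are hypothetical.

```python
def partial_areas(area, parents):
    """Split one area into partial areas: each root plus its descendants in the area."""
    # A root is a class with no superclass inside the area.
    roots = [c for c in area if not (parents.get(c, set()) & area)]

    # Build the child relation restricted to the area.
    children = {c: set() for c in area}
    for c in area:
        for p in parents.get(c, set()) & area:
            children[p].add(c)

    result = {}
    for root in roots:
        members, stack = set(), [root]
        while stack:
            node = stack.pop()
            if node not in members:
                members.add(node)
                stack.extend(children[node])
        result[root] = members
    return result


# Hypothetical area with two roots; "process" and "document" lie outside it.
area = {"assay", "screening_assay", "report", "summary_report"}
parents = {
    "assay": {"process"},
    "screening_assay": {"assay"},
    "report": {"document"},
    "summary_report": {"report"},
}

result = partial_areas(area, parents)
```

In this example the area splits into two partial areas, rooted at `assay` and `report`.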


Algorithm Hide

• Hide is a post-order recursive algorithm that runs in linear time.

• Hide identifies imported classes that are not used in the domain ontology.

• Applicability:

– Ontologies in OWL or OBO format

– Both domain ontology and top-level ontology are trees.

– Top-level ontology does not have object properties.

• A class is redundant if:

– Imported from the top-level ontology AND

– In Root partial area of the taxonomy AND

– A leaf in the domain ontology (at some stage of the algorithm) AND

– Not used as range of an object property
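
The four redundancy conditions can be written as a single predicate. The sketch below is illustrative only: the function name, its parameters, and the plain sets and dictionary used as inputs are assumptions made for the example. The three class names are the unused BFO classes listed earlier; the empty `op_range` is assumed.

```python
def is_redundant(cls, imported, root_partial_area, children, op_range):
    """Return True if cls currently meets all four redundancy conditions."""
    return (
        cls in imported                  # imported from the top-level ontology
        and cls in root_partial_area     # in the root partial area
        and not children.get(cls)        # a leaf at this stage
        and cls not in op_range          # not the range of an object property
    )


imported = {"connected_temporal_region", "temporal_instant", "temporal_interval"}
root_partial_area = set(imported)
children = {"connected_temporal_region": {"temporal_instant", "temporal_interval"}}
op_range = set()  # assumption: none of these classes is an object property range

args = (imported, root_partial_area, children, op_range)
leaf_ok = is_redundant("temporal_instant", *args)
parent_before = is_redundant("connected_temporal_region", *args)

# Once both subclasses are hidden, the parent becomes a leaf and qualifies too.
children["connected_temporal_region"].clear()
parent_after = is_redundant("connected_temporal_region", *args)
```

The leaf test depends on the current state of the ontology, which is why a parent can become redundant only after its subclasses have been hidden.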


Partial Area Taxonomy for DDI


Entity Node of DDI Taxonomy

• 81 classes in Entity root partial area of DDI taxonomy

• BFO has 38 classes.

• 32 out of 81 classes are imported from BFO.

• 6 BFO classes are used as domains of object properties, so they fall outside the Entity root partial area (38 - 6 = 32).

• Hence, we reviewed 32 classes for redundancy.


BFO Classes in Entity Node Before Hiding

Entity (2 children)
    continuant (3 children)
        dependent_continuant (2 children)
        independent_continuant (3 children)
            material_entity (10 children)
                fiat_object_part
                object
                object_aggregate
            object_boundary
            site (3 children)
        spatial_region (4 children)
            one_dimensional_region
            two_dimensional_region
            three_dimensional_region
            zero_dimensional_region
    occurrent (3 children)
        processual_entity (6 children)
            fiat_process_part
            process (2 children)
            process_aggregate
            process_boundary
            processual_context
        spatiotemporal_region (2 children)
            connected_spatiotemporal_region (2 children)
                spatiotemporal_instant
                spatiotemporal_interval
            scattered_spatiotemporal_region
        temporal_region (2 children)
            connected_temporal_region (2 children)
                temporal_instant
                temporal_interval
            scattered_temporal_region

Legend (markers not preserved in this transcript): Leaf; Parent of classes that are all leaves; Grandparent of grandchildren that are all leaves


BFO Classes in Entity Partial Area After Hiding

• 18 unused BFO classes are hidden.

• That is, 18/32 = 56% of the BFO classes in the Entity partial area are hidden.

Entity (2 children)
    continuant (3 children)
        dependent_continuant (2 children)
        independent_continuant (3 children)
            material_entity (10 children)
            site (3 children)
        spatial_region (4 children)
            one_dimensional_region
            two_dimensional_region
            three_dimensional_region
            zero_dimensional_region
    occurrent (3 children)
        processual_entity (6 children)
            process (2 children)


Future Work

• As many as 35 out of 186 ontologies we investigated in BioPortal reuse BFO classes.

• Some ontologies have a Directed Acyclic Graph (DAG) hierarchy, e.g. SDO (Sleep Domain Ontology) (Arabandi 2010).

• Need to consider cases where both top-level and domain ontologies are DAG hierarchies.

• Some top-domain ontologies have object properties, e.g. BioTop.

• Need to design an algorithm that handles redundant import of relationships when top-domain ontologies are reused.


Conclusions

• We described a recursive linear-time algorithm for hiding unused imported top-level ontology classes of an OWL-based ontology.

• The algorithm was demonstrated by hiding 18 (56%) of the imported BFO classes in DDI.

• Hiding of unused imported top-level classes should be part of the Quality Assurance process of OWL-based ontologies.


References

• Qi, D., R. D. King, et al. (2010). "An ontology for description of drug discovery investigations." J Integr Bioinform 7(3).

• Arabandi, S. (2010). "Developing a Sleep Domain Ontology." AMIA TBI/CRI Summit. San Francisco, CA.

• Beisswanger, E., S. Schulz, et al. (2008). "BioTop: An Upper Domain Ontology for the Life Sciences." Appl Ontology 3(4): 205-212.

• Wang, Y., et al. (2007). "Structural methodologies for auditing SNOMED." J Biomed Inform 40(5): 561-581.

• Ochs, C., A. Agrawal, et al. (2012). "Deriving an Abstraction Network to Support Quality Assurance in OCRe." AMIA Annu Symp Proc: 681-689.

• Ochs, C., Z. He, et al. (2013). "Choosing the Granularity of Abstraction Networks for Orientation and Quality Assurance of the Sleep Domain Ontology." The 4th International Conference on Biomedical Ontology Proc.

• He, Z., C. Ochs, et al. (2013). "A Family-based Framework for Supporting Quality Assurance of Biomedical Ontologies in BioPortal." To appear in AMIA Annu Symp Proc.


Thank you!

Any Questions?


Algorithm of Hide

Algorithm Hide(R, O, T, v)
    IF isInternal(O, v) THEN
        FOR EACH Class w IN subclasses(R, v) {
            Hide(R, O, T, w)
        }
    END IF
    IF NOT(isInternal(O, v)) THEN
        IF isClassFrom(v, O, T) AND NOT(in_op_range(v, O)) THEN
            hide(v, O)
        END IF
    END IF
    RETURN

Main Program
    // Initially, call Hide on the root class r of the root partial area R.
    Hide(R, O, T, r)

Function Name           Function Description
isInternal(O, v)        Boolean function that returns true if class v has any subclasses in ontology O.
subclasses(R, v)        Returns iterator to the set of subclasses of class v in root partial area R.
isClassFrom(v, O, T)    Boolean function that returns true if the class v in ontology O is imported from Top-Level ontology T.
in_op_range(v, O)       Boolean function that returns true if class v is in the range of an object property of ontology O.
hide(v, O)              Hides class v from ontology O and therefore also removes all subclass relationships from v.

Notation: O = domain ontology; T = Top-Level ontology; R = root partial area of O; v = class in O.
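
The pseudocode above can be tried out with a small Python sketch. This is not the authors' implementation: the `Ontology` class, its attribute names, and `hide_unused` are hypothetical stand-ins for O and the helper functions in the table, and the class `drug_screening` and the empty `op_range` are invented for the demo. The remaining class names follow the BFO listing shown earlier.

```python
class Ontology:
    """Minimal stand-in for a tree-shaped domain ontology O (hypothetical)."""

    def __init__(self, children, imported, op_range):
        self.children = {c: list(kids) for c, kids in children.items()}
        self.parent = {k: c for c, kids in children.items() for k in kids}
        self.imported = set(imported)  # classes imported from T
        self.op_range = set(op_range)  # classes used as an object property range
        self.hidden = []               # classes hidden so far, in order

    def is_internal(self, v):
        return bool(self.children.get(v))

    def hide_class(self, v):
        # Remove v and its subclass link to its parent.
        p = self.parent.pop(v, None)
        if p is not None:
            self.children[p].remove(v)
        self.children.pop(v, None)
        self.hidden.append(v)


def hide_unused(root_area, onto, v):
    """Post-order traversal: visit subclasses first, then test v itself."""
    if onto.is_internal(v):
        # Copy the list, because hiding a subclass mutates onto.children[v].
        for w in [c for c in onto.children[v] if c in root_area]:
            hide_unused(root_area, onto, w)
    # Re-check: v may have become a leaf after its subclasses were hidden.
    if not onto.is_internal(v):
        if v in onto.imported and v not in onto.op_range:
            onto.hide_class(v)


children = {
    "occurrent": ["processual_entity", "temporal_region"],
    "processual_entity": ["process", "process_boundary"],
    "process": ["drug_screening"],  # hypothetical domain class
    "temporal_region": ["connected_temporal_region", "scattered_temporal_region"],
    "connected_temporal_region": ["temporal_instant", "temporal_interval"],
}
all_classes = set(children) | {k for kids in children.values() for k in kids}
onto = Ontology(children, imported=all_classes - {"drug_screening"}, op_range=set())

hide_unused(all_classes, onto, "occurrent")
```

In this demo the whole `temporal_region` subtree and `process_boundary` end up in `onto.hidden`, while `process` survives because it still has a domain subclass. The second `is_internal` check is deliberately not an `else` branch, matching the two separate IF blocks in the pseudocode.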
