Ontology Mapping Tool for Diabetes By Madhuri Gopal


Posted on 20-Jan-2016








<ul><li><p>Ontology Mapping Tool for Diabetes, by Madhuri Gopal</p></li>
<li><p>Topics covered:</p><p>Project Overview; Design Principles; Technology Stack; Approach and Methodology; Execution Framework; Modules Covered; Results</p></li>
<li><p>Project Overview: Background</p><p>The aim of the project is to overcome semantic heterogeneity on the WWW by using ontology mapping techniques that find the semantic correspondences between similar elements of two ontologies. We aim to map ontologies that are created from standard documents in the diabetes medical domain. Our approach will enable better decision support for queries on these documents.</p><p>Challenges in the existing systems:</p><p>Identifying a safer drug regimen requires searching through a space of indicated regimens that outnumbers the pages Google searches by 1000 to 1.</p><p>A single criterion is insufficient to guide the selection of a safer regimen.</p><p>Clinical data is gathered and stored in a fragmented way.</p><p>There is no formal, standardized knowledge representation of clinical data.</p></li>
<li><p>Design Principles</p><p>Open/Closed Principle: Software entities such as classes, modules and functions should be open for extension but closed for modification.</p><p>Dependency Inversion Principle: a) High-level modules should not depend on low-level modules. Both should depend on abstractions. b) Abstractions should not depend on details. Details should depend on abstractions.</p><p>Interface Segregation Principle: Clients should not be forced to depend upon interfaces that they don't use.</p></li>
<li><p>Design Principles (contd.)</p><p>Single Responsibility Principle: A class should have only one reason to change.</p><p>Liskov Substitution Principle: Derived types must be completely substitutable for their base types.</p></li>
<li><p>Technology Stack</p><p>The architecture followed is a two-tier architecture. 
Front end: Java. Back end: ontology (.owl files).</p></li>
<li><p>Development Hardware</p><p>Processor: Intel(R) Core 2 Duo CPU T6400 @ 2.00 GHz. Memory (RAM): 4 GB. System type: 32-bit operating system.</p><p>Tools used</p><p>Protégé: ontology creation (Stanford open source tool). PDPTools: neural network simulator (Stanford open source tool).</p></li>
<li><p>Approach and Methodology</p><p>Software prototyping (incremental prototyping) is the methodology used for development.</p><p>The final product is built as separate prototypes, which are merged into an overall design at the end.</p><p>Steps: a) Identification of basic requirements. b) Development of the initial prototype. c) Review of the prototype. d) Revision and enhancement of the prototype.</p></li>
<li><p>Execution Framework</p><p>The Eclipse IDE is used as the execution framework.</p><p>All the required plugins (jar files) from protégé/plugins/edu.stanford.smi.protegex.owl and the OWL API (an open source API) are included in the build path of the Java project to access the ontology built with Protégé (Stanford open source tool).</p><p>The IAC (interactive activation and competition) neural network is implemented using the PDPTools suite of neural network software (Stanford tool for Parallel Distributed Processing), which runs in MATLAB. 
All required inputs are taken from the Java environment through a connection between Eclipse and MATLAB.</p></li>
<li><p>Overall Architecture</p></li>
<li><p>Modules covered</p><p>1) Creation of diabetes ontologies from an American Association of Clinical Endocrinologists document (the benchmark document) and from Wikipedia.</p><p>2) Name similarity matrix calculated for all terms in both ontologies using the Levenshtein distance (a dynamic programming technique).</p><p>3) Profile similarity matrix calculated using term frequency-inverse document frequency (tf-idf, a statistical data mining algorithm).</p><p>4) Conversion of ontology terms to a vector space model and computation of the cosine similarity matrix.</p></li>
<li><p>Modules covered (contd.)</p><p>5) Structural similarity matrix for calculating the structural similarity between ontologies using basic structural features such as depth from the root, number of children and number of instances.</p><p>6) Similarity aggregator for aggregating the name similarity, profile similarity and structural similarity.</p><p>7) Harmony function estimation for keeping the most useful similarities and eliminating erroneous ones.</p><p>8) IAC neural network algorithm that solves a constraint satisfaction problem to improve the mapping between the two ontologies.</p></li>
<li><p>Ontology Creation Using Protégé</p></li>
<li><p>Ontology 1</p></li>
<li><p>Ontology 2</p></li>
<li><p>Ontology Mapping</p></li>
<li><p>Ontology Mapping</p><p>Input: two homogeneous ontologies \(O_1\) and \(O_2\) expressed in a formal ontology language (OWL/RDF).</p><p>Output: a 4-tuple \(M(e_{1i}, e_{2j}, r, s)\), where \(M\) is the mapping, \(e_{1i}\) is an element in \(O_1\), \(e_{2j}\) is an element in \(O_2\), \(r\) is the mapping relation between \(e_{1i}\) and \(e_{2j}\), and \(s\) is the confidence measure of the mapping, normalized to \([0, 1]\).</p></li>
<li><p>IR-Based Similarity Generator</p><p>Input: ontologies \(O_1\), \(O_2\).</p><p>Output: three similarity matrices that contain similarity scores for each pair of elements in the ontologies. 
Similarity matrices: name similarity, profile similarity, structural similarity.</p></li>
<li><p>Name Similarity</p><p>This is calculated from the edit distance between the names (IDs) of the elements:</p><p>\(\mathrm{NameSim}(e_{1i}, e_{2j}) = 1 - \dfrac{\mathrm{EditDist}(e_{1i}, e_{2j})}{\max(l(e_{1i}), l(e_{2j}))}\)</p><p>where \(\mathrm{EditDist}\) is the Levenshtein distance between the element names, and \(l(e_{1i})\) and \(l(e_{2j})\) are the lengths of the strings \(e_{1i}\) and \(e_{2j}\).</p></li>
<li><p>Sample output for two ontologies with 6 elements each</p></li>
<li><p>Name similarity matrix of dimension 37 × 26</p></li>
<li><p>Profile Similarity</p><p>The profile similarity is computed in 3 steps:</p><p>Profile enrichment; profile propagation; profile mapping.</p></li>
<li><p>Profile Enrichment and Propagation</p><p>Profile of a class = class ID + comments + property profiles + instance profiles</p><p>Profile of a property = property ID + property domain + property range</p><p>Profile of an instance = instance ID + descriptive information</p></li>
<li><p>Profile Mapping</p><p>The cosine similarity between the profiles of the two elements \(e_{1i}\) and \(e_{2j}\) is calculated in a vector space model. 
\(\mathrm{ProfileSim}(e_{1i}, e_{2j}) = \dfrac{V_{e_{1i}} \cdot V_{e_{2j}}}{|V_{e_{1i}}|\,|V_{e_{2j}}|}\)</p><p>where \(V_{e_{1i}}\) and \(V_{e_{2j}}\) are the vectors representing the profiles of elements \(e_{1i}\) and \(e_{2j}\) respectively.</p></li>
<li><p>Property domain and range of Ontology 1</p></li>
<li><p>Property domain and range of Ontology 2</p></li>
<li><p>Cosine Similarity Matrix</p></li>
<li><p>Structural Similarity</p><p>This applies to classes alone, as they have hierarchical information.</p><p>\(\mathrm{StructSim}(e_{1i}, e_{2j}) = \dfrac{\sum_{k=1}^{N} \bigl(1 - \mathrm{diff}_k(e_{1i}, e_{2j})\bigr)}{N}\)</p><p>where \(e_{1i}\) and \(e_{2j}\) are class elements in the ontologies \(O_1\) and \(O_2\) respectively, \(N\) is the total number of structural features, and \(\mathrm{diff}_k(e_{1i}, e_{2j})\) denotes the difference for feature \(k\):</p><p>\(\mathrm{diff}_k(e_{1i}, e_{2j}) = \dfrac{|sf(e_{1i}) - sf(e_{2j})|}{\max(sf(e_{1i}), sf(e_{2j}))}\)</p><p>where \(sf(e_{1i})\) and \(sf(e_{2j})\) denote the value of a structural feature of the element.</p></li>
<li><p>Identical Ontologies Similarity Calculation</p></li>
<li><p>Structural Similarity Matrix</p></li>
<li><p>Harmony</p><p>Harmony estimates the importance and reliability of the different similarities.</p><p>\(h = \dfrac{\#s_{\max}}{\min(\#e_1, \#e_2)}\)</p><p>where \(\#s_{\max}\) is the number of pairs of elements having the highest similarity in both their row and their column of the similarity matrix, and \(\#e_i\) is the number of elements of ontology \(O_i\).</p></li>
<li><p>Similarity Matrices: Harmony Estimation</p></li>
<li><p>Adaptive Similarity Aggregator</p><p>Input: individual similarity matrices.</p><p>Output: aggregated similarity matrix.</p><p>\(\mathrm{FinalSim}(e_{1i}, e_{2j}) = \dfrac{\sum_{k=1}^{n} h_k \cdot \mathrm{Sim}_k(e_{1i}, e_{2j})}{n}\)</p><p>where \(h_k\) is the harmony of the \(k\)-th similarity matrix and \(n\) is the total number of similarity matrices.</p></li>
<li><p>Final Aggregated Similarity Matrix</p></li>
<li><p>IAC Neural Network with Constraint Satisfaction</p></li>
<li><p>Architecture (figure): H11, H12, …, H1n; Synapsis 1; H21, H22, …, H2n; Synapsis 2; H31, H32, …, H3n.</p></li>
<li><p>Neural Networks Constraint Satisfaction: Sample Output</p></li>
<li><p>Thank You</p></li></ul>
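<p>The name similarity, harmony and adaptive aggregation formulas from the slides can be sketched in a few lines. This is a minimal illustration in Python, not the project's Java implementation: the function names, the plain list-of-lists matrices and the sample term names are all assumptions made for the example, and ties for a row or column maximum are simply counted as maxima, which the slides do not specify.</p>

```python
from typing import List

Matrix = List[List[float]]


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (dynamic programming, row by row)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def name_sim(a: str, b: str) -> float:
    """NameSim = 1 - EditDist(a, b) / max(len(a), len(b))."""
    longest = max(len(a), len(b))
    if longest == 0:  # two empty names: treated as identical (assumption)
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def name_sim_matrix(names1: List[str], names2: List[str]) -> Matrix:
    """One row per element of ontology 1, one column per element of ontology 2."""
    return [[name_sim(a, b) for b in names2] for a in names1]


def harmony(matrix: Matrix) -> float:
    """h = #s_max / min(#e1, #e2).

    #s_max counts cells that are the maximum of both their row and their
    column. Tied maxima are all counted (assumption, not from the slides).
    """
    rows, cols = len(matrix), len(matrix[0])
    row_max = [max(row) for row in matrix]
    col_max = [max(matrix[i][j] for i in range(rows)) for j in range(cols)]
    s_max = sum(1
                for i in range(rows)
                for j in range(cols)
                if matrix[i][j] == row_max[i] and matrix[i][j] == col_max[j])
    return s_max / min(rows, cols)


def aggregate(matrices: List[Matrix]) -> Matrix:
    """FinalSim = sum_k h_k * Sim_k / n, with h_k the harmony of matrix k."""
    n = len(matrices)
    weights = [harmony(m) for m in matrices]
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(h * m[i][j] for h, m in zip(weights, matrices)) / n
             for j in range(cols)]
            for i in range(rows)]


if __name__ == "__main__":
    # Hypothetical term names, not taken from the project's ontologies.
    sim = name_sim_matrix(["Insulin", "Glucose"],
                          ["Insulin", "BloodGlucose", "Diet"])
    print(sim)
    print(harmony(sim))
```

<p>In the pipeline described above, <code>aggregate</code> would receive the name, profile and structural similarity matrices together, so a matrix with low harmony contributes less to the final score.</p>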