
Arab J Sci Eng (2013) 38:859–873
DOI 10.1007/s13369-012-0373-4

RESEARCH ARTICLE – COMPUTER ENGINEERING AND COMPUTER SCIENCE

Bitmap Index in Ontology Mapping for Data Integration

Sharifullah Khan · Muhammad Bilal

Received: 26 June 2010 / Accepted: 17 May 2011 / Published online: 5 October 2012
© King Fahd University of Petroleum and Minerals 2012

Abstract Selecting a relevant data source among the available ones in a data integration system plays a vital role in optimizing query performance. The sources are heterogeneous and autonomous and can join and leave an integration system arbitrarily. Some sources may not contribute significantly to a user query because they are not relevant to it. Executing a user query against all available sources consumes resources unreasonably and makes query processing expensive. Existing techniques for source selection take significant time in traversing source descriptions; consequently, query response time degrades as the number of available sources grows. Semantic heterogeneities of data add further complexity to source selection. As a first step, we employed ontologies to identify the data elements of individual sources that are relevant to particular queries and the semantic relationships among these data elements. Then, we mapped local ontologies to the domain ontology through a bitmap index. Instead of traversing the local ontologies, our proposed system utilizes the bitmap index to perform relevance reasoning in order to optimize the user query response. A prototype system has been designed and implemented to validate the approach, and its evaluation showed that query response time was improved by the incorporation of the bitmap index.

S. Khan (B) · M. Bilal
School of Electrical Engineering and Computer Science,
National University of Sciences and Technology (NUST),
Islamabad, Pakistan
e-mail: [email protected]

M. Bilal
e-mail: [email protected]

Keywords Query processing · Source selection · Relevance reasoning · Ontology · Bitmap index · Semantic similarity · Data integration

1 Introduction

A data integration system provides a uniform query interface that gives a user transparent access for querying remotely located data sources. Retrieving data from these interrelated data sources is a non-trivial task due to their characteristics, i.e. autonomy, heterogeneity, and geographical distribution [1–5]. Moreover, the data sources can join and leave the system arbitrarily, and not all of them may have the required information. A mediated schema for explicit description of the data sources is used in data integration systems [1,2,5–7]. The mediated schema has two


roles: (1) it provides the user access to the data through a uniform query interface that facilitates the formulation of a query over all sources; (2) it serves as a shared vocabulary for the source description (i.e. contents) of every data source. As a first step, we employed an integrated ontology [8–10] as the mediated schema to reconcile the semantic heterogeneities among the data elements, mapping heterogeneous data fragments into a common frame of reference and enabling the correct combination of data from different sources.

A query posed in terms of the mediated schema needs to be reformulated into queries that refer directly to the schemas of the underlying data sources. Reformulation requires a source description that relates each data source to the mediated schema [2,5–7,11]. A source description is the metadata of a data source. This metadata can be further divided into source metadata and content metadata. In order to make the source description of a data source interoperable in a heterogeneous environment, it is required to be described as an ontology [1,5,12]. In this research, we employed an ontology for each data source and called it a local ontology. Generally, query reformulation is further sub-divided into two steps: (a) relevant source selection and (b) query rewriting. The focus of this research is on relevant source selection; interested readers can find details on query rewriting in [13]. Once relevant sources have been identified, query rewriting is performed and source-specific queries are generated only for those sources that have been found relevant and can contribute some results to the user's query.

Ontology mapping is required in data integration systems to map a concept found in the domain ontology into other ontologies for query processing [11]. Executing a user query against all available, remotely located data sources is an expensive solution because an available source may not contribute any significant information to the query result [5,14,15]. Data source selection is vital because executing a query on all available data sources degrades query performance and leads to unreasonable wastage of the data integration system's resources. In this research, we define relevance reasoning as the process of identifying relevant sources on the basis of the semantic similarity of content metadata and pruning irrelevant data sources. Data quality criteria [16] for data source selection, where a user can assign different confidence scores to the available data sources, were not considered in relevance reasoning; all available sources were assumed to be trusted in this research. Our system can be extended to cope with the data quality of sources, so we leave this issue for future work.

In a data integration system where data sources can join and leave dynamically, relevance reasoning for source selection becomes expensive as the number of available sources grows and queries get complex. Relevance reasoning over all the available sources degrades user query performance. The main contribution of this research is to optimize

the source selection process by employing bitmap indexing, since indexing structures are used in databases to optimize data access [17,18]. We used, for the first time, a bitmap index to map local ontologies to the domain ontology. Instead of traversing the local ontologies, our proposed system utilizes the bitmap index to perform relevance reasoning in order to optimize the response of the user query. Our proposed approach ultimately leads to scalable data integration systems where sources can join and leave arbitrarily and the query execution engine synchronizes itself with any change and submits sub-queries only to the relevant and available data sources. We designed and implemented a prototype system for validation and evaluated it through query response time. Query response time in source selection was improved due to the incorporation of bitmap indexing.

The rest of the document is organized as follows: Sect. 2 discusses various state-of-the-art algorithms for source selection and Sect. 3 highlights the proposed system architecture. Section 4 elaborates the proposed semantic matching process and Sect. 5 describes our proposed methodology for source registration, bitmap index management, and relevance reasoning for source selection. Section 6 discusses a walk-through example to illustrate the proposed system. The evaluation of the system is described in Sect. 7. Section 8 concludes the research work and defines future research directions.

2 Related Work

Data integration systems for query processing have attracted significant attention in the literature over the last decade [5,6,11,19–21]. This section discusses and evaluates the state-of-the-art algorithms used in data integration systems for the identification of relevant data sources during query processing. We divide the state of the art broadly into two subsections.

2.1 Source Descriptions Expressed in the Relational Model

The Bucket algorithm was used in the Information Manifold (IM) [13,22], a system for browsing and querying multiple networked information sources. The IM system provides a mechanism to describe the contents and capabilities of data sources in source descriptions expressed in the relational model. The Bucket algorithm uses source descriptions to create query plans that can access several information sources to answer a query. It first computes the buckets, and then reformulates the source-specific queries using the buckets of those data sources that are relevant. The main complexities of the Bucket algorithm include: (a) even if the number of sound data sources is small, the algorithm may generate a large number of candidate solutions and then reject them; (b) the exponential conjunctive query containment test


is used to validate each candidate solution. The Inverse-Rules algorithm was used in InfoMaster [4,13], a system that creates a virtual data warehouse. The Inverse-Rules algorithm rewrites the definitions of data sources by constructing a set of rules that define the contents and capabilities of each data source. Heterogeneities among the data sources are dealt with during rule construction. These rules guide the algorithm in computing records from data sources using the source definitions. The algorithm dynamically determines an efficient way to answer the user's query using as few sources as necessary. In simple words, rather than reformulating the query, it reformulates the source definitions so that the original query can be answered directly over the reformulated rules.

The MiniCon algorithm [13,23] improved on the Bucket algorithm, with a focus on the performance aspects of query reformulation. MiniCon finds the maximally contained rewriting of a conjunctive query using a set of conjunctive views. The algorithm examines the interaction of the variables in the user query and in the source definitions to prune, in advance, the sources that would be rejected later in the containment test. This timely detection of irrelevant data sources improves the performance of MiniCon because fewer combinations need to be checked. The Shared-Variable-Bucket algorithm [24] addresses the deficiencies of the Bucket algorithm and provides an efficient algorithm for query reformulation. The key idea underlying this algorithm is to examine the shared variables and reduce the bucket contents, which in turn reduces the number of view combinations. This reduction ultimately optimizes the second phase of the algorithm.

In the CoreCover algorithm [25], views are materialized from source relations. The main aim of this algorithm is to find those rewritings that are guaranteed to produce an optimal physical plan. Its emphasis is mostly on query optimization, so different cost models are also considered. The algorithm differs from other query reformulation algorithms in the following respects. Firstly, it tries to find an equivalent rewriting, whereas the other algorithms find a maximally contained, source-specific rewriting of the query. Secondly, a closed-world assumption is taken to find the equivalent rewriting, whereas the other algorithms take an open-world assumption. Thirdly, the reformulation stage of query processing has to guarantee an optimal plan for the query.

2.2 Source Descriptions Expressed in the Conceptual Model

The SIMS system [2] uses a knowledge representation system, i.e., a domain ontology and source descriptions. The system directly links local concepts and objects defined in local

terminology to domain concepts and objects in the source description for query reformulation. There is no semantic matching methodology for source selection because local concepts and objects are directly linked to domain concepts and objects. This strategy restricts the scalability and flexibility of the integration system in the case of a large number of sources and a dynamic context. A similar approach for biomedical data sources was used in SEMEDA [26]. The system in [27] uses ontology-based query reformulation to provide a user with meaningful answers to his/her queries; however, this approach is limited to querying a single structured data source and does not reformulate a query into subqueries.

The Transparent Access to Multiple Bioinformatics Information Sources (TAMBIS) system [28,29] supports query generation over diverse data sources through a domain ontology, which provides source transparency. User queries are generated through ontology-based graphical user interfaces. However, the ontology is used in the system primarily as a dictionary and classification of biological concepts, not as a schema [30]; it is not used for schema mapping between the underlying sources. Its procedural mapping in query translation limits its query optimization capabilities, because it does not depend on the sources' query capabilities. Optimization is only performed by reordering query components based on their functions' costs. The Biological and Chemical Information Integration System (BACIIS) [1,31] is an ontology-based integration system. It provides source transparency in query generation over diverse data sources, and the domain ontology is used as the mediated schema for the system. Moreover, the system maintains a source description that maps the schema of each individual source to the domain ontology. It reformulates queries based on data relevance at compile time. However, it uses static query optimization and does not employ cost-based optimization.

BioMediator [32] also employs a knowledge base that contains the mediation schema represented as a hierarchy of concepts and a hierarchy of relations between concepts, annotations explaining how relations between data sources are obtained and maintained, and a catalog describing, for each available data source, the elements of the mediation schema it contains. However, the system handles only a small number of sources that can be queried and described in the knowledge base. In BioRegistry [15], the most appropriate data sources for a given user query are identified and selected among all the existing data sources. The system employs an information retrieval (IR) approach through formal concept analysis (FCA) in which data sources, rather than documents, are searched, and indexation is based on metadata reflecting information about the sources (source metadata) rather than on data extracted from documents. Their approach is limited to a binary relationship between sources and metadata concepts, and the relevance between a query and data sources is established by shared concepts in their sets of metadata concepts.


Semantic relationships among concepts, an essential component of semantic integration, have not been considered in their system.

Wang et al. [12] used a Resource Description Framework (RDF) [33] ontology as the mediated schema for explicit description of the data sources' semantics, providing a shared vocabulary for specifying those semantics. The system has a set of source descriptions that specify the semantic mapping between the mediated schema and the source schemas, and uses these source descriptions to reformulate a user query into a query over the source schemas. The system supports a small number of pre-selected sources. Other examples such as BIS [34], BioDataServer [35], and HKIS [14] also illustrate that automatic source-query matching in mediation platforms still addresses only a small number of pre-selected sources.

2.3 Critical Analysis

The existing algorithms/systems emphasize selecting relevant sources before query rewriting in order to avoid executing a user query on all available data sources. However, none of them pays attention to fast and efficient traversal of source descriptions during source selection. Relevant source selection becomes complex when (i) the number of sources grows, and with it the volume of their metadata, and (ii) user queries get complex. Relevance reasoning over all available data sources then degrades the performance of the user query. The question, therefore, is how to reduce the search space of metadata in the process of relevance reasoning to make the whole process more efficient. The main contribution of this research is to provide a mechanism for optimizing relevance reasoning in source selection that identifies the most effective and relevant data sources in a data integration system.

3 Proposed Architecture

In order to execute a user's query in a scalable data integration system such as those proposed in [1,5,15,36], the query execution process needs to be optimized. We have proposed an ontology-driven relevance reasoning architecture to improve the response time of a user query during relevance reasoning. The proposed architecture, as shown in Fig. 1, comprises the following components.

Domain ontology The domain (a.k.a. global) ontology is the knowledge base of the proposed architecture. It helps in generating user queries and enables semantic inference. The major components of the domain ontology are: (1) domain knowledge, which represents the domain of discourse in the form of Resource Description Framework (RDF) triples [33]; each RDF triple is uniquely identified by a global unique identifier (GUID), and GUIDs are used in the semantic indexing scheme for relevance reasoning; (2) concept and relationship hierarchies, which represent semantic relationships among concepts and among relationships, respectively; these hierarchies help in resolving the semantic heterogeneity that exists in a domain;

Fig. 1 Proposed architecture for relevance reasoning in a data integration system


(3) rule-base: a rule is an object that can be applied to deduce inferences from RDF triples; every rule is identified by its name and consists of two parts, (a) an antecedent, known as the body of the rule, and (b) a consequent, known as the head of the rule, and the rule-base is an object that consists of rules; (4) rules-index, which computes and maintains deduced inferences by applying a specific set of rule-bases in order to optimize reasoning.

Ontology management service The ontology management service facilitates the creation and maintenance of the domain ontology. It provides a set of application program interfaces (APIs) to perform the following functions: (1) publishing the domain knowledge in the form of RDF triples by assigning GUIDs to the RDF metadata triples and mapping the GUIDs over the bitmap index; (2) defining semantic operators and constructing the concept and relationship hierarchies; (3) providing a mechanism to create and drop a rule-base and to modify the set of rules in a rule-base; (4) enabling the creation and maintenance of the rules-index and synchronizing it after rules in the rule-base are modified.

Source descriptions storage (SDS) A source description is the metadata of a data source. In order to make the source description of a data source interoperable in a heterogeneous environment, it is described in a conceptual model in the form of a local ontology [5]. The metadata of a data source is expressed as RDF triples in the local ontology. These RDF triples are assigned local unique identifiers (LUIDs) using a sequence-generating object of each data source. In a nutshell, the source descriptions storage is the set of local ontologies.

Source registration service The source registration service facilitates the creation and maintenance of a local ontology for a data source in the source descriptions storage. It provides a set of application program interfaces (APIs) to perform the following functions: (1) creating a unique sequence-number-generating object for the incoming data source; (2) creating a local ontology to hold the RDF triples advertised by the data source; (3) registering the local ontology in the source descriptions storage; (4) inserting the RDF triples of the data source into its corresponding local ontology.

Bitmap index storage A bitmap index is a cross-tab structure of bits [17,18]. We employ a bitmap index for efficient traversal during relevance reasoning. The bitmap index is divided into bitmap segments, and data in a bitmap segment is represented internally in the form of bits. Each data source retains one bitmap segment over the bitmap index. In the proposed architecture, data sources are represented on the vertical side of the index, whereas RDF triples of the domain ontology are represented on the horizontal side. A bit is unset, i.e., 0, if a data source does not contain the corresponding RDF triple, and is set, i.e., 1, if it does. A sequence-number-generating object is used to assign a unique identifier to each bitmap segment.
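To make the cross-tab structure concrete, the following is a minimal sketch in Python; the prototype itself is implemented in PL/SQL, and the class and method names below are illustrative rather than the system's actual API.

class BitmapIndex:
    """Cross-tab of bits: one row (segment) per data source, one column per domain RDF triple."""

    def __init__(self):
        self.guid_position = {}   # GUID of a domain RDF triple -> column position
        self.segments = {}        # data source name -> list of bits (its bitmap segment)

    def extend_pattern(self, guid):
        # Reserve a new column for a newly registered domain RDF triple.
        self.guid_position[guid] = len(self.guid_position)
        for bits in self.segments.values():
            bits.append(0)        # the new column starts unset for every source

    def add_segment(self, source):
        # Clone the current bitmap pattern for an incoming source, all bits unset.
        self.segments[source] = [0] * len(self.guid_position)

    def set_bit(self, source, guid):
        # Mark that the source contains the domain triple identified by guid.
        self.segments[source][self.guid_position[guid]] = 1

    def sources_with(self, guid):
        # Relevant source identification: segments whose bit is set for this triple.
        pos = self.guid_position[guid]
        return [s for s, bits in self.segments.items() if bits[pos] == 1]

Irrelevant source pruning is simply the complement of sources_with over the registered segments.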

Index management service The index management service facilitates the creation and maintenance of a bitmap segment for a data source in the bitmap index storage. It provides a set of application program interfaces (APIs) to perform the following functions: (1) bitmap segment creation creates the bitmap segment for an incoming data source and initializes all bits of the segment to 0 (unset); (2) bitmap synchronization keeps the bitmap segment of a data source consistent with its local ontology; (3) shuffle bit shuffles (i.e., updates) the bits of a bitmap segment during synchronization.

Index look-up service The index look-up service facilitates efficient traversal of the bitmap index. It provides a set of application program interfaces (APIs) to perform the following functions: (1) relevant source identification traverses the bitmap index against an RDF triple and identifies the bitmap segments where the bit is set; (2) irrelevant source pruning traverses the bitmap index against an RDF triple and identifies the irrelevant bitmap segments where the bit is unset.

Ontology reasoning service The ontology reasoning service provides reasoning and inference capabilities to the proposed architecture. It provides a set of application program interfaces (APIs) to perform the following functions. (1) Semantic matching is the process of finding semantic similarity among different terms (concepts and relationships) in order to resolve semantic heterogeneity. (2) Inference and reasoning supports the semantic matching process by incorporating the rules, rule-base, and rules-index. (3) Semantic query generation generates queries against the domain ontology using semantic operators during semantic matching; note that these queries are different from the user query and should not be confused with it.

Relevance reasoning service The relevance reasoning service identifies relevant and effective data sources for a query over the bitmap index using the index look-up service. It provides a set of application program interfaces (APIs) to perform the following functions. (1) Semantic query expansion expands a user query to semantically relevant RDF triples. (2) Relevance reasoning identifies relevant and effective data sources for a given user query. (3) Relevance ranking ranks the data sources for a given user query based on the semantic similarity scores obtained.

4 Relevance Levels and Term Similarity

During semantic matching, the terms of the user's query triples are matched with the terms of source triples. As a result, one of five relevance levels is obtained for each term. These relevance levels are given numeric scores for the purpose of quantification, which helps us rank a source for a given query. Following are the definitions and


explanations of our proposed relevance levels and of the operators used in the semantic matching process. Throughout the paper, the prefixes nust, seecs, eme, and nbs refer to the URLs http://www.nust.edu.pk, http://www.seecs.nust.edu.pk/, http://ceme.nust.edu.pk/, and http://www.nbs.nust.edu.pk/, respectively.

4.1 Relevance Levels

Exact matching (α): A term is an exact match of another term if and only if both are lexically equal. For example, the term nust:Instructor is an exact match of niit:Instructor. A numeric score of 1 is assigned to exact matching terms as soon as they appear in an RDF triple.

Synonym matching (β): It is unrealistic to assume that the same name will always be used for a concept in a domain, so an explicit specification of synonyms using some operator is required. Synonyms are terms that are lexically different but have the same meaning. For example, the term nust:Instructor is a synonym of the term seecs:Teacher. A numeric score of 0.8 is assigned to synonym matching terms as soon as they appear in an RDF triple. We use the owl:sameAs operator for specifying β mappings in the rule-base of the domain ontology.

Subclass matching (γ): In some scenarios, taxonomies are used for knowledge representation, where generic concepts subsume specific concepts. In order to cope with the subsumption relationship, an operator is required for its explicit specification. A term is a subclass of another term if and only if it is subsumed by that term. For example, nust:Employee might subsume seecs:Instructor. A numeric score of 0.6 is assigned to subclass matching terms as soon as they appear in an RDF triple. We use the rdfs:subClassOf operator for specifying γ mappings in the rule-base of the domain ontology.

Degree of likelihood (ω): In some situations, data sources contain concepts that are neither totally disjoint nor identical; rather, they are related to some other term with some degree of likelihood. For example, the term nust:Instructor might be relevant to nust:TeachingAssistant with some degree of likelihood. This type of mapping cannot be specified using the previously defined operators. A numeric score of 0.5 is assigned to likelihood-based similar terms as soon as they appear in an RDF triple. We use the owl:equivalentOf operator for specifying ω mappings in the rule-base of the domain ontology.

Disjoint (φ): A term is disjoint from another term if and only if they are different from each other. For example, the term nust:Instructor is disjoint from nust:Student. A numeric score of 0.0 is assigned to any disjoint terms as soon as they appear in any component of an RDF triple. These relevance levels and their scoring strategy are summarized in Table 1.

Table 1 Relevance levels and scoring strategy

# Level Notation Score

1 exactMatch    α 1.0
2 sameAs        β 0.8
3 subClassOf    γ 0.6
4 equivalentOf  ω 0.5
5 disjointFrom  φ 0.0
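The scores in Table 1 amount to a simple lookup; the sketch below (Python, with illustrative names) assigns a term pair its relevance score given the relation found in the concept or relationship hierarchy.

# Relevance-level scores from Table 1 (keys are the operators used in the paper).
RELEVANCE_SCORES = {
    "exactMatch": 1.0,
    "sameAs": 0.8,
    "subClassOf": 0.6,
    "equivalentOf": 0.5,
    "disjointFrom": 0.0,
}

def term_similarity(relation):
    """Map the relation returned by the hierarchy look-up to its numeric score."""
    return RELEVANCE_SCORES.get(relation, 0.0)   # unknown relations treated as disjoint (assumption)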

4.2 Term Similarity

We use the same semantic matching strategy for both concepts and relationships; terms include both. We extract the relationship between a query term and a source term using the respective hierarchy (concept hierarchy or relationship hierarchy) and then assign the standard relevance score defined in Table 1. An RDF triple contains a subject, a predicate, and an object. The subject and object are considered concepts, so their similarity is computed using the concept hierarchy, whereas the relationship hierarchy is used to calculate the predicate similarity.

RDF triple similarity To calculate the relevance between user query and source RDF triples, we combine both aspects of term similarity (i.e., concepts and relationships). The overall RDF triple similarity is calculated as shown in Eq. 1, where q_T denotes the query triple, s denotes the source triples, q_t and s_t are the query and source terms to be matched, and sim(q_T, s) is the overall similarity of a single query triple for a given source. Here i and j index the ith source RDF triple and the jth query triple term, respectively.

sim(q_T, s) = \prod_{i=0}^{n} \prod_{j=0}^{2} sim_t(q_t^{j}, s_t^{ij})    (1)

Source ranking A user query and source RDF triples are matched to find the similarity of each query triple with the data source triples. Once the RDF triple similarities have been computed, the source score for the whole query is computed using the formula given in Eq. 2. Based on the score obtained for a query, the data sources are ranked.

sim_{src} = \prod_{i=0}^{n} sim(q_i, s)    (2)

In the above equation, sim_{src} is the total score of a source s for a user query, obtained by multiplying the similarity scores of all query triples; q_i denotes the ith query triple and n denotes the total number of query triples.
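A direct transcription of Eqs. 1 and 2 in Python is sketched below; the helper names are hypothetical, and term_sim stands for the hierarchy-based score of Table 1 for a pair of terms.

from functools import reduce
from operator import mul

def triple_similarity(query_triple, source_triples, term_sim):
    """Eq. 1: product over all source triples and over the three triple components."""
    return reduce(mul, (term_sim(query_triple[j], s[j])
                        for s in source_triples
                        for j in range(3)), 1.0)

def source_score(query_triples, source_triples, term_sim):
    """Eq. 2: product of the per-query-triple similarities for one source."""
    return reduce(mul, (triple_similarity(q, source_triples, term_sim)
                        for q in query_triples), 1.0)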


5 Proposed Semantic Matching Methodology

This section discusses our proposed methodology for relevance reasoning to identify the most relevant and effective data sources using a bitmap index. The methodology can be divided into three main workflows, which help to understand the intricacies of the proposed architecture. Below is a detailed discussion of each workflow.

5.1 Ontology Management Workflow

The ontology management workflow manages the domain ontology in the architecture. The ontology management service plays a prominent part in this workflow. The four major activities carried out by the ontology management workflow are:

• Domain knowledge representation
• Concept and relationship hierarchy representation
• Rules and rule-base management
• Rules-index management

Domain knowledge representation is the registration of RDF triples in the domain ontology. These RDF triples are stored in the domain ontology, and GUIDs are assigned using a unique sequence-number-generating object. GUIDs are allocated positions over the bitmap index, and transactions are permanently recorded to the domain ontology. The pseudo-code for inserting an RDF triple into the domain ontology is shown in Algorithm 1.

Algorithm 1: Pseudo-code for RDF Triple registration of Domain Ontology

1. For each RDF triple of domain ontology

2. Assign GUID to RDF triple

3. Add RDF triple to the domain ontology

4. Extend bitmap index

5. Increase the length of bitmap pattern by one

6. Assign location to the RDF triple reserved over the bitmap index

7. Perform commit to apply changes persistently to domain ontology
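A minimal illustration of steps 1–6 of Algorithm 1 in Python follows; the names and the GUID format (which mimics the nust-100000x identifiers of the walk-through example) are illustrative, and the prototype itself is in PL/SQL.

from itertools import count

guid_sequence = count(1000001)            # unique sequence-number-generating object
domain_ontology = {}                      # GUID -> domain RDF triple
bitmap_pattern = []                       # ordered list of GUIDs: one column per triple

def register_domain_triple(triple):
    guid = "nust-%d" % next(guid_sequence)   # step 2: assign GUID
    domain_ontology[guid] = triple           # step 3: add triple to the domain ontology
    bitmap_pattern.append(guid)              # steps 4-6: extend the bitmap pattern by one position
    return guid

register_domain_triple(("nust:Instructor", "nust:isTeaching", "nust:Course"))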

Concept and relationship hierarchy representation involves defining the semantic operators and then using these operators to build the respective hierarchies. These operators include sameAs, equivalentOf, subClassOf, and disjointFrom, as explained in the previous section. RDF triples are added to the domain ontology to represent the concept and relationship hierarchies. The bitmap index is not maintained for these RDF triples.

Rules and rule-base management involves the creation of the rule-base and then inserting rules into it. In order to reduce the number of mappings among the hierarchies and increase the inference capabilities of the rule-base, two rules are inserted for each relevance level defined in the previous section. These rules are InverseOf and TransitiveOf. The InverseOf rule tells the rule-base that if a term A is related to another

term B with relation R, then B is inversely related to A. The InverseOf rule for the sameAs relevance-level operator is shown below in the N3 representation of the semantic web rule language.
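The following is an illustrative N3-style rendering, assuming owl:sameAs as the mapped relation; it may differ in detail from the exact rule used in the prototype.

{ ?a owl:sameAs ?b } => { ?b owl:sameAs ?a } .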

The TransitiveOf rule tells the rule-base that if a term A is related to another term B with some relation R, and the same term B is further related to another term C using the relation R, then the term A is related to term C using the same relation R. The TransitiveOf rule for the sameAs relevance-level operator is shown below in the N3 representation of the semantic web rule language.
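Again, an illustrative N3-style rendering under the same assumption is:

{ ?a owl:sameAs ?b . ?b owl:sameAs ?c } => { ?a owl:sameAs ?c } .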

Rules-index management involves the creation and management of the rules-index for a rule-base. Once the rules are inserted into the rule-base, the corresponding rules-index is refreshed to pre-compute the inferred RDF triples.

5.2 Source Registration Workflow

The source registration workflow registers data sources in the data integration system. The three major activities carried out by the source registration workflow are:

• Local ontology creation
• Bitmap segment creation
• Bitmap synchronization

Local ontology creation involves creating a local ontology and a unique sequence-number-generator object for the incoming data source, along with inserting its RDF triples into the created ontology. The source registration service plays a prominent part in local ontology creation. An ontology is created for the incoming data source and is registered with the source descriptions storage. The RDF triples advertised by the data source are assigned local unique identifiers (LUIDs) and are added to the local ontology. Transactions are permanently recorded to the source descriptions storage. The pseudo-code for local ontology creation and RDF triple insertion is shown in Algorithm 2.

Algorithm 2: Pseudo-code for Local Ontology Creation

1. Create local ontology for incoming source in Source Descriptions Storage

2. Create unique sequence generator for incoming source RDF triples

3. Assign LUIDs to the RDF triples

4. Add RDF triple to the local ontology in Source Descriptions Storage

5. Perform commit to apply changes persistently to Source Descriptions Storage
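A minimal illustration of Algorithm 2 in Python (hypothetical names; the LUID format mimics the eme-1011 style identifiers of Table 4):

from itertools import count

source_descriptions_storage = {}   # data source name -> its local ontology (LUID -> RDF triple)

def register_source(source, triples, start=1011):
    luid_sequence = count(start)                       # step 2: per-source sequence generator
    local_ontology = {}
    for triple in triples:
        luid = "%s-%d" % (source, next(luid_sequence))  # step 3: assign LUID
        local_ontology[luid] = triple                   # step 4: add triple to the local ontology
    source_descriptions_storage[source] = local_ontology   # steps 1 and 5: register and commit
    return local_ontology

register_source("eme", [("eme:Professor", "eme:Teaches", "eme:Subject")])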

Bitmap segment creation involves cloning the bitmap pattern and creating a bitmap segment for the incoming data source over the bitmap index. The index management


service plays a prominent role in bitmap segment creation. The bitmap pattern is stored with the domain ontology and is cloned for the newly created bitmap segment. Initially, all bits are unset, i.e., 0. A unique identifier is assigned to the bitmap segment, which is then added to the bitmap index. The pseudo-code for bitmap segment creation is shown in Algorithm 3.

Algorithm 3: Pseudo-code for Bitmap Segment Creation

1. Check whether bitmap segment exists for the incoming source

2. If (no)

a. Clone bitmap pattern from domain ontology RDF triples

b. Initialize bits to zero (0)

c. Assign a unique number to the bitmap segment

d. Add bitmap segment to the bitmap index for incoming source

3. Perform commit to apply changes persistently in index
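Algorithm 3 essentially amounts to cloning the bitmap pattern with every bit unset; an illustrative sketch (Python, hypothetical names):

def create_bitmap_segment(bitmap_index, bitmap_pattern, source):
    """Create an all-zero segment for an incoming source if it does not already have one."""
    if source not in bitmap_index:                          # step 1: check for an existing segment
        bitmap_index[source] = [0] * len(bitmap_pattern)    # steps 2a-2b: clone pattern, bits unset
    return bitmap_index[source]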

Bitmap synchronization involves plotting the RDF triples of a data source consistently and correctly by shuffling the bits in its bitmap segment. The index management service plays a prominent role by spawning a listener process that listens for any invalidation (changes in a local ontology that have not been propagated and plotted over the bitmap index) in the source descriptions storage. If any invalidation is found, it starts index synchronization. During synchronization, the RDF triples of the data source are fetched. Every RDF triple is decomposed into its terms (subject, predicate, and object) and given to the ontology reasoning service. The ontology reasoning service performs reasoning and inference that helps the index management service extract the GUIDs for the corresponding RDF triples. The positions of the GUIDs are identified over the bitmap index and the bits are shuffled accordingly. The pseudo-code for bitmap synchronization is shown in Algorithm 4.

Algorithm 4: Pseudo-code for Bitmap Synchronization

1. For each incoming RDF triple advertised by a data source

a. Decompose RDF triple into its components

b. Perform reasoning for semantic similarity

c. Extract GUID for the corresponding RDF triple

d. Identify its position over the bitmap index

e. Fetch the bitmap segment for the data source

f. Shuffle the bit to 1 at the corresponding position in the bitmap segment

2. Perform commit to apply changes persistently in index
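Bitmap synchronization can be sketched as below; resolve_guid stands for the reasoning step that maps a semantically heterogeneous source triple to the GUID of the corresponding domain triple (hypothetical names, Python):

def synchronize_segment(bitmap_index, bitmap_pattern, source, source_triples, resolve_guid):
    segment = bitmap_index[source]
    for triple in source_triples:
        guid = resolve_guid(triple)            # steps 1a-1c: decompose, reason, extract GUID
        if guid is None:
            continue                           # no matching domain triple for this source triple
        position = bitmap_pattern.index(guid)  # step 1d: locate the GUID's column
        segment[position] = 1                  # step 1f: shuffle the bit to 1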

5.3 Relevance Reasoning Workflow

The relevance reasoning workflow includes the steps carried out to identify the relevant and effective data sources for the user's query. The relevance reasoning service plays a prominent part in this workflow. It cooperates with the index look-up service and the ontology reasoning service during relevance reasoning to perform the following activities.

• Semantic Query Expansion
• Source Selection

• Source Ranking

Semantic query expansion A user submits the query in RDF, which is passed to the relevance reasoning service. The RDF triples entered by the user into a query are called asserted query triples. A user can submit queries in domain ontology terms as well as in the local ontology terms of the underlying data sources. The relevance reasoning service expands the user query to all possible combinations using the ontology reasoning service. Every term of a query triple is expanded using the semantic relevance-level operators for synonyms, lexical variants, subsumption, and degree of likelihood. This expansion results in the addition of extra triples to the user query, which are called inferred query triples. The pseudo-code for semantic query expansion is shown in Algorithm 5.

Algorithm 5: Pseudo-code for Query Expansion in Relevance Reasoning

1. InferredTriplesList = Ø

2. For each RDF triple in AssertedTripleList of user’s query

a. Isolate subject, object, and property of current RDF triple

i. Calculate semantic similarity and add relevant term for the subject of RDF triple

ii. Calculate semantic similarity and add relevant term for the property of RDF triple

iii. Calculate semantic similarity and add relevant term for the object of RDF triple

b. Take Cartesian product of terms

c. Populate InferredTriplesList of the Cartesian product

d. Return InferredTriplesList
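The Cartesian-product expansion of Algorithm 5 maps naturally onto itertools.product; below is a sketch under the assumption that expand_term returns the term itself plus its semantically related terms (hypothetical names, Python):

from itertools import product

def expand_query_triple(triple, expand_term):
    subject, prop, obj = triple                               # step 2a: isolate the three components
    inferred = [t for t in product(expand_term(subject),      # steps 2a.i-iii: relevant terms
                                   expand_term(prop),
                                   expand_term(obj))          # step 2b: Cartesian product
                if t != triple]                               # keep only the extra (inferred) triples
    return inferred                                           # steps 2c-2d: inferred triples list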

Source selection Once the query has been expanded with semantically relevant RDF triples, their GUIDs are reconciled from the domain ontology. The GUIDs give the positions of the RDF triples over the bitmap index. These positions are passed to the index look-up service, which traverses the bitmap segments of each source at the corresponding positions and identifies the data sources for which the bits are set. The pseudo-code for source selection is shown in Algorithm 6.

Algorithm 6: Pseudo-code for Source Selection in Relevance Reasoning

1. RelevantSourceList = Ø

2. For each RDF triple in the user's query [asserted + inferred]

a. Reconcile GUID for incoming RDF triple from domain ontology

b. Identify Bitmap location of the RDF triple using GUID

c. Pass bitmap location to Index look-up service

d. Traverse bitmap segments at corresponding location to identify relevant sources

e. Add sources to RelevantSourceList

3. Return RelevantSourceList
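Source selection then reduces to a look-up over the bitmap columns of the asserted and inferred triples; a sketch with hypothetical names, where guid_of returns None for triples that have no GUID in the domain ontology (Python):

def select_sources(bitmap_index, bitmap_pattern, query_triples, guid_of):
    relevant = set()
    for triple in query_triples:                    # asserted + inferred triples
        guid = guid_of(triple)
        if guid is None:
            continue                                # step 2a: reject triples with no GUID
        position = bitmap_pattern.index(guid)       # step 2b: bitmap location of the triple
        for source, bits in bitmap_index.items():   # steps 2c-2d: traverse segments at that column
            if bits[position] == 1:
                relevant.add(source)                 # step 2e: collect relevant sources
    return relevant                                  # step 3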

Source ranking The identified data sources are ranked according to their relevance to the user query on the basis of our proposed scoring scheme shown in Table 1. First, term similarity is computed for each component of a query RDF triple against a given source. Once the term similarities are computed, they are used in Eq. 1 to compute the RDF triple similarity. Finally, the source similarity is computed by Eq. 2 and the sources are ranked according to the scores obtained for the given user query.

6 Walk-Through Example

We use a portion of the well-known university ontology as an example.


Fig. 2 Structure of the NUST_DB domain ontology (concepts: Instructor, TeachingAssistant, Student, Course, Department; relationships: isTeaching, isAssisting, isAdvisorOf, isRegisteredIn, hasMajor, worksIn)

In this scenario, we have a domain ontology named NUST_DB, as shown in Fig. 2, and three data sources named EME_DB, NBS_DB, and SEECS_DB. The RDF triples of the domain ontology are shown in Table 2.

The RDF triples of the domain ontology form the basis for the bitmap indexing in our proposed architecture. The pattern of the index is illustrated in Table 3.

In order to manage the concept and relationship hierarchies, the semantic matching operators defined are sameAs, equivalentOf, subClassOf, and disjointFrom. A concept such as nust:Instructor is mapped to the concept seecs:Lecturer using the subClassOf operator in order to specify a subsumption relationship. The term nust:Course is mapped to

the term nust:Subject using the sameAs operator in order to specify synonyms and lexical variants. Similarly, nust:Instructor is mapped to nust:TeachingAssistant using the equivalentOf operator in order to specify a degree of likelihood, and so on. Relationship hierarchies are also managed accordingly. These hierarchies are illustrated in Fig. 3.

Three local ontologies are created for the data sources with the naming convention <DataSource>_RDF_Data. There are semantic heterogeneities between the contents of the data sources. Table 4 describes the RDF triples of the sources stored in their respective local ontologies.

Once the local ontologies have been created, the index management service comes into play: it creates the bitmap segments in the bitmap index for the data sources and plots (synchronizes) the RDF triples of the data sources in their respective bitmap segments. During synchronization, the index management service also resolves the semantic heterogeneities. The structure of the resulting bitmap index is illustrated in Table 5.

Suppose a user query contains the RDF triple <Instructor isTeaching Course>.

Table 2 RDF triples of the domain ontology

GUID          RDF triples
nust-1000001  <nust:Instructor, nust:isTeaching, nust:Course>
nust-1000002  <nust:Instructor, nust:isAdvisorOf, nust:Student>
nust-1000003  <nust:Student, nust:isRegisteredIn, nust:Course>
nust-1000004  <nust:Student, nust:hasMajor, nust:Department>
nust-1000005  <nust:Instructor, nust:worksIn, nust:Department>
nust-1000006  <nust:TeachingAssistant, nust:isAssisting, nust:Course>

Table 3 Structure of bitmap index

Source-segment Position-1 Position-2 Position-3 Position-4 Position-5 Position-6

xxxxxxxxx nust-1000001 nust-1000002 nust-1000003 nust-1000004 nust-1000005 nust-1000006

Fig. 3 Concept and relationship hierarchies managed using semantic operators over the domain ontology (terms shown include Instructor, Professor, Teacher, Lecturer, Prof, TeachingAssistant, Student, Course, Subject, isTeaching, Teaching, Teaches, and isAssisting, linked by sameAs, subClassOf, equivalentOf, and exactMatch edges)


Table 4 RDF triples of the data sources

Local ontology    Link-ID     RDF triples
EME_RDF_Data      eme-1011    <eme:Professor, eme:Teaches, eme:Subject>
                  eme-1012    <eme:Professor, eme:Advises, eme:Student>
                  eme-1013    <eme:Student, eme:RegisteredIn, eme:Subject>
NBS_RDF_Data      nbs-2011    <nbs:Teacher, nbs:isAdvisorOf, nbs:Student>
                  nbs-2012    <nbs:Teacher, nbs:WorksIn, nbs:Department>
                  nbs-2013    <nbs:Student, nbs:hasMajor, nbs:Department>
SEECS_RDF_Data    seecs-3011  <seecs:Lecturer, seecs:isTeaching, seecs:Course>
                  seecs-3012  <seecs:TeachingAssistant, seecs:isAssisting, seecs:Course>

Table 5 Structure of bitmap index after sources are registered

Source-segment  nust-1000001  nust-1000002  nust-1000003  nust-1000004  nust-1000005  nust-1000006
EME-DB          1             1             1             0             0             0
NBS-DB          0             1             0             1             1             0
SEECS-DB        1             0             1             0             0             1

Table 6 Buckets created for the RDF triples

Semantic operator used  Subject bucket for "Instructor" (terms deduced)  Property bucket for "isTeaching" (terms deduced)  Object bucket for "Course" (terms deduced)
exactMatch              Instructor                                       isTeaching                                        Course
sameAs                  NULL                                             Teaching, Teaches                                 Subject
subClassOf              Professor, Prof, Lecturer, Teacher               NULL                                              NULL
equivalentOf            TeachingAssistant                                isAssisting                                       NULL

The relevance reasoning service decomposes this triple into its terms and creates three buckets, i.e., one for the subject, one for the property, and one for the object. Each term is given to the ontology reasoning service, which calculates its semantic similarity in the respective hierarchy to find relevant terms. The buckets are populated as shown in Table 6.

The Cartesian product of the subject, property, and object buckets is taken to construct the inferred triples list. Table 7 shows this Cartesian product.

In order to execute a query over the bitmap index, GUIDs are needed. An RDF triple is rejected if no GUID is available for it in the domain ontology. In this example, the GUIDs nust-1000001 and nust-1000006 are fetched from the domain ontology. These GUIDs are passed to the index look-up service to identify the relevant and effective data sources. The index look-up service traverses the bitmap index for only these GUIDs and returns all bitmap segments where the bits are set, i.e., EME-DB and SEECS-DB. In order to sort the data sources based on their relevance to the query triples, the semantic similarity scoring shown in Table 1 is applied. First, the term similarity is computed for the query triples against the data source triples using the concept and relationship hierarchies.

EME-DB scores 0.6 for matching the subject of the query triple, Instructor, with the subject of the source triple, Professor; the concept hierarchy returns a subClassOf relationship between these terms. Next, the properties of the query and source triples are matched, scoring 0.8 for the respective properties isTeaching and Teaches, because they are connected by a sameAs relationship. Finally, the objects of the query and source triples are matched, scoring 0.8 for the respective objects Course and Subject. SEECS-DB scores 0.6 for matching the subject of the query triple, Instructor, with the subject of the source triple, Lecturer; the concept hierarchy returns a subClassOf relationship for this match. This data source scores 1 for matching the property isTeaching with the query property isTeaching, and finally scores 1 for matching the respective objects Course and Course. SEECS-DB also contains a triple that is relevant to the query triple with some degree of likelihood, i.e., nust-1000006. The relevance of a data source for every query triple is calculated by putting the term similarity scores into Eq. 1 and is shown in Table 8.
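Plugging these term scores into the inner product over the three triple components of Eq. 1 gives the per-triple values reported in Table 8:

sim_EME(q_T, nust-1000001)   = 0.6 × 0.8 × 0.8 = 0.384
sim_SEECS(q_T, nust-1000001) = 0.6 × 1 × 1 = 0.6
sim_SEECS(q_T, nust-1000006) = 0.5 × 0.5 × 1 = 0.25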

Finally, the overall similarity score of a data source for the user's query is calculated using Eq. 2 and is shown in


Table 7 Inferred RDF triples for a user's query triple

Subject              Property       Object
<Instructor>         <isTeaching>   <Course>
<Instructor>         <Teaching>     <Course>
<Instructor>         <Teaches>      <Course>
<Instructor>         <isAssisting>  <Course>
…
<Instructor>         <isAssisting>  <Subject>
<Professor>          <isTeaching>   <Course>
<Professor>          <Teaching>     <Course>
<Professor>          <Teaches>      <Course>
…
<Professor>          <isAssisting>  <Subject>
<Prof>               <isTeaching>   <Course>
<Prof>               <Teaching>     <Course>
<Prof>               <isAssisting>  <Subject>
<Lecturer>           <isTeaching>   <Course>
<Lecturer>           <Teaching>     <Course>
<Lecturer>           <Teaches>      <Course>
…
<Teacher>            <Teaching>     <Course>
<Teacher>            <Teaches>      <Course>
…
<Teacher>            <isAssisting>  <Subject>
<TeachingAssistant>  <isTeaching>   <Course>
<TeachingAssistant>  <Teaching>     <Course>
<TeachingAssistant>  <Teaches>      <Course>
<TeachingAssistant>  <isAssisting>  <Course>
…
<TeachingAssistant>  <isAssisting>  <Subject>

Table 8 Semantic similarity calculation of a data source for a user query triple

Relevant data source  GUID          sim (subject)  sim (property)  sim (object)  Source similarity for query triple (qT)
EME-DB                nust-1000001  0.6            0.8             0.8           0.384
SEECS-DB              nust-1000001  0.6            1               1             0.6
                      nust-1000006  0.5            0.5             1             0.25

Table 9 Semantic similarity calculation of a data source for a user query

Relevant data source  GUID          Source similarity for query triple (qT)
EME-DB                nust-1000001  0.384
  Total source similarity for user query (sim_EME): 0.384
SEECS-DB              nust-1000001  0.6
                      nust-1000006  0.25
  Total source similarity for user query (sim_SEECS): 0.85

Table 9. These sources are sorted and passed to the query rewriting component.

In a nutshell, we have explained our proposed architecture of relevance reasoning for source selection in data integration. The different workflows have been highlighted and the semantic matching methodology has been explained using the walk-through example.

7 Results and Evaluation

In this section, we evaluate the results of the proposed prototype system. We identify the main evaluation criteria, the details of the data set, the query structure, and the results of the experiments carried out with the system. The main aim of this evaluation is to validate whether the proposed architecture for


relevance reasoning can scale up to a large number of data sources and complex queries.

7.1 Evaluation Criteria

The main focus of this research was to provide a mechanism for optimal relevance reasoning in source selection in a data integration system. To achieve this goal, we are interested in evaluating the efficiency of the proposed prototype system. Our evaluation criterion was the response time of query execution, in order to ensure that the manipulation of RDF triples does not degrade query response time during relevance reasoning as the number of sources in the system increases and the queries get complex.

7.2 Testbed Specification

The system used in the evaluation of the prototype had the following specifications: a 2.4 GHz processor, 1 GB RAM, an 80 GB HDD, Windows 2003 (Service Pack 2), Oracle Spatial Database 10g R2 (NDM), and PL/SQL as the implementation language. The experiments were carried out with a corpus of 100 manually generated data sources, each containing 30–50 RDF triples. The well-known university ontology was used in the experiment as the domain ontology [10]. We executed 35 different queries related to student, faculty, and research associate data. Among these 35 queries, we selected 3 queries, having 3, 6, and 9 RDF triples respectively, to test the system efficiency by checking query response time. The three selected queries are as follows:

Query 1: find the names of all instructors who are teaching a course to a student whom they also advise.

Query 2: find the instructor name, instructor gender, and area of specialization of all instructors, whether they are staff or students.

Fig. 5 Time complexity of system for query with 6 triples

Fig. 6 Time complexity of system for query with 9 triples

Query 3: find the instructor name, instructor gender, and area of specialization of all instructors, whether they are staff or students, where the student's major department is not the department in which the adviser works.

Fig. 4 Time complexity of system for query with 3 triples


Fig. 7 Performance gain of the system with respect to direct ontology traversal

7.3 Experiments for Response Time of Query Execution

We evaluated the prototype system for query response time along three dimensions. Firstly, queries were executed against the local ontologies of the data sources, and we assessed the time taken by the relevance reasoner to traverse the local ontologies for relevant source selection. Secondly, as our proposed methodology employs a bitmap index in which source descriptions are mapped semantically into the bitmap segments as bits, we submitted the queries to the relevance reasoner using the bitmap index and assessed the time taken. Finally, we extended the bitmap index by implementing function-based indexing [37,38] over it and then analyzed the performance of the system. Figures 4, 5, and 6 illustrate the performance of the system for the 3 selected queries containing 3, 6, and 9 triples shown in the previous subsection.

The observations showed a performance gain when running queries on source descriptions through the bitmap index compared with running them directly against the source descriptions. A significant performance gain over both previously discussed approaches was observed when searching for relevant sources using the extended bitmap index. Figure 7 shows the performance gain of the extended bitmap index compared with the simple bitmap index.

8 Conclusion and Future Directions

The proposed system's methodology has three workflows: (1) the ontology management workflow, (2) the source registration workflow, and (3) the query execution (relevance reasoning) workflow. This division helps to understand the functionality of the various components in the system along with their inter-dependence. The ontology management workflow and the source registration workflow set the stage for relevance

reasoning in the system. The ontology management workflow publishes the domain knowledge in the form of RDF in the domain ontology. It creates the concept and relationship hierarchies using the semantic operators, and it also creates the rule-base to define rules and manages the rules-index to perform inference and reasoning during the semantic matching process. The source registration workflow manages the local ontologies of the data sources in the source descriptions storage. As new sources enter and leave the system, the index management service synchronizes the bitmap index to reflect the new status of the source descriptions storage. In order to answer queries precisely, the bitmap index needs to be kept synchronized with the source descriptions storage.

The query execution workflow takes the user's query formulated in RDF triples and identifies the most effective and relevant data sources for the given query. During relevance reasoning, queries are expanded using the inferences drawn from the ontology reasoning service. The workflow calculates the semantic similarity between the query and source RDF triples and identifies the relevant and effective data sources. Relevant data sources are ranked based on the similarity scores they obtain for the user query. The sorted list of relevant and effective data sources is returned to the query rewriting component, which reformulates the queries for these relevant data sources.

The main contribution of this research is to optimize relevance reasoning in source selection, which identifies the most effective and relevant data sources for a user's query before executing it. In our proposed system, we plotted the local ontologies of the data sources over the bitmap index. Instead of traversing the local ontologies during relevance reasoning, we use the bitmap index to perform the relevance reasoning. The time complexity evaluation showed that bitmap indexing performs relevance reasoning for source selection in a comparatively shorter time.

The proposed relevance reasoning process for source selection can be extended with data quality criteria, where a user can assign different confidence scores to the available data sources. Moreover, our focus in this research was on centralized bitmap indexing in data integration systems, where a single domain ontology resides on one node and queries are reformulated over it. As P2P DBMSs are evolving and data integration is becoming popular in these domains, the system can be extended in the future to meet the requirements of P2P data integration: index partitions can reside on each peer, and collectively the peers can participate in relevance reasoning for source selection during query processing.

Acknowledgments We would like to thank the anonymous reviewers for their thoughtful and helpful comments on earlier drafts, which helped improve the paper.

References

1. Ben-Miled, Z.; Li, N.; et al.: On the integration of a large number of life science web databases. In: Proceedings of the 1st International Workshop on Data Integration in the Life Sciences (DILS), pp. 172–186. Springer, Berlin (2004)

2. Arens, Y.; Knoblock, C.A.; Shen, W.: Query reformulation for dynamic information integration. J. Intell. Inf. Syst. 6(2/3), 99–130 (1996)

3. Wache, H.; Vogele, T.; et al.: Ontology-based integration of information—a survey of existing approaches. In: Proceedings of the IJCAI Workshop on Ontologies and Information Sharing, pp. 108–118. CEUR (2001)

4. Halevy, A.; Rajaraman, A.; Ordille, J.: Data integration: the teenage years. In: Proceedings of the 32nd International Conference on Very Large Data Bases, VLDB ’06, pp. 9–16. VLDB Endowment (2006)

5. Khan, S.; Morvan, F.: Identifying relevant sources in query reformulation. In: iiWAS, pp. 357–366 (2006)

6. Mena, E.; Kashyap, V.; et al.: Observer: An approach for query processing in global information systems based on interoperation across pre-existing ontologies. In: CoopIS, pp. 14–25 (1996)

7. Zhong, J.; Zhu, H.; et al.: Conceptual graph matching for semantic search. In: Proceedings of the 10th International Conference on Conceptual Structures (ICCS). LNCS, vol. 2393, pp. 92–106, Bulgaria. Springer, Berlin (2002)

8. Gruber, T.: A translation approach to portable ontology specifications. Knowl. Acquisit. 5(2), 199–220 (1993)

9. Gruber, T.R.: Towards principles for the design of ontologies used for knowledge sharing. Int. J. Human Comput. Stud. 43(5–6), 907–928 (1995)

10. Gruber, T.R.; Olsen, G.R.: An ontology for engineering mathematics. In: Proceedings of the 4th International Conference on Principles of Knowledge Representation and Reasoning, pp. 258–269, San Mateo, CA, USA (1994)

11. Noy, N.F.: Semantic integration: a survey of ontology-based approaches. SIGMOD Record 33, 65–70 (2004)

12. Wang, J.; Lu, J.; et al.: Integrating heterogeneous data source using ontology. J. Softw. 4(8), 843–850 (2009)

13. Halevy, A.Y.: Answering queries using views: a survey. VLDB J. 10(4), 270–294 (2001)

14. Boulakia, S.C.; Lair, S.; et al.: Selecting biomedical data sources according to user preferences. Bioinformatics 20(Suppl.), i86–i93 (2004)

15. Messai, N.; Devignes, M.D.; et al.: Querying a bioinformatic data sources registry with concept lattices. In: Proceedings of the 13th International Conference on Conceptual Structures (ICCS). LNAI, vol. 3596, pp. 323–336. Springer, Berlin (2005)

16. Naumann, F.; Leser, U.; Freytag, J.C.: Quality-driven integration of heterogeneous information systems. In: Proceedings of the 25th International Conference on Very Large Data Bases (VLDB), Scotland, pp. 447–458 (1999)

17. Li, X.; Bian, F.; Zhang, H.; et al.: Mind: a distributed multi-dimensional indexing system for network diagnosis. In: INFOCOM 2006. Proceedings of the 25th IEEE International Conference on Computer Communications, pp. 1–12 (2006)

18. Köhler, J.; Philippi, S.; et al.: Ontology based text indexing and querying for the semantic web. Knowl. Based Syst. 19(8), 744–754 (2006)

19. Cruz, I.F.; Xiao, H.: The role of ontologies in data integration. J. Eng. Intell. Syst. 13, 245–252 (2005)

20. Jamadhvaja, M.; Senivongse, T.: An integration of data sources with UML class models based on ontological analysis. In: Proceedings of the First International Workshop on Interoperability of Heterogeneous Information Systems, IHIS ’05, pp. 1–8. ACM, New York (2005)

21. Halevy, A.: Why your data won’t mix. ACM Queue 3, 50–58 (2005)

22. Levy, A.Y.; Rajaraman, A.; Ordille, J.: Querying heterogeneous information sources using source description. In: Proceedings of the 22nd International Conference on Very Large Data Bases (VLDB), pp. 251–262 (1996)

23. Pottinger, R.; Halevy, A.: Minicon: A scalable algorithm for answering queries using views. VLDB J. 10, 182–198 (2001)

24. Mitra, P.: An algorithm for answering queries efficiently using views. In: Proceedings of the Australasian Database Conference, pp. 99–106. IEEE Computer Society, Los Alamitos (2001)

25. Afrati, F.N.; Li, C.; Ullman, J.D.: Generating efficient plans for queries using views. SIGMOD Record 30, 319–330 (2001)

26. Köhler, J.; Philippi, S.; Lange, M.: SEMEDA: Ontology-based semantic integration of biological databases. Bioinformatics 19(18), 2420–2427 (2003)

27. Necib, C.B.; Freytag, J.C.: Using ontologies for database query reformulation. In: Local Proceedings of the 8th East European Conference on Advances in Databases and Information Systems (ADBIS), Hungary (2004)

28. Goble, C.A.; Paton, N.W.; et al.: Transparent access to multiple bioinformatics information sources. IBM Syst. J. 40(2), 532–551 (2001)

29. Paton, N.W.; Stevens, R.; et al.: Query processing in the TAMBIS bioinformatics source integration system. In: Proceedings of the 11th International Conference on Scientific and Statistical Database Management (SSDBM), pp. 138–147. IEEE CS, Ohio (1999)

30. Hernandez, T.; Kambhampati, S.: Integration of biological sources: current systems and challenges ahead. SIGMOD Record 33(3), 51–62 (2004)

31. Ben-Miled, Z.; Li, N.; et al.: Complex life science multidatabase queries. Proc. IEEE 90(11), 1754–1763 (2002)

32. Shaker, R.; Mork, P.; et al.: The BioMediator system as a tool for integrating biological databases on the web. In: Proceedings of the VLDB Workshop on Information Integration on the Web (IIWEB), Toronto, Canada, pp. 77–82 (2004)

33. RDF Primer: W3C Recommendation. http://www.w3.org/TR/rdf-primer/, February (2004)

34. Lacroix, Z.; Boucelma, O.; Essid, M.: The biological integration system. In: Proceedings of the 5th ACM International Workshop on Web Information and Data Management, WIDM ’03, pp. 45–49. ACM, New York (2003)

35. Freier, A.; Hofestadt, R.; Lange, M.; Scholz, U.; Stephanik, A.: BioDataServer: a SQL-based service for the online integration of life science data. In Silico Biol. 2(2), 37–57 (2002)

36. Khan, S.; Morvan, F.: Integrating biomedical sources on the internet. In: Proceedings of the ISCA 19th International Conference on Parallel and Distributed Computing Systems (PDCS), California, USA (2006)

37. Understanding indexes and clusters: Oracle9i Database Performance Tuning Guide and Reference, Release 2 (9.2). http://download.oracle.com/docs/cd/B10501_01/server.920/a96533/data_acc.htm. February 03, 2011

38. Oracle Database Semantic Technologies. http://www.oracle.com/technetwork/database/options/semantic-tech/index.html. February 03, 2011
