evaluating ontology mapping techniques: an experiment in public safety information sharing
Post on 05-Sep-2016
Embed Size (px)
geneous and incompatible. This problem has been amajor focus of the research and practitioner communitiesalike. There is a large body of work in the database and
Available online at www.sciencedirect.com
Decision Support Systems 45 (1. Introduction
The public safety community in the United Statescomprises of several thousand state and local policeagencies, hundreds of federal agencies ranging from theFBI to the Government Printing Office police, andthousands of courts, prosecutor's offices, and probationdepartments. Each of these agencies has one or more
internal information system as well as links to nationalinformation systems. In the past few years, there has beena thrust on the interoperability and information sharingbetween systems in agencies responsible for local law-enforcement and national security. However, a majorityof these agencies have records management systems thatare incapable of sharing information with each other in anefficient manner because they are not interoperable .
The interoperability problem exists because there is aneed to connect information systems that are hetero-Ontology-based interoperability approaches in the public safety domain need to rely on mapping between ontologies as eachagency has its own representation of information. However, there has been little study of ontology mapping techniques in thisdomain. We evaluate current mapping techniques with real-world data representations from law-enforcement and public safety datasources. In addition, we implement an information theory based tool called MIMapper that uses WordNet and mutual informationbetween data instances to map ontologies. We find that three tools: PROMPT, Chimaera, and LOM, have average F-measures of0.46, 0.49, and 0.68 when matching pairs of ontologies with the number of classes ranging from 1373. MIMapper performs betterwith an average F-measure of 0.84 in performing the same task. We conclude that the tools that use secondary sources (likeWordNet) and data instances to establish mappings between ontologies are likely to perform better in this application domain. 2007 Elsevier B.V. All rights reserved.
Keywords: Ontology mapping; Public safety information sharing; Mutual information; Intelligence and security informaticsinformation system. In the past few years, there has been a thAbstract
The public safety community in the United States consists of thousands of local, state, and federal agencies, each with its ownrust on the seamless interoperability of systems in these agencies.Evaluating ontology mapping tesafety inform
Department of Management Information
Available onlineCorresponding author.E-mail addresses: firstname.lastname@example.org (S. Kaza),
email@example.com (H. Chen).
0167-9236/$ - see front matter 2007 Elsevier B.V. All rights reserved.doi:10.1016/j.dss.2007.12.007iques: An experiment in publicion sharing
ms, University of Arizona, United States
2008) 714728www.elsevier.com/locate/dssinformation integration fields that ranges from matchingdatabase schemas to answering queries from multiple
upporsystems. Recently, research has focused on the use ofontologies as they might play a key role in achievingseamless connectivity between systems . However,there has not been much ontology related research in thepublic safety domain due to the lack of standard onto-logies and the sensitive nature of information.
Many ontology-based information sharing ap-proaches rely on mapping between ontologies fromdifferent sources. Mapping tools use different techni-ques to suggest matches between ontology elements,and vary in input requirements, output formats, andmodes of interaction with the user. Due to their diversity,there has been little work on the comparative evaluationof mapping techniques in the information integrationliterature . Thus, there is a lack of understanding oftheir pitfalls with real-world data. Mapping betweenontologies is especially important in the public safetydomain since each agency has a different representationfor information. Comparative evaluation of appropriatemapping techniques with real ontologies from the publicsafety community allows us to study them in detail andtake a few more steps to the goal of seamless con-nectivity between information sources.
In this study, we survey public safety informationsharing initiatives and discuss the ontological implicationsof the systems and standards. We also perform a survey ofontology mapping and evaluate commonly used tools thatuse various techniques to map ontologies. In addition, wedesign a mapping tool that uses WordNet and mutualinformation between data instances and compare its per-formance to other techniques. In doing this, the study hastwo primary aims: firstly, address important issues on theuse of ontology-based information integration techniqueswith public safety and law-enforcement data, andsecondly, identify the set of techniques (and notnecessarily the tools themselves) that perform well withreal law-enforcement data representations.
The study attempts to answer the following questions:
How can we ontologically define important law-enforcement and other public safety data elements?
How do the common ontology-mapping techniquesperformwith real law-enforcement data representations?
How can we use data instances to identify mappingsbetween similar ontology classes?
The next section discusses relevant previous work inthis area. Section 3 presents the research design, testbed,and the design of the new mapping tool used in ourwork. The experiment results are presented and dis-cussed in Section 4. Section 5 concludes and presents
S. Kaza, H. Chen / Decision Sthe possible future directions for this study.2. Related work
2.1. Challenges in information sharing andinteroperability
The problem of bringing together heterogeneous anddistributed computer systems is known as the interoper-ability problem . Data and database heterogeneity isclassified into structural (data is in different structures)and semantic (data in different sources has differentmeaning) heterogeneity. Even though structural con-flicts pose major problems in information integration,they have been addressed to a great extent in the schemamatching literature (see Rahm et al.  for acomprehensive survey). Semantic heterogeneity andconflicts have been identified as one of the mostimportant and pressing challenges in informationsharing . The conflicts have been categorized intomany different taxonomies by previous studies includ-ing data level and schema level conflicts ;confounding, scaling, and naming conflicts ; anddata entry and representation conflicts . Semanticconflicts in the public safety information domaininclude:
Data representation/expression conflicts: Data arerecorded in different formats, e.g., different repre-sentations of date (050798 vs. '7-May-98').
Data precision conflicts: The domain of data valuesdiffer, e.g., five levels of security alerts (DHSadvisory system) vs. four level systems (many policeagency alert systems).
Data unit conflicts: the units of data differ, e.g.,measurement of height in inches vs. in feet.
Naming conflicts: The labels of elements differ, e.g.,an element might be called Person in one datasource whereas it might be called Individual inanother.
Aggregation conflicts: An aggregation is used in onedata source to identify a set of entities in anothersource, e.g., date is represented as month, date, andyear as separate attributes in one source and as com-bined attributes in another source.
Another important challenge in sharing public safetyrelated data is addressing privacy, security, and policyconcerns. Federal, state, and local regulations requirethat agreements between agencies within their respec-tive jurisdictions receive advance approval from theirgoverning hierarchy. This precludes informal informa-tion sharing agreements between those agencies. The
715t Systems 45 (2008) 714728requirements vary from agency to agency according to
the statutes by which they were governed. Sharinginformation among these agencies requires manymonths of negotiation along with formal agreements.Though extremely important, these concerns are not theprimary focus of this paper.
2.2. Public safety information sharing
The domain of public safety information sharing hasnot been adequately explored in the research commu-nity. This stems from the sensitivity and policy issuesassociated with obtaining this information for researchpurposes. However, there have been some successfulnational, state, and local efforts to share data amongpublic safety agencies. Table 1 lists a few of the majorpublic safety information sharing efforts along with theirpotential ontological contributions. Selected initiativesare also discussed in detail in the following sub-sections.
2.2.1. National level sharing initiativesMost national level initiatives are championed by the
Government. Perhaps the most important recent one is adata model standardization effort by the U.S. Departmentof Justice, called the Global Justice XML Data Model
model for the exchange of information within the justiceand public safety communities. It defines the semanticsand the structure of common data elements that are used inthe community. The data elements are based on approxi-mately thirty-five different data dictionaries, XMLschema documents, and data models used in the justiceand public safety communities. The purpose of theGJXDM is to provide a consistent, extensible, maintain-able XML schema reference specification for dataelements and types that represent the data requirementsfor a majority of the community. It is