Evaluating ontology mapping techniques: An experiment in public safety information sharing
Post on 05-Sep-2016
Embed Size (px)
geneous and incompatible. This problem has been amajor focus of the research and practitioner communitiesalike. There is a large body of work in the database and
Available online at www.sciencedirect.com
Decision Support Systems 45 (1. Introduction
The public safety community in the United Statescomprises of several thousand state and local policeagencies, hundreds of federal agencies ranging from theFBI to the Government Printing Office police, andthousands of courts, prosecutor's offices, and probationdepartments. Each of these agencies has one or more
internal information system as well as links to nationalinformation systems. In the past few years, there has beena thrust on the interoperability and information sharingbetween systems in agencies responsible for local law-enforcement and national security. However, a majorityof these agencies have records management systems thatare incapable of sharing information with each other in anefficient manner because they are not interoperable .
The interoperability problem exists because there is aneed to connect information systems that are hetero-Ontology-based interoperability approaches in the public safety domain need to rely on mapping between ontologies as eachagency has its own representation of information. However, there has been little study of ontology mapping techniques in thisdomain. We evaluate current mapping techniques with real-world data representations from law-enforcement and public safety datasources. In addition, we implement an information theory based tool called MIMapper that uses WordNet and mutual informationbetween data instances to map ontologies. We find that three tools: PROMPT, Chimaera, and LOM, have average F-measures of0.46, 0.49, and 0.68 when matching pairs of ontologies with the number of classes ranging from 1373. MIMapper performs betterwith an average F-measure of 0.84 in performing the same task. We conclude that the tools that use secondary sources (likeWordNet) and data instances to establish mappings between ontologies are likely to perform better in this application domain. 2007 Elsevier B.V. All rights reserved.
Keywords: Ontology mapping; Public safety information sharing; Mutual information; Intelligence and security informaticsinformation system. In the past few years, there has been a thAbstract
The public safety community in the United States consists of thousands of local, state, and federal agencies, each with its ownrust on the seamless interoperability of systems in these agencies.Evaluating ontology mapping tesafety inform
Department of Management Information
Available onlineCorresponding author.E-mail addresses: firstname.lastname@example.org (S. Kaza),
email@example.com (H. Chen).
0167-9236/$ - see front matter 2007 Elsevier B.V. All rights reserved.doi:10.1016/j.dss.2007.12.007iques: An experiment in publicion sharing
ms, University of Arizona, United States
2008) 714728www.elsevier.com/locate/dssinformation integration fields that ranges from matchingdatabase schemas to answering queries from multiple
upporsystems. Recently, research has focused on the use ofontologies as they might play a key role in achievingseamless connectivity between systems . However,there has not been much ontology related research in thepublic safety domain due to the lack of standard onto-logies and the sensitive nature of information.
Many ontology-based information sharing ap-proaches rely on mapping between ontologies fromdifferent sources. Mapping tools use different techni-ques to suggest matches between ontology elements,and vary in input requirements, output formats, andmodes of interaction with the user. Due to their diversity,there has been little work on the comparative evaluationof mapping techniques in the information integrationliterature . Thus, there is a lack of understanding oftheir pitfalls with real-world data. Mapping betweenontologies is especially important in the public safetydomain since each agency has a different representationfor information. Comparative evaluation of appropriatemapping techniques with real ontologies from the publicsafety community allows us to study them in detail andtake a few more steps to the goal of seamless con-nectivity between information sources.
In this study, we survey public safety informationsharing initiatives and discuss the ontological implicationsof the systems and standards. We also perform a survey ofontology mapping and evaluate commonly used tools thatuse various techniques to map ontologies. In addition, wedesign a mapping tool that uses WordNet and mutualinformation between data instances and compare its per-formance to other techniques. In doing this, the study hastwo primary aims: firstly, address important issues on theuse of ontology-based information integration techniqueswith public safety and law-enforcement data, andsecondly, identify the set of techniques (and notnecessarily the tools themselves) that perform well withreal law-enforcement data representations.
The study attempts to answer the following questions:
How can we ontologically define important law-enforcement and other public safety data elements?
How do the common ontology-mapping techniquesperformwith real law-enforcement data representations?
How can we use data instances to identify mappingsbetween similar ontology classes?
The next section discusses relevant previous work inthis area. Section 3 presents the research design, testbed,and the design of the new mapping tool used in ourwork. The experiment results are presented and dis-cussed in Section 4. Section 5 concludes and presents
S. Kaza, H. Chen / Decision Sthe possible future directions for this study.2. Related work
2.1. Challenges in information sharing andinteroperability
The problem of bringing together heterogeneous anddistributed computer systems is known as the interoper-ability problem . Data and database heterogeneity isclassified into structural (data is in different structures)and semantic (data in different sources has differentmeaning) heterogeneity. Even though structural con-flicts pose major problems in information integration,they have been addressed to a great extent in the schemamatching literature (see Rahm et al.  for acomprehensive survey). Semantic heterogeneity andconflicts have been identified as one of the mostimportant and pressing challenges in informationsharing . The conflicts have been categorized intomany different taxonomies by previous studies includ-ing data level and schema level conflicts ;confounding, scaling, and naming conflicts ; anddata entry and representation conflicts . Semanticconflicts in the public safety information domaininclude:
Data representation/expression conflicts: Data arerecorded in different formats, e.g., different repre-sentations of date (050798 vs. '7-May-98').
Data precision conflicts: The domain of data valuesdiffer, e.g., five levels of security alerts (DHSadvisory system) vs. four level systems (many policeagency alert systems).
Data unit conflicts: the units of data differ, e.g.,measurement of height in inches vs. in feet.
Naming conflicts: The labels of elements differ, e.g.,an element might be called Person in one datasource whereas it might be called Individual inanother.
Aggregation conflicts: An aggregation is used in onedata source to identify a set of entities in anothersource, e.g., date is represented as month, date, andyear as separate attributes in one source and as com-bined attributes in another source.
Another important challenge in sharing public safetyrelated data is addressing privacy, security, and policyconcerns. Federal, state, and local regulations requirethat agreements between agencies within their respec-tive jurisdictions receive advance approval from theirgoverning hierarchy. This precludes informal informa-tion sharing agreements between those agencies. The
715t Systems 45 (2008) 714728requirements vary from agency to agency according to
the statutes by which they were governed. Sharinginformation among these agencies requires manymonths of negotiation along with formal agreements.Though extremely important, these concerns are not theprimary focus of this paper.
2.2. Public safety information sharing
The domain of public safety information sharing hasnot been adequately explored in the research commu-nity. This stems from the sensitivity and policy issuesassociated with obtaining this information for researchpurposes. However, there have been some successfulnational, state, and local efforts to share data amongpublic safety agencies. Table 1 lists a few of the majorpublic safety information sharing efforts along with theirpotential ontological contributions. Selected initiativesare also discussed in detail in the following sub-sections.
2.2.1. National level sharing initiativesMost national level initiatives are championed by the
Government. Perhaps the most important recent one is adata model standardization effort by the U.S. Departmentof Justice, called the Global Justice XML Data Model
model for the exchange of information within the justiceand public safety communities. It defines the semanticsand the structure of common data elements that are used inthe community. The data elements are based on approxi-mately thirty-five different data dictionaries, XMLschema documents, and data models used in the justiceand public safety communities. The purpose of theGJXDM is to provide a consistent, extensible, maintain-able XML schema reference specification for dataelements and types that represent the data requirementsfor a majority of the community. It is intended to allowdatabase schema developers to base the components andstructures of the schema elements on a universallyaccepted standard. From the ontological perspective, theGJXDM is an important step in the right direction. Itprovides a comprehensive collection of data classes inaddition to a taxonomy outlining relationships amongthem. This will allow future public-safety ontologyengineers to base their ontology on a commonly usedstandard, thus enabling easy interoperability.
An established national level data sharing initiative isthe Regional Information Sharing Systems (RISS, http://www.iir.com/riss/) program that links law-enforcementagencies throughout the nation, providing secure com-
ce relas (Apsoonral, st
e ages (Ma
feder2006)hundternathunds300 steral a
and lagencand lsal age
716 S. Kaza, H. Chen / Decision Support Systems 45 (2008) 714728(GJXDM, http://it.ojp.gov/jxdm). The GJXDM is anXML based standard intended to be a data reference
Table 1National and regional public safety information sharing efforts
National GJXDM (DOJ) Data standard 40 justiagencieexpand
Data standard 13 fede2006)
RISS (BJA) Data standards andcommunication (comm.)network
N-Dex (FBI) Data warehouse 20 pilot(March
Comm. network Severalsome in
NCIC (FBI) Data warehouse andcomm. network
Regional COPLINK (multipleregions including AZand CA)
Data warehouse andanalysis suite
Over afew fed
ARJIS (San Diego, CAregion)
Distributed query andcommunication network
Distributed query andcommunication network
Distributed query andcommunication network
LINX (VA and others) Data warehouse 27 state
www.hrlinx.communications, information sharing resources, and investi-gative support. Existing information systems in agencies
cipation Ontological contributions
ted federal, state, and localril 2005), but likely to
Classes, properties, and comprehensivetaxonomy for the justice community
ate, and local agencies (May Classes and properties for homeland andborder security agencies
ncies and 920 federaly 2006)
Message exchange ontologies andclasses for law-enforcement data
al, state, and local agencies Classes for incident reports
red federal, state, local, andional agencies
Message exchange ontologies
red federal, state, and local Primarily individual and vehicle relatedclasses
ate and local agencies and agencies
Comprehensive classes and propertiesfor law-enforcement data and analysisresults
ocal agencies and a fewies
Law-enforcement message exchangeontologies
ocal agencies and 10 federal First responder (police, fire, EMS) dataand message exchange ontologies
ncies Message exchange ontologies forindividual, vehicle, and pawn records
ocal agencies Incident report classes
upporconnect as nodes to the RISS net-work and use stan-dardized XML data exchange specifications to exchangeinformation on individuals, vehicles, and incident reports.
In addition to data standardization efforts, othernational level information sharing efforts include theNational Law Enforcement Telecommunications Sys-tem (NLETS) and the FBI's National Crime InformationCenter (NCIC) network. Both of these systems aretelecommunications networks that enable agencies tosubmit and access public safety information. Informa-tion in these systems is entered manually and most localand state law-enforcement information is not availablethrough them. Even though the systems are in wide use,their narrow scope and the lack of efficient methods ofmapping information to them limit their use.
2.2.2. Regional level sharing initiativesThe initiatives described in this sub-section have
attempted to address the problem of interoperabilityamong the state and local public safety agencies. Manyof these systems are capable of scaling up to the nationallevel and adapting to larger scale public safety relatedapplications. One of the most successful efforts in regionaldata sharing is the COPLINK  system. COPLINK isbased on mapping multiple database schemas into onestandard schema. It allows diverse police departments toshare data seamlessly through an easy-to-use interface thatintegrates different data sources including legacy recordsmanagement systems. During the writing of this paper, thesystem was being used by over a 300 law-enforcementagencies in more than twenty states.
Another regional sharing effort is the AutomatedRegional Justice Information System (ARJIS, www.arjis.org) adapted by over fifty agencies in the San Diego, C.A.region. The ARJIS system uses a query translation ap-proach to send distributed queries to various agencies onthe network. A third notable regional sharing effort isCapital Wireless Integrated Network (CapWIN, www.capwin.org) for first responders in the Washington, D.C.region.
As can be seen in Table 1, from an ontological per-spective, multiple agencies in the community cancontribute to different aspects of a public safety ontology.The survey also shows the presence of multiple informa-tion sharing efforts, each with its own representation ofinformation. This increases the need tomap between thesevarious representations for systems to interoperate.
2.3. The role o...