Mapping from HR-XML Resume to a Semantic
Data Model
João Sobrinho
University of Madeira, Department of Mathematics and Engineering,
9000-390 Funchal, Portugal, [email protected]
Abstract. Companies that use XML-based standards need quick and accurate
business transactions. The HR-XML Consortium has developed many
specifications in the area of human resources. Companies like Convergys
are using this standard in B2B integration. Although XML has gained strong
importance in providing syntactic interoperability, we believe that
companies have already started to evolve toward a semantic level. For
example, Vodafone, Oracle and Microsoft are already implementing semantic
features in their services and have achieved good results. To support this
evolution, tools are already available that allow companies to migrate from
syntactic to semantic knowledge models. In this document we propose a
solution to map a syntactic schema to semantic models. We start with an
HR-XML Resume, develop an ontology, and present the mappings that need to
be established to automatically transform the syntactic data into an OWL
(semantic) representation.
1 Introduction
The increasing amount of data that must be exchanged between companies
imposes a need for a well-defined communication format that allows the
parties involved in a transaction to easily understand each other. The most
commonly used format today is XML (eXtensible Markup Language). Using XML,
companies can define strict protocols for specific areas, which allow
accurate data transactions. Some of the most widely used standards are
cXML [7], RosettaNet [8], ebXML [9], HR-XML [1] and FIXML [10]. These
standards are usually created in a common effort by a set of players. For
example, FIXML was created by a group of financial companies that includes
Barclays, AXA and Microsoft.
The HR-XML Consortium is one of the organizations that develop standards
based on XML. This consortium develops standards for enabling e-business and
the automation of human resources-related data exchanges [1]. The HR-XML
standards are driven by the needs and priorities of its members, which
include Oracle and IBM. Some of the standards already completed are
Competency Types, Contact Method, Education History, Resume and Staffing
Exchange Protocol, among many others. All the schemas for these standards
can be downloaded from the HR-XML site at http://www.hr-xml.org.
The HR-XML standard was chosen for this work for the following reasons: it
has a large number of members supporting it, which indicates that it is
widely used; it is free, so consulting and understanding it costs nothing;
and it is simple and easy to use, which makes it well suited to explaining
our work.
In this work we use an example of a resume, focusing on the details of the
publications. We chose it because it is understandable to a general
audience and does not belong to a specialized domain, allowing a wider
range of people to grasp our objectives.
We will also map the syntactic data in XML to the semantic data of the
ontology with the JXML2OWL [2] tool, developed last year by our colleagues
Rodrigues, T. and Rosa, P. (2006). This tool allows the creation and
editing of transformations from XML files to OWL files. These
transformations allow enterprises to convert data in an unknown structure
into a structure that is well defined and understood by them.
We believe that in the near future companies will migrate to semantic
data. For that, they will need a tool that transforms data from a syntactic
to a semantic format. During this transition, companies must still be able
to communicate with other companies that continue to use syntactic data.
For that job there is B2BISS [13], which allows companies to automatically
receive XML files and transform them into predefined OWL instances.
The paper is structured as follows. Section 2 presents an overview of
HR-XML. Section 3 presents the ontology structure and the types of
relationships established among its elements. Section 4 presents the
relations that exist between the Resume XML elements and our ontology,
defining which mappings should be made. Section 5 presents the results of
the mappings in the form of OWL instances. Section 6 gives a brief overview
of the tool used in the mapping. Section 7 presents our conclusions, and
the last section discusses related work.
2 HR-XML
“The HR-XML Consortium is an independent, non-profit organization
dedicated to the development and promotion of a standard suite of XML
specifications to enable e-business and automation of human resources-related
data exchanges.” [1]
This consortium creates open data exchange standards in order to minimize
the risk to companies of adopting ad hoc solutions and to cut the expense
of negotiating and agreeing on a solution. By using these standards,
companies may also save costs on implementing systems to interchange data.
These standards also provide a valuable resource to communicate and
collaborate on the high-level technology standards that are critical in the
development of Web-based solutions [4].
Today, HR-XML specifications include Application Acknowledgment,
Assessments, Background Checking, Benefits Enrollment, Payroll Instructions,
Resume and Staffing Exchange Protocol, among many others.
The following code shows a small example of an HR-XML Resume file.
<Resume>
  <StructuredXMLResume>
    <ContactInfo>
      <PersonName>
        <FormattedName>John A. Example</FormattedName>
      </PersonName>
      <ContactMethod>
        <Telephone>
          <FormattedNumber>+1 404 122 1234</FormattedNumber>
        </Telephone>
        <Fax>
          <FormattedNumber>+1 404 123 1234</FormattedNumber>
        </Fax>
        <InternetEmailAddress>
        </InternetEmailAddress>
        <InternetWebAddress>
          http://www.cc.gatech.edu/~jaexample/
        </InternetWebAddress>
      </ContactMethod>
    </ContactInfo>
  </StructuredXMLResume>
</Resume>
This example only contains some data related to the contact information.
By 2003, the Consortium had a membership of more than one hundred
companies [4] represented in twenty-two different countries [3]. Some of
the member companies are important players in their areas, such as
Convergys, a global leader in integrated billing, employee care and
customer care services [4]; Cisco Systems, well known in the networking
domain; IBM, in computer systems; and Oracle, one of the biggest players in
databases.
3 Defining the Ontology
In order to perform the mappings from XML to an OWL ontology, we set out to
find real-world examples being used in companies. For the XML we were able
to find a standard, but it proved hard to find an OWL ontology that allowed
us to do all the mappings we wanted. Due to these limitations, we decided
that it would be better to create a new ontology.
Among the ontologies that we found using Swoogle [5], there was one named
“eBiquity Publication Ontology”, related to publications, with many details
that could not be adapted to the XML instance file. This ontology can be
found at http://ebiquity.umbc.edu/ontology/publication.owl.
To develop a new ontology, two approaches were available: bottom-up and
top-down. In the first, we would have to study the real world in order to
understand it and then model it in OWL. This approach would yield an
ontology that models a real-world company, reducing the amount of
simulation involved. In the second, we start from an example XML file and
model the ontology with the details contained in the XML. This approach
does not get as close to the real world as the previous one, but it is
faster and cheaper to carry out, because there is no need to understand the
world related to the domain being modeled. We used the top-down approach
because we could not find any company that was using the XML standard, and
we did not have time to visit a company that could possibly use it in order
to better understand the details of the domain we are studying. We did not
consider these limitations significant, because they do not affect what we
are trying to demonstrate.
To develop the new ontology, we started by analyzing the elements in the
Resume specification. We then chose the publication part to prove the
concept of mapping a real-world example of syntactic data to semantic data.
Finally, we developed the ontology for the publication domain.
The ontology we created has eight classes:
1. PublicationRecord
2. Publication
3. Article
4. Book
5. ConferencePaper
6. Copyrights
7. Author
8. Employee
In the following sections we present these classes in more detail.
#PublicationRecord. This class represents a set of publications created by
an employee. This class does not have any datatype properties associated.
#Publication. This class represents all the types of publications that
exist. It has four datatype properties that are common to all
publications: title, publicationName, coments and abstract. Table 1 shows
more details about the datatype properties of this class.
Table 1. #Publication Datatype Properties
Datatype Property    Description
publicationName      The name of the author in this publication.
title                The title of the publication.
coments              Comments about the publication.
abstract             A brief summary of the related book.
#Article. This class represents the articles published. It is a subclass of
the class Publication. We have defined six datatype properties for this
class: articlePublicationDate, journal, volume, issue, pageNumbers and
language. Table 2 lists all the datatype properties, along with some
details about them.
Table 2. #Article Datatype Properties
Datatype Property        Description
articlePublicationDate   The date of publication of the article.
journal                  The journal where the article has been published.
volume                   The number of the volume of the journal.
issue                    The meaning is based on context.
pageNumbers              The page number or page range where an article appears.
language                 Defines the language in which the publication is written.
#Book. This class represents the books published. It is a subclass of the
class Publication. We have defined five datatype properties for this class:
edition, ISBN, bookPublicationDate, publisherName and publisherLocation.
Table 3 lists all the datatype properties, along with some details about
them.
Table 3. #Book Datatype Properties
Datatype Property     Description
edition               The edition of the book.
ISBN                  The International Standard Book Number of the book.
bookPublicationDate   Contains the date of publication of the book.
publisherName         Defines the name of the publisher.
publisherLocation     Contains the location of the publisher.
#ConferencePaper. This class represents the conference papers published. It
is also a subclass of Publication. We have defined three datatype
properties for this class: eventName, conferenceDate and
conferenceLocation. Table 4 lists all the datatype properties, including
some details about them.
Table 4. #ConferencePaper Datatype Properties
Datatype Property    Description
eventName            The name of the conference where the paper was submitted.
conferenceDate       The date or date range of the conference.
conferenceLocation   The location of the conference.
#Copyrights. This class represents the copyrights of the publications. We
have defined two datatype properties for this class: copyrightYear and
copyrightContent. Table 5 lists all the datatype properties, along with
some details about them.
Table 5. #Copyrights Datatype Properties
Datatype Property   Description
copyrightYear       Contains the date the copyright was originally issued.
copyrightContent    The name of the copyright holder and a short description.
#Author. This class represents the author of the publications. We have
defined four datatype properties for this class: name, phoneNumber,
faxNumber and mailAddress. Table 6 lists all the datatype properties, along
with some details about them.
Table 6. #Author Datatype Properties
Datatype Property   Description
name                Contains the name of the author.
phoneNumber         Defines the phone number of the author.
faxNumber           Defines the fax number of the author.
mailAddress         Contains the e-mail address.
#Employee. This class represents an employee. It is a subclass of Author.
We have defined one datatype property for this class: position. This
property defines the position the employee occupies in the company where he
or she works.
After defining the classes, we defined the relationships between them.
These relationships are defined by object properties. With object
properties we can define new classes or restrict classes already defined.
In this ontology we have only restricted the existing classes using
cardinality restrictions. Table 7 shows those restrictions.
Table 7. Object Properties and Restrictions
Name              Domain        Range               Restriction
publishes         author        publication         Minimum cardinality is 1.
copyrightedBy     publication   copyrights          Maximum cardinality is 1.
belongsToRecord   publication   publicationRecord   Minimum cardinality is 1.
The first restriction states that an author must have at least one
publication. The second says that a publication cannot have more than one
copyright. The last restriction states that a publication must belong to at
least one publication record.
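The restrictions in Table 7 can be illustrated with a minimal sketch in Python. This is not part of the ontology or of JXML2OWL: the dictionary layout, the function names and the sample instances are assumptions made for illustration only. The check is closed-world (a missing link counts as a violation), which is stricter than what an OWL reasoner would conclude under the open-world assumption.

```python
# Subclass links from the ontology (child -> parent), as described in the text.
SUBCLASS_OF = {
    "article": "publication",
    "book": "publication",
    "conferencePaper": "publication",
    "employee": "author",
}

# (domain class, object property, kind of bound, bound), taken from Table 7.
RESTRICTIONS = [
    ("author", "publishes", "min", 1),
    ("publication", "copyrightedBy", "max", 1),
    ("publication", "belongsToRecord", "min", 1),
]


def is_a(cls, ancestor):
    """Return True if cls equals ancestor or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def check_cardinality(instances):
    """Return (instance id, property, kind) for every violated restriction.

    instances maps an instance id to (class name, {property: [target ids]}).
    This layout is hypothetical and only serves the illustration.
    """
    violations = []
    for ident, (cls, links) in instances.items():
        for domain, prop, kind, bound in RESTRICTIONS:
            if not is_a(cls, domain):
                continue
            count = len(links.get(prop, []))
            if (kind == "min" and count < bound) or (kind == "max" and count > bound):
                violations.append((ident, prop, kind))
    return violations


# Hypothetical instances: the book has a copyright but no publication record.
example = {
    "author1": ("author", {"publishes": ["book1"]}),
    "book1": ("book", {"copyrightedBy": ["c1"]}),
}
print(check_cardinality(example))
```

In this sketch the book is reported because it has no belongsToRecord link, which is the situation discussed later for classes that have no counterpart in the XML file.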
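[Figure 1: the classes article, conferencePaper and book are subclasses of
publication, and employee is a subclass of author. The object property
publishes links author to publication, copyrightedBy links publication to
copyrights, and belongsToRecord links publication to publicationRecord.]
Figure 1 - Drawing of the ontology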
Figure 1 shows the classes in the ontology and the relations between them,
giving a visual overview of the ontology we developed.
4 Establishing the Syntactic-to-Semantic Relationships
To map a syntactic context to a semantic one, we need to perform several
smaller mappings. We define a mapping as a relation from an XML node or
attribute to an OWL class or datatype property. Mappings can also be used
to establish a relationship between instances of specific classes.
To specify the mappings, we use XPath. The XPath language is based on a
tree representation of the XML document and provides the ability to
navigate around the tree, selecting nodes by a variety of criteria. In
popular use (though not in the official specification), an XPath expression
is often referred to simply as an XPath. More details can be found in [6].
Table 8 shows the correspondence between the XML values (XPaths) and the
OWL datatype properties.
Table 8. Mappings from XPaths to Ontology Datatype Properties
XPath from the XML File                                                                  Ontology Datatype Property
/Resume/StructuredXMLResume/ContactInfo/PersonName/FormattedName                         #author/name
/Resume/StructuredXMLResume/ContactInfo/ContactMethod/Telephone/FormattedNumber          #author/phoneNumber
/Resume/StructuredXMLResume/ContactInfo/ContactMethod/Fax/FormattedNumber                #author/faxNumber
/Resume/StructuredXMLResume/ContactInfo/ContactMethod/InternetEmailAddress               #author/mailAddress
/Resume/StructuredXMLResume/PublicationHistory/Article/Title                             #article/title
/Resume/StructuredXMLResume/PublicationHistory/Article/Name/FormattedName                #article/publicationName
/Resume/StructuredXMLResume/PublicationHistory/Article/PublicationDate/YearMonth         #article/articlePublicationDate
/Resume/StructuredXMLResume/PublicationHistory/Article/JournalOrSerialName               #article/journal
/Resume/StructuredXMLResume/PublicationHistory/Article/Volume                            #article/volume
/Resume/StructuredXMLResume/PublicationHistory/Article/Issue                             #article/issue
/Resume/StructuredXMLResume/PublicationHistory/Article/PageNumber                        #article/pageNumbers
/Resume/StructuredXMLResume/PublicationHistory/Article/PublicationLanguage               #article/language
/Resume/StructuredXMLResume/PublicationHistory/Book/Title                                #book/title
/Resume/StructuredXMLResume/PublicationHistory/Book/Name/FormattedName                   #book/publicationName
/Resume/StructuredXMLResume/PublicationHistory/Book/PublicationDate/YearMonth            #book/bookPublicationDate
/Resume/StructuredXMLResume/PublicationHistory/Book/Abstract                             #book/abstract
/Resume/StructuredXMLResume/PublicationHistory/Book/Copyright/CopyrightDates/OriginalDate/Year   #copyrights/copyrightYear
/Resume/StructuredXMLResume/PublicationHistory/Book/Copyright/CopyrightText              #copyrights/copyrightContent
/Resume/StructuredXMLResume/PublicationHistory/Book/Edition                              #book/edition
/Resume/StructuredXMLResume/PublicationHistory/Book/ISBN                                 #book/ISBN
/Resume/StructuredXMLResume/PublicationHistory/Book/PublisherName                        #book/publisherName
/Resume/StructuredXMLResume/PublicationHistory/Book/PublisherLocation                    #book/publishLocation
/Resume/StructuredXMLResume/PublicationHistory/ConferencePaper/Title                     #conferencePaper/title
/Resume/StructuredXMLResume/PublicationHistory/ConferencePaper/Name/FormattedName        #conferencePaper/publicationName
/Resume/StructuredXMLResume/PublicationHistory/ConferencePaper/EventName                 #conferencePaper/eventName
/Resume/StructuredXMLResume/PublicationHistory/ConferencePaper/ConferenceDate/AnyDate    #conferencePaper/conferenceDate
/Resume/StructuredXMLResume/PublicationHistory/ConferencePaper/ConferenceLocation        #conferencePaper/conferenceLocation
These relations define the bridge for the data between the XML structure and
the OWL classes and datatype properties.
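How such a mapping table drives the extraction of values can be sketched in Python with the standard library. This is only an illustration, not the way JXML2OWL works internally (the tool generates XSLT): the function names and the reduced sample document are hypothetical, only three of the mappings are included, and the XML is assumed to carry no namespaces, as in the excerpts shown in this paper.

```python
import xml.etree.ElementTree as ET

# A small subset of the XPath -> datatype property mappings from Table 8.
MAPPINGS = {
    "/Resume/StructuredXMLResume/ContactInfo/PersonName/FormattedName": "#author/name",
    "/Resume/StructuredXMLResume/PublicationHistory/Book/Title": "#book/title",
    "/Resume/StructuredXMLResume/PublicationHistory/Book/ISBN": "#book/ISBN",
}


def select(root, xpath):
    """Resolve a simple absolute child path such as /Resume/A/B against root.

    ElementTree only supports relative paths on elements, so the first step
    is checked against the root tag and the rest is applied as a relative path.
    """
    steps = xpath.strip("/").split("/")
    if steps[0] != root.tag:
        return []
    if len(steps) == 1:
        return [root]
    return root.findall("./" + "/".join(steps[1:]))


def apply_mappings(xml_text, mappings):
    """Return {datatype property: [values]} for every mapping that matches."""
    root = ET.fromstring(xml_text)
    result = {}
    for xpath, prop in mappings.items():
        # Collapse runs of whitespace and drop empty nodes.
        values = [" ".join((node.text or "").split()) for node in select(root, xpath)]
        values = [value for value in values if value]
        if values:
            result[prop] = values
    return result


# Reduced, hypothetical sample built from the values used in this paper.
SAMPLE = """<Resume>
  <StructuredXMLResume>
    <ContactInfo>
      <PersonName><FormattedName>John A. Example</FormattedName></PersonName>
    </ContactInfo>
    <PublicationHistory>
      <Book>
        <Title>XML in a Seashell</Title>
        <ISBN>0596000222</ISBN>
      </Book>
    </PublicationHistory>
  </StructuredXMLResume>
</Resume>"""

print(apply_mappings(SAMPLE, MAPPINGS))
```

Each key of the result names the class and datatype property that would receive the extracted value.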
To process the relations listed above we used XSL Transformations
(XSLT) [15]. XSLT is used to transform one XML document into another
without modifying the original; the output document is generated from the
source document [16].
The following example shows the XSL code needed to transform the node
containing the name of the person into the datatype property name of the
class #author of our ontology.
<xsl:variable name="root" select="/"/>
<xsl:variable name="hrauthors0">
  <xsl:for-each select="/Resume/StructuredXMLResume/ContactInfo/PersonName/FormattedName">
    <xsl:if test="normalize-space(.) != ''">
      <authorId>
        <xsl:value-of select="translate(normalize-space(.), ' ', '')"/>
      </authorId>
    </xsl:if>
  </xsl:for-each>
</xsl:variable>
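The identifier logic of this XSLT fragment (normalize the whitespace, skip empty nodes, remove the spaces) can be mirrored in a few lines of Python. The function name is hypothetical, and the naming scheme "_" + prefix + class name + compacted text is an assumption inferred from the instance identifiers shown in the OWL examples of this paper, not a documented rule of JXML2OWL.

```python
def make_instance_id(class_name, node_text, prefix="hr"):
    """Build an instance identifier from the text of an XML node.

    "".join(text.split()) approximates translate(normalize-space(.), ' ', ''):
    it trims the text and removes all whitespace. Empty nodes yield None,
    like the xsl:if test in the XSLT fragment.
    """
    compact = "".join(node_text.split())
    if not compact:
        return None
    return f"_{prefix}{class_name}{compact}"


print(make_instance_id("book", "XML in a Seashell"))
print(make_instance_id("author", "  John A.  Example "))
```

The sketch does not check that the result is a valid XML name, which an rdf:ID value must be; a real transformation would need to handle characters that are not allowed there.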
The object properties are also defined in the JXML2OWL tool. These mappings
must be defined after the links between the XML structure and the OWL
classes. For the domain class of each relation, we add the object property
to its class mapping and then select which mapping of the range class we
want to associate as the object property instance. In the proposed mapping,
we have to create three object property mappings. Table 9 describes them.
Table 9. Object Properties Mappings
Domain Class   Range Class          Object Property
#author        #publication         publishes
#publication   #copyrights          copyrightedBy
#publication   #publicationRecord   belongsToRecord
Some of these mappings could not be created. This situation is explained in
the next section.
5 Mapping Results
So far we have created a new ontology and all the mappings between the XML
file and the OWL classes. With these, it is possible to generate a file
with instances of the mapped classes.
The next example shows a portion of the XML file used to generate the OWL
instances.
<Resume>
  <StructuredXMLResume>
    <PublicationHistory>
      <Book>
        <Title>XML in a Seashell</Title>
        <Name>
          <FormattedName>John A. Example</FormattedName>
        </Name>
        <PublicationDate>
          <YearMonth>2001-02</YearMonth>
        </PublicationDate>
        <Abstract>A very readable introduction to XML for readers with existing
        knowledge of markup and Web technologies.</Abstract>
        <Copyright>
          <CopyrightDates>
            <OriginalDate>
              <Year>2001</Year>
            </OriginalDate>
          </CopyrightDates>
          <CopyrightText>Copyright 2nd edition</CopyrightText>
        </Copyright>
        <Edition>2nd Edition</Edition>
        <ISBN>0596000222</ISBN>
        <PublisherName>O'Malley Associates</PublisherName>
        <PublisherLocation>Garden City, NY, US</PublisherLocation>
      </Book>
    </PublicationHistory>
  </StructuredXMLResume>
</Resume>
The following example shows the instance created for the class #book,
together with the instance of the object property #copyrightedBy.
<hr:book rdf:ID="_hrbookXMLinaSeashell">
  <hr:copyrightedBy rdf:resource="#_hrcopyrightsCopyright2ndedition"/>
  <hr:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
    XML in a Seashell
  </hr:title>
  <hr:publisherName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
    O'Malley Associates
  </hr:publisherName>
  <hr:publishLocation rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
    Garden City, NY, US
  </hr:publishLocation>
  <hr:bookPublicationDate rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
    2001-02
  </hr:bookPublicationDate>
  <hr:publicationName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
    John A. Example
  </hr:publicationName>
  <hr:ISBN rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
    0596000222
  </hr:ISBN>
  <hr:edition rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
    2nd Edition
  </hr:edition>
  <hr:abstract rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
    A very readable introduction to XML for readers with existing knowledge of
    markup and Web technologies.
  </hr:abstract>
</hr:book>
<hr:copyrights rdf:ID="_hrcopyrightsCopyright2ndedition">
  <hr:copyrightYear rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
    2001
  </hr:copyrightYear>
  <hr:copyrightContent rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
    Copyright 2nd edition
  </hr:copyrightContent>
</hr:copyrights>
As the example shows, one instance was created for each of the #copyrights
and #book classes. The second line of the #book instance holds the
reference to the instance of the class #copyrights. This instantiates the
object property #copyrightedBy.
Now consider a scenario where we need to associate an instance created from
the XML file with an instance that has no mapping to the XML file. For
example, in the ontology we developed, the class #publication is related to
the class #publicationRecord by the object property #belongsToRecord. This
relation means that a publication belongs to a publication record. The
ontology schema also has a cardinality restriction stating that a
publication is always related to at least one publication record.
This relation cannot be instantiated using JXML2OWL, because the tool does
not allow the creation of instances of classes that are not mapped to the
XML file. A solution to this problem would be to allow instantiating a
class without having to map it to the XML file.
With the current version of the tool, it is still possible to create the
instance of the object property, but we have to create a mapping for the
class #publicationRecord. This class can be mapped to any node of the XML
file, because it has no datatype properties to be filled. An instance of
the class is then created, and the object property can be instantiated.
Applying this workaround adds some lines to the previous example of the
book instance. The book instance would also contain this line:
<hr:belongsToRecord rdf:resource="#_hrpublicationRecordJohnA.Example"/>
At the end of the file, we would also have to add the following class
instance:
<hr:publicationRecord rdf:ID="_hrpublicationRecordJohnA.Example">
</hr:publicationRecord>
The last solution seems to be the best one to apply at this moment, given
the limitations of JXML2OWL. In our view, however, being able to create an
instance without having to create a mapping for a class would be a
“cleaner” solution.
There is another problem to solve. In the developed ontology, the class
#author has an object property that relates it to the class #publication.
This class has three subclasses: #article, #book and #conferencePaper. When
we build the mappings between the XML and the OWL, there are
correspondences to the classes #author, #article, #book and
#conferencePaper, but there is no mapping to the class #publication itself,
only to its subclasses. The object property from #author to #publication
carries the restriction that every author must write at least one
publication.
In theory it should be possible to instantiate this object property in
JXML2OWL, given mappings to the class #author and to any of the subclasses
of #publication. However, the tool does not allow the instantiation of the
object property, because it does not find any mapping to the class
#publication. It should be possible, because #article, #book and
#conferencePaper are subclasses of #publication, so their instances are
also instances of #publication and are therefore valid values of the object
property #publishes.
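The difference between a strict lookup of the range class and a subclass-aware lookup can be sketched in Python as follows. This is a hypothetical illustration of the behaviour argued for above, not code from JXML2OWL; the function names and the set of mapped classes are assumptions based on the mappings described in this paper.

```python
# Subclass links from the ontology (child -> parent).
SUBCLASS_OF = {
    "article": "publication",
    "book": "publication",
    "conferencePaper": "publication",
    "employee": "author",
}


def is_subclass_of(cls, ancestor):
    """Return True if cls equals ancestor or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def range_candidates(range_class, mapped_classes):
    """Return the mapped classes whose instances are valid values for the range."""
    return sorted(c for c in mapped_classes if is_subclass_of(c, range_class))


# Classes that have a class mapping to the XML file in our scenario.
mapped = {"author", "article", "book", "conferencePaper"}

# Strict lookup: #publication itself has no mapping, so nothing is found.
print("publication" in mapped)

# Subclass-aware lookup: the mapped subclasses are accepted as range classes.
print(range_candidates("publication", mapped))
```

With the subclass-aware lookup, the mappings of #article, #book and #conferencePaper all become candidates for the range of #publishes, which is what the instance file below relies on.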
The OWL instance file with the instantiation of the object property should
look like the following:
<hr:author rdf:ID="_hrauthorJohnA.Example">
  <hr:publishes rdf:resource="#_hrbookXMLinaSeashell"/>
  <hr:name rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
    John A. Example
  </hr:name>
  <hr:phoneNumber rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
    +1 404 122 1234
  </hr:phoneNumber>
  <hr:faxNumber rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
    +1 404 123 1234
  </hr:faxNumber>
  <hr:mailAddress rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
  </hr:mailAddress>
</hr:author>
As we can see in this example, the element hr:publishes states the relation
between the instance of #author and the instance of #book.
The only solution we found is to add this functionality to the JXML2OWL
tool, because inheritance is one of the basic notions of the Semantic Web.
6 JXML2OWL Mapping Tool
The mapping from the syntactic to the semantic data model can be carried
out in various ways. It can be done by hand or with the support of
software. In our case, we used a tool created specifically to perform this
kind of mapping: JXML2OWL [2], which stands for Java XML to OWL.
JXML2OWL is a graphical tool built in Java that performs mappings from XML
to OWL. The tool is easy to use and is based on drag and drop. To generate
the OWL instances, the tool uses XSLT. The user can therefore save an XSLT
file that, when run in any XSLT processor, produces an OWL document
composed of instances.
JXML2OWL is composed of one main window divided into three major parts. The
left side of the window presents the tree structure of the source XML. The
right side presents the OWL tree structure. The OWL elements to be mapped
are thus almost directly opposite the XML elements, which makes it easier
to relate them. At the bottom of the window, there is a section where the
user can specify the parameters of the datatype property and object
property mappings. A screenshot of the tool can be seen in Figure 2.
The main advantage of this tool is that the user only needs to define a
mapping once. Later, when the user wants to transform an XML file that has
the same structure as one previously mapped, he or she only has to run the
XSLT on that XML file, using an external XSLT processor, and the instances
are created.
Figure 2 - JXML2OWL Mapper
To create a Class mapping using JXML2OWL, the execution occurs as
follows [2]:
1. Mapper selects an XML item or an OWL concept and maps it to an
OWL concept or XML item respectively.
2. System creates the respective mapping rule and updates the
transformation.
To create a Datatype Property mapping using JXML2OWL, the execution
occurs as follows [2]:
1. Mapper selects one existing Class mapping.
2. Mapper specifies the Datatype Property name and the range XML
item for the selected property.
3. System creates the respective mapping rule instance and updates the
transformation.
To create an Object Property mapping using JXML2OWL, the execution
occurs as follows [2]:
1. Mapper selects one existing Class mapping.
2. Mapper specifies the Object Property name.
3. Mapper specifies the Range Class Association for the selected property.
4. System creates the respective mapping rule instance and updates the
transformation.
Despite having a few bugs, this tool facilitates the mapping from XML to
OWL. It comes with a user manual that succinctly describes how to perform
the main tasks. It can be downloaded from http://jxml2owl.sourceforge.net.
On the site the user can also view two videos that show how to use the
tool.
7 Conclusions
Today companies use syntactic data to perform electronic data exchange. In
the domain of human resources data, HR-XML is used by many important
companies to transmit data among themselves and to their clients and
partners.
We believe that in the near future companies will need to shift from
syntactic to semantic data in order to increase their profits. Much work
has been done with this aim, including the transformation of XML data into
OWL instances by Rodrigues, T. and Rosa, P. (2006), which tries to ease the
job of companies making this shift. However, it had not yet been tested
with real-world examples of syntactic data. For that reason, we used one
HR-XML specification and developed an ontology to simulate a real-world
scenario.
With the developed ontology, we encountered two problems and identified
possible solutions to them.
The first problem is that the tool requires every class to be mapped to the
XML file before its instances can be connected to the instances we are
mapping. In the developed ontology, one class had no match in the XML file
and therefore could not be associated with the classes being mapped and
instantiated. For this problem, we found some solutions. Without changing
anything in the tool, we can link the class that has no mapping in the XML
file to some node, just to create an instance of that class. We think that
this is not the best solution, but it is the one that can be used with the
current resources. A cleaner solution would be to add the functionality of
linking an instance to an existing instance of that class in another file.
The problem would also be solved if we could simply create an instance of a
class without having to map something to it.
The second problem concerns what we believe is basic functionality, since
it is connected with inheritance, which is at the base of semantics. The
problem arises from the fact that, in the developed ontology, an object
property has as its range a class that is never instantiated directly. This
class has subclasses that do have mappings and therefore instances. When we
try to instantiate the object property, the tool does not allow it, because
the range class of the object property has no instances of its own. The
tool should let us create the instance, because instances of the subclasses
are also instances of the superclass.
This work shows that the migration from syntactic to semantic data is not
always easy and that there is still a lot of work to do in this field.
8 Related Work
For this report, we have identified as related work integration of systems
using semantics.
Michael Uschold and Michael Gruninger (2004?) made a report about the
various existing architectures for integration of systems using semantics. They
have noticed five different types of architectures. JXML2OWL fits better in the
Manual Mapping architecture type. In this paper it is also suggested a hybrid
approach which mixes some of the indentified architectures to produce one that
better fits to a specific scenario.
Thomas Haselwanter et al. [12] present in their paper how Semantic Web
Service technology can help overcome the heterogeneity of data and processes
in a B2B integration scenario. For their case study, they use RosettaNet for
message exchange and message definition and, at the other end of the
communication line, a combination of WSDL and XML Schema. Their work presents
an architecture using a middleware framework conforming to the principles of a
Semantic Service Oriented Architecture. In this work we try to show a similar
concept, mapping the HR-XML specifications to an ontology, with B2BISS acting
as the middleware framework.
The idea of using mappings to improve interoperability is also used by
Leonid Stoimenov et al. (2006) [14], applied to Geographical Information
Systems (GIS). Their proposed framework, called GeoNis, uses a hybrid ontology
approach that mixes a top-level ontology with more specific ontologies applied
to each domain or office within a company. Some mappings must be made in order
to allow the ontologies to communicate with one another. JXML2OWL also uses
the concept of mapping to ontologies, but in its case the mappings start from
XML files.
References
1. HR-XML official Site: www.hr-xml.org
2. Rodrigues, T., Rosa, P.: JXML2OWL: An Approach to Semantic Data Integration
3. Cover Pages: http://xml.coverpages.org/hr-xml.html
4. HRO News: http://www.hrotoday.com/News.asp?id=117
5. Swoogle: http://swoogle.umbc.edu/
6. W3Schools XPath tutorial: http://www.w3schools.com/xpath/
7. cXML official Site: http://cxml.org
8. RosettaNet official Site: http://www.rosettanet.org/
9. ebXML official Site: http://www.ebxml.org/
10. FIXML official Site: http://www.fixprotocol.org/
11. Uschold, Michael and Gruninger, Michael. (2004?). Architectures for Semantic
Integration.
12. Haselwanter, Thomas. (2006). Dynamic B2B Integration on the Semantic Web
Services: SWS Challenge Phase 2.
13. Teixeira, Daniel, Sobrinho, João. (2007). Advanced Applications for Management,
Integration and Analysis of Medium/Small Companies.
14. Stoimenov, Leonid et al. (2006). Discovering Mappings between Ontologies in
Semantic Integration Process.
15. XSLT Specification Site: http://www.w3.org/TR/xslt
16. XSL Transformations on Wikipedia: http://en.wikipedia.org/wiki/XSLT
Appendix A: HR-XML Resume Example File
This is the complete code of the HR-XML Resume used in this work.
<?xml version="1.0" encoding="UTF-8"?>
<Resume>
  <StructuredXMLResume>
    <ContactInfo>
      <PersonName>
        <FormattedName>John A. Example</FormattedName>
      </PersonName>
      <ContactMethod>
        <Telephone>
          <FormattedNumber>+1 404 122 1234 </FormattedNumber>
        </Telephone>
        <Fax>
          <FormattedNumber>+1 404 123 1234</FormattedNumber>
        </Fax>
        <InternetEmailAddress>[email protected]</InternetEmailAddress>
      </ContactMethod>
    </ContactInfo>
    <PublicationHistory>
      <Article>
        <Title>Designing Interfaces for Youth Services Information Management.</Title>
        <Name>
          <FormattedName>John A. Example</FormattedName>
        </Name>
        <PublicationDate>
          <YearMonth>1996-06</YearMonth>
        </PublicationDate>
        <JournalOrSerialName>1996 Human-Computer Interaction Laboratory Video Reports, K. Pleasant, Ed., </JournalOrSerialName>
        <Volume>vol. 2</Volume>
        <Issue>no. 3</Issue>
        <PageNumber>pp.319-329</PageNumber>
        <PublicationLanguage>EN</PublicationLanguage>
      </Article>
      <Book>
        <Title>XML in a Seashell</Title>
        <Name>
          <FormattedName>John A. Example</FormattedName>
        </Name>
        <PublicationDate>
          <YearMonth>2001-02</YearMonth>
        </PublicationDate>
        <Abstract>A very readable introduction to XML for readers with existing knowledge of markup and Web technologies. </Abstract>
        <Copyright>
          <CopyrightDates>
            <OriginalDate>
              <Year>2001</Year>
            </OriginalDate>
          </CopyrightDates>
          <CopyrightText>Copyright 2nd edition</CopyrightText>
        </Copyright>
        <Edition>2nd Edition</Edition>
        <ISBN>0596000222</ISBN>
        <PublisherName>O'Malley Associates</PublisherName>
        <PublisherLocation> Garden City, NY, US </PublisherLocation>
      </Book>
      <ConferencePaper>
        <Title>Trends in Employee Benefit Offerings</Title>
        <Name>
          <FormattedName>Debra J. Cohen</FormattedName>
        </Name>
        <EventName>SHRM 55th Annual Conference and Exposition</EventName>
        <ConferenceDate>
          <AnyDate>2003-06-10</AnyDate>
        </ConferenceDate>
        <ConferenceLocation>Orlando, FL</ConferenceLocation>
      </ConferencePaper>
    </PublicationHistory>
  </StructuredXMLResume>
</Resume>
Appendix B: Publications OWL Ontology
This is the ontology we developed for this work.
<?xml version="1.0"?>
<rdf:RDF xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <owl:Ontology rdf:about="">
    <owl:versionInfo>
      Ontology version 0.8, 21 May 2007
    </owl:versionInfo>
  </owl:Ontology>
  <owl:Class rdf:ID="publicationRecord">
    <rdfs:comment>Record of publications that an employee has made.</rdfs:comment>
  </owl:Class>
  <owl:Class rdf:ID="publication">
    <rdfs:comment>Publication that an employee has made.</rdfs:comment>
  </owl:Class>
  <owl:Class rdf:ID="article">
    <rdfs:comment>Articles written</rdfs:comment>
    <rdfs:subClassOf rdf:resource="#publication"/>
    <owl:disjointWith rdf:resource="#book"/>
    <owl:disjointWith rdf:resource="#conferencePaper"/>
  </owl:Class>
  <owl:Class rdf:ID="book">
    <rdfs:comment>Books written</rdfs:comment>
    <rdfs:subClassOf rdf:resource="#publication"/>
    <owl:disjointWith rdf:resource="#article"/>
    <owl:disjointWith rdf:resource="#conferencePaper"/>
  </owl:Class>
  <owl:Class rdf:ID="conferencePaper">
    <rdfs:comment>Accepted papers written</rdfs:comment>
    <rdfs:subClassOf rdf:resource="#publication"/>
    <owl:disjointWith rdf:resource="#article"/>
    <owl:disjointWith rdf:resource="#book"/>
  </owl:Class>
  <owl:Class rdf:ID="copyrights">
    <rdfs:comment>Copyrights of a publication</rdfs:comment>
  </owl:Class>
  <owl:Class rdf:ID="employee">
    <rdfs:comment>Information about who wrote the publications</rdfs:comment>
    <rdfs:subClassOf rdf:resource="#author"/>
  </owl:Class>
  <owl:Class rdf:ID="author">
    <rdfs:comment>Information about who wrote the publications</rdfs:comment>
  </owl:Class>
  <!-- Object Properties -->
  <owl:ObjectProperty rdf:ID="publishes">
    <rdfs:domain rdf:resource="#author"/>
    <rdfs:range rdf:resource="#publication"/>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:ID="copyrightedBy">
    <rdfs:domain rdf:resource="#publication"/>
    <rdfs:range rdf:resource="#copyrights"/>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:ID="belongsToRecord">
    <rdfs:domain rdf:resource="#publication"/>
    <rdfs:range rdf:resource="#publicationRecord"/>
  </owl:ObjectProperty>
  <!-- Restrictions -->
  <owl:Class rdf:about="#author">
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#publishes"/>
        <owl:minCardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#positiveInteger">1</owl:minCardinality>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
  <owl:Class rdf:about="#publication">
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#copyrightedBy"/>
        <owl:maxCardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#positiveInteger">1</owl:maxCardinality>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
  <owl:Class rdf:about="#publicationRecord">
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#belongsToRecord"/>
        <owl:minCardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#positiveInteger">1</owl:minCardinality>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
  <!-- Datatype Properties -->
  <owl:DatatypeProperty rdf:ID="title">
    <rdfs:label>publication title</rdfs:label>
    <rdfs:domain rdf:resource="#publication"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="publicationName">
    <rdfs:label>name of the author used in this publication</rdfs:label>
    <rdfs:domain rdf:resource="#publication"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="coments">
    <rdfs:label>comments about this publication</rdfs:label>
    <rdfs:domain rdf:resource="#publication"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="abstract">
    <rdfs:label>abstract of the publication</rdfs:label>
    <rdfs:domain rdf:resource="#publication"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="articlePublicationDate">
    <rdfs:label>publication date</rdfs:label>
    <rdfs:domain rdf:resource="#article"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="journal">
    <rdfs:label>Journal where article has been published</rdfs:label>
    <rdfs:domain rdf:resource="#article"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="volume">
    <rdfs:label>volume of the publication</rdfs:label>
    <rdfs:domain rdf:resource="#article"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="issue">
    <rdfs:label>The meaning is based on context</rdfs:label>
    <rdfs:domain rdf:resource="#article"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="pageNumbers">
    <rdfs:label>Number of the pages of the article on the book</rdfs:label>
    <rdfs:domain rdf:resource="#article"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="language">
    <rdfs:label>language of the article</rdfs:label>
    <rdfs:domain rdf:resource="#article"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="copyrightYear">
    <rdfs:label>Year of the publication copyrights</rdfs:label>
    <rdfs:domain rdf:resource="#copyrights"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="copyrightContent">
    <rdfs:label>Text of the publication copyrights</rdfs:label>
    <rdfs:domain rdf:resource="#copyrights"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="edition">
    <rdfs:label>book's edition</rdfs:label>
    <rdfs:domain rdf:resource="#book"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="ISBN">
    <rdfs:label>book's ISBN</rdfs:label>
    <rdfs:domain rdf:resource="#book"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="bookPublicationDate">
    <rdfs:label>book's publication date</rdfs:label>
    <rdfs:domain rdf:resource="#book"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="publisherName">
    <rdfs:label>book's publisher</rdfs:label>
    <rdfs:domain rdf:resource="#book"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="publishLocation">
    <rdfs:label>book's publish location</rdfs:label>
    <rdfs:domain rdf:resource="#book"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="eventName">
    <rdfs:label>Name of the Event where the paper has been submitted</rdfs:label>
    <rdfs:domain rdf:resource="#conferencePaper"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="conferenceDate">
    <rdfs:label>Date of the Event where the paper has been submitted</rdfs:label>
    <rdfs:domain rdf:resource="#conferencePaper"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="conferenceLocation">
    <rdfs:label>Location of the Event where the paper has been submitted</rdfs:label>
    <rdfs:domain rdf:resource="#conferencePaper"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="name">
    <rdfs:label>Author's Name</rdfs:label>
    <rdfs:domain rdf:resource="#author"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="phoneNumber">
    <rdfs:label>Author's Phone Number</rdfs:label>
    <rdfs:domain rdf:resource="#author"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="faxNumber">
    <rdfs:label>Author's Fax Number</rdfs:label>
    <rdfs:domain rdf:resource="#author"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="mailAddress">
    <rdfs:label>Author's Mail Address</rdfs:label>
    <rdfs:domain rdf:resource="#author"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="position">
    <rdfs:label>Position of the employee in the company</rdfs:label>
    <rdfs:domain rdf:resource="#employee"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
</rdf:RDF>