Project ID 284860 MSEE – Manufacturing SErvices Ecosystem
Date: 12/10/2012 Deliverable D22.3 – M12
D22.3
Development of methods for virtualization of
MSE intangibles
M12
Document Owner: Christian Zanetti, David Opresnik (POLIMI)
Contributors: Hadrien Boyé (HARDIS), Manuel Hisrch (DITF), Davide Storelli (ENG)
Dissemination: Public
Contributing to: WP 22.3
Date: 12/10/2012
Revision: V1.0
VERSION HISTORY
     DATE        NOTES AND COMMENTS
01   27/07/2012  CHRISTIAN ZANETTI (POLIMI) – INITIAL DELIVERABLE DEFINITION – EXTENDED TABLE OF CONTENT
02   10/08/2012  TABLE OF CONTENT UPDATE
03   23/08/2012  REVIEW OF T.O.C. BY GUY DOUMEINGTS
04   27/08/2012  HADRIEN BOYE (HARDIS) – CONTRIBUTION - PART OF THE VIRTUALIZATION PROCESS (EII AND ETL) AND ALL VIRTUALIZATION TECHNIQUES
05   10/09/2012  DRAFT OF DELIVERABLE FOR REVIEW TO COORDINATORS
06   24/09/2012  DELIVERABLE FOR PEER REVIEW
07   08/10/2012  RECEIVED FROM REVIEW
08   12/12/2012  FINAL VERSION
DELIVERABLE PEER REVIEW SUMMARY
ID | Comments | Addressed () / Answered (A)
1 | Describe shortly what virtualization of intangible assets is. Give references if needed. |
2 | Mention explicitly which one. | A
3 | Add references. References should be added every time you mention a new technology, language, etc. |
4 | Add link to project webpage. |
5 | For each entry in this table explain briefly how the framework (or parts of the framework) you mention is being used (same as is currently done for the first row entry). |
6 | Please rephrase. ”The identification of ... is being identified ...” sounds strange. |
7 | To make it much clearer I would rephrase this step as: Identification of all resources inside the ME which are relevant to the project. I also expect step a. to be performed less often than step b. |
8 | Add a pointer to Section 2.2 where these techniques and how they are used during the virtualization process are described. |
9 | Please make a reference to the WP and deliverable(s) that are providing solutions for modeling and virtual representation by means of USDL. |
10 | The English for this part of the document is sloppy. Please check the language. |
11 | Rephrase. |
15 | Are Rules used exclusively for security and trust? I can imagine many other processes and tasks where Rules can be used to represent and manage knowledge about IA in a ME/MSE. |
16 | Please rephrase. I would say: ”The gap between the planned and actual data usability is filled” |
17 | There has been a considerable effort in the Semantic Web and recently Linked Open Data research areas to provide solutions for EII. The deliverable should mention these and give a short overview. |
18 | Is it only a matter of different DB systems used? I think it is more than that. I’m thinking here about different data schemas (that are conceptually different), different representations, etc. |
19 | For ontologies there are many representation languages, OWL being a standard. I’m not sure what OL stands for in this picture. |
20 | The most commonly used definition of an ontology is the one from Gruber. An ontology is a ”formal, explicit specification of a shared conceptualization”. Gruber, “Toward principles for the design of ontologies used for knowledge sharing?”, Int. J. Hum.-Comput. Stud., vol. 43, no. 5-6, 1995 |
21 | Something is missing here. ”Knowledge acquisition for ..” |
22 | This section starts directly with a description of the Meritum project. You should first say that the conceptual model for a ME developed in the Meritum project is adopted, and then start describing Meritum. |
23 | RDFS and OWL are W3C standards and not recommendations. Please add references for RDF, RDF-S and OWL. Add pointers to the W3C specifications of these standards. |
24 | It is worth adding a short paragraph on RDF Schema (RDFS). |
25 | There is a new version of OWL, i.e. OWL2, that makes a much clearer and cleaner separation of its variants, allowing different profiles (i.e. OWL2RL, OWL2EL and OWL2QL). |
26 | Why not consider other tools such as Neon Toolkit? It was built as part of the Neon FP7 project and offers the same functionality and has the same friendliness as Protege. The European Commission will most likely appreciate that we are using results/outcomes of other EU projects. |
27 | It would have been nice to have a parallel/comparison of the virtualization methods for intangible assets presented in this deliverable with the methods for tangible assets from WP23 if they are available. |
TABLE OF CONTENTS
EXECUTIVE SUMMARY
INTRODUCTION
1 IDENTIFICATION OF TYPES AND SOURCES OF INTANGIBLE ASSETS RELEVANT TO THE ENTERPRISE
1.1 CONCEPTS DEFINITION
1.1.1 GRAI model
1.1.2 Model Driven Service Engineering Architecture
1.2 PROCEDURE FOR IDENTIFICATION OF TYPES AND SOURCES OF INTANGIBLE ASSETS
2 IMPLEMENTATION METHOD OF INTANGIBLE ASSETS
2.1 VIRTUALIZATION PROCESS
2.1.1 Technical framework for the virtualization process of intangible assets
2.1.2 Data quality management and data virtualization
2.1.3 Linkage of the presented technical concepts with intangible assets virtualization process
2.2 TECHNIQUES FOR THE VIRTUALIZATION PROCESS
Data mining
2.2.1 Classification
2.2.2 Clustering
2.3 FORMALISM - ONTOLOGY AND TAXONOMY
2.3.1 Ontology development for intangible assets
2.4 FILE FORMAT DEFINITION
2.4.1 Analysis of taxonomy management tools
CONCLUSION AND FURTHER STEPS
SOURCES
ANNEXES
ANNEX 1 – IA CLASSIFICATION IN THE MSEE CONTEXT
Figures:
Figure 1: D22.3 document map and relations with D22.5
Figure 2: GRAI model
Figure 3: Towards a Model Driven Service Engineering Architecture
Figure 4: The relation of the partakers (ME, VE, MSE) with the GRAI model and MDSEA framework
Figure 5: Virtualization process of intangible assets in the MSEE context
Figure 6: Virtualization procedure of intangible assets
Figure 7: Linkage of ETL and virtualization procedure of intangible assets
Figure 8: Relation between taxonomy and ontology
Figure 9: Main classification of intangible assets within the MSEE context
Figure 10: Representation of a statement
Tables:
Table 1: ETL Phases
Table 2: Ontology development process
Table 3: Comparison of taxonomy management software
Executive summary
To understand the significance of deliverable D22.3, it has to be put into the context of its
WP, whose aim is to support the Manufacturing Service Ecosystem (hereinafter: MSE) in
managing Intangible Assets (hereinafter: IA), ranging from their basic handling to advanced
analytical BI techniques. This deliverable provides all documents necessary to develop and
establish a technical framework that will enable the description and representation of IA that
could be offered as a service. This document therefore offers D22.5 all the knowledge
necessary for implementing IA in the technical framework that will reside in the MSE.
At an abstract level, IA virtualization makes physical activities, relations and other assets
more virtual, as exemplified by ICT-related activities (van Geenhuizen & Nijkamp, 2012). Such
a process is not in itself a novelty, either in science or in practice. However, making it
successful, effective and, foremost, holistic requires the development or amendment of
specific procedures, architectures and techniques. The value of this deliverable is therefore at
least twofold: first, developing and proposing the procedure for virtualizing IA into the MSE
repository; second, proposing the usage and development of the different processes,
approaches and techniques that the virtualization procedure encompasses.
At the beginning of the virtualization procedure, a method for identifying the relevant sources
and types of IA in Manufacturing Enterprises (hereinafter: ME) was developed. A ME has to
define its goals in a structured way, for which the GRAI model is used. Afterwards, by using
the MDSEA architecture, the ME can identify which of its IA are relevant to the project and
which additional IA are needed in order to execute the project. Those needs can be treated as
requirements, which means that the virtualization process could possibly also be used to
virtualize requirements coming from the market.
Afterwards, a six-step virtualization process specific to IA in MSEE was developed.
Emphasis was put on data quality, as it can be an Achilles' heel of IA management. The
process was then framed into the Extract, Transform, Load (hereinafter: ETL) process in order
to give D22.5 more information about the techniques needed to carry out the process and to
emphasize the potential risks during virtualization. ETL also gives a firmer structure and adds
validity to the developed virtualization process, as ETL is a well-known and established
process.
Then the techniques for handling the virtualization process were addressed. The development
of the ontology for IA in MSEE is also presented, which is another novelty. An ontology
development procedure was followed, and well-established guidelines regarding intangible
resources were followed during taxonomy development. The complete ontology, with a
detailed description, is presented in D22.5 as part of the technical IA framework. The
ontology is presented as a content format; it differs greatly from the file format needed for
handling the procedure, which is also presented, proposing the use of RDF and OWL.
To conclude, the true value that this deliverable co-creates lies in two matters. The first is
that virtualizing IA from the ME level to the MSE level establishes a foundation for the
transformation from roles of individuals (e.g. a welder) to competencies (e.g. welding
competency). This means that the role of an individual will no longer be at the core of
services; competences will be. The second essential matter to which this deliverable
contributes, although less conspicuously, is helping the ME and the MSE to transfer skills and
competences from the individual level to the enterprise level and even to the MSE level, while
still preserving IPR. The ability to transfer them to a higher level is crucial for the MSE to
operate successfully.
The Bivolino use case is presented in D22.5; this deliverable merely provides the information
necessary for the implementation of IA that could be offered as a service.
Introduction
In D22.1, architectures, models and techniques that will contribute greatly to efficient
virtualization were identified, such as the Model Driven Service Engineering Architecture
(hereinafter: MDSEA) (INTEROP_NoE, 2005) and the Unified Service Description Language
(hereinafter: USDL) (Rockley, 2002). The key concepts that affect the process were also
identified and analyzed, such as intangible assets (hereinafter: IA), knowledge management,
multi-level human resources management and organizational change theories.
The objective of this subproject is to develop all relevant documents, specifications and
guidelines required for the implementation of IA into the Manufacturing Service Ecosystem
(hereinafter: MSE), allowing their quick and efficient exchange. There are two essential steps.
In the first, a procedure and techniques are developed and proposed to help the ME identify
exactly what its business goals are, what its relevant resources and needs in terms of IA are,
and how to identify them. The second step defines a virtualization procedure specific to IA in
the MSEE environment, which consists of the virtualization process, techniques (e.g. data
mining), formalisms (ontology and taxonomy) and file formats (RDF, OWL).
Document roadmap
The first section proposes a procedure for identifying the types and sources of IA that are
relevant to a ME and that will be virtualized. It also enables the ME to identify exactly what it
needs from the MSE in order to execute a project. Before those two steps can be carried out,
however, the procedure proposes a structured way for a ME to identify and decompose its
business objectives (on all three levels).
The second main section deals with the method for implementing IA into the MSE
repository. First, the virtualization process adapted to the needs of IA is developed and
explained. Afterwards it is placed within a well-known framework, ETL. Once the
virtualization process is known, the required techniques are presented. The following
subsection presents the content formats in which the data about IA will be represented: the
ontology and the taxonomy. The emphasis is on the development process that was used for
IA. The main classification of IA specific to MSEE is also presented. The entire ontology,
however, will be presented and explained in D22.5 as part of the technical framework for IA.
The document’s last section covers specific file formats, namely OWL and RDF.
Figure 1: D22.3 document map and relations with D22.5
Linkage between D22.3 and D22.5
D22.3 creates a method for the virtualization of IA, expressed as all documents necessary for
the implementation of IA into the MSE repository. D22.5 uses these documents and then
defines the requirements to be fulfilled (hence forming the conceptual IAMS framework) in
the technical IAMS framework, which results in an MSEE-specific taxonomy of IA
represented by means of USDL.
1 Identification of types and sources of intangible assets relevant to the enterprise
The criteria that have to be fulfilled at the enterprise level to enable effective identification
of the relevant sources and types of IA are case specific, because MEs have different needs at
different stages of their business lifecycle. However, common frameworks guiding the
enterprise’s decision are proposed, based on the GRAI model and the MDSEA (which uses
three levels of abstraction: BSM/TIM/TSM). The GRAI model is used to define, describe and
communicate the goals of the ME and the VE at all three levels (strategic, tactical and
operational). Afterwards the MDSEA is used to ease and improve the identification and
structured representation of their:
a) available and relevant IA (that will be mapped into the MSE’s repository) and
b) requirements for additional specific IA (that will be sent to the MSE).
[Figure 1, diagram labels: 1. Identification of types and sources of IA in a ME; 2. Virtualization process & techniques & formalism & formats; All documents needed for representing and describing IA (D22.3); Essential guidelines (handling, management of IA) + Advanced guidelines (BI techniques for value creation from IA) = IA Manufacturing CONCEPTUAL Framework; IA Manufacturing Service CONCEPTUAL Framework; IA Manufacturing Service TECHNICAL Framework = IA represented (by IA-specific taxonomy) and described by means of USDL as a service on a use case (D22.5)]
1.1 Concepts definition
This section defines two concepts: the GRAI model and the MDSEA.
1.1.1 GRAI model
The GRAI conceptual model was developed by University Bordeaux 1 – GRAI Laboratory
and then by the LAPS and IMS Laboratory. Its objective is to represent the global and the
local models of a manufacturing system in an enterprise with the same concepts. The GRAI
model defines the various concepts that are represented in the GRAI graphical formalisms.
The interest of a conceptual model is to relate the various concepts in order to show their
coherence, to avoid redundancies and to obtain a complete model.
The GRAI model is very rich and holistic, as it is based on several theories: System Theory,
Hierarchical System Theory, Organisation Theory, Discrete Event Systems and Production
Management Concepts.
This document finds great value in the GRAI model because one of its main characteristics
is the ability to decompose the so-called control system (the management system), which is
very complex. This axis controls the second one in the GRAI model, the physical system
(business system).
Figure 2: GRAI model
[Figure 2, diagram labels: CONTROL SYSTEM with STRATEGIC, TACTICAL and OPERATIONAL levels (horizon/period pairs shown: H = 5 y., P = 1 y.; H = 1 y., P = 1 m.; H = 2 m., P = 1 w.; H = 2 w., P = 1 d.); functions: to manage sales, design, engineer., manufact., assembli., delivery; coordination, synchronization, coherence; aggregation; decomposition / aggregation of information; process views; CONTROLLED SYSTEM: products flow, R: resources, market]
Source: (Doumeingts et al., 2012)
The control system (vertical axis) is decomposed into three levels: strategic, tactical and
operational. The controlled system is the decomposition of functional activities.
In this document the GRAI model is used to model and decompose the objectives (on all
three levels) of a ME when it enters into contact with the MSE (it is assumed that a ME that
enters into contact with the MSE has a specific business requirement). Such a structured
decomposition methodology will contribute greatly to the steps of the IA virtualization
procedure, including the first steps, the identification of the relevant types and sources of
knowledge, which are described in more detail in later sections.
1.1.2 Model Driven Service Engineering Architecture
This concept definition also relies entirely on the definition from D11.1. The Model
Driven Service Engineering Architecture (MDSEA) defines three levels of abstraction,
inspired by the MDA/MDI architecture proposed in the INTEROP-NoE project
(INTEROP_NoE, 2005):
a) Business Service Model (BSM) models the Service System at the business level.
The models defined at the BSM level focus on the representation of the service (and
of its functionalities) and of the Service System (Enterprise, Virtual Enterprise and
Manufacturing Service Ecosystem), capturing information on its related product,
partner, customer, stakeholder, service KPIs and value, as well as on decision-making,
organization, resource and process.
b) Technology Independent Model (TIM) delivers the models at a second level of
abstraction, independent of the technology used to implement the system. It gives
detailed specifications of the structure and functionality of the service system without
technological details. More concretely, it focuses on the operational details while
hiding specific details of any particular technology, so as to be suitable for use with
several different technologies. The service system is elaborated with respect to IT,
Organisation/Human and Physical means.
c) Technology Specific Model (TSM) provides the technical model of the components of
the various domains and supports their realization. It combines the specification in the
TIM with details that specify how the system uses a particular type of technology (for
example IT applications). At the TSM level, modelling and specifications must provide
sufficient detail to allow developing or buying software applications and components,
recruiting human operators and managers or establishing internal training plans, and
buying and realizing machine devices, for supporting and delivering services in
interaction with customers. For instance, for an IT component, a TSM adds to the TIM
the technological details and implementation constructs that are available in a specific
implementation platform, including middleware, operating systems and programming
languages (e.g. Java, C++, EJB, CORBA, XML, Web Services, etc.).
Figure 3: Towards a Model Driven Service Engineering Architecture
[Figure 3, diagram labels: Business Services Models (BSM); Technology Independent Models (TIM); Technology Specific Models (TSM); IT Domain, Organisation/Human Domain, Physical means Domain; Generation of “components” (IT, Organisation/Human, Physical means); Services in virtual enterprises (IT applications, processes, products, services, organisation/human, physical means (machines, robots), etc.)]
Source: (Doumeingts et al., 2012)
1.2 Procedure for identification of types and sources of intangible assets
When the GRAI model and the MDSEA are used in combination, the procedure that provides
the MSE with the information needed to assist in the creation of a VE is presented in the
table below (from the IA point of view). The procedure is presented in relation to each
partaker.
Figure 4: The relation of the partakers (ME, VE, MSE) with the GRAI model and
MDSEA framework
Partakers | Step | Identification of objectives and needs | Frameworks
Manufacturing Enterprise (ME_n) | 1 | Identification and structured exposition of the enterprise’s goals (on all 3 levels); on the two lower levels, those goals are also the goals of the VE (Goals VE). | GRAI. All three levels are needed so that the enterprise can clearly define the VE’s goals.
VE | 2 | IA_VE - identification of all the resources needed for the creation of a VE | MDSEA_VE. Using the MDSEA a VE can be modelled, thus identifying exact needs in regard to IA.
ME_n | 3 | IA_En – identification of available resources relevant to the specific project: a) identification of | MDSEA_En. Using the MDSEA, resources of the ME can be modelled and hence described in a clear and structured way.
MSE (ecosystem) | 4 | IA_MSE - resources yet to be acquired through the MSE, established based on a gap analysis between the resources that are available in the enterprise and the ones that are needed for the VE creation. | Comparison of MDSEA from the enterprise and the VE: MDSEA_MSE = MDSEA_VE - MDSEA_En. Because the same technique was used throughout the process, a clear comparison can be made.
MSE (ecosystem) | 5 | Virtualization of IA. |
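The comparison in step 4 can be read as a set difference. The following is a minimal sketch in Python, assuming that each MDSEA model is reduced to a flat set of IA labels; the function name `gap_analysis` and the example asset names are hypothetical and are not taken from the deliverable.

```python
from __future__ import annotations


def gap_analysis(needed_by_ve: set[str], available_in_me: set[str]) -> dict[str, set[str]]:
    """Compare the IA needed for the VE with the IA available in the ME.

    Assumption: each MDSEA model is flattened to a set of IA labels.
    """
    return {
        # IA the ME already holds and that the project needs (step 3)
        "relevant_available": needed_by_ve & available_in_me,
        # IA still to be acquired through the MSE (step 4): MDSEA_VE - MDSEA_En
        "to_acquire": needed_by_ve - available_in_me,
    }


# Hypothetical example data
ia_ve = {"welding competency", "CAD design know-how", "supplier network"}
ia_en = {"welding competency", "brand reputation"}

result = gap_analysis(ia_ve, ia_en)
print(sorted(result["to_acquire"]))
```

A real MDSEA model is structured on three abstraction levels, so an actual comparison would have to match model elements rather than plain labels; the set difference only illustrates the principle.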
The procedure is described in more detail below:
1. First, a ME identifies and structures its goals (on all three levels) with the help of the
GRAI model (step 1).
2. Afterwards, the resources needed to carry out a project are identified (step 2),
answering the question “what resources (limiting ourselves to IA) are needed to
deliver the planned services through a VE?”
3. Then the resources that are relevant and available inside a ME are identified (step 3)
through the following sub-steps:
a. Identification of all (available) resources inside the ME (such an overview may
already be available in the ME). This answers the question: “What kind of
resources does the enterprise hold or can it acquire by itself?” This step does
not need to be executed multiple times (e.g. once an enterprise has mapped its
IA, it only needs to update them, not redo the entire process).
b. Identification of all resources inside the ME which are relevant to the project.
This answers the question: “Which of the resources needed for the project do
we already have?”
c. Gap analysis of resources (between those available inside the ME and those
needed for a specific project), which specifies exactly which resources (with
exact specifications) are missing and have to be acquired through or with the
help of the MSE. The result of this gap analysis is the definition of the resource
requirements of the ME that can be searched for through the MSE.
4. Finally, the resources that have to be acquired through the MSE (or with its help) are
identified (step 4). This is based on the gap analysis between the resources that are
available in the enterprise and those that are needed for the VE creation. However,
this step needs to be undertaken only if an enterprise does not know exactly what kind
of IA it still has to acquire through the MSE to attain its business objective. Hence,
MDSEA_MSE describes exactly which resources are yet to be acquired (IA_MSE).
Only after this step are the needed IA clearly defined and structured.
Afterwards, the identification of their location (sources) and their form (types) is
easily established. Because of the heterogeneity of data and databases, every ME will
have to do this by itself according to its internal information system.
5. Once all the needed information is known and gathered (using techniques such as
data and text mining), the enterprise sends to and integrates into the MSE (this is the
process of virtualization) the following structured information about:
a. its relevant available resources;
b. its requirements about:
i. the specific project that it wants to carry out (those are market
requirements that are defined as business objectives on all three levels)
and
ii. the needed resources that have to be acquired through the MSE.
This means that every ME that wants to carry out a project through the MSE will have two
roles:
a) feeding the MSE with its available relevant resources and
b) sending requirements from the market to the MSE (the ME partially represents the
market).
Critical interdependencies between the sources of IA will also be defined. Depending on
their strategy and business objectives, enterprises will be able to insert case-specific types
and sources of IA (e.g. knowledge) expressed by MDSEA according to the different
abstraction levels (BSM/TIM/TSM). The interdependencies identified with MDSEA will be
inserted in the IA ontology.
2 Implementation method of intangible assets
The process of linking an enterprise’s IA into the MSE’s repository has to ensure consistency
and quality of information. Hence a clear sequence of steps, called a procedure, has to be
defined for the virtualization process, and it will be harmonized as much as possible with
D23.3. In addition, rules have to be specified (e.g. addressing issues such as trust and
security). The formats in which the virtualization process will be implemented and executed
have to be determined as well, based on a set of predefined criteria (e.g. manageable and
sharable). Some issues that will have to be addressed are: dealing with different formats of
data representing IA during extraction, mapping those formats together to build links, and
storing those data in a single format (the target format). Foremost, the potential risks related
to format management (such as mismatching and mismanagement) have to be anticipated,
and measures have to be defined to minimize them.
Next, the activities of the virtualization process will be identified and suitable handling
techniques needed to execute the process will be assigned (e.g. text mining for data
extraction, and possibly also for identifying interrelations between entities). Other
techniques are data mining, rule-based reasoning, classification and clustering, which are
described in section 2.2.
One very important factor in handling data about IA is assuring their consistency (referring
especially to validity and reliability). This is why potential risks that undermine data quality
will be identified and analyzed, and measures for risk alleviation will be proposed. An
example of such a risk is errors during data extraction due to inconsistent formats of IA
(e.g. CVs in different forms with different attributes).
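The risk named above can be illustrated with a small sketch. It assumes two hypothetical CV layouts (`"hr_system"` and `"spreadsheet"`) whose field names differ, maps both onto one target format and reports the target fields that could not be filled instead of loading an incomplete record silently. The field names, the target schema and the sample records are illustrative assumptions, not part of the MSEE specification.

```python
from __future__ import annotations

# Hypothetical mapping per source layout: source field name -> target field name
FIELD_MAPS = {
    "hr_system": {"full_name": "name", "skills": "competences"},
    "spreadsheet": {"Name": "name", "Competencies": "competences"},
}
# Hypothetical target format shared by all sources
TARGET_FIELDS = ("name", "competences")


def to_target_format(record: dict, layout: str) -> tuple[dict, list[str]]:
    """Map one source record onto the target format.

    Returns the mapped record and the list of target fields that are missing,
    so that extraction problems are made visible rather than hidden.
    """
    mapping = FIELD_MAPS[layout]
    mapped = {
        target: record[source]
        for source, target in mapping.items()
        if source in record
    }
    missing = [field for field in TARGET_FIELDS if field not in mapped]
    return mapped, missing


# Invented sample CVs in two different forms
cv_a = {"full_name": "A. Rossi", "skills": ["welding"]}
cv_b = {"Name": "B. Martin"}  # the competences attribute is absent in this CV

print(to_target_format(cv_a, "hr_system"))
print(to_target_format(cv_b, "spreadsheet"))
```

Returning the missing fields alongside the record keeps the decision about incomplete data (reject, repair or load with a warning) outside the mapping step, where a risk-alleviation measure can be applied.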
2.1 Virtualization process
This section is structured as follows. First, the virtualization process of IA in the MSEE
context is represented graphically, in order to better convey its role and importance in the
MSE. Afterwards, the virtualization process specific to IA in the MSEE context is proposed.
Data quality issues are emphasized, and finally the virtualization process for IA within MSEE
is integrated into an existing, well-established framework: ETL.
The virtualization process, illustrated in Figure 5, was briefly described previously. The
purpose of the figure is twofold: first, to present the virtualization process of IA as clearly as
possible and, second, to position the process in the MSEE context and depict its relevance
for the overall functioning of the MSE.
Description of the virtualization process in Figure 5:
1. Identification and structuring of the goals of the ME (GRAI)
2. Identification of relevant data and sources (MDSEA)
3. Virtualization of (for details see Figure 6):
a. requirements
b. available resources/assets
The process of virtualization can be repeated multiple times for different enterprises. The
same virtualization process is used to virtualize resources and requirements coming from
the enterprises (the market).
4. The virtualized information is then linked into the USDL repository. If needed, it will
have to be extended to meet the ME’s specific needs.
5. When all the needed resources from the MEs and other enterprises (note that
non-manufacturing enterprises can also participate) are in the MSE repository, the assets
and the desired service are described (with the help of USDL), hence forming the so-called
capabilities of services. Afterwards, business intelligence and statistical techniques can be
applied in order to define the optimal composition of new services. Such operations are
feasible due to the deep level of asset decomposition enabled by the use of RDF.
6. Then the optimal scenario is presented to the participating enterprises or to the leading
one.
7. After a scenario has been chosen, the extraction process begins and the optimal innovative
service is composed, which is feasible due to the use of USDL. USDL interlinks different service
modules that are represented together by one layer, called the required service. Each module
is described by 3 layers (BSM/TIM/TSM). The business model, which mostly
consists of prices, legal relationships and clearly defined project goals, is included,
described and offered in the service. However, management of the enterprise is still
required, because the market is dynamic and control is needed. This is why a VE cannot exist
without management, regardless of the ideal service representation.
Figure 5: Virtualization process of intangible assets in the MSEE context
Virtualization process of IA in the MSEE context
The process represents a case of virtualization for creating a new project; hence the ME has
a clear aim and goal.
The virtualization process presented below is defined for IA that represent the
enterprise’s available resources and that are, at the same time, relevant to the
specific project objective. Virtualization processes can also be used to process other kinds
of information, such as requirements from the market and knowledge newly acquired after
the successful conclusion of a specific project.
The outcome of the virtualization process of IA is a set of virtual artefacts (representing IA)
that can be combined with tangible assets in order to build up services (or products) in
manufacturing industries. These services can themselves be modelled and represented virtually
by means of, e.g., the USDL service description language, whose application is presented in
deliverable D22.5, entitled Development of IA in Manufacturing Service (IAMS) framework.
Figure 6 represents the entire virtualization procedure, which
encompasses not merely virtualization techniques but also takes into account constraints such as
rules and IPR issues. The virtualization procedure can serve as a framework for virtualization,
which makes it wider than the process itself.
Figure 6: Virtualization procedure of intangible assets
Description of the virtualization procedure steps:
1. Identification
First, the goals of the project and of the virtualization have to be defined. Then,
based on a predefined IA framework (using a taxonomy), the relevant types
(explicit and then tacit) and sources of knowledge, within and around the enterprise, can be
detected.
So that this identification proceeds in a structured rather than intuitive way,
enterprises will rely on a model with 3 levels of abstraction (strategic, tactical and operational):
the MDSEA architecture. After defining the enterprise’s relevant resources (treated as inputs
to the data repository) for the specific project (limited to IA) (for details see D22.3,
section 2), the types and sources of knowledge can be detected, based on a predefined,
context-specific taxonomy.
One of the main goals from the MSE point of view is that the virtualization process
has to enable the transfer of individual competences onto higher levels (e.g. the enterprise
and MSE levels). Boundaries of the process will not be defined, as this openness is one of the
advantages of the (open) enterprise and of course of the MSE.
2. Analysis
This step commences by codifying the explicit and, above all, the tacit knowledge in a unified
manner using semantic techniques such as ontologies and taxonomies, as they
ease the organization, communication and reusability of IA. Only the knowledge identified
on the basis of project goals and resource needs is codified.
IA have special characteristics: they are “soft” and ubiquitous, so their integration into a
system is often seamless. However, just as their value can quickly increase, it can also decrease,
and if IA are not treated appropriately they lose their value and become useless. In order to
mitigate this risk of value degradation, data consistency, reliability and validity have to be
ensured at different stages of the virtualization process. Therefore the relevant
input-output relations, the interconnectivity and, above all, the
interdependency between individual objects (assets) are identified and analyzed. Among other
things, the goal is to identify specific IA that could not perform efficiently without another type
of IA; this way the issue of potential inconsistency is addressed.
In this step, the enterprises that provide the knowledge are included in the process to
increase its quality. This also empowers the enterprises during the process and
gives an additional opportunity to discover anomalies during virtualization. If
anomalies are discovered, an additional part of the knowledge can be codified.
Another risk besides inconsistency is that not all relevant knowledge can be found, although it
is present in the enterprise. Tacit knowledge can only be integrated into a system when it is
found (Bohlouli, Holland, & Fathi, 2011). This is why, after the codification of explicit
knowledge, experts codify tacit knowledge (based on available sources in the enterprise) on an
individual level. Afterwards each employee is given an opportunity to complement their
knowledge and competency “list” with additional tacit knowledge that in most cases is not
known to the enterprise; such knowledge could be, for instance, important business
connections in specific industries or competencies that are not directly linked with the
employee’s workplace. This quality loop gives the opportunity to increase the consistency and
validity of the knowledge database. However, if the process of virtualization requires the
cooperation of employees, the organizational change process has to be performed with great
care and with the greatest possible consent of the employees (this issue is addressed in WP24).
Employees are also given an opportunity to provide an assessment of their IA:
a) the level of experience and/or depth of expertise with the stated competencies or
skills;
b) the relevance (usability) of the specific knowledge (expressed as a percentage) to the
defined project goal;
c) potential geographical limitations.
Introducing such an assessment tool would have the following effects:
a) The enterprise’s knowledge expert will be able to complement the assessment metrics
with the level of usability and quality of specific knowledge on the project after its
execution. Therefore information about IA already integrated into the MSE will
become more reliable and hence more reusable, which is also one of the MSE’s
objectives.
b) If several competencies of the same kind were available, the assessment metrics could serve
as guidance for the final decision on which one to use.
c) It would allow tackling an issue of most virtualization processes in which people are
involved, namely behavioural peculiarities, for instance deception when describing
oneself and one’s competencies (this is common especially in online dating processes)
(Overby, Slaughter, & Konsynski, 2010). Such deception on the part of employees
could potentially also occur.
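As an illustration of how the three assessment criteria a) to c) could be recorded and used to compare candidates, a minimal Python sketch follows. The class name, the field names, the 1 to 5 experience scale and the ranking rule are assumptions made for this example only; the deliverable does not prescribe them.

```python
from dataclasses import dataclass, field


@dataclass
class Assessment:
    """One employee's self-assessment of a single competence (hypothetical)."""
    employee: str
    competence: str
    experience_level: int        # a) assumed scale: 1 (novice) to 5 (expert)
    relevance_percent: float     # b) relevance to the defined project goal
    regions: set = field(default_factory=set)  # c) empty set: no limitation

    def __post_init__(self):
        # Reject values outside the assumed ranges as early as possible.
        if not 1 <= self.experience_level <= 5:
            raise ValueError("experience_level must be between 1 and 5")
        if not 0 <= self.relevance_percent <= 100:
            raise ValueError("relevance_percent must be between 0 and 100")

    def available_in(self, region):
        """True when there is no geographical limitation or the region fits."""
        return not self.regions or region in self.regions


def rank(assessments, region):
    """Order usable candidates by relevance first, then by experience."""
    usable = [a for a in assessments if a.available_in(region)]
    return sorted(usable,
                  key=lambda a: (a.relevance_percent, a.experience_level),
                  reverse=True)


candidates = [
    Assessment("e7", "drilling", 4, 80.0),
    Assessment("e8", "drilling", 5, 80.0, {"IT"}),
    Assessment("e9", "drilling", 2, 95.0, {"FR"}),
]
ranked = rank(candidates, region="IT")
print([a.employee for a in ranked])
```

The ranking is only guidance for a decision; a knowledge expert could later overwrite the self-assessed values with values observed after project execution.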
In the last step of this stage, taxonomies from different sources and/or enterprises are
aggregated.
3. Rules
In order to strengthen trust and security, whose absence is an important barrier in such forms of
organization regarding IA management, rules (including constraints) could be
defined on two levels: on the level of a set of IA (e.g. a set of knowledge about drilling) and on
the attribute level. The process would hence merely offer a frame within which detailed rules at
different levels would be applied. Those rules would be applied at the request of two main
sources: a) the collaborating enterprises, or b) the MSE, which may have predefined
rules at its own level as part of its managerial policy. There are also process rules that are
related to process management and relations between entities. The collaborating
enterprises check and validate the defined project rules, which additionally mitigates the risk
of errors that could undermine the reliability of the MSE.
4. Support procedures
Procedures that support data management and, above all, the execution of the rules defined in
the previous process step are identified, selected and applied. This step mainly answers the
question »how« (e.g. how a data access rule will be enforced).
5. Populating with data
First, units and values are extracted from the enterprise’s database and inserted into
the semantic model. Afterwards objects and meta-models are created (comprising also
the previously defined and aggregated rules), which can be considered as a schema for
semantics that will be further managed (e.g. exported and/or stored).
6. Quality assurance
After the virtualization process, IA analysis techniques will be applied with the goal of
discovering new value for the collaborating enterprises. However, if the virtualization process
is of low quality, the IA analysis techniques applied later will be ineffective and
create no added value for enterprises. This is why another consistency check of data and rules
(referring especially to validity and reliability) is performed in collaboration with the
enterprises. The gap between the planned and actual data usability is also checked. If
needed, the process is optimized.
So far, the procedure for IA discovery in terms of sources and types has been proposed, as well
as the IA virtualization procedure. In the next subsection the technical framework of the
virtualization process is presented and the virtualization process of IA is placed within this
framework.
2.1.1 Technical framework for the virtualization process of intangible assets
In this section the main IT techniques that need to be used and/or applied when executing the
IA virtualization process are presented. First the architectural approach, which is
based on Enterprise Information Integration, is presented. Afterwards data virtualization is
explained from the technical point of view. Then the technical framework for the IA
virtualization is presented, namely the Extract, Transform, Load (hereinafter: ETL) technique.
The virtualization process of IA is then directly linked with ETL. Matters such as data
warehousing are also addressed. At the end of this section possible data quality issues are
identified and measures are proposed to alleviate them.
Enterprise Information Integration and data virtualization
Enterprise Information Integration (hereinafter: EII) is an architectural approach allowing an
enterprise, or a set of enterprises, to have a unified view of certain data of the organization. As
an application of EII, Data Virtualization aims to provide a unified view for accessing all the
relevant data of an organization through a single set of structures and naming conventions that
represent this data. From an IT perspective, Data Virtualization performs data integration
using data abstraction techniques upon large sets of heterogeneous data sources managed by
disparate IT systems (databases, files, websites, data services, etc.). A single access layer is
responsible for providing a consistent representation of the information in a unified structure
and format, independently of the technical aspects of the source data, such as location, storage
structure, API, access language, and storage technology.
This conceptual pattern is typically used in various enterprise applications, such as business
intelligence, service oriented architectures, cloud computing and master data management.
Technically, examples of an implementation of the Data Virtualization pattern are numerous:
- Enterprise Service Bus (hereinafter: ESB) software can be used to develop a layer of
services that allows access to the data, while hiding technical implementation details and
the location of the source data.
- Cloud storage services can act like a single access layer by providing a unified API,
making the location of the data irrelevant for the consumer.
- At the level of infrastructure, virtualization can expose data access through a single
system by federating and abstracting multiple and disparate storage units.
However, EII faces many challenges during the different phases of its lifecycle. As the goal of
EII is to make a large set of heterogeneous data sources appear to a user or system as a single,
homogeneous data source, different data schemas and representations have to be linked
together. For this purpose one or more ontologies can be used in order to effectively combine
data or information from multiple heterogeneous sources. An ontology can be used to structure
information about enterprise business resources and to define relevant standard terms and
concepts to describe these resources. Moreover, ontology-based information supports semantic
reasoning, which facilitates inference, retrieval and discovery of knowledge in mobile commerce
applications, and thereby ensures their scalability, universality and interoperability (Wei,
Kang, & Zhou, 2008). However, the effectiveness of ontology-based data integration is closely
tied to the consistency and expressivity of the ontology used in the integration process
(Wache et al., 2001).
As data integration can be a costly and long process, new approaches should always be taken
into consideration. One such approach is Linked Data, which
refers to a set of best practices for publishing and connecting structured data on the Web.
These best practices have been adopted by an increasing number of data providers, who may
deal with, for instance, databases of different sizes, with different schemas and in different
geographical locations. Technically, Linked Data refers to data published on the Web in such a
way that it is machine-readable, its meaning is explicitly defined, it is linked to other external
data sets, and can in turn be linked to from external data sets (Heath, Hepp, & Bizer, 2009).
Frischmuth et al. (2012) express the need for a simple, bottom-up, best-practice-based approach.
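To make the Linked Data idea more tangible, the following Python sketch describes one intangible asset as subject-predicate-object triples and prints them in N-Triples syntax. It uses only the standard library; the URIs under example.org and example.com and the property names are hypothetical placeholders, not MSEE or USDL vocabulary.

```python
# Minimal sketch: an intangible asset described as RDF-style triples.
# All URIs below are hypothetical placeholders, not MSEE vocabulary.

BASE = "http://example.org/msee/"


def uri(local_name):
    """Wrap a local name into an absolute URI reference."""
    return f"<{BASE}{local_name}>"


def literal(value):
    """Serialize a plain string literal, escaping backslashes and quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


triples = [
    (uri("employee/42"), uri("hasCompetence"), uri("competence/drilling")),
    (uri("competence/drilling"), uri("label"), literal("Drilling")),
    # A link to an external data set is what makes the data "linked".
    (uri("competence/drilling"), uri("sameAs"),
     "<http://example.com/skills/drilling>"),
]


def to_ntriples(rows):
    """Render the triples one per line in N-Triples syntax."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in rows)


print(to_ntriples(triples))
```

Because every statement is a self-contained triple, data sets from different enterprises can be merged by simple concatenation, which is one reason the approach suits bottom-up integration.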
To sustain the process of data virtualization, we can identify several applicable patterns from
the field of Enterprise Architecture, as well as several techniques from information
management science.
Data virtualization relies on a set of capabilities that many software techniques can provide.
Among such capabilities, we can mention:
- Abstraction: decouples the representation of the information from its technical
representation, location and storage technology.
- Virtualized data access: unifies the access to the data in the form of a single access
layer.
- Transformation and integration: mostly focuses on improving existing data by
sanitizing, verifying, aggregating, cross-linking and enriching data across multiple
sources.
- Data federation: combines and unifies content from multiple autonomous source
storages. Data federation may imply transformation, cleaning, and data enrichment.
- Flexible data delivery: publishes valuable data sets as services, consumed by external
applications or users upon request. This capability helps to reduce
the cost and complexity of integration at enterprise level by promoting
reusability. The publication rules applicable to these
datasets (e.g. privacy policies) should be considered at this level.
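The abstraction, virtualized access and federation capabilities listed above can be sketched in a few lines of Python. The two sources, their column names and the unified schema (owner, asset, years, source) are invented for this illustration; a real deployment would wrap databases or services instead of in-memory data.

```python
import csv
import io

# Hypothetical source 1: an HR export in CSV form with its own column names.
HR_CSV = "emp_id;skill;years\n7;welding;5\n8;drilling;2\n"

# Hypothetical source 2: a project database exposing a list of dictionaries.
PROJECT_DB = [{"employee": 7, "competence": "CAD design", "experience_years": 3}]


def read_hr():
    """Adapter for the CSV source: maps its columns to the unified schema."""
    reader = csv.DictReader(io.StringIO(HR_CSV), delimiter=";")
    for row in reader:
        yield {"owner": int(row["emp_id"]), "asset": row["skill"],
               "years": int(row["years"]), "source": "hr"}


def read_projects():
    """Adapter for the project source: same unified schema, other origin."""
    for row in PROJECT_DB:
        yield {"owner": row["employee"], "asset": row["competence"],
               "years": row["experience_years"], "source": "projects"}


class AccessLayer:
    """Single access layer: consumers query one schema, not the sources."""

    def __init__(self, adapters):
        self.adapters = adapters

    def query(self, owner=None):
        # Federation happens at query time; no data is copied or stored here.
        for adapter in self.adapters:
            for record in adapter():
                if owner is None or record["owner"] == owner:
                    yield record


layer = AccessLayer([read_hr, read_projects])
for record in layer.query(owner=7):
    print(record)
```

Adding a further source only requires one more adapter function, while consumers of `AccessLayer.query` remain unchanged.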
The file systems in enterprises are expected to be distributed and heterogeneous, with different
levels of security. The virtualization process of the preselected data should resolve
those issues by mapping the data into the MSE’s repository with a single access layer. To
achieve this, several constraints have to be taken into consideration before and during the
virtualization process (some of the constraints build on the work of Liu, Cao, and He (2011)):
a) Because an enterprise can identify relevant data in internal as well as external sources (e.g.
business partners), data heterogeneity has to be dealt with, meaning for instance that
there are different data schemas, which are conceptually different and could also be
represented differently.
b) Data distribution has to be taken into account.
c) The virtualization of data is expected to take place first locally, on the IT system of the
enterprise that owns the data. Only when the data mapping is formally accepted by the data
owner can the data be mapped, with the help of an extended version of USDL, to the
MSE’s repository. Security rules can be defined by the enterprise for each mapped
attribute, but within the limits defined by the MSE’s framework for security and trust on
the MSE level. This allows the MSE, on the one hand, to enforce its security policy and
the enterprise, on the other, to assign project-specific security rules (meaning that
different security rules can be assigned to the same set of data used in
different projects).
d) Issue of time dynamics: the quality (expressed as the depth of experience and
expertise) and availability of IA vary over time. Therefore changes in IA have to
be transferred as automatically as possible into the MSE. Consequently a
recurring change-detection procedure for IA has to run locally, on the enterprise level. If
changes are detected, they are then transferred into the MSE.
e) Finally, mining techniques have to enable the extraction of data through the application of
constraints and so-called knowledge discovery agents (they define which sources and
types of knowledge are project relevant). Such a schema architecture is completely
aligned with Danish in Khan (2008).
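Constraint d) asks for a recurring local procedure that notices changes in IA and transfers only those changes. One possible way to do this, sketched below in Python, is to compare a hash of each local record with the hash stored at the last transfer. The record identifiers, the record layout and the choice of SHA-256 are assumptions for this example, not part of the MSEE specification.

```python
import hashlib
import json


def fingerprint(record):
    """Stable hash of a record; key order must not affect the result."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def detect_changes(local_records, known_fingerprints):
    """Compare local IA records with the fingerprints of the last transfer.

    Returns (changed, removed): records to send and ids no longer present.
    """
    changed = {}
    for record_id, record in local_records.items():
        if known_fingerprints.get(record_id) != fingerprint(record):
            changed[record_id] = record
    removed = [rid for rid in known_fingerprints if rid not in local_records]
    return changed, removed


# Hypothetical state: what was transferred to the MSE last time ...
previous = {"e7-welding": {"asset": "welding", "years": 5},
            "e8-drilling": {"asset": "drilling", "years": 2}}
known = {rid: fingerprint(rec) for rid, rec in previous.items()}

# ... and what the enterprise holds locally now.
current = {"e7-welding": {"asset": "welding", "years": 6},   # updated
           "e9-cad": {"asset": "CAD design", "years": 1}}     # new

changed, removed = detect_changes(current, known)
print(sorted(changed), removed)
```

Only fingerprints need to be kept between runs, so the procedure can run on the enterprise side without the MSE having to expose its repository content.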
The next subsections of the document explore various existing patterns and techniques which
can be used to provide the aforementioned capabilities.
Besides the identification of approaches for implementing virtualization processes, specific
care should be given to the identification of relevant data sources within the organization’s
infrastructure. A critical prerequisite is to maintain an inventory of existing data
repositories and to evaluate their eligibility for virtualization with regard to the
information contained, the quality of the data and the subsequent cost of extracting and
processing the data.
Data warehouse and Extract, Transform and Load (ETL) techniques
The concept of the “data warehouse” emerged in the early 90’s from business intelligence work
related to data management (Vassiliadis, 2009). A data warehouse typically collects data from
several operational or external systems in order to provide its end users with access to
integrated and manageable information. Data warehouses are typically assembled from a
variety of data sources with different formats and purposes. As such, ETL is a key process for
bringing all the data together in a standard, homogeneous environment, and it plays its role in
the field of Enterprise Data Integration (EDI). The implementation of the virtualization process
implies an overall transformation, from the origin of the relevant information or data (mostly
at the level of operational information systems) toward a commonly accepted format in a
unified repository.
In practice, an ETL process has to overcome several inherent problems. First, since the
different sources structure information in different data models, it is imperative to transform
the incoming data to a common schema, which will eventually be used for querying.
In typical cases, the source data stores can be On-Line Transaction Processing (hereinafter:
OLTP) or legacy systems, files in any format, web pages, various kinds of documents (e.g.
spreadsheets and text documents) or even data arriving in a streaming fashion. Second, the
data coming from the operational sources may suffer from quality problems, ranging from
simple misspellings in textual attributes to value inconsistencies, violations of database
constraints and conflicting or missing information. Consequently, this kind of “noise” must
be removed from the data, so that end users are provided with clean, complete and truthful
information. The extracted data are propagated to a special-purpose area of the warehouse,
called the Data Staging Area (hereinafter: DSA), where their transformation, homogenization,
and cleansing take place. The most frequently used transformations include filters and checks to
ensure that the data propagated to the warehouse respect business rules and integrity
constraints, as well as schema transformations that ensure that the data fit the target data
warehouse schema. Third, since the information is constantly updated in the production systems
that populate the warehouse, it is necessary to refresh its content regularly, in order to
provide up-to-date information.
The software processes that facilitate the population of the data warehouse are commonly
known as “Extract-Transform-Load” processes (Vassiliadis & Alkis, 2007).
Table 1: ETL Phases
ETL Phase: Extract
Responsibility:
- Identifying the correct subset of source data that has to be submitted to the ETL workflow for
further processing.
- Extraction of the appropriate data from the sources.
Difficulties:
- Interference with the configuration or performance of the source operational system.
- Privacy / visibility of the data to be collected.
- Level of structuring of the data being collected.
ETL Phase: Transform
Responsibility:
- The transportation of the data to a special-purpose area of the data warehouse.
- The verification and the validation of the collected data with respect to the associated rules.
- The transformation of the source data and the computation of new values.
- The isolation and cleaning of the data.
Difficulties:
- Schema level problems: structural matching conflicts between the conceptual models of the
source data and the target data model.
- Record level problems: duplicated or contradicting records; differences of granularity or
timeliness (e.g. different aggregation levels, different points in time).
- Value level problems: semantic mismatches between information values (e.g. different time
zones for timestamp values) or implementation formats (e.g. date formats mm/dd/yy or
dd/mm/yyyy) from different source systems.
- Performance: amount of incoming data and complexity of the transforming processes.
ETL Phase: Load
Responsibility:
- The loading of the transformed data to the appropriate relations in the warehouse.
Difficulties:
- Maintaining the integrity of the target repository (e.g. discriminating existing records).
- Performance: bulk loading or sequenced data loading.
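The three phases of Table 1 can be illustrated with a small, self-contained Python sketch. The source rows, the two accepted date formats and the target schema (employee_id, name, hired_on) are assumptions chosen for the example; the in-memory dictionary merely stands in for a warehouse.

```python
from datetime import datetime

# Hypothetical source rows, as they might come out of two operational systems.
SOURCE_ROWS = [
    {"id": "1", "name": " Alice ", "hired": "03/15/2010"},
    {"id": "2", "name": "Bob", "hired": "2011-07-01"},
    {"id": "", "name": "no key", "hired": "2012-01-01"},   # violates a rule
]


def extract(rows):
    """Extract: copy the selected source rows into a staging area."""
    return [dict(row) for row in rows]


def parse_date(text):
    """Try the formats assumed for the sources; return an ISO date string."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unsupported date format: {text!r}")


def transform(staged):
    """Transform: filter rule violations and map rows to the target schema."""
    clean = []
    for row in staged:
        if not row["id"]:          # integrity rule: a key is mandatory
            continue
        clean.append({"employee_id": int(row["id"]),
                      "name": row["name"].strip(),
                      "hired_on": parse_date(row["hired"])})
    return clean


def load(records, warehouse):
    """Load: upsert by key so that repeated runs do not duplicate records."""
    for record in records:
        warehouse[record["employee_id"]] = record
    return warehouse


warehouse = load(transform(extract(SOURCE_ROWS)), {})
print(warehouse)
```

Loading by key addresses the "discriminating existing records" difficulty from the table: running the pipeline twice leaves the target unchanged.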
Typically, only the data that have changed since the previous execution of an ETL process
(newly inserted, updated, and deleted information) should be extracted from the sources. In a
traditional data warehouse setting, the ETL process periodically refreshes the data warehouse
during idle or low-load periods of its operation (e.g. every night) and has a specific time
window in which to complete. Nowadays, business needs and demands require near real-time data
warehouse refreshment.
Given the complexity that such ETL processes can involve, the design phase
of an ETL process is considered critical for its performance across the lifetime of its usage.
One has to consider many aspects, such as the distribution and heterogeneity of the
source data at the logical and the technical level, the scalability of the ETL system to sustain
the data volume at runtime, and the interactions with other information systems.
Software for transforming and filtering information from one (structured, semi-structured or
unstructured) location to another has been developed since the early days of databanks. Since
then, any kind of data processing software that reshapes or filters records and populates other
data stores has been a form of ETL process. Therefore, we can consider that the ETL
process can support the first technical tasks accomplished by a logical framework for
virtualization in the context of MSEE.
2.1.2 Data quality management and data virtualization
The profusion and variety of data sources and consuming applications in the organization’s
infrastructure often lead to multiple copying and replication actions for reusing
the information across the organization. This repeated pattern tends to progressively introduce
inconsistencies, leading to the demand for a greater level of trust in the quality of data.
Typical data quality issues can be summarized as follows:
- Structural and semantic inconsistency: differences in formats, structures, and
semantics, which may impact the usability of the data by consumer applications.
- Inconsistent validations: differences between validation rules applied across various
business processes have a direct impact on the overall data quality level and reduce
reusability.
- Replicated functionality: repetitive application of similar data cleansing processes
increases cost and has no positive impact on the consistency of the data.
- Data entropy: the multiplication of data silos contributes to degrading the quality of
data.
As a consequence of these data quality issues, the practice of preparing the data for secondary
uses is commonly applied, and the following techniques are used and orchestrated by ETL
processes (cf. “data warehouse”):
- Data validation: checking data instances against defined quality rules.
- Data parsing and standardization: data values are processed and potentially
reformatted into a standardized representation.
- Data cleansing: avoiding duplication and applying automatic value corrections in order to
resolve structural and semantic inconsistencies.
- Data enrichment: improving the value of data by adding
content that enriches the initial data.
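The four techniques can be pictured as small functions applied in sequence. The Python sketch below is only an illustration: the record layout, the two validation rules and the skill-to-category lookup table are hypothetical and would be replaced by the quality rules agreed for a given project.

```python
# Hypothetical competence records collected from several enterprise sources.
RECORDS = [
    {"employee": "e7", "skill": " Welding ", "years": 5},
    {"employee": "e7", "skill": "welding", "years": 5},     # duplicate
    {"employee": "e8", "skill": "Drilling", "years": -1},   # invalid value
]

# Hypothetical lookup table used for enrichment.
SKILL_CATEGORY = {"welding": "joining", "drilling": "machining"}


def validate(record):
    """Data validation: return the list of violated quality rules."""
    errors = []
    if not record["employee"]:
        errors.append("missing employee")
    if record["years"] < 0:
        errors.append("negative years")
    return errors


def standardize(record):
    """Parsing and standardization: one representation per skill name."""
    return {**record, "skill": record["skill"].strip().lower()}


def cleanse(records):
    """Data cleansing: drop duplicates, keeping the first occurrence."""
    seen, unique = set(), []
    for record in records:
        key = (record["employee"], record["skill"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def enrich(record):
    """Data enrichment: add a category derived from the lookup table."""
    category = SKILL_CATEGORY.get(record["skill"], "unknown")
    return {**record, "category": category}


valid = [r for r in RECORDS if not validate(r)]
rejected = [r for r in RECORDS if validate(r)]
prepared = [enrich(r) for r in cleanse([standardize(r) for r in valid])]
print(prepared, len(rejected))
```

Standardization runs before cleansing on purpose: the two welding records only become recognizable as duplicates once their skill names share one representation.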
The application of such techniques can encounter limitations when they are applied multiple
times at different levels of the organization. It can lead to the introduction of new errors or
inconsistencies, as the rules for validation / parsing / enrichment may vary. Furthermore, the
repetition of such processes may not be cost effective and may produce the opposite of the
initial objective of improving data quality. As an alternative to the “data sharing” pattern,
some data virtualization techniques avoid the replication of primary data sources
into intermediate repositories. In this context, source data remains in its primary data source
and is delivered to consuming applications through an abstraction layer, responsible for
retrieving content and standardizing its representation. By incorporating several data quality
techniques, this abstraction layer can address most data quality challenges, such as
resolving structural and semantic inconsistency, reducing replication, and unifying data
validation rules.
2.1.3 Linkage of the presented technical concepts with intangible assets virtualization process
This section links the previously defined technical concepts, namely the
technical framework for the virtualization process and data quality management, directly to the
IA virtualization process.
The virtualization process that was developed for virtualizing IA in the MSEE is mapped onto
the ETL process in order to provide additional information to the integrator: when it is clear
in which phase of the process the integrator is, he or she can more easily select the
proper techniques to handle the process, plan the process of integration and anticipate
potential risks. The other benefit is that linking those two concepts gives the IA virtualization
process a firm structure and validity, as the ETL process is well established.
Figure 7: Linkage of ETL and virtualization procedure of intangible assets
The next section presents the techniques that are needed in order to handle the virtualization
procedure. For instance, techniques like data and text mining can be used to identify and
extract knowledge, to check for inconsistencies, etc.
2.2 Techniques for the virtualization process
The procedure for IA virtualization and its technical framework have been presented
and linked. This section presents the key techniques to be used during the presented procedure
as part of the presented technical framework.
However, before conducting an analysis of the most suitable handling techniques, the main
constraints, in the form of guidelines for the MSEE environment regarding IA, have to be defined.
Such constraints are:
- security,
- data heterogeneity,
- data stored in different places,
- constraints regarding data collection, etc.,
- potential usage of USDL and RDF and
- other potential constraints on the MSE level (to be inserted once the trust and
monitoring related constraints have been defined on the MSE level in WP11).
Data mining
Data mining (hereinafter: DM) is an interdisciplinary field that combines artificial
intelligence, computer science, machine learning, database management, data visualization,
mathematical algorithms, and statistics. DM tools support the identification of hidden patterns
in large volumes of structured data based on statistical methods like association analysis,
classification, or clustering (Hand, Mannila, & Smith, 2001). DM is a technology for
knowledge discovery applied to large-scale databases. This set of techniques provides
different methodologies for decision-making, problem solving, analysis, planning, diagnosis,
detection, integration, prevention, learning, and innovation. For example, certain knowledge
discovery applications rely on data mining techniques for building classification metrics from
paradigms such as Bayesian classifiers, rule induction, and decision tree algorithms. Decision
support is the objective when DM is applied to extract knowledge from a database for certain
management issues, such as customer service support, corporate failure prediction, marketing,
and grid services (Abidi, 2001; Cannataro, Talia, & Trunfio, 2002; Lin & McClean, 2001;
Shaw, Subramaniam, Tan, & Welge, 2001). Also, knowledge warehousing has been developed
as an architecture to integrate the functions of knowledge management, decision support,
artificial intelligence and data warehousing (Nemati, Steiger, Iyer, & Herschel, 2002). In
general, the data mining process and the data mining technique and function to be applied
depend very much on the application domain and the nature of the data available.
In the context of IA virtualization, techniques derived from the field of data mining may be
used to address certain data quality issues, such as extracting value from unstructured data
(e.g. text mining), identifying correlations within large amounts of data, or assessing the
relevance of certain contents in the organization.
Data mining, as a knowledge discovery technique, may typically include the following steps
in an iterative process:
- data cleaning,
- data integration,
- data selection,
- data transformation,
- knowledge presentation.
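The steps above can be sketched on a toy example. The following Python sketch is illustrative only: the record fields (`employee`, `skill`, `years`), the two source lists and the function names are hypothetical and are not part of the MSEE specification.

```python
from collections import Counter

# Two hypothetical sources describing employee skills (assumed data).
source_a = [
    {"employee": "E1", "skill": "Welding ", "years": 5},
    {"employee": "E2", "skill": None, "years": 2},  # incomplete record
]
source_b = [
    {"employee": "E3", "skill": "welding", "years": 7},
    {"employee": "E4", "skill": "CAD", "years": 1},
]

def clean(records):
    """Data cleaning: drop incomplete records and normalise text."""
    return [
        {**r, "skill": r["skill"].strip().lower()}
        for r in records
        if r["skill"] is not None
    ]

def integrate(*sources):
    """Data integration: merge records coming from several sources."""
    return [r for source in sources for r in source]

def select(records, min_years):
    """Data selection: keep only the records relevant to the analysis."""
    return [r for r in records if r["years"] >= min_years]

def transform(records):
    """Data transformation: reduce each record to the attribute of interest."""
    return [r["skill"] for r in records]

def present(skills):
    """Knowledge presentation: summarise the result as skill frequencies."""
    return Counter(skills).most_common()

integrated = integrate(clean(source_a), clean(source_b))
summary = present(transform(select(integrated, min_years=2)))
print(summary)
```

In practice the process is iterative: the summary may reveal further quality problems, after which the cleaning or selection rules are adjusted and the steps are repeated.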
Many trends influence the use of data mining (in computing, business, wireless technologies,
etc.), and these trends are themselves changing. In any case, the data mining techniques used
during the virtualization process must be able to take into consideration dynamic changes of
privacy and security policies, because:
a) different enterprises will have different requirements;
b) rules on the highest level, including security and privacy rules, will change as the
MSE learns and evolves over time.
2.2.1 Classification
Classification is the task of mapping a data item into one of several predefined classes
(Fayyad, Piatetsky Shapiro, & Smyth, 1996). On the basis of an initial training dataset
containing the category information, classification techniques analyse the input data, given
as a set of quantifiable properties, and produce a classification model, which can then be
applied to classify similar datasets. A typical example is the classifier algorithms used in
email servers to filter spam. Classification techniques have many business uses, as
classification is an essential part of many data mining applications (e.g. predictive
analysis).
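As a minimal sketch of how a model is built from a labelled training set and then applied to new items, the following Python example uses a nearest-centroid classifier. This technique is chosen here only for brevity; it is not prescribed by the deliverable. The two features and the training data are hypothetical.

```python
from collections import defaultdict

def train(samples):
    """Build a model (one centroid per class) from labelled training data.

    samples: list of (feature_vector, class_label) pairs.
    """
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }

def classify(model, features):
    """Assign the class whose centroid is closest to the feature vector."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(model, key=lambda label: squared_distance(model[label]))

# Hypothetical training set: (number of links, number of exclamation marks).
training = [
    ((8, 6), "spam"), ((7, 9), "spam"),
    ((1, 0), "ham"), ((0, 1), "ham"),
]
model = train(training)
print(classify(model, (6, 7)))
print(classify(model, (1, 1)))
```

The same two-step pattern (train on labelled data, then classify unseen items) applies to the Bayesian, rule-induction and decision tree classifiers mentioned earlier; only the form of the model changes.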
2.2.2 Clustering
Clustering is a technique to group together a set of items having similar characteristics
(Srivastava, Cooley, Deshpande, & Tan, 2000). This data analysis technique is used in
data mining applications for statistical data analysis purposes. Clustering is a typical form of
unsupervised learning which assigns similar objects to the same group or, more precisely,
partitions a data set into clusters, so that the data in each subset ideally share some common
trait (Lida, Li, Zhongzhi, Qing, & Maoguang, 2007). Data clustering techniques can be
used to perform similarity search, pattern recognition, trend analysis, grouping, classification,
and so forth (Cheng-Ru & Ming-Syan, 2005). Clustering can be achieved by various algorithms,
the choice of which depends on the nature of the data being analysed and the overall objective
of the result. In this sense, cluster analysis can be seen as an iterative process of knowledge
discovery, including several cycles and attempts at adjusting the parameter settings of the
algorithms being used. These algorithms can be categorized into nearest-neighbour clustering
(Lu & Fu, 1978) and (Khaled, 2004), fuzzy clustering (Bezdek, Hathaway, Sabin, & Tucker, 1987),
partitional clustering (Dubes, 1987), hierarchical clustering (King, 1967), artificial neural
networks for clustering (Hertz, Krogh, & Palmer, 1991), statistical clustering algorithms
(Dempster & Laird, 1977), and so on. The notion of “cluster” can vary depending on the
algorithm chosen for solving a particular problem. Typical cluster models include
“connectivity models”, “graph based models” and “distribution models”.
In business applications, clustering helps marketers discover distinct groups and characterize
customer groups based on purchasing patterns. Clustering techniques could support the
virtualization process when large data sets are processed and content has to be grouped by
category, e.g. skills, competences, etc.
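To illustrate partitional clustering, the following Python sketch implements a basic k-means procedure (Lloyd's algorithm), one of many possible algorithms and not necessarily the one to be used in MSEE. The customer data and the choice of initial centroids are assumptions made for the example.

```python
def kmeans(points, centroids, iterations=10):
    """Partition points into len(centroids) clusters (Lloyd's algorithm)."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return clusters, centroids

# Hypothetical customers: (orders per year, average order value).
customers = [(2, 20), (3, 25), (2, 30), (20, 200), (22, 210), (19, 190)]
clusters, centroids = kmeans(customers, centroids=[customers[0], customers[3]])
print(clusters)
print(centroids)
```

No class labels are supplied: the two customer groups emerge from the data alone, which is what distinguishes clustering from classification. The number of clusters and the initial centroids are parameters that typically need several adjustment cycles.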
2.3 Formalism - Ontology and Taxonomy
This section presents the preferred formats for handling the IA virtualized during the proposed
procedure. It therefore presents the main characteristics of the content format, not of the
file format. A content format is defined as an encoded format for converting a specific type
of data to displayable information; content formats are used to prepare data for transmission
so that they can be observed or interpreted (Bojko, 2004) and (Rockley, 2002).
Based on the definition of content format, the concepts of ontology and taxonomy are presented
first. Afterwards the file formats, such as RDF and OWL, are explained.
The following subsections present the most appropriate content formats for IA management:
ontology and taxonomy. First the concept of taxonomy is presented, then the process that
was used to develop the ontology. Finally, the guidelines on which the IA ontology within MSEE
relies are presented, together with the basic classification of IA. The detailed
ontology can be found in D22.5 as part of the technical IAMS framework.
Taxonomy
A taxonomy is a hierarchical set of concepts, including their attributes, related by transitive
is-a and/or equality relations. It is presented here as a representation and management
formalism, which is crucial for the management of organisations. Pincher (2010) argues that,
without a taxonomy designed for storage and management, or one that supports better searching,
no type of management system in an organisation is usable. Hence, a knowledge taxonomy focuses
on enabling the efficient retrieval and sharing of knowledge, information and data across an
organisation by building the taxonomy around workflows and knowledge needs in an intuitive
structure (Lambe 2007).
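The transitive is-a relation can be illustrated with a small Python sketch. The concept names below are hypothetical examples loosely inspired by the IA classes discussed later in this section; they are not the MSEE taxonomy itself.

```python
# Child -> parent links of a small, hypothetical IA taxonomy.
IS_A = {
    "welding know-how": "skills",
    "skills": "human capital",
    "human capital": "intangible asset",
    "customer loyalty": "relational capital",
    "relational capital": "intangible asset",
}

def ancestors(concept):
    """Return all broader concepts, following is-a links transitively."""
    result = []
    while concept in IS_A:
        concept = IS_A[concept]
        result.append(concept)
    return result

def is_a(concept, broader):
    """True if `concept` is a direct or indirect kind of `broader`."""
    return broader in ancestors(concept)

print(ancestors("welding know-how"))
print(is_a("welding know-how", "intangible asset"))
print(is_a("customer loyalty", "human capital"))
```

Because the relation is transitive, a search for "intangible asset" can retrieve items filed under any narrower concept, which is the retrieval benefit the taxonomy is meant to provide.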
Figure 8: Relation between taxonomy and ontology
[Figure content: formalisms arranged by expressiveness and complexity (Glossary, Taxonomy, Thesaurus, Entity Relationship/UML Model, Topic Maps, Formal Ontology); interoperability levels shown: syntactical, structural, semantic; languages shown: RDF, KIF, OL, OWL Lite]
Source: (Hirsch, 2012), MSEE – D23.3
Ontology
“Ontology” is a philosophical discipline, a branch of philosophy that deals with the nature and
the organization of being (Sure & Studer, 2002). An ontology is an explicit specification of a
conceptualization. For systems, what “exists” is that which can be represented. When the
knowledge of a domain is represented in a declarative formalism, the set of objects that can be
represented is called the universe of discourse. This set of objects, and the describable
relationships among them, are reflected in the representational vocabulary with which a
knowledge-based program represents knowledge (Gruber, 1993). In recent years ontologies
have become a topic of interest in computer science e.g. (Maedche & Staab, 2001) and
(Fensel, 2003). In its most prevalent use in computer science, an ontology refers to an
engineering artifact, constituted by a specific vocabulary used to describe a certain reality,
plus a set of explicit assumptions regarding the intended meaning of the vocabulary (Sure &
Studer, 2002). It can be considered a type of taxonomy with even more complex relationships,
aiming to describe a domain of knowledge (e.g. manufacturing), a subject area, by both its
terms (called individuals or instances) and their relationships, and it thus supports inference
(Hedden, 2010).
Ontologies are built to be reused or shared anytime, anywhere, and independently of the
behaviour and domain of the application that uses them. Ontologists should therefore be able to
specify, at least partially, a large portion of the vocabulary that the ontology will need to
cover for a given domain (Fernandez, Gomez-Perez, & Juristo, 1997). To this end, ontologies
have to be expressed logically, using properties, relations and classes, in order to ensure
consistency, accuracy and meaningfulness.
Their role is crucial: for instance, they enable knowledge available on the internet to be
processed, shared and reused between different applications. They play such a meaningful role
in different environments (e.g. knowledge management in and between organizations) because
they provide a common understanding of specific topics. For the management of intangible
assets, ontology has a further significance, as it is the most widely used method of mapping
the knowledge of a domain in order to represent and describe it (Brewster, Ciravegna, & Wilks,
2010).
2.3.1 Ontology development for intangible assets
First this section provides an overview on the development process of ontologies, afterwards
the development process for IA in MSEE is presented.
Development process for ontologies
To date there is no unified approach, method or technique for developing an ontology, which
leads to a lack of standardized activities. However, predefined criteria and described
methodologies exist. A building-block approach was chosen for developing the ontology for IA
in the manufacturing environment: the ontology was first developed on the basis of predefined
guidelines valid for IA and afterwards amended to the specific needs of IA in the MSE.
The ontology development process that suits MSEE needs is the following; it is extensively
based on the work of Fernandez (Fernandez et al., 1997).
Table 2: Ontology development process
1. Management
1.1 Planning the goals
1.2 Ontology specification requirements documents
2. Operation
2.1 Identification of sources of knowledge – it can be elicited using Knowledge
Based Systems techniques, which should enable the listing of sources of
knowledge and
2.2 Conceptualization and implementation in a conceptual model (try to integrate
as much as possible existing ontologies in your ontology)
2.3 Transformation of model into a compatible model
3. Support
3.1 Evaluation - whether it is understandable to all (complete documentation is often
needed)
3.2 Maintain - Guidelines for maintenance
One of the major risks, besides the targeted users not understanding the ontology, is a lack
of quality. The evaluation phase is therefore important: ontologies should be evaluated before
they are used or reused. Technical verification is not the only type of evaluation;
verification by users also carries a lot of weight. According to (Gonzalez, 2005), evaluation
can include:
- validation - whether the ontology definitions really model the real world for which the
ontology was created;
- assessment - judging the ontology from the user's point of view, in terms of:
o consistency,
o completeness,
o conciseness.
Development process for the intangible assets specific ontology within the MSEE context
In order to develop the IA specific ontology within MSEE, the results of the EU funded project
MERITUM were adopted as a basis. Afterwards the taxonomy was customized to the specific
needs of the manufacturing environment.
MERITUM was an EU-sponsored research project. It produced a set of guidelines for the
measurement and disclosure of intangibles which should be useful for both private and public
policy decisions. To do so, the project was organised into four activities (MERITUM, 2001):
a) Classification of intangibles;
b) Management Control Study;
c) Capital Markets and
d) Guidelines
The guidelines were obtained as the result of a long process in which the project group started
with a reflection on the economic nature of intangibles and a discussion of their definition and
classification, continued with an analysis of the current measurement, management and
disclosure practices, and concluded with a test of the validity of the guidelines by means of a
Delphi analysis.
In leaning on those guidelines, we do not claim that this framework provides a complete
comprehension of IA; however, it is the most suitable framework known to us. By building on
it, the framework will hopefully be elaborated further and the MERITUM guidelines
will evolve.
The MERITUM framework proposes three main classes of intangibles (MERITUM, 2001):
a) Human capital is defined as the knowledge that employees take with them when they
leave the firm. It includes the knowledge, skills, experiences and abilities of people.
Some of this knowledge is unique to the individual, some may be generic. Examples
are innovation capacity, creativity, knowhow and previous experience, teamwork
capacity, employee flexibility, tolerance for ambiguity, motivation, satisfaction,
learning capacity, loyalty, formal training and education.
b) Structural capital is defined as the pool of knowledge that stays with the firm at the
end of the working day. It comprises the organisational routines, procedures, systems,
cultures, databases, etc. Some of them may be legally protected and become
Intellectual Property Rights, legally owned by the firm under separate title. Examples
are organisational flexibility, a documentation service, the existence of a knowledge
centre, the general use of Information Technologies, organisational learning capacity,
etc.
c) Relational capital is defined as all resources linked to the external relationships of the
firm such as customers, suppliers or R&D partners. It comprises that part of Human
and Structural Capital dealing with the company’s relations with stakeholders
(investors, creditors, customers, suppliers, etc.), plus the perceptions that they hold
about the company. Examples of this category are image, customer loyalty, customer
satisfaction, links with suppliers, commercial power, negotiating capacity with
financial entities, environmental activities, etc.
The main classification of IA in the context of MSEE is presented in the figure below.
Figure 9: Main classification of intangible assets within the MSEE context
The detailed IA classification can be found in Annex #1 of this document.
So far, the process of developing the ontology has been presented, followed by the
classification of IA from the MERITUM project, which was taken as a framework for the
ontology of IA in MSEE. The main IA classification for MSEE has also been presented.
The developed ontology and its description, however, are presented in deliverable D22.5
(Development of IA in Manufacturing Service) as part of the technical framework for the
representation and description of IA by means of USDL.
It has to be emphasized that the MSEE-specific ontology of IA depicts the crucial
interdependencies between IA.
To make an ontology or taxonomy computable, it has to be implemented in a formal language,
also called a format. The World Wide Web Consortium (W3C) has published recommendations
for two such standards, the RDF (Resource Description Framework) Schema and the Web
Ontology Language (OWL) (Hedden, 2010). The RDF specifications can be found on the W3C
website (W3C, 2012b), as can those of OWL (W3C, 2012a).
2.4 File format definition
RDF is a general method to decompose any type of knowledge into small pieces, with some
rules about the semantics, or meaning, of those pieces. The point is to have a method so
simple that it can express any fact, and yet so structured that computer applications can do
useful things with it (Tauberer, 2012).
An interesting feature of RDF is that it facilitates data merging even if the underlying
schemas differ, and it specifically supports the evolution of schemas over time without
requiring all the data consumers to be changed. The design of RDF is intended to meet the
following goals (W3C, 2012b):
- having a simple data model,
- having formal semantics and provable inference,
- using an extensible URI-based vocabulary,
- using an XML-based syntax,
- supporting use of XML schema datatypes,
- allowing anyone to make statements about any resource.
The underlying structure of any expression in RDF is a collection of triples, each consisting
of a subject, a predicate and an object. A set of such triples is called an RDF graph. This can
be illustrated by a node and directed-arc diagram, in which each triple is represented as a
node-arc-node link (hence the term "graph") (W3C, 2012b).
Figure 10: Representation of a statement
Source: (W3C, 2012b)
Each triple represents a statement of a relationship between a subject and an object, the
entities denoted by the nodes that are linked.
The direction of the arc is significant: it always points toward the object.
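The triple structure can be sketched in a few lines of Python, without any RDF library. The namespace `http://example.org/msee/` and all resource and property names below are hypothetical and serve only to illustrate the subject, predicate, object pattern.

```python
# Each statement is a (subject, predicate, object) triple.
EX = "http://example.org/msee/"  # hypothetical namespace

graph = {
    (EX + "CompanyA", EX + "owns", EX + "DesignKnowHow"),
    (EX + "DesignKnowHow", EX + "classifiedAs", EX + "HumanCapital"),
}

def objects(graph, subject, predicate):
    """Follow the arcs labelled `predicate` that leave the node `subject`."""
    return {o for s, p, o in graph if s == subject and p == predicate}

def subjects(graph, predicate, obj):
    """Find the nodes whose `predicate` arc points toward `obj`."""
    return {s for s, p, o in graph if p == predicate and o == obj}

owned = objects(graph, EX + "CompanyA", EX + "owns")
owners = subjects(graph, EX + "owns", EX + "DesignKnowHow")
print(owned)
print(owners)
```

The two query functions differ only in the direction in which the arc is traversed, which reflects the fact that each arc always points from the subject toward the object.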
There are many reasons to use RDF, among them (Tauberer, 2012):
- integrating data from different sources without custom programming,
- offering data for re-use by other parties,
- decentralizing data in a way that no single party "owns" all the data,
- data handling (browse, query, match, input, extract, ...).
RDF allows IA to be decomposed to the desired level, with each piece then described
according to MSEE needs. It will also allow the MSE to apply data management techniques
as well as more advanced techniques. This means that RDF complies with the MSEE
requirements regarding IA.
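The integration of data from different sources can be illustrated as follows. In this simplified Python sketch, each graph is a set of triples and merging is a plain set union; this holds only when the graphs contain no blank nodes, which would otherwise have to be kept apart. The `ex:` identifiers and the two enterprises are hypothetical.

```python
# Triples published independently by two hypothetical enterprises.
graph_me1 = {
    ("ex:CompanyA", "ex:hasSkill", "ex:Welding"),
    ("ex:Welding", "ex:classifiedAs", "ex:HumanCapital"),
}
graph_me2 = {
    ("ex:CompanyB", "ex:hasCertificate", "ex:ISO9001"),
    ("ex:CompanyB", "ex:hasSkill", "ex:Welding"),
}

# Merging is a set union of the triples; shared identifiers
# (here "ex:Welding") link the two data sets without custom mapping code.
merged = graph_me1 | graph_me2

welders = sorted(
    s for s, p, o in merged if p == "ex:hasSkill" and o == "ex:Welding"
)
print(welders)
```

The second enterprise uses a property (`ex:hasCertificate`) unknown to the first, yet the merge still succeeds, because the triple model does not require a shared schema.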
There is also RDF Schema, which defines a “schema vocabulary” that supports the definition of
ontologies by giving “extra meaning” to particular RDF predicates and resources. It provides the
framework to describe application-specific classes and properties. This allows resources to be
defined as instances of classes, and classes as subclasses of other classes (W3Schools, 2012).
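The "extra meaning" of the class vocabulary can be sketched in Python. The example below applies, by hand, the idea that an instance of a class is also an instance of all its superclasses; `rdf:type` and `rdfs:subClassOf` are the standard terms, while the `ex:` classes and the helper functions are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

graph = {
    ("ex:HumanCapital", SUBCLASS_OF, "ex:IntangibleAsset"),
    ("ex:Skill", SUBCLASS_OF, "ex:HumanCapital"),
    ("ex:Welding", RDF_TYPE, "ex:Skill"),
}

def superclasses(graph, cls):
    """All classes reachable from `cls` via rdfs:subClassOf (transitive)."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        for s, p, o in graph:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                todo.append(o)
    return found

def types_of(graph, resource):
    """Asserted classes of `resource` plus those implied by the hierarchy."""
    asserted = {o for s, p, o in graph if s == resource and p == RDF_TYPE}
    inferred = set()
    for cls in asserted:
        inferred |= superclasses(graph, cls)
    return asserted | inferred

welding_types = sorted(types_of(graph, "ex:Welding"))
print(welding_types)
```

Only one type is asserted for `ex:Welding`; the other two follow from the class hierarchy. A real RDF Schema processor would derive such statements itself.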
The OWL Web Ontology Language is designed for use by applications that need to process
the content of information instead of just presenting information to humans. OWL facilitates
greater machine interpretability of Web content than that supported by XML, RDF, and RDF
Schema (RDF-S) by providing additional vocabulary along with a formal semantics (W3C,
2012a).
The W3C layer cake includes:
- XML provides a surface syntax for structured documents, but imposes no semantic
constraints on the meaning of these documents.
- XML Schema is a language for restricting the structure of XML documents and also
extends XML with datatypes.
- RDF is a datamodel for objects ("resources") and relations between them, provides a
simple semantics for this datamodel, and these datamodels can be represented in an
XML syntax.
- RDF Schema is a vocabulary for describing properties and classes of RDF resources,
with a semantics for generalization-hierarchies of such properties and classes.
- OWL adds more vocabulary for describing properties and classes: among others,
relations between classes (e.g. disjointness), cardinality (e.g. "exactly one"), equality,
richer typing of properties, characteristics of properties (e.g. symmetry), and
enumerated classes.
It can be concluded that RDF and OWL must be used if more than basic semantics is to be
expressed.
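As an illustration of what the additional OWL vocabulary makes possible, the following Python sketch checks class disjointness by hand: an individual asserted to belong to two classes that are declared disjoint makes the ontology inconsistent. The disjointness axiom between the two classes is an assumption made for the example, not a statement taken from the MSEE ontology, and the function is a stand-in for what an OWL reasoner would do.

```python
# Assumed axiom: the two classes are declared disjoint.
DISJOINT = {frozenset({"ex:HumanCapital", "ex:StructuralCapital"})}

# Asserted class memberships of two hypothetical individuals.
types = {
    "ex:Welding": {"ex:HumanCapital"},
    "ex:PatentX": {"ex:StructuralCapital", "ex:HumanCapital"},  # contradictory
}

def inconsistent_individuals(types, disjoint_pairs):
    """Individuals asserted to belong to two classes declared disjoint."""
    return sorted(
        individual
        for individual, classes in types.items()
        if any(pair <= classes for pair in disjoint_pairs)
    )

conflicts = inconsistent_individuals(types, DISJOINT)
print(conflicts)
```

Plain RDF and RDF Schema have no way to state that two classes exclude each other, so this kind of quality check only becomes available once OWL is used.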
In the next iteration, OWL 2 could also be taken into account. It has a very similar
overall structure to OWL 1 and adds new functionality. Some of the new features are
syntactic sugar (e.g., disjoint union of classes), while others offer new
expressivity, including (W3C, 2009):
- keys;
- property chains;
- richer datatypes, data ranges;
- qualified cardinality restrictions;
- asymmetric, reflexive, and disjoint properties; and
- enhanced annotation capabilities.
2.4.1 Analysis of taxonomy management tools
A short analysis of taxonomy management software, aligned with the project partners, is also
planned in order to identify the most suitable tool. At present, Protégé is the most promising
one.
Table 3: Comparison of taxonomy management software
Criterion | PROTÉGÉ | ONTO STUDIO | NEOLOGISM
Ontology editor | Yes (++) | Yes (++) | No (--)
Programming language | Java (+) | mostly in Java (-) | PHP (Web-based) (+)
Support for RDF | Yes (++) | Yes (++) | Yes (++)
Support for OWL | Yes (++) | Yes (++) | just a part (-)
Rapid prototyping and application development | + | + | -
User-friendly | ++ | - | -
Free | Yes (++) | No (--) | Yes (++)
Open source | Yes (++) | No (--) | Yes (++)
Legend: for every criterion and every software tool we assigned a qualitative judgment (++/+/-/--), based on the coherence between the characteristics of the tool and the objectives of our work within the MSEE project.
In the next document iteration (M24), one further tool will be taken into consideration and
analysed, namely the NeOn Toolkit (NeOn Project, 2012). It is a state-of-the-art, open source,
multi-platform ontology engineering environment, which provides comprehensive support for
the ontology engineering life-cycle. It was developed as part of the NeOn Project.
Conclusion and further steps
All relevant documents in terms of procedures, techniques, models and requirements have
been defined in order to ease the often complex process of virtualization into a repository.
All have been made specific to the IA context in MSEE, in an attempt to narrow, from the IA
point of view, the gap between the needed and the available resources (knowledge, techniques,
...) of the MSE and MEs in creating an effective MSE.
Leaning extensively on the GRAI model, a clear and structured definition and decomposition
of objectives has been made possible for an ME. From the decomposed objectives (using all
three levels: strategic, tactical and operational), the MDSEA architecture makes it possible
to identify the project-relevant IA that will be virtualized.
Still leaning on the same architecture, the ME can now also identify what exactly is needed (in
terms of IA) from the MSE to be able to accomplish a project. This means that it also enables
the identification and definition of requirements from the MEs, which in turn suggests that
the virtualization process developed for the implementation of IA could also be used to
virtualize requirements coming from the market to the MSE. However, this issue
is not dealt with in this document. The sources of IA can now be identified by the ME, as it
has clearly defined and decomposed the project-relevant IA that have to be virtualized.
After having identified exactly which IA to virtualize and where they are located, the
virtualization process can begin. Although it is specific to IA in the MSEE context, it has
been embedded in a well-known and stable framework, ETL. Taking into consideration
that one of the main risks is low data quality (expressed in different ways), the cooperation
of the MSE during the process has been included in order to reach a higher level of quality.
Of course, the virtualization process cannot be carried out without techniques that enable the
identification of sources of IA and their extraction, cleaning and integration into the
taxonomy. Therefore the main techniques have been presented.
At this point, the formats in which the process will be handled are still open. In order to
provide as much detailed information as possible for the development of the technical IAMS
framework in D22.5, formats were divided into two main categories: content format and file
format. For the former, the ontology development process for IA in the MSEE-specific context
has been presented; its result, however, is presented in D22.5 as an
essential part of the technical IAMS framework. As for the file format, RDF and OWL were
presented and proposed for further use.
All the necessary documents have been established in order to enable IA as a capability of
service to flow effectively between the ME and MSE and also within the MSE. Subsequently
the virtualized IA provides the so called capabilities of IA as a service.
In D22.5 the established specific ontology and taxonomy will be presented as a formalism for
effective content management; realistic data will also be used and inserted into a real case
from one of our project partners, Bivolino. As the prototype based on USDL will be
established in D22.5, the results obtained in D22.3 will form an important building block,
representing the documents and specification for an implementation of IA as a service. All
information related to the Bivolino use-case can be found in D22.5. It has to be emphasized
that the real world case example will be aligned with WP23, which is dealing with tangible
assets.
The next steps consist of deepening the proposed techniques and procedures, refining them and
aligning them with new information obtained from the prototype development and from
other project partners. Optimisation of the proposed procedure has to be addressed, as do
challenges related to demanding requirements such as IPR management. Regardless of the next
steps, however, data quality and consistency always have to be kept in mind.
Seen from a wider angle, the results of D22.3 and D22.5 will subsequently enable the
systematic monitoring of performance and risk in an enterprise’s value creation system and
will thus constitute an important part of the management system.
SOURCES
1. Abidi, S. (2001). Knowledge management in healthcare: towards ‘knowledge-driven’
decision-support services. International Journal of Medical Informatics, 63, 5-18.
2. Bezdek, J. C., Hathaway, R. J., Sabin, M. J., & Tucker, W. T. (1987).
Convergence theory for fuzzy c-means: Counterexamples
and repairs. IEEE Transactions on Systems, Man and Cybernetics, 873-877.
3. Bohlouli, M., Holland, A., & Fathi, M. (2011, 15-17 May 2011). Knowledge
integration of collaborative product design using cloud computing infrastructure.
Paper presented at the Electro/Information Technology (EIT), 2011 IEEE International
Conference on.
4. Bojko, B. (2004). Content Management Bible: Wiley.
5. Brewster, C., Ciravegna, F., & Wilks, Y. (2010). Knowledge Management: Position
Paper
6. Cannataro, M., Talia, D., & Trunfio, P. (2002). Distributed data mining on the grid.
Future Generation Computer Systems, 18, 1101–1112.
7. Cheng-Ru, L., & Ming-Syan, C. (2005). Combining partitional and hierarchical
algorithms for robust and efficient data clustering with cohesion self-merging.
Knowledge and Data Engineering, IEEE Transactions on, 17(2), 145-159. doi:
10.1109/tkde.2005.21
8. Dempster, A., & Laird, N. M. (1977). Maximum Likelihood from Incomplete Data via
the EM Algorithm. Journal of the Royal Statistical Society. Series B (Methodological),
39(1).
9. Doumeingts, G., Lieu, C., Chen, D., Ducq, Y., T.;, A., G.;, Z., . . . Silva, E. (2012).
D11.1 - Service concepts, models and method: Model Driven Service Engineering:
MSEE.
10. Dubes, R. C. (1987). How many clusters are best? - An experiment. Pattern
Recognition, 20(6), 645-663. doi: 10.1016/0031-3203(87)90034-3
11. Fayyad, U., Piatetsky Shapiro, G., & Smyth, P. (1996). From data mining to
knowledge discovery: An overview. ACM KDD, 39(11), 27-34.
12. Fensel, D. (2003). A Silver Bullet for Knowledge Management and Electronic
Commerce: Springer.
13. Fernandez, M., Gomez-Perez, A., & Juristo, N. (1997). Methontology: From
Ontological Art Towards Ontological Engineering AAAI Technical report (pp. 33-40):
AAAI.
14. Frischmuth, P., Klímek, J., Auer, S., Tramp, S., Unbehauen, J., Holzweißig, K., &
Marquardt, C. M. (2012). Linked Data in Enterprise Information Integration. Semantic
Web journal.
15. Gonzalez, R. (2005). A Semantic Web approach to digital rights management. od
Manuel a.
16. Gruber, T. (1993). Toward Principles for the Design of Ontologies Used for
Knowledge Sharing (pp. 1-22). Knowledge Systems Laboratory: Stanford University.
17. Hand, D., Mannila, H., & Smith, P. (2001). Principles of Data Mining: MIT Press.
18. Heath, T., Hepp, M., & Bizer, C. (Producer). (2009, 2012). Linked Data - The Story
So Far. International Journal on Semantic Web and Information Systems. Retrieved
from http://tomheath.com/papers/bizer-heath-berners-lee-ijswis-linked-data.pdf
19. Hedden, H. (2010). The Accidental Taxonomist, what are taxonomies. 464.
20. Hertz, J. A., Krogh, A. S., & Palmer, R. G. (1991). Introduction To The Theory Of
Neural Computation: Westview Press.
21. Hirsch, M. (2012). D23.3 - OMSE Management Framework for Tangible Assets (Vol.
D23.3): MSEE.
22. INTEROP_NoE. (2005). Deliverable DTG3. http://www.interop-noe.org/.
23. Khaled, M. H. (2004). Efficient Phrase-Based Document Indexing for Web Document
Clustering. IEEE Transactions on Knowledge and Data Engineering, 16, 1279-1296.
24. Khan, D. (2008, 9-12 Dec. 2008). CAKE - Classifying, Associating and Knowledge
DiscovEry - An Approach for Distributed Data Mining (DDM) Using PArallel Data
Mining Agents (PADMAs). Paper presented at the Web Intelligence and Intelligent
Agent Technology, 2008. WI-IAT '08. IEEE/WIC/ACM International Conference on.
25. King, B. (1967). Step-Wise Clustering Procedures. Journal of the American Statistical
Association, 62(317), 86-101.
26. Lambe, P. (2007). Taxonomies, Knowledge and Organisation Effectiveness (Chandos.
27. Lida, X., Li, Z., Zhongzhi, S., Qing, H., & Maoguang, W. (2007, 7-10 Oct. 2007).
Research on Business Intelligence in enterprise computing environment. Paper
presented at the IEEE International Conference on Systems, Man and Cybernetics,
2007. ISIC.
28. Lin, F., & McClean, S. (2001). A data mining approach to the prediction of corporate
failure. Knowledge-Based Systems, 14, 189-195.
29. Liu, B., Cao, S. G., & He, W. (2011). Distributed data mining for e-business.
Information Technology and Management, 12(2), 67-79. doi: 10.1007/s10799-011-
0091-8
30. Lu, S.-Y., & Fu, K. S. (1978). A Sentence-to-Sentence Clustering Procedure for
Pattern Analysis. Systems, Man and Cybernetics, IEEE Transactions on, 8(5), 381-
389. doi: 10.1109/tsmc.1978.4309979
31. Maedche, A., & Staab, S. (2001). Ontology Learning for the Semantic Web. IEEE
Intelligent Systems, 16(2), 72-79.
32. MERITUM. (2001). MEasuRing Intangibles To Understand and improve innovation
Management. In T. program (Ed.).
33. Nemati, H. R., Steiger, D. M., Iyer, L. S., & Herschel, R. T. (2002). Knowledge
warehouse: an architectural integration of knowledge management, decision support,
artificial intelligence and data warehousing. Decision Support Systems, 33, 143–161.
34. NeOn Project. (2012). NeOn Toolkit, 2012, from http://neon-toolkit.org/
35. Overby, E., Slaughter, S. A., & Konsynski, B. (2010). Research Commentary - The
Design, Use, and Consequences of Virtual Processes. Information Systems Research,
21(4), 700-710. doi: 10.1287/isre.1100.0319
36. Pincher, M. (2010). A guide to developing taxonomies for effective data management,
2012, from www.computerweekly.com/Articles/2010/04/06/240539/a-guide-to-developing-taxonomies-for-effective-data.htm
37. Rockley, A. (2002). Managing Enterprise Content: A Unified Content Strategy: New
Riders.
38. Shaw, M., Subramaniam, C., Tan, G. W., & Welge, M. (2001). Knowledge
management and data mining for marketing. Decision Support Systems, 31, 127–137.
39. Srivastava, J., Cooley, R., Deshpande, M., & Tan, P.-N. (2000). Web usage mining:
discovery and applications of usage patterns from Web data. SIGKDD Explor. Newsl.,
1(2), 12-23. doi: 10.1145/846183.846188
40. Sure, Y., & Studer, R. (2002). On-To-Knowledge Methodology - Final version.
41. Tauberer, J. (2012). Rdf:about. Retrieved 2012-09-01, from http://rdfabout.com/
42. van Geenhuizen, M., & Nijkamp, P. (2012). Knowledge virtualization and local
connectedness among young globalized high-tech companies. Technological
Forecasting and Social Change, 79(7), 1179-1191. doi:
10.1016/j.techfore.2012.01.010
43. Vassiliadis, P. (2009). A Survey of Extract–Transform–Load Technology.
International Journal of Data Warehousing & Mining, 5(3), 1-27.
44. Vassiliadis, P., & Alkis, S. (2007). EXTRACTION, TRANSFORMATION, AND
LOADING. Retrieved from University of Ioannina website:
http://www.cs.uoi.gr/~pvassil/downloads/ETL/SHORT_DESCR/08SpringerEncyclopedia_draft.pdf
45. W3C. (2009). OWL 2 Web Ontology Language Document Overview: W3C.
46. W3C. (2012a). W3C Recommendation - OWL Web Ontology Language. W3C
Recommendation, 2012, from http://www.w3.org/TR/owl-features/
47. W3C. (2012b). W3C Recommendation - Resource Description Framework (RDF):
Concepts and Abstract Syntax, 2012, from http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/
48. W3Schools. (2012). RDF Schema (RDFS), 2012, from
http://www.w3schools.com/rdf/rdf_schema.asp
49. Wache, H., Vögele, T., Visser, U., Stuckenschmidt, H., Schuster, G., Neumann, H., &
Hübner, S. (2001). Ontology-Based Integration of Information - A Survey of Existing
Approaches. Paper presented at the IJCAI’01 Workshop on Ontologies and
Information Sharing.
50. Wei, Z., Kang, M., & Zhou, W. (2008). A Semantic Web-Based Enterprise
Information Integration Platform for Mobile Commerce. Paper presented at the
International Conference on Management of e-Commerce and e-Government.
Annexes
Annex 1 – IA classification in the MSEE context