
Project ID 284860 MSEE – Manufacturing SErvices Ecosystem

Date: 12/10/2012 Deliverable D22.3 – M12

D22.3

Development of methods for virtualization of

MSE intangibles

M12

Document Owner: Christian Zanetti, David Opresnik (POLIMI)

Contributors: Hadrien Boyé (HARDIS), Manuel Hisrch (DITF), Davide Storelli (ENG)

Dissemination: Public

Contributing to: WP 22.3

Date: 12/10/2012

Revision: V1.0


VERSION HISTORY

VERSION | DATE | NOTES AND COMMENTS
01 | 27/07/2012 | CHRISTIAN ZANETTI (POLIMI) – INITIAL DELIVERABLE DEFINITION – EXTENDED TABLE OF CONTENT
02 | 10/08/2012 | TABLE OF CONTENT UPDATE
03 | 23/08/2012 | REVIEW OF T.O.C. BY GUY DOUMEINGTS
04 | 27/08/2012 | HADRIEN BOYE (HARDIS) – CONTRIBUTION - PART OF THE VIRTUALIZATION PROCESS (EII AND ETL) AND ALL VIRTUALIZATION TECHNIQUES
05 | 10/09/2012 | DRAFT OF DELIVERABLE FOR REVIEW TO COORDINATORS
06 | 24/09/2012 | DELIVERABLE FOR PEER REVIEW
07 | 08/10/2012 | RECEIVED FROM REVIEW
08 | 12/12/2012 | FINAL VERSION

DELIVERABLE PEER REVIEW SUMMARY

ID | Comments | Addressed () / Answered (A)
1 | Describe shortly what virtualization of intangible assets is. Give references if needed. |
2 | Mention explicitly which one. | A
3 | Add references. References should be added every time you mention a new technology, language, etc. |
4 | Add link to project webpage. |
5 | For each entry in this table explain briefly how the framework (or parts of the framework) you mention is being used (same as is currently done for the first row entry). |
6 | Please rephrase. "The identification of ... is being identified ..." sounds strange. |
7 | To make it much clearer I would rephrase this step as: Identification of all resources inside the ME which are relevant to the project. I also expect step a. to be performed not so often as step b. |
8 | Add a pointer to Section 2.2 where these techniques and how they are used during the virtualization process are described. |
9 | Please make a reference to the WP and deliverable(s) that are providing solutions for modeling and virtual representation by means of USDL. |
10 | The English for this part of the document is sloppy. Please check the language. |
11 | Rephrase. |
15 | Are Rules used exclusively for security and trust? I can imagine many other processes and tasks where Rules can be used to represent and manage knowledge about IA in a ME/MSE. |


16 | Please rephrase. I would say: "The gap between the planned and actual data usability is filled". |
17 | There has been a considerable effort in the Semantic Web and recently Linked Open Data research areas to provide solutions for EII. The deliverable should mention these and give a short overview. |
18 | Is it only a matter of different DB systems used? I think it is more than that. I'm thinking here about different data schemas (that are conceptually different), different representations, etc. |
19 | For ontologies there are many representation languages, OWL being a standard. I'm not sure what OL stands for in this picture? |
20 | The most commonly used definition of an ontology is the one from Gruber. An ontology is a "formal, explicit specification of a shared conceptualization". Gruber, "Toward principles for the design of ontologies used for knowledge sharing", Int. J. Hum.-Comput. Stud., vol. 43, no. 5-6, 1995. |
21 | Something is missing here. "Knowledge acquisition for ..." |
22 | This section starts directly with a description of the Meritum project. You should first say that the conceptual model for a ME developed in the Meritum project is adopted, and then start describing Meritum. |
23 | RDFS and OWL are W3C standards and not recommendations. Please add references for RDF, RDF-S and OWL. Add pointers to the W3C specifications of these standards. |
24 | It is worth adding a short paragraph on RDF Schema (RDFS). |
25 | There is a new version of OWL, i.e. OWL2, that makes a much clearer and cleaner separation of its variants, allowing different profiles (i.e. OWL2RL, OWL2EL and OWL2QL). |
26 | Why not consider other tools such as Neon Toolkit? It was built as part of the Neon FP7 project, offers the same functionality and has the same friendliness as Protege. The European Commission will most likely appreciate that we are using results/outcomes of other EU projects. |
27 | It would have been nice to have a parallel/comparison of the virtualization methods for intangible assets presented in this deliverable with the methods for tangible assets from WP23, if they are available. |


TABLE OF CONTENTS

EXECUTIVE SUMMARY
INTRODUCTION
1 IDENTIFICATION OF TYPES AND SOURCES OF INTANGIBLE ASSETS RELEVANT TO THE ENTERPRISE
1.1 CONCEPTS DEFINITION
1.1.1 GRAI model
1.1.2 Model Driven Service Engineering Architecture
1.2 PROCEDURE FOR IDENTIFICATION OF TYPES AND SOURCES OF INTANGIBLE ASSETS
2 IMPLEMENTATION METHOD OF INTANGIBLE ASSETS
2.1 VIRTUALIZATION PROCESS
2.1.1 Technical framework for the virtualization process of intangible assets
2.1.2 Data quality management and data virtualization
2.1.3 Linkage of the presented technical concepts with intangible assets virtualization process
2.2 TECHNIQUES FOR THE VIRTUALIZATION PROCESS
Data mining
2.2.1 Classification
2.2.2 Clustering
2.3 FORMALISM - ONTOLOGY AND TAXONOMY
2.3.1 Ontology development for intangible assets
2.4 FILE FORMAT DEFINITION
2.4.1 Analysis of taxonomy management tools
CONCLUSION AND FURTHER STEPS
SOURCES
ANNEXES
ANNEX 1 – IA CLASSIFICATION IN THE MSEE CONTEXT

Figures:

Figure 1: D22.3 document map and relations with D22.5
Figure 2: GRAI model
Figure 3: Towards a Model Driven Service Engineering Architecture
Figure 4: The relation of the partakers (ME, VE, MSE) with the GRAI model and MDSEA framework
Figure 5: Virtualization process of intangible assets in the MSEE context
Figure 6: Virtualization procedure of intangible assets
Figure 7: Linkage of ETL and virtualization procedure of intangible assets
Figure 8: Relation between taxonomy and ontology
Figure 9: Main classification of intangible assets within the MSEE context
Figure 10: Representation of a statement

Tables:

Table 1: ETL Phases
Table 2: Ontology development process
Table 3: Comparison of taxonomy management softwares


Executive summary

To understand the significance of deliverable D22.3, it has to be put into the context of its WP, whose aim is to support the Manufacturing Service Ecosystem (hereinafter: MSE) in managing Intangible Assets (hereinafter: IA), ranging from their basic handling to advanced analytical BI techniques. This deliverable provides all documents necessary to develop and establish a technical framework that will enable the description and representation of IA that could be offered as a service. It therefore gives D22.5 all the knowledge necessary for implementing IA in the technical framework that will reside in the MSE.

At an abstract level, IA virtualization makes physical activities, relations and other assets more virtual, as exemplified by ICT related activities (van Geenhuizen & Nijkamp, 2012). Such a process is not in itself a novelty, either in science or in practice. However, making it successful, effective and above all holistic requires the development or amendment of specific procedures, architectures and techniques. The value of this deliverable is therefore at least twofold: first, developing and proposing the virtualization procedure of IA into the MSE repository, and second, proposing the usage and development of the different processes, approaches and techniques encompassed in the virtualization procedure.

As the first part of the virtualization procedure, a method for identifying the relevant sources and types of IA in Manufacturing Enterprises (hereinafter: ME) has been developed. A ME has to define its goals in a structured way, for which the GRAI model is used. Afterwards, by using the MDSEA, the ME can identify which of its IA are relevant to the project and which additional IA are needed in order to execute it. Those needs can be treated as requirements, which means that the virtualization process could possibly also be used to virtualize requirements coming from the market.

Afterwards, a six-step virtualization process specific to IA in MSEE was developed. Emphasis was put on data quality, as it can be an Achilles' heel of IA management. The process was then framed into the Extract, Transform, Load (hereinafter: ETL) process in order to give D22.5 more information about the techniques needed to carry out the process and to emphasize the potential risks during virtualization. ETL also gives a firmer structure and adds validity to the developed virtualization process, as ETL is a well known and established process.

Then the techniques for handling the virtualization process were addressed. The development of the ontology for IA in MSEE, which is another novelty, is also presented. An established development procedure for ontologies was followed, as were well established guidelines regarding intangible resources during taxonomy development. The complete ontology, with a detailed description, is presented in D22.5 as part of the technical IA framework. The ontology is presented as a content format; it differs greatly from the file format needed for handling the procedure, which is also presented and for which the use of RDF and OWL is proposed.


To conclude, the true value that this deliverable co-creates lies in two matters. The first is that by virtualizing IA from the ME level to the MSE level, a foundation is being established for the transformation from roles of individuals (e.g. a welder) to competencies (e.g. welding competency). This means that competencies, rather than the role of an individual, will be at the core of services. The second, less conspicuous contribution is helping the ME and the MSE to transfer skills and competencies from the individual level to the enterprise level and even to the MSE level, while still preserving IPR. The ability to transfer them to a higher level is crucial for the MSE to operate successfully.

The Bivolino use case is presented in D22.5; this deliverable therefore only provides the information necessary for the implementation of IA that could be offered as a service.


Introduction

In D22.1, architectures, models and techniques that will contribute greatly to an efficient virtualization were identified, such as the Model Driven Service Engineering Architecture (hereinafter: MDSEA) (INTEROP_NoE, 2005) and the Unified Service Description Language (hereinafter: USDL) (Rockley, 2002). The key concepts that affect the process were also identified and analyzed, such as intangible assets (hereinafter: IA), knowledge management, multi-level human resources management and organizational change theories.

The objective of this subproject is to develop all relevant documents, specifications and guidelines required for the implementation of IA into the Manufacturing Service Ecosystem (hereinafter: MSE), allowing their quick and efficient exchange. There are two essential steps. In the first, a procedure and techniques are developed and proposed to help the ME identify exactly what its business goals are, what its relevant resources and needs in terms of IA are, and how to identify them. The second step defines a virtualization procedure specific to IA in the MSEE environment, which consists of the virtualization process, techniques (e.g. data mining), formalisms (ontology and taxonomy) and file formats (RDF, OWL).

Document roadmap

The first section proposes a procedure to identify the types and sources of IA that are relevant to a ME and that will be virtualized. It also enables the ME to identify exactly what it needs from the MSE in order to execute a project. Before those two steps can be carried out, however, the procedure proposes a structured way for a ME to identify and decompose its business objectives (on all three levels).

The second main section deals with the implementation method of IA into the MSE repository. First, the virtualization process adapted to the needs of IA is developed and explained. It is then placed into a well known framework, ETL. Once the virtualization process is known, the required techniques are presented. The following subsection presents the content formats in which the data about IA will be represented, namely the ontology and the taxonomy. The emphasis is put on the development process that has been used for IA. The main classification of IA specific to MSEE is also presented, while the entire developed ontology will be presented and explained in D22.5 as part of the technical framework for IA. The document's last section deals with specific file formats, such as OWL and RDF.


Figure 1: D22.3 document map and relations with D22.5

Linkage between D22.3 and D22.5

D22.3 creates a method for the virtualization of IA, expressed as all the documents necessary for the implementation of IA into the MSE repository. D22.5 uses these documents and then defines the requirements to be fulfilled (hence forming the conceptual IAMS framework) in the technical IAMS framework, which results in an MSEE specific taxonomy of IA represented by means of USDL.

1 Identification of types and sources of intangible assets relevant to the enterprise

The criteria that have to be fulfilled at the enterprise level to enable effective identification of relevant sources and types of IA are case specific, because MEs have different needs at different stages of their business lifecycle. However, common frameworks guiding the enterprise's decision are proposed, based on the GRAI model and the MDSEA (which uses three levels of abstraction: BSM/TIM/TSM). The GRAI model is used to define, describe and communicate the goals of the ME and of the Virtual Enterprise (VE) at all three levels (strategic, tactical and operational). Afterwards, the MDSEA is used to ease and improve the identification and structured representation of their:

a) available and relevant IA (that will be mapped into the MSE's repository) and

b) requirements for additional specific IA (that will be sent to the MSE).

[Figure 1 content: (1) identification of types and sources of IA in a ME and (2) virtualization process, techniques, formalism and formats together make up all documents needed for representing and describing IA (D22.3). Essential guidelines (handling, management of IA) plus advanced guidelines (BI techniques for value creation from IA) form the IA Manufacturing Service conceptual framework. The IA Manufacturing Service technical framework corresponds to IA represented by an IA specific taxonomy and described by means of USDL as a service on a use case (D22.5).]


1.1 Concepts definition

This section defines two concepts: the GRAI model and the MDSEA.

1.1.1 GRAI model

The GRAI conceptual model was developed at University Bordeaux 1 by the GRAI Laboratory, and subsequently by the LAPS and IMS laboratories. Its objective is to represent, with the same concepts, the global and the local models of a manufacturing system in an enterprise. The GRAI model defines the various concepts that will be represented in the GRAI graphical formalisms. The interest of a conceptual model is to relate the various concepts in order to show their coherence, to avoid redundancies and to obtain a complete model.

The GRAI model is rich and holistic, as it is based on several theories: System Theory, Hierarchical System Theory, Organisation Theory, Discrete Event Systems and Production Management Concepts.

This document finds great value in the GRAI model because one of its main characteristics is the ability to decompose the highly complex control system (the management system). This axis controls the second one in the GRAI model, the physical system (the business system).

Figure 2: GRAI model

[Figure 2 content: the control system, decomposed into strategic, tactical and operational decision levels, each with a horizon (H) and a period (P) (values shown: H = 5 y / P = 1 y, H = 1 y / P = 1 m, H = 2 m / P = 1 w, H = 2 w / P = 1 d), across the functions "to manage sales", "to manage design", "to manage engineering", "to manage manufacturing", "to manage assembling" and "to manage delivery". Further labels: coordination, coherence, synchronization, aggregation, decomposition/aggregation of information, process view, market. Below it, the controlled system with resources (R) and the products flow.]

Source: (Doumeingts et al., 2012)

The control system (vertical axis) is decomposed into three levels: strategic, tactical and operational. The controlled system is the decomposition of functional activities.

In this document we use the GRAI model to model and decompose the objectives (on all three levels) of a ME when it comes into contact with the MSE (it is assumed that a ME that comes into contact with the MSE has a specific business requirement). Such a structured decomposition methodology will contribute greatly to the steps of the IA virtualization procedure, including the first step, the identification of the relevant types and sources of knowledge, which is described in more detail in later sections.
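The decomposition of objectives over the three GRAI decision levels can be pictured as a small tree. The following Python sketch is only an illustration under stated assumptions: the class names `Level` and `Goal`, the methods and the sample goals are hypothetical and are not part of the GRAI formalisms or of any MSEE tool.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Level(Enum):
    """The three decision levels of the GRAI control system."""
    STRATEGIC = 1
    TACTICAL = 2
    OPERATIONAL = 3


@dataclass
class Goal:
    """A goal of the ME, placed on one decision level (hypothetical structure)."""
    name: str
    level: Level
    subgoals: List["Goal"] = field(default_factory=list)

    def decompose(self, name: str) -> "Goal":
        """Attach a sub-goal one decision level below this goal."""
        if self.level is Level.OPERATIONAL:
            raise ValueError("operational goals are not decomposed further")
        child = Goal(name, Level(self.level.value + 1))
        self.subgoals.append(child)
        return child

    def at_level(self, level: Level) -> List[str]:
        """Collect the names of all goals on the given level in this subtree."""
        found = [self.name] if self.level is level else []
        for sub in self.subgoals:
            found.extend(sub.at_level(level))
        return found


# Hypothetical example goals, for illustration only.
root = Goal("Offer product-related services", Level.STRATEGIC)
tactical = root.decompose("Set up a virtual enterprise for the service")
operational = tactical.decompose("Assign welding competency to the project")

# Goals on the two lower levels, listed from the strategic root downwards.
lower_level_goals = root.at_level(Level.TACTICAL) + root.at_level(Level.OPERATIONAL)
```

Keeping the level on each goal makes it straightforward to extract the goals of one level, for instance the tactical and operational goals that are passed on when a ME comes into contact with the MSE.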

1.1.2 Model Driven Service Engineering Architecture

This concept definition also relies entirely on the definition from D11.1. The Model Driven Service Engineering Architecture (MDSEA) defines three levels of abstraction, inspired by the MDA/MDI architecture proposed in the INTEROP-NoE project (INTEROP_NoE, 2005):

a) The Business Service Model (BSM) models the Service System at the business level. The models defined at the BSM level focus on the representation of the service (and of its functionalities) and of the Service System (Enterprise, Virtual Enterprise and Manufacturing Service Ecosystem), capturing information on its related product, partner, customer, stakeholder, service KPIs and value, as well as on decision-making, organization, resource and process.

b) The Technology Independent Model (TIM) delivers the models at a second level of abstraction, independent from the technology used to implement the system. It gives detailed specifications of the structure and functionality of the service system without technological details. More concretely, it focuses on the operational details while hiding the specifics of any particular technology, so that it is suitable for use with several different technologies. The service system is elaborated with respect to IT, Organisation/Human and Physical means.

c) The Technology Specific Model (TSM) provides the technical model of the components of the various domains and supports their realization. It combines the specification in the TIM with details that specify how the system uses a particular type of technology (for example IT applications). At the TSM level, modeling and specifications must provide sufficient detail to allow developing or buying software applications and components, recruiting human operators and managers or establishing internal training plans, and buying and realizing machine devices, for supporting and delivering services in interaction with customers. For instance, for an IT component, a TSM adds to the TIM the technological details and implementation constructs that are available in a specific implementation platform, including middleware, operating systems and programming languages (e.g. Java, C++, EJB, CORBA, XML, Web Services, etc.).


Figure 3: Towards a Model Driven Service Engineering Architecture

[Figure 3 content: the three MDSEA levels, Business Services Models (BSM), Technology Independent Models (TIM) and Technology Specific Models (TSM), across the IT domain, the Organisation/Human domain and the Physical means domain. The models lead to the generation of "components" (IT, Organisation/Human, Physical means) and to services in virtual enterprises (IT applications, processes, products, services, organisation/human, physical means such as machines and robots, etc.).]

Source: (Doumeingts et al., 2012)

1.2 Procedure for identification of types and sources of intangible assets

Combining the GRAI model and the MDSEA yields the procedure presented in the table below, which provides the MSE with the information needed to assist in the creation of a VE (from the IA point of view). The procedure is presented in relation to each partaker.

Figure 4: The relation of the partakers (ME, VE, MSE) with the GRAI model and

MDSEA framework

Partakers | Steps | Identification of objectives and needs | Frameworks
Manufacturing Enterprise (MEn) | 1 | Identification and structured exposition of the enterprise's goals (on all 3 levels); on the two lower levels, those goals are also the goals of the VE (Goals VE). | GRAI. All three levels are needed so that the enterprise can clearly define the VE's goals.
VE | 2 | IA_VE: identification of all the resources needed for the creation of a VE | MDSEA_VE. Using the MDSEA, a VE can be modelled, thus identifying the exact needs with regard to IA.
MEn | 3 | IA_En: identification of available resources relevant to the specific project: a) identification of | MDSEA_En. Using the MDSEA, the resources of the ME can be modelled and hence described in a clear and structured way.
MSE (ecosystem) | 4 | IA_MSE: resources yet to be acquired through the MSE, established on the basis of a gap analysis between the resources that are available in the enterprise and those that are needed for the creation of the VE. | Comparison of the MDSEA models of the enterprise and the VE: MDSEA_MSE = MDSEA_VE - MDSEA_En. Because the same technique was used throughout the process, a clear comparison can be made.
MSE (ecosystem) | 5 | Virtualization of IA. |

The procedure is described in more detail below:

1. First, a ME identifies and structures its goals (on all three levels) with the help of the GRAI model (step 1).

2. Afterwards, the resources needed to carry out a project are identified (step 2), answering the question: "What resources (limiting ourselves to IA) are needed to deliver the planned services through a VE?"

3. Then the resources that are relevant and available inside a ME are identified (step 3) through the following sub steps:

a. Identification of all (available) resources inside the ME (such an overview may already exist in the ME). This answers the question: "What kind of resources does the enterprise hold, or can it acquire by itself?" This step does not need to be executed multiple times (once an enterprise has mapped its IA, it only needs to update them rather than redo the entire process).

b. Identification of all resources inside the ME that are relevant to the project. This answers the question: "Which of the resources needed for the project do we already have?"

c. Gap analysis of resources (between those available inside the ME and those needed for a specific project), which specifies exactly which resources (with exact specifications) are missing and have to be acquired through, or with the help of, the MSE. The result of this gap analysis is the definition of the resource requirements of the ME that can be searched for through the MSE.

4. Finally, the resources that have to be acquired through the MSE (or with its help) are identified (step 4). This is based on the gap analysis between the resources that are available in the enterprise and those that are needed for the creation of the VE. However, this step is undertaken only if an enterprise does not know exactly what kind of IA still has to be acquired through the MSE to attain its business objective. The MDSEA_MSE thus describes exactly which resources are yet to be acquired (IA_MSE). Only after this step are the needed IA clearly defined and structured.


Afterwards, the location (sources) and form (types) of the needed IA are easily established. Because of the heterogeneity of data and databases, every ME will have to do this by itself, according to its internal information system.

5. Once all the needed information is known and gathered (using techniques such as data and text mining), the enterprise sends to and integrates into the MSE (this is the process of virtualization) the following structured information about:

a. its relevant available resources;

b. its requirements about:

i. the specific project that it wants to carry out (those are market requirements that are defined as business objectives on all three levels) and

ii. the needed resources that have to be acquired through the MSE.

This means that every ME that wants to effectuate a project through the MSE will have two

roles:

a) feeding the MSE with its available relevant resources and

b) sending requirements from the market to the MSE (the ME represents partially the

market).

Critical interdependencies between the sources of IA will also be defined. Depending on their strategy and business objectives, enterprises will be able to insert case specific types and sources of IA (e.g. knowledge), expressed by means of the MDSEA according to the different abstraction levels (BSM/TIM/TSM). The interdependencies identified with the MDSEA will be inserted into the IA ontology.
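The gap analysis of steps 3 and 4, in which the resources still to be acquired through the MSE are those needed by the VE minus those available in the enterprise, behaves like a set difference. The following Python sketch illustrates this under a strong simplifying assumption: each MDSEA model is reduced to a flat set of IA labels, whereas real MDSEA models are structured over three abstraction levels. The function name and the sample labels are hypothetical.

```python
from typing import Dict, Set


def gap_analysis(needed_for_ve: Set[str], available_in_me: Set[str]) -> Dict[str, Set[str]]:
    """Compare the IA needed by the VE with the IA available in the ME.

    Both arguments are simplified stand-ins for the MDSEA models of the
    VE and of the enterprise (an assumption made for this sketch).
    """
    return {
        # Step 3b: resources needed for the project that the ME already has.
        "relevant_available": needed_for_ve & available_in_me,
        # Step 4: resources yet to be acquired through the MSE
        # (needed by the VE minus available in the enterprise).
        "to_acquire_via_mse": needed_for_ve - available_in_me,
    }


# Hypothetical sample data, for illustration only.
ia_ve = {"welding competency", "CAD design know-how", "customer relationship data"}
ia_en = {"welding competency", "brand reputation"}

result = gap_analysis(ia_ve, ia_en)
```

In this sketch, the first entry corresponds to the available resources that the ME feeds into the MSE, and the second to the requirements it sends to the MSE.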

2 Implementation method of intangible assets

The process of linking an enterprise's IA into the MSE's repository has to ensure the consistency and quality of information. Hence a clear sequence of steps, called a procedure, has to be defined for the virtualization process, and it will be harmonized as much as possible with D23.3. Alongside it, rules have to be specified (e.g. addressing issues such as trust and security). The formats in which the virtualization process will be implemented and executed have to be determined as well, based on a set of predefined criteria (e.g. manageable and sharable). Some issues that will have to be addressed are: dealing with different formats of data representing IA during extraction, mapping those formats together to build links, and storing those data in a single format (the target format). Above all, the potential risks related to format management (such as mismatching and mismanagement) have to be anticipated, and measures have to be defined to minimize them.

Next, the activities during the virtualization process will be identified and suitable handling techniques needed to execute the process will be assigned (e.g. text mining for data extraction, and possibly also for identifying interrelations between entities). Other techniques are data mining, rule-based reasoning, classification and clustering, which are described in section 2.2.


One very important factor in handling data about IA is assuring their consistency (referring especially to validity and reliability). This is why potential risks that undermine data quality will be identified and analyzed, and measures for risk alleviation will be proposed. An example of such a risk is errors during data extraction due to inconsistent formats of IA (e.g. CVs in different forms with different attributes).
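The CV example can be made concrete with a short sketch of how records with different attribute names might be mapped onto one target format while flagging missing data instead of silently dropping it. This is a minimal Python illustration, not the method of the deliverable: the target fields, the alias table and the sample record are all hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical target format for a competency record extracted from a CV.
TARGET_FIELDS = ("name", "competency", "years_experience")

# Hypothetical attribute names under which each field may appear in source CVs.
ALIASES = {
    "name": ("name", "full_name"),
    "competency": ("competency", "skill"),
    "years_experience": ("years_experience", "experience_years"),
}


def normalize(record: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Map a source record onto the target format.

    Returns the normalized record and the list of target fields that
    could not be filled, so that data quality problems stay visible.
    """
    normalized: Dict[str, Any] = {}
    missing: List[str] = []
    for target in TARGET_FIELDS:
        for alias in ALIASES[target]:
            if alias in record and record[alias] not in (None, ""):
                normalized[target] = record[alias]
                break
        else:
            # No alias matched: report the gap rather than guessing a value.
            missing.append(target)
    return normalized, missing


# Hypothetical CV record using different attribute names than the target format.
cv = {"full_name": "A. Welder", "skill": "welding"}
clean, gaps = normalize(cv)
```

Returning the unfilled fields alongside the normalized record lets a later validation step decide whether the record is usable, which is one possible measure against the extraction risk described above.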

2.1 Virtualization process

This section is structured as follows. First, the virtualization process of IA in the MSEE context is represented graphically, in order to better understand its role and importance in the MSE. Afterwards, the virtualization process specific to IA in the MSEE context is proposed. Data quality issues are emphasized, and finally the virtualization process for IA within MSEE is integrated into an existing, well established framework: ETL.

The virtualization process, illustrated in Figure 5, was briefly described previously. Its purpose is twofold: first, to present the virtualization process of IA as clearly as possible, and second, to position the process in the MSEE context and depict its relevance for the functioning of the entire MSE.

Description of the virtualization process in Figure 5:

1. Identification and structuring of the goals of the ME (GRAI)

2. Identification of relevant data and sources (MDSEA)

3. Virtualization of (for details see Figure 6):

a. requirements

b. available resources/assets

The process of virtualization can be repeated multiple times for different enterprises. The

same virtualization process is used to virtualize resources and requirements coming from

the enterprises (the market).

4. The virtualized information is then linked into the USDL repository. If needed, it will

have to be extended to meet the ME’s specific needs.

5. When all the needed resources from the MEs and other enterprises (note that non-manufacturing enterprises can also participate) are in the MSE repository, the assets and the desired service are described (with the help of USDL), hence forming the so-called capabilities of services. Afterwards, business intelligence and statistical techniques can be applied in order to define the optimal composition of new services. Such operations are feasible due to the deep level of asset decomposition enabled by the use of RDF.

6. Then the optimal scenario is presented to the participating enterprises or to the leading

one.

7. After it has been chosen, the extraction process begins and the optimal innovative service is composed, which is feasible due to the use of USDL. It interlinks different service modules that are represented together in one layer, called the required service. Each module is described by 3 layers (BSM/TIM/TSM). The business model, which mostly consists of prices, legal relationships and clearly defined project goals, is included, described and offered in the service. However, management of the enterprise is still

required, because the market is dynamic and control is needed; this is why a VE cannot exist without management, regardless of the ideal service representation.

Figure 5: Virtualization process of intangible assets in the MSEE context

Virtualization process of IA in the MSEE context

The process represents a case of virtualization for creating a new project; hence the ME has a clear aim and goal.

The virtualization process of IA presented hereunder is defined for IA that represent the enterprise’s available resources and that are, at the same time, relevant to the specific project objective. Virtualization processes can also be used to process other kinds of information, such as requirements from the market and knowledge newly acquired after the successful conclusion of a specific project.

The outcome of the virtualization process of IA is a set of virtual artefacts (representing IA) that can be combined with tangible assets in order to build services (or products) in manufacturing industries, which themselves can be modelled and represented virtually by means of e.g. the USDL service description language, whose application is presented in deliverable D22.5, entitled Development of IA in Manufacturing Service (IAMS) framework.

Figure 6 below represents the entire virtualization procedure, which encompasses not merely virtualization techniques but also takes into account constraints such as rules and IPR issues. The virtualization procedure can serve as a framework for virtualization, making it wider than the process itself.

Figure 6: Virtualization procedure of intangible assets

Description of the virtualization procedure steps:

1. Identification

Firstly, the goals of the project and of the virtualization have to be defined. Consequently, based on a predefined IA framework (using a taxonomy), the relevant types (explicit and then tacit) and sources of knowledge, within and around the enterprise, can be detected.

So that this identification proceeds in a structured rather than intuitive way, enterprises will lean on a model with 3 levels of abstraction (strategic, tactical and operational): the MDSEA architecture. After defining the enterprise’s relevant resources (treated as inputs to the data repository) for the specific project (limited to IA) (for details see D22.3, section 2), the types and sources of knowledge can be detected, based on a predefined, context-specific taxonomy.

One of the main goals from the MSE point of view is that the virtualization process has to enable the transfer of individual competences onto higher levels (e.g. the enterprise and MSE levels). Boundaries of the process will not be defined, as this openness is one of the advantages of the (open) enterprise and, of course, of the MSE.

2. Analysis

This step commences by codifying the explicit and, foremost, the tacit knowledge in a unified manner using semantic techniques such as ontologies and taxonomies, as they ease the organization, communication and reusability of IA. Only the knowledge identified on the basis of project goals and resource needs is codified.

As IA have special characteristics, being “soft” and ubiquitous, their integration into a system is often seamless. However, just as their value can quickly increase, it can also decrease, and if they are not treated appropriately they can lose their value and become useless. In order to mitigate this risk of value degradation, data consistency, reliability and validity have to be ensured at different stages of the virtualization process. Therefore the relevant input-output relations, the interconnectivity and, foremost, the interdependency between individual objects (assets) are identified and analyzed. Among other things, the goal is to identify specific IA that could not perform efficiently without another type; in this way the issue of potential inconsistency is addressed.

In the current step, the enterprises that provide the knowledge are included in the process to increase its quality. This way the enterprises are also more empowered during the process, and there is an additional opportunity to discover anomalies during virtualization. If anomalies are discovered, an additional part of knowledge can be codified.

Another risk, besides inconsistency, is that not all relevant knowledge can be found, although it is present in the enterprise. Tacit knowledge can only be integrated into a system when it is found (Bohlouli, Holland, & Fathi, 2011). This is why, after the codification of explicit knowledge, experts codify tacit knowledge (based on available sources in the enterprise) on an individual level. Afterwards each employee is given an opportunity to complement their knowledge and competency “list” with additional tacit knowledge that in most cases is not known to the enterprise; such knowledge could be, for instance, important business connections in specific industries or competencies that are not directly linked with the employee’s workplace. Such a quality loop gives the opportunity to increase the consistency and validity of the knowledge database. However, if the process of virtualization requires the cooperation of employees, the organizational change process has to be performed with great care and with the maximum consent of the employees (this issue is addressed in WP24).

Employees are also given an opportunity to provide an assessment of their IA:

a) the level of experience and/or depth of expertise with the stated competencies or skills;

b) the relevance (usability) of the specific knowledge (expressed as a percentage) to the defined project goal;

c) potential geographical limitations.
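
The three assessment criteria listed above could be captured in a small record type. The following is only an illustrative sketch: the class name, the field names, the 1 to 5 experience scale and the ranking rule are assumptions made for this example and are not defined in the deliverable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CompetencyAssessment:
    """One employee's self-assessment of a single competency (hypothetical schema)."""
    competency: str
    experience_level: int                # a) assumed scale: 1 (novice) to 5 (expert)
    relevance_percent: float             # b) relevance to the project goal, 0 to 100
    region_limit: Optional[str] = None   # c) None means no geographical limitation

    def __post_init__(self) -> None:
        if not 1 <= self.experience_level <= 5:
            raise ValueError("experience_level must be between 1 and 5")
        if not 0.0 <= self.relevance_percent <= 100.0:
            raise ValueError("relevance_percent must be between 0 and 100")


def rank_for_project(assessments, region=None):
    """Return the assessments usable in `region`, most relevant first."""
    usable = [a for a in assessments
              if a.region_limit is None or region is None or a.region_limit == region]
    # Assumed ranking rule: relevance first, experience as tie-breaker.
    return sorted(usable,
                  key=lambda a: (a.relevance_percent, a.experience_level),
                  reverse=True)
```

Validating the ranges at construction time is one simple way to keep implausible self-assessments out of the repository; the expert's later corrections would need a separate field.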

The introduction of such an assessment tool would have the following effects:

a) The enterprise’s knowledge expert will be able to complement the assessment metrics with the level of usability and quality of specific knowledge on the project after its execution. Therefore information about IA already integrated into the MSE will

become more reliable and hence more reusable, which is also one of the MSE’s

objectives. The second issue

b) If several such competencies were available, the assessment metrics could serve as guidance for the final decision on which one to use.

c) It would allow tackling an issue of most virtualization processes in which people are involved, namely behavioural peculiarities, for instance deception when describing oneself and one’s competencies (this is common especially in online dating processes) (Overby, Slaughter, & Konsynski, 2010). Such deception on the part of employees could potentially also occur.

In the last step of this stage, taxonomies from different sources and/or enterprises are aggregated.

3. Rules

In order to fortify trust and security, whose absence is an important barrier in such forms of organization regarding IA management, rules (including constraints) could be defined on two levels: on the level of a set of IA (e.g. a set of knowledge about drilling) and on the attribute level. The process would hence merely offer a frame into which detailed rules at different levels would be applied. Those rules would be applied at the demand of two main sources: a) the collaborating enterprises or b) the MSE, due to its potentially predefined rules at its own level, which are part of its managerial policy. There are also process rules, which relate to process management and relations between entities. The collaborating enterprises check and validate the defined project rules, additionally mitigating the risk of errors that could undermine the reliability of the MSE.

4. Support procedures

Procedures to support data management and, foremost, the execution of the rules defined in the previous step are identified, selected and applied. This step mainly answers the question of “how” (e.g. how a data access rule will be enforced).

5. Populating with data

Firstly, units and values are extracted from the enterprise’s database and inserted into the semantic model. Afterwards objects and meta-models are created (also comprising the previously defined and aggregated rules), which can be considered as a schema for semantics that will be further managed (e.g. exported and/or stored).
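
As a rough illustration of this step, database rows can be turned into subject-predicate-object statements of the kind used in RDF. The sketch below uses only plain Python tuples rather than an RDF library; the `skill` table, its columns and the `http://example.org/ia/` namespace are hypothetical and chosen for the example only.

```python
import sqlite3

BASE = "http://example.org/ia/"  # hypothetical namespace, not an MSEE URI


def rows_to_triples(connection):
    """Turn rows of a hypothetical `skill` table into (subject, predicate, object) triples."""
    triples = []
    query = "SELECT employee, skill, years FROM skill ORDER BY employee, skill"
    for employee, skill, years in connection.execute(query):
        node = f"{BASE}record/{employee}-{skill}"
        triples.append((node, BASE + "employee", BASE + "employee/" + employee))
        triples.append((node, BASE + "competency", BASE + "competency/" + skill))
        # The unit (years) is carried by the predicate name, the value by the object.
        triples.append((node, BASE + "experienceYears", years))
    return triples


def demo_connection():
    """Build a small in-memory database standing in for the enterprise database."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE skill (employee TEXT, skill TEXT, years INTEGER)")
    connection.executemany("INSERT INTO skill VALUES (?, ?, ?)",
                           [("e1", "drilling", 7), ("e2", "welding", 3)])
    return connection
```

Calling `rows_to_triples(demo_connection())` yields three statements per row; in a real setting these would be handed to whatever semantic store the project selects.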

6. Quality assurance

After the virtualization process, IA analysis techniques will be applied with the goal of discovering new value for the collaborating enterprises. However, if the virtualization process is of low quality, the IA analysis techniques applied later will be ineffective, creating no added value for the enterprises. This is why another consistency check of data and rules (referring especially to validity and reliability) is performed in collaboration with the enterprises. The gap between the planned and actual data usability is also checked. If needed, the process is optimized.

So far, the procedure for IA discovery in terms of sources and types has been proposed, as well as the IA virtualization procedure. In the next subsection the technical framework of the virtualization process will be presented and the virtualization process of IA inserted into that frame.

2.1.1 Technical framework for the virtualization process of intangible assets

In this section the main IT techniques that need to be used and/or applied when executing the IA virtualization process are presented. First the architectural approach, based on Enterprise Information Integration, is presented. Afterwards data virtualization is explained from the technical point of view. Then the technical framework for IA virtualization is presented, namely the Extract, Transform, Load (hereinafter: ETL) technique. The virtualization process of IA is then directly linked with ETL. Matters such as data warehousing are also addressed. At the end of this section possible data quality issues are identified and measures are proposed to alleviate them.

Enterprise Information Integration and data virtualization

Enterprise Information Integration (hereinafter: EII) is an architectural approach allowing an enterprise, or a set of enterprises, to have a unified view of certain data of the organization. As an application of EII, Data Virtualization aims to provide a unified view for accessing all the relevant data of an organization through a single set of structures and naming conventions representing this data. From an IT perspective, Data Virtualization performs data integration using data abstraction techniques upon large sets of heterogeneous data sources managed by disparate IT systems (databases, files, websites, data services, etc.). A single access layer is responsible for providing a consistent representation of the information in a unified structure and format, independently of the technical aspects of the source data, such as location, storage structure, API, access language, and storage technology.

This conceptual pattern is typically used in various enterprise applications, such as business

intelligence, service oriented architectures, cloud computing and master data management.

Technically, examples of an implementation of the Data Virtualization pattern are numerous:

- Enterprise Service Bus (hereinafter: ESB) software can be used to develop a layer of service to allow access to the data, while hiding technical implementation details and location of the source data.

- Cloud storage services can act like a single access layer by providing a unified API, making the location of the data irrelevant for the consumer.

- At the level of infrastructure, virtualization can expose data access through a single system by federating and abstracting multiple and disparate storage units.
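
The idea of a single access layer over heterogeneous sources can be sketched in a few lines. This is a minimal illustration under stated assumptions, not one of the implementations named above: the adapter classes, the unified field names (`name`, `competency`) and the sample data are all hypothetical.

```python
import csv
import io
import sqlite3


class CsvSource:
    """Adapter for CSV text; maps source column names to the unified names."""
    def __init__(self, text, mapping):
        self._text = text
        self._mapping = mapping

    def records(self):
        for row in csv.DictReader(io.StringIO(self._text)):
            yield {unified: row[source] for source, unified in self._mapping.items()}


class SqlSource:
    """Adapter for a SQL query; the query aliases columns to the unified names."""
    def __init__(self, connection, query):
        self._connection = connection
        self._query = query

    def records(self):
        cursor = self._connection.execute(self._query)
        names = [column[0] for column in cursor.description]
        for row in cursor:
            yield dict(zip(names, row))


class AccessLayer:
    """Single entry point: consumers never see where or how the data is stored."""
    def __init__(self, sources):
        self._sources = list(sources)

    def query(self, **criteria):
        for source in self._sources:
            for record in source.records():
                if all(record.get(key) == value for key, value in criteria.items()):
                    yield record


def build_demo_layer():
    """Wire one SQL source and one CSV source behind the same layer."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE staff (full_name TEXT, skill TEXT)")
    connection.execute("INSERT INTO staff VALUES ('Ana', 'drilling')")
    sql = SqlSource(connection, "SELECT full_name AS name, skill AS competency FROM staff")
    csv_text = "Nom,Competence\nLuc,drilling\nEva,welding\n"
    return AccessLayer([sql, CsvSource(csv_text, {"Nom": "name", "Competence": "competency"})])
```

A consumer would call `build_demo_layer().query(competency="drilling")` and receive records from both sources in the same shape, without knowing which came from the database and which from the file.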

However, EII faces many challenges during the different phases of its lifecycle. As the goal of EII is to get a large set of heterogeneous data sources to appear to a user or system as a single, homogeneous data source, different data schemas and representations have to be linked together. For this, one or more ontologies can be used in order to effectively combine data or information from multiple heterogeneous sources. An ontology can be used to structure information about

enterprise business resources and define relevant standard terms and concepts to describe these resources. Moreover, ontology-based information supports semantic reasoning, which facilitates inference, retrieval and discovery of knowledge in mobile commerce applications, and thereby ensures their scalability, universality and interoperability (Wei, Kang, & Zhou, 2008). However, the effectiveness of ontology-based data integration is closely tied to the consistency and expressivity of the ontology used in the integration process (Wache et al., 2001).

As data integration can be a costly and long process, new approaches always have to be taken into consideration. One such approach is the so-called Linked Data, which refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers, which can deal with, for instance, databases of different sizes and schemas and in different geographical locations. Technically, Linked Data refers to data published on the web in such a way that it is machine-readable, its meaning is explicitly defined, it is linked to other external data sets, and it can in turn be linked to from external data sets (Heath, Hepp, & Bizer, 2009). A need for a simple, bottom-up, best practice-based approach has been expressed (Frischmuth et al., 2012).

To sustain the process of data virtualization, we can identify several applicable patterns from

the field of Enterprise Architecture, as well as several techniques from information

management science.

Data virtualization relies on a set of capabilities that many software techniques can provide. Among such capabilities, we can mention:

- Abstraction: decouples the representation of the information from its technical representation, location and storage technology.

- Virtualized data access: unifies the access to the data in the form of a single access layer.

- Transformation and integration: mostly focuses on improving existing data by sanitizing, verifying, aggregating, cross-linking and enriching data across multiple sources.

- Data federation: combines and unifies content from multiple autonomous source storages. Data federation may imply transformation, cleaning, and data enrichment.

- Flexible data delivery: publishes valuable data sets as services, consumed by external applications or users upon request. This capability helps reduce the cost and complexity of integration at enterprise level by promoting reusability. The publication rules applicable to these datasets (e.g. privacy policies) should be considered at this level.

The file systems in enterprises are expected to be distributed and heterogeneous, with different levels of security. The virtualization process of the preselected data should resolve those issues by mapping the data into the MSE’s repository with a single access layer. Before achieving this, constraints have to be taken into consideration before and during the virtualization process (some of the constraints lean on the work of Liu, Cao, and He (2011)):

a) Because an enterprise can identify relevant data in internal as well as external sources (e.g. business partners), data heterogeneity has to be dealt with, meaning for instance that there are different data schemas, which are conceptually different and could also be represented differently.

b) Data distribution has to be taken into account.

c) The virtualization of data is expected to first run locally on the IT system of the enterprise that owns the data. Only when the data mapping is formally accepted by the data owner can data be mapped, with the help of an extended version of USDL, to the MSE’s repository. Security rules can be defined by the enterprise for each mapped attribute, but within the limits defined by the MSE’s framework for security and trust on the MSE level. This allows the MSE to enforce its security policy on the one hand, and the enterprise to assign project-specific security rules on the other (meaning that different security rules can be assigned to the same set of data used in different projects).

d) Issue of time dynamics: the quality (expressed as the depth of experience and expertise) and availability of IA vary over time. Therefore changes in IA have to be transferred as automatically as possible into the MSE. Consequently, a continuous cyclic procedure for detecting IA changes has to run locally, on the enterprise level. If changes are detected, they are then transferred into the MSE.

e) Finally, mining techniques have to enable the extraction of data through the application of constraints and so-called knowledge discovery agents (they define which sources and types of knowledge are project relevant). Such a schema architecture is completely aligned with Danish in Khan (2008).
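
Constraint c) above, where enterprise rules per attribute must stay within limits set at the MSE level, can be illustrated as follows. This is a sketch under assumptions: the ordered list of access levels, the function name and the interpretation of the MSE limit as a minimum restriction level are all invented for the example and are not part of the MSEE security framework.

```python
# Hypothetical ordering of access levels, from most open to most restrictive.
LEVELS = ["public", "partners", "project", "owner"]


def resolve_rules(mse_minimum, enterprise_rules, attributes):
    """Return the effective access level per attribute for one project.

    mse_minimum: least restrictive level the (assumed) MSE policy tolerates.
    enterprise_rules: attribute -> level chosen by the data owner for this project.
    Attributes without an enterprise rule fall back to the MSE minimum.
    """
    floor = LEVELS.index(mse_minimum)
    effective = {}
    for attribute in attributes:
        requested = enterprise_rules.get(attribute, mse_minimum)
        if requested not in LEVELS:
            raise ValueError(f"unknown access level: {requested}")
        # The enterprise may tighten access but never go below the MSE floor.
        effective[attribute] = LEVELS[max(floor, LEVELS.index(requested))]
    return effective
```

Calling the function twice with different `enterprise_rules` for the same attribute list models the case where the same data set carries different security rules in different projects.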

The next subsections of the document explore various existing patterns or techniques which can be used to provide the aforementioned capabilities.

Besides the identification of approaches for implementing virtualization processes, specific care should be given to the identification of relevant data sources within the organization’s infrastructure. A critical prerequisite is to maintain an inventory of existing data repositories and to evaluate their eligibility for virtualization with regard to the information contained, the quality of the data and the subsequent cost of extracting and processing the data.

Data warehouse and Extract, Transform and Load (ETL) techniques

The concept of “data warehouse” emerged in the early 90’s from business intelligence work related to data management (Vassiliadis, 2009). A data warehouse typically collects data from several operational or external systems in order to provide its end users with access to integrated and manageable information. Data warehouses are typically assembled from a variety of data sources with different formats and purposes. As such, ETL is a key process for bringing all the data together in a standard, homogeneous environment, and it plays its role in the field of Enterprise Data Integration (EDI). The implementation of the virtualization process implies an overall transformation, from the origin of the relevant information or data (mostly

at the level of operational information systems), toward a commonly accepted format, in a

unified repository.

In practice, an ETL process has to overcome several inherent problems. First, since the different sources structure information in different data models, it is imperative to transform the incoming data to a common schema, which will eventually be used for querying. In typical cases, the source data stores can be On-Line Transactional Processing (hereinafter: OLTP) or legacy systems, files under any format, web pages, various kinds of documents (e.g. spreadsheets and text documents) or even data arriving in a streaming fashion. Second, the data coming from the operational sources may suffer from quality problems, ranging from simple misspellings in textual attributes to value inconsistencies, database constraint violations and conflicting or missing information. Consequently, this kind of “noise” must be removed from the data, so that end users are provided with clean, complete and truthful information. The extracted data are propagated to a special-purpose area of the warehouse, called the Data Staging Area (hereinafter: DSA), where their transformation, homogenization, and cleansing take place. The most frequently used transformations include filters and checks to ensure that the data propagated to the warehouse respect business rules and integrity constraints, as well as schema transformations that ensure that data fit the target data warehouse schema. Third, since the information is constantly updated in the production systems that populate the warehouse, it is necessary to refresh its content regularly, in order to provide up-to-date information.

The software processes that facilitate the population of the data warehouse are commonly

known as “Extract-Transform-Load” processes (Vassiliadis & Alkis, 2007).
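
The three phases can be sketched as a small pipeline. This is a minimal illustration, not the MSEE implementation: the record fields (`name`, `skill`, `years`), the `competency` table and the rule that rejects negative values are assumptions made for the example, and an in-memory list stands in for the DSA.

```python
import sqlite3


def extract(source_rows):
    """Extract: select only the source rows relevant for the workflow."""
    return [row for row in source_rows if row.get("name")]


def transform(staged_rows):
    """Transform: clean values, enforce a rule, fit the target schema."""
    cleaned = []
    for row in staged_rows:
        years = int(row["years"])
        if years < 0:  # assumed business rule: reject impossible values
            continue
        cleaned.append((row["name"].strip().title(),
                        row["skill"].strip().lower(),
                        years))
    return cleaned


def load(connection, records):
    """Load: write the transformed records into the warehouse relation."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS competency (name TEXT, skill TEXT, years INTEGER)")
    connection.executemany("INSERT INTO competency VALUES (?, ?, ?)", records)
    connection.commit()


def run_etl(source_rows, connection):
    """Run the three phases in order and return the number of rows in the target."""
    staging_area = extract(source_rows)  # stands in for the Data Staging Area
    load(connection, transform(staging_area))
    return connection.execute("SELECT COUNT(*) FROM competency").fetchone()[0]
```

Keeping each phase as a separate function mirrors the separation of responsibilities described in the text and makes it possible to test or replace a phase on its own.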

Table 1: ETL Phases

ETL Phase: Extract

Responsibility:
- Identifying the correct subset of source data that has to be submitted to the ETL workflow for further processing.
- Extraction of the appropriate data from the sources.

Difficulties:
- Interference with configuration or performance of the source operational system.
- Privacy / visibility of the data to be collected.
- Level of structuration of the data being collected.

ETL Phase: Transform

Responsibility:
- The transportation of the data to a special purpose area of the data warehouse.
- The verification and the validation of the collected data in respect with associate rules.
- The transformation of the source data and the computation of new value.
- The isolation and cleaning of the data.

Difficulties:
- Schema level problems: structural matching conflicts between the conceptual models of source data with target data model.
- Record level problems: duplicated or contradicting records. Difference of granularity or timeliness (e.g. different aggregation levels, different points in time).
- Value level problems: semantic mismatches between information values (e.g. different time zones for timestamp values) or implementation formats (e.g. date formats: mm/dd/yy or dd/mm/yyyy) from different source systems.
- Performance: amount of incoming data and complexity of the transforming processes.

ETL Phase: Load

Responsibility:
- The loading of the transformed data to the appropriate relations in the warehouse.

Difficulties:
- Maintaining the integrity of the target repository (e.g. discriminating existing records).
- Performance: bulk loading or sequenced data loading.
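
The value-level problems named in Table 1 (different date formats and time zones across source systems) can be handled by converting every value to one canonical representation. The sketch below assumes two hypothetical source systems with fixed formats and fixed UTC offsets; real systems would declare their own conventions.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical per-source conventions: (date format, fixed UTC offset).
SOURCE_FORMATS = {
    "us_system": ("%m/%d/%y", timedelta(hours=-5)),
    "eu_system": ("%d/%m/%Y", timedelta(hours=1)),
}


def to_utc_iso(value, source):
    """Parse a source-specific date string and return an ISO 8601 UTC timestamp."""
    pattern, offset = SOURCE_FORMATS[source]
    local = datetime.strptime(value, pattern).replace(tzinfo=timezone(offset))
    return local.astimezone(timezone.utc).isoformat()
```

With these assumptions, the string `03/04/12` from `us_system` is read as 4 March 2012, while `03/04/2012` from `eu_system` is read as 3 April 2012, which is exactly the kind of mismatch that has to be resolved before loading.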

Typically, only the data that differ from the previous execution of an ETL process (newly inserted, updated, and deleted information) should be extracted from the sources. In a traditional data warehouse setting, the ETL process periodically refreshes the data warehouse during idle or low-load periods of its operation (e.g. every night) and has a specific time window to complete. Nowadays, business necessities and demands require near real-time data warehouse refreshment.
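
One simple way to find the data that differ from the previous execution is to compare two snapshots keyed by record identifier. This is only a sketch of that idea: snapshot comparison is an assumption of the example, and production systems often rely on timestamps, logs or triggers instead.

```python
def compute_delta(previous, current):
    """Compare two snapshots (key -> record) and return what changed."""
    inserted = {k: v for k, v in current.items() if k not in previous}
    deleted = {k: v for k, v in previous.items() if k not in current}
    updated = {k: v for k, v in current.items()
               if k in previous and previous[k] != v}
    return {"inserted": inserted, "updated": updated, "deleted": deleted}
```

Only the three resulting groups would then be passed on to the transform and load phases, which keeps the refresh within its time window.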

Considering the complexity that such ETL processes can involve, the design phase of an ETL process is critical for its performance across the lifetime of its usage. Many aspects have to be considered, such as: the distribution and heterogeneity of the source data at the logical and technical levels, the scalability of the ETL system to sustain the data volume at runtime, and the interactions with other information systems.

Software for transforming and filtering information from one (structured, semi-structured or unstructured) location to another has been developed since the early days of databanks. Since then, any kind of data processing software that reshapes or filters records and populates other data stores can be regarded as a form of ETL process. Therefore, we can consider that the ETL process can support the first technical tasks accomplished by a logical framework for virtualization in the context of MSEE.

2.1.2 Data quality management and data virtualization

The profusion and variety of data sources and consuming applications in the organization’s infrastructure is often characterized by multiple copying and replication actions for reusing the information across the organization. This repeated pattern tends to progressively introduce inconsistencies, leading to the demand for a greater level of trust in the quality of data.

Typical data quality issues can be summarized as follows:

- Structural and semantic inconsistency: differences in formats, structures, and semantics, which may impact the usability of the data by consumer applications

- Inconsistent validations: differences between validation rules applied across various business processes have a direct impact on the overall data quality level and reduce its reusability

- Replicated functionality: repetitive application of similar data cleansing processes increases cost and has no positive impact on the consistency of the data

- Data entropy: the multiplication of data silos contributes to degrading the quality of data and

As a consequence of these data quality issues, the practice of preparing the data for secondary uses is commonly applied, and the following techniques are used and orchestrated by ETL processes (cf. “data warehouse”):

- Data validation: checking data instances against defined quality rules

- Data parsing and standardization: data values are processed and potentially reformatted into a standardized representation

- Data cleansing: avoiding duplication and applying automatic value corrections, in order to resolve structural and semantic inconsistencies

- Data enrichment: improving the value of data by adding content that enriches the initial data.
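
The four techniques listed above can be chained as in the following sketch. It is illustrative only: the record fields, the e-mail rule, the duplicate key and the `SKILL_CATEGORY` lookup are assumptions made for the example, and the order chosen here (standardize, validate, cleanse, enrich) is one reasonable option among several.

```python
import re

# Hypothetical lookup used for enrichment; not taken from the deliverable.
SKILL_CATEGORY = {"drilling": "machining", "welding": "joining"}


def standardize(record):
    """Parsing and standardization: normalise case and whitespace."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "skill": record["skill"].strip().lower(),
        "email": record["email"].strip().lower(),
    }


def is_valid(record):
    """Validation: check the instance against simple quality rules."""
    has_name = bool(record["name"])
    has_email = re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", record["email"]) is not None
    return has_name and has_email


def cleanse(records):
    """Cleansing: drop duplicates, keeping the first occurrence per (email, skill)."""
    seen, unique = set(), []
    for record in records:
        key = (record["email"], record["skill"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def enrich(record):
    """Enrichment: add a category derived from the lookup table."""
    return {**record, "category": SKILL_CATEGORY.get(record["skill"], "unknown")}


def prepare(raw_records):
    """Apply the four techniques in sequence to a list of raw records."""
    standardized = [standardize(r) for r in raw_records]
    valid = [r for r in standardized if is_valid(r)]
    return [enrich(r) for r in cleanse(valid)]
```

Standardizing before de-duplicating matters here: the two spellings of the same person only collapse into one record once case and whitespace have been normalised.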

The application of such techniques can encounter limitations when they are applied multiple times at different levels of the organization. It can lead to the introduction of new errors or inconsistencies, as the rules for validation, parsing and enrichment may vary. Furthermore, the repetition of such processes may not be cost-effective and may produce the opposite of the initial objective of improving data quality. As an alternative to the “data sharing” pattern, some data virtualization techniques suggest avoiding the replication of primary data sources into intermediate repositories. In this context, source data remains in its primary data source and is delivered to consuming applications through an abstraction layer, responsible for retrieving content and standardizing its representation. By incorporating several data quality techniques, this abstraction layer can address most data quality challenges, such as resolving structural and semantic inconsistency, reducing replication and unifying data validation rules.

2.1.3 Linkage of the presented technical concepts with intangible assets virtualization process

This section presents the direct links between the previously defined technical concepts, namely the technical framework for the virtualization process and data quality management, and the virtualization process of IA.

The virtualization process that was developed for virtualizing IA in the MSEE is inserted into the ETL process in order to provide additional information to the integrator: when it is clear which phase of the process the integrator is in, it becomes easier to allocate the proper techniques to handle the process, plan the integration and anticipate potential risks. The other benefit is that linking those two concepts gives a firm structure and validity to the IA virtualization process, as the ETL process is well established.

Figure 7: Linkage of ETL and virtualization procedure of intangible assets

The next section provides the techniques that are needed in order to handle the virtualization procedure. For instance, techniques like data and text mining can be used to identify and extract knowledge, check for inconsistencies, etc.

2.2 Techniques for the virtualization process

The procedure for IA virtualization and its technical framework have been presented and linked. This section presents the key techniques to be used during the presented procedure as part of the presented technical framework.


However, before analysing the most suitable handling techniques, the main constraints, in the form of guidelines for the MSEE environment regarding IA, have to be defined. Such constraints are:

- security,
- data heterogeneity,
- data stored in different places,
- constraints regarding data collection, etc.,
- potential usage of USDL and RDF, and
- other potential constraints on the MSE level (to be inserted once the trust and monitoring related constraints have been defined on the MSE level in WP11).

Data mining

Data mining (hereinafter: DM) is an interdisciplinary field that combines artificial

intelligence, computer science, machine learning, database management, data visualization,

mathematical algorithms, and statistics. DM tools support the identification of hidden patterns

in large volumes of structured data based on statistical methods like association analysis,

classification, or clustering (Hand, Mannila, & Smyth, 2001). DM is a technology for

knowledge discovery applied to large scale databases. This set of techniques provides

different methodologies for decision-making, problem solving, analysis, planning, diagnosis,

detection, integration, prevention, learning, and innovation. For example, certain knowledge

discovery applications rely on data mining techniques for building classification metrics from

paradigms such as Bayesian classifiers, rule-induction, and decision tree algorithms. Decision

support is the objective for applying DM to extract knowledge from a database for certain

management issues, such as customer service support, corporate failure prediction, marketing,

and grid services (Abidi, 2001), (Cannataro, Talia, & Trunfio, 2002), (Lin & McClean, 2001)

and (Shaw, Subramaniam, Tan, & Welge, 2001). Also, knowledge warehousing is developed

as an architecture to integrate the functions of knowledge management, decision support,

artificial intelligence and data warehousing (Nemati, Steiger, Iyer, & Herschel, 2002). In

general, the data mining process, and the data mining technique and function to be applied

depend very much on the application domain and the nature of the data available.

In the context of IA virtualization, techniques derived from the field of data mining may be

used to address certain data quality issues, such as extracting value from unstructured data

(e.g. text mining), identifying correlations within large amounts of data, or assessing the

relevance of certain content within the organization.

Data mining, as a knowledge discovery technique, may typically include the following steps

in an iterative process:

- Data cleaning
- Data selection
- Knowledge presentation
- Data transformation
- Data integration


Many trends (in computing, business, wireless communication, etc.) influence the use of data mining, and these trends themselves keep changing. In any case, the data mining techniques used during the virtualization process must be able to take into account dynamic changes of privacy and security policies, because:

a) different enterprises will have different requirements;

b) rules on the highest level, including security and privacy rules, will change as the MSE learns and evolves over time.

2.2.1 Classification

Classification is the task of mapping a data item into one of several predefined classes

(Fayyad, Piatetsky Shapiro, & Smyth, 1996). On the basis of an initial training dataset

containing the category information, classification techniques will analyse this input data in

the form of a set of quantifiable properties and will produce a classification model, which can

be applied for the classification of similar datasets. Typically, classification algorithms can be

found in email servers for filtering spam emails. Classification techniques find many

uses in business, as they are an essential part of many data mining applications (e.g.

predictive analysis).
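As an illustration of the train-then-apply pattern described above, the following Python sketch builds a very small naive Bayes text classifier, one of the Bayesian classifiers mentioned earlier. The training sentences and the labels `spam` and `ham` are invented for the example; a real spam filter would need far more data and preprocessing.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training dataset: (text, predefined class) pairs.
TRAINING = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("project meeting tomorrow", "ham"),
    ("deliverable review meeting", "ham"),
]


def train(examples):
    """Build the classification model from labelled examples."""
    word_counts = defaultdict(Counter)  # class -> word frequencies
    label_counts = Counter()            # class -> number of examples
    vocabulary = set()
    for text, label in examples:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocabulary.update(words)
    return word_counts, label_counts, vocabulary


def classify(model, text):
    """Return the most probable class for a new, unlabelled text."""
    word_counts, label_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior of the class plus log likelihood of each word.
        score = math.log(label_counts[label] / total)
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Laplace smoothing keeps unseen words from zeroing the score.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


MODEL = train(TRAINING)
```

With this toy model, `classify(MODEL, "money offer")` falls into the `spam` class and `classify(MODEL, "review meeting")` into `ham`.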

2.2.2 Clustering

Clustering is a technique for grouping together a set of items having similar characteristics (Srivastava, Cooley, Deshpande, & Tan, 2000). This data analysis technique is used in data mining applications for statistical data analysis purposes. Clustering is a typical form of unsupervised learning which sorts similar objects into groups or, more precisely, partitions a data set into clusters, so that the data in each subset ideally share some common trait (Lida, Li, Zhongzhi, Qing, & Maoguang, 2007). Data clustering techniques can be used to perform similarity search, pattern recognition, trend analysis, grouping, classification, and so forth (Cheng-Ru & Ming-Syan, 2005). Clustering can be achieved by various algorithms, the choice of which depends on the nature of the data being analysed and the overall objective of the result. In this sense, cluster analysis can be seen as an iterative process of knowledge discovery, including several cycles and attempts at adjusting the parameter settings of the algorithms being used. These algorithms can be categorized into nearest-neighbour clustering (Lu & Fu, 1978) and (Khaled, 2004), fuzzy clustering (Bezdek, Hathaway, Sabin, & Tucker, 1987), partitional clustering (Dubes, 1987), hierarchical clustering (King, 1967), artificial neural networks for clustering (Hertz, Krogh, & Palmer, 1991), statistical clustering algorithms (Dempster & Laird, 1977), and so on. The notion of “cluster” can vary depending on the algorithm chosen for solving a particular problem. Typical cluster models include “connectivity models”, “graph based models” and “distribution models”.

In business applications, clustering helps marketers discover distinct groups and characterize customer groups based on purchasing patterns. Clustering techniques could support the virtualization process when processing large data sets and trying to group content by category, e.g. skills, competences, etc.
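A partitional clustering algorithm can be sketched in a few lines of Python. The example below uses k-means, chosen here only as a well-known representative of partitional clustering; the two-dimensional points and the fixed starting centroids are hypothetical values picked so that the run is reproducible.

```python
from typing import List, Tuple

Point = Tuple[float, float]

# Hypothetical observations, e.g. two numeric features per customer.
POINTS: List[Point] = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]


def k_means(points: List[Point], centroids: List[Point], iterations: int = 10) -> List[List[Point]]:
    """Partition points into len(centroids) clusters by repeated reassignment."""
    clusters: List[List[Point]] = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return clusters


CLUSTERS = k_means(POINTS, centroids=[(0, 0), (10, 10)])
```

The number of clusters and the starting centroids are exactly the kind of parameter settings that, as noted above, usually have to be adjusted over several cycles.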


2.3 Formalism - Ontology and Taxonomy

This section presents the preferred formats for handling the IA virtualized during the proposed procedure. It focuses on the main characteristics of the content format rather than the file format. A content format is defined as an encoded format for converting a specific type of data to displayable information and is used to prepare the transmission of data for observation or interpretation (Bojko, 2004) and (Rockley, 2002).

Based on the definition of content format, the concepts of ontology and taxonomy are presented first. Afterwards the file formats, such as RDF and OWL, are explained.

The following section presents the most appropriate content formats for IA management: ontology and taxonomy. First the concept of taxonomy is presented, then the process that was used to develop the ontology. Finally, the guidelines on which the IA ontology within MSEE relies are presented, together with the basic classification of IA. The detailed ontology can be found in D22.5 as part of the technical IAMS framework.

Taxonomy

A taxonomy is a hierarchical set of concepts, including their attributes, related by transitive is-a and/or equality relations. It is presented here as a representation and management formalism, which is crucial for the management of organisations. Pincher (2010) argues that, without a taxonomy designed for storage and management, or one that supports better searching, no type of management system in an organisation is usable. Hence, a knowledge taxonomy focuses on enabling the efficient retrieval and sharing of knowledge, information and data across an organisation by building the taxonomy around workflows and knowledge needs in an intuitive structure (Lambe, 2007).
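The transitive is-a relation that defines a taxonomy can be made concrete with a short Python sketch. The concept names below are hypothetical placeholders, not the MSEE taxonomy itself.

```python
from typing import Dict, List, Optional

# Hypothetical taxonomy: each concept points to its direct parent (is-a).
IS_A: Dict[str, Optional[str]] = {
    "asset": None,
    "intangible asset": "asset",
    "human capital": "intangible asset",
    "skill": "human capital",
}


def ancestors(concept: str) -> List[str]:
    """Follow the is-a links upwards until the root is reached."""
    chain = []
    parent = IS_A[concept]
    while parent is not None:
        chain.append(parent)
        parent = IS_A[parent]
    return chain


def is_a(concept: str, candidate: str) -> bool:
    """Transitive is-a: true if candidate is a direct or indirect parent."""
    return candidate in ancestors(concept)
```

Because the relation is transitive, a search for "intangible asset" can also retrieve items filed under "skill", which is the kind of improved retrieval the taxonomy is meant to support.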

Figure 8: Relation between taxonomy and ontology

[Figure 8 content: formalisms ordered by increasing expressiveness and complexity (Glossary, Taxonomy, Thesaurus, Entity Relationship/UML Model, Topic Maps, Formal Ontology); interoperability levels (syntactical, structural, semantic); example languages (RDF, KIF, OL, OWL Lite).]

Source: (Hirsch, 2012), MSEE – D23.3


Ontology

“Ontology” is a philosophical discipline, a branch of philosophy that deals with the nature and

the organization of being (Sure & Studer, 2002). An ontology is an explicit specification of a

conceptualization. For systems, what “exists” is that which can be represented. When the

knowledge of a domain is represented in a declarative formalism, the set of objects that can be

represented is called the universe of discourse. This set of objects, and the describable

relationships among them, are reflected in the representational vocabulary with which a

knowledge-based program represents knowledge (Gruber, 1993). In recent years ontologies

have become a topic of interest in computer science e.g. (Maedche & Staab, 2001) and

(Fensel, 2003). In its most prevalent use in computer science, an ontology refers to an

engineering artifact, constituted by a specific vocabulary used to describe a certain reality,

plus a set of explicit assumptions regarding the intended meaning of the vocabulary (Sure &

Studer, 2002). It can be considered a type of taxonomy with more complex relationships,

aiming to describe a domain of knowledge (e.g. manufacturing) or a subject area by both its

terms (called individuals or instances) and their relationships, and it thus supports inference

(Hedden, 2010).

Ontologies are built to be reused or shared anytime, anywhere, and independently of the

behaviour and domain of the application that uses them. Ontologists should therefore be able to

specify, at least partially, a large portion of the vocabulary that the ontology will need to cover

for a given domain (Fernandez, Gomez-Perez, & Juristo, 1997). To this end, ontologies have to be

expressed logically, using properties, relations and classes, in order to ensure consistency,

accuracy and meaningfulness.

Their role is crucial: for instance, they enable knowledge available on the internet to be processed, shared and reused between different applications. Their meaningful role in different environments (e.g. knowledge management within or between organizations) is due to the fact that they provide a common understanding of specific topics. For the management of intangible assets, ontology has a further significance, as it is the most widely used method of mapping the knowledge of a domain in order to represent and describe it (Brewster, Ciravegna, & Wilks, 2010).

2.3.1 Ontology development for intangible assets

This section first gives an overview of the development process of ontologies; afterwards the development process for IA in MSEE is presented.

Development process for ontologies

To date there is no unified approach, method or technique for developing an ontology, which leads to a lack of standardized activities. However, predefined criteria and documented methodologies exist. A building-block approach was chosen for developing the ontology for IA in the manufacturing environment: the ontology is first developed on the basis of predefined guidelines valid for IA and afterwards adapted to the specific needs of IA in the MSE.


The ontology development process that suits MSEE needs is the following; it is extensively based on the work of Fernandez et al. (1997).

Table 2: Ontology development process

1. Management
   1.1 Planning the goals
   1.2 Ontology specification requirements documents
2. Operation
   2.1 Identification of sources of knowledge: it can be elicited using Knowledge Based Systems techniques, which should enable the listing of sources of knowledge and
   2.2 Conceptualization and implementation in a conceptual model (try to integrate existing ontologies into your ontology as much as possible)
   2.3 Transformation of the model into a compatible model
3. Support
   3.1 Evaluation: whether it is understandable to all (complete documentation is often needed)
   3.2 Maintenance: guidelines for maintenance

One of the major risks, besides the targeted user not understanding the ontology, is a lack of quality. The evaluation phase is therefore important: ontologies should be evaluated before they are used or reused. Technical verification is not the only type of evaluation; verification by the user also carries a lot of weight. According to Gonzalez (2005), evaluation can include:

- validation: whether the ontology definitions really model the real world for which the ontology was created;
- assessment: judging the ontology from the user's point of view, in terms of:
  o consistency,
  o completeness,
  o conciseness.

Development process for the intangible assets specific ontology within the MSEE context

In order to develop the IA-specific ontology within MSEE, the results of the EU-funded project MERITUM were adopted as a basis. Afterwards the taxonomy was customized to the specific needs of the manufacturing environment.

MERITUM was an EU-sponsored research project. It produced a set of guidelines for the measurement and disclosure of intangibles, intended to be useful for both private and public policy decisions. To do so, the project was organised into four activities (MERITUM, 2001):

a) Classification of intangibles;
b) Management Control Study;
c) Capital Markets; and
d) Guidelines.


The guidelines were obtained as the result of a long process in which the project group started with a reflection on the economic nature of intangibles and a discussion of their definition and classification, then continued with the analysis of the current measurement, management and disclosure practices, and concluded with a test of the validity of the guidelines by means of a Delphi analysis.

By relying on those guidelines, we do not claim that this framework provides a complete comprehension of IA; however, it is the most suitable framework known to us. By building on it, we hope that the framework will be further elaborated and that the MERITUM guidelines will evolve.

The MERITUM framework proposes three main classes of intangibles (MERITUM, 2001):

a) Human capital is defined as the knowledge that employees take with them when they

leave the firm. It includes the knowledge, skills, experiences and abilities of people.

Some of this knowledge is unique to the individual, some may be generic. Examples

are innovation capacity, creativity, knowhow and previous experience, teamwork

capacity, employee flexibility, tolerance for ambiguity, motivation, satisfaction,

learning capacity, loyalty, formal training and education.

b) Structural capital is defined as the pool of knowledge that stays with the firm at the

end of the working day. It comprises the organisational routines, procedures, systems,

cultures, databases, etc. Some of them may be legally protected and become

Intellectual Property Rights, legally owned by the firm under separate title. Examples

are organisational flexibility, a documentation service, the existence of a knowledge

centre, the general use of Information Technologies, organisational learning capacity,

etc.

c) Relational capital is defined as all resources linked to the external relationships of the

firm such as customers, suppliers or R&D partners. It comprises that part of Human

and Structural Capital dealing with the company’s relations with stakeholders

(investors, creditors, customers, suppliers, etc.), plus the perceptions that they hold

about the company. Examples of this category are image, customer loyalty, customer

satisfaction, links with suppliers, commercial power, negotiating capacity with

financial entities, environmental activities, etc.
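The three classes can serve as the top level of a machine-readable classification. The following Python sketch is only an illustration: the enumeration mirrors the three MERITUM classes, the example assets are a subset of those listed above, and the lookup function `classify_asset` is a hypothetical helper, not part of MERITUM or MSEE.

```python
from enum import Enum
from typing import Optional


class IntangibleClass(Enum):
    """The three main classes of intangibles proposed by MERITUM."""
    HUMAN = "Human capital"
    STRUCTURAL = "Structural capital"
    RELATIONAL = "Relational capital"


# A few of the examples named in the text, keyed by class.
EXAMPLES = {
    IntangibleClass.HUMAN: {"creativity", "motivation", "learning capacity"},
    IntangibleClass.STRUCTURAL: {"organisational flexibility", "documentation service"},
    IntangibleClass.RELATIONAL: {"image", "customer loyalty", "links with suppliers"},
}


def classify_asset(name: str) -> Optional[IntangibleClass]:
    """Hypothetical lookup: return the class whose examples contain the name."""
    for asset_class, examples in EXAMPLES.items():
        if name.lower() in examples:
            return asset_class
    return None  # unknown assets are left unclassified
```

A lookup table of this kind covers only known examples; the detailed classification in the annex and the ontology in D22.5 remain the reference.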


The main classification of IA in the context of MSEE is presented in the figure below.

Figure 9: Main classification of intangible assets within the MSEE context


The detailed IA classification can be found in Annex #1 of this document.

So far, the process of developing the ontology has been presented, followed by the classification of IA from the MERITUM project, which was taken as the framework for the ontology of IA in MSEE. The main IA classification for MSEE has also been presented. The developed ontology and its description are, however, presented in deliverable D22.5 (Development of IA in Manufacturing Service) as part of the technical framework for the representation and description of IA by means of USDL.

It has to be emphasized that the MSEE-specific ontology of IA depicts the crucial interdependencies between IA.

In order to make an ontology or taxonomy computable, it has to be implemented in a formal language, also called a format. The World Wide Web Consortium (W3C) has published recommendations for two such standards: the RDF (Resource Description Framework) Schema and the Web Ontology Language (OWL) (Hedden, 2010). The RDF specifications can be found on the W3C website (W3C, 2012b), as can those of OWL (W3C, 2012a).

2.4 File format definition

RDF is a general method to decompose any type of knowledge into small pieces, with some

rules about the semantics, or meaning, of those pieces. The point is to have a method so

simple that it can express any fact, and yet so structured that computer applications can do

useful things with it (Tauberer, 2012).

An interesting feature is that RDF facilitates data merging even if the underlying schemas differ,

and it specifically supports the evolution of schemas over time without requiring all the data

consumers to be changed. The design of RDF is intended to meet the following goals (W3C,

2012b):

- having a simple data model
- having formal semantics and provable inference
- using an extensible URI-based vocabulary
- using an XML-based syntax
- supporting use of XML schema datatypes
- allowing anyone to make statements about any resource


The underlying structure of any expression in RDF is a collection of triples, each consisting

of a subject, a predicate and an object. A set of such triples is called an RDF graph. This can

be illustrated by a node and directed-arc diagram, in which each triple is represented as a

node-arc-node link (hence the term "graph") (W3C, 2012b).

Figure 10: Representation of a statement

Source: (W3C, 2012b)

Each triple represents a statement of a relationship between a subject and an object, the

entities denoted by the nodes that are linked.

The direction of the arc is significant: it always points toward the object.
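The triple structure can be illustrated without any RDF library by representing a graph as a set of (subject, predicate, object) tuples. The sketch below is a simplification in plain Python: the `ex:` names are hypothetical identifiers, and real RDF would use full URIs and a dedicated RDF toolkit.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# A small RDF-like graph; every name below is invented for the example.
IA_GRAPH: Set[Triple] = {
    ("ex:Anna", "ex:hasSkill", "ex:PatternDesign"),
    ("ex:Anna", "ex:worksFor", "ex:ShirtMaker"),
    ("ex:ShirtMaker", "ex:locatedIn", "ex:Belgium"),
}


def match(graph: Set[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )
```

For example, `match(IA_GRAPH, subject="ex:Anna")` returns the two statements whose arcs start at the node `ex:Anna`, while `match(IA_GRAPH, obj="ex:Belgium")` follows an arc backwards from its object.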

There are many reasons to use RDF, among others (Tauberer, 2012):

- integrating data from different sources without custom programming,
- offering data for re-use by other parties,
- decentralizing data in a way that no single party "owns" all the data,
- data handling (browse, query, match, input, extract, ...).

RDF allows IA to be decomposed to the desired level, with each piece then described according to MSEE needs. It will also allow the MSE to apply data management techniques as well as more advanced techniques. RDF is therefore in compliance with the MSEE requirements regarding IA.
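The claim that data from different sources can be integrated without custom programming follows from the triple model: merging two graphs is a set union. The Python sketch below assumes two hypothetical sources that describe the same resource with different predicates; all names are invented.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Two hypothetical sources describing the same resource with different predicates.
HR_GRAPH: Set[Triple] = {
    ("ex:Anna", "ex:hasSkill", "ex:PatternDesign"),
    ("ex:Anna", "ex:memberOf", "ex:DesignTeam"),
}
CRM_GRAPH: Set[Triple] = {
    ("ex:Anna", "ex:hasSkill", "ex:PatternDesign"),  # also known to the HR source
    ("ex:Anna", "ex:contactFor", "ex:CustomerX"),
}

# Merging is a plain set union: no schema mapping is required and
# statements present in both sources collapse into one.
MERGED: Set[Triple] = HR_GRAPH | CRM_GRAPH


def predicates_of(graph: Set[Triple], subject: str) -> List[str]:
    """List the distinct predicates used for a subject."""
    return sorted({p for s, p, _ in graph if s == subject})
```

The union only works this simply when both sources use the same identifier for the same resource; agreeing on identifiers is the part that still requires coordination between parties.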

There is also RDF Schema, which defines a “schema vocabulary” that supports the definition of ontologies by giving “extra meaning” to particular RDF predicates and resources. It provides the framework to describe application-specific classes and properties. This allows resources to be defined as instances of classes, and classes as subclasses of other classes (W3Schools, 2012).
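The "extra meaning" of the class vocabulary can be shown with a small inference sketch in Python. It implements only two consequences of `rdfs:subClassOf` (transitivity of the subclass relation and propagation of `rdf:type` to superclasses) over tuples; it is not a complete RDF Schema reasoner, and the `ex:` class names are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"

# Hypothetical class hierarchy plus one instance.
SCHEMA_AND_DATA: Set[Triple] = {
    ("ex:HumanCapital", SUBCLASS_OF, "ex:IntangibleAsset"),
    ("ex:Skill", SUBCLASS_OF, "ex:HumanCapital"),
    ("ex:PatternDesign", TYPE, "ex:Skill"),
}


def infer(graph: Set[Triple]) -> Set[Triple]:
    """Add derived triples until nothing new can be derived."""
    closure = set(graph)
    while True:
        derived = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                # If o is a subclass of o2, then a subclass or instance of o
                # is also a subclass or instance of o2.
                if p2 == SUBCLASS_OF and o == s2 and p in (SUBCLASS_OF, TYPE):
                    derived.add((s, p, o2))
        if derived <= closure:
            return closure
        closure |= derived
```

Here the instance `ex:PatternDesign` is stated only to be a skill, yet after inference it is also known to be human capital and an intangible asset.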

The OWL Web Ontology Language is designed for use by applications that need to process

the content of information instead of just presenting information to humans. OWL facilitates

greater machine interpretability of Web content than that supported by XML, RDF, and RDF

Schema (RDF-S) by providing additional vocabulary along with a formal semantics (W3C,

2012a).

The W3C layer cake includes:

- XML provides a surface syntax for structured documents, but imposes no semantic constraints on the meaning of these documents.
- XML Schema is a language for restricting the structure of XML documents and also extends XML with datatypes.
- RDF is a datamodel for objects ("resources") and relations between them, provides a simple semantics for this datamodel, and these datamodels can be represented in an XML syntax.


- RDF Schema is a vocabulary for describing properties and classes of RDF resources, with a semantics for generalization-hierarchies of such properties and classes.
- OWL adds more vocabulary for describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. "exactly one"), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes.
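Two of the constructs OWL adds, class disjointness and an "exactly one" cardinality, can be illustrated with a simple validation sketch in Python. This is a deliberate simplification: the check below treats the data as complete (closed world), whereas an OWL reasoner works under the open-world assumption and would draw inferences rather than report errors. All class, property and individual names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical constraints of the kind OWL can express.
DISJOINT: List[Tuple[str, str]] = [("ex:HumanCapital", "ex:StructuralCapital")]
EXACTLY_ONE: List[str] = ["ex:ownedBy"]

# Hypothetical data: classes per individual and property values.
TYPES: Dict[str, Set[str]] = {
    "ex:KnowHow": {"ex:HumanCapital"},
    "ex:Database": {"ex:HumanCapital", "ex:StructuralCapital"},
}
VALUES: Dict[Tuple[str, str], List[str]] = {
    ("ex:KnowHow", "ex:ownedBy"): ["ex:ME1"],
    ("ex:Database", "ex:ownedBy"): ["ex:ME1", "ex:ME2"],
}


def check(types, values, disjoint, exactly_one) -> List[str]:
    """Report individuals that break a disjointness or cardinality constraint."""
    problems = []
    for individual, classes in sorted(types.items()):
        for a, b in disjoint:
            if a in classes and b in classes:
                problems.append(f"{individual}: member of disjoint classes {a} and {b}")
        for prop in exactly_one:
            count = len(values.get((individual, prop), []))
            if count != 1:
                problems.append(f"{individual}: {prop} has {count} values, expected exactly one")
    return problems


PROBLEMS = check(TYPES, VALUES, DISJOINT, EXACTLY_ONE)
```

In this example only `ex:Database` is reported, once for belonging to two disjoint classes and once for having two owners.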

It can be concluded that if one needs more than basic semantics, RDF and OWL must be used.

In the next iteration, OWL 2 could also be taken into account. It has a very similar overall structure to OWL 1 and adds new functionality with respect to OWL 1. Some of the new features are syntactic sugar (e.g., disjoint union of classes) while others offer new expressivity, including (W3C, 2009):

- keys;
- property chains;
- richer datatypes, data ranges;
- qualified cardinality restrictions;
- asymmetric, reflexive, and disjoint properties; and
- enhanced annotation capabilities.

2.4.1 Analysis of taxonomy management tools

A short analysis of taxonomy software, aligned with project partners, is also planned in order to identify the most suitable tool. At present, Protégé is the most promising one.

Table 3: Comparison of taxonomy management software

Criterion                                     | PROTÉGÉ  | ONTO STUDIO        | NEOLOGISM
Ontology editor                               | Yes (++) | Yes (++)           | No (--)
Programming language                          | Java (+) | mostly in Java (-) | PHP (Web-based) (+)
Support for RDF                               | Yes (++) | Yes (++)           | Yes (++)
Support for OWL                               | Yes (++) | Yes (++)           | just a part (-)
Rapid prototyping and application development | +        | +                  | -
User-friendly                                 | ++       | -                  | -
Free                                          | Yes (++) | No (--)            | Yes (++)
Open source                                   | Yes (++) | No (--)            | Yes (++)

Label: We assigned for every point and for every SW a qualitative judgment (++/+/-/--), based on the coherence between the characteristics of the SW and the objectives of our work within MSEE project.

In the next document iteration (M24), one additional tool will be considered and analysed, namely the NeOn Toolkit (NeOn Project, 2012). It is a state-of-the-art, open source, multi-platform ontology engineering environment, which provides comprehensive support for the ontology engineering life-cycle. It was developed as part of the NeOn Project.


Conclusion and further steps

All relevant documents in terms of procedures, techniques, models and requirements have been defined in order to ease the often complex process of virtualization into a repository. All have been made specific to the IA context in MSEE, in an attempt to reduce, from the IA standpoint, the gap between the resources (knowledge, techniques, etc.) that the MSE and MEs need and those they have available for creating an effective MSE.

Leaning extensively on the GRAI model, a clear and structured definition and decomposition of objectives has been made possible for an ME. From the decomposed objectives (using all three levels: strategic, tactical and operational) and using the MDSEA architecture, the ME can identify the project-relevant IA that will be virtualized. Still leaning on the same architecture, the ME can now also identify exactly what is needed (in terms of IA) from the MSE in order to accomplish a project. The approach thus also enables the identification and definition of requirements from the MEs, which suggests that the virtualization process developed for the implementation of IA could also be used for virtualizing requirements coming from the market to the MSE. This issue is, however, not dealt with in this document. The sources of IA can now be identified by the ME, as it has clearly defined and decomposed the project-relevant IA that have to be virtualized.

Once it has been identified exactly which IA to virtualize and where they are located, the virtualization process can begin. Although the process is IA specific for the MSEE context, it has been embedded in a well-known and stable framework, the ETL process. Taking into consideration that one of the main risks is low data quality (expressed in different ways), the cooperation of the MSE during the process has been included in order to reach a higher level of quality. Of course, the virtualization process cannot be carried out without techniques that enable the identification of sources of IA and the extraction, cleaning and integration of IA into the taxonomy. Therefore the main techniques have been presented.

At that point, the formats in which the process will be handled were still undefined. In order to provide as much detailed information as possible for the development of the technical IAMS framework in D22.5, formats were divided into two main categories: content format and file format. For the former, the ontology development process for IA in the MSEE-specific context has been presented; its result is presented in D22.5 as an essential part of the technical IAMS framework. As for the file format, RDF and OWL were presented and proposed for further use.

All the necessary documents have been established in order to enable IA, as a capability of service, to flow effectively between the ME and the MSE and also within the MSE. The virtualized IA subsequently provide the so-called capabilities of IA as a service.

In D22.5 the established specific ontology and taxonomy will be presented as a formalism for effective content management; realistic data will also be used and inserted into a real case from one of our project partners, Bivolino. As the prototype based on USDL will be established in D22.5, the results obtained in D22.3 will form an important building block, representing the documents and specification for an implementation of IA as a service. All information related to the Bivolino use case can be found in D22.5. It has to be emphasized that the real-world case example will be aligned with WP23, which deals with tangible assets.

The next steps consist in deepening the proposed techniques and procedures, refining them and aligning them with new information obtained from the prototype development and from other project partners. Optimisation of the proposed procedure has to be addressed, as do challenges related to demanding requirements such as IPR management. Regardless of the next steps, however, data quality and consistency always have to be kept in mind.

Looking at the results of D22.3 and D22.5 from a wider angle, it can be seen that they will subsequently enable the systematic monitoring of performance and risk in an enterprise’s value creation system and thus constitute an important part of the management system.


SOURCES

1. Abidi, S. (2001). Knowledge management in healthcare: towards ‘knowledge-driven’ decision-support services. International Journal of Medical Informatics, 63, 5-18.
2. Bezdek, J. C., Hathaway, R. J., Sabin, M. J., & Tucker, W. T. (1987). Convergence theory for fuzzy c-means: Counterexamples and repairs. IEEE Transactions on Systems, Man and Cybernetics, 873-877.
3. Bohlouli, M., Holland, A., & Fathi, M. (2011, 15-17 May). Knowledge integration of collaborative product design using cloud computing infrastructure. Paper presented at the Electro/Information Technology (EIT), 2011 IEEE International Conference on.
4. Bojko, B. (2004). Content Management Bible: Wiley.
5. Brewster, C., Ciravegna, F., & Wilks, Y. (2010). Knowledge Management: Position Paper.
6. Cannataro, M., Talia, D., & Trunfio, P. (2002). Distributed data mining on the grid. Future Generation Computer Systems, 18, 1101-1112.
7. Cheng-Ru, L., & Ming-Syan, C. (2005). Combining partitional and hierarchical algorithms for robust and efficient data clustering with cohesion self-merging. Knowledge and Data Engineering, IEEE Transactions on, 17(2), 145-159. doi: 10.1109/tkde.2005.21
8. Dempster, A., & Laird, N. M. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1).
9. Doumeingts, G., Lieu, C., Chen, D., Ducq, Y., T.;, A., G.;, Z., . . . Silva, E. (2012). D11.1 - Service concepts, models and method: Model Driven Service Engineering: MSEE.
10. Dubes, R. C. (1987). How many clusters are best? - An experiment. Pattern Recognition, 20(6), 645-663. doi: 10.1016/0031-3203(87)90034-3
11. Fayyad, U., Piatetsky Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery: An overview. ACM KDD, 39(11), 27-34.
12. Fensel, D. (2003). A Silver Bullet for Knowledge Management and Electronic Commerce: Springer.
13. Fernandez, M., Gomez-Perez, A., & Juristo, N. (1997). Methontology: From Ontological Art Towards Ontological Engineering. AAAI Technical report (pp. 33-40): AAAI.
14. Frischmuth, P., Klímek, J., Auer, S., Tramp, S., Unbehauen, J., Holzweißig, K., & Marquardt, C. M. (2012). Linked Data in Enterprise Information Integration. Semantic Web journal.
15. Gonzalez, R. (2005). A Semantic Web approach to digital rights management. od Manuel a.
16. Gruber, T. (1993). Toward Principles for the Design of Ontologies Used for Knowledge Sharing (pp. 1-22). Knowledge Systems Laboratory: Stanford University.
17. Hand, D., Mannila, H., & Smyth, P. (2001). Principles of Data Mining: MIT Press.
18. Heath, T., Hepp, M., & Bizer, C. (Producer). (2009, 2012). Linked Data - The Story So Far. International Journal on Semantic Web and Information Systems. Retrieved from http://tomheath.com/papers/bizer-heath-berners-lee-ijswis-linked-data.pdf
19. Hedden, H. (2010). The Accidental Taxonomist, what are taxonomies. 464.
20. Hertz, J. A., Krogh, A. S., & Palmer, R. G. (1991). Introduction To The Theory Of Neural Computation: Westview Press.
21. Hirsch, M. (2012). D23.3 - OMSE Management Framework for Tangible Assets (Vol. D23.3): MSEE.


22. INTEROP_NoE. (2005). Deliverable DTG3. http://www.interop-noe.org/.

23. Khaled, M. H. (2004). Efficient Phrase-Based Document Indexing for Web Document

Clustering. IEEE Transactions on Knowledge and Data Engineering, 16, 1279-1296.

24. Khan, D. (2008, 9-12 Dec. 2008). CAKE - Classifying, Associating and Knowledge

DiscovEry - An Approach for Distributed Data Mining (DDM) Using PArallel Data

Mining Agents (PADMAs). Paper presented at the Web Intelligence and Intelligent

Agent Technology, 2008. WI-IAT '08. IEEE/WIC/ACM International Conference on.

25. King, B. (1967). Step-Wise Clustering Procedures. Journal of the American Statistical

Association, 62(317), 86-101.

26. Lambe, P. (2007). Taxonomies, Knowledge and Organisation Effectiveness: Chandos.

27. Lida, X., Li, Z., Zhongzhi, S., Qing, H., & Maoguang, W. (2007, 7-10 Oct. 2007).

Research on Business Intelligence in enterprise computing environment. Paper

presented at the IEEE International Conference on Systems, Man and Cybernetics,

2007. ISIC.

28. Lin, F., & McClean, S. (2001). A data mining approach to the prediction of corporate

failure. Knowledge-Based Systems, 14, 189-195.

29. Liu, B., Cao, S. G., & He, W. (2011). Distributed data mining for e-business.

Information Technology and Management, 12(2), 67-79. doi: 10.1007/s10799-011-0091-8

30. Lu, S.-Y., & Fu, K. S. (1978). A Sentence-to-Sentence Clustering Procedure for

Pattern Analysis. Systems, Man and Cybernetics, IEEE Transactions on, 8(5), 381-389. doi: 10.1109/tsmc.1978.4309979

31. Maedche, A., & Staab, S. (2001). Ontology Learning for the Semantic Web. IEEE

Intelligent Systems, 16(2), 72-79.

32. MERITUM. (2001). MEasuRing Intangibles To Understand and improve innovation

Management. In T. program (Ed.).

33. Nemati, H. R., Steiger, D. M., Iyer, L. S., & Herschel, R. T. (2002). Knowledge

warehouse: an architectural integration of knowledge management, decision support,

artificial intelligence and data warehousing. Decision Support Systems, 33, 143–161.

34. NeOn Project. (2012). NeOn Toolkit, 2012, from http://neon-toolkit.org/

35. Overby, E., Slaughter, S. A., & Konsynski, B. (2010). Research Commentary - The

Design, Use, and Consequences of Virtual Processes. Information Systems Research,

21(4), 700-710. doi: 10.1287/isre.1100.0319

36. Pincher, M. (2010). A guide to developing taxonomies for effective data management,

2012, from www.computerweekly.com/Articles/2010/04/06/240539/a-guide-to-developing-taxonomies-for-effective-data.htm

37. Rockley, A. (2002). Managing Enterprise Content: A Unified Content Strategy: New

Riders.

38. Shaw, M., Subramaniam, C., Tan, G. W., & Welge, M. (2001). Knowledge

management and data mining for marketing. Decision Support Systems, 31, 127–137.

39. Srivastava, J., Cooley, R., Deshpande, M., & Tan, P.-N. (2000). Web usage mining:

discovery and applications of usage patterns from Web data. SIGKDD Explor. Newsl.,

1(2), 12-23. doi: 10.1145/846183.846188

40. Sure, Y., & Studer, R. (2002). On-To-Knowledge Methodology - Final version.

41. Tauberer, J. (2012). Rdf:about. Retrieved 2012-09-01, from http://rdfabout.com/

42. van Geenhuizen, M., & Nijkamp, P. (2012). Knowledge virtualization and local

connectedness among young globalized high-tech companies. Technological

Forecasting and Social Change, 79(7), 1179-1191. doi: 10.1016/j.techfore.2012.01.010

43. Vassiliadis, P. (2009). A Survey of Extract–Transform–Load Technology.

International Journal of Data Warehousing & Mining, 5(3), 1-27.

44. Vassiliadis, P., & Alkis, S. (2007). EXTRACTION, TRANSFORMATION, AND

LOADING. Retrieved from University of Ioannina website:

http://www.cs.uoi.gr/~pvassil/downloads/ETL/SHORT_DESCR/08SpringerEncyclopedia_draft.pdf

45. W3C. (2009). OWL 2 Web Ontology Language Document Overview: W3C.

46. W3C. (2012a). W3C Recommendation - OWL Web Ontology Language. W3C

Recommendation, 2012, from http://www.w3.org/TR/owl-features/

47. W3C. (2012b). W3C Recommendation - Resource Description Framework (RDF):

Concepts and Abstract Syntax, 2012, from http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/

48. W3Schools. (2012). RDF Schema (RDFS), 2012, from

http://www.w3schools.com/rdf/rdf_schema.asp

49. Wache, H., Vögele, T., Visser, U., Stuckenschmidt, H., Schuster, G., Neumann, H., &

Hübner, S. (2001). Ontology-Based Integration of Information - A Survey of Existing

Approaches. Paper presented at the IJCAI’01 Workshop on Ontologies and

Information Sharing.

50. Wei, Z., Kang, M., & Zhou, W. (2008). A Semantic Web-Based Enterprise

Information Integration Platform for Mobile Commerce. Paper presented at the

International Conference on Management of e-Commerce and e-Government.

Annexes

Annex 1 – IA classification in the MSEE context
