2011 IEEE 8th International Conference on e-Business Engineering (ICEBE), Beijing, China, 2011.10.19-2011.10.21

A Method of Building Virtual Datacenter Based on Semantic Views

Li Huayu (1)

(1) College of Computer and Communication Engineering, China University of Petroleum, Dongying, China

OUYANG Chunping (2)

(2) College of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China

Abstract—The Well-Engineering domain holds massive amounts of data that are distributed and heterogeneous, which makes data integration and global decision making difficult. To address this problem, this paper proposes a method of building a virtual datacenter by means of semantic integration technology. First, the schemas of the data sources are mapped to semantic views on a domain ontology. Second, a translation algorithm rewrites a global semantic query into a series of SQL statements, which are dispatched to the relevant data sources and executed on the local sites. Finally, the query results are reorganized and returned to users. In practical application, this virtual datacenter supplies semantic-based data support for well production decisions.

Keywords: Semantic views, Domain ontology, Virtual datacenter, Well engineering

I. INTRODUCTION

With the rapid development of informatization in Well-Engineering, production and management institutions have accumulated very large data resources. These resources are typically distributed, semantically heterogeneous, linked by complex association relationships, and oriented to different systems. Among these characteristics, semantic heterogeneity has become a prominent obstacle to data sharing and integration. It is therefore necessary to build an integration platform that provides effective data services for global decision making.

VDC (Virtual Datacenter) technology offers a good solution to these problems. A VDC is defined as an integration infrastructure that supports unified and transparent data services based on virtual-view technology, by which local data sources are mapped to a series of virtual views described using global schema elements. A global query submitted to the VDC is first parsed and rewritten into several sub-queries; each sub-query is then dispatched to its local data site and executed there; finally, the sub-query results are combined into a complete result and returned to users. Because results come directly from the local data sources, the resource consumption caused by frequent data loading is avoided, which improves the operating efficiency of the VDC.

Data integration based on VDC is currently an active research topic. The COSMOS VDC [1], constructed by the USA government, supports earthquake prediction by integrating data from earthquake observation stations. For social science, Harvard University has developed the Dataverse VDC [2], which provides services for sharing research achievements among sociologists. For gas fields, Y. Wangcheng proposes a views-based VDC platform [3]. Two components are packaged in this VDC to achieve data integration: a mapping-rules base responsible for the definition of virtual views, and the DXC (Data Exchange Center), which performs query decomposition and result combination.

The advent of ontology technology [4] provides a better solution to semantic heterogeneity and has made semantic-based VDC a research focus. INDUS [5] is an integration platform that creates semantic views for distributed biological data sources. In the DartGrid project [6], designed for data integration in the Traditional Chinese Medicine domain, each table is mapped to semantic views of a global ontology. The same requirement exists in the Well-Engineering area. Based on a global domain ontology, this paper therefore proposes a semantic-based VDC implementation named WeVDC (Well-Engineering VDC), in which each table is defined as a series of semantic views expressed as a group of 3-tuples with a syntax similar to RDF triples. With this mapping mode, WeVDC can resolve semantic differences among distributed relational data sources and satisfy the real-time requirement for query results.

The remainder of this paper is organized as follows: Section 2 presents the implementation framework of WeVDC; Section 3 gives the definition of semantic views; Section 4 describes the semantic query process and the SPARQL-SQL translation algorithm in detail; Section 5 introduces the implementation technology and demonstrates the application with several interfaces; Section 6 summarizes the conclusions.

II. IMPLEMENTATION ARCHITECTURE

As shown in Fig. 1, WeVDC is implemented as a four-layer architecture composed of the Services Layer, Integration Layer, Wrapper Layer and Data Source Layer.

1) Services Layer: This layer comprises two components: the Query Constructor, with which users generate a semantic query in the form of a SPARQL statement, and the Results Displayer, which organizes and displays query results.

2011 Eighth IEEE International Conference on e-Business Engineering

978-0-7695-4518-9/11 $26.00 © 2011 IEEE

DOI 10.1109/ICEBE.2011.18


Figure 1. Implementation Structure of WeVDC

2) Integration Layer: This layer consists of four principal components: WeOnto (Well-Engineering domain Ontology), the Mapping-rules Base, the Query Parser and the Results Processor. WeOnto is the semantic model shared in the Well-Engineering domain and is built under the guidance of domain experts. It has two functions: it provides domain terms for the Query Constructor, and it provides the semantic definitions for the semantic views. The semantic views are stored in the Mapping-rules Base, which the Query Parser consults when executing the SPARQL-SQL translation algorithm. The Results Processor combines the sub-results returned from the distributed data sources and submits the combined results to the Results Displayer.

3) Wrapper Layer: Two components are packaged in this layer: the Schema Extractor and the Query Executor. The former connects to the data sources and extracts schema metadata; the latter executes the sub-queries dispatched by the Query Parser and returns the query results to the Results Processor.

4) Data Source Layer: The distributed data sources are located in this layer. They are mainly relational databases covering oil production, measures, underground equipment and cost consumption.
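The division of labour among the layers can be sketched as a small set of Java interfaces. This is only an illustration under assumptions: the interface and method names (`QueryParser.translate`, `QueryExecutor.execute`, `ResultsProcessor.combine`), the table names `t1` and `t2`, and the in-memory stubs that stand in for real data sources are all hypothetical and are not taken from the WeVDC prototype.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the WeVDC layering; every name here is illustrative.
final class WeVdcPipeline {

    /** Integration Layer: turns one global query into per-source SQL. */
    interface QueryParser {
        Map<String, String> translate(String sparql); // source ID -> SQL
    }

    /** Wrapper Layer: runs one sub-query on one local data source. */
    interface QueryExecutor {
        List<String> execute(String sourceId, String sql);
    }

    /** Integration Layer: merges the sub-results into one answer. */
    interface ResultsProcessor {
        List<String> combine(List<List<String>> subResults);
    }

    /** Parse, dispatch each sub-query, then combine the sub-results. */
    static List<String> answer(String sparql, QueryParser parser,
                               QueryExecutor executor, ResultsProcessor processor) {
        List<List<String>> subResults = new ArrayList<>();
        for (Map.Entry<String, String> e : parser.translate(sparql).entrySet()) {
            subResults.add(executor.execute(e.getKey(), e.getValue()));
        }
        return processor.combine(subResults);
    }

    /** Runs the pipeline with in-memory stubs in place of real components. */
    static List<String> demo() {
        QueryParser parser = sparql -> {
            Map<String, String> plan = new LinkedHashMap<>();
            plan.put("D1", "select jh from t1"); // assumed sub-query for D1
            plan.put("D2", "select jh from t2"); // assumed sub-query for D2
            return plan;
        };
        // The stub executor just echoes what it was asked to run.
        QueryExecutor executor = (sourceId, sql) -> Arrays.asList(sourceId + ":" + sql);
        ResultsProcessor processor = parts -> {
            List<String> all = new ArrayList<>();
            for (List<String> rows : parts) {
                all.addAll(rows);
            }
            return all;
        };
        return answer("SELECT ?x WHERE { ... }", parser, executor, processor);
    }
}
```

Keeping the parser, executor and results processor behind separate interfaces mirrors the description above, in which the Wrapper Layer is the only part that touches a local database.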

III. DEFINITION OF SEMANTIC VIEWS

Defining semantic views for data sources essentially means creating a series of mapping rules between the local schemas and WeOnto. In this paper, we adopt the LAV (Local as View) [7] mapping mode, which expresses the logical model of a table as groups of semantic query statements similar to RDF triples. These statements are then stored in a fixed format as the reference rules for the SPARQL-SQL translation process.

The semantic view is named SVR (Semantic Views for Relational tables) and is defined in Definition 1.

Definition 1. $\mathrm{SVR}(A_1, A_2, \ldots, A_n) \leftarrow \mathrm{RDF}(X, Y)$

• $A_i$ is a column of the table;

• $\mathrm{RDF}(X, Y) = \{\mathrm{RDF}_{A_i} \mid \mathrm{RDF}_{A_i} = \{rdf_1, rdf_2, \ldots, rdf_n\}\}$; $\mathrm{RDF}_{A_i}$ is the set of 3-tuples used to describe $A_i$ and is defined in Definition 2.

Definition 2. $\mathrm{RDF}_{A_i}(X, Y)$ = { $rdf_1$: <?X1 rdf:type WeOnto:StartClass>, $rdf_2$: <?X2 rdfs:subClassOf ?X1> | <?X1 :objAtt ?X2>, $rdf_3$: <?X2 rdf:type WeOnto:X2Class>, …, $rdf_n$: <?Xn-1 WeOnto:dataTypeAtt ?Y> }.

• $X = \{X_1, X_2, \ldots, X_n\}$, where each $X_i$ is a class of WeOnto;

• $rdf_1$: <?X1 rdf:type WeOnto:StartClass> is the first 3-tuple and indicates the start class of WeOnto;

• $rdf_n$ is the last 3-tuple of $\mathrm{RDF}_{A_i}$; Y is a data-type property of class $X_{n-1}$ and denotes the column $A_i$;

• each $rdf_{2i}$ ($i = 1, 2, \ldots, k, \ldots$) denotes the relationship between class $X_{2i}$ and class $X_{2i-1}$: class $X_{2i}$ is a subclass or an object property of class $X_{2i-1}$.

Fig. 2 shows the definition process of SVR. The tree in the upper part displays part of the structure of WeOnto; each column of the tables in the two data sources D1 and D2 is mapped to the relevant data-type property of WeOnto.

Next, we take the table YJYCYSJ (JH, QK, RQ, YCY, YCY2, YCQ) in D1 as an example to illustrate this process.

Figure 2. Definition Process of SVR

According to the syntax of SVR, SVR_YJYCYSJ is written as the following statements:

SVR_YJYCYSJ (JH, QK, RQ, YCY, YCY2, YCQ) ← {
  rdf1 = [<?Y1 rdf:type :OilWell>, <?Y1 :wellID ?JH>];
  rdf2 = [<?Y1 rdf:type :OilWell>, <?Y1 :blockName ?QK>];
  rdf3 = [<?Y1 rdf:type :OilWell>, <?Y1 :OutPut ?Y2>, <?Y2 rdf:type :OutPut>, <?Y2 :date ?RQ>];
  rdf4 = [<?Y1 rdf:type :OilWell>, <?Y1 :OutPut ?Y2>, <?Y2 rdf:type :OutPut>, <?Y2 :fluidOutPut ?YCY>];
  rdf5 = [<?Y1 rdf:type :OilWell>, <?Y1 :OutPut ?Y2>, <?Y2 rdf:type :OutPut>, <?Y2 :oilOutPut ?YCY2>];
  rdf6 = [<?Y1 rdf:type :OilWell>, <?Y1 :OutPut ?Y2>, <?Y2 rdf:type :OutPut>, <?Y2 :gasOutPut ?YCQ>]
}
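The regular shape of these statements suggests that each rdf group can be generated from a path through the ontology. The following is a minimal sketch under that assumption; the class `SemanticViewSketch`, the method `triplesFor` and its parameters are hypothetical names introduced for illustration, and the paper does not state that WeVDC builds its views this way.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: builds the 3-tuple list of one column of a table.
final class SemanticViewSketch {

    /**
     * @param classes  classes on the path from the start class, e.g. OilWell, OutPut
     * @param links    object properties joining consecutive classes
     * @param dataProp data-type property of the last class on the path
     * @param column   column name bound to the data-type property
     */
    static List<String> triplesFor(List<String> classes, List<String> links,
                                   String dataProp, String column) {
        List<String> triples = new ArrayList<>();
        for (int i = 0; i < classes.size(); i++) {
            String var = "?Y" + (i + 1);
            // Type assertion for the class at this position of the path.
            triples.add("<" + var + " rdf:type :" + classes.get(i) + ">");
            // Object property leading to the next class, if there is one.
            if (i < links.size()) {
                triples.add("<" + var + " :" + links.get(i) + " ?Y" + (i + 2) + ">");
            }
        }
        // The last 3-tuple binds the column to a data-type property.
        triples.add("<?Y" + classes.size() + " :" + dataProp + " ?" + column + ">");
        return triples;
    }
}
```

Calling `triplesFor` with the classes OilWell and OutPut, the link OutPut, the property date and the column RQ yields the four 3-tuples of rdf3; a single class with no links yields the two 3-tuples of rdf1.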

As formalized statements, SVR statements must be converted into mapping rules in a fixed format and saved in the Mapping-rules Base. A storage structure is therefore needed to express the instantiated object of an SVR; this instantiation structure is given in Definition 3.

Definition 3. SVR-I (SVR Instances) is the instantiation structure of SVR:

$\text{SVR-I} = \{ L_i \mid \forall\, Node_j \in L_i,\ Node_j = (pnode, node, cnode);\ pnode, node, cnode \in C \cup P \}$,

where C and P are the sets of classes and properties of WeOnto.

• SVR-I is a set of ordered Nodes, and each Node is a 3-tuple (pnode, node, cnode), where pnode is the super class of node and cnode is a subclass or data-type property of node.

• A Node has one of two types: the class type, named NodeC, or the data-type property type, named NodeDTP (DataType Property).

• For all consecutive nodes $Node_i = (pnode_i, node_i, cnode_i)$ and $Node_{i+1} = (pnode_{i+1}, node_{i+1}, cnode_{i+1})$, $node_{i+1} = cnode_i$.

According to Definition 3, SVR-I_YJYCYSJ is described as SVR-I_YJYCYSJ = {L1, L2, L3, L4, L5, L6}.

Each Li in SVR-I_YJYCYSJ is a mapping item that corresponds to rdfi in SVR_YJYCYSJ. For example, L3 expresses that column RQ is mapped to a data-type property of class OutPut, which is in turn an object property of class OilWell. This expression is formalized as L3 = {NodeC_OilWell (Well, OilWell, OutPut), NodeC_OutPut (OilWell, OutPut, :date), NodeDTP_date (OutPut, :date)}.
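The Node chain of a mapping item can be modelled with a small value class. The sketch below is an assumption about one possible representation, not the storage format of the Mapping-rules Base; `SvrInstanceSketch`, `Node`, `fromPath` and `isChained` are hypothetical names. It rebuilds L3 from the path Well, OilWell, OutPut, :date and checks the chaining rule of Definition 3.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of one mapping item L_i of SVR-I.
final class SvrInstanceSketch {

    /** One Node; cnode is null for the final data-type property node. */
    static final class Node {
        final String pnode;
        final String node;
        final String cnode;

        Node(String pnode, String node, String cnode) {
            this.pnode = pnode;
            this.node = node;
            this.cnode = cnode;
        }

        @Override
        public String toString() {
            return cnode == null
                    ? "(" + pnode + ", " + node + ")"
                    : "(" + pnode + ", " + node + ", " + cnode + ")";
        }
    }

    /** Builds the ordered Node list for a path such as Well, OilWell, OutPut, :date. */
    static List<Node> fromPath(List<String> path) {
        List<Node> nodes = new ArrayList<>();
        for (int i = 1; i < path.size(); i++) {
            String child = (i + 1 < path.size()) ? path.get(i + 1) : null;
            nodes.add(new Node(path.get(i - 1), path.get(i), child));
        }
        return nodes;
    }

    /** Chaining rule: the node of each element equals the cnode of its predecessor. */
    static boolean isChained(List<Node> nodes) {
        for (int i = 0; i + 1 < nodes.size(); i++) {
            if (!nodes.get(i + 1).node.equals(nodes.get(i).cnode)) {
                return false;
            }
        }
        return true;
    }
}
```

For the path above, `fromPath` produces (Well, OilWell, OutPut), (OilWell, OutPut, :date) and (OutPut, :date), which are the three nodes listed for L3.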

A graphical interface is developed to define SVRs. During this process, the corresponding SVR-I of every data source is created and saved in the Mapping-rules Base.

IV. SPARQL-SQL TRANSLATION PROCESS

The purpose of this algorithm is to translate a global semantic SPARQL query into SQL sub-queries that correspond to the local databases. The translation requires two pieces of reference information: the mapping rules of SVR and the semantic model of WeOnto. Moreover, the design of the algorithm must take two kinds of distribution of the local data sources into account:

1) The query results come from multiple data sources, and the final result is the combination of the sub-results of each data source. This combination simply adds new records or deletes identical records.

2) The query spans multiple data sources, and obtaining the final result requires passing parameters among the data sources.
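The first situation amounts to a union of the sub-results with duplicates removed. A minimal sketch follows, assuming each record is represented as a list of strings; the class `UnionCombiner` and its method are hypothetical, and the well numbers used to exercise it are made-up sample values.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative combination for situation 1: add new records, drop identical ones.
final class UnionCombiner {

    /** Appends the rows of every sub-result and skips rows that were already seen. */
    static List<List<String>> union(List<List<List<String>>> subResults) {
        // LinkedHashSet keeps first-seen order; List.equals compares rows by value.
        Set<List<String>> seen = new LinkedHashSet<>();
        for (List<List<String>> rows : subResults) {
            seen.addAll(rows);
        }
        return new ArrayList<>(seen);
    }
}
```

The second situation cannot be handled this way, because the rows of one source act as parameters that restrict the rows taken from another source.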

A. SPARQL-SQL Translation Algorithm

This algorithm comprises three sub-processes. Firstly, a procedure parses the SPARQL statement and saves the returned information in four relational tables:

• WhereTable (CID, Property, Value): stores the variables and properties in the Where sub-statement of the SPARQL query.

• ClassTable (CID, Classes): stores the class symbols whose properties appear in records of WhereTable.

• SelectTable (CID, Property): stores the variables and properties in the Select sub-statement of the SPARQL query.

• ConditionTable (CID, Property, Value): stores the constants and statements in the Where and Filter sub-statements of the SPARQL query.
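One way to picture how the four tables might be filled is the sketch below. It is an interpretation, not the paper's parser: the routing rules (type patterns go to ClassTable, patterns with a variable object go to WhereTable, selected variables also go to SelectTable, constants and filtered variables go to ConditionTable) are assumptions, rows are stored as `|`-separated strings for brevity, and all Java names are hypothetical. The input is the example query used later in this section, already split into triple patterns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Assumed routing of parsed SPARQL parts into the four tables.
final class QueryTablesSketch {
    final List<String> classTable = new ArrayList<>();
    final List<String> whereTable = new ArrayList<>();
    final List<String> selectTable = new ArrayList<>();
    final List<String> conditionTable = new ArrayList<>();

    /**
     * @param patterns   triple patterns as {subject, predicate, object}
     * @param selectVars variables listed in the SELECT clause
     * @param filters    FILTER conditions keyed by variable, e.g. "?y" -> "> 10"
     */
    static QueryTablesSketch fill(List<String[]> patterns, List<String> selectVars,
                                  Map<String, String> filters) {
        QueryTablesSketch t = new QueryTablesSketch();
        for (String[] p : patterns) {
            String cid = p[0], property = p[1], value = p[2];
            if (property.equals("rdf:type")) {
                t.classTable.add(cid + "|" + value);
            } else if (value.startsWith("?")) {
                t.whereTable.add(cid + "|" + property + "|" + value);
                if (selectVars.contains(value)) {
                    t.selectTable.add(cid + "|" + property);
                }
                if (filters.containsKey(value)) {
                    t.conditionTable.add(cid + "|" + property + "|" + filters.get(value));
                }
            } else {
                // A constant object is treated as an equality condition.
                t.conditionTable.add(cid + "|" + property + "|= " + value);
            }
        }
        return t;
    }

    /** The example query of this section, already split into triple patterns. */
    static QueryTablesSketch example() {
        List<String[]> patterns = Arrays.asList(
                new String[] {"?a", "rdf:type", "OilWell"},
                new String[] {"?a", "weonto:wellID", "?x"},
                new String[] {"?a", "weonto:output", "?b"},
                new String[] {"?b", "rdf:type", "OutPut"},
                new String[] {"?b", "weonto:outPutMon", "?y"},
                new String[] {"?a", "weonto:blockName", "'WangXuZHuang'"});
        return fill(patterns, Arrays.asList("?x", "?y"),
                Collections.singletonMap("?y", "> 10"));
    }
}
```

The rows this sketch produces are not claimed to match the records shown in Fig. 3; they only illustrate the roles of the four tables.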

Secondly, according to the mapping rules in the Mapping-rules Base and the records in these four tables, WeVDC starts the SPARQL-SQL translation algorithm to construct the Select statement and the Where statement of the SQL. Table I gives this algorithm.

TABLE I. SPARQL-SQL TRANSLATION ALGORITHM

Input: SPARQL query statement
Output: A sqlVector object which saves SQL statements

Step 1: Construction of Select statements of SQL
1   Vector selV;
2   for (int i = 0; i < SelectTable.rowCount(); i++) {
3     String cBinding = SelectTable.getValueAt(i, 1);
4     String pMark = SelectTable.getValueAt(i, 2);
5     Vector setC = ClassTable.find(cBinding);
6     Vector setP = WeOnto.ClassOfProperty(pMark);
7     String cC = setC.lowest();
8     String cP = setP.lowest();
9     For each cC, cP and pMark, the corresponding class or property is matched from WeOnto and named WeOnto.cC, WeOnto.cP and WeOnto.pMark respectively; meanwhile, their SVR-I statements are created. As parameters, they are passed to the Mapping-Rules Base to obtain the data sources, tables and columns. The returned symbols are named Di.cC, Dj.cP and Dj.pMark (Di denotes the ID of a data source);
10    selV.add("select * from Di.cC");
11    selV.add("select Dj.pMark from Dj.cP"); }

Step 2: Construction of Where statements of SQL
12  Vector whereV;
13  for (int i = 0; i < conditionTable.rowCount(); i++) {
14    String cBinding = conditionTable.getValueAt(i, 1);
15    String pMark = conditionTable.getValueAt(i, 2);
16    Di.cC, Di.cP and Di.pMark are obtained in the same way as in Row 9; in addition, the constant value of pMark is named Vpmark.
17    whereV.add("Di.cC.pMark=Vpmark");
18    whereV.add("Di.cP.pMark=Vpmark"); }

Step 3: Combination of sub-statements
19  Statements in selV and whereV are combined and saved in sqlVector;
20  return sqlVector;

In Step 1, a loop first reads all class and property symbols from the records in SelectTable (Rows 2-4); second, two function calls save the class sets related to these symbols in the setC and setP objects (Rows 5-6); third, a matching process in the Mapping-Rules Base determines the data sources, tables and columns that correspond to the objects in setC and setP (Row 9); fourth, based on the returned logical information of the data sources, the Select sub-statement of the SQL is constructed and saved in the selV object (Rows 10-11).


With the reference information in the Mapping-Rules Base and ConditionTable, Step 2 follows a similar process to construct the Where sub-statement of the SQL, which is saved in the whereV object (Rows 14-18).

Finally, Step 3 creates the final SQL statements by combining the relevant sub-statements from selV and whereV (Rows 19-20).
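The three steps can be condensed into a runnable sketch under simplifying assumptions. The mapping rules are reduced to a hypothetical lookup from an ontology property to a {source, table, column} triple, the ontology matching and lowest-class selection of Rows 5-9 are omitted, and unmapped properties are not handled. The rule entries are assumed from the column names that appear in the example SQL for data source D1 in this section; `TranslationSketch` and its methods are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of Steps 1-3; not the WeVDC implementation.
final class TranslationSketch {

    /** Assumed mapping rules: ontology property -> {source, table, column}. */
    static Map<String, String[]> rules() {
        Map<String, String[]> r = new LinkedHashMap<>();
        r.put("wellID", new String[] {"D1", "daa01", "jh"});
        r.put("blockName", new String[] {"D1", "daa01", "qk"});
        r.put("outPutMon", new String[] {"D1", "daa01", "ycy"});
        return r;
    }

    /** Returns one SQL statement per "source.table" key. */
    static Map<String, String> translate(List<String> selectProps,
                                         List<String[]> conditions,
                                         Map<String, String[]> rules) {
        Map<String, List<String>> selV = new LinkedHashMap<>();
        Map<String, List<String>> whereV = new LinkedHashMap<>();
        // Step 1: collect the selected columns per source table.
        for (String prop : selectProps) {
            String[] m = rules.get(prop);
            selV.computeIfAbsent(m[0] + "." + m[1], k -> new ArrayList<>())
                .add(m[1] + "." + m[2]);
        }
        // Step 2: collect the conditions per source table.
        for (String[] c : conditions) {
            String[] m = rules.get(c[0]);
            whereV.computeIfAbsent(m[0] + "." + m[1], k -> new ArrayList<>())
                  .add(m[1] + "." + m[2] + " " + c[1]);
        }
        // Step 3: combine both parts into one statement per source table.
        Map<String, String> sqlVector = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : selV.entrySet()) {
            String table = e.getKey().substring(e.getKey().indexOf('.') + 1);
            String sql = "select " + String.join(", ", e.getValue()) + " from " + table;
            List<String> conds = whereV.get(e.getKey());
            if (conds != null) {
                sql += " where " + String.join(" and ", conds);
            }
            sqlVector.put(e.getKey(), sql);
        }
        return sqlVector;
    }

    /** Select well number and monthly output, restricted by block and output. */
    static Map<String, String> example() {
        return translate(
                Arrays.asList("wellID", "outPutMon"),
                Arrays.asList(
                        new String[] {"blockName", "= 'WangXuZHuang'"},
                        new String[] {"outPutMon", "> 10"}),
                rules());
    }
}
```

Grouping by source table before assembling the statements keeps Step 3 a plain concatenation, which matches the role Rows 19-20 play in Table I.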

B. SPARQL Query Example

We take the following query as an example to illustrate the SPARQL-SQL translation process.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dg: <http://huabeiOilField.com/schema#>
SELECT ?x ?y
WHERE {
  ?a rdf:type OilWell .
  ?a weonto:wellID ?x .
  ?a weonto:output ?b .
  ?b rdf:type OutPut .
  ?b weonto:outPutMon ?y .
  ?a weonto:blockName "WangXuZHuang" .
  FILTER (?y > 10)
}

This query retrieves the well number and monthly oil-production information of the oil wells that are located in the WangXuZHuang block and underwent a fracture measure in October 2009. In addition, the monthly oil production must exceed 10 tons.

Executing this algorithm generates the relevant records in the four tables, as shown in Fig. 3.

Figure 3. Four tables Generated by translation algorithm

This SPARQL query is then translated into two SQL statements, one for each of the two distributed data sources.

• For data source D2, the SQL is constructed as follows: Select csxm.jh from csxm where csxm.qk = 'WangXuZHuang' and csxm.cslx = 'yl'.

• For data source D1, the SQL is constructed as follows: Select daa01.jh, daa01.ycy from daa01 where daa01.qk = 'WangXuZHuang' and daa01.ycy > 10.

Finally, these two sub-results are combined to obtain the final query results.
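The paper does not spell out how the two sub-results are combined. One plausible reading, sketched below as an assumption, is that the well numbers returned by D2 restrict the rows returned by D1, which corresponds to the parameter-passing situation described earlier. `JoinCombiner` and `joinOnWell` are hypothetical names, and the rows used to exercise them are made-up sample values.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Assumed combination step: keep D1 rows whose well number D2 also returned.
final class JoinCombiner {

    /**
     * @param d2Wells well numbers (jh) returned by the D2 sub-query
     * @param d1Rows  rows {jh, ycy} returned by the D1 sub-query
     * @return the matching D1 rows formatted as "jh,ycy"
     */
    static List<String> joinOnWell(List<String> d2Wells, List<String[]> d1Rows) {
        Set<String> wanted = new HashSet<>(d2Wells);
        List<String> out = new ArrayList<>();
        for (String[] row : d1Rows) {
            if (wanted.contains(row[0])) {
                out.add(row[0] + "," + row[1]);
            }
        }
        return out;
    }
}
```

Performing the restriction in the Results Processor keeps each local query independent; pushing the D2 well numbers into the D1 Where clause would be an alternative that transfers fewer rows.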

V. SYSTEM IMPLEMENTATION

Based on the above methods, we implemented a prototype of WeVDC in Java. In this system, the user interfaces are developed with the Swing package, Jena APIs are used to access WeOnto, and JDBC-Thin APIs are used in the Wrapper to connect to and access the relational databases.

Fig. 4 shows the interface for SVR definition. The tree controls on the left and right display the hierarchical structure of WeOnto and the logical model of the data source, respectively; the middle part is the mapping setting area. With this interface, users only need to select one semantic path of WeOnto for every column of a table.

Figure 4. Interface of SVR Definition

Fig. 5 displays the interface of the Query Constructor. The query shown retrieves information on the oil wells that are located in the Guan109 block and underwent a fracture measure in October 2008; the detailed query items include fracture type, fracture liquid, monthly oil production and production mode in March 2009. Through the setting procedure, the query requirement is translated into a SPARQL statement and submitted to the Query Parser.

Figure 5. Interface of Query Constructor

The results of this query are shown in Fig. 6. To get further detailed information on a result item, users can click the “Query” button at the end of the item, and the returned results are displayed in popup windows.


Figure 6. Results Display of Semantic Query

VI. CONCLUSIONS

To address schema and semantic heterogeneity among distributed data sources in the Well-Engineering domain, this paper proposes a semantic-based WeVDC implementation that creates semantic views between a global ontology and the data sources. Based on the mapping rules defined in the semantic views, WeVDC translates a global SPARQL query into a series of SQL sub-statements and provides users with unified, transparent and semantic query functions with no need to load data from the local data sources. Future work on WeVDC is to automate the creation of SVRs and to provide a certain reasoning ability.

ACKNOWLEDGMENT

This work is supported in part by the National Natural Science Foundation of China under Grant No. 60373008, and the Key Project of Chinese Ministry of Education under Grant No. 108008.

REFERENCES

[1] Archuleta, R. J., J. Steidl, “The COSMOS Virtual Data Center: A Web Portal for Strong Motion Data Dissemination,” Seismological Research Letters, 2006, pp. 651-658.

[2] King, G., “An introduction to the Dataverse Network as an infrastructure for data sharing,” Sociological Methods & Research, 2007, pp. 173-199.

[3] Y. Wangcheng, J. Wangzong, ZH. Zhangde, “Research on Information Integration Platform for Gas Field Based on Virtual Data Center,” Application Research of Computers, 2006, pp. 54-56.

[4] BERNERS-LEE T, HENDLER J, LASSILA O, “The Semantic Web,” Scientific American Magazine, 2001, pp. 34-43.

[5] CARAGEA D, PATHAK J, BAO J, “Information Integration and Knowledge Acquisition from Semantically Heterogeneous Biological Data Sources,” Proc. the Data Integration in Life Sciences. Berlin: Springer-Verlag, 2005, pp. 175-190.

[6] CHEN H, WANG Y, WANG H, “Towards a Semantic Web of Relational Databases: a Practical Semantic Toolkit and an In-Use Case from Traditional Chinese Medicine,” Proc. the 5th International Semantic Web Conference. Berlin: Springer, 2006, pp. 750-763.

[7] DUSCHKA O M, GENESERETH M R, “Answering recursive queries using views,” Proc. 16 ACM Sigact Sigmod Sigart Symp, 1997, pp. 109-116.
