A Framework for Web Service Discovery Based on Ontology Similarity

Lianjie Chen
School of Computer Science & Engineering, Southeast University
Nanjing, P.R. China
[email protected]

Lei Xu
State Key Laboratory for Novel Software Technology, Nanjing University
Nanjing, P.R. China
[email protected]

Abstract- With the rapid growth of published Web services, matchmaking of relevant Web services has become a significant challenge. In this paper, we present a novel framework based on a genetic algorithm and a domain ontology for semantic Web Service discovery. We calculate the similarity between any two abstract models of Web services, and use a genetic algorithm-based optimization procedure, which relies on given sample data, to find the combination of model parameters that gives the best performance. The results show high precision and recall for our framework, especially when ontological matching is involved.

Keywords- Web Service Discovery; Semantic Similarity; Genetic algorithm; Ontology

I. INTRODUCTION

Currently, there are mainly two kinds of strategies for Web Service discovery, depending on how the Web Services are described [1]: users search for a particular Web Service based on keywords or predefined taxonomies, as in Universal Description, Discovery and Integration (UDDI); or users calculate the semantic similarity, when the Web Services are described in semantic languages.

A growing number of researchers have shown that matching by keywords or predefined taxonomies is imperfect [2, 3, 10, 11, 13, 14], and semantic similarity has recently drawn significant attention. However, a static discovery model lacks adaptability. In our work, we design a semantic Web Service discovery method based on a genetic algorithm, and we also present a dedicated model for evaluating the similarity of ontological concepts. The discovery framework automatically assigns the weight of each similarity factor based on a set of sample data. This design is intended to make the discovery of Web Services efficient.

This paper is organized as follows: Section 2 reviews related work on state-of-the-art discovery methods for Web Services. Section 3 gives an overview of our framework. Section 4 presents several similarity calculation methods. Section 5 describes the experimental process. Finally, Section 6 gives a discussion and remarks on future work.

II. RELATED WORK

There are two main strategies for Web Service discovery: WSDL-based and ontology-based. Paper [13] summarized the Web Service discovery methods as shown in Figure 1.

Figure 1. Summary of Web Services Discovery Methods.

Paper [1] proposed the UDDI Registry By Example (URBE) algorithm to evaluate the similarity of Web service interfaces based on WSDL, but its parameter tuning process lacks principled supervision. Paper [2] presented a method for a multi-ontology and federated registry environment. Paper [5] presented an approach to ontology matching in the semantic Web. Paper [6] used a genetic algorithm for ontology matching. Going further, Papers [7, 12] applied P2P technology to Web Service discovery, so that a central registry is not required. Similarly, Paper [8] described a crawler engine for large-scale discovery of Web Services, which explores Web services across multiple UDDI Business Registries. Paper [9] suggested that personal opinions on service functionality, quality or invocation cost should be taken into account through a collaborative tagging system. In general, the main problems of these methods are a lack of flexibility or the difficulty of putting them into practice.

III. OVERVIEW OF OUR FRAMEWORK

Our framework is designed for better performance as well as practicability. Figure 2 shows the structure of our framework.

First, the model requires some auxiliary utility functions. Second, Similarity for texts, based on a Word Corpus, is used to calculate the similarity between the descriptions of Web Services. Third, Similarity for names relies on WordNet and the Word Corpus to compute the similarity of names. Fourth, Similarity for concepts stands for the semantic similarity. Finally, the Genetic Algorithm model is used to tune the parameters that decide the contribution of each part.

2010 Fifth IEEE International Symposium on Service Oriented System Engineering

978-0-7695-4081-8/10 $26.00 2010 IEEE, DOI 10.1109/SOSE.2010.30

Figure 2. A Framework for Web Service Discovery.

IV. SIMILARITY CALCULATING METHODS

The key components of our framework are the similarity calculating methods based on the abstract model of a Web Service, including the similarity of texts, names, and concepts, and also the similarity of the structure of the entire Web Service. In addition, we implement an algorithm named MaxFunction, which is used in the similarity calculation.

A. Abstract Model of Web Service

Formally, the abstract model of a Web Service, which represents the structure of a semantic Web Service, can be defined as follows:

Definition 1 ASM = {DT, WSN, OPSet {OPN, INPS {PN, C}, OUTPS {PN, C}}}.

where
DT: textual description of the Web Service;
WSN: name of the Web Service;
OPSet: set of operations of the Web Service;
OPN: name of the operation;
INPS: set of inputs of the operation;
PN: name of a parameter;
C: ontological concept of a parameter;
OUTPS: set of outputs of the operation, defined in the same way as INPS.

This ASM structure is the basis of our similarity calculating method, and our main job is to build a similarity framework for these ASM structures.
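The ASM of Definition 1 can be pictured as a small nested record. The following is only an illustrative sketch: the class and field names are our own, the concept is assumed to be stored as a plain string (for example an ontology URI), and the sample service and its concept URIs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Parameter:
    name: str     # PN: name of the parameter
    concept: str  # C: ontological concept (assumed here to be a URI string)


@dataclass
class Operation:
    name: str                                               # OPN
    inputs: List[Parameter] = field(default_factory=list)   # INPS
    outputs: List[Parameter] = field(default_factory=list)  # OUTPS


@dataclass
class AbstractServiceModel:
    description: str                                          # DT
    name: str                                                 # WSN
    operations: List[Operation] = field(default_factory=list)  # OPSet


# Hypothetical example instance; the URIs are placeholders, not real ontologies.
example = AbstractServiceModel(
    description="Returns the price of a book",
    name="BookPriceService",
    operations=[
        Operation(
            name="getPrice",
            inputs=[Parameter("bookTitle", "http://example.org/onto#Book")],
            outputs=[Parameter("price", "http://example.org/onto#Price")],
        )
    ],
)
```

Each similarity measure in this section then compares the corresponding fields of two such records.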

B. MaxFunction

The calculation of similarity for names and for properties of concepts presented below relies on a special maximization function named MaxFunction, which solves a weighted bipartite graph matching problem to obtain the maximum match.

Normally, the Kuhn-Munkres algorithm [4] (Kuhn, 1955; Munkres, 1957) is the traditional algorithm for perfect matching in bipartite graphs. In our work, however, we use a different method, described as follows:

Definition 2: MaxFunction
Input: Similarity sim; Set set1; Set set2
Output: double maxValue

    Set minLength = min(set1.length, set2.length);
    Set wMatrix[i][j] = sim(set1.element_i, set2.element_j);
    Set location = new int[set1.length];
    Set sumList = new ArrayList();

Process: maxMatch
Begin
    Match(0);
    return max(sumList) * 2 / (set1.length + set2.length);
End;

Process: Match(x)
Begin
    If x is the last row and there exist minLength edges selected
        Add the sum value to sumList;
        Return;
    For each column index do
    Begin
        If no conflict
            Add the index to location Array;
            Match(x+1); // Recursion
    End;
End

From this definition, we can see that the core function is Match(x), which is a recursive process. The goal of the algorithm is to enumerate all maximum matchings over an M*N weight matrix, where each matching selects at most one element from each row and from each column, and then to return 2*max/(M+N) as the final value, where max is the largest matching sum found.
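As a rough illustration of Definition 2, the sketch below enumerates the matchings by recursion and applies the 2*max/(M+N) normalization. It is not the authors' implementation: the function name max_function, the transposition step that makes the shorter set the row side, and the exact-match similarity in the usage note are our own assumptions. Because it is an exhaustive search, it is only practical for small sets such as parameter lists.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def max_function(sim: Callable[[T, T], float],
                 set1: Sequence[T], set2: Sequence[T]) -> float:
    """Best one-to-one matching sum, normalized as 2 * max / (m + n)."""
    m, n = len(set1), len(set2)
    if m == 0 or n == 0:
        return 0.0
    # Weight matrix: w[i][j] = sim(set1[i], set2[j]).
    w: List[List[float]] = [[sim(a, b) for b in set2] for a in set1]
    if m > n:
        # Assumption: transpose so that every row can receive a column.
        w = [list(col) for col in zip(*w)]
    rows, cols = len(w), len(w[0])
    used = [False] * cols
    best = 0.0

    def match(row: int, total: float) -> None:
        nonlocal best
        if row == rows:            # min(m, n) edges have been selected
            best = max(best, total)
            return
        for c in range(cols):
            if not used[c]:        # "no conflict": column not yet taken
                used[c] = True
                match(row + 1, total + w[row][c])
                used[c] = False

    match(0, 0.0)
    return 2.0 * best / (m + n)
```

For example, with an exact-match similarity (1.0 for equal strings, 0.0 otherwise), max_function applied to ["a", "b"] and ["b", "a", "c"] selects two edges of weight 1 and yields 2*2/(2+3) = 0.8.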

C. Similarity of Texts

Usually, the textual description of a Web Service contained in OWL-S files conveys the principal meaning of the service, so a similarity of texts is essential to our method.

Here we calculate the similarity of texts based on Term Frequency (TF). First, we split the sentences of the description into terms and stem the terms based on the Word Corpus; then we set a weight for each word according to TF. Namely, we assume that the description text is split as follows:

$W = \{word_1, word_2, \ldots, word_n\}$. Then for each $word_i$, the weight $Weight_i$ is defined as:

$$ Weight_i = 1 \Big/ \sum_{j=1}^{N} \{\, 1 \mid word_i = word_j \,\} $$

After that, we can establish the Vector Space Model (VSM) for the texts being compared, and the cosine similarity is taken as the similarity of texts.
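The VSM comparison can be sketched as follows. This is a simplified illustration under stated assumptions: terms are obtained by lower-casing and splitting on whitespace (no stemming and no Word Corpus), and raw term counts are used as vector components rather than the weighting defined above. The function names are our own.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    # Assumption: whitespace tokenization, no stemming or stop-word removal.
    return Counter(text.lower().split())


def cosine_similarity(text1: str, text2: str) -> float:
    v1, v2 = term_vector(text1), term_vector(text2)
    # Dot product over the terms the two texts share.
    dot = sum(v1[t] * v2[t] for t in v1.keys() & v2.keys())
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)
```

Identical descriptions score 1, descriptions without any common term score 0, and a different term weighting can be substituted by changing term_vector only.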

D. Similarity of Names

The similarity of names is used frequently in this framework. This process contains three phases.

First, we tokenize the textual names and eliminate all kinds of special characters, such as '-', '_', digits, and so on.

Then, we present a method for calculating the similarity between two terms, which combines Syntactic Similarity and Semantic Similarity.

Definition 3: $SimTerm(t1,t2) = (1-\alpha) \cdot SimSyntacticTerm(t1,t2) + \alpha \cdot SimSemanticTerm(t1,t2), \quad \alpha \in [0,1]$

where

$$ SimSyntacticTerm(t1,t2) = \frac{\max(|t1|,|t2|) - Levenshtein(t1,t2)}{\max(|t1|,|t2|)} $$

$$ SimSemanticTerm(t1,t2) = \begin{cases} 1, & t1 \in synonyms(t2) \text{ or } t2 \in synonyms(t1) \\ \ldots, & L = dist(synonyms(t1), synonyms(t2)) \text{ and } L \ldots 3 \\ 0, & \text{else} \end{cases} $$
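The syntactic part of Definition 3 and the weighted combination can be sketched as below. This is a minimal illustration, not the authors' code: the weight is called alpha here, the function names are our own, and the semantic similarity is passed in as a caller-supplied function (an assumption) because it depends on a synonym resource such as WordNet.

```python
from typing import Callable


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def sim_syntactic_term(t1: str, t2: str) -> float:
    longest = max(len(t1), len(t2))
    if longest == 0:
        return 1.0  # assumption: two empty terms are treated as identical
    return (longest - levenshtein(t1, t2)) / longest


def sim_term(t1: str, t2: str,
             sim_semantic: Callable[[str, str], float],
             alpha: float = 0.5) -> float:
    # Weighted combination of syntactic and semantic similarity.
    return (1 - alpha) * sim_syntactic_term(t1, t2) + alpha * sim_semantic(t1, t2)
```

For instance, "kitten" and "sitting" have an edit distance of 3 and a maximum length of 7, so their syntactic similarity is 4/7.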

get a solution whose P, R and F values are all greater than 0.95, or when 200 generations have been reached, the genetic process ends.

To obtain a more reliable result, we repeated the GA process six times and took the average values as the tuning result, as shown in Table 1.

This shows that it is possible to obtain a combination of parameters that gives high performance. In fact, we repeated this process many times and computed the average values; the results were stable across runs.
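The stopping rule described above (a solution scoring above 0.95, or 200 generations) can be illustrated with a generic genetic loop. Everything beyond that rule is an assumption made for the sketch: the name tune_weights, the population size, truncation selection, one-point crossover, Gaussian mutation, elitism, and a fitness function supplied by the caller that returns a single score in [0, 1] (for example the minimum of P, R and F on the sample data).

```python
import random
from typing import Callable, List, Tuple


def tune_weights(fitness: Callable[[List[float]], float], n_weights: int,
                 pop_size: int = 30, max_generations: int = 200,
                 target: float = 0.95, mutation_rate: float = 0.1,
                 seed: int = 0) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    # Each individual is a vector of weights in [0, 1].
    population = [[rng.random() for _ in range(n_weights)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(max_generations):
        if fitness(best) > target:        # good enough: stop early
            break
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # keep the fitter half as parents
        children = []
        while len(children) < pop_size - 1:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_weights) if n_weights > 1 else 0
            child = a[:cut] + b[cut:]      # one-point crossover
            child = [min(1.0, max(0.0, g + rng.gauss(0, 0.1)))
                     if rng.random() < mutation_rate else g
                     for g in child]       # clamped Gaussian mutation
            children.append(child)
        population = [best] + children     # elitism: never lose the best
        best = max(population, key=fitness)
    return best, fitness(best)
```

Repeating the call with different seeds and averaging the returned weights mirrors the repeated runs reported in Table 1.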

TABLE I. PARAMETER TUNING

Parameter       Time-1  Time-2  Time-3  Time-4  Time-5  Time-6  Average
Wtext           0.51    0.17    0.61    0.23    0.49    0.15    0.36
Wop             0.49    0.83    0.39    0.77    0.51    0.85    0.64
Wdesc           0.43    0.00    0.54    0.72    0.00    0.75    0.41
Wname           0.57    1.00    0.46    0.28    1.00    0.25    0.59
Wopname         0.55    0.12    0.84    0.10    0.46    0.06    0.36
Wparam          0.45    0.88    0.16    0.90    0.54    0.94    0.64
Wparamin        0.70    0.65    0.04    0.84    0.90    0.99    0.69
Wparamout       0.30    0.35    0.96    0.16    0.10    0.01    0.31
Wparamname      0.83    0.03    0.75    0.56    0.84    0.00    0.50
Wparamconcept   0.17    0.97    0.25    0.44    0.16    1.00    0.50
Wdist           0.41    0.41    0.73    0.36    0.08    0.88    0.48
Wprop           0.59    0.59    0.27    0.64    0.92    0.12    0.52
                0.96    2.0     1.06    0.10    0.28    0.07    0.74
                0.75    0.58    0.92    0.93    0.30    0.27    0.63

C. Case Study

In this section, we provide the evaluation results from our experiments, using the combination of parameters generated by the genetic process described in Section 5.2. To test the performance of the matching method, we carry out 9 randomly chosen queries under two test cases: textual matching alone and ontology matching involved.

The main performance criteria adopted are the average measures of Precision, Recall and F. The results are shown in Table 2.

From Table 2, we can see that both cases achieve good results, especially in precision. The ontology matching case performs better, however: for every query its P and R values are at least as high as those of textual matching alone, and its F value is higher.

TABLE II. PARAMETER TUNING

                                      Textual matching alone   Ontology matching involved
Number  QueryWebService               P      R      F          P      R      F
1       Comedy Film finder service    0.93   0.75   0.83       0.93   0.79   0.85
2       BookShopping                  0.90   0.73   0.81       0.95   0.73   0.83
3       HotelInfoService              0.93   0.67   0.78       0.96   0.69   0.81
4       AvailableVideoService         0.93   0.85   0.89       0.94   0.92   0.93
5       AvailableColaService          0.88   0.78   0.82       0.88   0.83   0.86
6       SurfingDestinationService     0.83   0.60   0.69       0.92   0.70   0.80
7       HikingSurfingDestination      1.00   0.52   0.68       1.00   0.57   0.73
8       NovelAuthorService            0.86   0.65   0.74       0.88   0.75   0.81
9       UniversityLecturerService     1.00   0.76   0.86       1.00   0.86   0.92
10      Average                       0.92   0.70   0.79       0.94   0.76   0.84
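The F values in Table 2 are consistent with the usual harmonic mean of precision and recall, F = 2PR/(P+R). The short sketch below shows how the three measures can be computed for one query; the function names and the representation of results as sets of service identifiers are our own assumptions.

```python
from typing import Iterable, Tuple


def f_measure(p: float, r: float) -> float:
    # Harmonic mean of precision and recall; 0 when both are 0.
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0


def precision_recall_f(retrieved: Iterable[str],
                       relevant: Iterable[str]) -> Tuple[float, float, float]:
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)  # relevant services returned
    p = hits / len(retrieved_set) if retrieved_set else 0.0
    r = hits / len(relevant_set) if relevant_set else 0.0
    return p, r, f_measure(p, r)
```

For the first query under textual matching alone, P = 0.93 and R = 0.75 give F = 2*0.93*0.75/1.68, which rounds to 0.83 as reported.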

D. Analysis and Conclusion

Our experiments contain two phases. The first is the parameter tuning, from which we may draw the following conclusions:

1. The weight of textual similarity is lower than the weight of the operation parts. This is reasonable, since the operations represent the function of a Web Service.

2. The contribution of the description in the service interface is small, because the description is not detailed in most interfaces.

3. The similarity of ontological concepts plays an important part in the framework.

The second is the case study on performance based on the criteria of P, R and F. This study shows the advantage of ontology matching, as its average performance values are better than those of textual matching.

VI. DISCUSSION AND FUTURE WORK

Service discovery is a key topic in the SOA research community. In this paper, we have presented a practical and adaptive Web Service discovery framework based on ontology similarity. With the framework in place, a Web service requestor submits the interface of the desired Web service and obtains a list of similar Web services ordered by similarity.

For future work, we are looking for better mechanisms to apply in this framework in order to improve the precision and recall of matchmaking. For example, we will consider QoS as a part of the Web Service interface, since quality of service is essential to most service consumers.

ACKNOWLEDGMENT

This work is partially supported by the National Natural Science Foundation of China (NSFC) (No. 60873050), the Opening Foundation of the State Key Laboratory of Software Engineering in Wuhan University (SKLSE20080717), and the Opening Foundation of the State Key Laboratory for Novel Software Technology in Nanjing University (ZZKT2008F12).

REFERENCES

[1] Pierluigi Plebani and Barbara Pernici, "URBE: Web Service Retrieval Based on Similarity Evaluation," IEEE Transactions on Knowledge and Data Engineering, 2009, pp. 1629-1642.

[2] Kaarthik Sivashanmugam, Kunal Verma, Amit Sheth, "Discovery of Web services in a federated registry environment," Proceedings of the IEEE International Conference on Web Services (ICWS 2004), 2004, pp. 270-278.

[3] Myo-Myo Naing, Ee-Peng Lim, Dion Hoe-Lian Goh, "A Survey of Ontology-Based Web Annotation," Proceedings of the 1st International Conference on Computer Applications, 2003, pp. 113-123.

[4] Yee Seng Chan, Hwee Tou Ng, "MAXSIM: A Maximum Similarity Metric for Machine Translation Evaluation," Proceedings of ACL-08: HLT, 2008, pp. 55-62.

[5] Rubo Zhang, Ying Wang, Jing Wang, "Research on Ontology Matching Approach in Semantic Web," International Conference on Internet Computing in Science and Engineering, 2008, pp. 254-257.

[6] Junli Wang, Zhijun Ding, Changjun Jiang, "GAOM: Genetic Algorithm based Ontology Matching," Proceedings of the IEEE Asia-Pacific Conference on Services Computing, 2006, pp. 617-620.

[7] Fuyong Yuan, Jian Liu, Chunxia Yin, Yulian Zhang, "A Novel Methodology for Web Services Discovery in Gnutella-like Networks," Third International IEEE Conference on Signal-Image Technologies and Internet-Based System, 2008, pp. 231-238.

[8] Eyhab Al-Masri and Qusay H. Mahmoud, "WSCE: A Crawler Engine for Large-Scale Discovery of Web Services," IEEE International Conference on Web Services, 2007, pp. 1104-1111.

[9] Uddam Chukmol, Acha-Nabila Benharkat, Youssef Amghar, "Enhancing Web Service Discovery by using Collaborative Tagging System," IEEE 4th International Conference on Next Generation Web Services Practices, 2008, pp. 54-59.

[10] Colin Atkinson, Philipp Bostan, Oliver Hummel and Dietmar Stoll, "A Practical Approach to Web Service Discovery and Retrieval," IEEE International Conference on Web Services, 2007, pp. 241-248.

[11] Shen Derong, Yu Ge, Cao Yu, Kou Yue, Nie Tiezheng, "An Effective Web Services Discovery Strategy for Web Services Composition," Proceedings of the 2005 Fifth International Conference on Computer and Information Technology, 2005, pp. 257-263.

[12] Mohamed Gharzouli, Mahmoud Boufaida, "A Generic P2P Collaborative Strategy for Discovering and Composing Semantic Web services," 2009 Fourth International Conference on Internet and Web Applications and Services, 2009, pp. 449-454.

[13] Chen Wu and Elizabeth Chang, "Searching services on the Web: A public Web services discovery approach," Third International IEEE Conference on Signal-Image Technologies and Internet-Based System, 2008, pp. 321-328.

[14] Jian Wu, Zhaohui Wu, "Similarity-based Web Service Matchmaking," Proceedings of the 2005 IEEE International Conference on Services Computing, 2005, pp. 287-294.
