[ieee 2011 ieee 8th international conference on e-business engineering (icebe) - beijing, china...

8
A Dominance-Based Approach for Selection in Service Composition Chaogang Tang 1,2,3 , Qing Li 2,3 , Yan Xiong 1,3 , An Liu 1,2,3 , and Shiting Wen 1,2,3 1 Department of Computer Science and Technology, University of Science & Technology of China, Hefei, China 2 Department of Computer Science, City University of Hong Kong, Hong Kong, China 3 Joint Research Lab of Excellence, CityU-USTC Advanced Research Institute, Suzhou, China [email protected] , [email protected] , [email protected] , [email protected] ,[email protected] Abstract—Considering the dynamics of Web service environments and concurrence of requests, we propose a service selection scheme, which adopts top-k dominating queries to generate a composition solution rather than only selects the best composition solution for a given request. The experimental results have investigated efficiency and effectiveness of our approach and shown that it outperforms baseline and traditional methods for service selection. Keywords—dynamics; top-k; dominating; concurrency; service selection I. INTRODUCTION Web services are platform-independent, loosely coupled, self-contained, programmable web-enabled applications that can be described, published, discovered, coordinated, and configured using XML artifacts (open standards) for the purpose of developing distributed interoperable applications. Due to the wide application of SOA, large quantities of Web services have been deployed over the Internet or the Web for service users. As the requirements from service users become more and more complicated, an atomic service often can not handle a request by itself due to the limited capabilities. This calls for composite services to combine existing distributed atomic services, so as to perform the complicated functionalities. The selection of component services (atomic service) for a composite service is vital yet, at the same time, also very arduous. This is because on one hand, the number of alternative Web services which provide the same functionality but differ in quality parameters, is growing quickly; on the other hand, inappropriate selection of component services can result in the degradation of a composite service in terms of the quality of web service (QoWS). For example, for a composite service that contains a parallel structure, the response time of the composite service is up to the component service having the maximum response time rather than other ones within the parallel module. Therefore the service selection in the parallel module is to make sure that the response time of the component service with the maximum response time is as small as possible. If we take into account the dynamics of Web service environments, the service selection will become even more complicated. This is because a current-best candidate service for a component service may not be the best one at a later time. In the dynamic Web service environments, the performance of the Internet is unpredictable; the reliability and effectiveness of remote Web services are also unclear. Therefore, it can not be guaranteed that the QoWS attributes of Web services do not fluctuate with the dynamic Web service environments. Specially, we list three reasons for this dynamic fluctuation: continuous responses to service requests. The overload of service server will give rise to the fluctuation of the overall performance of provided service, and usually this fluctuation turns into QoWS degradation (e.g. longer response time); network performance. Some QoWS attributes (e.g. response time) may vary or subject to the quality of network (e.g. long latency, network disconnection, etc.); intentional behaviors. For example, service providers sometimes do not deliver their claimed QoWS, so as to reduce cost or for other purposes. In this paper, we focus on the former two reasons that lead to QoWS fluctuation. One serious consequence that results from the aforementioned QoWS degradation is the increase of response time. As an important metrics of QoWS criteria, the growth of response time, which means that service users need to wait for a longer time than they expect, may even result in an unacceptable completion time, therefore considerable losses to service users may occur (e.g., in terms of lost business revenue, time, or penalties incurred by missing contractual deadlines). We observe that in the context of dynamic service environments, few works have taken concurrency into consideration when a composite service is planned. By concurrency here we mean that a lot of requests to a composite service arrive concurrently. We illustrate this by a simple example here. Suppose that a composite service CS consists of two component services denoted as s 1 and s 2 respectively. There are three candidate services (s 11 , s 12 and s 13 ) which can perform the functionality of s 1 and two candidate services (s 21 and s 22 ) for s 2 . According to the traditional service selection method which, based on a utility function, aggregates all the values from all the dimensions with respect to QoWS into a single value. Let us assume the best composition scheme for our example is s 11 and s 21 , which means that when CS receives a request, it will employ s 11 and s 21 to execute it. So far there have been a lot of real-time systems such as real-time ticket ordering system, stock trading, and navigational driving guide, etc., which are characterized by timeliness and lots of concurrent requests. Assume for our example that there are 1000 2011 Eighth IEEE International Conference on e-Business Engineering 978-0-7695-4518-9/11 $26.00 © 2011 IEEE DOI 10.1109/ICEBE.2011.16 36

Upload: shiting

Post on 21-Mar-2017

217 views

Category:

Documents


5 download

TRANSCRIPT

Page 1: [IEEE 2011 IEEE 8th International Conference on e-Business Engineering (ICEBE) - Beijing, China (2011.10.19-2011.10.21)] 2011 IEEE 8th International Conference on e-Business Engineering

A Dominance-Based Approach for Selection in Service Composition Chaogang Tang1,2,3, Qing Li2,3, Yan Xiong1,3 , An Liu1,2,3, and Shiting Wen1,2,3

1Department of Computer Science and Technology, University of Science & Technology of China, Hefei, China

2Department of Computer Science, City University of Hong Kong, Hong Kong, China 3Joint Research Lab of Excellence, CityU-USTC Advanced Research Institute, Suzhou, China

[email protected], [email protected], [email protected],[email protected],[email protected]

Abstract—Considering the dynamics of Web service environments and concurrence of requests, we propose a service selection scheme, which adopts top-k dominating queries to generate a composition solution rather than only selects the best composition solution for a given request. The experimental results have investigated efficiency and effectiveness of our approach and shown that it outperforms baseline and traditional methods for service selection.

Keywords—dynamics; top-k; dominating; concurrency; service selection

I. INTRODUCTION

Web services are platform-independent, loosely coupled, self-contained, programmable web-enabled applications that can be described, published, discovered, coordinated, and configured using XML artifacts (open standards) for the purpose of developing distributed interoperable applications. Due to the wide application of SOA, large quantities of Web services have been deployed over the Internet or the Web for service users. As the requirements from service users become more and more complicated, an atomic service often can not handle a request by itself due to the limited capabilities. This calls for composite services to combine existing distributed atomic services, so as to perform the complicated functionalities. The selection of component services (atomic service) for a composite service is vital yet, at the same time, also very arduous. This is because on one hand, the number of alternative Web services which provide the same functionality but differ in quality parameters, is growing quickly; on the other hand, inappropriate selection of component services can result in the degradation of a composite service in terms of the quality of web service (QoWS). For example, for a composite service that contains a parallel structure, the response time of the composite service is up to the component service having the maximum response time rather than other ones within the parallel module. Therefore the service selection in the parallel module is to make sure that the response time of the component service with the maximum response time is as small as possible.

If we take into account the dynamics of Web service environments, the service selection will become even more complicated. This is because a current-best candidate service for a component service may not be the best one at a later time. In the dynamic Web service environments, the performance of the Internet is unpredictable; the reliability

and effectiveness of remote Web services are also unclear. Therefore, it can not be guaranteed that the QoWS attributes of Web services do not fluctuate with the dynamic Webservice environments. Specially, we list three reasons for this dynamic fluctuation:

• continuous responses to service requests. The overload of service server will give rise to the fluctuation of the overall performance of provided service, and usually this fluctuation turns into QoWS degradation (e.g. longer response time);

• network performance. Some QoWS attributes (e.g. response time) may vary or subject to the quality of network (e.g. long latency, network disconnection, etc.);

• intentional behaviors. For example, service providers sometimes do not deliver their claimed QoWS, so as to reduce cost or for other purposes.

In this paper, we focus on the former two reasons that lead to QoWS fluctuation. One serious consequence that results from the aforementioned QoWS degradation is the increase of response time. As an important metrics of QoWS criteria, the growth of response time, which means that service users need to wait for a longer time than they expect, may even result in an unacceptable completion time, therefore considerable losses to service users may occur (e.g., in terms of lost business revenue, time, or penalties incurred by missing contractual deadlines). We observe that in the context of dynamic service environments, few works have taken concurrency into consideration when a composite service is planned. By concurrency here we mean that a lot of requests to a composite service arrive concurrently. We illustrate this by a simple example here. Suppose that a composite service CS consists of two component services denoted as s1 and s2 respectively. There are three candidate services (s11, s12 and s13) which can perform the functionality of s1 and two candidate services (s21 and s22) for s2.According to the traditional service selection method which, based on a utility function, aggregates all the values from all the dimensions with respect to QoWS into a single value. Let us assume the best composition scheme for our example is s11 and s21, which means that when CS receives a request, it will employ s11 and s21 to execute it. So far there have been a lot of real-time systems such as real-time ticket ordering system, stock trading, and navigational driving guide, etc., which are characterized by timeliness and lots of concurrent requests. Assume for our example that there are 1000

2011 Eighth IEEE International Conference on e-Business Engineering

978-0-7695-4518-9/11 $26.00 © 2011 IEEE

DOI 10.1109/ICEBE.2011.16

36

Page 2: [IEEE 2011 IEEE 8th International Conference on e-Business Engineering (ICEBE) - Beijing, China (2011.10.19-2011.10.21)] 2011 IEEE 8th International Conference on e-Business Engineering

requests to CS arriving concurrently after CS has begun to execute a request. Based on existing service selection scheme, all the requests should be assigned to the component services s11 and s21. However, this kind of service selection scheme may not be appropriate when we take into consideration the concurrency of requests. Since the server workload for each component service is, during a successive time period, limited and prone to overload when it has to respond to lots of requests. Thus, such a service selection scheme renders two serious consequences: a) later requests will not be assigned to the current-best component service(s) as expected, due to the degradation of QoWS; b) many requests must be waiting in line, which may result in an unacceptable completion time and possibly bring about considerable losses to service users. In order to tackle the problem above, we propose in this paper an approach to achieving load balancing for service selection in service composition. Compared with the widely adopted traditional service selection scheme in which only the best candidate service in each service class is considered, our approach is to obtain koutstanding candidate services for each service class so as to avoid/reduce the overload problem. Specifically, for each incoming request q, we designate randomly one from the k solutions from each component service and further generate a composition solution to respond to q.

A top-k dominating query retrieves k services which dominate the highest number of services in an abstract service class, and it is a natural solution since this query provides data analysts with an intuitive way for finding significant services. Moreover, a big advantage (over the traditional approaches) is that it does not require any utility function or other ranking functions to be specified by users. The top-k dominating query has therefore attracted significant attention in recent years, especially in the database community [5, 4, 7, 2, 3, 16]. Given a d-dimensional data set, a point p (p1,…,pd) dominates another point q (q1,…,qd), if and only if ∀ i∈ [1, d], pi qi.ζ A service can also be regarded as a d-dimensional point in terms of QoWS attributes, so we can apply this top-k dominating query technique for our purpose in the context of Web services. In this paper, we present algorithms to retrieve top-k services, and then conduct extensive experiments to validate our approach. The experimental results show that our approach outperforms the baseline and traditional approaches.

The rest of the paper is organized as follows: Section 2 surveys some related works. Section 3 introduces some preliminaries of spatial indexing and formally describes the top-k dominating query of Web services. Section 4 presents our approach to retrieving the top-k dominating services, followed by the experimental reports in Section 5. The conclusion and future work finally come in Section 6.

ζ We use to denote better than and to denote better than or equal to.

II. RELATED WORK

In this section, we discuss some related works in the areas of Web service, top-k dominating queries and skylines. Skyline provides a unique perspective on dominance relationship and it totally depends on the intrinsic characteristics of the data. Given a d-dimensional data set, a point p is a skyline if and only if p is not dominated by other points in the data set. Skyline queries, which gather together all special data objects that are not dominated by each other in a data set, have received significant attention over recent years. The first general efficient skyline algorithm BNL (block-nested-loop) was proposed by Börzsönyi et al. [8]; BNL scans over the data set by comparing each object with every other object, and retrieves a data object only if this object is not dominated by any other object in the data set. The SFS (sort-filter-skyline) [10] and LESS (linear elimination sort for skyline) [11] algorithms are later proposed in order to improve the efficiency of BNL. Other algorithms like [12, 13, 14, 15, 19] are also proposed to compute the skylines. Meanwhile, the top-k dominating query has also played an increasingly significant role in applications such as multi-criteria decision making, data cleaning, and data integration [9] and so on. However, the top-k dominating queries are a little different from the skyline queries. Given a data set, the data object in a top-kdominating queries is not always a skyline of this data set, but it is guaranteed that the top-1 dominating data object must be a skyline. This is because a data object which dominates the largest number of data objects must not be dominated by any other data object, so it must be a skyline. Fig.1 shows an example where nine data points are located in a 2-dimensional space. Assume that a point p dominates another point q when p has smaller coordinates than q in both dimensions. We can observe that p1 and p3 in Fig. 1 are skylines, and they dominate seven and five points respectively. However, given a top-2 dominating query, we can also observe that p1 and p2 are the top-2 dominating points instead of p1 and p3, for the reason that p1 obviously dominates the largest number of points and p2 dominates the second largest number of points (i.e. six points) more than the number of points dominated by p3 (i.e. five points).

Most of the existing works have so far concentrated on addressing how to retrieve the top k dominating data objects [2, 3, 4, 7, 16]. The authors in [16] proposed a skyline-based top-k dominating algorithm (STD) for top-k dominating queries, where the data is indexed by R-tree [17]. It first retrieves the skylines of data set, and then outputs the skyline (say, o) which dominates the largest number of data objects in this skyline set as the top-1 dominating data object. After that it removes o from the data set and repeats the above process for k times.

Web services as a key technique for implementing service-oriented architecture (SOA) has been attached with more and more importance. With the rapid development of Web services, quite a few services are emerging, which provide the same functionality but may differ in QoWS. Therefore, subsequent works have focused on how to select the desired service from all functionality-similar services. In

37

Page 3: [IEEE 2011 IEEE 8th International Conference on e-Business Engineering (ICEBE) - Beijing, China (2011.10.19-2011.10.21)] 2011 IEEE 8th International Conference on e-Business Engineering

[1], a framework to evaluate the QoWS from a vast number of Web services is constructed, with an aim to enable quality-driven Web service selection. The proposed QoWS computation model includes not only generic criteria (e.g. price, response time, and availability, etc.) but also domain specific criteria which vary with different application backgrounds. The work in [6] mainly focuses on the trust level between service users and service providers. For the reasons of lacking enough experiences for the service users to obtain the best selection of Web services according to its QoWS, and that no guaranteed level of QoWS can be offered by service providers, they introduce a third party certification entity to verify the conformity of the trust level as well as the consistency of the QoWS claimed by service providers. The authors of [23, 24] propose to employ skyline and top-kdominating techniques respectively to tackle the drawback of the traditional method, namely, to require service users to assign weights to each QoWS attribute. However, all the above works have considered neither the nature of the Web service environments (i.e. dynamics) nor concurrence.Although the work [24] has taken into consideration the dynamics of the Web services, their judgment on whether a service provider S can currently provide a better service than another provider T is implicitly based only on the corresponding transactions occurred in the past. Such an implication, however, does not always hold especially in the very dynamic Web service environments.

Figure 1. Comparison of skylines and top-k dominating queries

III. PRELIMINARIES

In this section, we firstly discuss some foundations on the data structure to be used in this paper for multi-dimensional spatial access, and then formally describe the top-kdominating query of Web services.

A. Data Structure for Multi-dimensional Space Indexing Given an abstract service class S with n services which

have the same functionality, we need to retrieve the top-kdominating services. Obviously, a naïve way to obtain these services is as follows. For each service si in S, compare it with the remaining services one by one in S, and then obtain the number of services dominated by si. Clearly, this nested loop method is very expensive in terms of both CPU and I/O’s, since it requires quadratic cost with regards to the service class size |S|.

A service can be viewed as a d-dimensional data point in terms of QoWS attributes. In order to quickly obtain the top-

k dominating services, we adopt a multi-dimensional index aR-tree [18, 20, 7] over d-dimensional services in a service class, which can provide faster access to data points than the sequential scan. R-trees have attracted significant attention these years, because they provide effective access methods for multi-dimensional data and for processing spatial queries, such as range queries and skyline queries. The aR-tree,which evolves from R-tree, augments to each non-leaf entry of the R-tree an aggregate measure of all data points in the subtree pointed by it. Fig. 2 shows an example of an aR-treein a 2-dimensional space, where each non-leaf entry records the number of data points contained in the subtree pointed by it. For instance, the entry e1 in the root node contains four data points tagged with the number “4” around the entry and entry e3 contains two data points. Through the aR-tree, we can effectively and efficiently prune the search space. Assume that given a query Q, we want to find out the number of all data points covered by Q. Firstly, we search the root node and find that entry e1 intersects Q and e2doesn’t. Then we further search the subtree pointed by e1 and prune away the subtree pointed by e2 for the reason that any data point indexed by the subtree of e2 can not be covered by Q. Finally, we find that both entries e3 and e4 are spatially covered by Q, so we can conclude that the data points contained by e3 and e4 must also be contained by Q. The answer (i.e., 4) to the query can at last be returned by using the aR-tree directly, without requiring comparing each data point for Q.

Figure 2. An example aR-tree

Before proceeding further, we go through some notations which will be used later. For an aR-tree entry ei, (i.e. a minimum bounding rectangle (MBR) [17, 7]), we represent it in j-th dimension using the interval [LB(ei, j), UB(ei, j)],where LB (ei, j) and UB (ei, j) are lower and upper bounds of ei in the j-th dimension, respectively. We can obtain LB (ei, j)and UB (ei, j) from the projection of ei in the j-th dimension. Therefore, the lower and upper bounds of ei can be represented as:

LB(ei)=( LB (ei, 1), LB (ei, 2), …, LB (ei, d)) UB(ei)=( UB (ei,1), UB (ei,2), …, UB (ei, d))

Note that both LB(ei) and UB(ei) are virtual (conceptual) data points, which means they do not correspond to actual data points. However, through the lower and upper bounds of an entry, we can better understand the dominance relationship among data points and MBRs. Let us use Fig. 3 (in a 2-dimensional space) as an example. For the sake of consistency and clear presentation, we make the same

38

Page 4: [IEEE 2011 IEEE 8th International Conference on e-Business Engineering (ICEBE) - Beijing, China (2011.10.19-2011.10.21)] 2011 IEEE 8th International Conference on e-Business Engineering

assumption as that in Fig.1. Therefore, for each entry, the lower bound dominates the upper bound, i.e., LB(e1) UB(e1), LB(e2) UB(e2), and so on. As can be seen from Fig.3:

a) If some data point p dominates the lower bound of an entry e (i.e. p LB(e)), p must dominate all data points indexed under e. For instance, since p1 LB(e2), p1 must dominate p2 and p3 which are indexed under e2.

b) If some data point p dominates the upper bound but does not dominate the lower bound of an entry, p then may not dominate all data points indexed under that entry. For instance, since p3 dominates UB(e2) and p3 does not dominateLB(e2), p3 can dominate p6, but p3 can not dominate p2.

c) If some data point p is dominated by the upper bound of entry, all the data points indexed under e dominate p. For example, since UB(e3) dominates p5, p4 dominates p5.

d) There exists no dominance relationship between entry and data point. We use to denote this no-dominance relationship, so we have p6 UB(e3) (or LB(e3)).

Figure 3. Illustration of Dominance relationship

B. Problem Formulation Let S be an abstract service class with n services, where

each service has d attributes with regards to QoWS. We use si.attrk to denote the k-th dimensional value of QoWS for service si such that si can be represented as si=<si.attr1,si.attr2,…, si.attrd>.

DEFINITION 1: (Service Domination). A service sidominates another service sj if and only if si.attrk sj.attrkfor 1 k d and there exists at least one dimension (1d) such that si.attr > sj.attr .

Please note that the QoWS parameters have different taxonomies from different viewpoints. In [20], the authors distinguished two types of QoWS parameters: positive and negative. For a positive QoWS parameter, a higher value means a higher quality. For a negative QoWS parameter, a higher value means a lower quality. For example, reliability is positive while response time is negative. In order to make the comparison in Definition 1 unambiguous and uniform, we need to normalize the QoWS parameters. Given a service with QoWS parameters, we convert the negative parameters into the positive parameters by using the inverse of the negative parameters. For example, in terms of response time (negative parameter), an service s1 with response time 100ms

is better than another service s2 with response time 200ms, since the response time of s1 is smaller than that of s2; after conversion, s1 is still better than s2 with regards to response time, since the response time of s1 (i.e. 1/100 ms) is larger than that of s2 (i.e. 1/200 ms).

PROPERTY 1: If a service si dominates another service sjand sj dominates sk, then si dominates sk. That is, if si sj and sj sk, then si sk.

This property is obvious, so the proof is omitted. However, it is very powerful for pruning the search space.

DEFINITION 2: (Domination Score). Given a service s, we define the dominating score of s, denoted by (s), as:

(s) =|{s’ | s s’ , s’ and s belong to the same service class}|

In order to retrieve the top-k dominating services, we only need to obtain k services, whose dominating scores are larger than those of the remaining services. We will detail how to calculate the dominating scores in the next section.

PROPERTY 2: Given two services s and s’, if s dominates s’, then the dominating score of s is larger than that of s’.That is, s s’ (s) > (s’).PROOF: According to Property 1, s will dominate any service which is dominated by s’ if s dominates s’. In other words, the dominating score of s is at least larger than or equal to that of s’ plus one, since s dominates s’ meanwhile. Hence,

(s) > (s’). However, if (s) > (s’) holds, it does not mean that s must dominate s’. Consider points p2 and p3 in Fig.1 for example, though (p2) (i.e. 6) is larger than

(p3)(i.e. 5), p2 does not dominate p3.

IV. COMPUTING TOP-K DOMINATING SERVICES

In this section, we present aR-tree based algorithms for efficiently computing the top-k dominating services. Assume that the services have already been indexed by an aR-tree, in which all the entries in the leaf nodes record the references to the services.

A. Calculating (s) As explained in Section 3.1, the naïve method is

prohibitive due to quadratic cost with regards to the service class size |S|, in terms of both CPU and I/O’s. In our approach, we try to avoid comparing each service to all other services by using aR-tree. For the example in Fig.3, if we know that p1 LB(e2), then there is no necessity to further check the dominance relationships between p1 and the data points indexed under e2, because it is out of question that p1dominates all data points indexed under e2, according to the lower and upper bounds of entry (cf. section 3.1).

Recall that there are four cases of dominance relationships between an entry e and other services (ref. section 3.1). In our algorithm, we aim to find out the services which dominate LB(e) (assume that LB(e) UB(e)) when

39

Page 5: [IEEE 2011 IEEE 8th International Conference on e-Business Engineering (ICEBE) - Beijing, China (2011.10.19-2011.10.21)] 2011 IEEE 8th International Conference on e-Business Engineering

entry e is examined. As for those services which are dominated by UB(e), the dominating scores of them can not be calculated with the help of e. Initially, Z is set to the root node of the tree and (s) is set to 0 for each service s∈ I. Let e be the current entry in Z to be examined. For each service s∈ I, we traverse the aR-tree to find out how many services indexed by the aR-tree are dominated by s (i.e. (s)). If e is a non-leaf entry (line 3) and the current-being-examined ssatisfies that: s LB(e), we then increase the dominating score of s by COUNT(e) which is the number of services indexed under e (lines 4-5). If e is a non-leaf entry and the dominance relationship between e and s belongs to one of the two cases where (1) s UB(e) and s LB(e), or (2) s UB(e) and LB(e) s, then s may dominate some services indexed under e. However, it can not guarantee that s can dominate all services indexed under e, since there does not exist dominance relationship between s and LB(e) in (1), and even LB(e) s in (2), there are still not exact dominance relationships between s and other services indexed under e.Take Fig. 3 for example. Let s be p1 and e be e6, respectively. We can see that s UB(e) and LB(e) s, but we still do not know the exact dominance relationship between s and p2 (orp3) without comparing them. Therefore, it is not easy for us to immediately decide the number of services in e dominated by s. In this situation, we need to invoke the algorithm recursively on the child nodes pointed by e (lines 8-9). At last, if Z is a leaf node (i.e., e is a leaf entry), we just need to increase the dominating score of any service that dominatesLB(e) by COUNT(e) (lines 12-15).

1.2.

Algorithm 1 Dominating Score Algorithm ( ) Dominating Score (Node , service set )

services

entries

algorithm

for all dofor all

DSAZ I

s I∈

3.4. 5.6.7.8.

is non-leaf node

( ):= ( ) ( ); ( ) ( ) or ( )

doif then

if then

elseif then

e ZZ

s LB(e)s s count e

s UB e s LB e LB(e) s UB e

ψ ψ

+

Λ

9.10.11. 12.13.14.

read the child node pointed by ; Dominating Score( , );

end ifend if

elseif then

Z' eZ' I

s LB(e)

15.16.17.18.

( ):= ( ) ( );

end ifend if

end forend for

s s count eψ ψ +

DSA can correctly calculate the dominating scores for all services s∈ I at a single traversal of the tree. Suppose that component services are already indexed by an aR-tree as denoted in Fig.3, where the data points represent the services. The service set I is composed of p1, p2, and p3. Now

we calculate the dominating scores of the services in I step by step. According to DSA, for p1 entry e1 should first be examined, and then e3 is next, for the reason that p1 UB(e1)and p1 LB(e1). Because p1 dominates LB(e3), the dominating score of p1 (i.e. (p1)) is increased by COUNT(e3) (i.e. 2). Then e4 is examined, and (p1) is increased by COUNT(e4) since p1 also dominates LB(e4). When e2 is examined, DSA will recursively examine its child node entries, i.e. e5 and e6 respectively, due to that p1 UB(e2) and p1 LB(e2). When e6 is examined, since LB(e6) p1 UB(e6), DSA will examine the child node pointed by e6. The child node is a leaf node, so the dominance relationships between p1 and other services (i.e., p2 and p3) are examined by comparing p1 with them. The situation can be handled similarly when the entry e5 is examined. Therefore, the dominating score of p1 can be counted correctly. Finally, p2 and p3 also can be counted correctly in the same way as p1 does.

B. Retrieving the Top-k Dominating Services We now present another algorithm based on DSA to

retrieve the top-k dominating services. Note that we can obtain the top-k services by only using DSA, because after the score for each service is calculated, we just need to select the k services whose dominating scores are larger than others’. However, if the number of services is very large, say, a thousand services in I, it is quite time-consuming to calculate the dominating scores, as it needs to exhaustively check all services in I each time (line 7 and 20). Besides, we may need some appropriate sorting algorithm to sort the calculated services for efficiently looking up the top-kservices, which, however, incurs extra computational overheads.

Based on this observation, we present below a depth-first retrieving (DFR) algorithm, as shown in Algorithm 2, with an aim to greatly reduce the number of services in I when calculating the dominating scores. In particular, we adopt the idea proposed in [4] by using the data points which have been pruned to further prune the search space, and combine it with DSA to retrieve the top-k services. When traversing the entries from the root node of the tree denoted by “R.root”in Algorithm 2, we employ a max-heap H to store the dominating scores of the lower bounds of entries in the current node (lines 3-7). By doing this, we can preferentially search and calculate the services which dominate more services than others do (lines 9-10). In order to further prune the search space, a list [4] is maintained, which stores the pruned services and their corresponding dominating scores. Specially, the services in are not dominated by each other. If there exists some service l in , such that l dominates the lower bound of entry e, we then do not need to check the services indexed under e, and can thus abandon e directly without putting it into H. This is because according to Property 1, all the services indexed under e are dominated by l. When extracting the top entry from H and starting to traverse it, we employ another condition to control the repeat (lines 8), in addition to checking whether H is empty. The variable ρ represents the k-th highest dominating score

40

Page 6: [IEEE 2011 IEEE 8th International Conference on e-Business Engineering (ICEBE) - Beijing, China (2011.10.19-2011.10.21)] 2011 IEEE 8th International Conference on e-Business Engineering

1.2.

Algorithm 2 Depth-first Retrieving Algorithm ( ) Depth-first retrieving algorithm (Node )

:= ; : ; : ; : 0; DSA(

algorithmDFR

Z

H hR.roo

ρΦ = Φ = Φ =

3.4.5.6.7.8.

, { | . }); entries , enheap( , < , >); | |>0 and 's top entry's score>

for all doif then

end if end for while

t LB(e) e R roote R.root

s s LB(e)H e (e)

H H

ψ

ρ

∈∈

9.10.11.12.13.

:=deheap( ); read the child node pointed by ; is non-leaf node DSA( , { | }); entries

do

if then

for all

e HZ e

ZR.root LB(e') e' Z∈

14.15. 16.17.18. 19.20.

, enheap( ,< , >); DSA( ,{ | });

do if then

end if end for

else

e' Zl l LB(e')

H e' (e')

R.root s s Z

ψ

∈∈

21. | |< 1 22.23.24. | |:= 125.

enheap( ,< , >);

for all doif then

else if then

h k

h k

s Z

h s (s)ψ−

26.27.28. 29.30.

enheap( ,< , >); : s top entry's score; deheap( );

elseif then

h s (s)h'

(s)h

ψρ

ψ ρ

=

>

31.32. 33.34.

enheap( ,< , >); : s top entry's score; sevices

else for all do

h s (s)h'

l

ψρ =

35.36.37.38.39.

, add ; ,

if then

end if end forif then

l l ss

l l s l

∈→

∃ ∈ Λ

40.41.42. 43. 44.45.

remove from ;

;

end if end if

end forend if

end whilereturn

l

h

which has been found so far. If the top entry’s dominating score from H is lower than ρ , we can stop extracting entries from H. At last we maintain a min-heap h that stores the top-k dominating services. Note that h, and ρ are updated in time after the indexed services are traversed (lines 20-42).

V. EXPERIMENTAL EVALUATION

In this section, we demonstrate the efficiency and effectiveness of our approach by conducting extensive experiments. We run our experiments on a PC with 2.2GHz Intel Pentium Duo2 CPU, 2048M of RAM, Microsoft Windows XP Operating System, J2SKD 1.6. It is worth noting that the aRtree we use to index services is completely dynamic [17], which means that inserts and deletes can be intermixed with search and no periodic reorganization is required. All the services can be pre-indexed by aRtree structure before they start responding to any requests.

A. Response Time vs. No. of Requests The QoWS attributes of services may sometimes degrade

according to Eq.(1) in Section 1, due to the overload of servers and dynamics of Web service environments. We have conducted the first experiment to confirm this. In this experiment, we examine the performance of a general real-life service with regarding to its response time under different number of concurrent requests.

Fig.4 shows the variation of the average response time for a service with different number of requests. Given a request, the normal (advertised) response time of the service we choose in this experiment is within and around 100ms. As can be seen in Fig.4, the average response time of the given service approximates in a linear fashion as the number of concurrent request increases.

0 20 40 60 80 1000

45

90

135

180

AV.

Res

pons

e Ti

me

(ms)

# of requests

S_100ms

Figure 4. The average response time vs. number of requests

B. Performance analysis In our next set of experiments, we evaluate the

performance of our approach by comparing it to two other methods, one is the traditional method described in Section 1 and the other is the random method. In our approach, we first use algorithm DFR to obtain top-k services for each component service in a composite service. For an incoming request, a composition solution which consists of services randomly chosen from top-k dominating services from each component service class, is then created to respond to the request. In the traditional method, the composition solution having the maximum value of UF is always designated to respond to the request; the random method however always selects a composition solution that consists of services randomly chosen from each component service class, as opposed to from top-k dominating services. First, we design

41

Page 7: [IEEE 2011 IEEE 8th International Conference on e-Business Engineering (ICEBE) - Beijing, China (2011.10.19-2011.10.21)] 2011 IEEE 8th International Conference on e-Business Engineering

a composite service which comprises four component services, each of which contains 10000 candidate services. Then we employ the aRtree structure to index these services. We conduct the following experiment afterwards. In particular, we employ different composition solutions (i.e. top-k, traditional and random) to respond to the incoming concurrent requests. Fig.5 demonstrates the performance comparison among the three methods. Note that the X-axis represents the number of concurrent requests and the maximum number of concurrent requests is set to 300. We set the value of variable k to 2,5,10 and 50 respectively in order to further study the performance of our approach under different k. For both our approach and the traditional method, when the number of concurrent requests increases, the average utility value (with respect to response time) also becomes larger, which conforms with the result of Fig.4. As for the random method, the performance is not stable, since the QoWS of services chosen randomly is uncertain. We also observe that no matter which value we set the variable k tobe, our approach always performs better than the traditional method.

120 160 200 240 2800

150

300

450

600

Res

pons

e Ti

me

(ms)

(a) # of Requests

traditionaltop_k=2rand_k=2

120 160 200 240 2800

150

300

450

600

Res

pons

e Ti

me

(ms)

(b) # of Requests

traditionaltop_k=5rand_k=5

120 160 200 240 2800

150

300

450

600

Resp

onse

Tim

e (m

s)

(c) # of Requests

traditionaltop_k=10rand_k=10

120 160 200 240 2800

150

300

450

600

Res

pons

e Ti

me

(ms)

(d) # of Requests

traditionaltop_k=50rand_k=50

Figure 5. Performance comparison

It is useful to observe that the advantage of our top-kapproach compared to the tradition method is even more obvious when the number of concurrent requests becomes larger. Meanwhile, we can also see that the performance of our approach becomes better as the value of k increases. Nevertheless, it does not always mean that the higher the value of k is, the better the performance of our approach is. This is because the overheads of our approach include not only the response time (incurred from performing the requests as shown in Fig.5), but also the overheads on retrieving the top-k dominating services. The latter however depends on the number of candidate services in each component service class. Although we can off-line retrieve the top-k dominating services, it is only fair to take this part of overheads into account when comparing our approach with the others objectively. Therefore, we demonstrate in Table 1 different overheads on time (measured by “ms”) under different number of services in a service class. Note

that in Table 1, “T” denotes the traditional selection method, “R” denotes random selection method and “O” denotes our approach. In order to ensure the fairness and accuracy of our measure, given different number of services, we run each of the three methods under different k 1000 times, and then we obtain the mean overheads on time for each method. We can observe that no matter which value k is of, when the number of services in service class increases the corresponding overheads also increase. This is because DFR (Algorithm 2) which is employed to retrieve the top-k dominating services is not independent of service class scale. When the service class scale is settled and the number of services is not so large (say, 103,104 or 105), the overheads of our approach is almost the same as that of the other two methods. When the number of service class is very large (e.g., 106), the difference in terms of overheads between our approach and other two methods becomes notable. In practice, however, the results are still acceptable from our viewpoint. One reason is that we can offline retrieve the top-k dominating services and we only need to re-calculate the top kdominating services when the information about QoWS needs to be updated. Another reason is that to our best knowledge, there are few service classes which contain services more than 105 in most real-life applications. From Table 1, we can also see that the random method incurs almost the same amount of time when compared to the other two methods, no matter if the number of services increases or not. This is because the random method chooses k services from each service class randomly, for which the time complexity is O(1).

TABLE I. OVERHEADS ON TIME WITH THE VARIATION OF K.

VI. CONCLUSIONS

In the traditional service selection scheme, a request is always assigned to the best composition solution. However, the dynamics of the Web service environments and also the concurrency of requests make the traditional scheme to be inappropriate in practice. In this paper, we have presented a service selection scheme which adopts top-k dominating queries to generate a composition solution. In particular, for a given request, we select the composition solution by adopting top-k dominating query for each component service to handle it. Through extensive experiments, we have verified the efficiency and effectiveness of our approach in comparison with the baseline and traditional service selection schemes. For the future work, we try to improve the overall performance of DFR and balance the overheads between retrieving the top-k dominating services and our service selection scheme.

S 103 104 105 106

k M T O R T O R T O R T O R

2 0 0 0 2 3 0 5 63 3 16 891 5

5 0 0 0 5 8 0 5 62 2 15 890 4

10 0 8 0 4 4 0 7 63 2 15 788 3

50 0 10 0 3 3 0 8 63 3 16 899 4

100 0 9 0 10 12 3 9 62 2 32 991 4

42

Page 8: [IEEE 2011 IEEE 8th International Conference on e-Business Engineering (ICEBE) - Beijing, China (2011.10.19-2011.10.21)] 2011 IEEE 8th International Conference on e-Business Engineering

ACKNOWLEDGMENT

The work is supported by the National Natural Science Foundation of China under Grant No. 61003044, the Natural Science Foundation of Jiangsu Province under Grant No. BK2010257, and State Key Laboratory of Software Engineering (SKLSE).

REFERENCES

[1] Liu, Y., Ngu, A., Zeng, L.: QoS computation and policing in dynamic web service selection. In: 13th international conference on World Wide Web, pp. 66-73. New York (2004)

[2] Lian, X., Chen, L.: Top-k dominating queries in uncertain databases. In: 12th International Conference on Extending Database Technology, pp. 660-671. Saint Petersburg (2009)

[3] Zhang, W., Lin, X., Pei, J., Wang, W.: Threshold-based Probabilistic Top-k Dominating Queries. VLDB J. 19(2), 283-305 (2010)

[4] Yiu, M., Mamoulis, N.: Multi-dimensional top-k dominating queries. VLDB J. 18(3), 695-718 (2009)

[5] Liu, D., Wan, C., Xiong, N., Park, J., Yeo, S.: Global Top-k Aggregate Queries Based on X-tuple in Uncertain Database. In: 24th IEEE International Conference on Advanced Information Networking and Applications Workshops, pp. 814-821. Perth (2010)

[6] Serhani, M., Dssouli, R., Hafid, A., Sahraoui, H.: A QoS Broker Based Architecture for Efficient Web Services Selection. In: IEEE International Conference on Web Services, pp. 113-120. Orlando (2005)

[7] Yiu, M., Mamoulis, N.: Efficient Processing of Top-k Dominating Queries on Multi-Dimensional Data. In: 33rd International Conference on Very Large Data Bases, pp.483-494. Austria (2007)

[8] Börzsönyi, S., Kossmann, D., Stocker, K.: The Skyline Operator. In: 17th International Conference on Data Engineering, pp. 421-430. Heidelberg (2001)

[9] Soliman, M., Ilyas, I., Chang, K.: Probabilistic top-k and ranking-aggregate queries, TODS. 33(3), 13:1-13:54 (2008)

[10] Chomicki, J., Godfrey, P., Gryz, J., Liang, D.: Skyline with Presorting. In: 19th International Conference on Data Engineering, pp.717-816. Bangalore (2003)

[11] Godfrey, P., Shipley, R., Gryz, J.: Maximal Vector Computation in Large Data Sets. In: 31st International Conference on Very Large Data Bases, pp.229-240. Trondheim (2005)

[12] Kossmann, D., Ramsak, F., Rost, S.: Shooting Stars in the Sky: An Online Algorithm for Skyline Queries. In: 28th International Conference on Very Large Data Bases, pp.275-286. Hong Kong (2002)

[13] Papadias, D., Tao, Y., Fu, G., Seeger, B.: Progressive Skyline Computation in Database Systems. TODS. 30(1), 41–82 (2005)

[14] Tan, K., Eng, P., Ooi, B.: Efficient Progressive Skyline Computation. In: 27th International Conference on Very Large Data Bases, pp.301-310. Roma (2001)

[15] Bartolini, I., Ciaccia, P., Patella, M.: Efficient sort-based skyline evaluation. TODS. 33(4), 31:1-31:49 (2008)

[16] Papadias, D., Tao, Y., Fu, G., Seeger, B.: Progressive skyline computation in database systems. TODS. 30(1), 41-82 (2005)

[17] Guttman, A.: R-Trees: A Dynamic Index Structure for Spatial Searching. In: SIGMOD’84, pp.47-57. ACM Press, Boston (1984)

[18] Papadias, D., Kalnis, P., Zhang, J., Tao, Y.: Efficient OLAP Operations in Spatial Data Warehouses. In: Advances in Spatial and Temporal Databases, 7th International Symposium, pp.443-459. Redondo Beach (2001)

[19] Su, I., Chung, Y., Lee, C.: Top- Combinatorial Skyline Queries. In: Database Systems for Advanced Applications, 15th International Conference, pp.79-93. Tsukuba(2010)

[20] Lazaridis, I., Mehrotra, S.: Progressive Approximate Aggregate Queries with a Multi-Resolution Tree Structure. In: SIGMOD 2001 Electronic Proceedings, pp. 401-412. Santa Barbara (2001)

[21] Zeng, L., Benatallsh, B., Ngu, A., Dumas, M., Kalagnanam, J., Chang, H.:QoS-Aware Middleware for Web Services Composition,” IEEE Transactions on Software Engineering. 30(5), 311-327 (2004)

[22] Yu, Q., Bouguettaya, A.: Computing Service Skylines over Sets of Services. In: IEEE International Conference on Web Services, pp.481-488. Miami (2010)

[23] Yu, Q., Bouguettaya, A.: Computing Service Skyline from Uncertain QoWS. TSC. 3(1), 16-29 (2010)

[24] Skoutas, D., Sacharidis, D., Simitsis A., Kantere, V., Sellis, T: Top-k Dominant Web Services Under Multi-Criteria Matching. In: 12th International Conference on Extending Database Technology: Advances in Database Technology, pp.898-909. New York (2009)

43