hiresome ii towards privacy aware cross

Upload: priya-mohan

Post on 01-Jun-2018



    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, TPDS-2013-08-0725

    HireSome-II: Towards Privacy-Aware Cross-Cloud Service Composition for Big Data Applications

    Wanchun Dou, Xuyun Zhang, Jianxun Liu, and Jinjun Chen, Senior Member, IEEE

    Abstract: Cloud computing promises a scalable infrastructure for processing big data applications such as medical data analysis. Cross-cloud service composition provides a concrete approach capable of large-scale big data processing. However, the complexity of potential compositions of cloud services calls for new composition and aggregation methods, especially when some private clouds refuse to disclose all details of their service transaction records due to business privacy concerns in cross-cloud scenarios. Moreover, the credibility of cross-cloud and online service compositions becomes questionable if a cloud fails to deliver its services at the promised quality. In view of these challenges, we propose a privacy-aware cross-cloud service composition method, named HireSome-II (History record-based Service optimization method), based on its previous basic version, HireSome-I. In our method, to enhance the credibility of a composition plan, a service is evaluated by some of its QoS history records rather than by its advertised QoS values. In addition, the k-means algorithm is introduced into our method as a data filtering tool to select representative history records. As a result, HireSome-II can protect cloud privacy, as a cloud is not required to unveil all of its transaction records. Furthermore, it significantly reduces the time complexity of developing a cross-cloud service composition plan, as only representative records are recruited, which is essential for big data processing. Simulation and analytical results demonstrate the validity of our method compared to a benchmark.

    Index Terms: cloud, service composition, QoS, big data, transaction history records

    1 INTRODUCTION

    In recent years, cloud computing and big data have received enormous attention internationally due to various business-driven promises and expectations, such as lower upfront IT costs, a faster time to market, and opportunities for creating value-added business [1], [2], [3]. As the latest computing paradigm, the cloud is characterized by delivering hardware and software resources as virtualized services, which frees users from the burden of low-level system administration details [4], [5]. Cloud computing promises a scalable infrastructure for processing big data applications such as the analysis of huge amounts of medical data [1], [6], [7], [11], [15]. Currently, cloud providers including Amazon Web Services (AWS), Salesforce.com, and Google App Engine give users the option to deploy their applications over a network of a nearly infinite resource pool [6]. By leveraging cloud services for Web hosting, big data applications can benefit from cloud advantages such as elasticity, pay-per-use, and abundance of resources, with practically no capital investment and a modest operating cost proportional to actual use [1], [8], [9], [10].

    In practice, to satisfy different security and privacy requirements, cloud environments usually consist of public clouds, private clouds, and hybrid clouds, which leads to a rich ecosystem for big data applications [11], [12], [13]. Generally, current implementations of public clouds mainly focus on providing easily scaled-up and scaled-down computing power and storage. If data centers or domain-specific service centers avoid or delay migrating to the public cloud because of multiple hurdles, from risks and costs to security issues and service level expectations, they often provide their services in the form of a private cloud or a local service host [2]. A complex web-based application therefore probably covers some public clouds, private clouds, or local service hosts [10], [14]. For instance, the healthcare cloud service, a big data application illustrated in [14], involves many participants such as governments, hospitals, pharmaceutical research centres, and end users. As a result, a healthcare application often covers a series of services derived from public clouds, private clouds, and local hosts, respectively.

    In practice, some big data centers or software services cannot be migrated into a public cloud due to security and privacy issues [14], [15]. If a web-based application covers public cloud services, private cloud services, and local web services in a hybrid way, cross-cloud collaboration is an ambition for promoting complex web-based applications in the form of dynamic alliances for value-added applications [16]. It needs a unique distributed computing model in a network-aware business context.

    Cross-cloud service composition provides a concrete approach capable of large-scale big data processing. Existing (global) analysis techniques for service composition, however, often mandate that every participant service provider unveil the details of its services for network-aware service composition, especially the QoS information of the services. Unfortunately, such an analysis is infeasible when a private cloud or a local host refuses to disclose all of its service details for privacy or business reasons [17]. In such a scenario, it is a challenge to integrate services from a private cloud or local host with public cloud services such as Amazon EC2 and SQS for building scalable and secure systems in the form of mashups. As a wide diversity of cloud services is available today, the complexity of potential cross-cloud compositions requires new composition and aggregation models.

    Wanchun Dou is with the State Key Laboratory for Novel Software Technology, the Department of Computer Science and Technology, Nanjing University, China, 210023. E-mail: [email protected].

    Jianxun Liu is with the School of Computer Science and Engineering, Hunan University of Science and Technology, China, 411201. E-mail: [email protected].

    Xuyun Zhang and Jinjun Chen are with the Faculty of Engineering and Information Technology, University of Technology, Sydney, PO Box 123, Broadway NSW 2007, Australia. E-mail: {xyzhanggz, jinjun.chen}@gmail.com.

    Digital Object Identifier 10.1109/TPDS.2013.246. 1045-9219/13/$31.00 © 2013 IEEE

    This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

    On the other hand, as a cloud often hosts many individual services, cross-cloud and online service composition is heavily time-consuming for big data applications. It always challenges the efficiency of service composition development on the Internet [18], [19], [20], [21], [22]. Besides, for a web service that is not a cloud service and whose bandwidth probably fails to match that of the cloud, it is a challenge to trade off the bandwidth between the web service and the cloud in a scaled-up or scaled-down way for a cross-cloud composition application. Here, the time cost is heavy for cross-platform service composition.

    With these observations, it is a challenge to trade off privacy against time cost in cross-cloud service composition for processing big data applications. In view of this challenge, an enhanced History record-based Service optimization method, named HireSome-II, is presented in this paper for privacy-aware cross-cloud service composition for big data applications. In our previous work [23], a similar method, named HireSome (which can be treated as HireSome-I), was investigated; it aims at enhancing the credibility of service composition. HireSome-I is incapable of dealing with the privacy issue in cross-cloud service composition. Compared to HireSome-I, HireSome-II greatly speeds up the process of selecting a (near-)optimal service composition plan and protects the privacy of a cloud service in cross-cloud service composition.

    The remainder of the paper is organized as follows. Section 2 presents some preliminary knowledge. Section 3 investigates a benchmark for evaluating a history record-based service composition. Section 4 elaborates HireSome-II for cross-cloud service composition. Section 5 presents comprehensive simulation experiments to evaluate the efficiency and effectiveness of our method. Section 6 presents related work and a comparison analysis. Finally, Section 7 gives our conclusions and future work.

    2 PRELIMINARY KNOWLEDGE

    In this section, we briefly introduce some preliminary knowledge about the objective function and the utility function leveraged to guide the selection of service composition plans, and the k-means clustering algorithm employed to select representative transaction records without breaching service privacy.

    To facilitate the subsequent discussion, some basic notions are defined as follows. Theoretically, a service composition plan consists of a set of services selected from different service pools. More precisely, one service per service pool is selected to constitute a composition plan. Since a service pool probably contains many available Web services that provide overlapping or identical functionality, albeit with different Quality of Service (QoS), a choice is required to determine which services make up the expected optimal service composition.

    Specifically, an objective function is presented in [19] to determine the desirability of a composition plan. This objective function relies on the MCDM (Multiple Criteria Decision Making) and SAW (Simple Additive Weighting) techniques [24]. Concretely, in [19], five typical QoS criteria are specified, i.e., execution price, execution duration, reputation, successful execution rate, and availability. Furthermore, the authors specified that some of these QoS criteria are negative, while others are positive. For a negative criterion, the higher its value, the lower the quality (e.g., execution time and execution price). For a positive criterion, the higher its value, the higher the quality (e.g., reputation and availability). Following this idea, in [22], a more general utility function is presented by introducing a set of QoS criteria into their application. With these specifications, the utility function is defined by (1) for evaluating a composition plan CP.

    In (1), there are $\alpha$ negative criteria and $\beta$ positive criteria. Moreover, $U(CP)$ represents the utility value of a service composition plan $CP$. $W_l$ is the weight of the $l$-th negative quality criterion associated with a user's preference, and $W_k$ is the weight of the $k$-th positive quality criterion associated with a user's preference, in which $0 \le W_l, W_k \le 1$ and $\sum_{l=1}^{\alpha} W_l + \sum_{k=1}^{\beta} W_k = 1$. $Q_l(CP)$ denotes the value of the $l$-th negative quality criterion of $CP$, and $Q_k(CP)$ denotes the value of the $k$-th positive quality criterion of $CP$. $Q_l^{\max}$ and $Q_l^{\min}$ are, respectively, the maximum and minimum values of $CP$'s $l$-th negative quality criterion. They are computed from the participant services' $l$-th negative quality criterion in a certain way. $Q_k^{\max}$ and $Q_k^{\min}$ are, respectively, the maximum and minimum values of $CP$'s $k$-th positive quality criterion. They are computed from the participant services' $k$-th positive quality criterion in a certain way. Therefore, an objective function, i.e., Max(U(CP)), is used for maximizing the user satisfaction expressed as utility functions over QoS attributes.

    We recruit the k-means clustering algorithm, a well-known and commonly used partitioning method, to select representative history records without spoiling the privacy of a cloud service [25], [26]. Generally, the k-means algorithm takes the input parameter, $k$, and partitions $n$ objects into $k$ clusters so that the intracluster similarity is high but the intercluster similarity is low. Cluster similarity is measured by the mean value of the objects in a cluster, which can be viewed as the cluster's centroid or center of gravity. Typically, a square-error criterion is used to evaluate the clustering effect. It is defined by (2).

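    $$U(CP) = \sum_{l=1}^{\alpha} W_l \cdot \frac{Q_l^{\max} - Q_l(CP)}{Q_l^{\max} - Q_l^{\min}} + \sum_{k=1}^{\beta} W_k \cdot \frac{Q_k(CP) - Q_k^{\min}}{Q_k^{\max} - Q_k^{\min}} \tag{1}$$

    $$E = \sum_{i=1}^{k} \sum_{p \in C_i} \lvert p - m_i \rvert^2 \tag{2}$$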
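    As a minimal sketch of how (1) can be evaluated, the Python function below scores one composition plan from QoS values that are assumed to be already aggregated over the plan. The function names, the argument layout, the sample numbers, and the rule of giving a criterion its full weight when its maximum equals its minimum are illustrative assumptions, not part of the method described here.

```python
def _scaled(numerator, span):
    # Assumption (not from the paper): if max == min the criterion cannot
    # discriminate between plans, so it contributes its full weight.
    return numerator / span if span > 0 else 1.0


def utility(neg, pos, w_neg, w_pos):
    """Utility of one composition plan CP, following the shape of Eq. (1).

    neg, pos: lists of (q, q_min, q_max) for the negative / positive criteria,
              where q is the aggregated value Q(CP).
    w_neg, w_pos: matching weights; all weights together should sum to 1.
    """
    u = 0.0
    for (q, q_min, q_max), w in zip(neg, w_neg):
        u += w * _scaled(q_max - q, q_max - q_min)  # lower raw value is better
    for (q, q_min, q_max), w in zip(pos, w_pos):
        u += w * _scaled(q - q_min, q_max - q_min)  # higher raw value is better
    return u


# Hypothetical plan: price 30 in [10, 50], duration 4 in [2, 10],
# reputation 4 in [1, 5].
u_cp = utility(
    neg=[(30, 10, 50), (4, 2, 10)],
    pos=[(4, 1, 5)],
    w_neg=[0.25, 0.25],
    w_pos=[0.5],
)
print(u_cp)  # 0.6875
```

    Both kinds of criteria are mapped to [0, 1] with 1 as the best score, so the weighted sum can be maximized directly.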



    In (2), $E$ is the sum of the square error for all objects in the data set; $p$ is the point in space representing a given object; and $m_i$ is the mean of cluster $C_i$ (both $p$ and $m_i$ are multidimensional). The smaller the value of $E$, the better the clustering effect. In other words, for each object in each cluster, the distance from the object to its cluster center is squared, and the squared distances are summed. This criterion tries to make the resulting $k$ clusters as compact and as separate as possible.
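    The square-error criterion and a two-cluster partition can be sketched as follows. This is a plain Lloyd-style iteration written for illustration; the function names, the deterministic seeding with the lexicographically smallest and largest points, and the sample (price, duration) records are assumptions rather than details taken from the paper.

```python
from math import dist  # Python 3.8+


def centroid(cluster):
    """Mean point of a non-empty cluster of equal-length tuples."""
    n = len(cluster)
    return tuple(sum(p[d] for p in cluster) / n for d in range(len(cluster[0])))


def square_error(clusters):
    """Eq. (2): squared distance of every object to its cluster mean, summed."""
    return sum(dist(p, centroid(c)) ** 2 for c in clusters for p in c)


def two_means(points, max_iter=100):
    """Lloyd-style k-means with k = 2 (max_iter must be at least 1).

    Assumption: the initial centers are the lexicographically smallest and
    largest points, chosen only to keep this sketch deterministic.
    """
    centers = [min(points), max(points)]
    for _ in range(max_iter):
        clusters = [[], []]
        for p in points:
            nearest = 0 if dist(p, centers[0]) <= dist(p, centers[1]) else 1
            clusters[nearest].append(p)
        if not all(clusters):            # degenerate case: nothing to split
            return [c for c in clusters if c]
        new_centers = [centroid(c) for c in clusters]
        if new_centers == centers:       # assignments are stable
            break
        centers = new_centers
    return clusters


# Hypothetical history records as (price, duration) pairs.
records = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
clusters = two_means(records)
print(clusters)
print(square_error(clusters) < square_error([records]))
```

    On this toy data the two groups of three records are separated, and the resulting E is smaller than the E of the unsplit data set.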

    3 A BENCHMARK FOR EVALUATING A HISTORY RECORD-BASED SERVICE COMPOSITION PLAN

    3.1 A Benchmark for History Record-based Service Composition Evaluation

    Current service optimization approaches often assume that the quality delivered by service providers does not change over time [22]. However, this assumption is often violated by the dynamic network environment [19]. Therefore, a service provider may fail to deliver its services with the promised quality. This often causes fluctuations in service performance, and the evaluation of a service becomes questionable if it is scored by the QoS values promised in advance. In [27], the promised QoS values are treated as tentative QoS values.

    To address this challenge, we leverage history records associated with a service's past transactions to reflect the service's quality in terms of statistical probability. We aim at filtering out latent mendacious information. Technically, our work starts from a reference service composition evaluation method, which serves as a benchmark in this paper. To simplify our discussion, a service's history transaction records specified by its QoS values will be referred to as the service's history records for short.

    With these observations, some basic concepts are defined as follows to facilitate our further discussion.

    Definition 1 (Service Composition Class). For two given tasks $T_1$ and $T_2$ engaged in a task scheduling, let $P_{1\text{-}WService} = (WS_{11}, WS_{12}, \ldots, WS_{1n})$ and $P_{2\text{-}WService} = (WS_{21}, WS_{22}, \ldots, WS_{2m})$ be two service pools, in which $P_{1\text{-}WService}$ consists of the candidate web services for executing $T_1$ and $P_{2\text{-}WService}$ consists of the candidate web services for executing $T_2$. In this paper, $(WS_{1i}, WS_{2j})$ will be treated as a service composition class between $P_{1\text{-}WService}$ and $P_{2\text{-}WService}$ for orchestrating the later execution of $T_1$ and $T_2$, where $WS_{1i} \in P_{1\text{-}WService}$ and $WS_{2j} \in P_{2\text{-}WService}$ hold.

    From Definition 1, we can deduce that there are $n \times m$ service composition classes between $P_{1\text{-}WService}$ and $P_{2\text{-}WService}$ for orchestrating the later execution of $T_1$ and $T_2$. A service composition class herein specifies a service composition plan associated with a global task scheduling.

    Definition 2 (Service Composition Instance). For a service composition class $(WS_{1i}, WS_{2j})$, let $HR_{1i}$ be the set of $WS_{1i}$'s QoS history records, and let $HR_{2j}$ be the set of $WS_{2j}$'s QoS history records. In this paper, $(hr_{1i\text{-}h}, hr_{2j\text{-}k})$ will be treated as a service composition instance instantiated by service composition class $(WS_{1i}, WS_{2j})$, in which $hr_{1i\text{-}h} \in HR_{1i}$ and $hr_{2j\text{-}k} \in HR_{2j}$ hold.

    From Definition 2, we can deduce that if $WS_{1i}$ has $M_{1i}$ QoS history records and $WS_{2j}$ has $M_{2j}$ QoS history records, there are $M_{1i} \times M_{2j}$ service composition instances instantiated by service composition class $(WS_{1i}, WS_{2j})$. Moreover, with Definition 1 and Definition 2, a reference method is specified by Definition 3 for selecting an optimal service composition plan. It serves as a benchmark for history record-based service composition evaluation.

    Definition 3 (Benchmark for history record-based service composition evaluation). For a service composition class $CS$: $(WS_{1i}, WS_{2j})$, its utility value is computed as the average utility value of its service composition instances. For a group of service composition classes, the service composition class that has the largest utility value will be selected as the optimal service composition plan for satisfying a global task scheduling.

    The utility values can be computed according to (1) as specified in Section 2. The benchmark for achieving an optimal service composition plan is specified by Fig. 1.

    1. Taking advantage of the preliminary knowledge of the utility function and objective function presented in Section 2, each service composition instance's utility value can be calculated.

    2. For a service composition class CS, the average of its service composition instances' utility values can be computed, which will be treated as CS's utility value.

    3. The service composition class that has the largest utility value will be treated as the optimal service composition plan.

    Fig. 1. The specifications of a benchmark for achieving an optimal service composition plan.

    3.2 Case Study

    Here, a case study (see Fig. 2) is presented to illustrate the notions of Definition 1, Definition 2, and Definition 3. The basic background of the example is derived from a multimedia delivery application that first appeared in [28].

    Fig. 2. A cross-cloud service composition scenario.

    In Fig. 2, a smartphone user requests the latest news from a service provider. Available multimedia content consists of a news ticker and some topical videos available in MPEG2 only. The news provider has no adaptation capabilities, so additional services are required to serve the user's request: a transcoding service for adapting the multimedia content to fit the target format, a text translation service for the ticker, and a compression service for integrating the ticker with the video stream for the limited smartphone display.

    Suppose there are three candidate services for $T_1$'s execution, two candidate services for $T_2$'s execution, and two candidate services for $T_3$'s execution. Moreover, suppose that for $WS_{11}$ there are seven QoS history records, for $WS_{12}$ there are five QoS history records, etc. More details of the case can be found in Appendix A.1 (all appendices are included in the supplemental file). With these assumptions, the optimal service composition plan can be developed by following the benchmark as specified by Fig. 1.
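    The three benchmark steps of Fig. 1 can be sketched in Python as follows. The data layout (one dictionary of history records per task), the function names, and the toy utility (the negated total response time of an instance) are assumptions made for illustration; in the method itself the utility of an instance follows (1).

```python
from itertools import product
from statistics import mean


def class_utility(history_sets, utility):
    """Average utility over all composition instances of one composition class."""
    return mean(utility(instance) for instance in product(*history_sets))


def benchmark(pools, utility):
    """pools: one dict {service name: [history records]} per task.

    Returns (best composition class, its average utility).
    """
    best = None
    for names in product(*(pool.keys() for pool in pools)):
        history_sets = [pool[name] for pool, name in zip(pools, names)]
        score = class_utility(history_sets, utility)
        if best is None or score > best[1]:
            best = (names, score)
    return best


def toy_utility(instance):
    # Hypothetical utility: records are response times, shorter totals are better.
    return -sum(instance)


pools = [
    {"WS11": [2.0, 4.0], "WS12": [1.0, 7.0]},
    {"WS21": [3.0], "WS22": [1.0, 2.0, 3.0]},
]
print(benchmark(pools, toy_utility))
```

    Every history record of every candidate service enters the computation, so the number of evaluated instances grows with the product of the history sizes. This is the cost that the representative records of Section 4 are meant to reduce.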

    4 HIRESOME-II: TOWARD PRIVACY-AWARE CROSS-CLOUD SERVICE COMPOSITION

    In our method, a tree structure is recruited to specify the service composition context. Concretely, a Task-Service tree is defined (see Definition 4) to incorporate a task and a group of candidate services into an integrated application context. Here, the candidate services are the qualified services that can fulfill the task execution's specification in functional and non-functional properties.

    Definition 4 (Task-Service tree). For a task and the candidate services that can fulfill the task execution's specification in functional and non-functional properties, a Task-Service tree is a two-level tree structure that consists of a main root node and a group of leaf nodes, where the main root node is instantiated by the task and the leaf nodes are instantiated by the candidate services.

    For example, Fig. 3 illustrates two Task-Service trees initiated by task T1 and task T2, respectively, in which T1-WS1 and T1-WS2 are the candidate services that can fulfill T1's execution specification in functional and non-functional properties, and T2-WS1, T2-WS2, and T2-WS3 are the candidate services that can fulfill T2's execution specification in functional and non-functional properties.

    Fig. 3. Task-Service tree instances.

    In a Task-Service tree, for a candidate service, the QoS history records associated with its non-functional properties, as reflected in its past executions, will be divided into two clusters by the well-known k-means clustering algorithm introduced in Section 2. Here, the k-means clustering algorithm is applied with k = 2 in our method.

    With these processes, two peer clusters are obtained, and their representative history records can be selected from these two clusters, respectively.

    Definition 5 (Peer cluster and representative history record). For a candidate service of a task, its history records will be grouped into two clusters by the well-known k-means (k = 2) clustering algorithm. These two clusters will be treated as peer clusters of each other. For these two peer clusters, two representative history records will be selected, one from each cluster. Concretely speaking, for a peer cluster, its representative history record is the history record that has the best utility value in this cluster.

    Please note that in Definition 5, the two representative history records belong to different clusters produced by applying the k-means (k = 2) clustering algorithm to the classification of Ti-WSj's history records. According to the properties of the k-means clustering algorithm, these two clusters do not overlap. Using the representative history records, a Task-Service tree can evolve into a Task-Service-HistoryRecord tree as defined by Definition 6.

    Definition 6 (Task-Service-HistoryRecord tree). A Task-Service-HistoryRecord tree is evolved from a Task-Service tree by adding two leaf nodes for each candidate service, where the new leaf nodes are instantiated by the representative history records of the candidate services.

    With the scenarios specified by Definition 5 and Definition 6, Fig. 3 can be evolved into Fig. 4, which illustrates the hierarchical relationship among a task, the task's candidate services, and the candidate services' representative history records. In Fig. 4, the 1st history record layer consists of all candidate services' representative history records. Compared to Fig. 3, Fig. 4 demonstrates a three-level Task-Service-HistoryRecord tree evolved from Fig. 3 by integrating each candidate service's representative history records into the tree structure demonstrated by Fig. 3. In the three-level tree structure, the representative history records of a candidate service are recruited as leaf nodes of the candidate, which make up a new level indicated by the 1st history record layer.

    In Fig. 4, a candidate service and its two representative history records make up a local binary tree. Algorithm 1 specifies the process of the tree evolution.

    Here, if there is only one element contained in HR, there will be only one leaf node associated with candidate WS, i.e., there will be no local binary tree as appears in Fig. 4. As an example, Fig. 5 demonstrates the tree evolving process driven by the k-means clustering algorithm. Concretely, in Fig. 5, for service T1-WS1, a candidate service of task T1, the history records associated with its past performance are sorted into two clusters, i.e., cluster-1 and cluster-2. Two representative history records, i.e., TW11-HR1 and TW11-HR2, selected from the two clusters respectively, serve as T1-WS1's two leaf nodes.

    Fig. 4. Hierarchical relationship among a task, the task's candidate services, and the candidate services' representative history records.

    Algorithm 1: TreeEvolvingProcess-1(WS, HR)

    Input: A candidate service WS and its history record set HR associated with its past performance.
    Output: A tree in which WS serves as the root node and the two representative history records serve as its leaf nodes.

        If HR.size ≥ 2 then
            {SubHR[1], SubHR[2]} ← KMeansCluster(HR, 2);
                // k-means (k = 2) clustering algorithm;
                // SubHR[1] and SubHR[2] are the two clusters
            For (i = 1 to 2) do
                If SubHR[i].size ≥ 2 then
                    Rep-HR[i] ← maxUtility(SubHR[i])
                        // Rep-HR[1] and Rep-HR[2] are the two representative history records
                Else
                    Rep-HR[i] ← SubHR[i].element
                End if
                WS.addChild(Rep-HR[i])
                    // the Task-Service tree evolves by adding a leaf node
            End for
        Else
            WS.addChild(HR.element)
        End if
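    The tree evolving step can be sketched in Python as below. The `Node` class, the `evolve` function, and the sample records are hypothetical names and data introduced for illustration. The `halves` splitter is only a stand-in for the k-means (k = 2) call and is not k-means itself; any function that returns two non-empty clusters could be plugged in instead.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: object
    children: list = field(default_factory=list)


def halves(records):
    # Stand-in for KMeansCluster(HR, 2), NOT real k-means: sort by the
    # second field (duration here) and cut the list in the middle.
    ordered = sorted(records, key=lambda r: r[1])
    mid = len(ordered) // 2
    return [ordered[:mid], ordered[mid:]]


def evolve(ws_name, history, utility, cluster=halves):
    """Attach representative history records to a service node (Algorithm 1)."""
    ws = Node(ws_name)
    if len(history) >= 2:
        for sub in cluster(history):
            # max() also covers a one-element cluster, whose only record
            # is trivially the representative.
            ws.children.append(Node(max(sub, key=utility)))
    else:
        ws.children.append(Node(history[0]))
    return ws


# Hypothetical (price, duration) records; lower totals count as better.
history = [(3, 10), (4, 8), (2, 30), (5, 28), (3, 31)]
tree = evolve("T1-WS1", history, utility=lambda r: -(r[0] + r[1]))
print([child.label for child in tree.children])  # [(4, 8), (2, 30)]
```

    Only the two representative records leave the service's own history, which is the point of the privacy argument: the remaining records are never exposed to the composition process.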

    Similar to the method mentioned in [22], once a service's history records are grouped into two clusters by the k-means (k = 2) algorithm, a representative cluster is selected from the two peer clusters in order to evaluate service composition plans. In this way, the evaluating process is greatly simplified compared to the benchmark presented in Section 3, as there are fewer elements in the service composition. Technically, the representative history records will be recruited for achieving our goal. More specifically, only one of the two clusters derived from a service's history records will be selected for further investigation. Which one survives depends on the local performance of its representative history record. In our method, a representative history record's local performance is evaluated by its contribution score.

    Suppose that there are $n$ tasks executed in a sequential style. For a task $T_i$, there are $m_i$ candidate web services. For each of the $m_i$ candidate web services, suppose that it has more than one history record associated with its past performance. With these scenarios, we can get two representative history records for a candidate web service by the k-means clustering algorithm with k = 2. Furthermore, we can also deduce that there are $2m_i$ representative history records associated with task $T_i$. As a result, we can get $2m_1 \times 2m_2 \times 2m_3 \times \cdots \times 2m_n$ composition instances made up of the representative history records associated with the 1st history record layer. For example, in Fig. 4, at the task level, there are two tasks, i.e., T1 and T2, engaged in the service composition context. For T1, there are two candidate services, i.e., $m_1 = 2$, as shown at the service level. For T2, there are three candidate services, i.e., $m_2 = 3$, as shown at the service level. Suppose that each candidate service has two representative history records. As a result, there are twenty-four composition instances, i.e., $(2 \times m_1) \times (2 \times m_2) = (2 \times 2) \times (2 \times 3) = 24$, made up of representative history records.

    With these analyses, a representative history record's contribution score is defined by Definition 7.

    Definition 7 (A representative history record's local contribution score). At a history record layer, for a representative history record, the average utility value of the composition instances it is engaged in will be used as its local contribution score corresponding to this level.

    Here, an instance is presented to illustrate how to compute the contribution score of a representative history record. Corresponding to Fig. 4, service T1-WS1 has two representative history records, i.e., TW11-HR1 and TW11-HR2, produced at the same level. For TW11-HR1, there are six composition instances it is engaged in, i.e., {TW11-HR1, TW21-HR1}, {TW11-HR1, TW21-HR2}, {TW11-HR1, TW22-HR1}, {TW11-HR1, TW22-HR2}, {TW11-HR1, TW23-HR1}, and {TW11-HR1, TW23-HR2}. For TW11-HR2, there are also six composition instances it is engaged in, i.e., {TW11-HR2, TW21-HR1}, {TW11-HR2, TW21-HR2}, {TW11-HR2, TW22-HR1}, {TW11-HR2, TW22-HR2}, {TW11-HR2, TW23-HR1}, and {TW11-HR2, TW23-HR2}. With these observations, the average utility value of the composition instances that TW11-HR1 is engaged in, i.e., Con-ScoreRepR(11-1) = (U11-21-1 + U11-21-2 + U11-22-1 + U11-22-2 + U11-23-1 + U11-23-2)/6, will be treated as TW11-HR1's local contribution score. Similarly, the average utility value of the composition instances that TW11-HR2 is engaged in, i.e., Con-ScoreRepR(11-2) = (U11-21-3 + U11-21-4 + U11-22-3 + U11-22-4 + U11-23-3 + U11-23-4)/6, is TW11-HR2's local contribution score.
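    The counting argument and Definition 7 can be checked with a small Python sketch. The per-record values and the additive instance utility below are made-up stand-ins for the utility of (1), and the function name and data layout are assumptions; only the record labels follow Fig. 4.

```python
from itertools import product
from statistics import mean


def contribution_scores(layer, utility):
    """layer: one list per task holding all representative records of that
    task's candidate services at one history record layer.

    Returns {record: average utility of the instances it takes part in}.
    """
    scores = {}
    for t, reps in enumerate(layer):
        for rep in reps:
            # Fix this record for task t and vary the records of all other tasks.
            slots = [[rep] if u == t else other for u, other in enumerate(layer)]
            scores[rep] = mean(utility(inst) for inst in product(*slots))
    return scores


# Hypothetical per-record values; the instance utility is their sum.
value = {"TW11-HR1": 8, "TW11-HR2": 5, "TW12-HR1": 6, "TW12-HR2": 7,
         "TW21-HR1": 1, "TW21-HR2": 2, "TW22-HR1": 3, "TW22-HR2": 4,
         "TW23-HR1": 5, "TW23-HR2": 6}
layer = [[r for r in value if r.startswith("TW1")],
         [r for r in value if r.startswith("TW2")]]
instances = list(product(*layer))
scores = contribution_scores(layer, lambda inst: sum(value[r] for r in inst))
print(len(instances), scores["TW11-HR1"], scores["TW11-HR2"])  # 24 11.5 8.5
```

    With these toy numbers TW11-HR1 takes part in six instances and obtains the higher score of the two peers.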

    For two peer clusters, the cluster whose representative history record has the higher local contribution score survives, while the other one is excluded from later discussion. This means that the excluded one will be trimmed from the Task-Service-HistoryRecord tree. Fig. 6 demonstrates an instance of the tree trimming process. Corresponding to the part of a Task-Service-HistoryRecord tree demonstrated in Fig. 5, TW11-HR2's local contribution score is lower than TW11-HR1's, so it will be trimmed from the Task-Service-HistoryRecord tree. It also means that Cluster-2 is excluded at the same time. As TW11-HR1 survives, Cluster-1, to which TW11-HR1 belongs, will be recruited for further discussion.

    Algorithm 2 specifies the trimming process of a Task-Service-HistoryRecord tree.

    Fig. 5. An instance of a Task-Service tree's evolution. [Figure labels: T1; T1-WS1; TW11-HR1 (Cluster-1 representative history record); TW11-HR2 (Cluster-2 representative history record); Cluster-1; Cluster-2; set of service T1-WS1's representative history records; impose k-means (k = 2) clustering algorithm on this data set.]

    Fig. 6. Cluster selection corresponding to tree trimming process. [Figure labels: T1; T1-WS1; TW11-HR1 (Cluster-1 representative history record); TW11-HR2 (Cluster-2 representative history record); Cluster-1; Cluster-2; set of service T1-WS1's representative history records.]

    Algorithm 2: TreeTrim(Rep-HR[1], Rep-HR[2])

    Input: A binary tree.
    Output: A tree with a single leaf node.

        If ContributionScore(LeftLeafNode) > ContributionScore(RightLeafNode) then
            ParentNode.DeleteChild(RightLeafNode);
        Else
            ParentNode.DeleteChild(LeftLeafNode);
        End if

    Essentially, service composition plan development is a well-thought-out decision to select the composition class that has the best global utility value. Concretely speaking, a service composition class is scored by its local utility value as defined by Definition 8.

    Definition 8 (A composition class's local utility value). Corresponding to a history record layer, the average utility value of a composition class's instances will be used as the composition class's local utility value.

    For example, corresponding to Fig. 4, composition class {T1-WS1, T2-WS1} has four instances, i.e., {TW11-HR1, TW21-HR1}, {TW11-HR1, TW21-HR2}, {TW11-HR2, TW21-HR1}, and {TW11-HR2, TW21-HR2}. The utility values of these four composition instances are indicated by U11-21-1, U11-21-2, U11-21-3, and U11-21-4, respectively. As a result, at the 1st history record layer, the local utility value of composition class {T1-WS1, T2-WS1}, i.e., U11-21, can be computed as follows: U11-21 = (U11-21-1 + U11-21-2 + U11-21-3 + U11-21-4)/4.
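    The same average can be written for a class whose services hold any number of representative history records at the current layer. In the notation below, $R_{1i}$ and $R_{2j}$ are symbols introduced here only for illustration; they denote the sets of representative history records of $WS_{1i}$ and $WS_{2j}$ at that layer, and $U(r, s)$ is the utility value of the composition instance made up of $r$ and $s$:

    $$U_{1i\text{-}2j} = \frac{1}{|R_{1i}| \, |R_{2j}|} \sum_{r \in R_{1i}} \sum_{s \in R_{2j}} U(r, s)$$

    With $|R_{11}| = |R_{21}| = 2$, this reduces to the four-term average of the example. If one of the services has only one representative record left, the average runs over two instances instead.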

    More details of the computing processes of the twenty-four composition instances can be found in Appendix B.1.

    For a survived cluster, if it contains two or more history records, Algorithm 1 will be repeatedly used for developing a deeper binary tree, by imposing the k-means (k=2) clustering algorithm on the survived cluster. This process is repeated until a survived cluster contains only one element, i.e., it consists of only one representative history record. Note that not all the survived clusters reach this goal at the same layer. As different survived clusters may finish their extensions at different history record layers, the length of the tree evolving path varies. For example, in Fig. 7, for the four final survived clusters, i.e., cluster-i, cluster-j, cluster-m and cluster-n, associated with different leaf nodes, cluster-i reaches the final goal, i.e., it contains only one element, at the 2nd history record layer, cluster-j reaches the final goal at the 3rd history record layer, while cluster-m and cluster-n reach their final goals at the 4th history record layer.
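The repeated bisection described above can be sketched as follows. This is a simplified illustration under stated assumptions: history records are reduced to single numbers, k-means (k=2) is a plain Lloyd iteration with deterministic initial centroids, and the rule `keep_higher_mean` that decides which cluster survives is a hypothetical stand-in for the contribution score used by Algorithm 2.

```python
from statistics import mean
from typing import Callable, List, Tuple

Cluster = List[float]


def two_means(values: Cluster, max_iter: int = 50) -> Tuple[Cluster, Cluster]:
    """Split 1-D values into two clusters with Lloyd's k-means (k=2)."""
    low, high = min(values), max(values)
    if low == high:
        # Identical values cannot be separated; split off one element so the
        # caller still makes progress.
        return values[:1], values[1:]
    c0, c1 = low, high  # deterministic initial centroids
    for _ in range(max_iter):
        left = [v for v in values if abs(v - c0) <= abs(v - c1)]
        right = [v for v in values if abs(v - c0) > abs(v - c1)]
        new_c0, new_c1 = mean(left), mean(right)
        if (new_c0, new_c1) == (c0, c1):
            break
        c0, c1 = new_c0, new_c1
    return left, right


def extend_path(
    records: Cluster,
    choose_survivor: Callable[[Cluster, Cluster], Cluster],
) -> List[Cluster]:
    """Bisect the survived cluster until it holds a single record.

    Returns the survived cluster of each layer, so the list length is the
    length of this tree evolving path.
    """
    layers: List[Cluster] = []
    cluster = list(records)
    while len(cluster) > 1:
        left, right = two_means(cluster)
        cluster = choose_survivor(left, right)
        layers.append(cluster)
    return layers


def keep_higher_mean(left: Cluster, right: Cluster) -> Cluster:
    """Hypothetical survivor rule: keep the cluster with the higher mean."""
    return left if mean(left) > mean(right) else right


path_a = extend_path([0.2, 0.25, 0.3, 0.8, 0.85, 0.9, 0.95], keep_higher_mean)
path_b = extend_path([0.1, 0.9], keep_higher_mean)
print(len(path_a), len(path_b))  # 3 1
```

The two inputs finish at different layers, which is the situation that the tree regulating step has to handle.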

    However, in Algorithm 2, the local score of a composition class is computed within a single layer. If a web service has no representative history record at some history record layer, how should a composition class's local utility value be computed? To deal with this situation, a tree regulating algorithm is specified by Algorithm 3, which guarantees a consistent process for computing a composition class's local utility value as piloted by Algorithm 2. Concretely speaking, Algorithm 3 regulates all the survived clusters so that they finish their extensions at the same history record layer, and all the tree evolving paths have the same length.

    Corresponding to Fig. 7, Fig. 8 illustrates the tree regulating process according to Algorithm 3. In Fig. 8, the final survived representative history record associated with Cluster-i is moved from the 2nd history record layer to the 3rd history record layer, and from the 3rd history record layer to the 4th history record layer. It acts as a new leaf node of itself, and takes part in the process for computing a composition class's local utility value in the 3rd history record layer and in the 4th history record layer, respectively. Similarly, the final survived representative history record associated with Cluster-j is moved from the 3rd history record layer to the 4th history record layer, and takes part in the process for computing a composition class's local utility value in the 4th history record layer.

    With this processing, each branch of a Task-Service tree is extended to the same depth. Based on the improved Task-Service-HistoryRecord tree, a composition class's global utility value can be obtained, which is the accumulated value of the composition class's local utility values derived from the different history record layers.

    Fig. 8. A regulated tree piloted by Algorithm 3.

    Fig. 7. An example of tree extension with different lengths of evolving path.

    Algorithm 4: GlobalUtilityValueCompute()
    Input: A Task-Service-HistoryRecord tree.
    Output: A composition class that owns the largest global utility value.
    1: For i = 1 to N do    // N is the number of history record layers
    2:     For j = 1 to M do    // M is the number of composition classes
    3:         CCLocalUV[i, j] = the average utility value of the instances of composition class j engaged in the i-th history record layer
    4:     End for
    5: End for
    6: For i = 1 to N
    7:     For j = 1 to M
    8:         CCGlobalUV[j] = CCGlobalUV[j] + CCLocalUV[i, j]
    9:     End for
    10: End for
    11: Select the composition class that owns the largest global utility value as the final composition plan.
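The selection performed by Algorithm 4 can be sketched in Python as below. The sketch assumes that the global utility value of a composition class is the sum of its local utility values over all history record layers, as stated in the text; the function name `select_composition_class` and the numbers in the example are hypothetical.

```python
from typing import List, Tuple


def select_composition_class(local_uv: List[List[float]]) -> Tuple[int, List[float]]:
    """Pick the composition class with the largest global utility value.

    `local_uv[i][j]` is the local utility value of composition class j at
    history record layer i (both indices are 0-based here).
    """
    class_count = len(local_uv[0])
    global_uv = [0.0] * class_count
    for layer in local_uv:
        for j, value in enumerate(layer):
            global_uv[j] += value  # accumulate over layers
    best = max(range(class_count), key=global_uv.__getitem__)
    return best, global_uv


# Three layers, three composition classes (hypothetical values).
local_uv = [
    [0.60, 0.70, 0.50],
    [0.80, 0.60, 0.50],
    [0.70, 0.65, 0.90],
]
best, global_uv = select_composition_class(local_uv)
print(best)  # 0
```

Note that class 2 wins the last layer but still loses overall, because the decision is based on the accumulated value rather than on any single layer.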

    [Figure: Task-Service-HistoryRecord tree over tasks T1 and T2 (services T1-WS1, T1-WS2 and ws2; record TW12-HR2), spanning the 1st to 4th history record layers, with final clusters Cluster-i, Cluster-j, Cluster-m and Cluster-n.]

    Algorithm 3: TreeEvolvingProcess-2(Rep-HR[1], Rep-HR[2])
    Input: A Task-Service-HistoryRecord tree.
    Output: A regulated Task-Service-HistoryRecord tree.
    1: For two survived clusters Sur-HR[i] and Sur-HR[j] at the same history record layer do
    2:     If Sur-HR[i].size = 1 and Sur-HR[j].size > 1 then
               // Suppose that Node[i] and Node[j] are respectively the survived
               // representative history records associated with Sur-HR[i] and Sur-HR[j]
    3:         Algorithm 1: TreeEvolvingProcess-1(Node[i], Sur-HR[i]);
               // It is equal to Node[i].addChild(Sur-HR[i].element), as
               // specified by the twelfth line in Algorithm 1
    4:         Algorithm 1: TreeEvolvingProcess-1(Node[j], Sur-HR[j]);
    5:     End if
    6: End for
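The effect of the regulating step can be sketched with a flattened model in which each tree evolving path is a list holding one representative history record per layer. This is a simplification made for illustration, not the tree manipulation of Algorithm 3 itself; the function name `regulate_paths` and the record labels are hypothetical.

```python
from typing import Dict, List


def regulate_paths(paths: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Extend every path to the depth of the longest one.

    A path that ended early repeats its final representative history record
    at each missing layer, so that the record can take part in the local
    utility computation of every layer.
    """
    depth = max(len(path) for path in paths.values())
    return {
        name: path + [path[-1]] * (depth - len(path))
        for name, path in paths.items()
    }


# Path lengths follow the Fig. 7 example: 2, 3, 4 and 4 layers.
paths = {
    "Cluster-i": ["HR-i1", "HR-i2"],
    "Cluster-j": ["HR-j1", "HR-j2", "HR-j3"],
    "Cluster-m": ["HR-m1", "HR-m2", "HR-m3", "HR-m4"],
    "Cluster-n": ["HR-n1", "HR-n2", "HR-n3", "HR-n4"],
}
regulated = regulate_paths(paths)
print(regulated["Cluster-i"])  # ['HR-i1', 'HR-i2', 'HR-i2', 'HR-i2']
```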

    [Figure: Task-Service-HistoryRecord tree over tasks T1 and T2 (services T1-WS1, T1-WS2 and ws2; record TW12-HR2), with the 1st to 4th history record layers and final clusters Cluster-i, Cluster-j, Cluster-m and Cluster-n.]


    As a result, the composition class that owns the largest global utility value is selected as the final composition plan. Algorithm 4 specifies the final selecting process.

    With these discussions, the HireSome-II method is specified in detail by Fig. 9.

    5 EXPERIMENTAL EVALUATION

    In this section, we evaluate HireSome-II empirically by comparing it to the Benchmark presented in Section 2 and HireSome-I in [23] in terms of time cost and validity.

    5.1 Experiment Context

    Concretely, five typical QoS criteria are assigned to a candidate service. Their value domains are specified in advance (see TABLE 1). Around the five QoS criteria, a set of history record values is randomly generated for each candidate service. To better simulate the probability distribution of real-life applications, similar to [29], the randomly generated history record values are pre-processed by the T location-scale distribution method.

    Technically, our experiments are conducted in a hybrid cloud environment. Due to the space limit, more details about the experiment settings can be found in Appendix C.1.
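A minimal sketch of how such synthetic history records might be generated is given below. It only draws uniform random values within the domains of TABLE 1; the T location-scale pre-processing is deliberately left out, and the dictionary keys, the function name `generate_history_records` and the use of uniform sampling are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Optional

# Value domains taken from TABLE 1; the key names are hypothetical.
QOS_DOMAINS = {
    "execution_price_cents": (40.0, 45.0),
    "execution_latency_ms": (5.0, 10.0),
    "reputation": (0.2, 1.0),
    "successful_execution_rate": (0.5, 1.0),
    "availability": (0.3, 1.0),
}


def generate_history_records(
    count: int, seed: Optional[int] = None
) -> List[Dict[str, float]]:
    """Draw `count` history records uniformly within the QoS domains."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high) for name, (low, high) in QOS_DOMAINS.items()}
        for _ in range(count)
    ]


records = generate_history_records(10, seed=1)
print(len(records), sorted(records[0]))
```

Seeding the generator makes a run repeatable, which helps when the same record set has to be fed to several methods for comparison.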

    5.2 Experiment Process and Results

    5.2.1 Comparison analysis among HireSome-I, HireSome-II and the Benchmark in time cost

    Suppose that there are $n$ tasks, and for each task there are $m$ candidate services. Moreover, suppose that there are $p$ history records associated with each candidate service. For the Benchmark as we presented in Section 3, there are $(pm)^n$ service composition instances, and the time complexity is $O((pm)^n)$ in theory. For the HireSome-II method, each candidate service has one or two representative records in one level. The time complexity is $O((2m)^n h)$, where $h$ represents the average height of the history record tree.

    To simplify the comparison analysis without loss of generality, in our experiment, the number of tasks is fixed at 4, and the number of QoS criteria is fixed at 5. Moreover, 2, 3, 4, and 5 candidate web services are respectively assigned to a task, and 10, 15, 20, 25, and 30 QoS history records are respectively assigned to a candidate service for further comparison analysis.
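To give a feeling for the gap between the two bounds, the following sketch evaluates them for the largest experimental setting (4 tasks, 5 services per task, 30 history records per service). The tree height of 5 is a hypothetical value, roughly the base-2 logarithm of 30, and the function names are illustrative only.

```python
def benchmark_instances(n: int, m: int, p: int) -> int:
    """Number of composition instances enumerated by the Benchmark: (p*m)^n."""
    return (p * m) ** n


def hiresome2_bound(n: int, m: int, h: int) -> int:
    """Upper bound on instances evaluated by HireSome-II: (2*m)^n * h."""
    return (2 * m) ** n * h


n_tasks, services_per_task, records_per_service = 4, 5, 30
assumed_height = 5  # hypothetical average tree height

print(benchmark_instances(n_tasks, services_per_task, records_per_service))  # 506250000
print(hiresome2_bound(n_tasks, services_per_task, assumed_height))  # 50000
```

Under these assumptions the exhaustive search considers about four orders of magnitude more instances, and the gap widens as the number of history records grows because p appears only in the Benchmark bound.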

    Fig. 10(a), Fig. 10(b), Fig. 10(c), and Fig. 10(d) show the evaluation results under these assumptions. Note that the simulation is run 100 times, each run initiated by a different set of history records. The time cost indicated in Fig. 10 is the average time cost for producing an optimal composition plan.

    From Fig. 10, we can see that, compared with HireSome-I and the Benchmark, HireSome-II takes the least time to produce the optimal service composition plan. Besides, once the number of web services per task is fixed, the time cost for producing the optimal service composition plan is affected by the number of history records per web service. Furthermore, the time cost of HireSome-I and HireSome-II changes very slowly as the number of history records per web service increases, while the time cost curve of the Benchmark rises sharply. Therefore, compared to the Benchmark, HireSome-I and HireSome-II have an advantage in large-scale service composition.

    Step 1: For a task engaged in a global task scheduling and its candidate web services, a tree structure, named Task-Service tree, is developed for modeling their relation, in which the task is treated as a root node and each candidate web service is treated as a leaf node of the task root.

    Step 2: For a candidate web service, its history records are grouped into two clusters by the k-means (k=2) clustering algorithm. As a result, two representative history records are selected, one from each of the two clusters.

    Step 3: Taking advantage of Algorithm 1, the Task-Service tree is evolved into a Task-Service-HistoryRecord tree, by introducing the representative history records into the Task-Service tree as the candidate services' leaf nodes.

    Step 4: For n tasks contained in a task scheduling, there are n Task-Service-HistoryRecord trees. Taking advantage of Algorithm 2, Algorithm 3, and Algorithm 4, a composition class's global utility value is computed.

    Step 5: As a result of Algorithm 4, the composition class that owns the largest utility value is selected as the final service composition plan.

    Fig. 9. The specifications of HireSome-II method.

    [Fig. 10 panels: (a) Number of web services = 2; (b) Number of web services = 3; (c) Number of web services = 4; (d) Number of web services = 5]

    Fig. 10. Comparison analysis among HireSome-I, HireSome-II and the Benchmark in time cost.

    TABLE 1
    TYPICAL QOS PROPERTIES USED IN OUR EXPERIMENT

    QoS item                     Domain
    Execution price              [40 cents, 45 cents]
    Execution latency            [5 ms, 10 ms]
    Reputation                   [0.2, 1.0]
    Successful execution rate    [0.5, 1.0]
    Availability                 [0.3, 1.0]

    5.2.2 Comparison analysis among HireSome-I, HireSome-II and the Benchmark in precision

    Similar to the first experiment, four situations are taken into consideration in this experiment, as illustrated by Fig. 11. In this experiment, HireSome-I and HireSome-II are each run 100 times with different sets of history records. We aim at investigating the probability that the final composition plan produced by HireSome-I or HireSome-II is the same as the optimal one computed by the Benchmark. If it is not, two further situations are investigated: with the composition plans produced by the Benchmark ranked by their utility values, we measure the probability that the final composition plan produced by HireSome-I or HireSome-II falls into the top 3 and the top 5 of the ranked plans. From Fig. 11, we can see that HireSome-II is superior to HireSome-I in performance, especially when the number of candidate services per task increases.
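The Top1, Top3 and Top5 rates used in this experiment can be computed along the following lines. The sketch is illustrative: the function names, plan labels and utility values are hypothetical, and for brevity a single Benchmark ranking is reused across runs, whereas in the experiment each run has its own set of history records and hence its own ranking.

```python
from typing import Dict, Iterable, List


def rank_in_benchmark(plan: str, benchmark_utilities: Dict[str, float]) -> int:
    """1-based rank of `plan` when Benchmark plans are sorted by utility."""
    ranked = sorted(benchmark_utilities, key=benchmark_utilities.get, reverse=True)
    return ranked.index(plan) + 1


def top_k_rates(ranks: List[int], ks: Iterable[int] = (1, 3, 5)) -> Dict[int, float]:
    """Fraction of runs whose selected plan ranks within the top k."""
    return {k: sum(rank <= k for rank in ranks) / len(ranks) for k in ks}


# Hypothetical Benchmark utilities for six candidate plans.
benchmark_utilities = {"A": 0.90, "B": 0.85, "C": 0.80, "D": 0.70, "E": 0.60, "F": 0.50}
# Plans selected by the method under test in five runs.
selected = ["A", "B", "A", "D", "F"]
ranks = [rank_in_benchmark(plan, benchmark_utilities) for plan in selected]
rates = top_k_rates(ranks)
print(rates)  # {1: 0.4, 3: 0.6, 5: 0.8}
```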

    For example, in Fig. 11(a), HireSome-I and HireSome-II are nearly equal to each other in performance, while in Fig. 11(c) and Fig. 11(d), HireSome-II is greatly superior to HireSome-I.

    5.2.3 Comparison analysis between HireSome-I and HireSome-II in their optimality

    In [22], a strategy is presented to evaluate a method's optimality. Similar to this strategy, the optimality of HireSome-I and HireSome-II is computed using the following formula:

    $\mathrm{Optimality} = U / U_{\mathrm{Benchmark}}$, (3)

    where $U$ is the utility value of the final service composition plan achieved by HireSome-I or HireSome-II, and $U_{\mathrm{Benchmark}}$ is the utility value of the optimal service composition plan achieved by the Benchmark method, i.e., the exhaustive composition method, whose optimality value is 1.
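Formula (3) and its average over repeated runs can be sketched as follows. The function names and the utility pairs are hypothetical; the sketch assumes positive utility values so that the ratio is well defined.

```python
from statistics import mean
from typing import Iterable, Tuple


def optimality(u: float, u_benchmark: float) -> float:
    """Ratio of a method's plan utility to the Benchmark's optimal utility."""
    if u_benchmark <= 0:
        raise ValueError("the Benchmark utility must be positive")
    return u / u_benchmark


def average_optimality(runs: Iterable[Tuple[float, float]]) -> float:
    """Mean optimality over (U, U_Benchmark) pairs, one pair per run."""
    return mean(optimality(u, u_benchmark) for u, u_benchmark in runs)


# Three hypothetical runs: (utility of selected plan, Benchmark utility).
runs = [(0.90, 1.00), (0.76, 0.80), (0.50, 0.50)]
avg = average_optimality(runs)
print(round(avg, 2))  # 0.95
```

A value of 1 means that the method found a plan as good as the exhaustive search in that run.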

    Similar to the first and second experiments, four situations are taken into consideration, and the simulation is run 100 times. Fig. 12 shows the average optimality value of HireSome-I and HireSome-II. From Fig. 12, we can see that, in most cases, the optimality of HireSome-I and HireSome-II is close to that of the Benchmark method, and HireSome-II performs better than HireSome-I in achieving a (close-to-) optimal plan.

    [Fig. 11 panels: (a) Number of web services = 2; (b) Number of web services = 3; (c) Number of web services = 4; (d) Number of web services = 5]

    Fig. 11. Comparison of precision between HireSome-I and HireSome-II with respect to Top1, Top3 & Top5.


    6 RELATED WORKS AND COMPARISON ANALYSIS

    Service-Oriented Computing (SOC) enables the composition of services provided with varying Quality of Service (QoS) levels in a loosely coupled way. Selecting a set of services for a (near-) optimal composition plan in terms of QoS is crucial when many functionally equivalent services are available [16]. Therefore, service composition is a classic issue in the service computing domain. Quality-aware composition of web services has been extensively investigated in [18], [19], [20], [21], [22], to name a few.

    In [18], [19], [20], the authors propose a per-service-class optimization as well as a global optimization using integer programming. As opposed to integer programming, in [21], a genetic algorithm (GA) based approach is proposed, where the genome length is determined by the number of abstract services that require a choice to be made. The GA-based approach focuses on dealing with non-linear constraints. It has the advantage that it is scalable when the number of concrete services per abstract service increases. Considering that more and more functionally equivalent services are available on the Internet, Alrifai et al. propose an interesting mechanism for cutting down the search space of candidate web services by using skyline queries offline [22]. Skyline queries identify the non-dominated web services. A web service dominates another if it is at least as good on all QoS dimensions and strictly better on at least one; a non-dominated web service is one that is not dominated by any other candidate.
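For readers unfamiliar with skyline queries, the usual Pareto dominance test can be sketched as follows. This illustrates the general technique and not the specific method of [22]; it assumes that every QoS dimension has been normalized so that larger values are better, and the service names and values are hypothetical.

```python
from typing import Dict, Tuple

QoS = Tuple[float, ...]


def dominates(a: QoS, b: QoS) -> bool:
    """True if `a` is at least as good as `b` everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(services: Dict[str, QoS]) -> Dict[str, QoS]:
    """Keep the services that no other service dominates."""
    return {
        name: qos
        for name, qos in services.items()
        if not any(
            dominates(other_qos, qos)
            for other, other_qos in services.items()
            if other != name
        )
    }


# (reputation, availability) for three hypothetical candidate services.
services = {"WS1": (0.9, 0.5), "WS2": (0.6, 0.8), "WS3": (0.5, 0.4)}
print(sorted(skyline(services)))  # ['WS1', 'WS2']
```

WS3 is dropped because both other services dominate it, while WS1 and WS2 remain since each is better than the other on one dimension.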

    Technically, a linear programming model is often used in service composition evaluation [19], [22]. In practice, various composition styles, e.g., sequential, parallel, alternative and loops, can be engaged in a composition plan. In this paper, we focus on investigating the sequential composition model, as other styles can be reduced or transformed into the sequential model by existing mature techniques as mentioned in [19].

    Generally speaking, service composition is promoted in an open web environment. For a private cloud, privacy and security are crucial issues in cloud service access. This often leads to an awkward situation in which some QoS information may be unavailable in cross-cloud composition evaluation. This is the reason why, although it is assumed that the history records can be obtained through some monitoring mechanism [30], [31], there are few general QoS datasets widely used for testing the performance and accuracy of history record-aware service composition, as mentioned in [32].

    In view of this challenge, an enhanced History record-based Service optimization method, named HireSome-II, is presented in this paper for cross-cloud service composition. This method aims at a tradeoff between privacy and time cost in cross-cloud service composition. Concretely, our approach differs from the above approaches in three respects:

    (1) The k-means algorithm is implemented inside a cloud, and only a few representative history records are engaged in composition evaluation. This protects the privacy of a cloud, as the method does not require the cloud to unveil all of its cloud services' QoS information, whereas the composition approaches in [18], [19], [20], etc. do not take privacy issues into account and assume that all the QoS information is available.

    (2) The tree mechanism presented in this paper is similar to the tree mechanism presented in [22]. However, our tree mechanism is built by imposing the k-means algorithm on history records, while the tree mechanism presented in [22] is set up by imposing skyline queries on candidate web services. Besides, in [22], the tree mechanism is initiated by a binary tree, while our tree mechanism is initiated by a Task-Service tree, which is a multi-fork tree and is more compatible with real-life systems.

    (3) Compared to its previous version investigated in [23], HireSome-II not only protects the privacy of a cloud service for cross-cloud service composition, but also greatly speeds up the calculating process for selecting a (near-to-) optimal service composition plan with higher optimality and precision. It is suitable for developing a cross-cloud service composition plan over big data of history records with privacy consideration.

    7 CONCLUSION AND FUTURE WORK

    In this paper, an enhanced History record-based Service optimization method, named HireSome-II, based on its previous basic version HireSome-I, has been developed for privacy-aware cross-cloud service composition for processing big data applications. It can effectively promote cross-cloud service composition in the situation where a cloud refuses to disclose all details of its service transaction records due to business privacy concerns in cross-cloud scenarios. Our composition evaluation approach achieves two advantages. Firstly, our method significantly reduces the time complexity, as only some representative history records are recruited, which is highly demanded for big data applications. Secondly, our method protects cloud privacy, as a cloud is not required to unveil all of its transaction records, which accordingly protects privacy in big data. Simulation and analytical results have demonstrated the validity of our method compared to a benchmark.

    Fig. 12. Comparison of optimality with HireSome-I and HireSome-II.


    For future work, we plan to apply our method to some specific cloud systems for processing big data applications. Besides, as privacy preservation for big data analysis, sharing and mining is a challenging research issue due to the increasingly large volume of datasets in the cloud, we also plan to investigate the scalability of privacy preservation in big data applications with cloud service access.

    ACKNOWLEDGMENT

    This paper is partly supported by the National Science Foundation of China under Grants 61073032 and 91318301, and by the National Key Technology R&D Program of the Ministry of Science and Technology under Grant 2011BAK21B06.

    REFERENCES

    [1] S. Tai, et al., "Cloud Service Engineering," Proc. 32nd ACM/IEEE Int'l Conf. on Software Engineering (ICSE'10), vol. 2, pp. 475-476, May 2010.
    [2] V. Nallur and R. Bahsoon, "A Decentralized Self-Adaptation Mechanism for Service-based Applications in The Cloud," IEEE Trans. Softw. Eng., vol. 39, no. 5, pp. 591-612, May 2013.
    [3] H.Y. Lin and W.G. Tzeng, "A Secure Erasure Code-Based Cloud Storage System with Secure Data Forwarding," IEEE Trans. Parallel Distrib. Syst., vol. 23, no. 6, pp. 995-1003, Oct. 2012.
    [4] A. Gambi and G. Toffetti, "Modeling Cloud Performance with Kriging," Proc. 34th Int'l Conf. on Software Engineering (ICSE'12), pp. 1439-1440, June 2012.
    [5] R. Buyya, et al., "Cloud Computing and Emerging IT Platforms: Vision, Hype, and Reality for Delivering Computing as The 5th Utility," Future Gener. Comput. Syst., vol. 25, no. 6, pp. 599-616, June 2009.
    [6] J. Cao, et al., "Optimal Multiserver Configuration for Profit Maximization in Cloud Computing," IEEE Trans. Parallel Distrib. Syst., vol. 24, no. 6, pp. 1087-1096, Oct. 2013.
    [7] M. Armbrust, et al., "Above the Clouds: A Berkeley View of Cloud Computing," Commun. ACM, vol. 53, no. 4, pp. 50-58, April 2010.
    [8] M. Zhang, R. Ranjan, A. Haller, S. Nepal, M. Menzel, "A Declarative Recommender System for Cloud Infrastructure Services Selection," Proc. 9th International Conference Economics of Grids, Clouds, Systems, and Services (GECON'12), pp. 102-113, November 2012.
    [9] M. Menzel and R. Ranjan, "CloudGenius: Decision Support for Web Server Cloud Migration," Proc. 21st Int'l Conf. on World Wide Web (WWW'12), pp. 979-988, April 2012.
    [10] A. Iosup, et al., "Performance Analysis of Cloud Computing Services for Many-Tasks Scientific Computing," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 6, pp. 931-945, June 2011.
    [11] X. Zhang, et al., "A Scalable Two-Phase Top-Down Specialization Approach for Data Anonymization Using MapReduce on Cloud," IEEE Trans. Parallel Distrib. Syst., in press, accepted on 6 Feb. 2013.
    [12] M. Li, et al., "Scalable and Secure Sharing of Personal Health Records in Cloud Computing Using Attribute-based Encryption," IEEE Trans. Parallel Distrib. Syst., vol. 24, no. 1, pp. 131-143, January 2013.
    [13] X. Zhang, et al., "An Efficient Quasi-Identifier Index based Approach for Privacy Preservation over Incremental Data Sets on Cloud," J. Comput. Syst. Sci., vol. 79, no. 5, pp. 542-555, Aug. 2013.
    [14] X. Zhang, et al., "A Privacy Leakage Upper-bound Constraint based Approach for Cost-effective Privacy Preserving of Intermediate Datasets in Cloud," IEEE Trans. Parallel Distrib. Syst., vol. 24, no. 6, pp. 1192-1202, June 2013.
    [15] X. Zhang, et al., "A Hybrid Approach for Scalable Sub-Tree Anonymization over Big Data using MapReduce on Cloud," J. Comput. Syst. Sci., in press, accepted on 28 Aug. 2013.
    [16] A. Klein, F. Ishikawa and S. Honiden, "Towards Network-Aware Service Composition in The Cloud," Proc. 21st Int'l Conf. on World Wide Web (WWW'12), pp. 959-968, April 2012.
    [17] C. Ye, S.C. Cheung, and W.K. Chan, "Publishing and Composition of Atomicity-Equivalent Services for B2B Collaboration," Proc. 28th Int'l Conf. on Software Engineering (ICSE'06), pp. 351-360, May 2006.
    [18] B. Benatallah, et al., "Declarative Composition and Peer-to-Peer Provisioning of Dynamic Web Services," Proc. 18th Int'l Conf. on Data Engineering, pp. 297-308, 2002.
    [19] L. Zeng, et al., "QoS-Aware Middleware for Web Services Composition," IEEE Trans. Softw. Eng., vol. 30, no. 5, pp. 311-327, May 2004.
    [20] L. Zeng, et al., "Quality Driven Web Services Composition," Proc. 12th Int'l Conf. on World Wide Web (WWW'03), pp. 411-421, May 2003.
    [21] G. Canfora, et al., "An Approach for QoS-Aware Service Composition based on Genetic Algorithms," Proc. 2005 Conf. on Genetic and Evolutionary Computation (GECCO'05), pp. 1069-1075, June 2005.
    [22] M. Alrifai, D. Skoutas, and T. Risse, "Selecting Skyline Services for QoS-based Web Service Composition," Proc. 19th Int'l Conf. on World Wide Web (WWW'10), pp. 11-20, April 2010.
    [23] W. Lin, et al., "A History Record-Based Service Optimization Method for QoS-Aware Service Composition," Proc. 2011 IEEE Int'l Conf. on Web Services (ICWS 2011), pp. 666-673, July 2011.
    [24] H.C.L and K. Yoon, "Multiple Criteria Decision Making: Methods and Applications," Lecture Notes in Economics and Mathematical Systems, vol. 186, Springer-Verlag, 1981.
    [25] S. Lloyd, "Least Squares Quantization in PCM," IEEE Trans. Inf. Theory, vol. 28, no. 2, pp. 129-137, Mar. 1982.
    [26] J. Han and K. Micheline, Data Mining: Concepts and Techniques (2nd Edition), Morgan Kaufmann, 2006.
    [27] Y. Qi and A. Bouguettaya, "Computing Service Skyline from Uncertain QoWS," IEEE Trans. on Services Computing, vol. 3, no. 1, pp. 16-29, Jan.-March 2010.
    [28] M. Wagner and W. Kellerer, "Web Services Selection for Distributed Composition of Multimedia Content," Proc. 12th Annual ACM Int'l Conf. on Multimedia, pp. 104-107, Oct. 2004.
    [29] S. Rosario, et al., "Probabilistic QoS and Soft Contracts for Transaction-Based Web Services Orchestrations," IEEE Trans. on Services Computing, vol. 1, no. 4, pp. 187-200, Oct. 2008.
    [30] R. Jurca, B. Faltings, and W. Binder, "Reliable QoS Monitoring Based on Client Feedback," Proc. 17th Int'l World Wide Web Conference (WWW'07), pp. 1003-1012, May 2007.
    [31] F. Barbon, et al., "Run-Time Monitoring of Instances and Classes of Web Service Compositions," Proc. 4th Int'l Conf. on Web Services (ICWS'06), pp. 63-71, May 2006.
    [32] K. Kritikos and D. Plexousakis, "Mixed-Integer Programming for QoS-based Web Service Matchmaking," IEEE Trans. on Services Computing, vol. 2, no. 6, pp. 122-139, April-June 2009.


    Wanchun Dou received his PhD degree in Mechanical and Electronic Engineering from Nanjing University of Science and Technology, China, in 2001. From Apr. 2001 to Dec. 2002, he did his postdoctoral research in the Department of Computer Science and Technology, Nanjing University, China. Now, he is a full professor of the State Key Laboratory for Novel Software Technology, Nanjing University, China. From Apr. 2005 to Jun. 2005 and from Nov. 2008 to Feb. 2009, he respectively visited the Department of Computer Science and Engineering, Hong Kong University of Science and Technology, as a visiting scholar. Up to now, he has chaired three NSFC projects and published more than 60 research papers in international journals and international conferences. His research interests include workflow, cloud computing and service computing.

    Xuyun Zhang is currently working towards the PhD degree at the Faculty of Engineering & IT, University of Technology, Sydney, Australia. Before joining UTS, he received his Master's and Bachelor's degrees in Computer Science from Nanjing University, China. His research interests include cloud computing, privacy and security, Big Data, MapReduce and OpenStack. He has published several papers in refereed international journals including IEEE Transactions on Parallel and Distributed Systems (TPDS).

    Jianxun Liu received his M.S. and Ph.D. degrees in computer science from Central South University in 1997 and Shanghai Jiao Tong University in 2003, respectively. He is now a professor and a vice dean of the school of computer science and engineering, Hunan University of Science and Technology. He is also the director of Hunan Knowledge Grid Laboratory at the same university. His current research interests include Service and Cloud Computing, Workflow Management Systems and LBS applications. He has published more than 60 academic papers in international technical journals and conference proceedings and served as PC co-chair or PC member of about 50 International Conferences.

    Jinjun Chen is an Associate Professor from the Faculty of Engineering and IT, University of Technology Sydney (UTS), Australia. He is the Director of the Lab of Cloud Computing and Distributed Systems at UTS. He holds a PhD in Computer Science and Software Engineering from Swinburne University of Technology, Australia. Dr Chen's research interests include cloud computing, big data, workflow management, privacy and security, and various related research topics. His research results have been published in more than 100 papers in high quality journals and at conferences, including IEEE Transactions on Service Computing, ACM Transactions on Autonomous and Adaptive Systems, ACM Transactions on Software Engineering and Methodology (TOSEM), IEEE Transactions on Software Engineering (TSE), and IEEE Transactions on Parallel and Distributed Systems (TPDS). He received the Swinburne Vice-Chancellor's Research Award for early career researchers (2008), the IEEE Computer Society Outstanding Leadership Award (2008-2009) and (2010-2011), the IEEE Computer Society Service Award (2007), and the Swinburne Faculty of ICT Research Thesis Excellence Award (2007). He is an Associate Editor for IEEE Transactions on Parallel and Distributed Systems. He is the Vice Chair of the IEEE Computer Society's Technical Committee on Scalable Computing (TCSC), Vice Chair of the Steering Committee of the Australasian Symposium on Parallel and Distributed Computing, Founder and Coordinator of the IEEE TCSC Technical Area on Workflow Management in Scalable Computing Environments, and Founder and steering committee co-chair of the International Conference on Cloud and Green Computing and the International Conference on Big Data and Distributed Systems.
