
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, VOL. 44, NO. 7, JULY 2014 805

SMARTSPACE: Multiagent Based Distributed Platform for Semantic Service Discovery

Sourish Dasgupta, Anoop Aroor, Feichen Shen, and Yugyung Lee, Member, IEEE

Abstract: Service discovery is an integral issue in the area of service oriented computing (SOC). A centralized platform based service discovery suffers from major drawbacks such as poor scalability and a single point of failure. A P2P based design incurs high maintenance overhead for a distributed service registry and querying task. In this paper, we propose SMARTSPACE, a hybrid multiagent based distributed platform for efficient semantic service discovery. By utilizing reactive agents in modeling services, users’ requests, and registry management middleware, the proposed service discovery algorithm, SmartDiscover, is able to achieve fast, scalable, parallel, and concurrent service finding within a systemic environment that can be highly dynamic, asynchronous, and concurrent. We conducted the SmartDiscover experiments within the JADE 3.7 agent framework on top of both IBM Cloud Cluster and NetLogo simulation environments. The results showed promising outcomes in terms of average query response time and the number of message exchanges needed to maintain the distributed registry. The accuracy of SmartDiscover was measured and compared with the widely accepted benchmark OWL-S MX approach.

Index Terms: Multiagent systems, service clustering, service discovery, service matchmaking, web services.

I. Introduction

SERVICE discovery problems in service oriented architecture (SOA) can be informally described as the task of efficiently and accurately finding a relevant set of services that satisfies a given service request from a user. Some of the reasons that make this problem computationally difficult are that: 1) the number of services can be large and increasing (e.g., Web Services); 2) the system can be highly dynamic because of asynchronous service communication, addition, modification, and deletion of services; 3) the system is more likely to be distributed because of independent ownership and hosting of services; and 4) accurate service matchmaking for service discovery can be computationally very expensive.

Manuscript received October 13, 2012; revised March 31, 2013; accepted July 18, 2013. Date of publication September 17, 2013; date of current version June 12, 2014. This work was supported by the National Science Foundation under Grant IIS 0742666. This paper was recommended by Associate Editor W. A. Gruver.

S. Dasgupta and A. Aroor are with the Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, GJ 382007, India (e-mail: sourish−[email protected]; [email protected]).

F. Shen and Y. Lee are with the Computer Science Electrical Engineering Department, University of Missouri, Kansas City, MO 64110 USA (e-mail: [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSMC.2013.2281582

Hence, we need a service discovery system that is accurate, efficient, and scalable.

Many approaches have been proposed to tackle the service discovery problem. The methods followed in all of these approaches can be summarized into two steps. The first step is to organize available service descriptions. The second step is to run a discovery algorithm to efficiently navigate the search space of services and find those that are relevant. Two conventional ways of organizing services are: 1) the thematic way, where services are organized based on their domains [43] according to a thematic categorization such as NAICS [28] and UNSPCS [44]; and 2) a functional feature based classification [14] and clustering [8].

Another important aspect of service discovery is service matchmaking, the problem of pair-wise matching of service descriptions to understand the functional similarity between services. The functional similarity analysis is the basis for service organization and also for service discovery for a given query description. The two common service matchmaking paradigms are syntactic [36] and semantic matchmaking [1], [23], [22], [31]. Syntactic matchmaking is based on computing keyword-based similarity using statistical similarity measures (e.g., cosine similarity, KL divergence) or semantic-lexicon (e.g., WordNet) based statistical similarity measures. Semantic matchmaking techniques use ontologies and match services using description logic (DL) based subsumption reasoning or taxonomic structural reasoning. These techniques have been shown to give more accurate results than the syntactic approaches [18].

Service discovery can be designed to be centralized or distributed. However, a centralized service discovery suffers from some major drawbacks that are innate to all centralized systems. First, the discovery is not scalable for an increasing number of queries in a concurrent system, causing considerable overhead and leading to high service latency. Second, a centralized discovery system is potentially a single point of failure, since the discovery module that is physically running on a single server is vulnerable to unexpected termination. Finally, a centralized system can lead to high network traffic and thereby a network bottleneck. These problems have motivated alternative research studies on distributed discovery algorithms. Of the many contributions to this problem, multiagent system (MAS) based SOA platforms have recently emerged as one of the important paradigms in building distributed SOA platforms. Apart from making SOA systems more distributed, modeling services as intelligent agents is seen as a way of making services more flexible, proactive, and adaptive [16]. Such service agents can also be designed to cooperate when handling sudden changes such as updates of services or complex requests for services.

2168-2216 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

In this paper, we first introduce SMARTSPACE, a multiagent based distributed SOA middleware. In SMARTSPACE, services and users’ queries are modeled as agents. The middleware is distributed into a system of federated registries, each of which is managed by a pair of middleware agents. The initial version of SMARTSPACE was discussed in the first author’s dissertation [9]. Secondly, we propose a novel distributed algorithm called SmartDiscover that can be further divided into three component distributed algorithms: 1) SmartCluster, 2) SmartDirect, and 3) SmartMap. SmartCluster is responsible for efficient clustering of service descriptions into a semantic taxonomic cluster (STC) space [10], [11]. The SmartDirect and SmartMap algorithms are responsible for satisfying a user query by exploiting the structural and semantic properties of the STC space to efficiently discover the relevant service agents. The major contributions of this paper are as follows.

1) The SMARTSPACE middleware is elaborated.

2) An efficient, sound, and scalable service organization algorithm called SmartCluster is proposed.

3) A novel directory data structure called the DA-directory is discussed. The directory supports the efficient location of the correct registry in the federated registry environment.

4) An efficient service retrieval algorithm called SmartMap is proposed.

5) Empirical studies have shown that the algorithm significantly prunes the search space while mapping a query to a set of relevant services, thereby reducing the query response time.

II. Problem Statement

A. Problem Overview

The problem of service discovery begins with a consumer request for a service. These requests, in general, have a desire (output) component and an input component. The desire component is usually the output that the consumer expects from the service, while the input component is the information that the consumer provides to the service. The service discovery problem is to find all services in a given service pool that can satisfy the consumer request. Hence, the problem of service discovery essentially boils down to searching through the given service pool and finding the services whose output matches the consumer’s desired output and whose input matches the consumer’s given input. Such a search process generates a pool of usable services from which the consumer has to select one based on his/her preferences. Solving this problem requires solving the related problem of service organization. An efficient organization of services will influence the computational complexity of a discovery algorithm by determining the extent to which the search space can be pruned. Service organization involves characterizing services using specific features. The features that are adopted in this paper are service input and service output. These are the most defining features of a service in terms of specifying its functionality. Additional features could be the precondition and postcondition of services, but they are not used in this paper.

B. Problem Formulation

Given a dynamic set of services $S$ and a consumer’s service request $Q$, find a set of services $S_R \subseteq S$ such that $f_m : Q, S \rightarrow S_R$, where $f_m$ is a matchmaking function that maps $Q$ to the relevant set $S_R$. Matchmaking in this paper is purely functional in the sense that the input component of $Q$ should match the input requirement of each service in $S_R$ and the desired output of $Q$ should match the generated output of each service in $S_R$.

III. Approach Overview

A. Semantic Service Matchmaking—Motivation

The use of semantics for annotating services tackles some of the important problems that are innate in non-semantic based approaches. Some of these problems are: 1) polysemy (i.e., the same word having different meanings); 2) synonymy (i.e., different words having the same meaning); and 3) meronymy (i.e., words having a partitive relationship with each other). In SMARTSPACE, services and users’ queries are therefore semanticized to avoid such issues. More specifically, every service’s input and output parameters are defined using concepts from a global domain ontology serving as a base ontology. Similarly, query input and output parameters are assumed to be mapped to concepts from the same global ontology. Another major advantage of mapping parameters to concepts is that, since the concepts come from an ontology that is defined formally using description logic (DL), we can ascertain subsumptive relationships (i.e., parent-child relationships) among the parameters and thereby among the services themselves. This ability to find subsumptive relationships among services helps us to organize the entire service pool into a set of taxonomic clusters called semantic taxonomic clusters (STCs) [10], [11]. An STC taxonomy (simply, taxonomy) represents a group of services that are functionally similar. More details are provided in Section V.

B. Efficient DL Subsumption Reasoning

A significant disadvantage of DL subsumption based service matchmaking is that DL subsumption reasoning can be as bad as EXPTIME [3]. To overcome this limitation, we use a semantic-preserving bit encoding scheme called BaseOntoEncoding, where every concept in the ontology is assigned a bit code. An initial base taxonomy of all primitive concepts is constructed from the subsumption axioms in the base ontology. The concepts are then assigned codes based on their position in the taxonomy. Such an assignment of codes preserves interconcept subsumptive relations. These codes are used to efficiently test DL subsumption between any two concepts in linear time. Derived concepts, which are encoded based on the primitive concept codes, are used in service matchmaking. The process of assigning codes to the concepts and matchmaking is explained in more detail in Section IV.


Fig. 1. SMARTSPACE Platform.

C. SMARTSPACE Design Overview

The core feature of the SMARTSPACE platform is its federated registry system that follows a super-peer oriented design principle (Fig. 1). The underlying network overlay is a P2P community whose member nodes are of two types: 1) super-peer nodes, and 2) peer nodes. Super-peer nodes are fully connected to each other at any given point in time. The super-peer nodes maintain a distributed service registry, so that each super-peer node stores a part of the service registry data. A peer node, on the other hand, hosts services and/or allows users to initiate queries. The service descriptions of the services hosted in the peer nodes are sent to the super-peer nodes, where they are added to the distributed service registry. Similarly, the queries initiated at the peer nodes are also sent to the super-peer nodes, where services relevant to the queries are retrieved using the distributed service registry. The goal of this underlying architecture is to:

1) efficiently cluster the services into a hierarchical STC space across the dedicated super-peer nodes in a distributed manner,

2) minimize the message exchange between super-peer nodes that is required for registry maintenance during service clustering and service discovery,

3) efficiently locate the super-peer node containing a particular service among the set of federated registries with minimum registry lookups, and

4) efficiently discover relevant services within an identified super-peer node with maximum search space pruning.

The design of SMARTSPACE is based on the reactive model of a multiagent system (MAS) [49], [6], which has the potential to deal with the challenges of implementing large-scale parallel and concurrent SOA based systems. The use of multiple agents can also improve load balancing significantly and improve the overall performance of service discovery. Finally, agent modeling makes the platform resilient to unexpected bottlenecks or a sudden crash of nodes because of the mobility feature of the agents. In the SMARTSPACE platform, the service providers’ peer nodes host agents called service agents (SAs), which represent the actual services and incorporate the executable code. Each of the dedicated super-peer nodes hosts a pair of special middleware agents called the directory agent (DA) and the blackboard agent (BA). The user request is also embodied in an agent called the user agent (UA). The UA modeling is beneficial in maintaining an active user representative even if the user is not logged into SMARTSPACE after sending the request. It also provides the flexibility to have negotiation based service discovery, driven by user preferences, when required; negotiation is beyond the scope of this paper. It can also allow for mobile service discovery if services need to be operated from one location to another.

In terms of transferring service descriptions, there can be two design choices: 1) file-based message passing, and 2) agent based mobility. The advantage of the file based approach is that it reduces the number of agents that need to be created for service discovery and hence significantly decreases the overall network overhead in terms of interagent message passing and memory utilization. On the other hand, the disadvantage of this approach is that if the SOA system is highly concurrent with considerable query overhead per super-peer, then the average query response time is significantly affected. This is because the BA has to organize the service descriptions (i.e., files) into STCs and then map queries to the files in a centralized manner, which forces sequential processing of files and queries. However, if sufficient memory is available at the super-peer, then the same job can be distributed to agents representing service descriptions. Another reason for using mobile agents to incorporate service descriptions is to handle dynamic behavior. Each mobile agent can keep track of changes in the service it represents and hence get updated automatically. Such light-weight agents are called service helper agents (SHAs). SHAs do not need to have the executable code, but they move from a source (a peer node where the corresponding SA resides) to a destination (a super-peer node where the SHA is organized into the STC space). We have implemented both the file-based and SHA-based approaches, and the results are given in Section VI. The rest of the paper is discussed in terms of the SHA based design with no loss of generality.

The information about the STC clusters and the super-peer nodes that contain a particular STC is stored using a two-level directory. The first level, called the DA-directory (since it is maintained by DAs), contains information that helps in locating the subset of super-peer nodes containing the desired service(s). The second level, called the BA-directory (since it is maintained by BAs), contains information that helps in finding the clusters within a particular super-peer node that contain the desired services for a user query. The two-level design approach to service organization has some advantages: 1) the DA-directory eliminates queries that do not have any relevant service right at level one, thereby avoiding unnecessary searches and conducting a search only in a super-peer where relevant services might be found; and 2) the BA-directory makes the organization of services more efficient by directing new services to the right clusters and the right super-peer nodes. The data structure of the directory is tightly coupled with the bit-code based semantic encoding of service descriptions. The characteristic features of the data structure and its associated algorithms are based on the service matchmaking algorithm using the semantic bit-encoding technique described in the next section. A detailed discussion of the directory structures is given in Section V.

Fig. 2. BaseOntoEncoding Example.

IV. Encoding-Based Service Matchmaking

The foundation of our service matchmaking algorithm is a concept encoding scheme called BaseOntoEncoding. The scheme enables encoding of a primitive concept in a generalized (i.e., axiomatic) ontology into a corresponding binary bit string so that the semantic interpretation of the subsumption axioms is preserved. In other words, if a subsumption axiom exists between two concepts, then we can deduce this relationship from their corresponding codes using an algebraic operation. It is to be noted that any definitional ontology can be converted into an equivalent axiomatic ontology. We now provide a sketch of the algorithm in the following subsection.

A. BaseOntoEncoding

The BaseOntoEncoding algorithm is a simple topological spanning over the base ontological taxonomy $T_{BS}$ (generated initially by a DL reasoner) to codify all the base concepts. The span starts with assigning the code [0*] to the universal parent concept ⊤ and finishes with assigning the code [1*] to the universal child concept ⊥ (Fig. 2). The superscript [*] means that the 0/1 is repeated over an n-bit string, where n refers to the current number of concepts in $T_{BS}$. During the topological spanning, a new 1 bit, called the most significant bit (MSB), is assigned to each concept to signify its unique identity. The assignment is done at the code string position that corresponds to the order of the topological span. For example, in Fig. 2, the order of the visit to the concept Car is five. Hence, a 1 bit is assigned to the fifth position of the code string (i.e., 10000). While assigning a code to a concept at a particular visit, the codes of all its parent concepts are ORed (disjunction) together so that all of their respective unique bits are inherited. This ORed code is then concatenated with the newly assigned 1 bit. Thus, the concept Car is encoded as [0*]10011, where the code 11 at positions one and two is inherited from the code of its single parent LandVehicle (whose code is [0*]11). Codes assigned in this manner are called DLcodes. If there are multiple parents, then the DLcodes of all of them are inherited by the child (e.g., the code of Sea Plane in Fig. 2). An important property of the encoding algorithm is that it always assigns a unique code to any given concept within the base ontology. The uniqueness is guaranteed by the assignment of the most significant bit during the topological span. The following theorem states a necessary and sufficient condition for two given concepts (C and D) to have a subsumptive relation.

Theorem 1: $C \sqsubseteq D \leftrightarrow \mathrm{DLcode}(C) \lor \mathrm{DLcode}(D) = \mathrm{DLcode}(C)$.

Proof: See [9] for proof.

An example of DLcode subsumption is that of the concept Car and the concept LandVehicle in Fig. 2:

$\mathrm{Car} \sqsubseteq \mathrm{LandVehicle} \leftrightarrow \mathrm{DLcode}(\mathrm{Car}) \lor \mathrm{DLcode}(\mathrm{LandVehicle}) = [0^*]10011 \lor [0^*]11 = [0^*]10011 = \mathrm{DLcode}(\mathrm{Car})$.

Thus, the concept LandVehicle subsumes the concept Car.
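The encoding and the subsumption test of Theorem 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the integer bit-mask representation, and the taxonomy itself are assumptions made for illustration. Only the codes of Car ([0*]10011) and LandVehicle ([0*]11) are taken from the text; the concepts Vehicle, WaterVehicle, and AirVehicle and the visit order are hypothetical stand-ins for Fig. 2.

```python
def base_onto_encode(parents, visit_order):
    """Assign an integer bit mask (a DLcode) to every concept.

    parents     -- dict: concept -> list of direct parents (top omitted)
    visit_order -- concepts in topological order, parents before children
    """
    codes = {}
    for position, concept in enumerate(visit_order, start=1):
        inherited = 0
        for parent in parents[concept]:
            # OR together the codes of all parents; a KeyError here means
            # the visit order was not topological.
            inherited |= codes[parent]
        # Set the concept's own identifying bit at its visit position.
        codes[concept] = inherited | (1 << (position - 1))
    return codes


def is_subsumed_by(code_c, code_d):
    """Theorem 1: C is subsumed by D iff DLcode(C) OR DLcode(D) == DLcode(C)."""
    return (code_c | code_d) == code_c


# Hypothetical taxonomy (assumption); chosen so that Car is visited fifth.
PARENTS = {
    "Vehicle": [],
    "LandVehicle": ["Vehicle"],
    "WaterVehicle": ["Vehicle"],
    "AirVehicle": ["Vehicle"],
    "Car": ["LandVehicle"],
    "SeaPlane": ["WaterVehicle", "AirVehicle"],  # multiple parents
}
VISIT_ORDER = ["Vehicle", "LandVehicle", "WaterVehicle",
               "AirVehicle", "Car", "SeaPlane"]

CODES = base_onto_encode(PARENTS, VISIT_ORDER)
TOP = 0  # the universal parent concept, code [0*]

print(format(CODES["Car"], "b"))                             # 10011
print(is_subsumed_by(CODES["Car"], CODES["LandVehicle"]))    # True
print(is_subsumed_by(CODES["LandVehicle"], CODES["Car"]))    # False
```

Representing a DLcode as an unbounded Python integer keeps the leading [0*] padding implicit, so codes of different lengths can be compared directly with bitwise operators.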

B. Scalability of BaseOntoEncoding

In reality, more than one ontology might be used to represent some of the service parameters. In such cases, ontology merging is necessary. However, it is a non-trivial task because it involves resolving linguistic and formal representational ambiguities. In the context of the FaCT system [15], BaseOntoEncoding only encodes the primitive concepts of the multiple base ontologies used in annotating services. For the sake of simplicity, all primitives having the same label are assumed to have a common interpretation. However, in the case of synonymy, where a primitive concept, say Car, in one ontology has the same interpretation as another primitive concept, say Auto, in a different ontology, the semantic ambiguity is resolved using WordNet. It can be observed that, given any realistic service domain ontology, since the number of primitives is limited, we can safely assume that the total number of primitives of the merged ontology is also limited. On rare occasions, the encoded ontology will also be updated. The derived concepts in the ontologies are not directly resolved, since by encoding the primitives we are able to encode the derived concepts without any increase in the code length. This is because the derived concepts can be encoded by OR-like/AND-like operations over the primitive concept codes, thereby preserving the length of the primitive codes. The different encoding rules that might be applied to the DL operators and quantifiers are beyond the scope of this paper.

C. Query Model and Encoding

The user service request (or query) has two essential parts that a user provides: 1) the desire (output) information, called QueryDesireOutput (QO), and 2) the input information, called QueryInput (QI). We formalize the query as follows.

Definition 1: Query Model: A query Q is an ordered pair $\langle QI, QO \rangle$ such that $QI \cap QO = \emptyset$, where QI is a non-empty set of the given input parameters of the query and QO is a non-empty set of the expected or desired output parameters of the query.

Using the BaseOntoEncoding algorithm, we encode queries into a string of bits (1s and 0s) so that the taxonomic properties of the query parameters (both QI and QO) are preserved.

1) Single Parametric Query: If a QI or QO contains only one parameter, it gets the bit code of the concept that it carries. For example, if the parameter is Car, then it is encoded as DLcode(Car) = [0*]10011.

2) Multiparametric QO: If there is more than one parameter in the QO, then the encoding is done by disjunction (ORing) of the codes of all the parameters. For example, if the output parameter set is {Car, Confirmation}, then it is encoded as follows:

$\mathrm{DLcode}(QO) = \mathrm{DLcode}(\mathrm{Car}) \lor \mathrm{DLcode}(\mathrm{Confirmation})$.

The idea is to have a single DLcode for the entire QO so that a holistic bit comparison can be done while comparing the QO with any SO. More details are given in Section IV-E.

3) Multiparametric QI: If there is more than one parameter in the QI, then the bit code for the QI is a set consisting of the DLcodes of all the parameters of the QI. For example, if the input parameter set is {name, ID}, then the bit code of the query input is the set {DLcode(name), DLcode(ID)}.
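The three encoding cases above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the helper names (`encode_output`, `encode_input`, `encode_query`) and the concept code table are hypothetical, and only the code of Car ([0*]10011) comes from the text. The disjointness and non-emptiness checks mirror Definition 1.

```python
from functools import reduce
from operator import or_

# Hypothetical concept codes (assumption); only Car's code is from the text.
DLCODE = {
    "Car": 0b10011,
    "Confirmation": 0b1000000,
    "name": 0b10000000,
    "ID": 0b100000000,
}


def encode_output(params, dlcode=DLCODE):
    """Unified DLcode of an output parameter set: OR of all parameter codes."""
    return reduce(or_, (dlcode[p] for p in params), 0)


def encode_input(params, dlcode=DLCODE):
    """An input parameter set is kept as a set of individual DLcodes."""
    return frozenset(dlcode[p] for p in params)


def encode_query(qi, qo, dlcode=DLCODE):
    """Encode a query <QI, QO>; both sets must be non-empty and disjoint."""
    qi, qo = set(qi), set(qo)
    if not qi or not qo:
        raise ValueError("QI and QO must be non-empty")
    if qi & qo:
        raise ValueError("QI and QO must be disjoint")
    return encode_input(qi, dlcode), encode_output(qo, dlcode)


qi_code, qo_code = encode_query({"name", "ID"}, {"Car", "Confirmation"})
print(format(qo_code, "b"))  # 1010011
```

A single-parameter QO falls out of the same function, since the OR over one code is that code itself.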

D. Service Model & Encoding

Service descriptions can be formally modeled in terms of service input (SI) and service output (SO).

Definition 2: Service Model: A service description S is defined as an ordered pair $\langle SI, SO \rangle$ such that $SI \cap SO = \emptyset$, where SI is a non-empty set of input parameters of the service and SO is a non-empty set of output parameters of the service. The two descriptions SI and SO of a service are encoded, and the encoding process is similar to query encoding.

1) Single Parametric Service: If an SI or SO contains only one parameter, it gets the bit code of the concept that it carries. For example, if the parameter is {Car}, then it is encoded as DLcode(Car) = [0*]10011.

2) Multiparametric SO: If there is more than one parameter in SO, then the encoding is done by disjunction (ORing) of the codes of all the parameters. For example, if the output parameter set is {AutoSpecification, RentConfirmation}, then it is encoded as:

$\mathrm{DLcode}(SO) = \mathrm{DLcode}(\mathrm{AutoSpecification}) \lor \mathrm{DLcode}(\mathrm{RentConfirmation})$.

An important point to note here is that the ORing of the SO parameters into a single unified code is equivalent to joining concepts using a union operator in DL. Hence, service level matchmaking cannot be done by only comparing the unified code of SO and the unified code of QO. This is because a QO and an SO having the same unified DLcode does not imply that they form a match. For example, consider a QO with two parameters: {[0*]0011, [0*]1100}. The unified DLcode of QO will be [0*]0011 ∨ [0*]1100 = [0*]1111. Now consider an SO whose parameters are {[0*]1001, [0*]0110}. The unified DLcode of SO will also be [0*]1001 ∨ [0*]0110 = [0*]1111. In this example, the unified codes of QO and SO are the same, but the individual parameters of QO and SO do not match (i.e., [0*]0011 ≠ [0*]1001 and [0*]1100 ≠ [0*]0110). Hence, the service and the query cannot form a match. Thus, matchmaking based only on the unified DLcode will result in false positives. This happens because the DLcode equivalent of a DL union is not guaranteed to be unique. To solve this problem, we again test the subsumption using the individual output parameter codes. This is explained in Sections IV-E and V-C.

3) Multiparametric SI: If there is more than one parameter in SI, then the bit code for SI is the set of DLcodes of all the parameters in the SI. For example, if the input parameter set is {name, address}, then the bit code of SI is the set {DLcode(name), DLcode(address)}.
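The false-positive case described for unified output codes can be reproduced with the QO and SO parameter codes from the example above. The sketch below is only illustrative: the per-parameter rule in `outputs_match` (every QO parameter must be equal to, subsume, or be subsumed by some SO parameter) is an assumption about how the second-stage check could look, not the rule defined in [9].

```python
from functools import reduce
from operator import or_


def unified(codes):
    """Unified DLcode of a parameter set: OR of the individual codes."""
    return reduce(or_, codes, 0)


def related(a, b):
    """True if a and b are equal or one subsumes the other (Theorem 1)."""
    return (a | b) == a or (a | b) == b


def outputs_match(qo_codes, so_codes):
    """Assumed second-stage check on individual output parameter codes."""
    return all(any(related(q, s) for s in so_codes) for q in qo_codes)


qo = [0b0011, 0b1100]  # QO parameters from the example
so = [0b1001, 0b0110]  # SO parameters from the example

print(unified(qo) == unified(so))  # True: both unify to 1111
print(outputs_match(qo, so))       # False: no individual parameters relate
```

The unified comparison is still useful as a cheap first filter; the individual comparison is only needed for the candidates that pass it.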

E. Service/Query Semantic Matchmaking

In this section, we introduce a service matchmaking technique for computing semantic similarity between two services or between a service and a query. The fundamental idea is to utilize a semantic subsumptive relation between two service descriptions (and similarly, between a query and a service) to figure out which of the following relationships a given SO has with the SO of another service: 1) exact match of SO; 2) child match (plug-in) of SO; 3) parent match (called subsume match) of SO; 4) sibling match of SO; and 5) no match of SO. In the case of pair-wise service-query matching for service retrieval, the given SO is matched likewise with the QO. In [19], [25], [18], [31], and [41], the sibling case was not considered for service-query matchmaking. Our DLcode matchmaking (details are available in [9]) is based on the following four semantic matchmaking cases.

Exact Match (EM): The exact match is a case of semantic matchmaking between two semantic descriptions in which, for every DL concept within one description, there exists a corresponding equivalent DL concept within the other description.

Plug-in Match (PM): The plug-in match is a case of semantic matchmaking between two semantic descriptions in which at least one DL concept exists within one of the descriptions that definitionally satisfies (i.e., is subsumed by) at least one DL concept within the other description.

Subsume Match (SM): The subsume match is the inverse of the plug-in match.

Sibling Match (SBM): The sibling match is a case of semantic matchmaking between two semantic descriptions X and Y in which: 1) there is no EM, PM, or SM between X and Y, and 2) at least one parameter in X exists that has a match (subsumed by, subsumes, or least common subsumer) with some parameter of Y.
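For a single pair of concept codes, the four cases can be approximated with bitwise tests, as in the hedged sketch below. Several points are assumptions rather than the paper's definitions: the cases above are stated for whole descriptions, whereas `classify` handles one concept pair; the direction convention (PM when the first code is subsumed by the second) is chosen for illustration; and the sibling test treats any shared inherited bit as evidence of a common subsumer below the universal parent. The codes for Truck and Confirmation are hypothetical.

```python
from enum import Enum


class Match(Enum):
    EXACT = "EM"
    PLUGIN = "PM"
    SUBSUME = "SM"
    SIBLING = "SBM"
    NONE = "no match"


def classify(a, b):
    """Classify the relation of concept code a to concept code b."""
    if a == b:
        return Match.EXACT
    if (a | b) == a:      # a is subsumed by b (Theorem 1)
        return Match.PLUGIN
    if (a | b) == b:      # b is subsumed by a
        return Match.SUBSUME
    if a & b:             # assumption: shared bits mean a common subsumer
        return Match.SIBLING
    return Match.NONE


CAR = 0b10011             # from the text
LAND_VEHICLE = 0b11       # from the text
TRUCK = 0b1000011         # hypothetical second child of LandVehicle
CONFIRMATION = 0b100000000  # hypothetical unrelated concept

print(classify(CAR, LAND_VEHICLE).value)   # PM
print(classify(LAND_VEHICLE, CAR).value)   # SM
print(classify(CAR, TRUCK).value)          # SBM
print(classify(CAR, CONFIRMATION).value)   # no match
```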

V. Service Discovery

A. Semantic Taxonomic Cluster (STC) Space

Before coming to the SmartDiscover algorithm, we first give a brief formal description of STC as proposed in [10], [11].

Fig. 3. DA-directory as Dynamically Maintained by a DA.

Definition 3 [Semantic Taxonomic Cluster (STC)]: An STC is a partial order $\langle s, \preceq_M \rangle$, where s is a service and the order is over the match relation types (EM, PM, SM, SBM), such that a unique supremum (or least specific predecessor) exists, called the root service. An STC (in brief, taxonomy) has some basic properties as shown here.

1) An STC is a cluster of feature similar (FS) services where the feature is SO.

2) FS in an STC can be of four types: 1) EM, 2) PM, 3) SM, and 4) SBM.

3) FS with respect to an STC is non-distance based. In other words, the similarity condition is not based on any measure but rather on the type of $\preceq_M$ match.

We now define an STC space as follows.

Definition 4 (STC Space): The STC space is a dynamic set of STCs.

In the context of SMARTSPACE, the STC space is a hierarchical organization of either files or SHAs (service helper agents), depending upon the modeling choice. The STC space has several properties that distinguish it from cluster spaces generated by conventional learning algorithms. The member STCs within the space are not necessarily disjoint from each other. This is because a particular SHA (for the latter design choice) can have a PM match with more than one parent SHA, each of which is a member of a separate STC. For example, a car rental SHA having SO = {car info, rental confirmation} may have a PM with both a vehicle rental SHA having SO = {vehicle confirmation} and a vehicle lookup SHA having SO = {vehicle info}. In this example, the vehicle rental SHA and the vehicle lookup SHA belong to two different STCs (i.e., STCs having two distinct root SHAs).

B. SMARTSPACE Directories

There are two major directories, the DA-directory and the BA-directory, that are designed for the SMARTSPACE agents to accomplish SmartDiscover.

DA-directory: The DA agent maintains the DA-directory. The DA-directory is an efficient lookup directory that is based on the DL-Encoding of QO and the DL-Encoding of the SO of the SHAs. The DA-directory is a table of N 2-tuples $\langle \mathrm{DA\_ID}, \mathrm{MC}_{\mathrm{DA\_ID}} \rangle$, where N is the total number of super-peer nodes (each containing a DA/BA pair). Thus, if there are four DAs in total within a system of four super-peers, then each DA will have four rows in its DA-directory, one for each of these four DAs. $\mathrm{DA\_ID}$ is the unique ID of a DA agent, and $\mathrm{MC}_{\mathrm{DA\_ID}}$ is a code called the mash code (MC) (Fig. 3). The mash code associated with a particular DA is an n-ary disjunction of the DLcodes of the root SHAs living in the STC space within the super-peer node where the DA lives. A root SHA represents a particular taxonomy within this STC space, since it is the most generic SHA in terms of its SO parameters. Thus, the mash code contains all the significant 1-bits of each root SHA and hence represents the entire STC space of that super-peer node. Each DA holds an identical copy of the DA-directory. Hence, if there is any update within a particular row of the DA-directory, then all the other DAs need to be notified about the update. More about the update process is given in Sections V-F and V-G.

BA-directory: The BA-directory keeps the BA’s own record of the SHA taxonomies. The BA-directory is a table of M 2-tuples $\langle \mathrm{SHA\_ID}, \mathrm{DLcode}(\mathrm{SHA\_ID}) \rangle$, where M is the total number of taxonomies in the super-peer node where the BA lives, $\mathrm{SHA\_ID}$ is the unique ID of a root SHA, and $\mathrm{DLcode}(\mathrm{SHA\_ID})$ is the corresponding DLcode of the SO of the SHA (Fig. 4). Unlike the DA-directory, the BA-directory is completely unique to a particular BA and is not shared or copied. The BA-directory improves the SmartCluster process and the SmartMap process by decreasing the number of interagent communications in these two processes. The directory also helps a BA to communicate with its corresponding DA when an existing taxonomy gets deleted from the STC space.
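The two directory levels described above can be pictured with a small Python sketch. Everything concrete in it is an assumption for illustration: the agent and SHA identifiers, the DLcodes, the use of dictionaries as tables, and the helper `candidate_super_peers`, which applies a simple bitwise AND of a QO code against each mash code as a first-level filter. It is not the paper's SmartDirect procedure.

```python
from functools import reduce
from operator import or_


def mash_code(root_sha_codes):
    """n-ary OR of the DLcodes of the root SHAs of one super-peer."""
    return reduce(or_, root_sha_codes, 0)


# BA-directories: one per super-peer, not shared.
# Maps root SHA ID -> DLcode of that SHA's SO (all values hypothetical).
BA_DIRECTORIES = {
    "DA1": {"sha-vehicle-rental": 0b0011, "sha-hotel-booking": 0b0100},
    "DA2": {"sha-flight-lookup": 0b11000},
}

# DA-directory: replicated on every DA. Maps DA ID -> mash code.
DA_DIRECTORY = {
    da_id: mash_code(roots.values())
    for da_id, roots in BA_DIRECTORIES.items()
}


def candidate_super_peers(qo_code, da_directory=DA_DIRECTORY):
    """DA IDs whose mash code shares at least one bit with the QO code."""
    return [da_id for da_id, mc in da_directory.items() if mc & qo_code]


print({k: format(v, "b") for k, v in DA_DIRECTORY.items()})
# {'DA1': '111', 'DA2': '11000'}
print(candidate_super_peers(0b0001))  # ['DA1']
```

Because the DA-directory holds one small integer per super-peer, replicating it on every DA is cheap compared with replicating the BA-directories, which is consistent with the design choice of keeping the second level local.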

C. SmartDirect Algorithm

The service clustering algorithm SmartCluster and the query matching algorithm SmartMap both require a common algorithmic module called SmartDirect. This algorithm is run by the DA to check whether a particular SHA can fit into the STC cluster space inside the super-peer where it resides (for SmartCluster) or whether there exists a relevant service in that STC space for a UA request (for SmartMap). In other words, it checks whether an SO or a QO, respectively, can be satisfied. A particular super-peer is said to satisfy a query if there is at least one SHA in that super-peer node that has an f� match (parent, child, or sibling). The DA-directories are specifically designed for this purpose. We discuss here the case of SmartDirect in SmartMap, which is identical to its use in SmartCluster (Fig. 5). Since all DAs hold identical copies of the DA-directory, a UA needs to send its QO to only one DA agent. For checking the satisfiability of the QO, two bitwise operators, AND and XOR, are used. A DA performs an AND operation over its own MC and the DLcode of the QO. The result of the AND operation is then XORed with the DLcode of the QO. There are three possible outcomes of the QO satisfiability test: 1) failure, 2) partial success, and 3) complete success.

Failure: If the AND operation results in 0, then there is no match at all with the DA’s own MC. In this case, the DA agent looks into the remaining tuples to find a match. If all the tuples give 0 as the AND result, then the DA lets the UA know that there is no SHA that can satisfy its QO (and hence, no SA is retrieved).

Partial Success: If the AND operation does not result in 0, then the DA agent XORs the result with the DLcode of the QO. If the XOR result is nonzero (called the residue code), then there exists a partial match between the QO of the requesting agent and the available SHAs in the DA’s container. This indicates that there are more SHAs in the other super-peer nodes that can satisfy the query as well. In such a case, the DA takes the residue code and matches it against the other MCs in its directory. The residue code is essentially the leftover 1-bits of the QO that are still to be matched.

Complete Success: If the XOR result in the previous case is 0, then the DA agent knows that the SHA agents of its own super-peer satisfy the QO completely. In the case of a complete success, the DA agent does not need to look into the other tuples for a match, because a particular STC cluster exists exclusively in a single super-peer node.

Fig. 4. BA-directory as Dynamically Maintained by the BA.

Fig. 5. SmartDiscover: SmartDirect followed by SmartMap.
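The three SmartDirect outcomes described above can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: DLcodes and mash codes are modeled as integers, and the names `Outcome` and `smart_direct` are hypothetical. Reporting a partial success when only other DAs match is also an assumption made for this sketch.

```python
from enum import Enum


class Outcome(Enum):
    FAILURE = "failure"
    PARTIAL = "partial success"
    COMPLETE = "complete success"


def smart_direct(q, own_id, da_directory):
    """Classify query code q against a DA-directory {da_id: mash_code}.

    Returns (outcome, ids of matching DAs, residue bits still unmatched).
    """
    matches = []
    residue = q
    r = q & da_directory[own_id]        # AND with the DA's own mash code
    if r:
        matches.append(own_id)
        residue = r ^ q                 # XOR leaves the unmatched 1-bits of q
        if residue == 0:
            return Outcome.COMPLETE, matches, 0
    # Match whatever is still unmatched against the remaining rows.
    for da_id, mc in da_directory.items():
        if da_id == own_id:
            continue
        r = residue & mc
        if r:
            matches.append(da_id)
            residue = r ^ residue
            if residue == 0:
                break
    if not matches:
        return Outcome.FAILURE, matches, residue
    return Outcome.PARTIAL, matches, residue
```

For example, with the directory `{"da-1": 0b0011, "da-2": 0b1100}`, the query code `0b0111` matches `da-1` partially, and its residue `0b0100` is then found in the mash code of `da-2`.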

Once the DA knows whether a QO is satisfiable and, if so, which container holds its solution set, it redirects the query to the BA agent for further processing of the QO (SmartMap). We now prove that the AND followed by the XOR (denoted by ⊕) operation is a sound and complete method for determining the satisfiability of a QO.

Theorem 2: Given a DL-Encoded QO called Q and a set of MCs (say S_MC), Q is satisfiable if ∃M ∈ S_MC, (Q ∧ M) ≠ 0.

Proof: Since M contains all the significant 1-bits of the root SHAs, if there exists a corresponding 1-bit of M in Q, then the AND operation produces a corresponding 1-bit. Hence, there exists a taxonomy corresponding to the significant 1-bit of M that has an f� match with Q. Therefore, Q is satisfiable.

Conversely, if we assume that Q is satisfiable, then Q must have an f� match with at least one M in S_MC. An f� match implies that there must exist a match that is either 1) exact, 2) plug-in, 3) subsume, or 4) sibling. According to the definitions, for any of these matches, at least one pair of corresponding 1-bits must exist. Hence, ∃M ∈ S_MC, (Q ∧ M) ≠ 0.

Fig. 6. SmartDirect during SmartCluster Process.

We now prove that XOR is a sound and complete test for partial success as follows:

Theorem 3: Given a DL-Encoded QO called Q and a set of MCs (say S_MC), there exists a partial success with a DA (say Dx) if ∃Dx, M such that (Q ∧ M) = R ≠ 0 → R ⊕ Q ≠ 0.

Proof: If ∃Dx, M such that (Q ∧ M) = R ≠ 0 → R ⊕ Q ≠ 0 is assumed to be true, then either (a) R must have at least one 1-bit whose corresponding bit in Q is a 0-bit, or (b) Q must have at least one 1-bit whose corresponding bit in R is a 0-bit. The first case is contradictory, since R is an AND product and contains only the 1-bits of Q that match M. Therefore, the second case must be true. If that is so, then there are 1-bits in Q that are yet to be matched. Therefore, Q has a partial success with Dx.

Conversely, if we assume that Q has a partial success with Dx, then Q must have at least one 1-bit whose corresponding bit in M is a 0-bit. Thus, R will not contain this 1-bit of Q. Hence, the XOR operation will output at least one 1-bit corresponding to the unmatched 1-bit of Q. In other words, ∃Dx, M such that (Q ∧ M) = R ≠ 0 → R ⊕ Q ≠ 0 is true.
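The proof relies on the observation that R = Q ∧ M can only contain 1-bits of Q, so R ⊕ Q is exactly the set of 1-bits of Q that are missing from M. The following Python sketch checks this identity exhaustively for small codes. The bit width and the function names are arbitrary choices made for this illustration.

```python
from itertools import product

WIDTH = 6                    # arbitrary small width chosen for the exhaustive check
MASK = (1 << WIDTH) - 1


def residue(q, m):
    """AND followed by XOR, as used by SmartDirect."""
    return (q & m) ^ q


def identity_holds():
    """Check that the residue always equals the bits of q not covered by m."""
    return all(
        residue(q, m) == (q & ~m & MASK)
        for q, m in product(range(1 << WIDTH), repeat=2)
    )
```

As a worked instance, for Q = 1011 and M = 0011 the AND gives R = 0011, and R ⊕ Q = 1000, which is the single 1-bit of Q that M does not cover.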

D. SmartCluster: Distributed STC Clustering

We now discuss the distributed service clustering algorithm called SmartCluster and how the DA agent and BA agent help to carry out the clustering process. SmartCluster is a distributed version of the semantic taxonomical clustering (STC) elaborated in [9]–[11]. SmartCluster is based on mutual communication between the SHAs and the middleware agents (DA and BA). The algorithm is explained as follows.

Step 1: Whenever a new SHA agent asks the DA agent living in the super-peer node whether it can live in the existing STC space, the DA agent checks if there exists a taxonomy that can accommodate the new SHA.

Step 2: This check is done by calling SmartDirect (Fig. 6). There are three possibilities after the SmartDirect operation is done by the DA: 1) the SHA has no MC match at all with any of the DAs, 2) the SHA has an MC match with the current DA, and 3) the SHA has MC matches with DAs in different super-peer nodes.

Step 3.1: No MC match with any of the rows: This case implies that the SHA has to form a new taxonomy of its own. In this case, the SHA starts living in the current super-peer node as the root node of a new taxonomy. The DA updates its DA-directory by ORing the code of the new root SHA with the existing MC for the current super-peer node. All other DA agents are subsequently notified about this change.

Fig. 7. SmartCluster: Pruning Search Space & Iteration.

Step 3.2: MC match with the current DA’s MC: This case implies that the new SHA has to stay in the current super-peer node. In other words, it has an f� match with one or more taxonomies present in the current STC space. In such a situation, the DA agent communicates with the corresponding BA agent and asks for all those matching taxonomies.

Step 3.2.1: The BA then looks up its BA-directory tuples one by one and does a subsumption match. In this way, the BA selects all root SHAs that have an f� match (parent, child, or exact) with the new SHA.

Step 3.2.2: After the filtering process, the BA then notifies these root SHAs about the new SHA.

Step 3.2.3: Each individual root SHA then computes what kind of f� match it has with the new arrival. If the f� match is subsume, then it tells the new SHA to consider itself its new child. If the f� match is plug-in, it checks whether the new SHA can be a parent or a sibling of its current children SHAs.

Step 3.2.3.1: If the test (called the test of parenthood) is positive, then it tells the new SHA that it is now the new SHA’s parent and also tells the affected children SHAs that they have a new parent (i.e., the new SHA). The new SHA thus gets clustered.

Step 3.2.3.2: However, if the test is negative, then it just tells its immediate children SHAs to repeat the entire test of parenthood individually. This eventually stops either if the test of parenthood is positive or if the SHA agent under consideration has no children. Thus, from an individual point of view, a particular SHA only communicates with its immediate children and the new SHA (Fig. 7).

Step 3.3: MC match with DAs in different super-peer nodes: This case implies that there are taxonomies in other super-peer nodes with which the new SHA has an f� match. In such a situation, the DA agent tells the new SHA to move to those matching super-peer nodes. After this operation is completed, each of the SHA agents starts communicating with the BA agents in the same way as explained in the previous match case.

The sequence diagram of SmartCluster is given in Fig. 8.

Fig. 8. SmartCluster Sequence Diagram.
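The routing decision of Steps 2 to 3.3 can be sketched in Python as follows, again with DLcodes and mash codes modeled as integers. The function name `route_new_sha` and the shape of the returned dictionary are hypothetical. Returning local and remote matches side by side is an assumption of this sketch, since the text lists the cases separately.

```python
def route_new_sha(so_code, own_id, da_directory):
    """Decide where a new SHA with code so_code should be clustered.

    da_directory maps DA ids to mash codes and is mutated in the no-match
    case, mirroring the OR update of Step 3.1.
    """
    local = bool(so_code & da_directory[own_id])
    remote = [
        da_id
        for da_id, mc in da_directory.items()
        if da_id != own_id and so_code & mc
    ]
    new_root = not local and not remote
    if new_root:
        # Step 3.1: the SHA becomes a new root in the current super-peer,
        # so its code is ORed into the local mash code.
        da_directory[own_id] |= so_code
    # Step 3.2 applies when local is True; Step 3.3 when remote is non-empty.
    return {"new_root": new_root, "local": local, "remote": remote}
```

In a full system the OR update in the no-match case would be followed by a notification to all other DAs, which is omitted here.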

E. SmartMap: Query Processing Algorithm

SmartMap is a distributed algorithm for processing the QO (QueryDesireOutput) when it gets redirected from a DA agent to a set of BA agents. The algorithm follows the same logic as SmartCluster, except that the goal is to find a solution set of SHAs that can satisfy the query.

Step 1: The SmartMap process is initiated by a BA agent when a QO is redirected to it by a DA agent.

Step 2: Each BA agent then looks up the BA-directory and maps the QO onto the matching STC space of the SHAs. The BA agent looks up its BA-directory tuples one by one and does an f� match (subsume). In this way, the BA selects all root SHAs that have a subsume f� match with the QO.

Step 3: After the filtering process, the BA agent notifies all of these root SHAs about the QO.

Step 4: Each individual root SHA then computes what kind of f� match it has with the new QO. As soon as the root SHA finds out the type of f� match it has with the QO, it contacts the UA agent, reporting that the corresponding SHA is a potential matcher for the QO and whether the SHA is a strong matcher (for an exact or plug-in relation between SO and QO) or a weak matcher (for a subsume or sibling relation between SO and QO).

Step 4.1: If the match is a weak match, then the query is forwarded (message: QUERY_FORWARD) to all the root SHA’s children.

Step 4.2: The children SHAs apply the same algorithm to check whether each of them is a strong or weak matcher for the given QO. The recursion ends when a set of leaf SHAs is reached.

Step 4.2.1: If any SHA is a strong matcher, then each of its children will also be a strong matcher, and the SHA sends a message (QUERY_REPLY) to all its children to reply to the UA as strong matchers.

The sequence diagram of SmartDirect and SmartMap constituting SmartDiscover is shown in Fig. 9. The SmartMap algorithm is given below:
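The recursive part of Steps 4 to 4.2.1 can be approximated by the following Python sketch. It is not the original listing: the `SHA` class, the `smart_map` function, and the `match_type` callback (which stands in for the f� match computation and returns a match name or `None`) are hypothetical. Message passing between agents is replaced by plain function calls, and the strong-matcher rule is applied to all descendants on the assumption that it holds recursively.

```python
from dataclasses import dataclass, field

STRONG = {"exact", "plug-in"}
WEAK = {"subsume", "sibling"}


@dataclass
class SHA:
    sha_id: str
    children: list = field(default_factory=list)


def smart_map(root, qo, match_type, replies=None):
    """Collect (sha_id, 'strong' or 'weak') replies for query qo under root."""
    if replies is None:
        replies = []
    kind = match_type(root, qo)
    if kind in STRONG:
        # Step 4.2.1: descendants of a strong matcher reply as strong matchers.
        _reply_strong(root, replies)
    elif kind in WEAK:
        replies.append((root.sha_id, "weak"))
        # Step 4.1: forward the query to the immediate children.
        for child in root.children:
            smart_map(child, qo, match_type, replies)
    return replies


def _reply_strong(sha, replies):
    replies.append((sha.sha_id, "strong"))
    for child in sha.children:
        _reply_strong(child, replies)
```

A BA would call `smart_map` once per selected root SHA; branches whose root has no match at all produce no replies.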


F. DA-directory Update

Updates in the DA-directory can take place for four reasons:

CASE 1: The MC changes in the directory because of a new service entering the STC space. This happens due to a complete mismatch of all parameters of an incoming new SHA when its SO is compared to the MC. This results in a new root SHA in the STC space with no children.

CASE 2-A: The MC changes because of a partial mismatch of some parameters of an incoming new SHA when compared to the MC. This leads to residue matching with the MCs of other DAs in the rest of the tuples. In such a case, the new SHA may become a new root SHA with children SHAs.

Fig. 9. SmartDiscover Sequence Diagram.

CASE 2-B: It may also happen, as in the previous case, that there is a nonzero residue after all MCs have been matched. In such a case, this nonzero residue code is tested for subsumption against each of the individual parameter codes of the SHA. The codes of all parameters that subsume the residue code are then ORed to compute a mashed-up code. This mashed-up code, called MCX, represents all the parameters in the SO that have a subsumption relationship with the residue code and hence do not have a representation in the DA-directory. Now, to make the DA-directory complete, any DA-directory row that has a match with the original SHA code is updated by ORing MCX with the original DA-directory MC. The MCX is also sent to the SHA as part of the response, along with the matched super-peer node IDs. When the SHA moves to the matched super-peer nodes to get itself clustered, the MCX is added to the BA-directory by the corresponding BAs in those super-peer nodes before the SHA agent is clustered there.
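The MCX computation of CASE 2-B can be sketched in Python as below. The function names are hypothetical, and the subsumption test on DLcodes is not specified in this section, so it is passed in as a `subsumes` predicate; any concrete predicate used with this sketch is a placeholder assumption.

```python
def compute_mcx(residue, param_codes, subsumes):
    """OR together the codes of all SO parameters that subsume the residue."""
    mcx = 0
    for code in param_codes:
        if subsumes(code, residue):
            mcx |= code
    return mcx


def apply_mcx(da_directory, sha_code, mcx):
    """OR MCX into every DA-directory row whose MC matches the SHA code."""
    for da_id, mc in da_directory.items():
        if mc & sha_code:
            da_directory[da_id] = mc | mcx
```

Rows that share no 1-bit with the SHA code are left untouched, so the update stays confined to the super-peers the SHA is redirected to.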

CASE 3: The MC changes because of an existing taxonomy deletion within the STC space. However, this case is relatively rare, because category changes in the STC space may not happen often due to the nature of functional service categories.

CASE 4: The DA agent gets killed because of a node crash. This case is also not a very frequent event for super-peer nodes and requires only the deletion of the corresponding tuple in the DA-directories of the other existing DAs.

It is important to note that the DA-directory is updated only when an SHA that has a parameter belonging to a new domain ontology is clustered. Every time a DA-update happens, the new domain information is entered into the DA-directory. In a given ontology, the total number of domain-specific ontological taxonomies is limited. Hence, as more and more domains are indexed into the DA, the probability of a new SHA creating a new STC goes down. To analyze the growth of DA-updates (or new taxonomy creation) over an increasing number of services, we treated this problem as an instance of the coupon collector problem described in [27].

We see each taxonomy as a coupon and every new incoming service as an attempt to pick one of the coupons randomly (uniform distribution) with replacement. It has been shown that the expected number of trials required to pick all the n different coupons is given by the expression:

E(number of trials to pick all n coupons) = O(n ∗ log(n))

Hence, equivalently, we can claim that the number of services to be clustered to encounter all T taxonomies is equal to:

E(services to be clustered for T) = O(T ∗ log(T)).

Since S = T ∗ log(T) versus T is a curve whose slope is continuously increasing, the curve T versus S (i.e., the number of new taxonomies encountered versus the number of services to be clustered) will be a function with a continuously decreasing slope. In other words, the rate of DA-updates will keep on decreasing as more and more services get clustered, thus making the DA-update highly scalable.
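The diminishing rate of DA-updates can be made concrete with a short Python sketch under the same uniform-draw assumption. The function names are hypothetical; the formulas are the standard coupon collector results, namely n·H_n for the exact expected number of trials and T(1 − (1 − 1/T)^S) for the expected number of distinct taxonomies after S services.

```python
def expected_trials(n):
    """Exact expected number of draws needed to see all n coupons (n * H_n)."""
    return n * sum(1.0 / k for k in range(1, n + 1))


def expected_distinct(t, s):
    """Expected number of distinct taxonomies seen after s uniform draws from t."""
    return t * (1.0 - (1.0 - 1.0 / t) ** s)


def marginal_updates(t, s):
    """Expected number of new taxonomies contributed by service number s + 1."""
    return expected_distinct(t, s + 1) - expected_distinct(t, s)
```

Since `marginal_updates(t, s)` equals (1 − 1/T)^S, it shrinks geometrically in S, which is the decreasing slope the argument above refers to.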

G. BA-Directory Update

The BA-directory undergoes many more updates than the DA-directory. This is because the joining or leaving of a root SHA happens more often than the creation and extinction of taxonomies as a whole.

CASE 1. SHA joining: Whenever a new root SHA agent is identified by the BA agent, the BA has three options: 1) create a new tuple and insert the root information, 2) modify an existing tuple and update the root information, or 3) make no changes in the BA-directory. The first option is taken if the new root SHA has no f� match (subsume or plug-in) with any root entry in the directory. This means that a new STC has been created in the super-peer node. The second option is taken if the new root SHA has a subsume match with at least one root entry (i.e., the new SHA is a parent of the existing root SHA). In such a case, the matched root entry is updated with the new root information. The root information update also happens when a BA receives the MCX message with the SHA request. The MCX is ORed with the existing matched root entry to get the new root entry, and the matched original root entry is then replaced by the new root entry. The last option does not make any change.

CASE 2. SHA leaving: Whenever an old root SHA leaves the STC space (mostly because it will be terminated), the BA agent just has to remove its corresponding tuple from the BA-directory and insert new tuples that correspond to the old root’s immediate children within its STC. The BA only reports to the DAs if there is no child SHA (i.e., the STC gets deleted).
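The two BA-directory cases can be sketched in Python as follows, with the directory modeled as a dictionary from root SHA IDs to DLcodes. The function names and signatures are hypothetical, and the f� match computation that determines which existing roots are matched is assumed to have been done by the caller.

```python
def sha_join(ba_directory, new_id, new_code, matched_root_ids=()):
    """Register a new root SHA.

    With no matched roots a new tuple is created (option 1); otherwise the
    matched root entries are replaced by the new root (option 2). Option 3,
    no change, corresponds to not calling this function at all.
    """
    for root_id in matched_root_ids:
        ba_directory.pop(root_id, None)
    ba_directory[new_id] = new_code


def sha_leave(ba_directory, root_id, children):
    """Remove a departing root and promote its immediate children to roots.

    children maps child SHA ids to DLcodes. Returns True if the taxonomy
    vanished, in which case the BA would have to report to the DAs.
    """
    ba_directory.pop(root_id)
    ba_directory.update(children)
    return not children
```

The boolean returned by `sha_leave` marks the only situation in which a BA-directory change has to propagate to the DA-directory.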

VI. Experimental Results

A. Setup

System Setup: We carried out the experiments on two different system infrastructures: 1) a single laptop system (O/S: 64-bit Windows 7; CPU: Intel Quad Core i7 @ 2.40 GHz; RAM: 8 GB), and 2) an IBM Cloud host with five instances (O/S: Windows v2008 R2; vCPU: 2; RAM: 4 GB). Out of the five nodes, two are in Raleigh, USA, one in Boulder, USA, one in Ehningen, Germany, and one in Markham, Canada.

Agent Platform Setup: The implementation of the SMARTSPACE framework was built on top of two platforms: 1) the JADE v3.7 MAS platform [4], and 2) the NetLogo v5.0.4 MAS simulation platform [48], as follows:

JADE based SMARTSPACE: JADE is a Foundation for Intelligent Physical Agents (FIPA) compliant agent development toolkit that provides efficient support for agent-based distributed system development. Agent communication in the JADE based SMARTSPACE implementation is over the HTTP-based message transport protocol (MTP). In SMARTSPACE, each node (super-peer and peer) is implemented as a JADE container in the IBM Cloud instances, where software agents can be created and maintained. The Java development environment was Eclipse Helios. More detailed information regarding JADE can be found in [4]. While the JADE based experimentation represents a real-time deployment scenario of SMARTSPACE, the limited infrastructural support (only five instances on the IBM Cloud) prevented a realistic large-scale evaluation of SMARTSPACE. Note that we can have an efficient, concurrent, and parallel SMARTSPACE deployment of at most five super-peer nodes on five instances.

NetLogo based SMARTSPACE: Referring to the agent simulation research [13], we designed the SMARTSPACE experiments using a NetLogo simulation environment. NetLogo is a widely accepted agent simulation platform developed by CCL, Northwestern University, USA. NetLogo simulates the underlying large-scale P2P network overlay of SMARTSPACE and hence can support several thousands of agents and hundreds of super-peers, which cannot be achieved with the heavyweight JADE implementation. The NetLogo based SMARTSPACE was implemented on a single laptop system. We restricted our experimental study to a maximum of 3000 services (which is the approximate number of services on the web as reported in [2]).

Experimental Setup: The simulation framework consisted of two parts: one to build a domain that is semantically represented as a set of ontologies, and one to construct the service hosting P2P overlay on top of both the JADE and NetLogo agent platforms. For the first module of the simulation, called OntoGenerator, we synthetically constructed a set of ontologies and then encoded them using BaseOntoEncoding. The ontologies were constructed as random acyclic graphs in which we could control the subsumptive sparseness in terms of the diversity factor (DF). We define DF as follows:

Definition 5: The DF of a given set of ontologies Os is the maximum probability with which two concepts, Ci and Cj, belonging to Os, have a mutual subsumption with each other.

From the definition, we can understand that if the DF value is high, chances are also high that we get a very narrow Os, which in itself signifies a very specialized domain in terms of the depth of the individual ontologies. In contrast, if the DF value is set low, then Os signifies a very generic and diverse domain. We also designed the simulation environment so that we have control over the size of Os in terms of the total number of concepts. If the DF is increased while the size of Os is kept fixed, there is a high chance that the specificity of the domain will increase and we will get a highly specialized domain. For the second module of the simulation, called OverlayDesigner, we designed a control mechanism for adjusting the total number of peer nodes and the total number of super-peer nodes within the P2P network overlay. We then implemented a random service generator that created a pool of services by randomly selecting DL-Encoded concepts from Os as service input and service output parameters. After this stage, services were randomly assigned to the peer nodes.
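One plausible way to generate such random acyclic ontologies is sketched below in Python. This is an assumption made for illustration and not the OntoGenerator module itself: concepts are numbered 0 to n − 1, and a subsumption edge from concept i to concept j (with i < j) is added with probability DF, which keeps the graph acyclic by construction. The function name and parameters are hypothetical.

```python
import random


def random_ontology(num_concepts, df, seed=None):
    """Return subsumption edges (parent, child) of a random acyclic graph.

    Each ordered pair i < j is linked with probability df, so a higher df
    yields a denser, more specialized ontology.
    """
    rng = random.Random(seed)
    return [
        (i, j)
        for i in range(num_concepts)
        for j in range(i + 1, num_concepts)
        if rng.random() < df
    ]
```

Because edges only ever point from a lower to a higher concept index, no cycle can arise regardless of the DF value.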

B. Evaluation Measure

Evaluation of the SMARTSPACE platform was carried out from three different perspectives:

1) By measuring the runtime efficiency and message overhead of the SmartCluster algorithm for service clustering and measuring the impact of various parameters such as the number of super-peer nodes, the number of services to be clustered, and the domain specificity.

2) By measuring the efficiency of the SmartDirect and SmartMap algorithms for service retrieval, i.e., by measuring the variation of the query response time for different numbers of clustered services in the system.

3) By measuring the accuracy of SmartDiscover and comparing it with the widely accepted benchmark service discovery algorithm based on the OWL-S MX measure [18]. The accuracy was tested in terms of precision and recall on the standard test dataset OWL-S TC v2 [29].

Fig. 10. Centralized SmartCluster performance.

Fig. 11. Distributed SmartCluster performance.

We did a comparative analysis of a distributed versus a centralized SMARTSPACE system. A simulation environment that represents a SMARTSPACE instance was set up.

VII. Results

For the purpose of a qualitative comparative analysis of the SmartDiscover algorithm, we conducted a comprehensive review of the current research on distributed service discovery. We faced some difficulties in conducting a comparative analysis with existing agent-based works. First, many works on agent-based service discovery were architectural design oriented and did not provide experimental results (see [12], [17], [20], [24], [30], [34], [37], and [41]). Second, some of the works have different evaluation objectives, such as service composition, although there are some common issues between service composition and service discovery (see [21], [39], [42], and [47]). To counter these issues, the distributed performance of SmartDiscover was evaluated against the centralized version (with only one super-peer node for one DA-BA pair).

SmartCluster Evaluation: We first observed the runtime performance of SmartCluster for the JADE based implementation and the NetLogo based implementation. The JADE based implementation was run on two types of system resources: 1) a single laptop, and 2) IBM Cloud instances (five virtual instances). For the single laptop performance, we compared both design choices: in-memory file based and SHA based (Fig. 10). As expected, we observed a significant improvement in the runtime for the former case as the number of SAs increases in the system (with a maximum of 100 sec for 400 SAs as compared to the SHA-based performance of 291 sec). It is to be noted that the observation made in both cases was for a centralized setup with one super-peer (i.e., one DA-BA pair). We also saw that, due to the lack of memory on the single machine, the experiment could not be scaled up beyond 400 SAs for the SHA-based implementation, while the same experiment could run 4000 SAs for the file-based one. A similar behavioral pattern was observed when the same experiment was conducted on the IBM Cloud setup (Fig. 11). We saw a considerable improvement in runtime from 130 sec for the SHA based design (2000 SAs) to 90 sec for the file based design (same 2000 SAs). We also observed a significant gain of scale when this experiment was done on the Cloud environment. However, the trade-off (as explained in Section III) is that this model may not be beneficial for highly concurrent and resourceful distributed systems.

Fig. 12. Centralized/distributed SmartCluster performance.

Fig. 13. SmartCluster performance with DA-BA effect.

We conducted the same experiments on NetLogo with the SHA based implementation to understand the worst case scenario in a distributed setup (Fig. 12). We saw that a significant improvement is achieved when SMARTSPACE is switched from the centralized mode (one DA-BA pair), which has a maximum runtime of 864 tick sec (approximate time for one NetLogo tick), to a distributed mode of 10 DA-BA pairs, which has a maximum runtime of 116 tick sec; this reduces further to 92 tick sec for 20 DA-BA pairs. This observation was made for a maximum of 3000 SAs. Similar supportive results can be observed in Fig. 13 as well.

We then analyzed the effect of the DF (i.e., domain specificity) on SmartCluster (Fig. 14). The results show a significant rise in the runtime as the DF is increased from 0.1 (146 tick sec for 3000 services) to 0.9 (946 tick sec for 3000 services). We observed that the spread of SHAs across super-peer nodes (the same holds for the file based design) decreases significantly as the DF value increases, with the SHAs concentrating mostly on a few super-peer nodes. This is because a higher DF for a given merged service ontology increases the probability of functionally similar SAs getting injected into the system. As a consequence, the effect of the DA-BA pairs (10 for this observation) was highly dampened.

Fig. 14. SmartCluster performance with effect of domain specificity.

Fig. 15. SmartCluster message overhead with effect of domain size.

Fig. 16. SmartCluster message overhead with increasing #services.

We also analyzed the effect of the number of concepts (i.e., domain size) on SmartCluster (Fig. 15). We observed that the runtime for 3000 SAs rose linearly from 112 tick sec for 4000 concepts to 190 tick sec for 8000 concepts, given a constant domain specificity of 0.1. This observation can be attributed to the fact that the probability of SAs being functionally similar to each other goes down drastically with an increase in the number of concepts, leading to the injection of many more root SHAs into SMARTSPACE. This in turn leads to an increase in the SmartDirect AND-XOR based MC checking in the DA-directories. Moreover, the new root SHA goes through a lot of false matchmaking with existing SHAs to check whether there exists any set of child SHAs of this new root SHA. This significantly increases the overall clustering time.

To analyze the inter DA and DA-BA message overhead performance in SMARTSPACE, we took some more observations on SmartCluster. In Fig. 16, there was a slow rise in message overhead for a setup of 5 DA-BA pairs, from 55 messages (for 500 SAs) to 140 messages (for 3000 SAs). The messages account for the number of DA updates. The result shows the efficiency of cluster directory management in terms of message overhead. The reason is that, as the number of SAs is increased, the possibility of new SA categories being formed gets lower. Updates are only required when new cluster taxonomies are formed, and for a given service domain the number of taxonomies is more or less constant.

Fig. 17. SmartCluster message overhead with effect of DA-BA.

Fig. 18. SmartCluster message overhead with effect of DF.

We observed in Fig. 17 the effect of the number of DA-BA pairs on the message overhead. The number of update messages increased significantly from 110 messages for 5 DA-BA pairs to 400 messages for 20 DA-BA pairs (both on 1500 services). The growth occurs because, for each update, a particular DA has to communicate with more DAs, although the shape of the curve does not change significantly as the number of SAs increases. We observed in Fig. 18 the effect of the DF on the message overhead. The number of update messages shows a characteristic significant decline from 110 messages for 0.1 DF to 25 messages for 0.5 DF (both results on 1500 services). The reason is that, as the specificity increases, the number of updates decreases, since the probability of the formation of new root SHAs per super-peer node decreases.

We finally observed in Fig. 19 the effect of the domain ontology size (#concepts) on the message overhead. The results show that, as the number of concepts increases, the number of messages declines significantly from 115 messages for 2000 concepts to 35 for 5000 concepts, keeping the DF fixed at 0.1. This happens because, with the increase in the number of concepts, the probability of the SAs being mutually similar decreases, leading to the injection of many more root SHAs. This in turn dramatically decreases the inter DA-BA communication for clustering new matched SHAs. Also, new SHA redirections to other super-peers are greatly reduced, leading to a reduction in new inter SHA-BA communication.

Fig. 19. SmartCluster message overhead with Effect of Concept Size.

Fig. 20. Centralized SmartMap query response time.

Fig. 21. Distributed SmartMap query response time.

SmartMap Evaluation: The SmartMap evaluation was done with respect to two objectives: 1) average query response time, and 2) average accuracy as compared to the benchmark discovery algorithm based on OWL-S MX. The timer starts when a UA is created and stops when the UA gets all the results (i.e., responding SHAs) back. For the query response time on a single machine with the JADE file based implementation, we saw a promisingly low response time of 3.4 sec for 1000 SAs and 7.3 sec for 4000 SAs (Fig. 20). The growth can be observed to be linear with an acceptable slope as the number of services increases.

Similar results were observed in Fig. 21 when we conducted the experiment on the IBM Cloud setup. The results show that the average query response time grows from 0.5 sec for 100 SAs to 1.5 sec for 2000 SAs (for the file based implementation). For the SHA based implementation, it grew from 1.2 sec for 100 SAs to 2.4 sec for 2000 SAs. This shows that in a distributed setup, the JADE based implementation of SmartMap can also be significantly improved. The growth is further lowered if the system is distributed, as simulated with the NetLogo SHA based implementation (Fig. 22). Similar to the results of SmartCluster in Fig. 12, we see a significant improvement in the results of SmartMap from 3.6 sec for 3000 SAs (1 DA-BA pair) to 1.22 sec for the same 3000 SAs when we deploy 10 DA-BA pairs.

Fig. 22. SmartMap query response time: centralized versus distributed.

In terms of service discovery accuracy, we compared the mean 11-point interpolated precision versus the average recall of the SmartMap algorithm with that of 5 different service discovery techniques proposed in OWL-S MX [18] (Fig. 23). To evaluate the accuracy of SmartMap, we used the query set given in the OWL-S TC v2 data set (871 web services). A detailed discussion of the accuracy measurement is given in [9]. The query set contains 29 queries, each accompanied by its corresponding expert-evaluated set of relevant services.
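For readers unfamiliar with the measure, the following Python sketch shows the standard 11-point interpolated precision computation for ranked results and its mean over a query set. It illustrates the general information retrieval technique only; the function names are hypothetical and the code is not taken from the evaluation harness used in these experiments.

```python
def eleven_point_interpolated(ranked, relevant):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0 for one query.

    ranked is the result list in rank order; relevant is the non-empty set of
    services judged relevant for the query.
    """
    relevant = set(relevant)
    points = []  # (recall, precision) after each retrieved item
    hits = 0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / rank))
    levels = [i / 10 for i in range(11)]
    # Interpolated precision: best precision at any recall >= the given level.
    return [
        max((p for r, p in points if r >= level), default=0.0)
        for level in levels
    ]


def mean_curve(curves):
    """Average several 11-point curves level by level."""
    return [sum(values) / len(values) for values in zip(*curves)]
```

Averaging the per-query curves with `mean_curve` over all queries of a test collection gives one precision value per recall level, which is the kind of curve plotted in a precision versus recall comparison.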

The objective was to understand how our DL-Encoding based matchmaking scheme (as incorporated in SmartMap) performed when compared to other matchmaking techniques that have used the same OWL-S TC v2 data set as we did. The methods M0-M4 shown in Fig. 23 are the different types of query matching algorithms compared by OWL-S MX. M0 is a pure description logic [3] based matching algorithm that considers only the semantic definitions of the service parameter terms. M1-M4 are hybrid matchmaking techniques that use both the semantic definitions of parameters and tokens from service descriptions. M1 makes use of loss of information (LOI) measures, M2 uses the extended Jaccard similarity coefficient, M3 uses cosine similarity values, and M4 uses Jensen-Shannon information divergence based similarity values. We found that SmartMap had a significant improvement over the chosen benchmark (Fig. 23).

The reason for SmartMap having much better accuracy than M0 is that, although both of them are purely based on subsumption matching of parameters, the clustering technique that uses M0 is based on an ad hoc comparison with the innate assumption that clusters are mutually disjoint. This falsely excludes services that may have a subsumption match with member services in multiple clusters. Also, the case of sibling matching (e.g., car rental and bus rental) is not accounted for in M0-M4 matching. Again, this falsely splits services into separate disjoint clusters. Moreover, M0, which is based on the Paolucci order [32], allows false inclusion of services within clusters as strong matches. This is because of a universally higher preference for the plug-in match over the subsume match, which assumes that the match strength order is preserved.

Fig. 23. SmartMap average accuracy (benchmark: OWL-S MX).

VIII. Related Work

A. Non-Agent Based Service Discovery

Most of the research work on non-agent based service discovery focuses on two approaches: 1) centralized discovery and 2) distributed discovery. Centralized approaches have been proposed in [5], [38], and [43]. The UDDI-based approach used syntactic matching [43]. In [5], ontology based matching was done, and a hyper-graph based service discovery was efficiently conducted using both centralized and distributed approaches [38]. In the centralized version, a single central registry encoded all service descriptions and used multidimensional indexes (e.g., R-tree) to efficiently discover services, but these approaches suffered from the issues of scalability and a single point of failure.

Various distributed approaches were proposed in [7], [25], [31], [35], and [50] to counter the drawbacks of the centralized approaches. In EASY [25], the matchmaking was semantic and based on a prime number based encoding technique. For large domains, the prime number based code might grow very quickly and finding new prime numbers would become more difficult. The authors also developed EASY-Ariadne [25], a distributed service discovery model in which the registries were indexed using codes generated by hashing service parameters. This hashing based technique for finding the right registry containing the required service details could produce false positives. A comprehensive comparison of many important distributed service discovery approaches was given in [33]. In [32], a Gnutella based P2P system was developed that used OWL-S as the service description language. Service requests generated at a given node were broadcast through the network using controlled flooding.
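The idea behind a prime number based encoding, and why its codes grow quickly, can be sketched as follows. This is a generic prime-product scheme written for illustration, not the actual encoding of EASY, whose details are not given here: each concept receives a fresh prime, its code is that prime multiplied by its parent's code, and subsumption reduces to a divisibility test. The taxonomy and function names are hypothetical.

```python
from itertools import count


def primes():
    """Yield primes in increasing order using trial division."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


def encode(parent_of):
    """Assign each concept the product of a fresh prime and its parent's code."""
    prime_source = primes()
    codes = {}

    def code(concept):
        if concept not in codes:
            parent = parent_of[concept]
            base = code(parent) if parent is not None else 1
            codes[concept] = base * next(prime_source)
        return codes[concept]

    for concept in parent_of:
        code(concept)
    return codes


def subsumes(codes, general, specific):
    """`general` subsumes `specific` iff its code divides the other's code."""
    return codes[specific] % codes[general] == 0
```

Because every new concept consumes a new prime and multiplies it into the codes of all its descendants, code size grows with both the number of concepts and the depth of the taxonomy, which is the scaling concern raised above.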

Syntactic DHT based systems [36] were very efficient in terms of message passing; however, they supported only exact, keyword-based queries. DHT based P2P systems ([7], [35], [50]) were developed for semantic service discovery. In [50], every service was mapped to a particular domain concept. A key-data pair of a concept and its relevant service catalog was stored in the Chord DHT. A new query was mapped to a particular concept and the service catalog related to this concept was retrieved. In situations where one query could be satisfied by a concept or its child concepts, a separate Chord DHT lookup needed to be run for every child concept, which increases the message overhead in the system. In SmartDiscover, however, all the child concepts can be discovered simultaneously by a simple operation (i.e., plug-in matches).
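The overhead of per-concept lookups can be estimated with a rough sketch. It assumes that every descendant of the queried concept needs its own DHT lookup and that each Chord lookup costs about log2(N) hops in a network of N nodes; the taxonomy, the function names, and the cost model are illustrative assumptions, not measurements from [50].

```python
import math

# Hypothetical toy taxonomy: concept -> list of child concepts.
CHILDREN = {
    "VehicleRental": ["CarRental", "BusRental"],
    "CarRental": ["LuxuryCarRental"],
    "BusRental": [],
    "LuxuryCarRental": [],
}


def concepts_to_look_up(concept):
    """Return the concept followed by all of its descendants (depth first)."""
    result = [concept]
    for child in CHILDREN.get(concept, []):
        result.extend(concepts_to_look_up(child))
    return result


def estimated_messages(concept, num_nodes):
    """One lookup per concept, each costing about log2(N) hops (assumed model)."""
    hops_per_lookup = max(1, math.ceil(math.log2(num_nodes)))
    return len(concepts_to_look_up(concept)) * hops_per_lookup
```

Under this model the message count grows linearly with the number of concepts below the queried one, whereas a single subsumption-based operation does not repeat the routing step per concept.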

The DHT + SON based approach [33] proposed to overcome this drawback by creating a semantic overlay network. Semantic neighbors can be retrieved for a given service by measuring its semantic similarity with other services. Super-peers [46] are a federation of registries containing information about a particular domain. Together, these super-peers formed a distributed registry. This approach worked well in a static environment, providing scalable, but static, solutions with very low message overhead. However, it lacks a mechanism to ensure that registries are updated in real time, which may affect the accuracy of service discovery.

B. Multiagent System Based Service Discovery

In service discovery in a MAS ([12], [17], [20], [24], [26], [30], [34], [37], [39], [42], [45], [47]), a service was modeled as an agent and specialized middle agents such as matchmaking agents or broker agents were used [40]. In [20], [26], [34], and [47], a centralized middle agent was used for service discovery with a centralized knowledge base. This formed a bottleneck either because a single agent handled the matchmaking or because the middle agents depended on a single global knowledge base. In [30], multiple service finder agents (SFAs), each having its own registry, handled user requests, but these requests were broadcast to every SFA, causing high message overhead. In [17], along with the user agents and the service agents, a whole set of mediator agents was proposed for service matchmaking and for discovering service agents on behalf of the user agents. In [12], a task-ontology based service discovery framework was proposed. The ontology contains the definitions of predefined task-based queries. The architecture consists of service requestor agents (for representing the queries), service provider agents (for decomposing a complex query based on the task ontology), and a matchmaking agent. In all of these agent-based approaches, except [42], the service discovery was handled either by a single middle agent or by multiple middle agents having a single repository, thus making it hard to scale.

An AgentCities based framework was proposed in [34]. That paper itself identified that the framework was neither robust nor scalable because of its star-based topology. Moreover, the global knowledge of the system formed a bottleneck. In [39], an agent based system was designed for service selection, which was treated as separate from service discovery. Service selection was defined as selecting the right service from a given set of discovered services.

In [21], an agent-based architecture was proposed for representing service instances, identifying participant services in a composition, tracking service agents, and processing a composition plan. The main limitations of this framework were: 1) the composition process was centralized at the composite agent and hence not scalable; and 2) the global system knowledge was computationally expensive. Similarly, in [24], along with user agents and service agents, a set of mediator agents was proposed for service matchmaking and for discovering service agents for user agents.

In [37], service information documents were stored in distributed servers, and the servers were regarded as independent agents. A search-tree creation algorithm was used to join servers into a tree structure, and a recursive search algorithm was used to distribute a search request over the tree. In [42], groups of similar services were modeled as agents. Service discovery was done by flooding the network with requests, and the matched agents responded. This caused a high message overhead.

SmartDiscover offers an agent based hybrid-P2P approach that is similar to the super-peer approach and combines the strengths of P2P based approaches. Super-peer based approaches are scalable with low message overhead. Unlike other super-peer based approaches [25], [46], we used live agents to represent service descriptions. This made service clustering a self-organizing process rather than a centralized algorithm modifying a data structure. The whole model becomes flexible because the SHA can now live in any of the super-peer nodes and still be part of the same taxonomy. More importantly, by modeling services and service representatives as agents, agent capabilities can be further utilized in SMARTSPACE to realize more complicated functions such as service composition, negotiation, and context awareness.

IX. Conclusion and Future Work

In this paper, we have presented the SMARTSPACE platform, which is based on multiagent based distributed semantic service discovery. Specifically, we described the capability of the SmartDiscover algorithm to provide faster and more accurate service discovery that is applicable to dynamic and non-deterministic environments. We obtained promising results for service discovery in terms of both average query response time and the number of message exchanges required to maintain the distributed registry.

References

[1] R. Akkiraju, R. Goodwin, P. Doshi, and S. Roeder, “A method for semantically enhancing the service discovery capabilities of UDDI,” in Proc. IJCAI Workshop Inf. Integr. Web IIWeb, 2003, pp. 1–6.

[2] E. Al-Masri and Q. H. Mahmoud, “Investigating web services on the world wide web,” in Proc. 17th Int. Conf. World Wide Web, Aug. 2008, pp. 795–804.

[3] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. Patel-Schneider, The DL Handbook. Cambridge, MA, USA: Cambridge Univ. Press, 2003, pp. 146–148.

[4] F. Bellifemine, G. Caire, and D. Greenwood, Developing Multi-agent Systems with JADE. West Sussex, England, U.K.: Wiley, 2004.

[5] B. Benatallah, M.-S. Hacid, A. Leger, C. Rey, and F. Toumani, “On automating web services discovery,” VLDB J., vol. 14, no. 1, pp. 84–96, Mar. 2005.


[6] R. Brooks, “A robust layered control system for a mobile robot,” IEEE J. Robot. Autom., vol. 12, no. 1, pp. 14–23, Mar. 1986.

[7] D. Chakraborty, A. Joshi, Y. Yesha, and T. Finin, “Toward distributed service discovery in pervasive computing environments,” IEEE Trans. Mobile Comput., vol. 5, no. 2, pp. 97–112, Feb. 2006.

[8] M. A. Corella and P. Castells, “A heuristic approach to semantic web services classification,” in Proc. 10th Int. Conf. KES, 2006, pp. 598–600.

[9] S. Dasgupta, “A semantic framework for event-driven service composition,” Ph.D. dissertation, Univ. of Missouri-Kansas City, Kansas City, MO, USA, 2011.

[10] S. Dasgupta, S. Bhat, and Y. Lee, “Taxonomic clustering and query matching for efficient service discovery, application and experiences,” in Proc. ICWS, 2011, pp. 363–370.

[11] S. Dasgupta, S. Bhat, and Y. Lee, “Taxonomic clustering of web service for efficient discovery,” in Proc. ACM CIKM, Oct. 2010, pp. 1617–1620.

[12] V. Ermolayev and N. Keberle, “Towards a framework for agent-enabled semantic web service composition,” Int. J. Web Services Res., vol. 1, no. 3, pp. 63–87, 2004.

[13] G. Fortino and W. Russo, “ELDAMeth: An agent-oriented methodology for simulation-based prototyping of distributed agent systems,” Inf. Softw. Technol., vol. 54, no. 6, pp. 608–624, 2012.

[14] A. Heß and N. Kushmerick, “Learning to attach semantic metadata to web services,” in Proc. 2nd ISWC, 2003, pp. 258–273.

[15] I. Horrocks, “The fact system in automated reasoning with analytic tableaux and related methods,” in Proc. Int. Conf. Tableaux, vol. 1397, 1998, pp. 307–312.

[16] M. Huhns, M. P. Singh, M. Burstein, and K. Decker, “Research directions for service-oriented multiagent systems,” IEEE Internet Comput., vol. 9, no. 6, pp. 65–70, Nov.–Dec. 2005.

[17] M. A. Ketel, “Mobile agent based framework for web services,” in Proc. 47th Annu. Southeast Regional Conf., 2009, p. 10.

[18] M. Klusch, B. Fries, and K. Sycara, “OWL-S MX: A hybrid semantic web service matchmaker for OWL-S services,” Web Semantics: Sci., Services Agents World Wide Web, vol. 7, no. 2, pp. 121–133, Apr. 2009.

[19] R. Lara, M. A. Corella, and P. Castells, “A flexible model for web service discovery,” in Proc. 1st Int. Workshop SMR, 2006, p. 51.

[20] J. Lee, S.-J. Lee, H.-M. Chen, and C.-L. Wu, “Composing web services enacted by autonomous agents through agent-centric contract net protocol,” Inf. Softw. Technol., vol. 54, no. 9, pp. 951–967, 2012.

[21] Z. Maamar, S. K. Mostefaoui, and H. Yahyaoui, “Toward an agent-based and context-oriented approach for Web services composition,” IEEE Trans. Knowl. Data Eng., vol. 17, no. 5, pp. 686–697, May 2005.

[22] D. Martin, M. Burstein, D. McDermott, S. McIlraith, M. Paolucci, K. Sycara, D. L. McGuinness, E. Sirin, and N. Srinivasan, “Bringing semantics to web services with OWL-S,” World Wide Web, vol. 10, no. 3, pp. 243–277, 2007.

[23] S. A. McIlraith and T. C. Son, “Semantic web services,” IEEE Intell. Syst., vol. 16, no. 2, pp. 46–53, Mar. 2001.

[24] D. Mennie and B. Pagurek, “An architecture to support dynamic composition of service components,” in Proc. 5th Int. WCOP, 2000, pp. 1–8.

[25] S. B. Mokhtar, D. Preuveneers, N. Georgantas, V. Issarny, and Y. Berbers, “EASY: Efficient semantic service discovery in pervasive computing environments with QoS and context support,” J. Syst. Softw., vol. 81, no. 5, pp. 785–808, 2008.

[26] K. Mong and S. Senior, “Agent-based cloud computing,” IEEE Trans. Services Comput., vol. 5, no. 4, pp. 1–14, Oct.–Dec. 2011.

[27] R. Motwani and P. Raghavan, Randomized Algorithms. Cambridge, MA, USA: Cambridge Univ. Press, 1995, pp. 57–63.

[28] North American industry classification system [Online]. Available: http://www.census.gov/eos/www/naics/

[29] OWL-S TC datasets [Online]. Available: http://lpis.csd.auth.gr/systems/OWLS-SLR/datasets.html

[30] E. Paikari, E. Livani, and M. Moshirpour, “Multi-agent system for semantic web service composition,” in Knowledge Science, Engineering and Management. Lecture Notes in Computer Science, vol. 7091. Berlin, Germany: Springer, 2011, pp. 305–317.

[31] M. Paolucci, T. Kawamura, T. Payne, and K. Sycara, “Semantic matching of web services capabilities,” in Proc. ISWC, 2002, pp. 333–347.

[32] M. Paolucci, K. Sycara, T. Nishimura, and N. Srinivasan, “Using DAML-S for P2P discovery,” in Proc. ICWS, 2003, pp. 1–5.

[33] G. Pirro, D. Talia, and P. Trunfio, “A DHT-based semantic overlay network for service discovery,” Future Generat. Comput. Syst., vol. 28, no. 4, pp. 689–707, 2012.

[34] A. Poggi, M. Tomaiuolo, and P. Turci, “An agent-based service oriented architecture,” in Proc. WOA, 2007, pp. 157–165.

[35] M. Schlosser, M. Sintek, S. Decker, and W. Nejdl, “A scalable and ontology-based P2P infrastructure for Semantic Web Services,” in Proc. 2nd Int. Conf. Peer-to-Peer Comput., 2002, pp. 104–111.

[36] C. Schmidt and M. Parashar, “A peer-to-peer approach to web service discovery,” World Wide Web Internet Web Inf. Syst., vol. 7, no. 2, pp. 211–229, 2004.

[37] L. Shijian, “A new agent based service discovery mechanism,” in Proc. IEEE Int. Conf. Syst., Man, Cybern., vol. 4, Oct. 2004, pp. 3296–3300.

[38] D. Skoutas, D. Sacharidis, V. Kantere, and T. Sellis, “Efficient semantic web service discovery in centralized and P2P environments,” in Proc. Semantic Web ISWC, vol. 5318, 2008, pp. 583–598.

[39] R. M. Sreenath and M. P. Singh, “Agent-based service selection,” Web Semantics: Sci., Services Agents World Wide Web, vol. 1, no. 3, pp. 261–279, Apr. 2004.

[40] K. Sycara, M. Paolucci, J. Soudry, and N. Srinivasan, “Dynamic discovery and coordination of agent-based semantic web services,” IEEE Internet Comput., vol. 8, no. 3, pp. 66–73, Jun. 2004.

[41] K. Sycara, S. Widoff, M. Klusch, and J. Lu, “Larks: Dynamic matchmaking among heterogeneous software agents in cyberspace,” Autonom. Agents Multiagent Syst., vol. 5, no. 2, pp. 173–203, 2002.

[42] H. Tong, J. Cao, S. Zhang, and M. Li, “A distributed algorithm for web service composition based on service agent model,” IEEE Trans. Parallel Distributed Syst., vol. 22, no. 12, pp. 2008–2021, Dec. 2011.

[43] UDDI. (200). The UDDI technical white paper [Online]. Available: http://www.uddi.org/

[44] United Nations. United Nations standard products and service code [Online]. Available: http://www.unspsc.org

[45] E. del Val, M. Rebollo, and V. Botti, “Enhancing decentralized service discovery in open service-oriented multiagent systems,” Autonomous Agents Multi-Agent Syst., pp. 1–30, Oct. 2012.

[46] K. Verma, K. Sivashanmugam, A. Sheth, A. Patil, S. Oundhakar, and J. Miller, “METEOR-S WSDI: A scalable P2P infrastructure of registries for semantic publication and discovery of web services,” Inf. Technol. Manage., vol. 6, no. 1, pp. 17–39, Jan. 2005.

[47] X. Wang, W. Niu, G. Li, X. Yang, and Z. Shi, “Mining frequent agent action patterns for effective multi-agent-based web service composition,” in Proc. 7th Int. Conf. ADMI, 2011, pp. 211–227.

[48] U. Wilensky. (1999). NetLogo itself [Online]. Available: http://ccl.northwestern.edu/netlogo/

[49] M. Wooldridge, An Introduction to Multiagent Systems. New York, NY, USA: Wiley, 2002.

[50] S. Yu, J. Liu, and J. Le, “Decentralized web service organization combining semantic web and peer to peer computing,” in Web Services. Lecture Notes in Computer Science, vol. 3250, pp. 116–127, 2004.

Sourish Dasgupta received the Ph.D. degree in computer science from the University of Missouri, Kansas City, MO, USA, in 2011.

Currently, he is an Assistant Professor with the Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, India. His current research interests include the areas of service oriented computing, distributed multiagent systems, and semantic web.

Anoop Aroor received the bachelor’s degree in electronics and communication engineering from Vishveshvaraya Technological University, Belgaum, Karnataka, India, in 2009. He is currently pursuing the master’s degree in information and communication technology at the Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, India.

He was with iGATE-Patni, Bangalore, India. His current research interests include service oriented computing, multiagent systems, and artificial intelligence.


Feichen Shen received the B.S. degree in computer science from Nanjing University, Jiangsu, China, and the M.S. degree in computer science from the University of Missouri, Kansas City, MO, USA. He is currently pursuing the Ph.D. degree in computer science at the University of Missouri.

He completed his internship at the Biomedical Statistics and Informatics branch, Health Science Research Department, Mayo Clinic, Rochester, MN, USA.

Yugyung Lee received the B.S. degree from the University of Washington, Seattle, WA, USA, in 1991, and the Ph.D. degree in computer science from the New Jersey Institute of Technology, New Jersey, NJ, USA, in 1997. She has received grants supported by the National Science Foundation and the National Institute of Health.

Currently, she is an Associate Professor in the CSEE Department, University of Missouri, Kansas City, MO, USA. Her current research interests include semantic web, machine learning, cloud computing, service oriented architecture, and mobile applications.
