

    Introduction to Cyber Foraging, Tools and Techniques

    R. Gill

    June 30, 2010


    Contents

1 Introduction
  1.1 Example Design Brief
    1.1.1 Use Case
    1.1.2 Design Considerations
2 Cyber Foraging
  2.1 Introduction
  2.2 Cyber Foraging Tools
    2.2.1 Spectra
    2.2.2 Chroma Tactics
    2.2.3 AIDE
    2.2.4 Networked Integrated Multimedia Middleware (NMM)
    2.2.5 Goyal
    2.2.6 Slingshot
    2.2.7 Vivendi
    2.2.8 EyeDentify
    2.2.9 DiET
    2.2.10 Instant-X
    2.2.11 Scavenger
  2.3 Summary
  2.4 Discussion
    2.4.1 Surrogate Discovery
3 Appendix


    List of Figures

1.1 Cyber foraging architecture showing mobile client in a pervasive compute environment with bi-directional communication to surrogates and remote multimedia content server
2.1 Classification of cyber foraging systems as thin and thick ATA and AAA
3.1 The Spectra API from [Flinn02a]
3.2 Simple architecture of Spectra from [Flinn02a]
3.3 Example of a generated tactic file from [Balan03a]
3.4 A description of Chroma tactic components from [Balan03a]
3.5 Overall architecture of AIDE from [Messer02a]
3.6 A simple NMM flow graph from [Lohse05a]
3.7 NMM buffering of nodes from [Lohse05a]
3.8 NMM registry hierarchy from [Lohse05a]
3.9 NMM code snippet adapted from [Lohse05a]
3.10 Goyal architecture from [Goyal04a]
3.11 Instantiating a new application replica in Slingshot, adapted from Su [Su05a]
3.12 An example of a Vivendi tactic file from [Balan07a]
3.13 An example of a Vivendi file with tactic definition and remote invocation parameters from [Balan07a]
3.14 The Ibis framework employed in EyeDentify from [Kemp09b]
3.15 DiET mobile code reduction flowpath from [Kim09a]
3.16 DiET API showing main working components from [Kim09a]
3.17 Scavenger client surrogate interface showing surrogate daemon components, frontend and code execution environment from [Kristensen09a]


    Glossary of Terms

    JVM Java Virtual Machine (JVM)

    Proxy Object Proxy Object acts as an intermediary between the client and an accessible object.

    The purpose of the proxy object is to monitor the life span of the accessible object

    and to forward calls to the accessible object only if it is not destroyed.

    RTP Real-time Transport Protocol

    RTPS Real-Time Publish-Subscribe (RTPS) Wire Protocol provides two main communication models:

    the publish-subscribe protocol, which transfers data from publishers to subscribers;

and the Composite State Transfer (CST) protocol, which transfers state.

RPC Remote Procedure Call (RPC) is an inter-process communication that allows a computer program to cause a subroutine or procedure to execute in another address space, commonly on another computer on a shared network, without the programmer explicitly coding the details for this remote interaction.

Overhead Any combination of excess or indirect computation time, memory, bandwidth, or other resources required to attain a particular goal.

    Broadcast Transferring a message to all recipients simultaneously.

Unicast Transmitting data to a single specific destination.

Multicast Delivery of a message or information to a group of destination computers simultaneously in a single transmission, typically using the User Datagram Protocol (UDP).

SIP Session Initiation Protocol, an IETF-defined signalling protocol widely used for controlling multimedia communication sessions such as voice and video calls over Internet Protocol (IP).

COTS Commercial Off-The-Shelf software


1 Introduction

The exponential uptake of mobile computing devices, such as smartphones and PDAs, is problematic with regard to streaming multimedia rich content to a mobile device. The advantage and attraction of a portable, light mobile computing device is mobility. However, portable power supply technology has not kept pace with the power requirements of the increased CPU and processing capabilities of mobile devices such as smartphones. Consequently, a shortfall exists between the available and required power for processing and playing streamed multimedia content. Portability also constrains physical size, so such devices are compute resource limited and unable to execute applications requiring intensive processing, such as digitising and rendering multimedia rich content. The CPU, memory and energy overheads of such multimedia applications outstrip the capabilities of thin mobile clients and handheld devices. Although smartphone CPU and memory capacities have increased recently, the drain on compute energy still limits the use of compute-intensive application processes. One method to reduce the required power is to offload power-consuming processes to surrogates, called cyber foraging [Satyanaranyanan01a]. Access to and availability of compute surrogates is predicted to become ubiquitous in future pervasive compute environments. The implementation of cyber foraging requires particular essential capabilities, such as surrogate (service) discovery, establishing trust, partitioning which application task operations to give the surrogate, and scheduling surrogate tasks.

    In this work a number of current cyber foraging tools and techniques available to the mobile application

    developer are described. A typical mobile application design brief is included to provide a context in which

    cyber foraging might be used.

    1.1 Example Design Brief

The example application is initially required to automatically search for and discover available surrogates. When required, the application compares the compute resources of the local entity (the mobile client) and the remote entity (surrogate) against the minimum compute resources required to complete its multimedia processing tasks. If the local compute resource is less than the minimum required, the application offloads the processing task to the surrogate. The surrogate performs the task and streams the processed multimedia content back to the application. The Use Case scenarios describe human interaction with and operation of the application in a pervasive compute environment.
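The offload decision in this brief reduces to a resource comparison. The sketch below illustrates it with hypothetical resource units and names that are not part of the brief:

```python
from dataclasses import dataclass

@dataclass
class Resources:
    cpu_mhz: float    # available (or required minimum) CPU capacity
    memory_mb: float  # available (or required minimum) memory

def should_offload(local: Resources, required: Resources) -> bool:
    """Offload when the local client cannot meet the task's minimum needs."""
    return local.cpu_mhz < required.cpu_mhz or local.memory_mb < required.memory_mb

# Example: a weak handset facing an HD-video processing task
local = Resources(cpu_mhz=400, memory_mb=64)
required = Resources(cpu_mhz=1200, memory_mb=128)
assert should_offload(local, required)  # the task goes to the surrogate
```

A real implementation would compare richer resource vectors (battery, bandwidth), but the shape of the decision is the same.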

    1.1.1 Use Case

This use case is intended to show how the proposed system may be used in practice, and to illustrate what is required of the system to fulfill its goals.

Hubert is at airport departures waiting for his plane to France. The airport departure lounge is a designated pervasive compute environment. Hubert is an avid football fan and takes out his mobile phone to watch the highlights of the latest match played by his favourite football team. Hubert has recently registered


    with a service provider that provides high definition video footage of replays of recent football matches. His

    phone has already discovered that it is in a pervasive compute environment. Hubert starts his phone application

    to watch the goals scored by his favourite football team in their most recent match. Unknown to Hubert his

    mobile phone only has the computing power to process high definition video at the lowest quality and with

intermittent stops and jerks during playback. His phone application detects the shortfall in compute power and redirects the video from the service provider to surrogate(s) in the departure lounge, which carry out the video processing and stream the video to Hubert's phone, where he watches the footage in high quality.

When Hubert arrives in France, he tries out his new phone language translation application. Again, his phone does not have the computing power to run the application. Hubert looks for a cyber cafe or a public building, such as a library or museum, that normally has a pervasive compute environment. He spots a cyber cafe sign and walks towards it; as he approaches the cafe entrance his phone discovers the surrogates and displays a message that the language translation application is ready to be executed. Outside the cafe, Hubert asks some locals for directions to his hotel using the language translation application on his phone. During this, his phone offloads the translation application processing tasks to the surrogates in the cafe, which process the tasks and return the processed data back to the phone application. Eventually, one of the locals recognises the name of Hubert's hotel and, with the help of the translation application, Hubert is able to understand the route he must take to get to his hotel.

Before setting off to his hotel, Hubert takes a picture of the local who gave him directions, but his mobile phone cannot execute the language translation application and the phone's high-megapixel camera at the same time. Instead of forcing Hubert to close the language translation application, the phone offloads the camera application processing to the surrogates in the cyber cafe.

    1.1.2 Design Considerations

While we use the term mobile client, in this work the scope of mobility is limited to a single pervasive compute environment. The use case describes a cyber foraging architecture similar to networks A and B shown in Figure 1.1. In both A and B, communication channel 1 is between the client and surrogates within a pervasive compute environment. Communication channel 1 is required for access between client and surrogate: for surrogate discovery, offloading/distributing application tasks to the surrogate, distribution of client and surrogate details, and sending processed task data back to the mobile client. Communication channel 2 is required for access between the client and the remote multimedia content provider, for remote application execution/invocation and transfer of network data and network entity details. Communication channel 3 enables access between the remote server and the pervasive compute environment, for transfer of content for processing and network details.

Previous approaches to supporting client applications using cyber foraging have taken either a thin client or a thick client approach. In network A, previous solutions to run remote applications have used thin client solutions such as VNC [Richardson98a] and SSH [SSH11a], or web based services such as GoToMyPC [Citrix11a]. Thin client solutions have the advantage of ease of use and of requiring no modification to the application. However, thin client solutions require low network connection latency to be effective, and are consequently reliant on bandwidth. To decrease reliance on latency and bandwidth, thicker client solutions have been used.

A thick client approach is one in which part of the application executes on the mobile client, so that when bandwidth is inadequate a lower quality, degraded version of the application may still run on the client. Mobile clients that run a complete application are the extreme of the thick client approach; they make any attempt at cyber foraging completely redundant. The use case describes an example of both thin and thick client approaches with cyber foraging support: networks A and B are essentially thin and thick client implementations, respectively.

The thick client approach may be classified into two methods. In the first, called application-transparent adaptation (in this work, ATA), existing application APIs on surrogates and remote servers are used to control and communicate with the client application. ATA is limited by the application API's ability to adapt to external commands. The second method, called application-aware adaptation (in this work, AAA), requires explicit modification of the application to work at runtime. AAA enables greater scope for application adaptability during runtime, and is not limited by prior application adaptation constraints. However, the application-aware method does require manual intervention and access to the application source.

Figure 1.1: Cyber foraging architecture showing mobile client in a pervasive compute environment with bi-directional communication to surrogates and remote multimedia content server

A hybrid of ATA and AAA is an approach in which some code from the client application is transferred to the surrogate during cyber foraging. The application code transferred is called mobile code. If choosing a hybrid approach, the application must balance the proportion of mobile code transferred to a surrogate, relative to the whole application code, against the time required to execute the mobile code on a surrogate before cyber foraging can take place. In effect, mobile code prepares the surrogate by rewriting the existing surrogate application; this is different from transferring identification and authentication information.

    Whichever approach or method is used, the design of a cyber foraging system to support a mobile client

    application in a pervasive compute environment should consider as a minimum, the following four design

    features.

1. Discovery of surrogates - Without being aware of the presence of available surrogates no cyber foraging can take place.

2. Trust establishment between communicating entities - The transient nature of cyber foraging means that initially both client and surrogate are unknown, un-trusted entities that must satisfy some form of trust criteria before any interaction can take place.

3. Partitioning application tasks to offload to surrogates - Deciding which application tasks execute locally and which are offloaded for remote execution.

4. Scheduling tasks for offloading to surrogates, and retrieving completed task data back to the client application.
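The four features above can be skeletonised as a single client-side loop. Every name below is a hypothetical stand-in, with the discovery, trust, and execution mechanisms injected as callables:

```python
def cyber_forage(task_parts, discover, establish_trust, execute_remote, execute_local):
    """Skeleton of the four design features: (1) discover surrogates,
    (2) keep only trusted ones, (3) partition tasks across them,
    (4) schedule the parts and collect the results."""
    surrogates = [s for s in discover() if establish_trust(s)]  # stages 1 and 2
    if not surrogates:
        # No usable surrogate: degrade gracefully and run everything locally.
        return [execute_local(part) for part in task_parts]
    results = []
    for i, part in enumerate(task_parts):                       # stages 3 and 4
        surrogate = surrogates[i % len(surrogates)]             # naive round-robin
        results.append(execute_remote(surrogate, part))
    return results

# Toy run: two surrogates, a task in three parts, remote work doubles each part
out = cyber_forage([1, 2, 3],
                   discover=lambda: ["s1", "s2"],
                   establish_trust=lambda s: True,
                   execute_remote=lambda s, p: (s, p * 2),
                   execute_local=lambda p: p)
# out == [("s1", 2), ("s2", 4), ("s1", 6)]
```

Real systems replace each callable with a protocol (service discovery, authentication, RPC), but the control flow follows this four-stage shape.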

Human interaction should also be considered: ubiquitous computing ideally requires no conscious human interaction. However, the option of a manual mode that prompts the user to interact prior to offloading mobile code to surrogates has psychological usability benefits regarding privacy. Finally, mobility: although the scope of mobility in this work is confined to a single pervasive compute environment, the user would still be expected to move location within the environment.


2 Cyber Foraging

2.1 Introduction

A decade ago, Satyanarayanan [Satyanaranyanan01a] accurately described and predicted pervasive computing as the next step forward in computing evolution: a hybrid of distributed and mobile computing. The motivation behind the prediction was the continual advances made in distributed and mobile computing in the preceding decade towards the realisation of Weiser's [Wieser91a] vision of ubiquitous computing. Integrating distributed and mobile computing technology continues to present unique challenges. Cyber foraging is the term used by Satyanarayanan to describe a potential solution for one of the unique challenges pervasive computing must address in order to integrate distributed and mobile computing seamlessly within a pervasive computing environment. The specific challenge cyber foraging seeks to address is to dynamically divide and distribute the application processing tasks of a resource constrained mobile client to remote surrogate resources that perform the processing tasks on behalf of the mobile client, and return the processed data back to the mobile client application. Each potential remote resource surrogate hosts some kind of service that the cyber foraging system searches for; consequently the activity of searching for remote resource surrogates is called service discovery. Implementing cyber foraging (also described as offloading [Gu04a]) for a mobile client within a pervasive environment requires four fundamental stages of operation. Stage one is service discovery of remote surrogate entity(s); stage two is establishing trust between the mobile client and surrogate entity(s); thirdly, partitioning the application processing tasks between local and remote execution [Balan07a]; and fourthly, scheduling the partitioned processing tasks to the correct surrogate resource entity [Kristensen10a]. In combination, these four stages define the fundamental features required of a cyber foraging system within a pervasive computing environment.

Intuitively, the speed with which a processing task can be offloaded and processed by a surrogate is paramount. If the cyber foraging process cannot maintain transparency to the user, in the form of uninterrupted application usage, then the cyber foraging system has failed. Factors that affect the offloading and processing of partitioned tasks are called overheads, and play a very important part in cyber foraging research and development. Overheads include the factors that affect overall response time and energy consumption, such as network latency and CPU, memory and battery consumption.

This literature review identifies the current range of cyber foraging systems and the extent to which these systems adhere to the four fundamental prerequisite features that define cyber foraging within a pervasive environment. Firstly, each cyber foraging system is briefly described, and then we discuss the salient points of each system in relation to each of the four design features described earlier (Figure 2.1 presents a tabulated snapshot of overall adherence). Secondly, alternative methods to achieve the functionality of the four design features are discussed.


    2.2 Cyber Foraging Tools

    Introduction

Essentially, a cyber foraging system is a fluid network that has to juggle what processing code to distribute, where to distribute it, and how to distribute it, in order for an application to run transparently on a resource constrained entity with minimum QoS. All the systems introduced in this work have, as a minimum, some form of task scheduler, ranging from deciding where a single task should be performed to distributing multiple tasks over multiple surrogates. Tasks may be modelled from data captured by system/network resource monitors, with prediction algorithms performed on the modelled tasks to allocate surrogate resources to them. The range and scope of information included in the monitoring and modelling vary from system to system. The overall system architecture affects interaction and communication between system activities such as monitoring, scheduling, task/code partitioning and data transfer. Finally, the actual communication methods vary between RPC, RTP, or messages depending on the system and its objectives. All the systems described here are designed as aids to developing applications with cyber foraging and task offloading functionality; they are therefore tools and not applications. The cyber foraging tools vary in maturity and in access to descriptive documentation and technical information; this is reflected in the varying section lengths.

    2.2.1 Spectra

Spectra is arguably the forerunner of today's cyber foraging systems. The motivation behind Spectra was to address the uneven conditioning [Satyanaranyanan01a] that occurs in pervasive compute environments.

Spectra is based on predicting an application's future resource requirements by monitoring and then modelling current application resource usage. Monitoring is performed by six resource monitors, each of which monitors a single or related set of resources, namely CPU, network, battery, file cache state, remote CPU and remote cache state on surrogates. These monitors are within a modular framework shared by Spectra's client and server. Spectra consists of a client running on the application entity and a server running on surrogate entity(s). Spectra uses services provided by the Coda file system for replication and consistent remote file execution, and Odyssey for defining the fidelity of tasks for distribution. The Coda file system was used because it was felt that network latency and bandwidth were not sufficient to sustain the consistency a conventional distributed file system requires. Spectra uses energy consumption during operation execution run times as the primary metric for scheduling and partitioning of tasks.

The information provided by the resource monitors gives a snapshot of the surrogate state, the availability of surrogate resources, and the current state of the existing running system. From this, Spectra predicts the balance or imbalance between the current running system and future processing tasks; based on this balance prediction, Spectra indicates the location of available surrogate(s) to the application. The decision to actually use the surrogate for the next task processing is made by the application.

Predictions of the expected time for process task execution are made by firstly calculating the data transfer time, by dividing the data for transmission by the bandwidth; and secondly calculating the task process time from previous heuristic data, such as system logs, that are modelled for prediction of future usage. These predictions are made by predictors that use the heuristic data to generate the prediction models and update the current log data.
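As a numeric illustration of this two-part prediction (the figures and the mean-of-logs model are invented for illustration; Spectra's actual predictors are more elaborate [Flinn02a]):

```python
def predict_execution_time(data_bytes, bandwidth_bps, past_runtimes):
    """Estimate remote execution time as transfer time (data / bandwidth)
    plus processing time predicted from logged history (here, a simple mean)."""
    transfer = data_bytes / bandwidth_bps
    processing = sum(past_runtimes) / len(past_runtimes)  # heuristic model from logs
    return transfer + processing

# 5 MB over a 10 Mbit/s link, with three logged runs of the operation (seconds)
t = predict_execution_time(5_000_000, 10_000_000 / 8, [2.1, 1.9, 2.0])
# t = 4.0 s transfer + 2.0 s predicted processing = 6.0 s
```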

Scheduling is also based on modelled heuristic data of the surrogates available in the environment, any planned future execution of processes, and fidelity of execution (fidelity is provided by Odyssey). The volume of data retrieved from the environment affects the accuracy of the heuristic prediction models. This means that in a small environment with few surrogates, scheduling predictions cannot be guaranteed to be optimal. Spectra uses a utility function to evaluate process executions. The function takes into account time for execution, fidelity and energy consumption. The utility function predicts execution time by summing CPU time, data transfer time, time to manage the cache, and time required to ensure consistency of data. Whilst cache management and data

consistency times can remain stable, execution time varies with different applications. To overcome application-specific execution time variability, each application must provide Spectra's utility function with a similar function indicating potential application requirement or desire for remote execution, i.e. the function 1/T where T is the predicted execution time; a low value would indicate low desire for local task execution and a potential requirement for remote execution. In this example Spectra provides available compatible surrogate(s) information to the application; the decision to perform remote execution is made by the application. Continually evaluating this desire for remote execution and passing it to Spectra's utility function, which returns surrogate information, carries non-negligible overhead; therefore only operations of approximately one second duration are passed to Spectra's utility function.
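The utility prediction and the 1/T desire metric described above can be sketched as follows (the component times are invented, and Spectra's real function also weighs fidelity and energy consumption):

```python
def predicted_time(cpu, transfer, cache_mgmt, consistency):
    """Utility-function prediction: the sum of CPU time, data transfer
    time, cache management time, and data consistency time (seconds)."""
    return cpu + transfer + cache_mgmt + consistency

def remote_desire(predicted_exec_time):
    """1/T desire metric: a long predicted execution yields a low value,
    flagging the operation as a candidate for remote execution."""
    return 1.0 / predicted_exec_time

T = predicted_time(cpu=3.0, transfer=1.5, cache_mgmt=0.3, consistency=0.2)  # 5.0 s
desire = remote_desire(T)  # 0.2: low, so remote execution looks attractive
```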

The Spectra API is presented in Appendix - Spectra. A register fidelity call identifies and registers start operations. The application then defines a set of possible execution scenarios; these scenarios provide different ways to partition execution between remote surrogate and local entity machines. The set of scenarios also specifies levels of fidelity and input parameters, all of which add to operational complexity. The begin fidelity op call determines how to execute an operation and where the operation is to be executed. In short, Odyssey chooses the fidelity level and Spectra chooses the execution scenario (plan), before any actual execution takes place.

The do local op and do remote op calls mark the start of execution of operations; these calls make RPCs to Spectra's surrogates and local server. Other than starting execution of operations, these calls also serve to continually monitor resource usage at surrogates. The end fidelity op call is made by the application to signal the end of operation execution.

In the background to the execution of operations, a snapshot of resources is generated by the predict avail call, which iterates through Spectra's resource monitors and returns predicted resource availability; this information is used by the register fidelity operation that generates potential scenarios. Spectra's monitors are started and stopped in tandem with the start and stop of execution of operations via the do local op and do remote op calls. Finally, add usage logs the observations made, which may be used as future heuristic data.

    2.2.2 Chroma Tactics

Based on Spectra [Flinn02a], Chroma differs from Spectra in two major ways. Firstly, decisions to execute remotely are not made by the application but by Chroma; the reason for this was to introduce flexibility so that application developers are not required to specifically interface with Spectra. Secondly, parallel execution is possible with the introduction of remote execution tactics, or tactics, developed by Balan [Balan03a] (who also developed Chroma).

Tactics enable different ways to subdivide operations that can be executed sequentially or in parallel using RPCs. Once the different tactics to execute an operation are defined, Chroma decides which tactic to use, and whether execution shall be local or on a remote surrogate resource. Tactics are defined in a generated tactics file, an example of which is found in the Appendix. A tactics file is in two parts: part one is a sequence of RPC calls describing the available RPCs and their associated IN input and OUT output parameters. Part two of a tactics file consists of single tactic definition descriptors, each of which is made up of a sequence of RPCs. The RPCs that make up tactic file descriptors can be executed in parallel (within brackets) or sequentially (separated by an & symbol). Each tactic definition is created by the application developer, ideally at the application development stage.
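As an illustration, a tactic descriptor of the form just described (parallel RPCs within brackets, sequential steps separated by &) could be parsed into an execution plan roughly as follows. The RPC names and the comma separator inside brackets are assumptions for illustration, not taken from [Balan03a]:

```python
def parse_tactic(descriptor):
    """Parse a simplified tactic descriptor into a list of stages.
    Steps joined by '&' run sequentially; a bracketed group runs its
    RPCs in parallel. Each stage is a list of RPC names."""
    stages = []
    for step in descriptor.split('&'):
        step = step.strip()
        if step.startswith('(') and step.endswith(')'):
            # Parallel group: split the bracketed RPC list
            stages.append([rpc.strip() for rpc in step[1:-1].split(',')])
        else:
            stages.append([step])  # single sequential RPC
    return stages

# Hypothetical tactic: two feature extractors in parallel, then a combine step
plan = parse_tactic("(extract_colour, extract_shape) & combine")
# plan == [['extract_colour', 'extract_shape'], ['combine']]
```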

The tactic files for each task are collected by a solver that schedules each task by selecting an appropriate tactic plan for the particular resource availability scenario. A tactic plan specifies which tactic to use and where to execute the related RPCs, taken from the tactic file. In order for Chroma's solver to select an optimal tactic plan it requires resource usage information, which is supplied using multiple resource predictors and heuristic information similar to Spectra's heuristic prediction models and resource utility function. However, unlike Spectra, Chroma's solver selects an optimal tactic plan based on resource priority and enforces it in a brute force manner. The reason for this is the claim that there are only a small number of ways to subdivide an

application task for remote execution to a surrogate. An example of Chroma's architecture showing the work flow using tactics may be found in Appendix Chroma.


    Parallel remote execution of tasks in Chroma is achieved when the pervasive environment is over-resourced for the current application requirements. In such a scenario, the same task RPCs from a tactic plan are executed on different surrogates. Chroma employs three different optimisation techniques, called fastest result, data decomposition, and best fidelity. The fastest result technique performs a tactic at a certain fidelity on multiple surrogates and uses whichever result is returned first; any subsequently returned results are discarded. The fastest result technique increases performance but also increases the overall load on surrogates, especially in an environment where multiple application devices are operating simultaneously. The data decomposition technique requires the programmer to explicitly define a function to subdivide input data so that it can be sent to multiple surrogates; the programmer-defined function must also include a method for merging the returned data. The best fidelity technique is implemented in the tactic file: here Chroma sends different tactics to different surrogates and waits a certain time interval; of the results returned within the interval, the one with the best fidelity is chosen and all others are discarded.
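    The fastest result technique can be sketched in Python as follows. The surrogate names, delays, and work function are invented for illustration; real Chroma dispatches RPCs to remote machines rather than local threads:

```python
import concurrent.futures
import time

def run_on_surrogate(name, delay, task_input):
    """Stand-in for executing the same tactic, at one fidelity, on one surrogate."""
    time.sleep(delay)              # simulate surrogates of differing speed
    return f"{name}:{task_input * 2}"

def fastest_result(task_input, surrogates):
    """Dispatch the task to every surrogate, keep the first result, discard the rest."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(run_on_surrogate, name, delay, task_input)
                   for name, delay in surrogates]
        done, not_done = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
        for f in not_done:         # subsequently returned results are discarded
            f.cancel()
        return next(iter(done)).result()

print(fastest_result(21, [("slow-surrogate", 0.3), ("fast-surrogate", 0.01)]))
# the fast surrogate answers first: fast-surrogate:42
```

    Note the trade-off described above: every surrogate still performs the work, so responsiveness improves at the cost of extra aggregate load.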

    2.2.3 AIDE

    AIDE, developed by Messer [Messer02a], takes the approach of partitioning a service running on a client and offloading parts of it to surrogates. AIDE used a modular distributed platform employing the Java Virtual Machine (JVM); three modules address monitoring of application execution, partitioning of tasks, and offloading of components. Dynamically partitioning Java programs and offloading code sections to surrogates is based on memory and processing constraints. AIDE used a graph of the application execution history and subdivided the graph to represent code sections to offload to surrogates. The granularity is defined by Java's class component, in relation to Java's code architecture of objects, classes, and higher level components such as JavaBeans.
    Distributed execution was achieved by modifying the JVM to have hooks instead of unique object references. These hooks took the form of modifications to the JVM that flag object references to remote objects and then intercept accesses to remote objects. With these modifications, the AIDE modules were able to convert remote accesses into RPCs between the JVMs on the client application and surrogates. Any JVM on either client or surrogate that receives a request used a pool of threads from which to perform RPCs on behalf of other JVMs. Here, threads are not migrated; rather, invocations and data accesses determine the placement of objects.

    Partitioning of Java code was done using heuristic execution data in the form of an execution graph. Based on the graph MINCUT heuristic, all graph nodes representing a class that cannot be offloaded are partitioned first and stored on the client entity. Following this, each remaining graph node was evaluated using the MINCUT heuristic. The AIDE MINCUT heuristic produced a group of minimum cut partitions that were individually evaluated to determine which one satisfied the partitioning policy. The partitioning policy was based on a cost function that returns the historical data transferred between partitions. Partitions were then selected which could be offloaded without detriment to overall network operation and use of resources such as memory.
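    A toy sketch of this kind of cost-driven placement is shown below. It is not AIDE's actual min-cut algorithm: a single greedy pass stands in for the evaluation of candidate cuts, and all class names, edge weights, and the cost threshold are invented. Non-offloadable classes are pinned to the client first, as described above, and each remaining class is offloaded only if the compute saved outweighs the data that would cross the cut:

```python
def partition(edges, compute, pinned_client, offloadable, net_cost_per_byte=0.001):
    """edges: {(a, b): bytes transferred}; compute: {class: cost of running locally}."""
    client, surrogate = set(pinned_client), set()
    for cls in offloadable:
        # Bytes this class would exchange across the cut with client-side classes.
        cross = sum(w for (a, b), w in edges.items()
                    if (a == cls and b in client) or (b == cls and a in client))
        if compute[cls] > cross * net_cost_per_byte:
            surrogate.add(cls)     # offloading saves more than the transfer costs
        else:
            client.add(cls)
    return client, surrogate

# Hypothetical execution-graph data: the UI class is pinned to the client.
edges = {("UI", "Codec"): 10, ("Codec", "Matcher"): 900, ("UI", "Matcher"): 5}
compute = {"Codec": 5.0, "Matcher": 50.0}
client, surrogate = partition(edges, compute,
                              pinned_client={"UI"}, offloadable=["Codec", "Matcher"])
print(sorted(client), sorted(surrogate))  # ['UI'] ['Codec', 'Matcher']
```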

    Resource monitoring during partitioning and application execution was achieved by augmenting JVM code for method invocations, data field accesses, object creation, and object deletion. The monitored information is obtained at the object level and aggregated to class level, to coincide with the graph description. Memory usage within interclass interactions was also monitored and returned as values represented by graph edges and graph edge parameters. The graph representation of application execution was a fully weighted execution graph: each node represents a class annotated with the memory usage of objects within the class, interactions between class objects, and data transfer between class objects. Graph adaptation and the adaptive partitioning policy used memory usage by tracking free memory space from the JVM garbage collector. A diagram in Appendix AIDE shows the overall AIDE architecture and the hardware and VM used.

    2.2.4 Networked integrated Multimedia Middleware (NMM)

    Although not presented as either a cyber foraging system or a remote execution system, NMM is included in

    this work because it demonstrates the features required of a cyber foraging system and remote execution of


    tasks.

    NMM is a component orientated middleware framework that integrates and configures components in a network. The primary service goal of NMM is to manipulate and render multimedia content within a network of resources prior to delivery to a mobile device.
    NMM is modelled on a logical flow graph made up of nodes representing individual multimedia content processing tasks. The flow graph represents the overall task requested by an application; the overall task is further divided into subtasks represented by different node elements of the flow graph. Typically, a requested application task may be playback, transcoding, or recording of multimedia data. To complete the application request task the data must be read from a source; for multiple media streams the data requires demultiplexing, and the individual streams are then decoded prior to rendering. The application request task is thus represented as a chain of subtask node elements required to accomplish the overall request. Each node within a logical flow graph accepts as input, and produces as output, data in a defined format via input/output jacks. The format of the data is defined as a tuple made up of media type and encoding, i.e. audio/mpeg3. As data is passed from node to node through the logical flow graph, the format tuple changes until the data is ready for rendering, or whatever the task of the final sink node may be.
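    The flow-graph idea can be modelled in a few lines of Python. This is a deliberately minimal sketch, not NMM's API: the node names, format tuples, and work functions are invented, and the "jacks" are reduced to a format check at connection time:

```python
# Minimal model of an NMM-style flow graph: each node declares the format
# tuple (media type, encoding) it accepts and produces; jacks connect only
# when the formats match.

class Node:
    def __init__(self, name, in_format, out_format, work):
        self.name, self.in_format, self.out_format = name, in_format, out_format
        self.work, self.downstream = work, None

    def connect(self, other):
        if self.out_format != other.in_format:
            raise ValueError(f"jack mismatch: {self.out_format} -> {other.in_format}")
        self.downstream = other

    def push(self, data):
        out = self.work(data)
        return self.downstream.push(out) if self.downstream else out

# source -> decoder -> sink, with the format tuple changing along the chain
source  = Node("file-reader", None, ("audio", "mpeg"), lambda _: "compressed-bytes")
decoder = Node("mpeg-decoder", ("audio", "mpeg"), ("audio", "raw"),
               lambda d: d.replace("compressed", "pcm"))
sink    = Node("audio-sink", ("audio", "raw"), None, lambda d: f"rendered:{d}")

source.connect(decoder)
decoder.connect(sink)
print(source.push(None))  # rendered:pcm-bytes
```

    Because each node only knows its jacks, any node in the chain could equally run on a different resource host, which is the property NMM exploits.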

    Jacks act as connectors between nodes, each node having its own input and output jacks. Exceptions depend on the node type: a source node does not have an input jack because it accepts data directly from the multimedia content repository, so the source node produces the data to be consumed by the flow graph. Another exception is the sink node type, which does not have an output jack because it is used to render data or dump it to a hard drive; a sink node consumes data produced by the source node. The current NMM includes 60 such node types, each defined as a unique multimedia content processing task. The life cycle states of a node run through initialisation, initialise output, activated, and finally started; once started, processing may begin.

    The advantage of using the NMM flow graph description is that different nodes can run on different resource hosts; furthermore, the NMM application itself can run on a separate host from any processing resource host. Resource node management, including discovery, reservation, and instantiation, is accomplished using a registry service. Each resource node runs a registry server, which can be accessed by a registry client running on the application host. An application can use the registry client to query local or distributed running registry servers. When an application requests a resource node from the registry server, the registry server checks node availability; if available, the node is instantiated and goes through its life cycle. Global resource host information is gained through registry information.

    NMM uses a client-server and peer-to-peer communication architecture to communicate between nodes in a network, in which, using a proxy object, invocation and execution are separated. Thus, the user of a proxy object does not need to know the resource host information on which the node is running; only the proxy object requires this information. These communication channels are therefore abstractions for all communication between NMM components, such as input/output binding between jacks of different nodes running on different resource surrogates.

    Service discovery and remote execution are performed by a server registry service. Discovery and registration of nodes is performed statically during initial registry service setup; however, once a server registry is initialised, any added plug-in is dynamically registered with the registry service. Plug-ins dynamically added to an existing network are added to a hierarchy of registry registers; the interfaces between clients and servers in an existing network are called IRegistry and IServerRegistry, a description of which may be found in Appendix NMM.

    At this juncture it is important to underline the difference between a graph description, a node description, and plug-ins, in order to make clear what information is communicated by the registry service. The concepts used for describing entities administered within the registry service are the same as those for querying entities from the registry service. A plug-in is specified by a node description, whereas a complete flow graph is stored within a graph description. Each description type contains all relevant attributes, such as object name; e.g. a node description includes node name, node type, format, and sharing attributes. In addition to attributes, a node description also stores a list of events, called configuration events, used when configuring a plug-in instance.


    Therefore, a list of any possible states, and the configuration attributes for each state, exists as configuration events.

    A node description can be further subdivided into subsets of the overall node description; this allows querying all node descriptions in the registry and returning only those subsets that fulfil the query criteria. The properties in a graph description include the specification of nodes and their connections, the specification of the communication channel, and synchronisation.

    Distributed synchronisation in NMM distinguishes intra-stream and inter-stream synchronisation. Intra-stream synchronisation refers to timings between multiple presentations of the same media stream, i.e. a stream of subsequent video frames. Inter-stream synchronisation refers to synchronising media streams themselves, i.e. synchronising lip-sync for audio and video streams. Each NMM message holds a timestamp that contains entries for time and a stream counter. The time and stream counter entries of the timestamp, together with a global system clock, are the main reference to time when synchronising between media streams passed from one processing task node to another, either locally or remotely on surrogates. Synchronisation between surrogate nodes is governed by a set of synchronisation controllers, or synchronisers, for each task node. These synchronisers realise inter-stream synchronisation by implementing a synchronisation protocol, and provide an interface allowing the application to modify the operation of a corresponding flow graph, i.e. for pausing data processing.

    The overall objective of sink synchronisation is to provide either synchronised playback of the media content, or rendering, for distributed audio/video sinks. An example of distributed synchronisation to a sink application node may be found in Appendix NMM. Here a buffer is simply a collection of messages containing timestamp information. Timestamps are handled by locally running controllers delegated by the synchronising sink nodes; these local controllers deal with intra-stream synchronisation. Controllers decide when to present a particular buffer, from among the buffers of multiple streams, by matching buffer latency and flow graph latency. A buffer requires a certain time interval to reach a node after its timestamp has been set. This timestamp-to-node interval is called the real latency and is expressed as the difference between the arrival time and the time set within the timestamp. Two buffers with corresponding timestamps will have different times for reaching the sink node, i.e. real latency 1 and real latency 2. If one imagines latency as the time from the node where the timestamp was set, called the sync-time, to the sink node where the buffer will be presented, called the presentation-time, then latency = presentation-time - sync-time. Alternatively, if a latency is given, the controller calculates a theoretical latency, or theo-latency, from the sync-time and the given latency. During runtime a controller checks whether real-latency > theo-latency + max-skew, where max-skew is a previously defined tolerance. If the computed real-latency exceeds this value, the buffer is considered too old and may be treated as invalid. However, if real-latency < theo-latency + max-skew, presentation of the buffer will be delayed. In summary, intra-stream synchronisation attempts to maintain constant latency for stream buffers, so that the temporal distance between buffers is the same as that between their corresponding sync-times.

    This differs from inter-stream synchronisation, which attempts to maintain equal latencies across streams. This is achieved by setting the theo-latency of all controllers to the maximum real-latency of all current streams. First, every controller sends the computed real-latency for its first buffer to arrive to the synchroniser. The synchroniser then computes the theo-latency as the maximum of all these latencies and sets this value as the theo-latency for all connected controllers.
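    The latency bookkeeping described above can be summarised in a short sketch. This is a paraphrase of the rules in the text, not NMM code; the numeric values and the max-skew tolerance are illustrative:

```python
# A buffer's real latency is arrival_time - sync_time. Intra-stream rules
# drop buffers that are too old and delay buffers that arrive early;
# inter-stream synchronisation sets every stream's theoretical latency to
# the maximum real latency observed across streams.

def real_latency(arrival_time, sync_time):
    return arrival_time - sync_time

def classify(real_lat, theo_lat, max_skew):
    """Decide what to do with a buffer, per the intra-stream rules above."""
    if real_lat > theo_lat + max_skew:
        return "too-old"          # may be treated as invalid
    if real_lat < theo_lat + max_skew:
        return "delay"            # hold the buffer until its presentation time
    return "present"

# Inter-stream: the synchroniser picks the maximum real latency as theo-latency.
stream_latencies = [0.040, 0.025, 0.060]   # first-buffer real latencies (seconds)
theo_lat = max(stream_latencies)           # 0.060, applied to every controller

print(classify(real_latency(1.100, 1.020), theo_lat, max_skew=0.005))
# "too-old": an 80 ms real latency exceeds 60 ms theo-latency + 5 ms skew
```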

    Search and instantiation is performed in two stages. The first stage is referencing all nodes that match a given node description; here node description refers to a subset of a complete graph description stored in the server registry. The second stage is instantiation of the first node in the list of nodes returned in response to the stage one matching of references; the response consists of a registry identifier and a node identifier. Using these identifiers, a complete flow graph description can be requested using the first responding node. A description of a client registry, and of the registry hierarchy of registers that hold a number of specialised registers for different identifiers and node attributes, may be found in Appendix NMM.

    The registers in the registry hierarchy are accessible via the IRegistry interface; Registry1394 administrates firewire compatible devices, and LocalRegistry provides information for non-specialised available plug-ins. Shown


    in the ClientRegistry is the scope of the information held and the subsequent operations performed on the flow graph. Finally, a code snippet that a developer might use to access the registry service may be found in Appendix NMM. The first part creates a central application object for the application. Here the system server registry is contacted; if contact fails, a local instance is created instead. The second line requests the client registry. The next part, starting with NodeDescription, is an example of requesting a node, specified by node name, from the registry service. Here a graph description is used to request a simple flow graph consisting of three nodes: a source node for reading data from a file, a converter for decoding MPEG audio, and a sink node for outputting uncompressed audio. In the last part of the code snippet, all edges of the flow graph are specified and connected and all nodes activated; finally, the flow graph is started.

    2.2.5 Goyal

    Goyal [Goyal04a] was motivated to develop a cyber foraging system on a widely available platform without

    the requirement of a large middleware layer.

    Goyal developed a lightweight cyber foraging system that differed from Spectra and Chroma because it did not require the use of a common file system such as Coda/Odyssey. Similar to AIDE, Goyal employs virtual machine technology; however, in contrast to AIDE, partitioning is done by the application developer and not by any automated code division method. Multiple virtual surrogates can be created on the same surrogate host. The argument for using virtual machine technology was that independent virtual servers allowed for greater isolation, flexibility, resource control, and clean-up compared to running on real host surrogate machines. Isolation, in terms of no interference between virtual machines. Flexibility, as client applications can install arbitrary software on the virtual machine. Resource control, in that the resources of the physical host can be fairly allocated between multiple virtual machines; this also allows the physical host to compute separate applications from the virtual machines without draining virtual machine resources. Clean-up is automated and simple: when a virtual machine instance shuts down, the allocated disk partition on the host surrogate is restored to its original clean state.

    Service discovery was managed by a separate service discovery server that maintains lists of registered surrogates and their individual resource capabilities, represented in an XML syntax description. When a client requires surrogate resources, the client queries the service discovery server by requesting particular resources. The service discovery server matches the existing listed registered surrogates with the client's resource requests. Matching requests to resources is based on previous profiling of application resource requirements made by the developer.
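    The XML-based matching step might look like the following sketch. The schema, resource names, and surrogate entries are invented; Goyal's actual XML syntax is not specified here, so this only illustrates matching a client request against registered capability descriptions:

```python
# Sketch of resource matching in the style of Goyal's service discovery server.
import xml.etree.ElementTree as ET

# Registered surrogates and their capability descriptions (hypothetical schema).
SURROGATES = {
    "surrogate-a": "<resources><memory mb='512'/><cpu mhz='1800'/></resources>",
    "surrogate-b": "<resources><memory mb='2048'/><cpu mhz='3000'/></resources>",
}

def parse(desc):
    root = ET.fromstring(desc)
    return {"memory": int(root.find("memory").get("mb")),
            "cpu": int(root.find("cpu").get("mhz"))}

def match(request_desc):
    """Return names of registered surrogates satisfying every requested resource."""
    want = parse(request_desc)
    return [name for name, desc in SURROGATES.items()
            if all(parse(desc)[k] >= v for k, v in want.items())]

# A client request, profiled in advance by the application developer:
request = "<resources><memory mb='1024'/><cpu mhz='2000'/></resources>"
print(match(request))  # ['surrogate-b']
```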

    A typical implementation of the Goyal system workflow may be found in Appendix Goyal. Initially, the

    client sends a request to the server to discover a surrogate from its listed registered surrogates. The service

    discovery server replies with an IP and port number of the surrogate manager of a listed registered surrogate.

    With this information the client contacts the surrogate manager with a service start request. The surrogate

    manager determines adequate resources by matching application requirements with available resources, using

    the same XML notation as the initial service discovery request.

    After authenticating the client and allocating the matched resources to a newly started virtual machine, the surrogate manager sends the client a service start response containing the IP of the new virtual machine. During client application and virtual machine interaction, the client invokes an operation on the surrogate by sending a Sub Task Configuration Request to a virtual server manager on the surrogate. A Sub Task Configuration Request from the client includes a URL of the client program to run; the program at that URL includes all the information the virtual surrogate requires to install and run it.

    Authentication between client and virtual surrogate is addressed using a flexible authentication framework that supports multiple authentication mechanisms, specified by the client when first connecting to the surrogate machine during the service start request stage. The different authentication mechanisms available for the client to specify are SSL, TLS, and SSH. Once a client-surrogate session is established, any subsequent data transfer or communication uses the client's public key for authorisation. The client's public key is stored by


    the surrogate and service discovery server for future reference, and future client service discovery requests.

    2.2.6 Slingshot

    The motivation behind Slingshot [Su05a] is to eliminate the bottleneck that can occur when a client application attempts cyber foraging on remote surrogates via a wireless hotspot. Slingshot is a client-surrogate architecture for deploying mobile services at wireless hotspots, based on the concept of continuous replication of application states instantiated on virtual surrogates as the client moves between available virtual surrogate resource services. The Slingshot architecture replicates remote application state on surrogate computers co-located with wireless access points. A first replica of each application is executed on a trusted safe server and acts as a backup if subsequent surrogates fail. A subsequent second replicated application state is co-located on a virtual surrogate in closer proximity than the first, for quicker response times. The client application broadcasts application requests to all replicated states on all virtual surrogates, and responds only to the quickest return from any of them. A database of the state of each replicated application maintains checkpoints that serve as the start point of a new replication instance. Replication in this fashion is used instead of migration of replicated state from surrogate to surrogate because, during migration, processing cannot continue. In Slingshot, processes continue on previous replicated states while new replicated states are being instantiated. Slingshot instantiates a new replica by checkpointing the first replica, migrating its volatile state to a surrogate, and then replaying any operations that occurred after the checkpoint. The workflow of Slingshot may be found in Appendix Slingshot.

    The first safe replicated state surrogate server is called the home server, and it maintains a service database. The service database maintains the current service state of the replicated server on its virtual disk, using SHA-1 values assigned to 4 KB chunks of the latest replicated state. Therefore, at any time the home server has the latest updated replicated state.
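    The chunk-hashing bookkeeping is easy to illustrate. The sketch below shows only the idea of tracking SHA-1 values over 4 KB chunks so that changed chunks can be identified between two state snapshots; the state data is synthetic and the surrounding database machinery is omitted:

```python
# SHA-1 values over 4 KB chunks of replicated state, as in Slingshot's
# service database; comparing two snapshots reveals which chunks changed.
import hashlib

CHUNK = 4096  # 4 KB

def chunk_hashes(state: bytes):
    return [hashlib.sha1(state[i:i + CHUNK]).hexdigest()
            for i in range(0, len(state), CHUNK)]

def changed_chunks(old_hashes, new_hashes):
    """Indices of chunks whose SHA-1 differs between two state snapshots."""
    return [i for i, (a, b) in enumerate(zip(old_hashes, new_hashes)) if a != b]

old_state = b"A" * CHUNK + b"B" * CHUNK
new_state = b"A" * CHUNK + b"C" * CHUNK   # only the second chunk has changed
print(changed_chunks(chunk_hashes(old_state), chunk_hashes(new_state)))  # [1]
```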

    2.2.7 Vivendi

    Vivendi was developed by Balan [Balan07a], who also developed Chroma tactics [Balan03a] and VERSUDS [Balan02a]. The Vivendi system has two main components: one that deals with creating a remote execution tactic file (tactic), and the Chroma runtime system [Balan03a]. Here we only describe the Vivendi partitioning system and the interactions between Vivendi, tactics, and Chroma. Please refer to previous sections for further information on Chroma and tactics.

    Vivendi is written as a little language [Bentley86a] for rapid modification of applications to enable partitioning of application tasks. The motivation for Vivendi is to reduce application development time by reducing the complexity of modifying an application to support cyber foraging at the developer level, allowing even novice and less experienced application developers to develop cyber foraging enabled applications. Essentially, Vivendi requires the developer to preplan which application tasks should be considered for remote execution and to define the critical variables of the task as parameters used to predict the expected resources required to carry out the task, written in the little language as a tactics file. For each critical task variable parameter, the developer includes a fidelity for carrying out the task with satisfactory quality results. The developer also specifies the RPCs that define the actual application computation task a surrogate performs on behalf of the client application. Finally, the developer defines combinations of different RPCs that can carry out the application task within the defined fidelity and quality.

    Chroma complements the Vivendi tactics file by selecting the appropriate tactic and binding RPCs to surrogates. Selection of a tactic is based on Chroma's resource management, prediction, and fidelity selection functions, as depicted in Appendix Vivendi. The Chroma solver module responds to Vivendi stubs generated from the RPCs defined in the tactics file, and predicts the current optimum tactic to use. The solver prediction process uses monitored surrogate resource data, computed using Chroma's utility function, and similar tactic heuristics.


    Vivendi generates two types of stubs, a standard RPC stub and a wrapper stub. The wrapper stub is manually written by the developer and contains the application code methods required for the task defined in the original tactic; it effectively contains the lower level information a surrogate needs to do the task. Using a Vivendi wrapper stub provides a convenient interface between Chroma and the targeted application. An example of a Vivendi tactic file and Vivendi wrapper stub may be found in Appendix Vivendi.

    2.2.8 EyeDentify

    EyeDentify is a smartphone object recognition application developed on the Android OS that uses the Ibis Distributed Deployment System to deploy remote applications onto surrogates, and the Ibis High Performance Programming System for communication.
    The authors developed two versions of the EyeDentify application: one version performed all computation locally, and a second version performed computation on surrogates; the response times for the same computation processes were compared. Results revealed a 60-fold increase in responsiveness using Ibis for cyber foraging on remote surrogates compared with local phone resources.

    The Ibis middleware consists of a number of sub-projects, each of which implements a part of the grid middleware requirements. The left side panel represents the Ibis Deployment System, with JavaGAT as the main component. The right panel represents the Ibis High Performance Programming System, the main component of which is the Ibis Portability Layer (IPL). The combination of JavaGAT and IPL forms the main cyber foraging mechanism for EyeDentify. A graphic representation of the Ibis middleware used can be found in Appendix EyeDentify. JavaGAT has adaptors able to bind to any middleware; the adaptors map the JavaGAT API to middleware calls, including SSH. The EyeDentify Android application used two adaptors, one for client resource access and one for surrogate resource access using SSH. On top of JavaGAT is a deployment library called IbisDeploy (Deploy) that starts distributed applications developed using the Ibis High Performance Programming System. On top of IbisDeploy is a GUI from which remote applications can be started. The deployment procedure of an Ibis application on a remote surrogate involves the following sequence of eight subtasks. 1) Replicate the application, libraries, and input file on the remote surrogate. 2) Start an Ibis server registry process. 3) Form an overlay network. 4) Construct middleware specific job descriptions. 5) Submit the job description to the remote surrogate. 6) Monitor job statuses. 7) Retrieve the output file when the process is completed. 8) Clean up the remote file system.

    Service discovery is performed by the IbisDeploy library when it defines job descriptions and Ibis applications using a namespace concept. An Ibis application description contains a main class and virtual machine options and arguments; a remote surrogate description contains details of how a remote surrogate should be accessed. Both the IbisDeploy library and the GUI are ported to Android. In summary, the application itself is developed on the client and deployed to remote surrogates; therefore the only service software required to run on surrogates is the default Ibis middleware that JavaGAT binds to, and a JVM.

    The EyeDentify application is an object recognition application that has two stages of operation. Stage one is called the learning mode, in which an image of an object is stored in an internal database with a predefined identification profile. In stage two, called the recognition mode, another image is captured and matched to images in the internal database, and the best match between the second image and the database images is then presented. Matching images is performed by learning algorithms that extract features and attributes of the second image, such as colour histograms, shapes, relative size, etc. The recognition phase of the application is very resource consuming; this was the computation that the authors offloaded to surrogates.
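    The colour-histogram part of such matching can be sketched in a few lines. This toy example only illustrates the idea of histogram-based matching; EyeDentify's real feature extraction is far richer, and the "images" here are flat lists of 8-bit grey values with an invented database:

```python
# Toy recognition-mode matching via normalised intensity histograms.

def histogram(pixels, bins=4):
    hist = [0] * bins
    for p in pixels:
        hist[p * bins // 256] += 1     # map 0..255 into `bins` buckets
    total = len(pixels)
    return [h / total for h in hist]   # normalise so image sizes can differ

def distance(h1, h2):
    return sum(abs(a - b) for a, b in zip(h1, h2))

def best_match(query, database):
    """Return the database label whose histogram is closest to the query's."""
    qh = histogram(query)
    return min(database, key=lambda label: distance(qh, histogram(database[label])))

# Hypothetical learning-mode database: label -> stored image pixels.
database = {"mug": [20] * 50 + [220] * 50, "plant": [90] * 100}
print(best_match([25] * 40 + [210] * 60, database))  # mug
```

    The per-pixel loop over full-resolution images is exactly the kind of computation that dominates the recognition phase, which is why it was the part offloaded.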

    2.2.9 DiET

    DiET, developed by Kim and co-workers [Kim09a], is a framework that transforms original Java bytecode from a remote content service provider into progressively smaller versions: one version for surrogates, and an even slimmer version for execution on a client. The motivation for the work is to reduce Java bytecode into serialised


    distributed objects that are usable by surrogates and clients, without the need for major developer modifications to the original Java application. This slimming down of bytecode is done by replacing the main bodies of methods with remote procedure calls (RPCs). A client starts the process by requesting an application to execute from a remote content service provider. The service provider then slims down the application Java bytecode and transfers it to prediscovered surrogates and clients in the pervasive compute environment. The surrogates receive a server bytecode and clients receive a smaller slim bytecode. Since no modification to code functionality takes place, once the client and server have received their versions, the cyber foraging takes place as ATA using the JVM invoked on surrogates. A graphic description of the transfer of server and slim bytecode is shown in Appendix DiET.
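    DiET performs this replacement by rewriting Java bytecode; the Python analogue below only illustrates the idea of keeping a method's signature while replacing its body with a forwarding stub. All names are invented, and the "remote" call is simulated with a local lookup:

```python
# Analogue of DiET's slimming step: the slim client version keeps method
# signatures but replaces bodies with calls forwarded to the server version.

SERVER_METHODS = {}

def server_method(fn):
    """Register the full implementation, as kept in the server bytecode."""
    SERVER_METHODS[fn.__name__] = fn
    return fn

def rpc_stub(name):
    """Return a slim replacement whose body just forwards the call."""
    def stub(*args, **kwargs):
        # In DiET this would be a remote procedure call to a surrogate.
        return SERVER_METHODS[name](*args, **kwargs)
    return stub

@server_method
def render_scene(width, height):
    return f"rendered {width}x{height}"    # stand-in for heavy computation

# The client receives only the stub, not the method body:
slim_render_scene = rpc_stub("render_scene")
print(slim_render_scene(640, 480))  # rendered 640x480
```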

    2.2.10 Instant-X

    Instant-X is a component based middleware platform that provides a generic programming model with an API for essential tasks of multimedia applications with respect to signalling and data transmission. The motivation behind Instant-X is to develop spontaneous communication software, compatible with multimedia encoder/decoder protocols. The work argues that standard multimedia encoder/decoder protocols are limited as communication software. For example, the Java Media Framework [Sun99a] offers basic access to multimedia codecs and RTP data transmission, but does not support further communication mechanisms such as signalling. However, JAIN SIP does support signalling, but requires considerable configuration by the multimedia application developer.

    The thrust of the Instant-X concept is the ability to replace specific protocol implementations without changing the application code of the multimedia application. Instant-X also supports dynamic deployment of unavailable components at runtime. Although not a cyber foraging system per se, Instant-X is included in this section because, implemented with OSGi [OSGi07a] as a component platform, the system demonstrates interesting cyber foraging functionality. The programming model consists of three elements: binding, session and context. A graphic representation of the programming model can be found in Appendix Instant-X. A binding is a local endpoint of an application represented by a URI; it activates the URI and maintains its active status. A session represents a P2P relationship between participants or actors; each participant has a unique URI identifier, such as SIP:[email protected] when using the SIP URI method. The URI identifiers are encapsulated in bindings. A context contains optional parameters required for sessions and bindings, such as permissions. A SIP session may contain multiple RTP sessions for audio and video. The programming model is designed to provide a generic tool for developers, who do not need to worry about the underlying protocols required by their application. The generic API of Instant-X is such that the application does not need to change if the protocol implementation changes. The Instant-X API employs OSGi [OSGi07a] as a Java service-oriented architecture that dynamically discovers collaborative components and adapts to changing device composition across a variety of networks, without the need for a device restart. Instant-X is demonstrated using cloud computing with OSGi. By using the cloud computing paradigm, surrogate discovery and scheduling are deferred to the cloud.
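    The binding/session/context programming model can be sketched as follows. This is an illustrative Python sketch only: all class and method names, and the example SIP URI, are assumptions, not the actual Instant-X API.

```python
# Illustrative sketch of the Instant-X programming model (binding, session,
# context). Names are assumptions, not the real Instant-X API.

class Binding:
    """Local endpoint of an application, represented by a URI."""
    def __init__(self, uri):
        self.uri = uri
        self.active = False

    def activate(self):
        # Activates the URI and maintains its active status.
        self.active = True

class Context:
    """Optional parameters required for sessions and bindings."""
    def __init__(self, **params):
        self.params = params  # e.g. permissions

class Session:
    """P2P relationship between participants, each identified by a URI."""
    def __init__(self, context=None):
        self.context = context or Context()
        self.participants = []   # bindings encapsulating URI identifiers
        self.subsessions = []    # e.g. RTP sessions for audio and video

    def join(self, binding):
        binding.activate()
        self.participants.append(binding)

# Hypothetical usage: a participant joins a session via a SIP URI binding.
binding = Binding("sip:user@example.org")  # illustrative URI
session = Session(Context(permissions="audio"))
session.join(binding)
```

    The point of the sketch is the separation of concerns: the application manipulates bindings, sessions and contexts, while the middleware is free to swap the underlying protocol implementation behind the same API.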

    2.2.11 Scavenger

    Scavenger is a dual-profile task scheduling system, written in Python as a hybrid cyber foraging approach based on Locust [Kristensen07a], consisting of a daemon installed on surrogates and libraries installed on the client. Scavenger is motivated by increasing the effectiveness of heuristic profiling for scheduling, taking into account task complexity and merging task-centric and peer-centric profiles. The system consists of two independent software components: a daemon running on surrogates using Stackless Python, and a library running on the client using standard Python. The libraries on the client contain the mobile code executed on surrogates through the RPC entry point of the surrogate daemon. Libraries can be invoked when the application starts cyber foraging, or automatically without the need to start the application.

    The daemon on the surrogates has a front-end to receive RPCs, and a mobile code environment. The mobile code environment allows dynamic installation and execution of the Python code transferred from the client in an RPC.

    Kristensen argues that mobile code is a necessity for true mobility, because pre-installed tasks on surrogates mean all surrogates everywhere must have all tasks pre-installed. Similarly, using a VM is too heavyweight: a full VM takes too long to instantiate, especially if the user is mobile and out of reach within a few minutes. Kristensen argues that using trusted mobile code is better; if the code is not installed, then the mobile client simply installs it.

    The daemon execution environment spawns a core scheduler on the surrogate that handles the offloaded application tasks for a particular core. When installing the daemon on a surrogate, the number of cores to offer as surrogate cores is user-configurable, in case the surrogate is a machine such as a laptop used by other users. Here the laptop may be used locally for other activities and still serve as a surrogate for cyber foraging in a pervasive compute environment. A high-level view of the Scavenger architecture can be found in Appendix Scavenger.

    Once a surrogate has performed an offloaded task, the task is stored at the surrogate for future use under an automated UID with MD5-sum naming. When invoking a given task, Scavenger first queries whether the task is already installed before installing it from the mobile code. Mobile code therefore does not have to be transferred along with a task whose code is already at the surrogate. This exploits the fact that many transient clients will normally use a certain number of tasks more than others.
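    The install-once behaviour can be sketched as follows: the surrogate derives a UID from an MD5 sum of the task's code and skips the transfer when that UID is already present. This is a minimal sketch under assumed names; Scavenger's actual naming scheme and protocol are not shown here.

```python
import hashlib

class TaskCache:
    """Surrogate-side store of offloaded tasks under MD5-derived UIDs."""
    def __init__(self):
        self._tasks = {}

    @staticmethod
    def uid(code: bytes) -> str:
        # Automated UID: the MD5 sum of the task's mobile code.
        return hashlib.md5(code).hexdigest()

    def is_installed(self, uid: str) -> bool:
        return uid in self._tasks

    def install(self, code: bytes) -> str:
        uid = self.uid(code)
        self._tasks.setdefault(uid, code)
        return uid

# Client side: query by UID first; transfer mobile code only when missing.
def invoke(cache: TaskCache, code: bytes) -> str:
    uid = TaskCache.uid(code)
    if not cache.is_installed(uid):
        cache.install(code)   # code travels with the task only this once
    return uid                # subsequent invocations reuse the cached task
```

    Repeated invocations of the same task on the same surrogate then cost only a UID lookup, which is what makes the scheme pay off for frequently reused tasks.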

    Security and trust measures include blacklisting and whitelisting of imported known standard library modules.

    Surrogate discovery is performed using a presence discovery framework, used by clients to discover surrogates. XML-RPC is used.
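    The client/daemon split over XML-RPC can be sketched with Python's standard xmlrpc modules. The RPC method names, the port, and the install/execute interface are assumptions for illustration; Scavenger's actual daemon interface is not documented here.

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# Surrogate daemon: an RPC front-end over a mobile code environment.
def run_daemon(port):
    server = SimpleXMLRPCServer(("127.0.0.1", port), logRequests=False)
    tasks = {}

    def install_task(name, source):
        env = {}
        exec(source, env)          # dynamic installation of transferred code
        tasks[name] = env[name]
        return True

    def execute_task(name, arg):
        return tasks[name](arg)    # RPC entry point for offloaded execution

    server.register_function(install_task)
    server.register_function(execute_task)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

# Client library: transfer the mobile code once, then invoke it remotely.
daemon = run_daemon(8877)
client = ServerProxy("http://127.0.0.1:8877")
client.install_task("double", "def double(x):\n    return 2 * x")
result = client.execute_task("double", 21)
daemon.shutdown()
```

    A real daemon would combine this with the UID-based cache above and the blacklist/whitelist checks before executing transferred code.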

    Scheduling - The main contribution of Scavenger is the dual profiling during scheduling. Each task has two profiles: a task-centric profile, where a globally applicable task weight is stored, and a peer-centric profile, maintained for each (peer, task) pair that has been encountered. The peer-centric profile stores information about how exactly a peer performs with regard to a specific task. These stored peer-centric profiles are used to probabilistically determine how an unknown peer may perform on a particular task. Scheduling is then based on maintaining a history-based peer-centric profile. When profiling information is required during scheduling, the history-based peer-centric profile is read first to see whether that particular peer has been used before, and for which tasks. If the peer is unknown, then the task-centric profile is consulted. The task-centric profile is also history based and contains a weight for each task. The task-centric weighting is calculated as follows:

    Tweight = (Tduration × Pstrength) / Pactivity    (2.1)

    where Tduration is the task duration in seconds, Pstrength is the peer strength from the nbench benchmark1, and Pactivity is the number of tasks running during execution.

    During a task, an expected Pstrength is derived by scaling with Tduration; the derived expected Pstrength is used as a measure to reason about expected task running times on other surrogates. For example, a task with Tduration of 1 second on a surrogate with Pstrength 40 should take approximately 2 seconds on a surrogate with Pstrength 20. Experiments verified a correlation between theoretical and real task duration times on surrogates with varying compute resources carrying out the same task.
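    Equation 2.1 and the scaling step can be sketched numerically. The helper names are illustrative, and the example assumes the second surrogate has half the strength (Pstrength 20), which is consistent with the 1 s to 2 s example in the text.

```python
def task_weight(t_duration, p_strength, p_activity):
    # Equation 2.1: Tweight = (Tduration * Pstrength) / Pactivity
    return (t_duration * p_strength) / p_activity

def expected_duration(t_weight, p_strength, p_activity=1):
    # Scale the globally applicable weight by the target peer's strength
    # to reason about expected running time on another surrogate.
    return (t_weight * p_activity) / p_strength

# A 1-second task on a surrogate of strength 40 (one task running) ...
w = task_weight(1.0, 40, 1)
# ... is expected to take about 2 seconds on a surrogate of strength 20.
t = expected_duration(w, 20)
```

    The weight is thus a peer-independent quantity: dividing it by any peer's benchmark strength recovers an expected duration on that peer.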

    The profiles are two-dimensional, with a second dimension that takes task complexity into account. Task complexity is also considered as a factor in the scheduling process. Task complexity variables such as input size and value will vary. At the development stage, the application developer is asked to provide a description of the input parameters that determine a task's complexity. The developer-specified complexity parameter values and/or sizes are combined and evaluated to yield a single value, which may be, for example, the size of an input file. Using a simple matching algorithm, this value in combination with Tweight is used to update the dual scheduling profiles.

    1 http://www.tux.org/~mayer/linux/bmark.html

    Scavenger does not partition tasks into subtasks; a task described in mobile code cannot be further divided into subtasks that run on different surrogate machines.

    2.3 Summary

    The methods used to address surrogate discovery, trust establishment and task scheduling are shown in Table 2.1. With the exception of Scavenger, previously proposed cyber foraging systems have not supported all four design features. The reason for this may be the non-commercial nature of these systems, which focus on particular aspects of cyber foraging rather than a complete system.

    Table 2.1: Cyber foraging systems reviewed and their adherence to the four fundamental features required of a cyber foraging system as defined by Satyanarayanan [Satyanaranyanan01a] and Balan [Balan07a]

    Figure 2.1: Classification of cyber foraging systems as thin and thick ATA and AAA

    2.4 Discussion

    In this section, the literature and cyber foraging tools described previously are discussed in the context of service/surrogate discovery; the mechanisms of these functionalities, summarised in Table 2.1 in the previous section, are expanded upon.

    1 Networked integrated Multimedia Middleware


    2.4.1 Surrogate Discovery

    Introduction

    Of the challenges that need to be addressed in the fusion of distributed and mobile computing [Satyanaranyanan01a], service discovery is the most critical; without it, transient use of cyber foraging would not be possible. Discovery enables entities to properly discover, configure and communicate with each other [Zhu05a]. Discovery of a specific service in a pervasive compute environment is hampered because it is unreasonable to expect surrogates to have services pre-installed to cater for all transient clients. The ability to dynamically discover services has less administrative overhead than traditional methods, which require prior knowledge of service existence and the manual input of parameters such as computer names, IP addresses and URLs to configure the discovered surrogate service for the client application.

    In this work, we replace the concept of searching for a service with searching for a surrogate. While prior knowledge of a service's existence in fixed, limited networks may seem practical, a pervasive computing environment will have transient client usage and potentially hundreds of transient surrogates and surrogate services. The cyber foraging systems reviewed in this section are not specifically discovery services; they are tools and aids for the developer to include cyber foraging functionality in mobile applications.

    In the work of Zhu [Zhu05a], ten service discovery design features are defined in a unifying taxonomy of terms and definitions, and a comparison is made of the adherence to these terms of some existing discovery protocols. The protocols that Zhu compares are primarily for home and enterprise environments, which do not compare completely with a pervasive compute environment; however, the design approaches are useful guides and reference points for the design of a discovery protocol in a pervasive compute environment. In this discussion we first describe some current discovery protocols, and then make the same comparison of adherence that Zhu made, in Table 2.2, for the discovery afforded by the cyber foraging systems described in the previous section.

    Existing Discovery Protocols

    Previous studies of announcement and discovery have focused mainly on static environments without mobile entities. Examples of static discovery are Jini [Jini03a], Salutation [Salutation01a] and Universal Description, Discovery and Integration [UDDI11a] (UDDI, for Web services). In such static environments an entity can act as both client and server (peer); typically one or more peers take on the role of registrar and maintain a central register. This registrar peer is contacted by peers that wish to announce that they provide a service, and the service is registered by the registrar peer. In return, the registrar peer provides a lease to the service peer, which indicates how long the registrar will continue to announce the service to the network. When a peer wants access to a service, it informs the registrar peer of its service requirements. The registrar then matches the requirements against registered services, and returns either a list of known registered services or information that provides communication details for a registered peer currently providing the required service. This is a centralised approach that does not perform well in a mobile environment, which may have mobile peers that both use and offer services concurrently.
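    The registrar pattern described above can be sketched as follows. This is a deliberately simplified model under assumed names; real protocols such as Jini differ considerably in their wire formats and lease semantics.

```python
import itertools

class Registrar:
    """Central register: peers announce services and receive leases."""
    def __init__(self, lease_seconds=300):
        self.lease_seconds = lease_seconds
        self._services = {}          # service type -> list of provider info
        self._ids = itertools.count()

    def register(self, service_type, provider):
        self._services.setdefault(service_type, []).append(provider)
        # The lease tells the provider how long the registrar will keep
        # announcing this service before the provider must renew it.
        return {"lease_id": next(self._ids), "duration": self.lease_seconds}

    def lookup(self, service_type):
        # Match a client's requirement against registered services and
        # return contact details for the current providers.
        return list(self._services.get(service_type, []))

# A service peer registers; a client peer later looks the service up.
reg = Registrar()
lease = reg.register("printing", {"host": "10.0.0.5", "port": 9100})
providers = reg.lookup("printing")
```

    The centralisation is visible in the sketch: every announcement and every lookup passes through the one registrar, which is exactly the property that scales poorly in a mobile peer environment.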

    An example of a mobile network discovery protocol is the gossip approach taken by Lee [Lee03a], called Konark. Here, mobile peers that offer services announce the services they are aware of only to neighbours whom they believe are not aware of said services. The gossip announcement approach is optimal in a multihop network because of its viral nature of communication. A drawback of the gossip approach is its high overhead in wide networks; it is better suited to single-hop networks such as Mobile Ad hoc Networks (MANETs). In cyber foraging, constraining data transfer to a single hop is impractical.

    Another alternative mobile discovery protocol is Universal Plug and Play (UPnP) [UPnP02a], which is based on the Simple Service Discovery Protocol (SSDP) [Goland99a]. UPnP is not centralised to a single register; rather, when a peer joins a network it advertises a service to control points that represent potential clients for the service. When a potential client (control point) joins the network, it sends a discovery message requesting a service, and the existing service-advertising peer responds to the requesting client by returning a service announcement message. A service announcement message has a limited service description; should the requesting client wish to have more service information, it responds to the service announcement message requesting a more comprehensive service description, which the service-advertising peer duly provides. On receiving a full service description the client may choose to use the service, which it does by communicating with the service peer using the Simple Object Access Protocol (SOAP) [SOAP08a].

    There are two drawbacks to using UPnP for cyber foraging in a pervasive compute environment. Firstly, the discovery process is passive, in that networked peers only advertise once and then wait for a potential client to respond. Secondly, after a service announcement message is received, the client must yet again contact the service peer and request more service description information before deciding whether to use the service. In the context of task scheduling in cyber foraging, a client requires continuous service discovery information from all service surrogates. Continually contacting every surrogate in the pervasive compute environment by hand and receiving multiple service descriptions would incur unfeasible overhead, considering that all peers may be both clients and service surrogates, and therefore all peers may need to send requests to all other peers in the network for continuous monitoring.

    In summary, existing discovery protocols do not completely meet the requirements of cyber foraging in a pervasive compute environment, which demand continuous discovery of mobile clients and surrogates (peers) in a single-hop network, together with continuously available service description updates that may be used for scheduling of partitioned application tasks.

    Summary of Cyber Foraging Discovery Design Features

    Table 2.2: Cyber foraging system tools and their adherence to the fundamental service discovery design features required of a cyber foraging system in a pervasive compute environment, as composed by Zhu [Zhu05a]


    Service and Attribute Naming

    The template-based approach defines a format for surrogate names and attributes, such as how the name must be composed. In addition to a template-based name for the surrogate, common parameters of a surrogate may also be predefined. The advantage of templates and predefined parameter sets is a decrease in ambiguity in communication between entities, both within the system and externally. How strictly the format and composition of a surrogate name and its attributes are specified is dictated by how much use is made of standardised components.

    Initial Communication Method

    While unicast is the most efficient initial communication method, because it only targets specific entities, its drawback is the need for prior knowledge of the entity address. However, the amount of prior knowledge required can be reduced by initially sending multicast UDP messages, from which entities can determine unicast addresses, and then switching from multicast to unicast. Less prior knowledge is then required, and the unicast addresses may be stored for future reference. Alternatively, broadcast may be used in single-hop networks by binding the discovery protocol to the underlying network protocol interface; the disadvantage of this method is that participation is limited to entities with that underlying network communication protocol interface.

    Discovery and Registration

    In the announcement-based approach, clients, surrogates and registration directories listen on a communication channel; when a service announcement is made, a client might learn that a service is available and a directory might register the service's availability. In the query-based approach an entity may receive an immediate response to a specific query; each query is replied to separately.

    Service Discovery Infrastructure

    In the directory-based infrastructure model, dedicated infrastructure components are used: the directory component maintains service information and processes queries and announcements. A directory-based infrastructure can have a flat structure, in which all directories have a peer-to-peer relationship, or a hierarchical structure, in which a directory only communicates with directories that match certain criteria. The non-directory-based infrastructure model has no dedicated infrastructure components; here all services process a query, and reply if the service matches the query. A client may record information from a service announcement for future use.

    Service Information State

    The status of a service can be maintained as soft or hard state. Using soft state, the lifespan of a service is governed by the lease expiration time contained in the service announcement message. Prior to lease expiration, a client or directory may poll the service for validity, or the service may reannounce itself or renew the current lease; otherwise the service expires and is removed from the directory entries of systems using the directory-based infrastructure model. Hard state is used in systems using the non-directory-based infrastructure model; this requires periodic polling of services by the client and directories to update service information.
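    The soft-state mechanism can be sketched with an injected clock: entries that are not renewed before their lease expires are silently dropped from the directory. This is a simplified sketch with assumed names, not any particular protocol's implementation.

```python
class SoftStateDirectory:
    """Directory whose entries expire unless the service renews its lease."""
    def __init__(self, clock):
        self.clock = clock           # injected time source, e.g. time.monotonic
        self._leases = {}            # service name -> expiry time

    def announce(self, name, lease_seconds):
        # An announcement (or re-announcement) sets a fresh expiry time.
        self._leases[name] = self.clock() + lease_seconds

    def active_services(self):
        # Expired services are dropped; no explicit deregistration needed.
        now = self.clock()
        self._leases = {n: t for n, t in self._leases.items() if t > now}
        return sorted(self._leases)

# Simulated timeline using a fake clock.
now = [0.0]
d = SoftStateDirectory(lambda: now[0])
d.announce("transcode", lease_seconds=30)
now[0] = 20.0
d.announce("transcode", lease_seconds=30)   # renewal before expiry
now[0] = 45.0
alive = d.active_services()                 # renewed lease runs until t=50
now[0] = 60.0
gone = d.active_services()                  # lease lapsed, entry removed
```

    The attraction of soft state in a pervasive environment is visible here: a surrogate that disappears without deregistering simply ages out of the directory.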

    Discovery Scope

    Defining discovery scope reduces unnecessary computation on clients, surrogates and directories. Using scope criteria based on network topologies, user roles and context information, or combinations thereof, aids correct definition of the target scope. When including the network topology in the discovery scope criteria, LAN or single-hop wireless network range may be used; here an implicit assumption is that clients, surrogates and directories belong to the same administrative domain. Opting for user-role criteria allows the user to control the target domain; however, this entails the user having prior knowledge of the domain and domain authentication information. Using high-level context information such as temporal, spatial and user-activity criteria is still uncommon; however, including it in the scope criteria lends added granularity to the discovery scope.

    Service Selection

    While discovery scope criteria may limit the number of service matches, a discovery result may still contain several matched services. In such cases the choice of service can be either manual or automatic. Manual selection gives the user complete control of service selection; however, this option assumes the user has the knowledge required to make the optimal choice. Alternatively, automatic selection requires little or no user input. An example of automated service selection is MatchMaking with ClassAds using Condor.
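    Automatic selection can be sketched as constraint matching over advertised attributes, loosely in the spirit of ClassAds matchmaking. This is a toy sketch with invented attribute names, not Condor's actual ClassAds semantics.

```python
def matchmake(requirements, ads):
    """Return the ads satisfying every requirement, best (highest rank) first."""
    def satisfies(ad):
        return all(pred(ad.get(key)) for key, pred in requirements.items())
    candidates = [ad for ad in ads if satisfies(ad)]
    return sorted(candidates, key=lambda ad: ad.get("rank", 0), reverse=True)

# Hypothetical surrogate advertisements and a client's requirements.
ads = [
    {"name": "s1", "memory_mb": 512,  "cores": 2, "rank": 1},
    {"name": "s2", "memory_mb": 2048, "cores": 4, "rank": 3},
    {"name": "s3", "memory_mb": 4096, "cores": 1, "rank": 2},
]
requirements = {"memory_mb": lambda v: v >= 1024, "cores": lambda v: v >= 2}
best = matchmake(requirements, ads)[0]["name"]
```

    The requirements act as hard constraints and the rank as a soft preference, which mirrors the constraint-plus-rank split that makes fully automatic selection workable without user input.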


    Service Invocation

    Once a service has been selected, a client invokes the service, depending on the discovery system used. Service invocation has three facets of information: service location, an underlying communication mechanism, and operations specific to an application. The first level of service invocation is the network address, which provides service location only; here the application is then responsible for defining communication and operations. The second level, in addition to service location, defines the underlying communication mechanism, normally using Remote Procedure Calls (RPCs) and variations thereof. The third level adds, to service location and communication mechanism, a definition of the application operations that are specific to an application domain.

    Service Usage

    Once service usage is granted to a client, the client may explicitly release the resources of a discovered and selected service. An alternative is the lease-based method, in which the client and service negotiate a usage period as a renewable lease time; when the lease expires, the resources are reclaimed and the lease information is deleted. For pervasive compute environments with dynamic service resource availability, the lease-based method is more suitable than explicit release by the client, because a failed client could otherwise block the service's resources.

    Service Status Inquiry

    A client can monitor a service's state either by periodically polling the service or by using service event notification. Service event notification requires the client to register with the service, which then notifies the client if an event of interest occurs. Depending on the system, whichever method requires the least frequent communication should be chosen.

    Discovery Discussion

    Spectra [Flinn02a], Chroma - Tactics [Balan03a] and Vivendi [Balan07a] are examples of RPC-based approaches, in which client applications are partitioned into locally executable code and remotely executable services. The services are then pre-installed on surrogates and may be invoked using RPCs. Data exchange between client and surrogate(s) is via the system-specific Coda file system, which requires the aforementioned pre-installed remote services. This limits these systems to pre-configured pervasive compute environments with specific application support. An advantage of surrogate discovery in a pre-installed environment is low overhead when a surrogate is discovered, as there is no need to determine whether the service is installed.

    The VM approaches of Slingshot [Su05a], AIDE [Messer02a] and Goyal [Goyal04a] add flexibility: the user has control over the environment and may install anything on a surrogate without interfering with other systems. The replication approach taken by Slingshot [Su05a] entails a proxy running on the client application device that broadcasts each service request to all replicas; the first received response is then passed to the application. T