towards adaptive actors for scalable iot applications at ... · complex distributed systems require...

© 2018 by the authors; licensee RonPub, Lubeck, Germany. This article is an open access article distributed under the terms and conditions ofthe Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

Open Access

Open Journal of Internet of Things (OJIOT)Volume 4, Issue 1, 2018

http://www.ronpub.com/ojiotISSN 2364-7108

Towards Adaptive Actors forScalable IoT Applications at the EdgeJonathan FurstA, Mauricio Fadel ArgerichA, Kaifei ChenB, Erno KovacsA

A NEC Labs Europe, Kurfursten-Anlage 36, 69115 Heidelberg, Germany,{jonathan.fuerst, mauricio.fadel, ernoe.kovacs}@neclab.eu

B Computer Science Division, UC Berkeley, 387 Soda Hall, Berkeley, USA, [email protected]

ABSTRACT

Traditional device-cloud architectures are not scalable to the size of future IoT deployments. While edge andfog-computing principles seem like a tangible solution, they increase the programming effort of IoT systems, donot provide the same elasticity guarantees as the cloud and are of much greater hardware heterogeneity. FutureIoT applications will be highly distributed and place their computational tasks on any combination of end-devices(sensor nodes, smartphones, drones), edge and cloud resources in order to achieve their application goals. Thesecomplex distributed systems require a programming model that allows developers to implement their applications ina simple way (i.e., focus on the application logic) and an execution framework that runs these applications resilientlywith a high resource efficiency, while maximizing application utility. Towards such distributed execution runtime,we propose Nandu, an actor based system that adapts and migrates tasks dynamically using developer providedhints as seed information. Nandu allows developers to focus on sequential application logic and transforms theirapplication into distributed, adaptive actors. The resulting actors support fine-grained entry points for the executionenvironment. These entry points allow local schedulers to adapt actors seamlessly to the current context, whileoptimizing the overall application utility according to developer provided requirements.

TYPE OF PAPER AND KEYWORDS

Regular research paper: IoT, framework, programming model, edge-computing, adaptation

1 INTRODUCTION

Internet of Things (IoT) applications arrive witha paradigm shift compared to traditional Internetapplications: huge amounts of data are now generatedat the edge of the network and flow towards the cloud tobe processed and potentially combined with other datasources. While the first generation of IoT applications

This paper is accepted at the International Workshop on VeryLarge Internet of Things (VLIoT 2018) in conjunction with theVLDB 2018 Conference in Rio de Janeiro, Brazil. The proceedingsof VLIoT@VLDB 2018 are published in the Open Journal ofInternet of Things (OJIOT) as special issue.

often followed a traditional client-server architecture,with thin, low complexity clients and high-performancecloud computing, recent applications are characterizedby richer clients (smartphones, drones, cars) that uselocal sensor data as input for—possibly machine learningbased—data processing to support timely and localdecision making (e.g., UAVs) or user interaction—e.g.,for augmented reality (AR) applications [12].

The requirement of a timely feedback loop conflictswith common approaches of data processing in the clouddue to the long round trip times, and is not reliable ina wireless setting. Further, advanced sensor modalities

70

http://creativecommons.org/licenses/by/4.0/

http://www.ronpub.com/ojiot

J. Furst, M. F. Argerich, K. Chen, E. Kovacs: Towards Adaptive Actors for Scalable IoT Applications at the Edge

like video or 3D data have high bandwidth requirementsthat can quickly congest (wireless) network links andbecome non-scalable in a cloud-centric solution fora wide deployment like envisioned for future smartcities [3].

Edge or Fog computing [37, 8] are principles thatbring cloud resources closer to the user, which cangreatly improve responsiveness, resilience, bandwidthusage and thereby the scalability of such applications.However, edge computing adds more burdens onIoT developers: programming distributed, dynamicand heterogeneous device-edge-cloud systems requirescomplex decision making on (1) application partitioninginto tasks, (2) task allocation and (3) task adaptation.Application partitioning, meaning the structuring ofapplication logic into components like packages, classesor functions, is a daily task for developers and isstatic during runtime (i.e., it needs to be done duringimplementation).

Task allocation and adaptation however, dependhighly on the current context during execution time,often unknown to the programmer. As an example, anAR application running on a smartphone must place andadapt tasks differently when in proximity to an edgenode , opposed to in a traditional smartphone-cloudenvironment. It also needs to adapt its behavior tochanges in the network link (e.g., 3G vs. WiFi, networkpartition or downtime) and battery level for mobiledevices. In all allocation and adaptation scenarios, theapplication must adhere to its individual requirementsand constraints, such as responsiveness, accuracy, cost,energy consumption, CPU use, and storage.

In this work, we argue that these deployment andadaptation decisions should not be left solely to IoTdevelopers , but happen in interplay with a distributedexecution framework: Firstly, the complexity of thesedecisions moves the developer’s focus away from theapplication logic that contributes to the business goals.Secondly, the number of combinations of differentcontext factors and their implications are too large andpossibly not fully known during implementation time.For example, network round-trip times have a highvariance in a wireless and mobile scenarios, which iscommon for IoT applications [5]. Thirdly, programmingdistributed systems is hard and requires skills beyond theones of many application developers.

We begin this work with the assumption that IoTprogrammers should focus on (monolithic) applicationlogic, while the framework transforms this logic intoadaptive, distributed components (see Figure 1). Whilethere has been previous work with similar intentions,notably Sapphire [49] and Ray [35], they are targetingslightly different problems. Sapphire focuses on aframework for mobile-cloud systems. Ray focuses on

highly parallelized data analytics and machine learning.Both assume homogeneous nodes, and in the case ofRay, in-data center computation. We envision that futureIoT applications will span over a heterogeneous set ofhardware platforms, from physical IoT devices overclose-by edge-nodes to traditional cloud computing, andthat such setting needs strong framework support.

Towards these goals, we propose Nandu, a first designand prototype of a distributed execution frameworkfor IoT applications using adaptive actors [26], whichenables highly dynamic, autonomous and edge-centricapplication partitioning and task scheduling. Nanduprovides:

• Transformation of regular, sequential coding styleprograms into distributed actors.

• A programming model that allows developers toeasily specify application wide Quality of Service(QoS) requirements and function-level developerhints, consisting of adaptable function parameters(or even multiple function implementations) andtheir expected utility towards the overall applicationgoal.

• Dynamic actor adaptation that uses thesedeveloper hints and then optimizes actoradaptation throughout the lifetime of an applicationusing strategies of (1) actor migration, (2) datadegradation and (3) actor degradation.

In contrast to previous work [32, 49], Nanduconsiders different adaptation strategies to best conformto developer provided non-functional Quality of Service(QoS) requirements (e.g., latency, energy, memory use)under a changing execution context. Instead of onlyconsidering device-(edge)-cloud architectures, Nandu’sworldview is informed by distributed nodes that mayexchange tasks in a P2P fashion. The main contributionsof this paper are:

• We derive the requirement of framework supportedtask adaptation for IoT applications from anexemplary use case (Section 2).

• A survey of popular actor systems, consideringcommon IoT requirements (Section 3).

• The design of Nandu, an IoT tailored actor baseddistributed execution platform (Section 4).

• A prototype implementation and evaluation thatshow the advantages of Nandu (Section 5).

71

Open Journal of Internet of Things (OJIOT), Volume 4, Issue 1, 2018

F5F4

F1

F2

F3

C5 C4

C1

C2

C3

Distributed Adaptive ComponentsStatic Partitioning

into Functions

Figure 1: From monolithic application logic to distributed adaptive components

2 MOTIVATION

Worldwide disasters have affected 1.7 billion people andkilled 0.7 million between 2005 and 2014 accordingto numbers by the UN [43]. Information andcommunication technology (ICT) has been applied inacademia and industry to help reduce these numbers.For example, by providing resilient ad-hoc networkinfrastructure for disaster response [23] or detectingnatural disasters, like eruptions of a volcano bydeploying wireless sensor nodes [45]. In the event ofa disaster, drones are used to locate people in needfor help or directly provide support, e.g., in the formof dropping defibrillator devices [15]. We consider asimilar application scenario to motivate our work.

2.1 Application Scenario

Semi-autonomous drones, equipped with a monocularand an infrared camera are used by rescue teams to locatepeople in a difficult to access area in the aftermath of anatural disaster (see Figure 2).

This scenario might require multiple, concurrentrunning tasks:

T1 (Visual) Simultaneous Localization and Mapping.(V) SLAM based drone navigation requires near real-time sparse point cloud construction and vision-basedlocalization [21].

T2 3D Model Reconstruction and SemanticUnderstanding. The 3D model and semanticunderstanding (e.g., road detection) is needed to helpfirst responders to navigate to the disaster survivors.Pre-disaster infrastructure might not be accessibleanymore (e.g., due to earthquake, flooding).1

T3 Infrared based Detection. Use thermal infraredcamera to detect humans and animals.

1 After the Haiti earthquake in 2010, volunteers used recentsatellite data to manually (re-) create accurate mapping onOpenStreetMap (see https://wiki.openstreetmap.org/wiki/WikiProject_Haiti).

T4 People Detection. Detect people using computervision algorithms on pictures taken with the drone’svisual-optical camera.

Developing such systems is hard because of thediverse task requirements and dynamically changingnetworking and energy state:

Placement. T1 needs to run on the drone for real-timelocation feedback. T1, T2, and T3 can either run on thedrone or be offloaded to the edge or cloud, because theyare not as time critical as T1. When scheduling tasks, T4should be a high priority task that happens only when T3detects objects, while T2 is a continuous but low prioritytask.

Latency. Each task has different latency requirements.While T2 can be postponed until when resources areavailable, T1 is mandatory for a correct drone navigationand needs to be executed near real-time. To findsurvivors in a timely manner [15], T3 and T4 also requirehigh responsiveness.

Accuracy. T1 requires high accuracy to avoid droneaccidents, while the people detection performed in T3and T4 can trade accuracy for a quicker response time.

Energy. Compared to the edge and cloud, the drone isrestricted by its limited battery. It must find a good tradeoff for energy consumption between local computationand offloading.

Network Connection. Since wireless signal can beimpacted by various factors (e.g., interference, distance),the system should constantly monitor the networkconditions and adjust accordingly.

Ultimately, to work well, such distributed systemsneed to adapt and allocate concurrently running tasksdynamically during runtime. We derive the followingkey design goals.

72

https://wiki.openstreetmap.org/wiki/WikiProject_Haiti

https://wiki.openstreetmap.org/wiki/WikiProject_Haiti


The Cloud

WiFi, TV White Space…

Fiber LTE Car processing unit, Laptop,Smartphone, Smart Glassesor any combination of these.

Edge Server

Camera3G

DisasterVictim

Car Edge

Figure 2: Drone supported disaster management. (Drones are used to timely detect disaster survivors, providesupport, and notify first responders. Mobile network infrastructure might be damaged or overloaded.)

2.2 Key Design Goals

(1) Dynamic discovery mechanism. Mobile IoTdevices like drones or smartphones need to discoverclose-by available computational resources in orderto distribute their tasks. E.g., multiple dronesmight distribute tasks among them according to theircapabilities.

(2) Abstract execution mechanism away fromapplication logic. To simplify development,programmers should be able to implement theirapplication in a sequential, non-distributed way, as weenvision most of them to be non-experts in distributedprogramming. Programmers are only required to makeannotations (e.g., can this function be offloaded or not?)to their functions to transform the application into adistributed one.

(3) Dynamic adaptation. The runtime needs todynamically adapt and migrate application tasks,depending on the current execution context. Adaptationmechanisms might require hints for the runtime in theform of annotations from the programmer that allow theapplication to dynamically adjust function parametersor choose different task implementations (e.g., slowand accurate or fast and less accurate people detectionalgorithm) to maximize the utility under the currentcontext (e.g., node computational resources, node load,networking, battery level) and according to applicationrequirements (e.g., latency).

3 ACTOR MODEL

In the actor model, actors are treated as the universalprimitive of computation. They can communicate witheach other only through message passing. When an actorreceives a message it can perform some local processing,create new actors and send itself messages [26]. Thissimple model provides strong guarantees for highlyconcurrent systems and has been successfully applied towidely used distributed systems.2 As such it providesa good foundation for a prospective execution modelfor highly distributed systems of IoT. We now brieflysurvey existing actor systems and their applicability forIoT systems (Section 3.1).

Erlang [22] is a functional programming languagethat follows the actor model. Actors need to beexplicitly created at a specified location (i.e., either onthe caller’s server or on a remote server). Processes(actors) communicate with each other via asynchronousmessage passing both locally and remotely. Erlang doesnot support process migration. In traditional Erlang,scaling is constrained by the fact that processes needto be created explicitly and that connections are sharedbetween all nodes, requiring data structures quadraticin the number of nodes at every machine. Chechina etal. [11] propose the scalable distributed Erlang library

2 E.g., Spark relies on the actor model for distributed computing.Microsoft is scaling their actor based Orleans system up to millionsof actors in a data center [7].

73


to overcome these bottlenecks by dividing nodes intosmaller groups and providing a more implicit processplacement.

Akka [1] is a popular actor framework on top of theJVM written in Scala. Since version 2.10.0, Scala usesAkka as the default actor library [27]. Like Erlang,Akka adopts the “Let it crash” model for resilience.Akka cluster provides a cluster membership servicethat relies on a Gossip protocol similar to Amazon’sdistributed key-value store Dynamo [19]. Akka clusteruses heartbeats in a 1 s interval and the Phi AccrualFailure Detector [25] to calculate the probability for anode being unreachable. Programmers can tweak failuredetection by setting a threshold value. Lightbend hassuccessfully run an Akka cluster with 2400 nodes onGoogle Compute Engine [36].

Orleans [7] adds in contrast to Erlang and Akka theabstraction of virtual actors (grains). The virtual actorabstraction removes the need for the programmer toexplicitly create or destroy actors. Actors always existvirtually and their instances are dynamically createdby the Orleans runtime on an available server when amessage is passed to an actor. Likewise, actors aretorn down when they are no longer needed. The virtualto physical actor mappings are stored in a one-hopdistributed hash table (DHT) and a large local cache onevery server to avoid the otherwise additional networkhop for each message that is sent. Actors can bestateless with potentially many instances or restricted toa single activation. Stateless actors allow Orleans toautomatically scale out hot actors by creating multipleinstances.

3.1 Problems with Existing Actor Systems

Existing actor systems provide a simple model forabstracting concurrent computation. However, theyare generally intended for in data center—or even forsingle machine—computation. Only recently, thereare some efforts to extend them to geo-distributeddata centers [6]. Actor based systems assume(mostly) reliable and uniform network communication.Last, actor programming frameworks require thatprogrammers adjust their programming style to actorprogramming, a programming model unfamiliar to manydevelopers.

Opposed to in data center computation, IoT systemsoften rely on wireless communication, which can beunreliable and not timely. Further, IoT and edgeplatforms are heterogeneous and are more limited in theirresources compared to a server in a data center.

Actors of IoT systems need to adapt themselvesaccording to the current context (network andcomputational resources). Part of this adaptation

Nandu Application Library

F1 F2 F3

Function Calls

A1 A2 A3

Local and Remote Calls

Nandu Node

Function and Data Cache Scheduler Discovery

(1) F

ront

end

(2) B

acke

nd

Figure 3: Nandu architecture overview. (Regularfunctions are intercepted by the Nandu applicationlibrary and then scheduled as Nandu actors by a localscheduler, either locally or on a remote node.)

should be strategies such as data or task degradation.Further, we argue that because of the potentially widegroup of IoT developers, to gain traction, a goodframework should enable developers to use commonsequential programming style and not expose themdirectly to actor programming.

In the following, we port the actor model to edge-centric IoT systems by introducing the concept ofadaptive actors.

4 TOWARDS ADAPTIVE ACTORS FORSCALABLE IOT APPLICATIONS

We now describe our design of Nandu, an actor baseddistributed execution framework and programmingmodel, tailored for highly dynamic, autonomous andedge-centric IoT applications.

4.1 System Overview

Figure 3 depicts the overall proposed architecture ofNandu. Conceptually, we divide it into two parts (1) theapplication library (frontend) and (2) the node (backend),which are only loosely coupled. The main primitive ofNandu is a node. Each Nandu node can host a variablenumber of actors. Actors can communicate with otheractors on the same node and with actors on remote nodes.

This communication is provided by the hosting node.Nodes communicate with each other through RemoteProcedure Calls (RPC). Inspired by Orleans [7] model

74


of virtual actors, our model abstracts the physical actorlocation away by using a distributed hashing table (DHT)to keep track of current node-actor mappings. This willallow actors to potentially migrate during the runtime ofan application. If a function depends on local device I/O(e.g., camera stream, or an actuator), the correspondingactor is bound to the device as well.

To use Nandu, developers only need to import ourlibrary and then implement their application logic ina sequential, non-distributed way. They then canannotate (a subset of) functions to enable a conversionof these functions to adaptable actors. Practically, weachieve this transformation to actors by creating a Nanduobject for each annotated function that contains theserialized function logic, the function input parametersand adaptation hints. Actors are next scheduled byNandu either locally or on a remote node. The schedulerperforms this decision based on the hints provided bythe programmer, profiling of previous executions of thesame function and based on its current state of availableNandu nodes (e.g., is there a node close-by with freeresources). We plan to use a distributed hashing tableto propagate the same state across distributed nodes.

We assume annotated functions to be stateless. In caseof a node failure or network partition, we simply re-create the respective actor. For time-critical functionsthat require high reliability, Nandu can execute afunction on multiple nodes and only take the firstavailable result. Nandu uses a cache in form of an in-memory KV-Store to cache actors and input data.

4.2 Programming Model

Our programming model allows existing programsto be swiftly transformed into distributed, adaptiveactors, while only exposing developers to a traditional,sequential coding style.

We achieve the transformation of ordinary functionsto actors by enabling developers to annotate functionsusing the decorator design pattern, which provides aninterception point to the Nandu application library.Annotated functions need to be self-contained, i.e., theyneed to contain necessary dependencies (e.g., necessaryimport statements) and not refer to variables outsidethe function scope. When an annotated function isinvoked, instead of executing it as usual, we return apromise [31], that then serves as input for subsequentfunction invocations and thus implicitly enables us tolink the output of one actor to the input of another.Promise constructs (also known as futures) are includedin standard libraries of many major programminglanguages (e.g., C++. Java, Python). This makes ourapproach applicable to a wide range of implementationand systems. The function logic itself is scheduled

1 from nandu import adapt2 @adapt(offload = True,3 size = {640: 10, 480: 9, 320: 6, 240: 3})4 def resize_image(img, img_id, size = 640):5 import imutils6 return(imutils.resize(7 img, width=min(size, img.shape[1])),8 img_id)

Listing 1: Nandu programming model. (Developerscan annotate functions with hints for the runtime onwhat parameters to adapt dynamically. In this example,the image size is adapted and the function is potentiallyoffloaded dynamically.)

locally or remotely and is executed when its inputbecomes available (i.e., the output of a previous functionhas been computed and thus the promise becomesavailable).

To enable actor adaptations, function decorators allowdevelopers also to specify function utility and adaptationparameters as hints to the Nandu scheduler. Asan example, Listing 1 shows a code snippet usingour Nandu prototype implemented in Python. Theoffload=True parameter explicitly allows the Nanduscheduler to place the actor remotely. The sizedictionary parameter represents a selection of possiblesize parameter values together with their expectedutility towards the application goal (i.e., greater imagesizes will provide better accuracy). Nandu uses suchhints to support its scheduling and adaptation decisionsas we describe now.

4.2.1 Developer Hints

Developers assign adaptation hints for the overallapplication as well as for different functions.

Overall application hints. These hints represent non-functional requirements that should be achieved by theapplication or application components, e.g., “overallpeople detection pipeline should occur in ≤ 1 s”. Assuch, they must define a measurable proxy that capturesa requirement well.

Individual function hints. These hints indicate thata function is adaptable, which of its parameters canbe adapted, and what the expected utility of differentadaption values is. Utility represents the impactof different parameter values towards achievement ofthe overall application requirements, which cannot bemeasured by the system itself. For example, dependingon the function logic and the specific parameter, this

75


Table 1: Execution context and possible adaptations

Task Execution Context Possible Adaptation

Search a suspect in a crowd usingsurveillance cameras and facerecognition.

Many people in camera view.Face-recognition slow, latencyoutside requirements.

Degrade parameters of classifier toperform faster.

Predict maintenance of CNCmachine in a factory in nearreal-time.

Required AI accelerator (GPU ordedicated ASIC) not available onedge.

Migrate to cloud-node equippedwith AI accelerator.

Estimate room occupancy throughwireless connected CO2 sensors todetermine HVAC set-points.

Poor network bandwidth, it is notpossible to send all the data in realtime.

Reduce sampling rate or aggregatedata within an interval, beforesending it, to reduce bandwidthusage.

utility value can represent the accuracy of a predictivemodel; any Quality of Experience (QoE) measure,such as the user-perceived quality of different mediacompression levels; or any other performance measure.For the application scenario from Section 2, thedeveloper might for example specify different classifiersand assign them utility values according to their expectedaccuracy results. Application hints together withindividual function hints and online actor profiling allowus then to optimise the overall application utility throughactor adaptation.

4.3 Adaptation

To capture different IoT execution contexts, Nandu usesthree adaptation strategies: (1) actor migration, whereactors are migrated to more powerful nodes, (2) datadegradation, where actor input is degraded in order toadapt the application to changing network conditionsor available resources and (3) actor degradation, wherea different actor implementation is selected (e.g., adifferent classifier).

Any of these strategies might be a viable way toadapt a distributed application to the current executioncontext: Migration is intended for when a stable networklink is available and data movement between actorsis small; data degradation helps to mitigate varyingnetwork bandwidths, either during a single deployment(e.g., network bandwidth changes during the lifetimeof an application) or for different deployment scenarios(e.g., the same application is deployed to wireless andwired devices); actor degradation is able to adapt a taskto available computing resources, e.g., when additionalload occurs or when a network link to a higher powernode becomes unavailable and execution needs nowoccur on a lower power node (e.g., the drone from the

scenario described in Section 2). Table 1 gives furtherexamples of applications and their execution contextdependent adaptations.

Nandu selects and executes these adaptation strategiesby combining developer provided adaptation hints withonline actor profiling, while using overall non-functionalapplication requirements as a target for optimization(e.g., latency).

4.3.1 Optimization

Nandu optimizes adaptation by combining the developerhints introduced in Section 4.2.1 with online profilingto establish cost and utility of each actor execution.Nandu assigns a utility value for each actor execution.This utility value represents how well the actor performsaccording to non-functional application requirementsthat cannot be measured: accuracy, QoE or otherperformance measures. In addition, a cost valuerepresents how much effort is involved in the taskexecution. This cost is monitored by Nandu throughonline profiling. In this work we focus on latency costs,however the same model can also be applied to othermeasurable cost factors depending on application wideQoS requirements: e.g. bandwidth, CPU or energy use.

The Nandu Optimizer is in charge of finding thebest adaptation strategy, using the previously definedadaptation mechanisms (function annotations) to deliverthe highest possible utility with the defined costconstraints in the current context (e.g. current availableCPU or network bandwidth).

Finding the best strategy is a complex task dueto the large number of combinations of adaptationmechanisms. To solve this problem, without adding asubstantial overhead to the overall system, we currentlyuse a simple heuristic value selection algorithm for

76


Algorithm 1 Simplified parameter selection logicoldParams = {...} . previous function parameterswindowSize = 20 . tunable velocity parameter to fit applicationi = 0

function DECORATOR(oldParams)if MMEAN(ObservedCosts, windowSize) > Constraint then

params = DEGRADE(oldParams); i = 0else

if i >windowSize thenparams = UPGRADE(oldParams); i = 0

elseparams = oldParams; i++

end ifend if

end function

dynamic parameters (depicted in Algorithm 1). When anactor receives input and is executed, we monitor and saveits execution time and input parameters. We then usethese statistics to estimate the cost for future executionsusing a given set of parameter values. For example, inthe scenario from Section 2, the cost is the time that ittakes to process each image, from the moment the imageis captured to the moment when the result of the peopledetection step is obtained.

When an actor is executing, Nandu chooses theparameter values that it expects will yield the bestcost-utility trade-off (e.g., latency-accuracy). Selectinga parameter value not only affects the execution ofthe actor itself, but also the execution of subsequentactors. Our model thus estimates the cost for the wholeexecution pipeline. Practically, we estimate this cost bycalculating the moving mean of the last K executions. Ifno profiling data is present for a given set of parametervalues, 0 is returned to encourage the exploration ofunknown sets of values. The window size is a parameterfor our optimization model that can be configured tomeet the expected velocity of the application (i.e. howswiftly Nandu should adapt to changing execution costs).More formally, the estimated cost is calculated as:

N∑f=0

(1

WS

WS∑k=0

cf(params=P )) (1)

Where N is the overall number of actors for theprocessing job, WS is the window size for which wecalculate the moving mean (in our case WS = 20)and cf(params=P ) is the cost of running function f withparameters P .

As described in Section 4.2, developers provide hintsin form of parameter-value pairs to Nandu. During

optimization, these hints are used to maximize theoverall application utility. For instance, in our example,we specify different image resolutions and how theyaffect application accuracy. For the initial executionof an actor, we sort parameter values by descendingaccuracy, and start profiling the cost for each value inthis order during runtime. In our example, Nandu willfirst select the image size that yields the highest accuracybecause no profile data exists yet. During the nextiteration, Nandu estimates the cost by using its profileddata; if this profiled cost (i.e., the latency in our example)is less than the cost target, Nandu continues to use thisparameter value, otherwise it selects the next best valueto lower the cost. This process continues until the costtarget is met with the highest possible accuracy.

5 IMPLEMENTATION AND EVALUATION

We have implemented a prototype of Nandu in Python3.6 to show that our programming model provides asimple abstraction for distributed IoT programming andthat our adaptation strategies can support applicationrequirements under dynamic conditions. Because wefocus our evaluation on the benefit of our programmingmodel and the benefit of dynamic adaptation, wecurrently use static host addresses instead of a DHTbased and fully decentralized system. We use Pythondecorators to allow developers to specify adaptationhints and Python’s future statement as placeholder fora function output. This allows us to schedule thedifferent actors before their final input is available asdescribed further in Section 4.2. To serialize functionlogic we use cloudpickle [16], which allows us toserialize Python objects not supported by pickle modulein the Python standard library. Cloudpickle has beenapplied previously to Spark based big data processing

77


Capture Image

Resize Image

Detect People

ApplicationPipeline

Figure 4: Simplified people detection pipeline

applications [47]. We have build an RPC server andclient using the Python’s asyncio module that runs oneach Nandu node and abstracts message passing andserialization away from the Nandu application library.When a node receives a new message, we de-serializeit, and pass the resulting Nandu object to the scheduler,which decides on its execution.

We also implemented a simplified version ofthe Computer Vision-based people detection exampleintroduced in Section 2 using OpenCV that consists oftwo devices: (a) the drone, which goal is to detect peoplein a stream of images in collaboration with (b) an edgedevice (see Figure 4).

The image processing pipeline is divided in three mainsteps: (1) capture the image, (2) resize the image, and (3)detect people in the image. Each step is performed by adifferent function; (1) and (2) are run on the drone while(3) might be migrated to the edge component. We use thepre-trained HOG + LinearSVM model from OpenCV [9]to implement the people detection step.

5.1 Experimental Setup

We use mininet [34], a simple network emulator, tosimulate different network environments. Despite itssimplicity, researchers have been able to reproducethe results of more than 40 networking papers withmininet [46]. Mininet does not explicitly simulatea wireless network setting, however our wirelessapplication scenario is just an example of a changingnetwork (e.g., recently, Zhang et al. [48] have shownwide bandwidth and RTT variances for Internet scaleapplications). This is why, without loss of generality,we only simulate different bandwidths to show howNandu can adapt under different network conditions.As for dataset, we use a subset of the INRIA PersonDataset [18] with 410 images. The subset was createdby selecting all the images with a size larger than 700 kBfrom the positive samples and images larger than 600 kBfrom the negative samples. The different thresholdswere chosen to have a fairer split between positive andnegative samples for our use case.

5.2 Latency-Accuracy Trade-offs

Using mininet, we simulate different network speeds:5, 10, 20, 50 and 100Mbps. To understand thelatency accuracy trade-off, and to establish a baseline forNandu’s adaptive optimization, we then run experimentsfor static wireless link-image size pairs. Figure 5 depictsour measured latency values. As expected, latency isdetermined by two variables: (1) network link speed and(2) image sizes.

Moving on to the quality of the received results,Table 2 shows what accuracy we achieve with differentimage sizes and the HOG + LinearSVM classifier. Forour application, we are only interested in detectingpeople, and not in the exact number of people.Therefore, we also classify results where the number ofpeople is not exact as correct. From our results, we seethat accuracy is relatively stable across image sizes andbetween 83 and 91%.

Our experiments reveal two important insights:(1) applications can gain accuracy and retain latencyrequirements by adapting to different networkingconditions, and (2) developers implementing theirapplication without Nandu would need to manually pickan—application requirements dependent—adequateimage resize parameter that is not optimal for allexecution contexts or implement custom optimizationstrategies. With these insights, we now evaluate Nandu’sperformance for changing execution contexts.

5.3 Input Degradation

For evaluating Nandu’s input degradation strategy, weannotate the resize function of our simple peopledetection pipeline with Nandu developer hints as shownin Listing 1, adding only one additional line of code. Wefurther set the overall QoS requirement to 1 s latency.Then we run the application again for the same networkspeeds as in Section 5.2.

As we can see in Figure 6, the size, to whichimages were resized by Nandu, varies according tothe available network bandwidth: as bandwidth grows,Nandu uses higher resolution images in order to keep thehighest accuracy possible while not exceeding the targetlatency requirement of the overall application. Notethat, because we are using a discrete set of parametervalues as input to Nandu, the optimizer is not able topick an optimal image size value for some networkspeeds. E.g., Nandu selects both, 480 px and 640 px for100Mbps, because perceived latencies of 480 px are 1 s,while latencies of 640 px are slightly larger than 1 s—anoptimal value would be somewhere in between.

Figure 7 depicts the distribution of the completiontimes of our runs. Even with the sub-optimal, discrete

78


05Mbps 10Mbps 20Mbps 50Mbps 100MbpsWireless Link

0

1

2

3

4

Late

ncy

(s)

Image Width (px)240 320 480 640

Figure 5: Network link speeds and experienced latency. (Different network speeds result in highly differentlatency results. No single image width works for all network speeds if application requirements should be met.)

05

0b

ps

10

0b

ps

20

0b

ps

50

0b

ps

10

00

bp

s

WLreless LLnN

0

50

100

150

200

250

300

350

400

450

1u

mb

er

oI

Lmag

es

Image WLdWh (px)240 320 480 640

Figure 6: Selected image sizes for application runs. (Nandu selects the best possible size among the discrete setof supplied parameter values in developer hints according to application requirements and execution context.)

0.0 0.5 1.0 1.5 2.0 2.5

Completion Time (s)

0

50

100

150

200

250

300

350

Nu

mb

er

of

imag

es

Wireless Link5Mbps 10Mbps 20Mbps 50Mbps 100Mbps

Figure 7: Stacked completion times of application runs. (The majority of images is processed in≤ 1 s accordingto application requirements specified by the developer)

79


Table 2: Accuracy for different image sizes. (Largerimage sizes result in higher accuracy.)

Image Size (px) 240 320 480 640

Accuracy (%) 86.1 88.3 88.8 90.73

Table 3: Mean latencies and accuracy results fordifferent network speeds

Wireless Link (Mbps) 5 10 20 50 100

Mean Time (s) 0.82 0.76 0.72 0.69 0.85

Accuracy (%) 87.3 88.8 86.8 88.8 90.0

Table 4: Accuracy and proportional processing timeof HOG descriptor with different scale values using480px images

Classifier accuracy Low Medium High

Scale 1.30 1.15 1.05

Accuracy (%) 80.7 85.7 88.8

Proportional proc. time (%) 100 145 351

size selection, the great majority of images wereprocessed in ≤ 1 s. This is in strong contrast to ourresults in Figure 5, where some size values result inlatencies above 3 s—a three-fold violation of applicationlatency requirements.

Further, Table 3 shows that our mean accuracy overall bandwidth speeds is 88.34%, which is higher thanwhat we might achieve with a static parameter selection(e.g., Table 2 shows that picking an image size of230 px, results in an accuracy of only 83.6%). Italso shows the accuracy increases with higher availablebandwidth, because it can transmit more images withhigher resolution.

5.4 Actor Degradation

Next we look actor degradation as an adaptationstrategy by implementing a low, medium and highaccuracy method of the HOG + LinearSVM classifierby modifying the value of the parameter scale, for thedetectMultiScale method of the HOG detector.To be able to detect people close to the camera, aswell as people further away, HOG calculates features fordifferent scales of the same image. A smaller scale valueresults in more steps, while a larger value results in lesssteps. The resulting classifiers and their accuracy andlatency results can be seen in Table 4. We then placethe people detection actor on the edge node. We create

additional load for two time intervals using stress(options: -c 2, spawning two sqrt() processes) asimple workload generator for Linux [44]. We set theoverall QoS again to 1 s.

Figure 8 shows the result of this experiment. Atthe beginning, Nandu uses the high accuracy method.When CPU resources need to be shared with stress,it degrades first to the medium accuracy method andthen to the low accuracy method to reach its latencygoals. When additional load at the node is again reduced(at 180 s), Nandu returns quickly to its high accuracymethod. Overall, Nandu achieves a mean accuracy of84.88% and a mean latency of 0.76 s.

5.5 Actor Migration

To evaluate Nandu’s actor migration strategy, we againuse stress to simulate additional load, this time on thedrone (see Figure 9). This increases latency of the peopledetection task above QoS requirements and thus triggersan actor migration from drone to edge (40 s). When loadis reduced, Nandu migrates the actor back to the dronedynamically (at 110 s and finally at 180 s). Overall,Nandu achieves a medium latency of 0.70 s across all ofour experiments. Note, that accuracy is not influenced byactor migration, as we are solely adapting the placementof computation.

Nandu decides dynamically and fine-grained onmigration, depending on available CPU resources onthe Drone. To evaluate this, we again use mininet toevaluate different CPU shares by restricting availableCPU bandwidth on the drone host (mininet relies on theCFS bandwidth control of the Linux kernel [42], whichallows to explicitly set an upper CPU bandwidth limitfor a process). Figure 10 depicts the actor placementdistribution for different CPU shares. For smaller shares,actors are placed pre-dominantly on the edge server,while from 40% upwards, actors are solely placed onthe drone.

5.6 Adaptation Overhead

Finally, to evaluate the adaptation overhead of Nanduwe compare the execution time of a Nandu actor withan ordinary Python function (1000 repetitions). Thisexperiment results in a time difference of 4ms for asingle execution. The overhead is mainly due to theadded execution time for the annotation decorator, whichis invoked for each call. Despite that, this small overheadresults in fewer lines of code (LoC).

80


0 50 100 150 200 2500

20

40

60

80

100C

PU u

sage

(%)

Nandu Stress (other processes)

0 50 100 150 200 250Time (s)

0.0

0.5

1.0

1.5

2.0

2.5

3.0

Late

ncy

(s)

Method usedHigh accuracy method Medium accuracy method Low accuracy method

Figure 8: Actor degradation. (Increased load on the hosting node leads to increased latency after 60s and todynamic actor adaptation to stay in QoS specified latency requirements.)

0 50 100 150 200 2500

20

40

60

80

100

CPU

usa

ge (%

)

Stress (other processes) Response Time

0 50 100 150 200 250Time(s)

0.0

0.5

1.0

1.5

2.0

2.5

3.0

Late

ncy

(s)

Execution NodeEdge Drone

Figure 9: Actor migration. (Additional load on the drone dynamically triggers an actor migration to the edge.)

81


10% 20% 30% 40%Drone CPU Share

0

100

200

300

400

Num

ber

of im

ages

PlacementDroneEdge

Figure 10: Placement distribution. (Nandu allocates actors dynamically on drone and edge according to availableCPU resources.)

6 DISCUSSION

Overall, these results show that Nandu is able to adaptto different networking and computing conditions bydynamically applying its adaptation strategy in unisonwith developer provided hints and overall applicationlatency requirements. Because we use a discrete setof values for adaptation, the results are sub-optimal.Applying more advanced optimization strategies and amore fine-grained selection of input parameters (e.g., inour application for the image resolution) is a topic forfuture work. Specifically, we are currently exploring twodifferent techniques: (1) Discrete Optimization and (2)Reinforcement Learning.

Because the decisions of task migration anddegradation are discrete and the input degradationcould be discrete or continuous, it is necessary toframe the problem in the discrete domain, as a discreteoptimization problem. To define the adaptation problemas such, we consider the utility as our objective function,and our non-functional requirements as the constraints.Another approach that we consider is reinforcementlearning. In this case, our system is seen as an agent;its state is seen as the current context and its ownperformance, in terms of utility and cost; an action is achange in its optimization strategy; and the reward forperforming this action, is the improvement (or decrease)in its performance that this action had.

Our current model puts the responsibility of assigningutility values to different adaptation options on thedeveloper. For different input degradations (e.g., imageresolutions, frame rate) this can be achieved easily.However, the relationship between accuracy results anddifferent ML based classifiers might sometimes be notobvious. We therefore consider to add an off-lineprofiling phase to our system, where we obtain accuracyvalues for different classifiers by using a testing data setas input (i.e., including ground truth values).

For node discovery, we are currently exploring severaloptions: (1) Traditional service-discovery protocols likeAvahi, Bonjour, Zeroconf for local network discovery;(2) a (local) message broker (e.g., MQTT); (3) NamedData Networking (NDN/CCN) principles for nodediscovery, similar to the work in [30], a Nandu nodecould simply send interest packets containing the nameof the actor into the network to allow close-by nodes toperform the execution opportunistically and return theresult in a NDN data packet.

7 RELATED WORK

There are several related research fields to our work. Wenow discuss a selection of relevant works.

7.1 Distributed Execution Frameworks

In [32], the authors develop MagnetOS, a programmingmodel for ad hoc networks where a thin distributedoperating system layer makes the entire network appearto applications as a single virtual machine. MagnetOSthen dynamically partitions a set of communicating Javaobjects in a sensor network with a focus on energyefficiency.

Sapphire [49] requires programmers to specify per-object deployment managers which aid in runtimeobject placement decisions, while abstracting awaycomplexities of inter-object communication. Sapphireonly focuses on mobile-cloud systems and leavesdeployment decisions to the developer (application anddeployment logic is split). In our system, deployment isdecided by the runtime based on local and global systemcontext.

Beam [40, 39] is a framework that simplifies IoTapplications by letting them specify “what should besensed or inferred”, without worrying about “how it issensed or inferred.” Beam introduces the abstraction

82


of an inference graph to decouple applications from themechanics of sensing and drawing inferences.

Rivulet [4] is a fault-tolerant distributed platform forrunning smart-home applications that can handle typicalfailures like link losses, network partitions, sensorfailures, and device crashes. It provides distributedplatform for running smart-home applications onheterogeneous consumer appliances.

FogFlow [13] develops a programming model for edgecomputing where developers provide a directed acyclicgraph and declarative granularity and stream shufflinghints for stream processing. FogFlow then manages taskallocation of data processing tasks over cloud and edgesfor minimal latency and low bandwidth consumption.

In [38, 20] the authors develop the concept oftasklets, which are fine-grained computation units thatare executed by a distributed run-time according todeveloper specified Quality of Computation (QoC)goals. Tasklets are similar to our approach inthat they can be used to offload small computationtasks. However, our Nandu run-time further adapts theexecution logic of a task dynamically according to QoSrequirements and developer hints. Results are retrievedas promises opposed to blocking calls.

Ray [35] focuses on large-scale machine learningand reinforcement learning applications by allowingdata scientists to easily parallelize existing applications.Dask distributed [2] is an effort out of the SciPycommunity with similar goals. Both approaches focuson simplifying and accelerating data science applicationsand an in-cluster computation. Our focus is on dynamicIoT-edge systems. Recently, [30] proposes NFaaS:named function as a service, in which close-by services(e.g., edge-hosted) are discovered by clients by themeans of content centric networking principles (NDN).Their work is orthogonal to ours, as NDN can be usedto discover close-by nodes more efficiently than usingoverlay networks.

7.2 Code Offloading

Code offloading approaches have been appliedextensively in the mobile computing domain dueto the limited on-device compute capabilities. [37]motivates the idea of cloudlets as means to overcomemobile resource limitations and latency limitations ofcloud computing. Cloudlets are VM-based computingresources that can be used seamlessly by close-bymobile applications. MAUI [17] follows with animplementation of such a system for .NET on Windowsmobile that shows superior performance comparedto solely on-device computing strategies. BothCloneCloud [14] and COMET [24] use VMs to parallelexecute tasks in the cloud. Our work is inspired by the

performance gains achieved with such code-offloadingtechniques, but with Nandu we argue that in the IoTdomain, tasks need not only be offloaded, but alsoadapted dynamically by the run-time.

7.3 Programming Abstraction

The goal of a good programming abstraction should beto enable programmers to express what they want theirprogram to do, without specifying all details on how to.Senergy [28] is a framework for programming mobileapplications. It supports programmers in automatingcommon latency, power, accuracy trade-offs by lettingthem specify a priority among them. ENT [10]provides a type-based proactive and adaptive mode-based energy management at the application level, wheredevelopers characterize energy behavior of differentprogram fragments with modes. In Nandu, we combinethese ideas of run-time supported task adaptation withfine grained task migration.

Policy based network management [41] can providefurther useful abstractions by applying policy basedmanagement to networking. Policies are definedas rules that define the states and behaviors of thenetwork. This allows for simple, dynamic adaptationsin large-scale distributed systems, without modifyingthe implementations. Nandu follows an orthogonalapproach, where the execution framework has fine-grained control over distributed applications logic. Toenable this fine-grained control, we have focused on IoTapplications, the programming model and on frameworksupport for different adaptation strategies (migration,task degradation and input degradation). Integratingpolicy based network management into our system canbe an additional useful adaptation modality and createsynergy between application and networking layer as hasbeen shown in [29].

8 CONCLUSION AND FUTURE WORK

We presented Nandu, an adaptive, actor based executionenvironment for distributed IoT applications. Nanduenables applications at the edge of the network todynamically use available resources like envisionedin edge-computing, while freeing developers fromthe tedious and complex tasks of distributed systemdevelopment. Applications dynamically adapt to thecurrent execution context, like network link quality andavailable computational resources, through an interplayof developer provided hints and local schedulers at eachparticipating node. Our prototype is able to adhereto application latency requirements while maximizingachieved accuracy dynamically.

83


Nandu is actively ongoing research in our group, andwe see much potential for additions and future work. Insimilar fashion to [33], which applies machine learningto adaptive video streaming, we are actively investigatingthe application of reinforcement learning for schedulingdecisions in the IoT space. Another aspect for our futurework is to improve Nandu’s security principles. We arelooking in actor isolation techniques, e.g., through theuse of individual actor unikernels. For node discovery,we are exploring Named Data Networking (NDN/CCN)principles for node discovery, similar to the work in [30],where the system sends interest packets into the networkthat contain the name of functions and returns the resultin data packets.

REFERENCES

[1] Akka, “Build highly concurrent, distributed, andresilient message-driven applications on the jvm,”https://github.com/akka/akka, 2018.

[2] Anaconda, Inc, “Dask: Distributed computation inpython,” https://github.com/dask/distributed, 2018.

[3] G. Ananthanarayanan, P. Bahl, P. Bodık,K. Chintalapudi, M. Philipose, L. Ravindranath,and S. Sinha, “Real-time video analytics: Thekiller app for edge computing,” Computer, vol. 50,no. 10, pp. 58–67, 2017.

[4] M. S. Ardekani, R. P. Singh, N. Agrawal, D. B.Terry, and R. O. Suminto, “Rivulet: a fault-tolerant platform for smart-home applications,”in Proceedings of the 18th ACM/IFIP/USENIXMiddleware Conference. ACM, 2017, pp. 41–54.

[5] N. Baccour, A. Koubaa, L. Mottola, M. A. Zuniga,H. Youssef, C. A. Boano, and M. Alves, “Radiolink quality estimation in wireless sensor networks:A survey,” ACM Transactions on Sensor Networks(TOSN), vol. 8, no. 4, p. 34, 2012.

[6] P. A. Bernstein, S. Burckhardt, S. Bykov,N. Crooks, J. M. Faleiro, G. Kliot, A. Kumbhare,M. R. Rahman, V. Shah, A. Szekeres et al., “Geo-distribution of actor-based services,” Proceedingsof the ACM on Programming Languages, vol. 1, no.OOPSLA, p. 107, 2017.

[7] P. A. Bernstein, S. Bykov, A. Geller, G. Kliot, andJ. Thelin, “Orleans: Distributed virtual actors forprogrammability and scalability,” MSR-TR-2014–41, 2014.

[8] F. Bonomi, R. Milito, J. Zhu, and S. Addepalli,“Fog computing and its role in the internet ofthings,” in Proceedings of the first edition ofthe MCC workshop on Mobile cloud computing.ACM, 2012, pp. 13–16.

[9] G. Bradski and A. Kaehler, “Opencv,” Dr. Dobb?sjournal of software tools, vol. 3, 2000.

[10] A. Canino and Y. D. Liu, “Proactive andadaptive energy-aware programming with mixedtypechecking,” in Proceedings of the 38th ACMSIGPLAN Conference on Programming LanguageDesign and Implementation. ACM, 2017, pp.217–232.

[11] N. Chechina, P. Trinder, A. Ghaffari, R. Green,K. Lundin, and R. Virding, “The design of scalabledistributed erlang,” in Proceedings of the 24thSymposium on Implementation and Application ofFunctional Languages (IFL 2012), 2012, p. 461.

[12] K. Chen, J. Furst, J. Kolb, H.-S. Kim, X. Jin,D. E. Culler, and R. H. Katz, “Snaplink: Fast andaccurate vision-based appliance control in largecommercial buildings,” Proceedings of the ACMon Interactive, Mobile, Wearable and UbiquitousTechnologies, vol. 1, no. 4, p. 129, 2018.

[13] B. Cheng, G. Solmaz, F. Cirillo, E. Kovacs,K. Terasawa, and A. Kitazawa, “Fogflow: Easyprogramming of iot services over cloud and edgesfor smart cities,” IEEE Internet of Things Journal,2017.

[14] B.-G. Chun, S. Ihm, P. Maniatis, M. Naik, andA. Patti, “Clonecloud: elastic execution betweenmobile device and cloud,” in Proceedings of thesixth conference on Computer systems. ACM,2011, pp. 301–314.

[15] A. Claesson, A. Backman, M. Ringh, L. Svensson,P. Nordberg, T. Djarv, and J. Hollenberg, “Timeto delivery of an automated external defibrillatorusing a drone for simulated out-of-hospital cardiacarrests vs emergency medical services,” Jama, vol.317, no. 22, pp. 2332–2334, 2017.

[16] C. contributors, “cloudpickle,” https://github.com/cloudpipe/cloudpickle, 2018.

[17] E. Cuervo, A. Balasubramanian, D.-k. Cho,A. Wolman, S. Saroiu, R. Chandra, and P. Bahl,“Maui: making smartphones last longer with codeoffload,” in Proceedings of the 8th internationalconference on Mobile systems, applications, andservices. ACM, 2010, pp. 49–62.

[18] N. Dalal and B. Triggs, “Inria person dataset,”Online: http://pascal. inrialpes. fr/data/human,2005.

[19] G. DeCandia, D. Hastorun, M. Jampani,G. Kakulapati, A. Lakshman, A. Pilchin,S. Sivasubramanian, P. Vosshall, and W. Vogels,“Dynamo: amazon’s highly available key-value

84

https://github.com/akka/akka

https://github.com/dask/distributed

https://github.com/cloudpipe/cloudpickle

https://github.com/cloudpipe/cloudpickle


store,” in ACM SIGOPS operating systems review,vol. 41, no. 6. ACM, 2007, pp. 205–220.

[20] J. Edinger, D. Schafer, C. Krupitzer,V. Raychoudhury, and C. Becker, “Fault-avoidancestrategies for context-aware schedulers in pervasivecomputing systems,” in Pervasive Computingand Communications (PerCom), 2017 IEEEInternational Conference on. IEEE, 2017, pp.79–88.

[21] J. Engel, J. Sturm, and D. Cremers, “Scale-aware navigation of a low-cost quadrocopter witha monocular camera,” Robotics and AutonomousSystems, vol. 62, no. 11, pp. 1646–1656, 2014.

[22] Erlang, “Build massively scalable soft real-timesystems,” http://www.erlang.org/, 2018.

[23] S. M. George, W. Zhou, H. Chenji, M. Won, Y. O.Lee, A. Pazarloglou, R. Stoleru, and P. Barooah,“Distressnet: a wireless ad hoc and sensornetwork architecture for situation managementin disaster response,” IEEE CommunicationsMagazine, vol. 48, no. 3, 2010.

[24] M. S. Gordon, D. A. Jamshidi, S. A. Mahlke,Z. M. Mao, and X. Chen, “Comet: Code offloadby migrating execution transparently.” in OSDI,vol. 12, 2012, pp. 93–106.

[25] N. Hayashibara, X. Defago, R. Yared, andT. Katayama, “The/spl phi/accrual failure detector,”in Reliable Distributed Systems, 2004. Proceedingsof the 23rd IEEE International Symposium on.IEEE, 2004, pp. 66–78.

[26] C. Hewitt, “Actor model of computation: scalablerobust information systems,” arXiv preprintarXiv:1008.1459, 2010.

[27] V. Jovanovic and P. Haller, “The scala actorsmigration guide,” http://docs.scala-lang.org/overviews/core/actors-migration-guide.html.

[28] A. Kansal, S. Saponas, A. Brush, K. S.McKinley, T. Mytkowicz, and R. Ziola, “Thelatency, accuracy, and battery (lab) abstraction:programmer productivity and energy efficiencyfor continuous mobile context sensing,” ACMSIGPLAN Notices, vol. 48, no. 10, pp. 661–676,2013.

[29] V. Karagiannis and A. Papageorgiou, “Network-integrated edge computing orchestrator forapplication placement,” in Network and ServiceManagement (CNSM), 2017 13th InternationalConference on. IEEE, 2017, pp. 1–5.

[30] M. Krol and I. Psaras, “Nfaas: named functionas a service,” in Proceedings of the 4th ACM

Conference on Information-Centric Networking.ACM, 2017, pp. 134–144.

[31] B. Liskov and L. Shrira, Promises: linguisticsupport for efficient asynchronous procedure callsin distributed systems. ACM, 1988, vol. 23, no. 7.

[32] H. Liu, T. Roeder, K. Walsh, R. Barr, andE. G. Sirer, “Design and implementation of asingle system image operating system for ad hocnetworks,” in Proceedings of the 3rd internationalconference on Mobile systems, applications, andservices. ACM, 2005, pp. 149–162.

[33] H. Mao, R. Netravali, and M. Alizadeh, “Neuraladaptive video streaming with pensieve,” inProceedings of the Conference of the ACM SpecialInterest Group on Data Communication. ACM,2017, pp. 197–210.

[34] Mininet Team, “Mininet,” http://mininet.org/,2018.

[35] R. Nishihara, P. Moritz, S. Wang, A. Tumanov,W. Paul, J. Schleier-Smith, R. Liaw, M. Niknami,M. I. Jordan, and I. Stoica, “Real-time machinelearning: The missing pieces,” in Proceedings ofthe 16th Workshop on Hot Topics in OperatingSystems. ACM, 2017, pp. 106–110.

[36] P. Nordwall, “Running a 2400 akka nodes clusteron google compute engine,” 2013.

[37] M. Satyanarayanan, P. Bahl, R. Caceres, andN. Davies, “The case for vm-based cloudlets inmobile computing,” IEEE pervasive Computing,vol. 8, no. 4, 2009.

[38] D. Schafer, J. Edinger, J. M. Paluska, S. VanSyckel,and C. Becker, “Tasklets:” better than best-effort” computing,” in Computer Communicationand Networks (ICCCN), 2016 25th InternationalConference on. IEEE, 2016, pp. 1–11.

[39] C. Shen, R. P. Singh, A. Phanishayee, A. Kansal,and R. Mahajan, “Beam: Ending monolithicapplications for connected devices.” in USENIXAnnual Technical Conference, 2016, pp. 143–157.

[40] R. P. Singh, C. Shen, A. Phanishayee, A. Kansal,and R. Mahajan, “A case for ending monolithicapps for connected devices.” in HotOS, 2015.

[41] M. Sloman, “Policy driven management fordistributed systems,” Journal of network andSystems Management, vol. 2, no. 4, pp. 333–360,1994.

[42] P. Turner, B. B. Rao, and N. Rao, “Cpu bandwidthcontrol for cfs,” in Linux Symposium, vol. 10.Citeseer, 2010, pp. 245–254.

85

http://www.erlang.org/

http://docs.scala-lang.org/overviews/core/actors-migration-guide.html

http://docs.scala-lang.org/overviews/core/actors-migration-guide.html

http://mininet.org/


[43] United Nations Office of Disaster Risk Reduction,“The economic and human impact of disastersin the last 10 years,” www.unisdr.org/files/42862economichumanimpact20052014unisdr.pdf.

[44] A. Waterland, “stress,” https://people.seas.harvard.edu/∼apw/stress/, 2014.

[45] G. Werner-Allen, K. Lorincz, M. Ruiz, O. Marcillo,J. Johnson, J. Lees, and M. Welsh, “Deployinga wireless sensor network on an active volcano,”IEEE internet computing, vol. 10, no. 2, pp. 18–25,2006.

[46] L. Yan and N. McKeown, “Learning networkingby reproducing research results,” ACM SIGCOMMComputer Communication Review, vol. 47, no. 2,pp. 19–26, 2017.

[47] M. Zaharia, M. Chowdhury, M. J. Franklin,S. Shenker, and I. Stoica, “Spark: Clustercomputing with working sets.” HotCloud, vol. 10,no. 10-10, p. 95, 2010.

[48] B. Zhang, X. Jin, S. Ratnasamy, J. Wawrzynek, andE. A. Lee. (2018) AWStream: Adaptive Wide-AreaStreaming Analytics. [Online]. Available: https://nebgnahz.github.io/awstream-paper/awstream.pdf

[49] I. Zhang, A. Szekeres, D. Van Aken, I. Ackerman,S. D. Gribble, A. Krishnamurthy, and H. M.Levy, “Customizable and extensible deploymentfor mobile/cloud applications.” in OSDI, vol. 14,2014, pp. 97–112.

AUTHOR BIOGRAPHIES

Jonathan Furst is a visitingresearcher at NEC LaboratoriesEurope and a postdoctoralresearcher at the IT Universityof Copenhagen. He hasbeen working with softwarecontrollable micro grids, sensornetworks, IoT systems andBuilding Management Systems(BMS) in non residential

buildings. During 2014 he was a visiting researcher atUC Berkeley in the Software Defined Buildings (SDB)group. His current research focuses on developingbetter abstractions to program and run IoT applicationsefficiently from edge to cloud.

Mauricio Fadel Argerichis a research intern at NECLaboratories Europe. He isa student from the Master’sDegree in Data Science at LaSapienza, Universita di Roma,and has previously obtained thedegree of Information SystemsEngineer at the UniversidadTecnologica Nacional inArgentina. He has worked

as a software engineer with different technologiesand carried out a research project in Automatic LinkGeneration as a visiting student at the University ofAuckland in 2013. He is currently working on his thesison leveraging machine learning for IoT applications.

Kaifei Chen is a Ph.D.candidate in Computer Scienceat the University of California,Berkeley. He is in the Building-Energy-Transportation Systemsgroup. His research is focusedon mobile computing, indoorlocalization, sensor networks,and ubiquitous computing. Heobtained a bachelor degreein Computer Science and

Technology from University of Science and Technologyof China. He was a visiting student in Carnegie MellonUniversity Silicon Valley Campus in 2010, an internin Microsoft Research Asia in 2011, and an intern inMicrosoft Research Redmond in 2014.

Erno Kovacs holds a Ph.D.from the University of Stuttgart.At NEC Laboratories Europe, heis a Senior Manager for “CloudServices and Smart Things”.His group works on CloudComputing, IoT analytics, self-organisation and context-awareservices. He was a leadingarchitect in the SPICE and in the

MAGNET Beyond project. He is currently contributingto the FIWARE, FIESTA and Mobinet projects. He wasan advisor to the Singapore Smart Nation program inthe Functional Specification round table and was leadingNECs engagement in the Safe City Singapore test bed.

86

www.unisdr.org/files/42862_economichumanimpact20052014unisdr.pdf

www.unisdr.org/files/42862_economichumanimpact20052014unisdr.pdf

https://people.seas.harvard.edu/~apw/stress/

https://people.seas.harvard.edu/~apw/stress/

https://nebgnahz.github.io/awstream-paper/awstream.pdf

https://nebgnahz.github.io/awstream-paper/awstream.pdf

towards adaptive actors for scalable iot applications at ... · complex distributed systems require...

Documents