ieee transactions on computer-aided design of …lu/papers/emsoft20.pdf · 2020-07-22 · exploring...

12
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS 1 Exploring Edge Computing for Multi-Tier Industrial Control Yehan Ma, Chenyang Lu, Fellow, IEEE, Bruno Sinopoli, Senior Member, IEEE, and Shen Zeng, Member, IEEE Abstract—Industrial automation traditionally relies on local controllers implemented on microcontrollers or programmable logic controllers. With the emergence of edge computing, how- ever, industrial automation evolves into a distributed two-tier computing architecture comprising local controllers and edge servers that communicate over wireless networks. Compared to local controllers, edge servers provide larger computing capacity at the cost of data loss over wireless networks. This paper presents Switching Multi-tier Control (SMC) to exploit edge computing for industrial control. SMC dynamically optimizes control per- formance by switching between local and edge controllers in response to changing network conditions. SMC employs a data- driven approach to derive switching policies based on classifi- cation models trained based on simulations, while guaranteeing system stability based on an extended Simplex approach tailored for two-tier platforms. To evaluate the performance of industrial control over edge computing platforms, we have developed WCPS-EC, a real-time hybrid simulator that integrates simulated plants, real computing platforms, and real or simulated wireless networks. In a case study of an industrial robotic control system, SMC significantly outperformed both a local controller and an edge controller in face of varying data loss in a wireless network. Index Terms—Cyber-physical systems, edge computing, wire- less networked control system, machine learning. I. I NTRODUCTION I NDUSTRIAL automation is undergoing a significant trans- formation driven by the emergence of edge computing and wireless networking technologies. As shown in Fig. 1, the new generation of industrial automation systems features a two- tier computing architecture comprising local and edge com- puting platforms. Traditionally, industrial automation relies on local controllers running on microcontrollers or programmable logic controllers (PLC) that are often embedded in control plants with wired connections to sensors and actuators. Edge computing platforms comprise edge servers located on in- dustrial premise. To lower deployment and maintenance cost, edge servers may communicate with control systems through wireless technologies that are increasingly being adopted in industry [1], [2]. Despite their flexibility, wireless networks may suffer from data loss due to environmental factors such as obstacles, noises, interferences, extreme weather, as well Yehan Ma and Chenyang Lu are with Department of Computer Science & Engineering, Washington University in St. Louis, St. Louis, MO, 63130 USA. E-mail: {yehan.ma, lu}@wustl.edu. Bruno Sinopoli and Shen Zeng are with Department of Electrical & Systems Engineering, Washington University in St. Louis, St. Louis, MO, 63130 USA. E-mail: {bsinopoli, s.zeng}@wustl.edu. Manuscript received April 17, 2020; revised June 17, 2020; accepted July 6, 2020. This article was presented in the International Conference on Embedded Software 2020 and appears as part of the ESWEEK-TCAD special issue. as cyber attacks. The reduced reliability of wireless network results in degradation of the control performance. Therefore, compared with local controllers, edge servers provide the ad- vantages of computation capacity at the cost of communication reliability and latency. The Edge wired connection wireless network Sensors/Actuators (encoders, cameras, motors, valves) Local Controller (SCM, PLC) Edge Controller (Edge Server, NVDIA EGX) Computation capacity Communication latency Network reliability Fig. 1: Multi-tier control system (local and edge tiers). While edge computing has received significant attention in industry, research on exploiting edge computing for industrial control has remained limited. In this paper, we propose Switch- ing Multi-tier Control (SMC), a novel approach to optimize control performance at run-time on a two-tier computing platform. SMC dynamically switches control between local and edge platforms in response to changing network reliability. SMC has two salient features. First, in order to overcome theoretical challenges of analyzing control performance of complex systems under network dynamics [3]–[5], it employs data-driven approaches to derive control switching policies. A key contribution of this approach is to formulate the platform selection problem as a data-driven classification problem, Optimal Platform Classifier (OPC), which can be solved using models extracted from simulations. Second, it extends the Sim- plex framework [6]–[10] to the distributed two-tier architecture comprising local and edge controllers. The Simplex approach enables SMC to dynamically optimize control performance without sacrificing stability. Furthermore, as a tool to study edge computing for industrial control, it presents WCPS-EC, a real-time hybrid simulator that integrates (1) real computing platforms, (2) real or simulated wireless networks, and (3) simulated physical plants. The contributions of this work are five-fold. A Switching Multi-tier Control architecture that dynami- cally switches control between local and edge controllers in response to changes in network reliability;

Upload: others

Post on 06-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF …lu/papers/emsoft20.pdf · 2020-07-22 · Exploring Edge Computing for Multi-Tier Industrial Control Yehan Ma, Chenyang Lu, Fellow,

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS 1

Exploring Edge Computing for Multi-Tier IndustrialControl

Yehan Ma, Chenyang Lu, Fellow, IEEE, Bruno Sinopoli, Senior Member, IEEE, and Shen Zeng, Member, IEEE

Abstract—Industrial automation traditionally relies on localcontrollers implemented on microcontrollers or programmablelogic controllers. With the emergence of edge computing, how-ever, industrial automation evolves into a distributed two-tiercomputing architecture comprising local controllers and edgeservers that communicate over wireless networks. Compared tolocal controllers, edge servers provide larger computing capacityat the cost of data loss over wireless networks. This paper presentsSwitching Multi-tier Control (SMC) to exploit edge computingfor industrial control. SMC dynamically optimizes control per-formance by switching between local and edge controllers inresponse to changing network conditions. SMC employs a data-driven approach to derive switching policies based on classifi-cation models trained based on simulations, while guaranteeingsystem stability based on an extended Simplex approach tailoredfor two-tier platforms. To evaluate the performance of industrialcontrol over edge computing platforms, we have developedWCPS-EC, a real-time hybrid simulator that integrates simulatedplants, real computing platforms, and real or simulated wirelessnetworks. In a case study of an industrial robotic control system,SMC significantly outperformed both a local controller and anedge controller in face of varying data loss in a wireless network.

Index Terms—Cyber-physical systems, edge computing, wire-less networked control system, machine learning.

I. INTRODUCTION

INDUSTRIAL automation is undergoing a significant trans-formation driven by the emergence of edge computing and

wireless networking technologies. As shown in Fig. 1, the newgeneration of industrial automation systems features a two-tier computing architecture comprising local and edge com-puting platforms. Traditionally, industrial automation relies onlocal controllers running on microcontrollers or programmablelogic controllers (PLC) that are often embedded in controlplants with wired connections to sensors and actuators. Edgecomputing platforms comprise edge servers located on in-dustrial premise. To lower deployment and maintenance cost,edge servers may communicate with control systems throughwireless technologies that are increasingly being adopted inindustry [1], [2]. Despite their flexibility, wireless networksmay suffer from data loss due to environmental factors suchas obstacles, noises, interferences, extreme weather, as well

Yehan Ma and Chenyang Lu are with Department of Computer Science &Engineering, Washington University in St. Louis, St. Louis, MO, 63130 USA.E-mail: yehan.ma, [email protected].

Bruno Sinopoli and Shen Zeng are with Department of Electrical & SystemsEngineering, Washington University in St. Louis, St. Louis, MO, 63130 USA.E-mail: bsinopoli, [email protected].

Manuscript received April 17, 2020; revised June 17, 2020; accepted July 6,2020. This article was presented in the International Conference on EmbeddedSoftware 2020 and appears as part of the ESWEEK-TCAD special issue.

as cyber attacks. The reduced reliability of wireless networkresults in degradation of the control performance. Therefore,compared with local controllers, edge servers provide the ad-vantages of computation capacity at the cost of communicationreliability and latency.

Th

e E

dge

wired

connection

wireless

network

Sensors/Actuators

(encoders, cameras,

motors, valves)

Local Controller

(SCM, PLC)

Edge Controller

(Edge Server,

NVDIA EGX)

Computation capacity

Communication latency

Network reliability

Fig. 1: Multi-tier control system (local and edge tiers).While edge computing has received significant attention in

industry, research on exploiting edge computing for industrialcontrol has remained limited. In this paper, we propose Switch-ing Multi-tier Control (SMC), a novel approach to optimizecontrol performance at run-time on a two-tier computingplatform. SMC dynamically switches control between localand edge platforms in response to changing network reliability.SMC has two salient features. First, in order to overcometheoretical challenges of analyzing control performance ofcomplex systems under network dynamics [3]–[5], it employsdata-driven approaches to derive control switching policies. Akey contribution of this approach is to formulate the platformselection problem as a data-driven classification problem,Optimal Platform Classifier (OPC), which can be solved usingmodels extracted from simulations. Second, it extends the Sim-plex framework [6]–[10] to the distributed two-tier architecturecomprising local and edge controllers. The Simplex approachenables SMC to dynamically optimize control performancewithout sacrificing stability. Furthermore, as a tool to studyedge computing for industrial control, it presents WCPS-EC,a real-time hybrid simulator that integrates (1) real computingplatforms, (2) real or simulated wireless networks, and (3)simulated physical plants. The contributions of this work arefive-fold.

• A Switching Multi-tier Control architecture that dynami-cally switches control between local and edge controllersin response to changes in network reliability;

Page 2: IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF …lu/papers/emsoft20.pdf · 2020-07-22 · Exploring Edge Computing for Multi-Tier Industrial Control Yehan Ma, Chenyang Lu, Fellow,

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS 2

• Data-driven approaches to derive switching policies formulti-tier control based on the classification modelslearned from simulations;

• A controller-switching design that guarantees system sta-bility by extending the Simplex approach to a multi-tierarchitecture;

• A real-time hybrid simulator (WCPS-EC) for studyingand evaluating control systems based on edge computing;

• A case study on industrial robotic control demonstratingthe advantage of SMC to optimize control performancewhile maintaining system stability.

The remainder of the article is organized as follows: Sec. IIproposes the SMC architecture and design. Sec. III introducesa control theoretic framework for enabling a systematic designof both a safety controller and a stability-ensuring switchingpolicy. Sec. IV presents the data-driven approaches to optimizecontrol performance based on physical states and networkreliability. Secs. V and VI describe the case study and perfor-mance evaluation of SMC for industrial robotic control. Sec.VII reviews related works and Sec. VIII concludes this paper.

II. SMC ARCHITECTURE

In industrial control, the states of physical plants (e.g., thejoint position of a robotic arm) are monitored and controlledby either a local controller or an edge controller. The edgecontrollers communicate with the sensors and actuators of thephysical plants through communication networks. However,to date wireless networks and edge computing face openchallenges for industrial control with stringent requirements:(1) control performance, which are related to factory revenue,and (2) system stability, which is related to safety of fac-tory operation. The system may face data loss over wirelessnetworks. For example, if a control command is lost, theactuator may not react to changes in the physical states intime. Therefore, our goal is to improve control performanceand guarantee stability in the presence of data loss over thewireless network.

This section presents an overview of the SMC architecture.The objective of SMC is to (1) dynamically optimize controlperformance and (2) guarantee system stability when thereliability of the wireless network changes dynamically. SMCexploits the two-tier platform by switching between the localcontroller and the edge controller in response to changingnetwork reliability.

A wireless network incurs both communication latency anddata loss. For digital control systems with periodic sampling,communication latency is acceptable as long as the packetmeets its deadline (associated with the sampling period). Whena packet misses its deadline, it is treated as a loss [11],[12]. In addition, industrial wireless network standards, suchas ISA100 [13] and WirelessHART [14], employ TDMAprotocols with predictable latency. In such systems, the impactof wireless networks on control is often manifested throughpacket losses. Therefore, we focus on addressing data loss inthis paper. Latency will be addressed in future work.

In this section, we introduce the SMC architecture and pro-vide an overview of its switching logic and mechanisms. The

design to guarantee system stability and optimize performancewill be detailed in Secs. III and IV, respectively.

A. System Model

SMC employs a pair of local/edge controllers for eachfeedback control loop. Compared to the local controller, theedge controller may run a more sophisticated control algorithmgiven the larger computational capacity of the edge server. Thelocal controller is connected to sensors and actuators througha wired network with no data loss. The local controller andthe edge controller communicate over wireless networks withvarying data loss. An example of two-tier robotic controlis shown in Fig. 2. At any time, only one controller isactive and controls the physical plant at a sampling periodTs. Furthermore, a switching agent is co-located with eachcontroller. The switching agent co-located with the activecontroller checks the trigger of stability status to determinewhether to switch to the safety controller, and checks thetrigger for performance optimization every coordination periodTc to determine whether to switch to the controller with theoptimal performance. If a switch is triggered, the control istransferred to the other controller along with the necessarystate information using a switching protocol.

We now introduce the variables representing the physicaland network states. x(k) denotes the state vector of thecontrolled system in the kth sampling period; xe(k) denotesthe state error vector defined as the difference between x(k)and its reference value. In our case study we use the meanabsolute error MAE := 1

n+1

∑nk=0 |xe(k)|, where n denotes

the number of samples in the coordination period, as a suitablemetric for control performance of a robotic system tracking areference trajectory.

The control performance of the edge controller depends onthe data loss over the wireless network. We explore two modelsto describe the data loss process: (1) i.i.d Bernoulli randomdistribution with packet loss ratio ρ and maximum loss boundη defined as the maximum number of consecutive packetloss, and (2) two-state Markov chain with the probabilitiesof transiting from/to the packet loss state to/from the packetreception state as α and β, respectively. While these modelsare relatively simple, we show in our evaluation that they areeffective in selecting the controller when used with data-drivenapproaches.

B. Switching Logic

The switching logic of SMC integrates two switching rules:(1) the Stability Switch for guaranteeing stability, and (2) theOptimal Platform Classifier (OPC) for selecting the optimalcontrol platform.

Edge Cloud

Edge

Controller

Local

Controller

Local

Controller

Sensor

Actuator

Actuator

Sensor

Wireless

Network

Fig. 2: The two-tier control system of robotic arms.

Page 3: IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF …lu/papers/emsoft20.pdf · 2020-07-22 · Exploring Edge Computing for Multi-Tier Industrial Control Yehan Ma, Chenyang Lu, Fellow,

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS 3

d

RecoveryRegion (RR)

PerformanceRegion (PR)x

NetworkConditions(e.g.,𝜌 and 𝜂, or 𝛼 and 𝛽)

PhysicalConditions

(e.g.,𝑥

&)

EdgeControl

LocalControl

Optimal PlatformBoundary (OPB)

Fig. 3: Regions and boundaries of SMC.

The Simplex framework [6] supports high-performance anddependable control systems through the integration of an un-verifiable complex controller, a verified safety controller, anda switching logic. When the system is in danger of enteringan unrecoverable state, the switching logic makes the systemswitch to the safety controller. In this way, a control systemcan employ the complex controller while maintaining theguarantees of the safety controller. In the Simplex framework,the physical states satisfying all operational constraints, suchas physical state restrictions and limits of actuation, are definedas admissible states. The set of recoverable states is a subsetof the admissible states. If the given safety controller is usedfrom these states, all future states will remain admissible. TheStability Switch in SMC extends the Simplex framework toour two-tier architecture to guarantee stability.

SMC extends the Simplex framework [6]–[10] in three novelways. (1) While earlier research on Simplex usually assumedthat the safety controller and the performance controller areco-located on a same computational platform, SMC extendsthe approach to the distributed two-tier architecture comprisinglocal and edge controllers. (2) We introduce OPC, a new data-driven classifier to select the optimal controller at run time. (3)We integrate the Stability Switch and OPC in a coordinatedswitching logic that optimizes control performance withoutsacrificing stability.

Stability Switch. In SMC, the local controller serves as thesafety controller as it usually employs a simple control law andhas a reliable connection to sensors and actuator without dataloss. The edge controller serves as the performance controllerthat can afford a sophisticated control law and suffers time-varying data loss over the wireless network. As illustrated inthe left figure in Fig. 3, the physical system may operate inthe Recovery Region (RR) or the Performance Region (PR) interms of its physical states. RR is the region of recoverablestates that can be stabilized by the local controller. PR isthe subset of the RR where either the local or the edgecontroller may be active as selected by the OPC at run-time.The derivation of PR is presented in Sec. III.

Optimal Platform Classifier. When the system operatesin PR, the OPC is responsible for selecting between thelocal and edge controllers based on the network conditionsand physical states. The control performance of the localand edge controllers depends on both the current networkconditions and the physical states of the system. To overcomethe theoretical challenges of analyzing control performanceof complex systems under network dynamics [3]–[5], OPCemploys data-driven approaches to select controllers. The data-driven approach allows SMC to be adopted in a wide rangeof practical systems and networks. The inputs to the OPCinclude the physical states (e.g., xe) and network conditions.

The network conditions include the distribution and burstinessof data loss. e.g., ρ and η in the i.i.d. model, or α and βin the Markov chain model. The OPC may be trained basedon simulation or experimental results. The design of OPC isdetailed in Sec. IV.

Integration of Stability Switch and OPC. As illustratedin Fig. 3, when x ∈ PR, the OPC selects the controller thatoptimizes control performance based on network conditionsand physical states. When x /∈ PR, control is switched to thelocal controller to guarantee stability. By designing the localcontroller and PR with theoretical guarantees and by settingthe stability switch with higher priority, SMC can dynamicallyoptimize control performance without sacrificing stability.

C. Switching Protocol

We now present the switching protocol used by SMC toexecute the switching between local and edge controllersacross the two-tier platform.

In the two-tier architecture, each controller is co-locatedwith a switching agent. Each switching agent integrates theStability Switch and the OPC. The OPC monitors networkand physical conditions every Tc to achieve the platform thatoptimizes predicted control performance over next Tc. Themonitored network conditions depends on which loss modelthe OPC is trained on. If the OPC is trained on i.i.d loss model,it measures the packet loss rate (ρ) and the maximum numberof consecutive packet loss (η) over the last time window ofTc. If the OPC is trained on the Markov chain model, itmeasures the probabilities of transiting from/to the bad state(packet drop) to/from the good state (packet reception), α andβ, respectively, over the last time window of Tc.

At any time, only one controller is active along with itsswitching agent. The active controller operates its controlpolicy in each sampling period Ts to generate the actuationcommands. When multiple tasks share the local or edgeplatform, schedulability analysis can be performed for eachplatform to guarantee that all the tasks remain schedulablewhen controllers are activated. As shown in the finite statemachine (Fig. 4), the switching protocol works as follows.• When the local controller is active, the switching agent

is invoked every Tc, the prediction horizon of OPC. If (1)x ∈ PR and (2) the OPC selects the edge controller asthe optimal platform over the next prediction horizon Tc,the control switches to the edge controller. The switchingagent on the local platform sends a switching commandto the switching agent on the edge platform along withany state data, dl, needed by the edge controller. E.g.,depending on control design, dl may include previous x.

• When the edge controller is active, the switching agentchecks if x ∈ PR every Ts, and switches to localcontroller immediately when x /∈ PR. The switchingagent also invokes OPC every Tc. If the OPC selects thelocal controller as the optimal platform over the next Tc,the control switches to the local controller. The switchingagent on the edge platform sends a switching commandto the switching agent on the local platform along withany state data, de, needed by the local controller.

Page 4: IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF …lu/papers/emsoft20.pdf · 2020-07-22 · Exploring Edge Computing for Multi-Tier Industrial Control Yehan Ma, Chenyang Lu, Fellow,

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS 4

Send de over wireless network;Reserve a coordination data flow every Tc fornetwork conditions between edge and local;

x∈ PR AND OPC à Edge

x ∉ PR OR(x∈PRAND OPC à Local)

Local

x∈ PR AND OPC à Edge

Send dl over wireless network

Edge

x ∉ PR OR OPC à Local

Local ControllerOperate local control policy every Ts;

Switching AgentCheck Stability Switch

Condition(if x∈PR) every Tc;

Operate OPC every Tc

Edge ControllerOperate edge control policy every Ts;

Switching AgentCheck Stability Switch

Condition(if x∈PR) every Ts;

Operate OPC every Tc

Fig. 4: Finite state machine for SMC. The arrows representplatform switches, and the switching conditions are above thearrows. The main actions taken to switch the computationplatform are below the horizontal arrows.

Given the criticality of the switching command, the switch-ing agents rely on retransmissions and acknowledgementsto ensure reliable communication. Furthermore, to ensure ahigher level of assurance, the local controller may remainactive to monitor the physical states and take over controlwhen x /∈ PR. The local controller may also monitor thenetwork condition using the control commands from the edgecontroller as heartbeat messages. When it has not received anactuation command from the edge controller within a timeoutthreshold, the local controller can take over control. Thisapproach enhances the dependability of SMC as it can switchto the local controller without the switching commands fromthe edge switching agent, e.g., when the network between edgeand local is completely jammed. The actuators can be designedto allow override by the local controller if it receives controlcommands from both controllers.

III. LOCAL CONTROLLER AND PERFORMANCE REGION

The key problem of a Stability Switch boils down todeveloping the local controller and the PR. We consider thesystem dynamics approximated by a linear time-invariant (LTI)model with linear constraints, since a wide variety of systemscan be represented with satisfactory accuracy by such LTImodel

x = Ax +Bu (1a)

aTi x ≤ 1, i = 1, 2, . . . , q (1b)

bTj u ≤ 1, j = 1, 2, . . . , r, (1c)where (1b) are physical restrictions and (1c) are limits ofactuators. The local controller is a state feedback controller,u = Kx. We next formulate a convex optimization problemto establish both the safety feedback gain, K, and RR

S = x|xTPx ≤ 1, (2)by applying the Lyapunov stability theory and linear matrix in-equality (LMI) methodologies [7]–[10], where P is a positive-definite matrix. P defines a Lyapunov function, xTPx, whichis positive-definite with a negative-definite derivative, thusguaranteeing the stability of the linear system.

We establish the local controller (K) and S (Q = P−1)jointly by applying the method presented in [10]. We solvean optimization problem over feasible K and Q, subject to

LMI constraints, such that the resulting closed-loop system isasymptotically stable and RR is maximized. Since the volumeof RR given by (2) is proportional to

√det(Q−1), maximizing

RR is equivalent to minimizing the determinant det(Q−1). TheLMI problem can be formulated as

minimizeQ,Z

log det(Q−1) (3a)

subject to Q > 0; (3b)

QAT +AQ+ ZTBT +BZ < 0; (3c)

aTi Qai ≤ 1, i = 1, 2, . . . , q; (3d)[1 bTj Z

ZT bj Q

]≥ 0, j = 1, 2, . . . , r. (3e)

Applying the change of variable, we obtain K = ZQ−1

and P = Q−1. Constraints (3b) and (3c) are the stability(Lyapunov equation) constraints, (3e) is an LMI constraintconverted from (1c). Since multi-tier control systems areimplemented on digital platforms, we discretize (1a) withsampling period Ts. We consider the worst-case latency of Ts,since in our case (Table II, joint position control) the worst-case end-to-end (E2E) latency of local safety controller is 6.19ms which is shorter than Ts = 50 ms, which results in theconsideration of

x(k + 1) = Adx(k) +Bdu(k − 1). (4)Considering the state feedback control u(k) = Kx(k), anddefining a new state vector z(k) =

[xT (k) xT (k − 1)

]T,

the closed-loop form of the augmented system of z(k) isz(k + 1) = Adz(k), (5)

whereAd =

[Ad BdK

IN×N 0N×N

]. (6)

We solve Q = P−1 by [7]:maximize

Qlog det(Q) (7a)

subject to Q > 0; (7b)

QATd − A−1

d Q < 0; (7c)

αTmQαm ≤ 1,m = 1, 2, . . . , g, (7d)

where αm = [αTm, 01×N ]T and αm = [01×N , α

Tm]T are the

constraints corresponding to x(k) and x(k − 1), respectively.αTm = aTm, for m = 1, ..., q; αT

m = bTj K, for m = q+ 1, ..., g,j = 1, ..., r, and g = q + r. The corresponding RR is

S = z(k)|z(k)T Pz(k) < 1. (8)

Remark 3.1: The stability condition of the two-tier controlsystem is that x should stay in RR. Thus control should beswitched to the local controller before x leaves RR. SMC isdesigned to switch among two platforms in a distributed way.Hence, the switching latency is unavoidable. It may resultin x out of RR. In order to guarantee stability, a smallerregion PR should be calculated. A practical solution wouldbe to choose a smaller ellipsoid, e.g., an ellipsoid defined byxTPx = 0.7 [6]. The distance d in Fig. 3 can be derivedtheoretically through the searching of control policies thatdrive x within RR to the boundary of RR in the shortest time,which can be formulated as a minimum-time optimal controlproblem and solved by bang-bang control. If the resultingshortest time is shorter than the sum of Ts and switching

Page 5: IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF …lu/papers/emsoft20.pdf · 2020-07-22 · Exploring Edge Computing for Multi-Tier Industrial Control Yehan Ma, Chenyang Lu, Fellow,

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS 5

-5

0

0

5

00.5 0.51 1-2

-1

0

1

2

3

4

(a) Data collected for edge control

0

0.50

-5

0.5

0

11

5

LocalEdge

(b) Optimal platform labelingFig. 5: Data collection, labeling, and classifying (Markov loss when Tc = 15 s, the color bar of (a) indicates log-scale MAE).

latency, x belongs to PR.Remark 3.2: When switching occurs, the actuation com-

mands may change in a non-smooth (discrete) fashion. Thediscrete changes in the actuation commands will not affectstability since the stability switch is rigorously derived forthe system Eq. (4), and switches do not happen frequently(multiples of Tc). In addition, there exist switching systemdesigns that deal with discontinuous design space [15], whichare not the focus of this article.

IV. OPTIMAL PLATFORM CLASSIFIER

OPC employs data-driven approaches to select controllerstrained by simulations. Invoked every Tc, OPC measures thenetwork conditions and physical states and selects the con-troller that is predicted to optimize control performance overthe prediction horizon Tc. Our data-riven approach is inspiredby simulation-guided certificate construction [16], [17], but weapply the data-driven approaches for different purposes andscenarios. Unlike certificate construction, the false-positive ofwhich is extremely dangerous to a control system, the OPCis designed to improve the control performance within thestablizable PR, the misclassification of which does not affectsystem stability in SMC.

A. Cyber-Physical Inputs

There exist different approaches to model data loss in thecontext of networked control systems: (1) modeling loss asstochastic processes [18], e.g., i.i.d Bernoulli random distribu-tion [19], [20] and two-state Markov chain [21]; (2) modelingloss as weakly hard real-time constraints [22], e.g., (m, k)model [11], [23]; (3) modeling loss with consecutive lossbound [24], [25]. The (m, k) model is commonly used inreal-time scheduling for computing tasks and is not suitablefor the probabilistic nature of wireless communication. Hence,we explore two stochastic models for data loss including i.i.dprocess with consecutive loss bounds, and two-state Markovchain. The i.i.d loss model is characterized by (1) the lossratio ρ and (2) the maximum number of consecutive packetloss count η. The two-state Markov chain loss (Gilbert-Elliott)model is characterized by (1) the probability of transiting fromthe packet loss state to the reception state, α, and (2) theprobability of transiting from the reception state to the loss

state, β. Our experiments show that the OPC trained by eithermodel works for realistic wireless traces. In addition to thenetwork conditions, OPC also takes the physical state errors,xe, as inputs. OPC therefore selects controllers based on bothcyber (network) and physical states.

B. Data Collection

To train and test OPC, we conduct simulations of local andedge control by sampling the input values over their feasibleranges. Since that OPC is invoked every Tc at run-time, thesimulation time for data collection is Tc, such that SMC canoptimize the control performance over the prediction horizonof Tc. We will study and discuss the choice of Tc in Sec. VI.Our simulations are performed for a robotic case study inWCPS-EC, a real-time hybrid simulator (see Secs. V-C andVI for details).

We run simulations to collect data for i.i.d loss model andtwo-state Markov chain loss model, respectively. The simula-tions incorporates communication latency corresponding to themaximum latency of the local and edge platforms measuredexperimentally (see Sec. VI-A). Fig. 5a shows data collectionand processing examples for cases of two-state Markov chainloss model when Tc = 15 s (other experimental settings ofthis group of experiments are the same as in Sec. VI-A2). Thethree axes are OPC inputs, i.e., α, β, and initial xe. Each datapoint indicates one round of simulation for 15 s. The color barindicates the log-scale MAE. We can see that when xe andβ are low, and α is high, the edge control has smaller MAE.By comparing MAEs of edge control and local control, welabel each data point with the optimal platform that achievessmaller MAE, as shown in Fig. 5b.

Binary classifier:OPC

Initial 𝑥"𝛼 over 𝑇%𝛽 over 𝑇%

Local control

Edge control

Fig. 6: Optimal Platform Classifier (with Markov data loss).

C. Classifier: Training and Testing

We then train classifiers to select the local or edge platform(as shown in Fig. 6). We explore two classification models,threshold-based classification and support vector machines(SVM). The goal of a classifier is to identify the Optimal

Page 6: IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF …lu/papers/emsoft20.pdf · 2020-07-22 · Exploring Edge Computing for Multi-Tier Industrial Control Yehan Ma, Chenyang Lu, Fellow,

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS 6

(a) Threshold-based Classifier (Grid) (b) SVMFig. 7: OPBs derived by threshold-based classifier and SVM, and testing results (Markov loss, Tc = 15 s).

Platform Boundary (OPB), i.e., the boundary that separatesthe regions in which the local and edge platforms achievebetter control performance, respectively. While simple to trainand efficient at run-time, a threshold-based classifier may notbe able to achieve a close approximation of the OPB which isoften non-linear in practical systems. In comparison, SVM isa well established and powerful method to efficiently establishthe separation boundary of arbitrary data sets, even whenthe training data is not linearly separable [26]. We train andcompare both classifiers in this work.

For the threshold-based classifier, we use grid search to findthe set of thresholds that minimize the classification error onthe training datasets. For example, for the two-state Markovchain loss model, we exhaustively sweep all feasible values ofα, β and xe at certain step sizes to find the set of parametersthat leads to the minimum classification error. We tuned thestep sizes to balance the search time and classification error.The step sizes used to generate the thresholds are 0.067 for αand β, and 0.53 for xe. For SVM, we use Radial basis kernelfunction, proper box constraint and kernel scale to minimizethe 10-fold cross validation loss on the training datasets.

We trained and tested the classifiers on our training andtesting datasets, respectively. The OPBs established by thethreshold-based (Grid) classifier and SVM are shown as thetwo shaded surfaces in Fig. 7, respectively. The regions to theleft and right of the boundaries are classified as edge and localcontrol, respectively. As expected, when xe and β are low, andα is high, OPC will choose edge control, and local controlotherwise. Notably, SVM establishes a non-linear boundaryseparating edge control and local control, while the threshold-based classifier identifies a cube-shaped boundary.

The testing results of SVM are shown in Fig. 7 (mis-classified points are circled in red). We observe that themiss classifications happen near OPB, where local and edgecontrollers have similar control performance. Fig. 8 shows theempirical cumulative distributions of MAE difference of localand edge control in their correct classification sets and in theirmiss classification sets, respectively. The results confirms thatmisclassifications have moderate impacts on control perfor-mance for both classifiers. SVM outperforms threshold-based(Grid) classification in the miss classification set since SVMestablishes a more accurate boundary. We train two sets of

OPCs based on i.i.d loss model and two-state Markov chainloss model to show the generality of the data-driven approach.Both OPCs work for realistic wireless traces, which will bediscussed in Sec. VI-B2.

0 1 2 3 4 5 6MAE differences between local and edge (rad)

0

0.5

1

P(M

AE d

iffer

ence

s)Correct SVMCorrect GridMiss SVMMiss Grid

Fig. 8: CDF of MAE difference for local and edge controlin the correct classification sets and miss classification sets,respectively.

Remark 4.1: Due to the generality of the data-driven ap-proach, the OPC can be trained for other metrics of controlperformance. For example, to optimize the settling time of acontrol system, the OPC can be trained to select the optimalcontrol platform with the shortest settling time. Furthermore,to minimize settling time, the local and edge controllerscan be designed to maximize the convergence speed withproper closed-loop eigenvalues [27] or learning-based ap-proaches [28]. For our robotic case study, as we considera linear state-space model for the plant, we can design thestabilizing controller to result in a certain convergence rate ofthe closed-loop system, which is readily given in terms of theeigenvalues of the closed-loop system matrix.

V. CASE STUDY DESIGN FOR MULTI-TIER CONTROL

Fig. 2 shows a case for multi-tier control, where single ormultiple robotic arms are in a workshop of a factory. Wewill utilize this case study to explore the multi-tier controlsystem in the rest of the paper. The objectives are to controlthe velocity and position of the independent joints.

In order to evaluate multi-tier control and SMC, we built theWireless Cyber Physical Simulator-Edge Computing (WCPS-EC), a real-time hybrid simulator that integrates real multi-tier computation platforms, real or simulated wire/wirelessnetworks, and physical plants simulated in Simulink.

Page 7: IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF …lu/papers/emsoft20.pdf · 2020-07-22 · Exploring Edge Computing for Multi-Tier Industrial Control Yehan Ma, Chenyang Lu, Fellow,

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS 7

A. Physical Plants

A robotic arm comprises a chain of joints and rigid links.The motion of the end-effector is the composition of themotion of each link, and the links are ultimately moved byforces and torques exerted on the joints.

The most common structure for joint control is the nestedcontrol loop. The outer loop is responsible for maintainingposition and determines the velocity of the joint that willminimize position error. The inner loop is responsible formaintaining the velocity of the joint as demanded by the outerloop. The motor drive assembly comprises a motor to generatetorque, a gearbox to amplify the torque and reduce the effectsof the load, and an encoder to provide feedback about positionand velocity. We can write the torque balance on the motorshaft as

KmKau−B′ω − τ ′C(ω)− τd(q)

G= J ′ω, (9)

where Km is the motor torque constant, Ka is the transcon-ductance of the amplifier, q is the joint coordinates, w is thejoint velocity, and u is the control voltage. B′, τ ′C and J ′

are respectively the effective total viscous friction, Coulombfriction, and inertia due to the motor, gearbox, bearings and theload. τd(q) is the disturbance torque due to the link motion. Inorder to analyze the dynamics of (9), we linearize it by settingall additive constants to zero:

J ′ω +B′ω = KmKau, (10)which admits the transfer function description

Ω(s)

U(s)=

KmKa

J ′s+B′. (11)

The outer position loop provides the velocity demand for theinner velocity loop.

We modify the open-source Robotics ToolBox [29] tosimulate a PUMA560 robotic arm in a multi-rate and real-timefashion. We refer readers to [29] for the underlying kinematicsand dynamics of the robotic arm and the guide for the toolbox,and to [30], [31] for the parameters of the PUMA560.

B. System Settings of Local and Edge Platforms

The architecture of our case studies for exploring the multi-tier control system was shown in Fig. 2. The settings of thecomputation platforms and communication approaches that weuse are described in Table I.

Tiers Local Edge

ComputationRaspberry Pi 3 (4

ARMv7 CPUs@ 900MHz, 1G RAM)

Intel Server (4 IntelCore i5-4590 [email protected] GHz, 16 G RAM)

Communication I/O + Ethernet cable I/O + Wi-Fi +University network

TABLE I: System settings of the case studies

C. Wireless Cyber Physical Simulator-Edge Computing

It is challenging to conduct experiments on industrial con-trol systems in the field, especially under cyber and physicaldisturbances. We built a real-time hybrid simulator, WCPS-EC, which integrates (1) Real controllers running on variouscomputation platforms; (2) Real Wi-Fi network and Ethernet,or simulated network using TOSSIM; (3) Simulink DesktopReal-time (SLDRT), which simulates robotic arm in real-time.

ty ˆtu

ˆty

ActuatorsPlantSensors

Local/Edge Platform

Wired/Wireless Network

Simulink Desktop Real-Time Plant Side

Controller Side

Controller tu

Interfacing BlockSensor Client Control Server

File Interface Commands FileSensor Data File

Interfacing BlockSensor Server Control Client

20 Hz

20 Hz 500 Hz 500 Hz

Fig. 9: Architecture of WCPS-EC.

The architecture of WCPS-EC is shown in Fig. 9. Com-pared with RT-WCPS [32], WCPS-EC includes immigratedcontrollers running on various computation platforms insteadof controllers running in MATLAB/Simulink. In addition,WCPS-EC can reflect the impacts of real communicationnetwork and computation platform during run-time.

We implement the control policies in Python and C [33]so that the policies can run on any platforms that supportPython or C. Based on our measurements, while the controllerwritten in C has a slightly shorter execution time than thePython implementation, the difference is not significant as thecomputation time is dominated by solving the same quadraticprogramming problem. In RT-WCPS, the worst-case E2Elatency is below Ts. Hence the actuation commands are setto have fixed latency of Ts. However, in multi-tier controlsystems, the E2E latency can be longer than Ts. In addition,we consider a more realistic control system setup [4], [8], [34]with (1) clock-driven sensors that sample the plant outputsperiodically every Ts; (2) an event-driven controller whichcalculates the actuation commands as soon as the sensor dataarrives; and (3) event-driven actuators, which means actuatorscan respond to updated actuation commands immediately.

As shown in Fig. 9, Sensors sample periodically (e.g. 20Hz), and write the new measurements to a Sensor Data File.The Sensor Client, located on the host of SLDRT, discoversthe sensing update immediately and sends it to the SensorServer via a wired or wireless networks. The event-drivenController starts operation once Sensor Server receives thesensing measurements, and sends the control commands toControl Server, which writes the control commands to theCommands File. The actuator in SLDRT checks the Com-mands File with high frequency (e.g. 500 Hz) and updatesthe actuation once it discovers an update of control com-mands. The intermediate File Interface is introduced betweenSimulink and the computation platform. In this way, Simulinkand the computation platform are asynchronous, which isessential for the sequential execution of the Simulink loop.

VI. CASE STUDY EVALUATION

We evaluate the multi-tier control and SMC in the case studyintroduced in Sec. V. We explore control systems with staticlocal and edge controllers. Two cases, i.e., multi-tier control

Page 8: IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF …lu/papers/emsoft20.pdf · 2020-07-22 · Exploring Edge Computing for Multi-Tier Industrial Control Yehan Ma, Chenyang Lu, Fellow,

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS 8

of joint velocity and of joint position, are evaluated from theaspect of control performance. Next, we evaluate the SMC thatoptimizes performance and guarantees system stability.

A. Evaluation of Static Cases

When network is in normal condition, Table II summarizesthe latencies of different tiers of control with the systemsettings described in Table I. We choose proportional-integral-derivative (PID) and model predictive control (MPC) as al-ternative control policies for independent joint velocity andposition control. As shown in Table II, since PID is morecomputationally lightweight than MPC, the computationallatency of PID is shorter than that of MPC over both localand edge platforms. Since the edge platform is equipped withmore computation capacity than the local platform, it finishesPID, MPC within 0.05 ms and 4.5 ms, respectively, while thelocal platform requires 2.8 ms and 61.7 ms, respectively. Onthe other hand, the communication latency between the localplatform and the plant is around 5 ms, which is much shorterthan ∼ 30 ms of the edge platform. In summary, edge canrun MPC much faster than local due to higher computationcapacity at the cost of longer communication latency.

Tiers Controlpolicy

Communicationlatency (ms)

Computationlatency (ms)

E2E latency(ms)

Local

PID (3.89, 5.28) (2.00, 2.79) (5.91, 6.19)

MPC ————- (51.49, 61.65)(infeasible) ————-

Edge

PID (22.17, 34.40) (0.04, 0.05) (23.44, 35.26)

MPC (20.80, 34.80) (3.69, 4.44) (25.29, 38.08)TABLE II: Latency measurements of each tier (in the formatof median and worst-case latency pair)

1) Case 1 - joint velocity control of a PUMA560: Thephysical plants described by (10) are discretized and simulatedat 1000 Hz to mimic a continuous system. The sampling rateof the sensors is 200 Hz, a reasonable rate for the system timeconstant of joint velocity control. The settings for communica-tion between sensors/actuators and the computation platformare shown in Table I. The communication latency of the casestudies is shown in Table II.

The response curves of the joint velocity control are shownin Fig. 10. The reference value of the physical state w isw∗. For joint velocity control that has a short time constant(high sensitivity to latency), the local controller, which has theshortest latency, performs the best. We ran experiments for 30times. The statistical results for MAE are shown in Fig. 12a.PID over local has the best control performance, which isconsistent with the observation in Fig. 10.

2) Case 2 - joint position control of a PUMA560: Thesettings for the joint position control are mostly the same asfor the joint velocity control described in Sec. VI-A1, exceptthat the sampling rate is 20 Hz since the time constant ismuch longer than that of velocity control. We control the jointposition (θ) to track a sinewave signal (θ∗), which is commonin position control of robots [35]. The response curves andthe statistical results of the joint position control are shown

-50

0

50

100

(a)

(rad

/s)

*

0 0.05 0.1 0.15 0.2Time (s)

-10

0

10

20

(b) u

(a) PID over local

-50

0

50

100

(a)

(rad

/s)

*

0 0.05 0.1 0.15 0.2Time (s)

-10

0

10

20

(b) u

(b) MPC over edgeFig. 10: Response curves of the joint velocity control.

-5

0

5

(a)

(rad

) *

0 2 4 6 8Time (s)

-1000

0

1000

(b) u

(a) PID over local

-5

0

5

(a)

(rad

) *

0 2 4 6 8Time (s)

-1000

0

1000

(b) u

(b) MPC over edgeFig. 11: Response curves of the joint position control.

Local Edge0

10

20

30

40

50

MAE

(rad

/s)

(a) Joint velocity control

Local Edge

0.1

0.2

0.3

0.4

0.5

0.6

0.7

MAE

(rad

)

(b) Joint position controlFig. 12: Performance of the joint velocity and position control.

in Fig. 11 and Fig. 12b, respectively. The edge controllerperforms the best, since the gain of MPC on edge overcomesthe affects of extra communication latency.

In summary, for joint velocity control, PID over local hasthe best performance, since it is sensitive to latency. For jointposition control, MPC over edge has the best performance,since MPC performs better than PID, which overcomes theaffects of longer communication latency. The control perfor-mances of control tiers are determined by the properties ofthe control policies, network, and the physical plants. Theadvantage of the two-tier computing architecture is enabledby the flexibility in controller placements and the choice ofcorresponding control policies tailored for physical plants.

Note that MPC controller over local is infeasible due tothe computational resource shortage on edge: the computationlatency is longer than the sampling period. For instance, inTable II, the maximum computational latency of MPC overlocal is 61.7 ms, which makes it infeasible for a samplingperiod of 50 ms. In this section, the network conditions arenormal. The performance of the closed-loop control systemmay deteriorate due to unreliable wireless network connectionbetween edge controller and the plant [4]. In next section, wewill focus on evaluating SMC facing changing data loss.

B. Evaluation of SMC

1) OPC training and testing: We run 26,000 simulationsto collect around 40 GB data for i.i.d loss model and Markov

Page 9: IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF …lu/papers/emsoft20.pdf · 2020-07-22 · Exploring Edge Computing for Multi-Tier Industrial Control Yehan Ma, Chenyang Lu, Fellow,

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS 9

Loss ModelApproach Accuracy Tc = 5 s Tc = 10 s Tc = 15 s Tc = 20 s

i.i.d Grid Training 95.46% 89.60% 88.02% 89.68%Testing 94.80% 89.68% 87.74% 89.71%

i.i.d SVM Training 96.95% 91.89% 91.26% 92.62%Testing 96.83% 91.72% 90.25% 92.48%

Markov Grid Training 96.02% 90.58% 86.91% 85.48%Testing 95.31% 90.08% 86.68% 85.69%

Markov SVM Training 96.82% 92.17% 91.72% 91.63%Testing 96.94% 93.11% 90.98% 91.78%

TABLE III: Accuracy of threshold-based (Grid) and SVMclassifiers. (The training accuracies of Grid and SVM are fromthe grid search and 10-fold cross validation, respectively, onthe training datasets. The testing accuracies are measured onthe testing datasets.)

chain loss model, respectively. We set the worst-case end-to-end latency of local control and edge control to 7 ms and40 ms, respectively, based on the measurements in Table. II.For each data loss model (i.i.d or Markov), we randomlychoose 6,500 simulations as the training set, and the other6,500 as testing set. The accuracies of training and testingare summarized in Table III. SVM consistently achievedhigher accuracies than threshold-based (Grid) classification.The average time taken by one SVM classification is 0.67 msand 10.27 ms on edge and local platforms (over 20,000 testingcases), respectively, which suggests that online classificationis practical even on a local platform. The average time it takesto train the SVM classifier is 26.62 s on the edge server (over100 training runs). It is a reasonable amount of time sincetraining is done offline.

2) Performance optimization: In order to evaluate SMC,we consider joint position control with the same settings asdescribed in Sec. VI-A2, except that we simulate data lossbased on traces generated by TOSSIM [36], [37], which isa high-fidelity wireless simulator. Received signal strengthindicator (RSSI) data have been collected over real wirelesstestbed. In addition, we use controlled background noisestrength to simulate various network conditions between edgeand local as shown in Fig. 13a. Both the RSSI and controllednoise strength are fed into TOSSIM to derive realistic packetloss traces. The OPC (SVM trained for i.i.d loss model andTc = 10s) monitors ρ and η during a sliding window ofthe past Tc (10s), as shown in Figs. 13b and 13c. The run-time position error inputs of OPC are shown in Fig. 13d. Theoptimal platform derived by OPC is shown in Fig. 13e. Wecan see that edge control is the best choice when noise levelis low, and the control should be switched to local when noiselevel is high, since the gain of the edge control is offset bythe effects of data loss.

With a coordination period Tc of 10s, we observe a 10s-delay of OPC in reacting to the changes of noise level. Fig. 14acompares the control performance of SMC with various Tc,fixed local control, and fixed edge control over 30 rounds ofsimulations. When Tc is properly chosen, i.e., 10s–15s, ourSMC provides over 30% and 40% MAE reduction over fixedlocal control and edge control, respectively. While the OPCis trained based on the i.i.d loss model, it works for morerealistic loss traces generated by TOSSIM as well.

-85-80-75

(a)

Noi

seLe

vel (

dBm

)

0

0.5

1

(b)

0

10

20

(c)

-1

0

1

0 50 100 150 200 250 300Time (s)

Edge

Local

(e)

Opt

imal

Plat

form

Fig. 13: System dynamics under SMC. (Tc = 10s, i.i.d dataloss)

SMC (Tc:5s) SMC (10s) SMC (15s) SMC (20s) Local Edge

0.4

0.6

0.8

1

1.2

MAE

(rad

)

(a) Under changing noise as in Fig. 13(a)

0 50 100 150 200 250 300Time (s)

-85

-80

-75

Noi

seLe

vel (

dBm

)

(b) Frequently changing noise

SMC (Tc:5s) SMC (10s) SMC (15s) SMC (20s) Local Edge

0.4

0.5

0.6

0.7

0.8

0.9

MAE

(rad

)

(c) Under frequently changing noise as in (b)Fig. 14: Control performance under SMC, fixed local control,and fixed edge control.

To study the impact of Tc, we run experiments under morefrequently changing noise (Fig. 14b). As shown in Fig. 14c,SMC achieves the best performance with Tc = 10s. When Tcis too short, the OPC is trained based on the data of transientphysical states without taking steady states into consideration.On the other hand, when Tc is too long, the OPC cannot reactto frequently changing network conditions in time. We willexplore adaptive Tc to balance the above factors in the future.

In Fig. 15, we compare the performance of the OPC trainedbased on the i.i.d loss model and that based on the Markovchain loss model. The OPC performs similarly when trainedwith different data loss models, which suggests the general-ity and robustness of the data-driven approach for selectingoptimal platforms.

In Fig. 16, we compare the control performance un-der learning-based (SVM) and threshold-based (Grid) OPC.Thanks to its higher classification accuracy, SVM generallyoutperforms Grid. Furthermore, SVM is a more general andprecise approach that may lead to more significant improve-

Page 10: IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF …lu/papers/emsoft20.pdf · 2020-07-22 · Exploring Edge Computing for Multi-Tier Industrial Control Yehan Ma, Chenyang Lu, Fellow,

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS 10

5 10 15 20Tc (s)

0.2

0.4

0.6

0.8

MAE

(rad

)i.i.dMarkov

(a) Changing noise as in Fig. 13a

5 10 15 20Tc (s)

0.3

0.4

0.5

0.6

0.7

0.8

0.9

MAE

(rad

)

i.i.dMarkov

(b) Changing noise as in Fig. 14bFig. 15: Performance comparison of OPC based on the i.i.dloss model and the Markov loss model

SVM Grid(5s) SVM Grid(10s) SVM Grid(15s) SVM Grid(20s) Local Edge

0.4

0.5

0.6

0.7

0.8

0.9

MAE

(rad

)

Fig. 16: Performance comparison of learning-based (SVM)and threshold-based (Grid) OPC (with Markov loss model,under noise as in Fig. 14b)

ment when the cost of misclassification is higher under othersettings, which we will evaluate in future work.

3) Stability analysis: We model the joint position controlby treating the inner velocity loop as constant, since theresponse of the velocity loop is much faster than that of theposition loop, as shown in Figs. 10 and 11,

θ(t) = −KgKpθ(t). (12)We define θ(t) = θ(t)− θ∗(t), and the proportional controlleru(t) = −Kpθ(t). To achieve the equilibrium point at theorigin, we transform the coordinates of (12)

˙θ(t) = −KgKpθ(t) = Kgu(t).

Solving (3), among multiple solutions, we choose Kp = 800.Then we model the discrete system with one sampling

period latency (Ts) using the Euler method:

z(k+ 1) =

[1 −TsKgKp

1 0

]z(k), where z(k) =

[θ(k)

θ(k − 1)

].

We solve (7) by using YALMIP and the Multi-ParametricToolbox (MPT) with Semidefinite Programming Solver(SDPT3) solver in MATLAB:

⇒ Q =

[2.837 3.3823.382 7.834

]⇒ P =

[0.732 −0.316−0.316 0.264

].

The corresponding RR isS = z(k)|z(k)T P z(k) < 1.

We confirm that during the dynamic switch process, all physi-cal states are within this RR. Therefore, stability is guaranteed,and no Stability Switch is needed.

VII. RELATED WORK

The cloud computation platforms proposed and commercial-ized in last two decades for large scale data centers are famousfor their high computing ability, voluminous data storage, andlow cost [38]–[40]. Cloud-based IIOT provides a centralizedsolution for statistical data analysis of increased amounts oftasks and data. Many mainstream cloud computing vendors,such as Amazon, Google, and IBM, have offered services

in various domains, such as data management and analysis,and intelligent transportation systems. However, for industrialcontrol systems, a cloud-based computation platform suffersfrom long and fluctuating latency to end devices, because it isup to hundreds of miles away from the end devices [41].

In the recent decade, the increase of IoT devices at the edgeof the network and the evolution of hardware with increasedcomputational capability have spawned the edge computa-tion platform. Compared with the cloud-based platform, theedge platform benefits from the localization of computationresources [42]. The ability to provide local data aggregationand processing not only reduces the bandwidth demand onnetwork links to the cloud [43], but also meets the latencyrequirements of applications [44]. Therefore, although only inits early stage, edge computing is already widely deployedamong applications with low-latency requirements, such asvehicles [45], mobile gaming [46], and health monitoring [47];applications dealing with large amounts of data, such asvideo [48] and large scale sensor networks [49]; and appli-cations with geographical distribution [50] and mobility [51],such as mining, smart grid, transportation, waste management,and agriculture [52]. However, few edge computing technolo-gies have been applied to industrial control systems thus far.

In a conventional industrial control system, the controlalgorithm runs on a local controller, which is usually deployedphysically near the sensors and actuators and implementedwith a SCM or PLC [53] with reliable wired connection butlimited computational ability. In recent years, the physicalplants tend to be controlled by a remote controller for theadvantages of (1) more conductive environments for hardwareimplementation of control algorithms, (2) centralized controland coordination of multiple plants. Due to the networkconnection between the remote controller and the physicalplant, remote control systems are regarded as NCSs [1], [2].

We note that there are remote controller designs whichguarantee stability in a mean square sense in the presenceof data loss. The design is based on stochastic Lyapunovfunctions [54], [55], as well as certain assumptions about thecommunication network. However, the unpredictable wirelessconditions mean that the assumptions may not be guaranteed,leading to unsafe physical plant operations with deterioratedperformance.

Lin et al. and Liberzon et al. summarize switching controlapproaches that maintain stability [56], [57]. Dai et al. [58]establish an optimization framework for switching samplingperiods that maximizes the control performance and CPU re-source efficiency. However, those work do not consider controlover wireless networks and multi-tier computing platforms.The Simplex framework ensures the safe use of an unverifiablecomplex controller by using a verified safety controller and aswitching logic. In the conventional Simplex framework [6],the complex controller and safety controller operate in par-allel. ORTEGA [7] enhances the efficiency and flexibility ofSimplex by eliminating the redundant execution of controllers.NetSimplex [8] extends the Simplex from a single node controlsystem to a networked control system (NCS). Bak et al. [9]derive the switching logic by combining offline linear matrixinequality (LMI) results and online reachability computation,

Page 11: IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF …lu/papers/emsoft20.pdf · 2020-07-22 · Exploring Edge Computing for Multi-Tier Industrial Control Yehan Ma, Chenyang Lu, Fellow,

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS 11

which significantly reduces conservatism. In all above cases,both the complex controller and safety controller are runningon the same computation platform. Whereas most previousworks consider single machine cases, we extend the Simplexframework by considering (1) distributed two-tier architecturecomprising local and edge controllers, (2) new data-drivenOPC for control performance optimization, (3) a coordinatedswitching logic which integrates stability switch and OPC.

Ma et al. [5] design the stability switch between local andedge controllers under data loss from another perspective,based on co-design of edge and local controllers that are de-signed via a joint Lyapunov function. The co-design approachis non-trivial and specific to the control policy. In comparison,by extending the Simplex framework to the distributed two-tierarchitecture, our SMC approach can be applied to any controlpolicies on edge without sacrificing stability.

Data-driven approaches have been applied to control de-sign [59] and safety verification of control systems [17] toovercome the restrictions of analytical modeling. We developour data-driven approach for a different purpose and systemarchitecture. The data-driven OPC is designed to improve thecontrol performance by selecting the optimal control platformin a distributed two-tier computing architecture.

In [60], the authors presented a 5G-based edge cloudcomputing test bed for a multi-tier control system and eval-uated the system while operating a mission-critical controlapplication of a MPC controlled ball and beam process.However, systematic studies, such as how to co-design propercomputation platforms and control policies, and how to adjustto uncertainties and maintain closed-loop stability, remain tobe performed. In [32], [61], the authors propose the conceptof holistic control that co-joins network management andphysical control at run time, where the focus is the remotecontrol. In this paper, we focus on a multi-tier industrialcontrol system with a comprehensive view of the propertiesof the edge/cloud controller, network conditions, and physicalplants. We also propose a controller dynamic switch amongmulti-tier computation platforms which not only optimizes thecontrol performance at run-time but also guarantees systemstability under various network conditions.

Simulation tools are of vital importance to studymulti-tier control. NCS simulators [36], [62], [63] areMATLAB/Simulink-based tools, which enables simulations ofCPU scheduling, communication and control by integratingwireless simulators. Given simulators cannot always capturethe real-world wireless network dynamics, network-in-the-loopsimulations have recently been developed [32], [64]. In [60],the authors presented a 5G-based edge cloud computing testbed for a multi-tier control system. However, the real physicalplants used in its experiments are limited to lab settings. Webuild WCPS-EC, which integrates real multi-tier computationplatforms and wireless networks, and leverages simulationsupport for various physical plants.

VIII. CONCLUSIONS

With the emergence of edge computing and wireless net-work technologies, controllers located in various computa-

tional tiers have different computational capacities and com-munication reliability, which influence the performance andstability of industrial control systems. We presented a Switch-ing Multi-tier Control (SMC) architecture for industrial controlsystems. SMC dynamically switches between local and edgecontrol based on plant states and network reliability. It employsa switching agent that integrates (1) a data-driven OptimalPerformance Classifier for selecting the controller platformwith optimal performance, and (2) a Stability Switch for guar-anteeing system stability. SMC effectively reaps the benefitsof switching among different computation tiers. In hybridsimulations on WCPS-EC, SMC achieved over 30% and 40%reductions in mean absolute errors when compared with fixedlocal and edge control, respectively, while maintaining stabilityguarantees under changing network reliability.

ACKNOWLEDGMENT

This work was sponsored, in part, by NSF through grant1646579 (CPS), and by the Fullgraf Foundation.

REFERENCES

[1] C. Lu, A. Saifullah, B. Li, M. Sha, H. Gonzalez, D. Gunatilaka, C. Wu,L. Nie, and Y. Chen, “Real-time wireless sensor-actuator networks forindustrial cyber-physical systems,” Proceedings of the IEEE, vol. 104,no. 5, pp. 1013–1024, 2015.

[2] P. Park, S. C. Ergen, C. Fischione, C. Lu, and K. H. Johansson, “Wirelessnetwork design for control systems: A survey,” IEEE CommunicationsSurveys & Tutorials, vol. 20, no. 2, pp. 978–1013, 2017.

[3] K. Gatsis, H. Hassani, and G. J. Pappas, “Latency-reliability tradeoffsfor state estimation,” arXiv preprint arXiv:1810.11831, 2018.

[4] W. Zhang, M. S. Branicky, and S. M. Phillips, “Stability of networkedcontrol systems,” IEEE Control Systems, vol. 21, no. 1, pp. 84–99, 2001.

[5] Y. Ma, Y. Wang, S. Di Cairano, T. Koike-Akino, J. Guo, P. Orlik, andC. Lu, “A smart actuation architecture for wireless networked controlsystems,” in Conference on Decision and Control. IEEE, 2018.

[6] L. Sha, “Using simplicity to control complexity,” IEEE Software, no. 4,pp. 20–28, 2001.

[7] X. Liu, Q. Wang, S. Gopalakrishnan, W. He, L. Sha, H. Ding, andK. Lee, “Ortega: An efficient and flexible online fault tolerance archi-tecture for real-time control systems,” IEEE Transactions on IndustrialInformatics, vol. 4, no. 4, pp. 213–224, 2008.

[8] J. Yao, X. Liu, G. Zhu, and L. Sha, “Netsimplex: Controller faulttolerance architecture in networked control systems,” IEEE Transactionson Industrial Informatics, vol. 9, no. 1, pp. 346–356, 2013.

[9] S. Bak, T. T. Johnson, M. Caccamo, and L. Sha, “Real-time reachabilityfor verified simplex design,” in Real-Time Systems Symposium. IEEE,2014, pp. 138–148.

[10] D. Seto and L. Sha, “An engineering method for safety region develop-ment,” Carnegie-Mellon Univ Pittsburgh PA Software Engineering Inst,Tech. Rep., 1999.

[11] D. Soudbakhsh, L. T. Phan, O. Sokolsky, I. Lee, and A. Annaswamy,“Co-design of control and platform with dropped signals,” in Interna-tional Conference on Cyber-Physical Systems, 2013, pp. 129–140.

[12] B. Demirel, Z. Zou, P. Soldati, and M. Johansson, “Modular design ofjointly optimal controllers and forwarding policies for wireless control,”IEEE Transactions on Automatic Control, vol. 59, no. 12, 2014.

[13] ISA100, “Wireless systems for automation.” http://www.isa100wci.org,2020.

[14] Fieldcomm, “WirelessHART specification.”https://fieldcommgroup.org/technologies/hart/hart-technology, 2020.

[15] X. Zhao, P. Shi, Y. Yin, and S. K. Nguang, “New results on stability ofslowly switched systems: A multiple discontinuous lyapunov functionapproach,” IEEE Transactions on Automatic Control, vol. 62, no. 7, pp.3502–3509, 2016.

[16] J. Kapinski, J. V. Deshmukh, S. Sankaranarayanan, and N. Arechiga,“Simulation-guided lyapunov analysis for hybrid dynamical systems,” inInternational Conference on Hybrid Systems: Computation and Control,2014, pp. 133–142.

Page 12: IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF …lu/papers/emsoft20.pdf · 2020-07-22 · Exploring Edge Computing for Multi-Tier Industrial Control Yehan Ma, Chenyang Lu, Fellow,

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS 12

[17] J. F. Quindlen, U. Topcu, G. Chowdhary, and J. P. How, “Activesampling-based binary verification of dynamical systems,” in 2018 AIAAGuidance, Navigation, and Control Conference, 2018, p. 1107.

[18] L. Schenato, B. Sinopoli, M. Franceschetti, K. Poolla, and S. S.Sastry, “Foundations of control and estimation over lossy networks,”Proceedings of the IEEE, vol. 95, no. 1, pp. 163–187, 2007.

[19] M. Kogel, R. Blind, F. Allgower, and R. Findeisen, “Optimal andoptimal-linear control over lossy, distributed networks,” IFAC Proceed-ings Volumes, vol. 44, no. 1, pp. 13 239–13 244, 2011.

[20] E. Garone, B. Sinopoli, and A. Casavola, “LQG control over lossy tcp-like networks with probabilistic packet acknowledgements,” in Confer-ence on Decision and Control. IEEE, 2008, pp. 2686–2691.

[21] Y. Mo, E. Garone, and B. Sinopoli, “LQG control with Markovian packetloss,” in European Control Conference. IEEE, 2013, pp. 2380–2385.

[22] R. Blind and F. Allgower, “Towards networked control systems withguaranteed stability: Using weakly hard real-time constraints to modelthe loss process,” in Conference on Decision and Control. IEEE, 2015,pp. 7510–7515.

[23] P. Ramanathan, “Overload management in real-time control applicationsusing (m, k)-firm guarantee,” IEEE Transactions on Parallel and Dis-tributed Systems, vol. 10, no. 6, pp. 549–559, 1999.

[24] M. Ljesnjanin, D. E. Quevedo, and D. Nesic, “Packetized mpc with dy-namic scheduling constraints and bounded packet dropouts,” Automatica,vol. 50, no. 3, pp. 784–797, 2014.

[25] H. S. Chwa, K. G. Shin, and J. Lee, “Closing the gap between stabilityand schedulability: a new task model for cyber-physical systems,” inReal-Time and Embedded Technology and Applications Symposium.IEEE, 2018, pp. 327–337.

[26] B. E. Boser, I. M. Guyon, and V. N. Vapnik, “A training algorithmfor optimal margin classifiers,” in Annual Workshop on ComputationalLearning Theory, 1992, pp. 144–152.

[27] T. Hu, Z. Lin, and Y. Shamash, “On maximizing the convergence rate forlinear systems with input saturation,” IEEE Transactions on AutomaticControl, vol. 48, no. 7, pp. 1249–1253, 2003.

[28] J.-X. Xu and Y. Tan, “Robust optimal design and convergence propertiesanalysis of iterative learning control approaches,” Automatica, vol. 38,no. 11, pp. 1867–1880, 2002.

[29] P. Corke, Robotics, vision and control: fundamental algorithms InMATLAB® second, completely revised. Springer, 2017, vol. 118.

[30] P. I. Corke and B. Armstrong-Helouvry, “A meta-study of puma 560dynamics: A critical appraisal of literature data,” Robotica, vol. 13, no. 3,pp. 253–258, 1995.

[31] P. I. Corke and B. Armstrong-Helouvry, “A search for consensus amongmodel parameters reported for the puma 560 robot,” in InternationalConference on Robotics and Automation. IEEE, 1994, pp. 1608–1613.

[32] Y. Ma and C. Lu, “Efficient holistic control over industrial wirelesssensor-actuator networks,” in International Conference on IndustrialInternet. IEEE, 2018, pp. 89–98.

[33] P. Zometa, M. Kogel, and R. Findeisen, “µao-mpc: a free code gener-ation tool for embedded real-time linear model predictive control,” inAmerican Control Conference. IEEE, 2013, pp. 5320–5325.

[34] F.-L. Lian, J. Moyne, and D. Tilbury, “Modelling and optimal controllerdesign of networked control systems with multiple delays,” InternationalJournal of Control, vol. 76, no. 6, pp. 591–606, 2003.

[35] M. M. Olsen and H. G. Petersen, “A new method for estimatingparameters of a dynamic robot model,” IEEE Transactions on Roboticsand Automation, vol. 17, no. 1, pp. 95–100, 2001.

[36] B. Li, Y. Ma, T. Westenbroek, C. Wu, H. Gonzalez, and C. Lu, “Wirelessrouting and control: a cyber-physical case study,” in InternationalConference on Cyber-Physical Systems. IEEE, 2016, p. 32.

[37] H. Lee, A. Cerpa, and P. Levis, “Improving wireless simulation throughnoise modeling,” in International Conference on Information Processingin Sensor Networks. IEEE, 2007, pp. 21–30.

[38] F. Bonomi, R. Milito, J. Zhu, and S. Addepalli, “Fog computing andits role in the internet of things,” in MCC Workshop on Mobile CloudComputing. ACM, 2012, pp. 13–16.

[39] A. Zou, J. Leng, X. He, Y. Zu, C. D. Gill, V. J. Reddi, and X. Zhang,“Voltage-stacked GPUs: A control theory driven cross-layer solutionfor practical voltage stacking in GPUs,” in International Symposiumon Microarchitecture. IEEE, 2018, pp. 390–402.

[40] A. Zou, J. Leng, X. He, Y. Zu, V. J. Reddi, and X. Zhang, “Efficient andreliable power delivery in voltage-stacked manycore system with hybridcharge-recycling regulators,” in Design Automation Conference. IEEE,2018.

[41] V. K. Reddy, B. T. Rao, and L. Reddy, “Research issues in cloudcomputing,” Global Journal of Computer Science and Technology, 2011.

[42] H. El-Sayed, S. Sankar, M. Prasad, D. Puthal, A. Gupta, M. Mohanty,and C.-T. Lin, “Edge of things: The big picture on the integration ofedge, iot and the cloud in a distributed computing environment,” IEEEAccess, vol. 6, pp. 1706–1717, 2017.

[43] S. Wang, T. Tuor, T. Salonidis, K. K. Leung, C. Makaya, T. He, andK. Chan, “Adaptive federated learning in resource constrained edge com-puting systems,” IEEE Journal on Selected Areas in Communications,vol. 37, no. 6, pp. 1205–1221, 2019.

[44] S. Sarkar, S. Chatterjee, and S. Misra, “Assessment of the suitability offog computing in the context of internet of things,” IEEE Transactionson Cloud Computing, vol. 6, no. 1, pp. 46–59, 2015.

[45] Y. C. Hu, M. Patel, D. Sabella, N. Sprecher, and V. Young, “Mobile edgecomputing—a key technology towards 5G,” ETSI White Paper, vol. 11,no. 11, pp. 1–16, 2015.

[46] G. Premsankar, M. Di Francesco, and T. Taleb, “Edge computing forthe internet of things: A case study,” IEEE Internet of Things Journal,vol. 5, no. 2, pp. 1275–1284, 2018.

[47] M. Chen, W. Li, Y. Hao, Y. Qian, and I. Humar, “Edge cognitivecomputing based smart healthcare system,” Future Generation ComputerSystems, vol. 86, pp. 403–411, 2018.

[48] I. Burago, M. Levorato, and A. Chowdhery, “Bandwidth-aware datafiltering in edge-assisted wireless sensor systems,” in InternationalConference on Sensing, Communication, and Networking. IEEE, 2017.

[49] V. Mihai, C. Dragana, G. Stamatescu, D. Popescu, and L. Ichim,“Wireless sensor network architecture based on fog computing,” in Inter-national Conference on Control, Decision and Information Technologies.IEEE, 2018, pp. 743–747.

[50] G. Ananthanarayanan, P. Bahl, P. Bodık, K. Chintalapudi, M. Philipose,L. Ravindranath, and S. Sinha, “Real-time video analytics: The killerapp for edge computing,” Computer, vol. 50, no. 10, pp. 58–67, 2017.

[51] M. Aazam, S. Zeadally, and K. A. Harras, “Deploying fog computingin industrial internet of things and industry 4.0,” IEEE Transactions onIndustrial Informatics, vol. 14, no. 10, pp. 4674–4682, 2018.

[52] A. Yousefpour, C. Fung, T. Nguyen, K. Kadiyala, F. Jalali, A. Niakan-lahiji, J. Kong, and J. P. Jue, “All one needs to know about fog computingand related edge computing paradigms: A complete survey,” Journal ofSystems Architecture, 2019.

[53] M. G. Ioannides, “Design and implementation of plc-based monitoringcontrol system for induction motor,” IEEE Transactions on EnergyConversion, vol. 19, no. 3, pp. 469–476, 2004.

[54] F. Mager, D. Baumann, R. Jacob, L. Thiele, S. Trimpe, and M. Zim-merling, “Feedback control goes wireless: Guaranteed stability overlow-power multi-hop networks,” in International Conference on Cyber-Physical Systems. ACM, 2019, pp. 97–108.

[55] J. Wu and T. Chen, “Design of networked control systems with packetdropouts,” IEEE Transactions on Automatic control, vol. 52, no. 7, 2007.

[56] H. Lin and P. J. Antsaklis, “Stability and stabilizability of switched linearsystems: a survey of recent results,” IEEE Transactions on Automaticcontrol, vol. 54, no. 2, pp. 308–322, 2009.

[57] D. Liberzon and A. S. Morse, “Basic problems in stability and designof switched systems,” IEEE Control Systems Magazine, vol. 19, no. 5,pp. 59–70, 1999.

[58] X. Dai, W. Chang, S. Zhao, and A. Burns, “A dual-mode strategyfor performance-maximisation and resource-efficient cps design,” ACMTransactions on Embedded Computing Systems, vol. 18, no. 5s, 2019.

[59] F. L. Lewis and D. Vrabie, “Reinforcement learning and adaptivedynamic programming for feedback control,” IEEE Circuits and SystemsMagazine, vol. 9, no. 3, pp. 32–50, 2009.

[60] P. Skarin, W. Tarneberg, K.-E. Arzen, and M. Kihl, “Towards mission-critical control at the edge and over 5G,” in International Conferenceon Edge Computing. IEEE, 2018, pp. 50–57.

[61] Y. Ma, D. Gunatilaka, B. Li, H. Gonzalez, and C. Lu, “Holistic cyber-physical management for dependable wireless control systems,” ACMTransactions on Cyber-Physical Systems, vol. 3, no. 1, p. 3, 2018.

[62] A. Cervin, D. Henriksson, B. Lincoln, J. Eker, and K.-E. Arzen, “Howdoes control timing affect performance? analysis and simulation oftiming using jitterbug and truetime,” IEEE Control Systems Magazine,vol. 23, no. 3, pp. 16–30, 2003.

[63] E. Eyisi, J. Bai, D. Riley, J. Weng, W. Yan, Y. Xue, X. Koutsoukos,and J. Sztipanovits, “NCSWT: An integrated modeling and simulationtool for networked control systems,” Simulation Modelling Practice andTheory, vol. 27, pp. 90–111, 2012.

[64] J. Araujo, M. Mazo, A. Anta, P. Tabuada, and K. H. Johansson, “Systemarchitectures, protocols and algorithms for aperiodic wireless controlsystems,” IEEE Transactions on Industrial Informatics, vol. 10, no. 1,pp. 175–184, 2014.