

Characterizing and Orchestrating NFV-Ready Servers for Efficient Edge Data Processing

Lu Zhang
Shanghai Jiao Tong University
[email protected]

Chao Li
Shanghai Jiao Tong University
[email protected]

Pengyu Wang
Shanghai Jiao Tong University
[email protected]

Yunxin Liu
Microsoft Research
[email protected]

Yang Hu
The University of Texas at Dallas
[email protected]

Quan Chen
Shanghai Jiao Tong University
[email protected]

Minyi Guo
Shanghai Jiao Tong University
[email protected]

ABSTRACT

The fast-growing Internet of Things (IoT) and Artificial Intelligence (AI) applications mandate high-performance edge data analytics. This requirement cannot be fully fulfilled by prior works that focus on either small architectures (e.g., accelerators) or large infrastructure (e.g., cloud data centers). Sitting in between the edge and cloud, there have been many server-level designs for augmenting edge data processing. However, they often require specialized hardware resources and lack scalability as well as agility.

Rather than reinventing the wheel, we explore tapping into underutilized network infrastructure in the incoming 5G era for augmenting edge data analytics. Specifically, we focus on efficiently deploying edge data processing applications on Network Function Virtualization (NFV) enabled commodity servers. In such a way, we can benefit from the service flexibility of NFV while greatly reducing the cost of the many servers deployed in the edge network. We perform extensive experiments to investigate the characteristics of packet processing in a DPDK-based NFV platform and discover the resource under-utilization issue when using the DPDK polling mode. We then propose a framework named EdgeMiner, which can harvest the potentially idle cycles of the cores for data processing purposes. Meanwhile, it can also guarantee the Quality of Service (QoS) of both the Virtualized Network Functions (VNFs) and Edge Data Processing (EDP) applications when they co-run on the same server.

CCS CONCEPTS

• General and reference -> Measurement; • Software and its engineering -> Process management; • Networks -> Network servers.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
IWQoS '19, June 24-25, 2019, Phoenix, AZ, USA
© 2019 Association for Computing Machinery.
ACM ISBN 978-1-4503-6778-3/19/06...$15.00
https://doi.org/10.1145/3326285.3329057

KEYWORDS

Network Function Virtualization, resource under-utilization, edge data processing, QoS

ACM Reference Format:
Lu Zhang, Chao Li, Pengyu Wang, Yunxin Liu, Yang Hu, Quan Chen, and Minyi Guo. 2019. Characterizing and Orchestrating NFV-Ready Servers for Efficient Edge Data Processing. In IEEE/ACM International Symposium on Quality of Service (IWQoS '19), June 24-25, 2019, Phoenix, AZ, USA. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3326285.3329057

1 INTRODUCTION

Deploying data analytic applications and AI-enabled services near the edge is becoming increasingly popular today since moving stored or in-flight data to the cloud can be problematic. According to Gartner, there will be over 20 billion IoT devices installed by 2020 [9], which will create large quantities of data needing to be analyzed. Integrating domain-specific accelerators in end devices (e.g., smartphones or video cameras) can boost Edge Data Processing (EDP), but it is inadequate due to limited capacity and scalability. Faced with a deluge of edge traffic, it is also critical to tap into auxiliary edge systems for augmented EDP. Recently, many pioneering studies have explored such hierarchical topology, including the Cloud-Fog-Edge three-layer design [7, 38] or four-layer design [34].

However, there are no well-established architectures for the augmented EDP landscape [25]. Even though various edge computing solutions [5, 24, 32, 36, 37] have been explored over the years, they typically assume certain specialized hardware such as smart gateways, mini clusters, or network devices to process the raw data. For example, XPro [36] proposes an energy-efficient architecture for smart body sensing. ParaDrop [37] and Fog Computer [8] highlight smart gateways for processing data streams near users. At the server level, Cloudlet [32] uses virtualized local clusters to augment mobile services and InSURE [24] leverages standalone clusters to pre-process raw data onsite. Recently, Microsoft introduced Data Box Edge/Gateway [5], a physical network appliance that uses AI to analyze and transform data before it is uploaded to Azure.

The above systems certainly improve different edge-oriented services, but they can be less cost-efficient and scalable. Oftentimes, numerous mini data centers or fog computing nodes need to be


IWQoS '19, June 24-25, 2019, Phoenix, AZ, USA. Lu Zhang, Chao Li, Pengyu Wang, Yunxin Liu, Yang Hu, Quan Chen, and Minyi Guo

Figure 1: The case for Augmented Edge Data Processing

physically deployed in a specific region. This unavoidably increases capital expenses (CapEx) and operational expenses (OpEx). Worse, due to the reliance on specialized hardware, one can hardly accommodate edge traffic surges or react to failure scenarios. Without over-provisioning edge resources, one has to offload traffic to remote data centers every now and then.

Moreover, since existing designs introduce various application-specific platforms, it is difficult to coordinate diverse applications across different vendors. While all of these systems may operate under the same open standard in the future, currently there is no universally accepted one. The edge computing space is therefore somewhat fragmented. Many existing technologies only provide a certain degree of functionality. For example, popular suites of proprietary protocols include X10 [1], Z-Wave [2], and ZigBee [4], all of which are incompatible with each other. Unless a single specific kind of equipment is used, one has to find a way to share device data with others.

This work is driven by the observation that currently there is not a strong motivation for provisioning a wide spectrum of specialized appliances for generic EDP tasks. Rather than exploring and redefining a new type of edge architecture, we propose to unleash the performance and capacity potential of edge equipment already in place. Specifically, in this paper we set out to explore leveraging the under-utilized NFV servers at the edge to co-locate the emerging edge computation workloads.

Our argument stands on the recent trend towards Network Function Virtualization (NFV) [10], which aims to improve the network's manageability and reduce capital expenditure. NFV implements network functions (NFs) (e.g., routing, detection, load balancing, etc.) as software in virtual machines (VMs) or containers, which allows virtual NFs to be deployed on commodity off-the-shelf (COTS) servers. On the other hand, being deployed between the end devices and the data centers, these COTS servers running NFs naturally form a mesh network of computing nodes that can process or store data locally and push all received data to a central cloud, as shown in Figure 1.

In fact, the European Telecommunications Standards Institute (ETSI) advocates reusing existing NFV infrastructure and the management capability of NFV to the largest extent possible in the edge computing era [16]. In recent years, NFV at the network edge (e.g., virtualized Customer Premises Equipment, vCPE) has been gaining increasing attention. According to an IDC report [30], the worldwide

market for NFV infrastructure at the network edge is expected to grow from a base of $67.8 million in 2016 to $1.16 billion in 2021 at a compound annual growth rate (CAGR) of 76.4%. Such increasing yet under-utilized [40] computing infrastructure makes it attractive for users who want more affordable, deploy-on-demand edge computing power. Doing so also allows one to speed up the adoption and delivery of edge applications.
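As a quick sanity check on the market figures quoted above, compounding the 2016 base at the stated CAGR reproduces the 2021 projection (the helper below is purely illustrative and not part of the IDC report):

```python
def compound(base, rate, years):
    """Grow `base` at annual `rate` for `years` years."""
    return base * (1.0 + rate) ** years

# IDC figures quoted in the text: $67.8M base (2016), 76.4% CAGR, 5 years
projected_2021 = compound(67.8e6, 0.764, 5)
print(round(projected_2021 / 1e9, 2))  # ≈ 1.16 (billion dollars)
```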

In this paper, we propose EdgeMiner, an augmented EDP framework that enables NFV-ready COTS servers to efficiently host augmented EDP applications such as edge video encoding and image processing applications. Towards this goal, we perform extensive experiments to characterize the resource utilization of DPDK-based NFV COTS servers. Intel DPDK [18] is designed to mitigate the overheads of software packet processing by optimizing the kernel network stack and allowing direct data access to bypass the kernel. However, our characterization results show that, in contrast to the core network scenario, a DPDK-based NFV platform has poor CPU utilization at the network edge with its lower packet receiving/sending rate. Motivated by our observation, EdgeMiner smartly mines the idle CPU resources of DPDK-based NFV platforms without impairing the performance and capacity of VNFs. It leverages batch processing and batch interrupts to discover the idle CPU resources of DPDK-based NFV platforms. We also propose a Dynamic Search Core (DSC) algorithm to guarantee the QoS of EDP applications. We evaluate our framework using typical network functions and popular EDP applications such as image processing and video encoding.

To the best of our knowledge, this work is the first to provision EDP applications on flexible network infrastructures.

This paper makes the following key contributions:

• We conduct extensive experiments to measure the architectural characteristics of a DPDK-based NFV platform under various configurations. We demonstrate the under-utilization issue of the servers.

• We propose EdgeMiner, a light-weight edge processing framework that allows VNFs and EDP tasks to co-locate on commodity servers. We design scheduling schemes that can guarantee the QoS of both types of applications.

• We evaluate EdgeMiner using typical network functions and EDP applications such as x264 (a video encoding benchmark) and vips (an image processing benchmark). We show that EdgeMiner can save 13%-90% of CPU utilization when there are no EDP applications and guarantee the QoS of EDP applications when deploying them on the NFV platforms.

The rest of this paper is organized as follows. Section 2 provides the background and further motivates our work. Section 3 characterizes the packet processing behavior on DPDK-based NFV platforms and investigates the capacity potential of the COTS servers. Section 4 proposes our EdgeMiner framework. Section 5 evaluates our design with representative applications. Section 6 introduces related work, and finally Section 7 concludes the paper.

2 BACKGROUND AND MOTIVATION

In this section we briefly introduce the essential concepts of NFV and EDP. Our driving insight is that traditional DPDK-based NFV platforms are generally underutilized in the edge; one can actually


deploy EDP applications as a type of network function (NF) to fully utilize the NFV servers. In such a way, we can provide edge computing services with high cost-efficiency and service agility.

2.1 Network Function Virtualization

NFV proposes to move various network functions from hardware "middleboxes" to software appliances in VMs or containers. Deploying virtualized network functions (VNFs) as software on commodity servers has many advantages, such as reducing cost through workload consolidation, simplifying resource management, fast deployment, and so forth.

In order to achieve line rate for packet processing, today's NFV-enabled servers adopt several optimization strategies. Many NFV platforms take advantage of state-of-the-art I/O libraries such as Intel DPDK [18], PF_RING [28] and Netmap [31]. All of the above libraries reduce the performance overheads of the traditional kernel network stack. In order to study NFV platforms in depth, in this paper we mainly focus on NFV platforms using DPDK for high-performance packet I/O.

2.1.1 Packet Processing of DPDK. The Intel DPDK packet I/O library offers a set of primitives that allow users to create efficient user-space NFs on x86 platforms, particularly for high-speed data plane applications. DPDK mainly operates in a polling mode rather than traditional interrupt-driven packet I/O, which reduces the time a packet spends traveling through the server. DPDK also uses huge pages to pre-allocate large regions of memory, which reduces TLB misses and packet transmission overhead. In this case, applications perform DMA operations and access data directly from the NIC without involving kernel processing and memory copies. Moreover, some architectures can further use DDIO (Direct Data I/O), which moves data from the NICs directly into the LLC to reduce memory access latency. Additionally, DPDK requires each VNF to occupy one dedicated CPU core to avoid context switches. All of these efforts are made to provide high-performance packet I/O operations and a high-performance NFV system.

While DPDK achieves very high throughput with aggressive optimization, it sacrifices the efficiency of resource utilization. When the packet transmission rate is low (e.g., below 10 Gigabits/s), DPDK suffers from rather poor CPU utilization due to its reliance on busy polling. Additionally, since it pre-allocates large regions of memory, it unnecessarily wastes memory capacity when the packet access rate is low.
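The busy-polling inefficiency can be illustrated with a minimal, purely conceptual simulation (this is not DPDK code): if a packet arrives only once every `gap` poll iterations, the fraction of polls that do useful work collapses while the core still burns 100% of its cycles.

```python
def poll_utilization(total_polls, gap):
    """Fraction of poll iterations that actually find a packet,
    assuming one packet arrives every `gap` iterations."""
    useful = sum(1 for i in range(total_polls) if i % gap == 0)
    return useful / total_polls

# At line rate a packet may be waiting on every poll; at the edge,
# arrivals are sparse and almost all polls are wasted spins.
print(poll_utilization(10_000, 1))    # 1.0  -> core fully useful
print(poll_utilization(10_000, 100))  # 0.01 -> 99% of polls wasted
```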

2.1.2 Service Function Chaining. Network services often require various NFs. A packet usually traverses a series of sequenced network functions before the final application processes it, which is called a Service Function Chain (SFC). An SFC implemented on NFV platforms can easily scale and change its locations on demand.

ETSI standards [10] show that different NFs have significantly different processing and performance requirements. Some NFs have a high per-core throughput (Mpps), e.g., switches; and some have a throughput as low as a few Kpps, e.g., encryption applications. In this case, one NF in a service chain that drops packets will waste the vast amount of processing resources already spent in earlier NFs of the chain. The imbalanced computing capabilities cause resource waste, and as the length of the SFC increases, this situation gets worse.
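The imbalance argument can be made concrete with a hypothetical chain (the stage rates and per-packet costs below are illustrative, loosely borrowing the cycle counts measured later in Table 2): the chain's throughput is capped by its slowest NF, and every packet dropped at a late stage wastes all cycles already spent upstream.

```python
def chain_throughput(stage_rates_mpps):
    """Effective SFC throughput is bounded by the slowest NF."""
    return min(stage_rates_mpps)

def wasted_cycles(per_pkt_cycles, drop_stage, dropped_pkts):
    """Cycles already spent on packets that stage `drop_stage` then drops."""
    return dropped_pkts * sum(per_pkt_cycles[:drop_stage])

# Hypothetical chain: switch-like NF (10 Mpps), DPI (2 Mpps), encryption (0.05 Mpps)
print(chain_throughput([10, 2, 0.05]))               # 0.05
# 1M packets dropped by stage 2 after costing 40 + 300 cycles each upstream
print(wasted_cycles([40, 300, 3300], 2, 1_000_000))  # 340000000
```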

2.2 Edge Data Processing on NFV Platforms

For traditional IoT systems, raw data generated by edge devices is sent to applications deployed in the Cloud through the network. There are several disadvantages to this Cloud-centric model: 1) large latency overhead: a large amount of data movement through the network causes severe energy and latency overheads; 2) deployment problems: it is impractical to connect all the edge devices with the Cloud; and 3) security and privacy issues: sending the edge data to the Cloud may raise security and privacy concerns.

Rather than constantly moving a huge amount of data to the Cloud, recent years have witnessed a new trend towards edge-oriented data processing. EDP is a natural extension of the evolution of mobile base stations and the convergence of IT and telecommunications networking. Typically, EDP applications can be implemented in a virtualized environment [16, 32]. In other words, the underlying platform that hosts EDP tasks and NFV workloads is quite similar. EDP emphasizes data processing at the edge network, while the NFV platform is focused on processing network packets. With appropriate design and modification, one can actually deploy computational edge services as network functions. This allows operators to benefit as much as possible from their investment, by hosting both VNFs and EDP applications on the same platform.

Nevertheless, it is important not to aggressively oversubscribe the NFV-enabled servers, as both EDP tasks and VNFs have QoS requirements. VNFs generally pursue high throughput and low latency for packet processing. Similarly, EDP applications face very strict deadlines. Moreover, EDP applications are characterized by latency, proximity and real-time requirements. Thus, intelligent orchestration of computing and network resources is of paramount importance.

3 CHARACTERIZING NFV SERVERS

Traditional NFV platforms mainly focus on high performance but overlook resource efficiency. In this section, we conduct extensive experiments to investigate the resource utilization of a DPDK-based platform in terms of CPU and memory utilization under different configurations. The experimental results show that there are lots of underutilized resources, which motivates our design of co-running EDP applications on network infrastructure.

3.1 Experiment Setup

3.1.1 Platform Configuration. Our physical platform configuration is shown in Table 1. The server uses 4 Intel i350 1-Gigabit Ethernet NICs associated with a single socket. The operating system is Ubuntu Linux 16.04 and the version of DPDK is 17.08. All of our experiments are based on openNetVM [42].

The platform architecture in our experiment is shown in Figure 2. When the system starts, a manager thread pre-allocates memory pools (rte_mempool) in Huge Pages to store incoming packets. All threads are pinned to dedicated CPU cores in our experiment. Next, we introduce the flow of packet processing in detail. When packets arrive in the system, the RX thread fetches the data from the NIC with zero copy (①). Then the RX thread looks up the flow table to decide which VNF is selected, and sends the packet to the corresponding VNF's RX queue (②). VNF1 reads packets from its RX queue and processes them using user-defined functions (③). Then, all packets will be sent to VNF1's


TX queue. When the polling mode is used, CPU cores are always busy waiting for and processing packets while VNF1 is running (i.e., busy polling). If the interrupt mode is used, the core pinned to VNF1 sleeps when there are no incoming packets or in other situations (e.g., backpressure). The wake-up thread monitors each NF's information and decides when to activate the VNF thread (④). The TX thread polls to read packets from NF1's TX queue, and then moves the packets into another NF's RX queue (NF2 RX) or sends them to the NIC based on the destination information of the packets (⑤). The packets processed by NF1 will then be processed by NF2 (⑥). Thus, the TX thread moves the packets from NF1 to NF2, and NF2 processes the packets in the same manner as step ③. After NF2 processes the packets, the TX thread checks the packet's destination information and decides to move the packet out of the system (⑦).

Figure 2: Packet processing on NFV platforms

Table 1: Platform configuration

Item          Configuration
COTS System   Ubuntu 16.04, single socket
Processor     Intel Xeon E5-2620 @2.40GHz, 8 physical cores (HyperThreading disabled), 15MB L3 cache per socket, 64KB L1 cache and 256KB L2 cache per core
Memory        32GB DDR4 total, HugeSize=2MB and 1024 HugePages
NIC           Intel i350, 4 ports, each 1 Gigabit

Table 2: Different VNFs' computation cost

Pkt Size      Forward     DPI          AES_encrypt
64 bytes      40 cycles   300 cycles   3.3K cycles
128 bytes     40 cycles   330 cycles   9K cycles
256 bytes     40 cycles   340 cycles   20K cycles
512 bytes     40 cycles   360 cycles   43K cycles
1024 bytes    40 cycles   370 cycles   88K cycles

3.1.2 Network Function Workloads. In our experiments, we choose three typical NFs to explore the resource utilization characteristics of DPDK-based NFV platforms:

Forward: This network function is similar to the IPv4/IPv6 packet forwarding network function. After receiving packets, Forward reads the destination information of the packets and then sends them to the next VNF.

DPI: Deep Packet Inspection (DPI) is an essential security approach in the network. It is applied in network applications including network intrusion detection systems (IDS) and Web application firewalls. This network function can locate, identify, classify, reroute and block packets.

AES_encryption/decryption: This function encrypts/decrypts UDP packets using a specified encryption/decryption algorithm, and then forwards them to specific NFs. Since this function needs to encrypt the packet payload using the encryption algorithm, it is a computation-intensive function.

Table 2 shows the computation cost of processing one packet under different packet sizes. Forward consumes only about 40 cycles, and due to its simple operation its computation cost stays constant as the packet size increases. DPI needs to read both the header and payload of a packet and inspect the packet for security. As the packet size increases, its computation cost increases slightly. As for AES_encrypt, it needs not only to read the header and payload of a packet, but also to process the packet using a complex algorithm. Thus, AES_encrypt consumes the most cycles to process a packet, and its computation cost is highly correlated with the packet size.

3.1.3 Test Traffic. In this paper, we utilize a DPDK-based packet generator to generate packets with different packet sizes and different transmission rates. Since AES_encrypt can only process UDP packets, all the experiments are based on UDP packets. With the 4 ports in our packet generator server, we can generate up to 4 Gb/s of network traffic. The packet generator runs on a separate server and connects to the test server directly.
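From the per-packet cycle costs in Table 2 one can sketch each NF's per-core throughput ceiling. This is a back-of-the-envelope bound of our own (not a measurement from the paper): it assumes the 2.4 GHz clock from Table 1 and ignores all I/O and queueing overheads.

```python
CLOCK_HZ = 2.4e9  # processor frequency from Table 1

def max_pps(cycles_per_packet):
    """Upper bound on packets/second one core can process."""
    return CLOCK_HZ / cycles_per_packet

# 64-byte-packet costs from Table 2
print(max_pps(40) / 1e6)    # 60.0   Mpps ceiling for Forward
print(max_pps(300) / 1e6)   # 8.0    Mpps ceiling for DPI
print(max_pps(3300) / 1e6)  # ~0.727 Mpps ceiling for AES_encrypt
```

The three-orders-of-magnitude spread is exactly the per-core imbalance that makes service chains waste resources.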

3.2 VNF Memory Bandwidth

Since memory access is an important component when VNFs process packets, it is necessary to study the memory utilization characteristics of VNFs. We use events provided by the Performance Monitor Unit [6] to monitor the memory bandwidth. The resource monitor distills crucial information for the system to make informed decisions. We use the libpfm4 library to translate human-readable performance event names into machine-readable event codes. The monitor uses the perf_event interface of Linux to access hardware performance counters. The performance event used in the memory bandwidth monitor is hswep_unc_imcX::UNC_M_CAS_COUNT. This register counts the number of DRAM CAS commands recorded at the integrated memory controller (IMC) at CPU socket X. Additionally, to obtain the peak memory bandwidth of our server, we utilize the STREAM benchmark [3], which measures a peak memory bandwidth of 18.4 GB/s.
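Each DRAM CAS command counted by the IMC moves one 64-byte cache line, so the counter converts to bandwidth with simple arithmetic. The sketch below shows that conversion; the 13.4M sample count is a hypothetical value chosen so the result lands near the highest usage reported in this section (about 4.7% of the 18.4 GB/s STREAM peak).

```python
CACHE_LINE_BYTES = 64  # one DRAM CAS command transfers one cache line

def mem_bandwidth_gbs(cas_count, interval_s):
    """Bandwidth implied by UNC_M_CAS_COUNT over a sampling interval."""
    return cas_count * CACHE_LINE_BYTES / interval_s / 1e9

# e.g. 13.4M CAS commands in a 1-second window ≈ 0.86 GB/s,
# roughly 4.7% of the measured 18.4 GB/s peak
observed = mem_bandwidth_gbs(13.4e6, 1.0)
print(round(observed, 2), round(observed / 18.4 * 100, 1))
```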

As shown in Figure 3, we present the memory bandwidth utilization of the three network functions with different packet sizes and different transmission rates. For each VNF, the memory bandwidth usage grows as the packet size and transmission rate increase. It is clear that the memory bandwidth usage is very small in all the experiments; the highest usage, from VNF AES_encrypt with 1024-byte packets at 4 Gb/s, is only 4.7% of the server's peak bandwidth. In this case, we consider that the VNFs in our experiments are not memory-bandwidth bound. Thus, we will study the characteristics of CPU utilization in detail and focus on CPU utilization in our framework.

Figure 3: Memory bandwidth usage of different VNFs under different packet rates

Figure 4: CPU utilization of on-demand interrupt

3.3 VNF CPU Utilization

In this subsection, we set up experiments to explore the CPU utilization of VNFs. Under the DPDK polling mode, NFs that busy-wait for packets waste lots of CPU resources in low-speed networks. Thus, an interrupt mode would be helpful for saving these resources, as is done in NFV platforms such as Netmap [31] and ClickOS [27].
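The two interrupt policies examined in this section (on-demand: wake the NF whenever its RX queue becomes non-empty; batch: wake it only once the queue reaches a batch size) can be contrasted with a toy, deterministic event simulation. This is our own illustration, not the platform's code; it counts wakeups, each of which carries context-switch cost.

```python
def count_wakeups(arrivals, batch=1):
    """Count NF thread wakeups under a queue-length-triggered policy.
    `arrivals` lists packets arriving per tick; the NF drains its whole
    RX queue each time it is woken. batch=1 models on-demand interrupt."""
    queued, wakeups = 0, 0
    for pkts in arrivals:
        queued += pkts
        if queued >= batch:
            wakeups += 1  # wakeup semaphore fires, NF drains its RX queue
            queued = 0
    return wakeups

sparse_traffic = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]  # low edge packet rate
print(count_wakeups(sparse_traffic, batch=1))  # 6 wakeups (one per packet)
print(count_wakeups(sparse_traffic, batch=3))  # 2 wakeups
```

Fewer wakeups mean fewer SLEEP/ACTIVE transitions and less System CPU, which is the intuition behind the batch-interrupt results below.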

because the packet rate decreases and the NF has larger probabilityto become the SLEEP status.

Considering th at a VNF needs to switch between the SLEEPand ACTIVE status in th e interrupt mode, the context switch andother extra sys tem costs are inev itable. To obta in a detailed CPUutili zation breakd own of interrupt mode, we collect two kinds ofCPU utilization: User CPU, which is occupied by the user applica­tion; and System CPU, which contains CPU time utilized by systeminterrupts, context swi tches and othe r sys tem operations. In th ispart , we only explore the CPU utilization of Forward and DPI sinceAes_encrypt are always busy.

Figure 5 shows a breakdown of CPU utili zation ofVNFs in theon-demand interrupt mode. Except for 64Bytes packet processingin DPI, all the other resu lts show that user-defined functions onlyaccount for less than 50%total CPU cycles. When running Forward,there is only at most 36%CPU utili zation on user applications andthe othe rs are wasted by system opera tions. Both results showhu ge resource waste and with th e packet size increasin g, systemopera tion cost becomes more severe.

However, the wasted CPU utilization is beyond what we can seein the above cases. Considering that when the packet size is 64 bytes,the VNF will receive a packet every 1488 cycles with 2.4GHz fre­quency. Note that Forward processes a packet with about 40 cycles.Thu s, CPU utilization for processing a packet is 40/ 1488 "" 3%. Eventhough this procedure contains other user-defined applications suchas reading packets from the RX queue and writing packets into theTX queue, 3%and 82%have too large difference.

To address the shortcoming of on-demand interrupt, we design a batch interrupt and measure the system as follows: when the VNF's RX queue is empty, the NF thread enters the SLEEP status to save CPU cycles. The Wakeup Thread monitors the VNF's RX queue, and once the size of the RX queue exceeds the batch size, it sends the ACTIVE semaphore to the VNF thread.
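The batching policy above can be sketched as a small single-threaded simulation (the queue stands in for the DPDK RX ring; function and variable names are ours, not the paper's implementation):

```python
from collections import deque

def simulate_batch_interrupt(arrivals: int, batch_size: int):
    """Simulate the batch-interrupt policy: the NF stays asleep until
    batch_size packets are queued, then wakes once and drains the
    whole queue. Returns (processed, wakeups, leftover)."""
    rx = deque()
    processed = wakeups = 0
    for pkt in range(arrivals):
        rx.append(pkt)
        if len(rx) >= batch_size:      # Wakeup Thread fires ACTIVE
            wakeups += 1
            while rx:                  # NF drains the batch, then sleeps
                rx.popleft()
                processed += 1
    # Packets below the batch threshold stay queued until the next
    # batch fills up (a real system would add a timeout for them).
    return processed, wakeups, len(rx)

print(simulate_batch_interrupt(arrivals=100, batch_size=32))  # (96, 3, 4)
print(simulate_batch_interrupt(arrivals=100, batch_size=1))   # (100, 100, 0)
```

With batch_size=1 this degenerates to on-demand interrupt: 100 arrivals cost 100 wakeups, whereas a batch of 32 needs only 3, trading a little queueing delay for far fewer context switches.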

Figure 6 shows the CPU utilization in the batch interrupt mode. Both Forward and DPI save a great amount of CPU utilization. In addition to the total CPU utilization, we also explore the breakdown of CPU utilization in this batch setting. With batch interrupt, user CPU utilization occupies a larger ratio in both VNFs. Even in the worst case, when the packet size is 1024 bytes, user CPU utilization occupies 67% and 82% in Forward and DPI, respectively. In this case, batch interrupt is an effective way to reduce wasted CPU cycles.

Figure 4: CPU utilization of Forward, DPI, and Aes_encrypt under the polling-only and on-demand interrupt modes, for packet sizes from 64 to 1024 bytes.

3.3.1 CPU Utilization of a Single VNF. On-demand interrupt is used in Netmap and ClickOS: if there is no packet in the VNF's RX queue, the NF thread enters the SLEEP status and waits for the wakeup semaphore dedicated to it. Meanwhile, the wakeup thread monitors the VNF's RX queue. Once the RX queue is not empty, the wakeup thread sends the semaphore to wake up the VNF thread immediately.

Figure 4 shows the CPU utilization of different applications using this strategy. Since the polling mode of DPDK always busy-waits and processes packets, the CPU utilization of the polling-only mode remains at 100%. The CPU utilization of Aes_encrypt is also 100% in both polling and interrupt modes due to its high computation cost. DPI and Forward reveal important CPU resource-wasting behavior on DPDK-based platforms: as the packet size grows, the CPU utilization of the NF decreases, mainly because the packet arrival rate drops.




Figure 5: CPU utilization breakdown of on-demand interrupt

Figure 6: CPU utilization breakdown for batch interrupt

Figure 7: CPU utilization with varying interrupt size ((a) Forward, (b) DPI)

Figure 8: An example of an SFC with different costs. Packets (1.488 Mpps) enter Forward (capacity 1.488 Mpps), whose output exceeds the capacity of the downstream Aes_encrypt (0.82 Mpps), so 0.688 Mpps of already-processed packets are dropped.



Batch interrupt is thus a good way to save CPU cycles for energy-saving or data-processing purposes.

To understand the relationship between batch size and CPU utilization, we design a fine-grained experiment in which we vary the batch size from 1 to 100. As shown in Figure 7, as the interrupt size increases, the CPU utilization of the VNFs decreases. No packets are dropped in any of these experiments; in other words, the desired throughput of the VNFs is maintained. Our results show that smartly choosing the interrupt size allows us to save more resources.

3.3.2 CPU Utilization in a Service Function Chain. In the above subsections, we discussed the CPU utilization of a single VNF on DPDK-based NFV platforms with different configurations. In the real world, network functions are chained to process the packets assigned by users, so the CPU utilization of a Service Function Chain (SFC) is also an important design consideration. Due to imbalanced computation costs in an SFC, a downstream VNF can be overloaded and drop packets that have already been processed by the upstream NF. The resources used by the upstream NF are wasted if the downstream NF has a larger computation cost than the upstream VNF [11]. Figure 8 shows such a situation where wasted work exists.

To avoid wasted work due to the imbalance of VNFs in an SFC, we design and implement a local backpressure algorithm, as shown in Figure 9. The Wakeup Thread checks the state of the upstream VNF (Forward/DPI) thread and the downstream VNF (Aes_encrypt).

Figure 9: Local backpressure diagram

Once the queue size of Aes_encrypt exceeds the maximum threshold, the Forward/DPI thread enters the SLEEP status; when the queue size of Aes_encrypt falls below the minimum threshold, the Wakeup Thread immediately wakes up the Forward/DPI thread to continue processing packets. In this way, we can guarantee the throughput of the whole SFC and reduce wasted CPU utilization.
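The hysteresis rule above can be sketched as a small simulation (class and parameter names are ours; in the paper this check is performed by the Wakeup Thread on the downstream RX queue):

```python
from collections import deque

class LocalBackpressure:
    """Sketch of local backpressure: the upstream NF sleeps when the
    downstream queue length exceeds max_th and resumes below min_th."""

    def __init__(self, min_th: int, max_th: int):
        self.min_th, self.max_th = min_th, max_th
        self.downstream_q = deque()
        self.upstream_active = True

    def tick(self, arrivals: int, downstream_capacity: int) -> None:
        # Upstream forwards packets only while ACTIVE.
        if self.upstream_active:
            self.downstream_q.extend(range(arrivals))
        # Downstream consumes up to its capacity this tick.
        for _ in range(min(downstream_capacity, len(self.downstream_q))):
            self.downstream_q.popleft()
        # Hysteresis: SLEEP above max_th, wake up below min_th.
        qlen = len(self.downstream_q)
        if qlen > self.max_th:
            self.upstream_active = False
        elif qlen < self.min_th:
            self.upstream_active = True

bp = LocalBackpressure(min_th=8, max_th=64)
for _ in range(100):
    bp.tick(arrivals=10, downstream_capacity=6)  # upstream faster than downstream
assert len(bp.downstream_q) <= 64 + 10           # queue stays bounded: no drops
```

Because the upstream NF sleeps instead of producing packets that would be dropped, the downstream queue stays bounded and the cycles the upstream NF would have wasted become available for harvesting.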

We test the CPU utilization of the first network function, as shown in Table 3. Here, too, a large amount of CPU utilization is saved by using backpressure.

All the above experiments aim to characterize CPU utilization on DPDK-based NFV platforms. We mine the free CPU cycles of servers running VNFs and deploy EDP applications on the NFV platform to share this precious computing resource. In the next section, we discuss how EDP can be deployed on the NFV platform while guaranteeing the QoS of both the VNFs and the EDP.

4 DEPLOYING EDGE COMPUTING

Our characterization and evaluation results show that a lot of resources are still wasted on traditional DPDK-based NFV platforms. How to utilize those idle resources presents a great challenge. An intuitive way to improve system utilization is to co-locate different applications on the same server [19]. Considering that a growing number of EDP applications (e.g., image processing, video processing) are actively looking for auxiliary computing resources at the edge, it would be highly profitable to smartly deploy them on NFV platforms.




Table 3: Influence of backpressure

Packet   Polling Mode            Backpressure            Forward    DPI
Size     Service    Drop         Service    Drop         CPU Util.  CPU Util.
         (Mpps)     (Mpps)       (Mpps)     (Mpps)
64B      1.488      0.688        0.87       0.07         9%         20%
128B     0.844      0.524        0.33       0.04         3%         7.6%
256B     0.453      0.313        0.142      0.012        1.5%       3.3%
512B     0.23       0.163        0.0668     0.008        0.7%       1.7%
1024B    0.12       0.0877       0.0323     0.003        0.3%       1%


Figure 10: Resource saving with core sharing

By smartly deploying EDP applications on NFV platforms, we can jointly improve network infrastructure utilization and reduce the number of required edge servers, as shown in Figure 10.

In this paper, we propose EdgeMiner, a resource harvesting strategy for deploying EDP applications on flexible network infrastructure. Some prior works on application co-location in data centers allow latency-critical and batch applications to share memory-system resources (e.g., last-level cache, DRAM bandwidth) [26, 39]. However, very few studies focus on sharing CPU cores for edge computation augmentation. Prior work [19] enables core sharing in a data center, but it mainly emphasizes sharing-aware dynamic voltage and frequency scaling (DVFS). Different from prior art, EdgeMiner is a simple but effective method that allows packet processing tasks and EDP applications to co-run on shared cores.

Co-running EDP applications and VNFs is non-trivial. Due to transmission latency, unfinished EDP applications cannot be moved to the cloud at the last minute; these tasks have to respect a strict deadline and be well managed. We also cannot simply schedule EDP applications only during the server's idle periods, since there is limited CPU slack time to exploit on current NFV platforms. Thus, we design a scheduling algorithm to guarantee the QoS of EDP applications.

4.1 Overview of EdgeMiner

EdgeMiner improves the CPU utilization of edge servers while meeting the QoS of EDP applications. The overview of EdgeMiner is shown in Figure 11. The framework mainly contains

three modules: EDP profiling, sharing initialization, and dynamic tuning. Importantly, we use a metric called QoS Urgency (QU) to evaluate the difficulty of achieving the QoS:

QU = actual runtime / time limit    (1)

As an EDP application approaches its deadline, its QU becomes larger. In this case, it often becomes more difficult to meet the target QoS due to the limited remaining time and the greatly increased computing resource requirement. In the following subsections, we discuss the three components in detail.
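Eq. (1) is a simple ratio; a tiny helper makes the behavior near the deadline concrete (the function name is ours):

```python
def qos_urgency(actual_runtime: float, time_limit: float) -> float:
    """QU from Eq. (1): approaches 1 as a task nears its time limit."""
    return actual_runtime / time_limit

print(qos_urgency(2.0, 10.0))  # 0.2: relaxed, can share a busy core
print(qos_urgency(9.0, 10.0))  # 0.9: urgent, needs a lightly loaded core
```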

4.1.1 EDP Profiling. In this component, we characterize common EDP applications and record their approximate runtimes. We can thus estimate the CPU cycles needed to process an incoming edge application based on profiling results such as the size of its data. Additionally, each EDP application has its own QoS requirement, which we treat as the time limit. After profiling, we can calculate the QU of each EDP application; this is the most important factor considered in the following steps.

4.1.2 Sharing Initialization. In this component, we decide how to co-run the NFs and the EDP applications. In Figure 11, different cores and different EDP applications are labeled with different colors: the core color represents the CPU utilization used by the VNF, and the color of an EDP application represents its QU value. In the initialization phase, we sort the EDP applications by QU and the cores by their CPU utilization. We then co-run the lowest-QU EDP application with the highest-utilization core, and the highest-QU EDP application with the lowest-utilization core. If two EDP applications have the same QU, we sort them by time limit. If there are more EDP applications than NFs, the extra EDP applications run on idle cores; if there are more NFs than EDP applications, some cores host only VNFs. Under this mechanism, the VNFs with the lowest utilization occupy their cores without sharing, which saves power due to their low CPU consumption.
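The pairing rule above can be sketched as a sort-and-zip (a minimal sketch under our own naming; the tie-break direction on equal QU is our assumption, since the text only says ties are ordered by time limit):

```python
def qos_urgency(actual_runtime: float, time_limit: float) -> float:
    # Eq. (1): QU grows as the task nears its time limit
    return actual_runtime / time_limit

def initialize_sharing(edp_apps, core_utils):
    """Pair EDP apps with cores: least-urgent app on the busiest core,
    most-urgent app on the most idle core.

    edp_apps: list of (name, actual_runtime, time_limit) tuples.
    core_utils: dict mapping core id -> VNF CPU utilization on that core.
    """
    # Sort apps by ascending QU, breaking ties by time limit.
    apps = sorted(edp_apps, key=lambda a: (qos_urgency(a[1], a[2]), a[2]))
    # Sort cores by descending VNF utilization.
    cores = sorted(core_utils, key=core_utils.get, reverse=True)
    # Leftover apps would go to idle cores; leftover cores keep their VNF alone.
    return {app[0]: core for app, core in zip(apps, cores)}

placement = initialize_sharing(
    [("vips", 2.0, 10.0), ("x264", 8.0, 10.0)],  # QU = 0.2 and 0.8
    {0: 0.9, 1: 0.3},                            # VNF CPU utilization per core
)
print(placement)  # {'vips': 0, 'x264': 1}
```

The relaxed vips (QU 0.2) lands on the heavily loaded core 0, leaving the lightly loaded core 1 for the urgent x264.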

4.1.3 Dynamic Tuning. The above co-running mechanism alone may not meet the EDP applications' target QoS. To meet the QoS requirement, we monitor the EDP applications and check whether an EDP application should move to an idle core or to a core with more spare CPU resources. We call this mechanism Dynamic Search Core (DSC), as shown in Algorithm 1. It allows EdgeMiner to efficiently tap into current NFV-ready servers.

5 EVALUATION

In this section, we first evaluate the performance of EDP applications when co-running with VNFs using batch interrupt and the backpressure algorithm. Second, we demonstrate the performance of our DSC algorithm.

5.1 Workload

We introduced the VNFs and the DPDK-based platform in Section 3. Since we mainly focus on CPU utilization, we select CPU-bound applications as our benchmarks. We choose two applications from PARSEC [35].



Figure 11: Overview of EdgeMiner. EDP applications (e.g., text, video, image, and speech processing) are profiled for runtime and target QoS, mapped to cores during sharing initialization according to QU and per-core VNF CPU utilization, and adjusted at runtime by dynamic tuning driven by the QoS monitor.

Algorithm 1 Dynamic Search Core Algorithm
Definition:
  T_real: time really elapsed
  T_target: longest allowed time of the edge data processing
  T_single: runtime of the EDP when running on a single core
  T_trans: time budget for changing core
  Ratio_i: ratio of the NF using the i-th core

 1: for each EDP running with core_j do
 2:   Get T_real
 3:   Find the CORESET whose Ratio < (1 - QU)
 4:   for core_k in CORESET do
 5:     T_trans = ((1 - Ratio_k) * T_target - T_single) / (Ratio_j - Ratio_k)
 6:     if T_trans > T_real then
 7:       continue
 8:     else if T_trans < T_real then
 9:       remove core_k from CORESET
10:     else
11:       change EDP from core_j to core_k
12:     end if
13:   end for
14: end for
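A direct transcription of Algorithm 1 into Python follows (a sketch, not the paper's implementation: the line-5 formula and the branch structure follow our reading of the pseudocode, which migrates exactly when the transfer budget T_trans meets the elapsed time; a practical variant would likely migrate whenever T_trans <= T_real):

```python
def dynamic_search_core(qu, ratio, core_j, t_real, t_target, t_single):
    """Pick the core an EDP task should move to, or core_j to stay.

    qu: QoS urgency of the task (Eq. (1)).
    ratio: dict mapping core id -> fraction of that core used by its NF.
    core_j: core the task currently shares.
    """
    # Line 3: candidate cores whose NF load leaves enough slack for the task.
    coreset = [k for k in ratio if k != core_j and ratio[k] < (1 - qu)]
    for core_k in list(coreset):
        # Line 5: time budget for migrating the task to core_k.
        t_trans = ((1 - ratio[core_k]) * t_target - t_single) / (
            ratio[core_j] - ratio[core_k])
        if t_trans > t_real:      # budget not yet exhausted: keep looking
            continue
        elif t_trans < t_real:    # too late for this core: discard it
            coreset.remove(core_k)
        else:                     # budget exactly reached: migrate now
            return core_k
    return core_j

# Migrate when the budget for core 1 matches the elapsed time exactly.
print(dynamic_search_core(0.5, {0: 0.75, 1: 0.25}, 0, 8.0, 8.0, 2.0))  # 1
# Early on, every candidate still has slack, so the task stays on core 0.
print(dynamic_search_core(0.5, {0: 0.75, 1: 0.25}, 0, 1.0, 8.0, 2.0))  # 0
```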

As Figure 12 shows, the batch interrupt scheme sharply reduces the EDP application's processing time compared with the on-demand interrupt mechanism. It is evident that we can harvest more CPU resources on VNF-ready servers to process EDP applications.

5.3 Effectiveness of the Backpressure Algorithm

As mentioned in Section 3.3.2, the local backpressure algorithm can save CPU utilization in an SFC without affecting the performance of network services. In this experiment, we use the backpressure algorithm to harvest free CPU cycles and measure the performance of EDP applications. As shown in Figure 13, the normalized performance degradation of the co-located EDP applications is less than 15%. Importantly, our results show that backpressure only slightly delays EDP applications when the packet size is large (e.g., more than 256 B). In other words, we can provide high-performance processing of EDP applications depending on the characteristics of the network packets.

Table 4: Edge processing workloads

Name   Information        Workload
vips   image processing   image with 18K * 18K pixels
x264   video encoding     30 fps / 640 * 360 pixels

The detailed information of the workloads is shown in Table 4. Both workloads are popular in EDP applications: vips is an image processing library, and x264 encodes video streams into the H.264 compression format.

5.2 Effectiveness of Batch Interrupt

We evaluate the impact of the batch packet processing mechanism on EDP application performance. We run EDP applications on cores that are also used by VNFs and record the EDP applications' processing time. Figure 12 shows the normalized processing time of the different co-running combinations.

5.4 QoS of Edge Data Processing

Although the batch interrupt and backpressure schemes provide a way to exploit free computing resources, they alone cannot provide a satisfactory QoS guarantee. To ensure better QoS for EDP applications, computing resources must be scheduled carefully. We use the 90% QoS policy (the QoS target performance is 90% of solo performance) to evaluate our DSC algorithm. Under the 90% QoS policy, a runtime degradation of less than 1.11x has no impact on the QoS.
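The 1.11x figure follows directly from the policy definition: if performance may drop to 90% of solo, runtime may grow to 1/0.9 of solo (helper name is ours):

```python
def runtime_budget(qos_policy: float) -> float:
    """Allowed runtime inflation under a QoS policy expressed as a
    fraction of solo performance (e.g. 0.90 for the 90% policy)."""
    return 1.0 / qos_policy

print(round(runtime_budget(0.90), 2))  # 1.11: degradation below 1.11x meets QoS
print(round(runtime_budget(0.95), 2))  # 1.05: the stricter 95% policy
```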

In the above experiment, when the EDP applications (both x264 and vips) co-run with Forward, all EDP degradations are below 1.11x (the largest is 1.08, when the packet size is 64 bytes and Forward co-runs with x264). In this case, using batch packet processing, we can guarantee the QoS of certain EDP applications even without extra compensation. Moreover, we also test a 95% QoS policy. The results are shown in Figure 14, from which we can see that DSC achieves the QoS of the EDP applications under both the 90% and 95% QoS policies.



Figure 12: The influence of batch processing when co-running the NFs and the edge data processing. (a) Forward+x264, (b) Forward+vips, (c) DPI+x264, (d) DPI+vips. The y-axis is the time when co-running with the NF normalized to the time when running the edge data processing on a single core.


These include threading scheduling [13] and cache-miss reduction [15]. All of these prior works focus on high-performance packet processing to meet the network line rate, but they pay little attention to resource utilization. Beyond these NFV acceleration techniques, NFVnice [21] and Flurries [41] run multiple NFs on a single core to increase core utilization and make the NFV system more flexible. These works are orthogonal to ours: we focus on edge servers and deploy EDP applications on the cores used by NFs to increase core utilization.


6.2 Edge Data Processing

To reduce latency, there is a trend to process data on edge devices close to users. Li et al. [24] develop a sustainable in-situ server system to pre-process raw data near where the data is located, but this work mainly focuses on energy management in an in-situ data center. Lane et al. [22] use a static model to perform deep learning recognition on IoT devices; Song et al. [33] present an autonomous and incremental computing framework and architecture for deep-learning-based IoT applications. All of the above work mainly focuses on processing edge data in a high-performance or sustainable way; it does not consider how to deploy EDP applications on existing systems such as NFV-ready servers.


Figure 13: Normalized runtime when EDP applications co-run with VNFs using the backpressure algorithm ((a) co-run with Forward, (b) co-run with DPI)

Figure 14: Using the DSC algorithm to meet the QoS of EDP applications

6 RELATED WORK

In this section, we discuss representative prior studies in the domains most relevant to our work.

6.1 High-Performance NFV Systems

Many prior works aim to provide high-performance and flexible platforms for NFV. Some enhance individual NF performance to speed up the NFV system [23, 29]; some focus on packet delivery between different NFs to reduce performance degradation [17, 27, 42]; others employ heterogeneous platforms such as GPUs to accelerate packet processing [14, 20]. There are also works that use characteristics of computer architecture to accelerate packet processing.

6.3 Improving Utilization of Data Processing

There have been many prior works on improving the QoS of latency-critical applications [26, 39]. These works use different mechanisms to co-locate a latency-critical application with a batch job on the same server to improve resource utilization, while guaranteeing the QoS of the latency-critical application. Bubble-Up [26] and Bubble-Flux [39] bound performance degradation while improving chip multiprocessor utilization. Additionally, HOPE [12] exploits management workload scheduling and applies graph-based task allocation to improve computation and reduce energy consumption. However, all of these works focus on the interference between latency-critical tasks and batch jobs, or on improving task performance, in a data center. Our work concentrates on the edge computing scenario and on co-running VNFs and EDP applications.



7 CONCLUSION

The rapid growth of IoT applications places demands on innovative data processing infrastructure that is responsive, efficient, and scalable. In this paper, we study the CPU utilization characteristics of DPDK-based NFV platforms in detail to demonstrate the under-utilization of servers in the edge network. We show that one can effectively harvest 13%-90% of free CPU resources on DPDK-based platforms with the batch interrupt mechanism and the backpressure algorithm. Using the saved resources, we propose EdgeMiner, a lightweight edge processing framework that allows VNFs and EDP tasks to co-run on commodity servers, along with scheduling schemes that guarantee the QoS of both types of applications. The proposed design, which expands the breadth and impact of today's network infrastructure, has the potential to greatly contribute to building a more connected, smarter world.

8 ACKNOWLEDGEMENT

This work is partially sponsored by the National R&D Program of China (No. 2018YFB1004800). Yang Hu is supported by NSF grant 1822985. The corresponding author is Chao Li.

REFERENCES
[1] 2014. X10 Protocol. https://buildyoursmarthome.co/home-automation/protocols/x10/.
[2] 2014. Z-Wave. https://buildyoursmarthome.co/home-automation/protocols/z-wave/.
[3] 2016. STREAM Benchmark. http://www.cs.virginia.edu/stream/FTP/Code/.
[4] 2018. ZigBee Alliance. https://www.zigbee.org.
[5] Azure. 2019. Azure Data Box family. https://azure.microsoft.com/en-us/services/storage/databox/.
[6] Intel Corporation. 2017. Intel 64 and IA-32 Architectures Software Developer's Manual. https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-manual-325462.html.
[7] Amir Vahid Dastjerdi and Rajkumar Buyya. 2016. Fog computing: Helping the Internet of Things realize its potential. Computer 49, 8 (2016), 112-116.
[8] Harishchandra Dubey, Jing Yang, Nick Constant, Amir Mohammad Amiri, Qing Yang, and Kunal Makodiya. 2015. Fog data: Enhancing telehealth big data through fog computing. In Proceedings of ASE BigData & SocialInformatics 2015. ACM, 14.
[9] Nathan Eddy. 2015. Gartner: 21 Billion IoT Devices To Invade By 2020. https://www.informationweek.com/mobile/mobile-devices/gartner-21-billion-iot-devices-to-invade-by-2020/d/d-id/1323081.
[10] GS NFV ETSI. 2013. Network functions virtualisation (NFV): Architectural framework. ETSI GS NFV 2, 2 (2013), V1.
[11] Joel Halpern and Carlos Pignataro. 2015. Service Function Chaining (SFC) Architecture. Technical Report.
[12] Yang Hu, Chao Li, Longjun Liu, and Tao Li. 2016. HOPE: Enabling Efficient Service Orchestration in Software-Defined Data Centers. In Proceedings of the 2016 International Conference on Supercomputing (ICS '16). ACM, New York, NY, USA, Article 10, 12 pages. https://doi.org/10.1145/2925426.2926257
[13] Yang Hu and Tao Li. 2016. Towards efficient server architecture for virtualized network function deployment: Implications and implementations. In The 49th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Press, 8.
[14] Yang Hu and Tao Li. 2018. Enabling Efficient Network Service Function Chain Deployment on Heterogeneous Server Platform. In High Performance Computer Architecture (HPCA), 2018 IEEE International Symposium on. IEEE, 27-39.
[15] Yang Hu, Mingcong Song, and Tao Li. 2017. Towards full containerization in containerized network function virtualization. ACM SIGOPS Operating Systems Review 51, 2 (2017), 467-481.
[16] Yun Chao Hu, Milan Patel, Dario Sabella, Nurit Sprecher, and Valerie Young. 2015. Mobile edge computing - A key technology towards 5G. ETSI white paper 11, 11 (2015), 1-16.
[17] Jinho Hwang, K. K. Ramakrishnan, and Timothy Wood. 2015. NetVM: High performance and flexible networking using virtualization on commodity platforms. IEEE Transactions on Network and Service Management 12, 1 (2015), 34-47.
[18] Intel. 2012. Data Plane Development Kit. https://www.dpdk.org/.
[19] Harshad Kasture, Davide B. Bartolini, Nathan Beckmann, and Daniel Sanchez. 2015. Rubik: Fast analytical power management for latency-critical systems. In Microarchitecture (MICRO), 2015 48th Annual IEEE/ACM International Symposium on. IEEE, 598-610.
[20] Joongi Kim, Keon Jang, Keunhong Lee, Sangwook Ma, Junhyun Shim, and Sue Moon. 2015. NBA (network balancing act): A high-performance packet processing framework for heterogeneous processors. In Proceedings of the Tenth European Conference on Computer Systems. ACM, 22.
[21] Sameer G. Kulkarni, Wei Zhang, Jinho Hwang, Shriram Rajagopalan, K. K. Ramakrishnan, Timothy Wood, Mayutan Arumaithurai, and Xiaoming Fu. 2017. NFVnice: Dynamic backpressure and scheduling for NFV service chains. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication. ACM, 71-84.
[22] Nicholas D. Lane, Sourav Bhattacharya, Petko Georgiev, Claudio Forlivesi, and Fahim Kawsar. 2015. An early resource characterization of deep learning on wearables, smartphones and internet-of-things devices. In Proceedings of the 2015 International Workshop on Internet of Things towards Applications. ACM, 7-12.
[23] Bojie Li, Kun Tan, Layong Larry Luo, Yanqing Peng, Renqian Luo, Ningyi Xu, Yongqiang Xiong, Peng Cheng, and Enhong Chen. 2016. ClickNP: Highly flexible and high performance network processing with reconfigurable hardware. In Proceedings of the 2016 ACM SIGCOMM Conference. ACM, 1-14.
[24] Chao Li, Yang Hu, Longjun Liu, Juncheng Gu, Mingcong Song, Xiaoyao Liang, Jingling Yuan, and Tao Li. 2015. Towards sustainable in-situ server systems in the big data era. In ACM SIGARCH Computer Architecture News, Vol. 43. ACM, 14-26.
[25] Chao Li, Yushu Xue, Jing Wang, Weigong Zhang, and Tao Li. 2018. Edge-Oriented Computing Paradigms: A Survey on Architecture Design and System Management. ACM Computing Surveys (CSUR) 51, 2 (2018), 39.
[26] Jason Mars, Lingjia Tang, Robert Hundt, Kevin Skadron, and Mary Lou Soffa. 2011. Bubble-Up: Increasing utilization in modern warehouse scale computers via sensible co-locations. In Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture. ACM, 248-259.
[27] Joao Martins, Mohamed Ahmed, Costin Raiciu, Vladimir Olteanu, Michio Honda, Roberto Bifulco, and Felipe Huici. 2014. ClickOS and the art of network function virtualization. In Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation. USENIX Association, 459-473.
[28] ntop. 2015. PF_RING ZC. http://www.ntop.org/products/pf_ring/pf_ring-zc-zero-copy/.
[29] Aurojit Panda, Sangjin Han, Keon Jang, Melvin Walls, Sylvia Ratnasamy, and Scott Shenker. 2016. NetBricks: Taking the V out of NFV. In OSDI. 203-216.
[30] Rajesh Ghai, Petr Jirovsky, and Rohit Mehra. [n. d.]. Worldwide vCPE/uCPE Forecast, 2017-2021: NFV at the Network Edge. https://www.idc.com/getdoc.jsp?containerId=US41429616.
[31] Luigi Rizzo. 2012. Netmap: A novel framework for fast packet I/O. In 21st USENIX Security Symposium (USENIX Security 12). 101-112.
[32] Mahadev Satyanarayanan, Victor Bahl, Ramon Caceres, and Nigel Davies. 2009. The case for VM-based cloudlets in mobile computing. IEEE Pervasive Computing (2009).
[33] Mingcong Song, Kan Zhong, Jiaqi Zhang, Yang Hu, Duo Liu, Weigong Zhang, Jing Wang, and Tao Li. 2018. In-Situ AI: Towards Autonomous and Incremental Deep Learning for IoT Systems. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 92-103.
[34] Bo Tang, Zhen Chen, Gerald Hefferman, Tao Wei, Haibo He, and Qing Yang. 2015. A hierarchical distributed fog computing architecture for big data analysis in smart cities. In Proceedings of ASE BigData & SocialInformatics 2015. ACM, 28.
[35] Princeton University. 2010. PARSEC. http://parsec.cs.princeton.edu/.
[36] Aosen Wang, Lizhong Chen, and Wenyao Xu. 2017. XPro: A cross-end processing architecture for data analytics in wearables. In ACM SIGARCH Computer Architecture News, Vol. 45. ACM, 69-80.
[37] Dale Willis, Arkodeb Dasgupta, and Suman Banerjee. 2014. ParaDrop: A multi-tenant platform to dynamically install third party services on wireless gateways. In Proceedings of the 9th ACM Workshop on Mobility in the Evolving Internet Architecture. ACM, 43-48.
[38] Yi Xu and Sumi Helal. 2014. Application caching for cloud-sensor systems. In Proceedings of the 17th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems. ACM, 303-306.
[39] Hailong Yang, Alex Breslow, Jason Mars, and Lingjia Tang. 2013. Bubble-Flux: Precise online QoS management for increased utilization in warehouse scale computers. In SIGARCH Computer Architecture News, Vol. 41. ACM, 607-618.
[40] Yishay Yovel. 2017. Why NFV is Long on Hype, Short on Value. https://www.catonetworks.com/blog/why-nfv-is-long-on-hype-short-on-value/.
[41] Wei Zhang, Jinho Hwang, Shriram Rajagopalan, K. K. Ramakrishnan, and Timothy Wood. 2016. Flurries: Countless fine-grained NFs for flexible per-flow customization. In Proceedings of the 12th International Conference on emerging Networking EXperiments and Technologies. ACM, 3-17.
[42] Wei Zhang, Guyue Liu, Wenhui Zhang, Neel Shah, Phillip Lopreiato, Gregoire Todeschi, K. K. Ramakrishnan, and Timothy Wood. 2016. OpenNetVM: A platform for high performance network service chains. In Proceedings of the 2016 Workshop on Hot Topics in Middleboxes and Network Function Virtualization. ACM, 26-31.