D2.1 Application analysis and system requirements
Page 1 of 28
This document is Confidential, and was produced under the RAPID project (EC contract 644312).
Project No: 644312
D2.1 Application analysis and system requirements
June 16, 2015
Abstract:
This deliverable describes the application analysis and the system requirements for the selected use-cases: a real-time
face recognition engine, a hand tracking application, and an antivirus application. A brief introduction of the three
use-cases is presented first, followed by the performance/bandwidth requirements and runtime library dependencies.
Finally, the document concludes with a detailed review of the validation scenarios.
Document Manager
David Oro Herta Security
Document Id N°: rapid_D2.1 Version: 2.4 Date: 16/6/2015
Filename: rapid_D2.1_v2.4.docx
Confidentiality
This document contains proprietary and confidential material of certain RAPID contractors, and may not be
reproduced, copied, or disclosed without appropriate permission. The commercial use of any information
contained in this document may require a license from the proprietor of that information.
Ref. Ares(2015)2565621 - 18/06/2015
The RAPID Consortium consists of the following partners:
Participant No. Participant Organisation Names Short Name Country
1 Foundation of Research and Technology Hellas FORTH Greece
2 Sapienza University of Rome UROME Italy
3 Atos Spain S.A. ATOS Spain
4 Queen's University Belfast QUB United Kingdom
5 Herta Security S.L. HERTA Spain
6 SingularLogic S.A. SILO Greece
7 University of Naples "Parthenope" UNP Italy
The information in this document is provided “as is” and no guarantee or warranty is given that the
information is fit for any particular purpose. The user thereof uses the information at its sole risk and
liability.
Revision history
Version Author Notes
0.5 Javier Vera Initial version
1.0 David Oro Updated introduction and description of use-case apps
1.1 David Oro Added benchmarks
1.2 Iakovos Mavroidis Initial hand tracking use-case description
1.3 David Oro Proofreading and minor format/spacing changes
1.4 Nikolaos Kyriazis Description of the hand tracking application completed
1.5 Giorgos Vasiliadis Added description of the antivirus application
1.6 Carles Fernández Final formatting and homogenization of sections
1.7 Nikolaos Kyriazis Added required CUDA calls for hand tracking application
1.8 Javier Vera Added required CUDA calls and assembly
1.9 Carles Fernández Incorporated changes from SILO
2.0 Iakovos Mavroidis Added formal list with requirements from Herta and FORTH
2.1 David Oro Reformatted the text and modified table spacing
2.2 David Oro Text modification to ensure quality compliance
2.3 Iakovos Mavroidis Text modification to ensure quality compliance
2.4 Fco. Javier Nieto Modifications in requirements tables according to comments
Table of Contents
1. Introduction .................................................................................................................................. 5
1.1. Glossary of Acronyms ............................................................................................................. 6
2. BioSurveillance Application ........................................................................................................ 8
2.1. Client-Server Mode Analysis ................................................................................................... 9
2.2. Requirements ......................................................................................................................... 10
3. Kinect 3D Hand Tracking Application ...................................................................................... 12
3.1. Standalone Mode Analysis .................................................................................................... 13
3.2. Client-Server Mode Analysis ................................................................................................. 14
3.3. Requirements ......................................................................................................................... 16
4. Antivirus Application ................................................................................................................ 18
5. Application Requirements and Validation Scenarios ................................................................ 20
References .......................................................................................................................................... 28
List of Figures
Figure 1: RAPID infrastructure ............................................................................................................... 5
Figure 2: Tegra K1 development board ................................................................................................... 6
Figure 3: Tegra X1 development board ................................................................................................... 6
Figure 4: Face recognition pipeline ......................................................................................................... 8
Figure 5: Face recognition use-case ....................................................................................................... 10
Figure 6: Graphical illustration of the proposed method. A Kinect RGB image (a) and the
corresponding depth map (b). The hand is segmented (c) by jointly considering skin color and depth.
The proposed method fits the employed hand model (d) to this observation recovering the hand
articulation (e). ....................................................................................................................................... 12
Figure 7: A basic schematic of the manually derived client-server decomposition of the 3D hand
tracking application................................................................................................................................ 14
Figure 8: Antivirus application. Files are mapped onto pinned memory that can be copied via DMA
to the graphics card. The matching engine performs a first-pass filtering on the GPU and returns
potential true positives to the CPU for further checking. ....................................................................... 19
Executive Summary
The RAPID project aims to transparently offload compute-intensive kernel computations from
low-power devices such as tablets and/or smartphones to a remote cloud populated with high-
performance multicore CPUs and GPUs. Under this scenario, devices designed for power envelopes
as large as 150 W are expected to run in a datacenter that is not power-constrained. From a
user-level perspective, the main goal of the project is thus to build a state-of-the-art acceleration as a
service (AaaS) platform with QoS capabilities on top of a virtualized pool of nodes.
This deliverable describes the application analysis requirements of three use-cases developed by
HERTA and FORTH research partners. These use-cases were carefully designed to test and determine
how the full-blown capabilities of the RAPID infrastructure would perform under real-world
workloads. HERTA provides a GPU-based face recognition application called BioSurveillance that
works with standard IP or USB cameras. The design of such software is scalable in bandwidth,
throughput, power, and I/O requirements. Even with an embedded device such as the NVIDIA Tegra
K1 or NVIDIA Tegra X1, the face processing engine is capable of yielding real-time performance.
On the other hand, FORTH provides two innovative applications. The first relies on a Kinect 3D
depth sensor to provide an interactive graphical hand tracking experience. This task is fully
parallelizable, and could thus also benefit from offloading compute-intensive operations to the
RAPID cloud. The second application is antivirus software that performs pattern matching, also in
parallel, using GPUs.
1. Introduction
RAPID will develop an efficient heterogeneous CPU-GPU cloud computing infrastructure, which can
be used to seamlessly offload CPU-based and GPU-based tasks of applications running on low-power
devices (smartphones, notebooks, tablets, portable/wearable devices, robots, and cars) over a
heterogeneous network (HetNet) to more powerful devices such as NVIDIA Tesla GPUs or Intel
Core i7 multicore CPUs. The idea behind this infrastructure is to dynamically offload
compute-intensive applications that cannot be executed on embedded platforms, due to the
performance limitations imposed by power constraints, to a remote cloud infrastructure populated
with both powerful CPUs and GPUs.
Figure 1: RAPID infrastructure
As depicted in Figure 1, the RAPID infrastructure consists of an embedded platform (colored in
blue) and a remote cloud (colored in green). Typically, the embedded platform is powered by a
system-on-chip (SoC). This platform usually integrates a low-power multicore CPU paired
with a GPU with a low core count. Additionally, it integrates a hardware video decoder and an I/O
interface (e.g. IEEE 802.3 Ethernet or 802.11 Wi-Fi). These hardware blocks are included on the
same die and typically dissipate less than 5 Watts.
On the other hand, the devices used in the cloud are high-end CPUs and GPUs with high core counts
and double-precision floating point capabilities, with a power envelope of roughly 150 W. Such
high-end computing devices simply cannot be powered by a battery. Therefore, the RAPID approach
is to identify the most power-intensive tasks, implemented as annotated multithreaded data-parallel
kernels, and then offload them to the heterogeneous CPU/GPU cloud. Recent advances in equipment
and link-level protocols that leverage Wi-Fi and fiber optic networks enable low-latency and
high-bandwidth communication between the embedded devices and the remote pool of
high-performance CPUs and GPUs.
The selected embedded platform will be powered by either an NVIDIA Tegra K1 or X1 SoC (see
Figures 2 and 3). These chips feature a quad-core or octa-core general-purpose ARM Cortex CPU
paired with a 192-core or 256-core CUDA-enabled GPU, respectively. Unlike OpenCL, which is
vendor-neutral, CUDA is supported only on NVIDIA GPUs. The decision to restrict the scope of
supported GPUs to this manufacturer is motivated by the fact that NVIDIA currently provides the
best driver and software stack available on the market. Portability of kernel code between embedded
NVIDIA GPUs and server/workstation GPUs is guaranteed by the CUDA API and related framework
libraries.
The cloud infrastructure will be populated by virtualized multicore Intel Core i7 CPUs and NVIDIA
GeForce GTX 980 GPUs. Unlike the GPU code, the compute-intensive CPU host code needs to be
either manually ported or recompiled, since the embedded CPUs and the cloud platform CPUs employ
a different ISA.
Figure 2: Tegra K1 development board
Figure 3: Tegra X1 development board
1.1. Glossary of Acronyms
Acronym Definition
CO Confidential
D Deliverable
DMP Data Management Plan
DoA Description of the Action
EC European Commission
EU European Union
GA Grant Agreement
PU Public
SVN Subversion
WP Work Package
CPU Central Processing Unit
GPU Graphics Processing Unit
SoC System on Chip
CUDA Compute Unified Device Architecture
ISA Instruction Set Architecture
V4L Video For Linux
API Application Program Interface
RTSP Real Time Streaming Protocol
SIMD Single Instruction Multiple Data
SSE Streaming SIMD Extensions
CCTV Closed-circuit Television
FPS Frames Per Second
2. BioSurveillance Application
HERTA's BioSurveillance software is designed for performing unconstrained real-time face
recognition using standard CCTV cameras. The software is implemented as a pipeline that can be
split into several stages: video decoding, face detection, feature extraction, and template matching.
Figure 4: Face recognition pipeline
As shown in Figure 4, the pipeline starts from an input video stream. This video feed is usually
broadcast from a surveillance IP camera or a high-definition webcam using the H.264 video codec.
The software automatically handles both the container demuxing and transport layer decoding (e.g.
RTSP protocol) so it can work with any surveillance camera manufacturer just by specifying the IP
address in the command line. Similarly, it also works with any USB webcam through the usage of the
Video for Linux (V4L) API. For the webcam scenario, video decoding is not required since USB
webcams usually broadcast uncompressed video frames. If required, the parsed H.264 frames are sent
to the video decoding stage for further processing.
Video decoding is conducted on the Tegra chip using the on-die hardware video decoder, by
leveraging the OpenMAX IL abstraction API provided by NVIDIA. The decoded H.264 slices are
then sent in NV12 format to the face detection stage, where the locations of faces are determined.
This latter stage is performed on the GPU cores available in the Tegra K1/X1 by leveraging the
CUDA parallel programming model. Both the (X,Y) coordinates and the size (MxN) of each detected
face are packed and sent to the feature extraction stage, which analyzes multiple face regions to build
a summarized template that characterizes each detected face. Finally, the template matching required
for the classification stage is computed to determine the similarity between histograms.
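The matching step lends itself to a small illustration. The text does not specify the similarity metric used to compare the histogram-based templates, so the sketch below assumes cosine similarity purely as a stand-in; the function name, subject identifiers, and toy 4-bin templates are all hypothetical.

```python
import math

def match_template(probe, gallery):
    """Rank enrolled subjects by similarity to a probe template.

    Cosine similarity is an assumption here -- the actual metric used
    by BioSurveillance is not stated in the text. Returns a list of
    (subject_id, score) pairs, best match first.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    scores = [(sid, cosine(probe, t)) for sid, t in gallery.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Toy 4-bin "histograms" (real templates are 21,600 bytes, see Section 2.2).
gallery = {"subject_a": [1, 0, 2, 5], "subject_b": [4, 4, 1, 0]}
ranked = match_template([1, 0, 2, 4], gallery)  # subject_a ranks first
```

In the real pipeline this comparison runs once per extracted face against every enrolled subject, which is why the downstream traffic in Section 2.2 scales with the database size.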
The face recognition use-case application has several dependencies on third-party libraries and APIs.
GPU offloading is achieved through the NVIDIA CUDA programming model. Specifically, the
application requires at least CUDA 6.5 for launching and executing data-parallel kernels on GPU
cores. Additionally, OpenCV (www.opencv.org) is required for managing the input camera/video and
showing the results (e.g. detected/recognized faces) on the screen.
Advanced CUDA features such as managed memory transfers or allocations, shuffle instructions and
pinned host memory are used in BioSurveillance. For this reason, a GPU board with Compute
Capability 3.0 or greater is required to correctly offload computations.
2.1. Client-Server Mode Analysis
The face recognition BioSurveillance application is highly sensitive to variations in network activity,
as it has to send face templates to the GPU cloud in order to perform the matching process. This step
compares the face templates extracted from the input video against the templates of the subjects
stored in a database. Table 1 shows the latencies for each step of the face recognition pipeline, for
both an Intel Core i7 and a Tegra K1 SoC, using a USB webcam as input.
These statistics were gathered from the ARM CPUs by carefully setting the clock frequency to the
maximum performance profile available in the Linux Kernel (i.e. /sys/devices/system/cpu/).
The video decoding step was performed using a pure software decoding engine based on the
FFmpeg/Libav open source projects [1][2]. For this specific test, GPU-based face detection was also
intentionally disabled in order to accurately determine the slowdown/speed up (S) between a desktop
CPU (Intel Core i7) and a low-power CPU (Tegra K1).
In both architectures, SIMD instructions (e.g. SSE and NEON) were enabled to fully exploit the
underlying vector extensions available on the CPUs and thus increase the performance of floating
point operations. These extensions were enabled simply by passing the compiler flags
-msse4.2 -mfpmath=sse for the Intel x86-64 architecture and -mfpu=neon for ARMv7.
Step | Intel Core i7 (ms) | Tegra K1 (ms) | S
Video Decoding | 4.3 | 24.1 | 5.6
Face Detection | 8.2 | 26.3 | 3.2
Feature Extraction | 6.9 | 58.6 | 8.5
Template Matching | 23.3 | 58.2 [Local CPU] | 2.4
Template Matching | 23.3 | 5.0 [Remote GPU] | 0.21
Table 1: Face recognition pipeline latency comparison between Intel Core i7 and Tegra K1
Bandwidth measurements were performed using the open source Wireshark [3] packet analyzer engine
(see Figure 5). Under the RAPID architecture, it is expected to perform the face matching process on
the remote cloud. In order to simulate this environment, a customized CPU client software was
developed by targeting the Tegra K1 architecture for performing video decoding, face detection and
facial feature extraction. The final step of template matching was performed on a remote host
equipped with an NVIDIA Quadro K2200 GPU card. Therefore, the extracted templates were sent to
the remote host using TCP sockets.
Under this scenario, the average latency obtained for GPU-based template matching was 5
milliseconds. This means that the remote Quadro K2200 GPU was 10 times faster than the
ARM CPU implementation, and 4.6 times faster than an Intel Core i7, including memory and network
transfers. Regarding bandwidth requirements, the application sent a TCP packet on average every
0.017 ms over a wired Gigabit Ethernet link. It should be noted that this experiment was conducted by
attaching a USB webcam to the Tegra K1 board. Finally, the obtained average frame rate was
increased from 6 FPS to roughly 9 FPS (1.5X speed up) by transparently offloading template matching
computations to the remote GPU.
Figure 5: Face recognition use-case
These preliminary results show great potential for the remote GPU offloading techniques to be
developed in RAPID, as the maximum tolerated latency for commercial face recognition usually
ranges between 500 and 1000 ms.
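The reported frame rates follow from the per-stage latencies in Table 1. The sketch below assumes the four stages execute sequentially on each frame, which is a simplification of the real engine:

```python
# Per-stage latencies on the Tegra K1, in milliseconds (from Table 1).
stages_local = {"decode": 24.1, "detect": 26.3, "extract": 58.6, "match": 58.2}
# Offloaded variant: template matching runs on the remote GPU instead.
stages_offload = dict(stages_local, match=5.0)

def fps(stage_latencies_ms):
    """Frame rate if the stages run back to back on every frame."""
    return 1000.0 / sum(stage_latencies_ms.values())

local_fps = fps(stages_local)      # about 6 FPS
offload_fps = fps(stages_offload)  # about 9 FPS
speedup = offload_fps / local_fps  # about 1.5x
```

Under this sequential assumption the offloaded per-frame latency comes to 114 ms, which is also consistent with the 115 ms desired-performance latency reported in Table 2.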
2.2. Requirements
The current implementation of client-server BioSurveillance has the following technical requirements:
Client: NVIDIA Tegra TK1 or TX1
Server: NVIDIA GTX 750 Ti or better
CUDA Compute Capability 3.0
CUDA Pinned memory
CUDA Managed memory
More concretely, the current implementation requires support for the following CUDA runtime calls
and assembly instructions:
CUDA runtime API calls required by BioSurveillance
cudaCreateTextureObject cudaMallocPitch
cudaCreateChannelDesc cudaMemset2D
cudaFree cudaMemcpy2D
cudaMallocManaged cudaStreamDestroy
cudaStreamAttachMemAsync cudaStreamCreateWithFlags
Intrinsics required by BioSurveillance: (CC >= 3.0)
__shfl_down, which calls assembly: asm volatile ("shfl.down.b32 %0, %1, %2, %3;" : "=r"(ret) : "r"(var), "r"(delta), "r"(c));
__shfl_xor, which calls assembly: asm volatile ("shfl.bfly.b32 %0, %1, %2, %3;" : "=r"(ret) : "r"(var), "r"(laneMask), "r"(c));
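The __shfl_down intrinsic listed above is typically used to build warp-level sum reductions without shared memory. The following is a sequential Python emulation of that data-movement pattern, for illustration only: on the GPU each of the 32 lanes holds one value and all lanes execute in lockstep, whereas here the lanes are modeled as a plain list and the function name is hypothetical.

```python
def shfl_down_reduce(lanes):
    """Emulate a warp-wide sum reduction built on __shfl_down.

    __shfl_down(val, delta) lets lane i read lane i + delta's value.
    Halving delta each step, lane 0 accumulates the total after
    log2(width) steps. Expects a power-of-two number of lanes, like
    the 32-lane warps on NVIDIA hardware.
    """
    vals = list(lanes)
    n = len(vals)
    delta = n // 2
    while delta >= 1:
        for i in range(n):
            peer = i + delta
            # Out-of-range peers contribute nothing, mirroring the
            # behaviour at the upper warp boundary.
            vals[i] += vals[peer] if peer < n else 0
        delta //= 2
    return vals[0]  # lane 0 holds the reduced sum
```

The ascending loop is safe because each lane only reads lanes above itself, which have not yet been updated in the current step; on the GPU the same effect is achieved by the lockstep execution of the warp.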
The minimum and optimal performance of the application, in terms of latency and FPS, is given in
the following table, based on the conducted measurements. The minimum values are based on the
strictly necessary requirements for successful commercial deployments (typically one second of
maximum delay), whereas we consider optimal the values that reproduce the results measured in this
single-client, single-server architecture.
Metric | Minimum performance | Desired performance
Latency | 1000 ms | 115 ms
FPS | 6 FPS | 9 FPS
Required upstream bandwidth | 1.0 Mbps | 1.5 Mbps
Required downstream bandwidth | 36 Kbps | 55 Kbps
Table 2: Minimum and optimal performance for a database of 192 templates.
The bandwidth requirements are computed as follows: given the necessary framerate, the amount of
data to be sent to the server is simply the size of each template (21,600 bytes), and the amount of data
to be received corresponds to floating point score values for all the subjects in the database. Given that
all measurements have been carried out with 192 enrollee images in the database, this represents 768
bytes for each template matching step. Multiplying these data sizes by the FPS yields the required
bandwidth.
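This calculation can be checked directly. The sketch below reproduces the Table 2 bandwidth figures from the quantities given in the text (21,600-byte templates, 192 enrolled subjects, one 4-byte float score per subject); the function name is illustrative.

```python
def required_bandwidth_bps(fps, template_bytes=21_600, subjects=192,
                           score_bytes=4):
    """Upstream/downstream bandwidth for offloaded template matching.

    Upstream: one template per processed frame. Downstream: one float
    score per enrolled subject per frame (192 * 4 = 768 bytes).
    Returns (upstream, downstream) in bits per second.
    """
    upstream = fps * template_bytes * 8
    downstream = fps * subjects * score_bytes * 8
    return upstream, downstream

up_min, down_min = required_bandwidth_bps(6)  # ~1.0 Mbps up, ~36 Kbps down
up_opt, down_opt = required_bandwidth_bps(9)  # ~1.5 Mbps up, ~55 Kbps down
```

At 6 FPS this yields 1,036,800 bps upstream and 36,864 bps downstream, and at 9 FPS it yields 1,555,200 bps and 55,296 bps, matching the rounded values in Table 2.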
3. Kinect 3D Hand Tracking Application
The 3D tracking of articulated objects is a theoretically interesting and challenging problem. One of
its instances, the 3D tracking of human hands, has a number of diverse applications, including but not
limited to human activity recognition, human-computer interaction, understanding human grasping,
and robot learning by demonstration. Towards developing an effective and efficient solution, one has
to struggle with a number of complicating and interacting factors such as the high dimensionality of
the problem, the chromatically uniform appearance of a hand and the severe self-occlusions that occur
while a hand is in action. To ease some of these problems, some very successful methods employ
specialized hardware for motion capture [4] or the use of visual markers [5]. Unfortunately, such
methods require a complex and costly hardware setup, interfere with the observed scene, or both.
Several attempts have been made to address the problem by considering only markerless visual data.
Existing approaches can be categorized into model- and appearance-based. Model-based approaches
provide a continuum of solutions but are computationally costly and depend on the availability of a
wealth of visual information, typically provided by a multi-camera system. Appearance-based
methods are associated with much less computational cost and hardware complexity, but they
recognize a discrete number of hand poses that correspond typically to the method’s training set.
The input to the proposed method (see Figure 6) is an image acquired using the Kinect sensor, together
with its accompanying depth map. Skin color detection followed by depth segmentation is used to
isolate the hand in 2D and 3D. The adopted 3D hand model comprises a set of appropriately
assembled geometric primitives. Each hand pose is represented as a vector of 27 parameters. Hand
articulation tracking is formulated as the problem of estimating the 27 hand model parameters that
minimize the discrepancy between hand hypotheses and the actual observations. To quantify this
discrepancy, we employ graphics rendering techniques to produce comparable skin and depth maps for
a given hand pose hypothesis. An appropriate objective function is thus formulated and a variant of
PSO is employed to search for the optimal hand configuration. The result of this optimization process
is the output of the method for the given frame. Temporal continuity is exploited to track the hand
articulation in a sequence of frames.
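As a rough illustration of the optimization step, here is a minimal, generic PSO sketch. In the actual tracker the objective renders a 27-parameter hand pose hypothesis and scores it against the observed skin and depth maps; a cheap analytic function stands in for that here, and the swarm size, iteration count, and inertia/cognitive/social weights are illustrative assumptions, not FORTH's settings.

```python
import random

def pso(objective, dim, n_particles=30, iters=100, bounds=(-1.0, 1.0),
        w=0.72, c1=1.49, c2=1.49, seed=0):
    """Minimal particle swarm optimisation sketch (minimisation).

    Each particle is a candidate parameter vector; velocities are
    pulled towards the particle's own best (c1) and the swarm's best
    (c2), damped by inertia w. Returns (best_position, best_value).
    """
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = objective(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f

# Toy stand-in objective over 27 parameters (sphere, minimum at origin);
# the real objective is the rendering-based discrepancy described above.
best, best_f = pso(lambda x: sum(v * v for v in x), dim=27)
```

In the tracker, evaluating the objective for thousands of such hypotheses per frame is the expensive part, which is why it is mapped onto the GPU.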
Figure 6: Graphical illustration of the proposed method. A Kinect RGB image (a) and the corresponding
depth map (b). The hand is segmented (c) by jointly considering skin color and depth. The proposed
method fits the employed hand model (d) to this observation recovering the hand articulation (e).
The most computationally demanding part of the proposed method is the evaluation of a hypothesis-
observation discrepancy where a hypothetical image is compared against the actual observed image.
This computation involves rendering, pixel-wise operations between an observation and a hypothesis
map and summation over the results. We exploit the inherent parallelism of this computation by
performing these operations on a GPU. Hardware instancing is employed to accelerate the rendering
process, exploiting the fact that the hand model is made up of transformed versions of the same two
primitives (a cylinder and a sphere). The pixel-wise operations between maps are inherently parallel
and the summations of the maps are performed efficiently by employing a pyramidal scheme. More
details on the GPU implementation are provided in previous works [6].
3.1. Standalone Mode Analysis
Kinect is a motion sensing input device by Microsoft for the Xbox 360 video game console and
Windows PCs. Based around a webcam-style add-on peripheral for the Xbox 360 console, it enables
users to control and interact with the Xbox 360 without the need to touch a game controller.
Recently, there has been increasing research interest in pattern recognition applications (with
emphasis on head pose and gesture recognition) [7][8][9][10] enabled by the Kinect depth sensor.
The Kinect 3D hand tracking system by FORTH has earned international recognition (Microsoft has
shown considerable interest) and is considered state-of-the-art among 3D hand tracking software [11].
While several applications have used Kinect 3D cameras, an important barrier still limits the wide
adoption of this technology in the robotics domain. The main problem is that these applications
require tremendous processing power and memory, and the associated energy. For example, the
Kinect 3D hand tracking software yields the following performance results depending on the system
configuration:
Tracking FPS = 1.73: CPU: Pentium(R) Dual-Core T4300 @ 2.10 GHz with 4096 MB of RAM; GPU: GeForce GT 240M with 1024 MB of RAM
Tracking FPS = 2.15: CPU: Intel(R) Core(TM)2 6600 @ 2.40 GHz with 4096 MB of RAM; GPU: GeForce 9600 GT with 1024 MB of RAM
Tracking FPS = 2.66: CPU: Intel(R) Core(TM)2 Duo T7500 @ 2.20 GHz with 4096 MB of RAM; GPU: Quadro FX 1600M with 256 MB of RAM
Tracking FPS = 19.94: CPU: Intel(R) Core(TM) i7 950 @ 3.07 GHz with 6144 MB of RAM; GPU: GeForce GTX 580 (www.geforce.com) with 1536 MB of RAM
Using large, high-end, power-hungry servers for low-power robots is not an attractive approach.
RAPID aims to solve both the high-energy and the low-performance issues. Within the RAPID
project, we target 30 FPS, the maximum frame rate the Kinect cameras support, by offloading all the
compute-intensive tasks to the RAPID-based accelerator. However, the resulting FPS depends on the
performance of the server, the performance of the client, the network, and the RAPID infrastructure.
The Kinect 3D hand tracking software exhibits huge parallelism (even in the processing of a single
frame) that can be exploited by using many cloud GPUs in parallel. In this way, RAPID is expected
to bridge the gap between 3D pattern tracking applications and the robotics domain.
3.2. Client-Server Mode Analysis
Figure 7: A basic schematic of the manually derived client-server decomposition of the 3D hand tracking
application.
The steps comprising the 3D hand tracking algorithm can be grouped into fundamental modules of
operations, as shown in Figure 7. A brief walkthrough of the presented flowchart, depicting the 3D
hand tracking loop, follows.
Observation: Whenever acquisition from the camera is required as an input to some module,
images (an RGB image and a depth map) are captured and stored. The output comprises an RGB
image of 480x640x3 bytes (a byte triplet per pixel) and a depth image of 480x640x2 bytes (an
unsigned short per pixel).
Compute bounding box: From the tracking solution of the previous frame produced by
Compute pose, and according to the temporal continuity assumption, a region of interest
(ROI) is formulated in the vicinity of this solution. Plainly stated, the next solution is expected
to be in the vicinity of the previous one. The ROI amounts to a rectangular area defined as the
2D bounding box of the back-projected hand tracking solution for the previous frame. The
bounding box is padded with extra space so as to also account for potential motion outside the
tight bounds of the rendering for the previous solution.
Preprocessing: Features are extracted from observations. To implement the required focus,
feature extraction only occurs on images produced by Observation within the 2D bounding
box computed by Compute bounding box. The features comprise a pair of 64x64 images, i.e.
normalized extracted hand silhouette (64x64 bytes) and extracted hand depth measurements
(64x64x2 bytes).
Upload preprocessed observations: The computations involved in finding the next 3D hand
pose are parallel and are mostly performed on a GPU (3D rendering, GPGPU). Thus,
observations provided by Observation, together with other information (e.g. camera calibration), are
uploaded to the GPU for further processing.
Compute pose: Given the observations uploaded by Upload preprocessed observations and
the tracking solution for the previous frame, thousands of hand pose hypotheses are made, 3D
rendered and evaluated against the observations, in order to find the most compatible
hypothesis. This best hypothesis is dubbed the solution for the current frame. This is the
computationally heaviest step of the presented pipeline. Output from this module signifies the
end of the current loop and the start of the next one, i.e. acquiring observations anew,
preprocessing, etc.
Visualization: The images input from Observation and the best matching 3D hand pose
computed by Compute pose are fused into a single visualization. In this visualization, a 3D
rendering of the hand pose is superimposed on the RGB image.
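Tallying the sizes quoted in the walkthrough shows why Preprocessing runs before Upload preprocessed observations: the per-frame payload shrinks by two orders of magnitude before it reaches the GPU (or, in the RAPID setting, the network). The variable names below are illustrative.

```python
# Raw Kinect frame, as described in the Observation step.
rgb_bytes = 480 * 640 * 3    # byte triplet per pixel   -> 921,600 bytes
depth_bytes = 480 * 640 * 2  # unsigned short per pixel -> 614,400 bytes
raw_frame_bytes = rgb_bytes + depth_bytes

# Features produced by the Preprocessing step (64x64 crops).
silhouette_bytes = 64 * 64         # normalized hand silhouette
depth_feature_bytes = 64 * 64 * 2  # hand depth measurements
feature_bytes = silhouette_bytes + depth_feature_bytes

# How much smaller the uploaded features are than the raw capture.
reduction_factor = raw_frame_bytes / feature_bytes
```

A raw frame amounts to 1,536,000 bytes, while the extracted features total 12,288 bytes, a 125x reduction per frame.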
While the presented method can be used for offline processing, the variant most relevant to RAPID,
and the most common and thus interesting use of 3D hand tracking, is the real-time application, in
which a camera captures the motion of a subject and a hand tracking solution is produced at
interactive frame rates. In the following, we differentiate between two rates: tracking rate and loop
rate. Tracking rate is the speed at which Compute pose is processed. Loop rate is the rate at which
the entire graph is processed (including tracking). While the tracking rate should be considered fixed,
as it would require a significant amount of work to accelerate it further, the loop rate leaves
significant room for acceleration through interleaving.
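The interleaving mentioned above can be sketched as a two-stage pipeline in which capture and preprocessing of frame N+1 overlap with pose computation for frame N, so the loop rate approaches the slower of the two stages instead of their sum. This is an illustrative scheduling sketch, not FORTH's implementation; the stage functions are placeholders.

```python
import queue
import threading

def interleaved_loop(frames, preprocess, compute_pose):
    """Overlap client-side preprocessing with server-side pose computation.

    A producer thread captures and preprocesses frames while the main
    thread computes poses, so the two stages run concurrently. A
    bounded queue keeps the producer at most one frame ahead.
    """
    q = queue.Queue(maxsize=1)
    results = []

    def producer():
        for frame in frames:
            q.put(preprocess(frame))
        q.put(None)  # sentinel: no more frames

    t = threading.Thread(target=producer)
    t.start()
    while (obs := q.get()) is not None:
        results.append(compute_pose(obs))
    t.join()
    return results

# Placeholder stages: "preprocess" doubles, "compute pose" adds one.
poses = interleaved_loop(range(5), lambda f: f * 2, lambda o: o + 1)
```

With real stage latencies, the steady-state loop period becomes max(preprocess, compute_pose) rather than their sum, which is why the loop rate can approach the tracking rate.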
The following numbers are indicative and regard a platform with the following specifications: Intel
Core i7 CPU 950 @ 3.07 GHz with an NVIDIA GTX 970 GPU.
In the standalone version of 3D hand tracking, real-time execution yields the following rates:
Tracking rate: 28.7 FPS
Loop rate: 26.4 FPS
In the client-server version of 3D hand tracking, with both client and server running on the same
machine, real-time execution yields the following rates:
Tracking rate: 28 FPS
Loop rate: 21 FPS
The two versions exhibit an interesting difference in their rates. In the standalone version the
tracking rate is slightly higher than the corresponding rate of the client-server version. This is
because in the client-server version GPU resources are shared between the client and the server
processes through time slicing, which is efficient but still incurs some overhead. The client-server
loop rate is expectedly lower than the corresponding rate of the standalone version, due to the
interprocess communication overhead. Executing the server on the described machine and moving the
client to a different machine yields the same tracking rate and varying loop rates, which are affected
by the throughput and lag of the connecting network and by the processing power of the client
machine. Indicatively, running the client over a 100 Mbps intranet on a host with similar
specifications to the server yields the same results. Running the client on a laptop, over the
internet and across countries (server at Heraklion, Greece and client at Amsterdam, the Netherlands)
yields a loop rate of around 5 FPS.
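The interplay between tracking rate, network lag and loop rate reported above can be approximated with simple arithmetic. The following sketch is illustrative only: it assumes a non-pipelined loop in which every frame pays one full round trip plus the payload transfer time, with constants taken from the figures above.

```python
def estimated_loop_rate(tracking_fps: float, rtt_s: float,
                        payload_bytes: int, bandwidth_bps: float) -> float:
    """Rough loop-rate estimate: one loop = one tracking step plus one
    round trip carrying the per-frame payload (no pipelining assumed)."""
    transfer_s = payload_bytes * 8 / bandwidth_bps
    loop_period_s = 1.0 / tracking_fps + rtt_s + transfer_s
    return 1.0 / loop_period_s

# 100 Mbps intranet with ~1 ms RTT: communication cost is negligible,
# and the loop rate stays close to the standalone figure
intranet = estimated_loop_rate(28.0, 0.001, 20696, 100e6)

# Cross-country internet link (~150 ms RTT, ~10 Mbps): the loop rate
# collapses to a few FPS, in line with the Heraklion-Amsterdam measurement
internet = estimated_loop_rate(28.0, 0.150, 20696, 10e6)

print(round(intranet, 1), round(internet, 1))
```

This crude model reproduces the order of magnitude of the measured rates; pipelining, as noted below, would hide part of the round-trip cost.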
3.3. Requirements
The current implementation of client-server 3D hand tracking has the following technical
requirements:
1 or 2 machines to execute the server and the client.
TCP/IP network connecting the server and the client.
o Remote Procedure Call ability.
Server
o A multi-core CPU
o A CUDA-enabled GPU with the runtime supporting the instruction set of Table 3.
Client
o A multi-core CPU
o Optionally a CUDA-enabled GPU, for better performance, with the runtime
supporting the instruction set of Table 3.
o A RGBD camera or a stored RGBD video.
To achieve the real-time figures presented in the previous section the baseline specifications are as
follows:
CPU: Intel Core i7 CPU 950 @ 3.07 GHz or better.
GPU: NVIDIA GTX 970 or better:
o Number of cores: 1664
o Clock frequency: 1050MHz
o Memory frequency: 7Gbps
o Memory interface: 256bit
o Memory bandwidth: 224GB/s
o A better GPU would be one with higher frequency figures rather than more cores.
Network: The payload (w/o TCP/IP overhead) that needs to be transferred amounts to:
o Client → Server: 20696 bytes per frame = 620880 bytes/s (30 FPS) ≈ 5 Mbps
RGB: 64x64x3 bytes
Depth: 64x64x2 bytes
Tracking solution: 27x8 bytes
o Server → Client: 216 bytes per frame = 6480 bytes/s (30 FPS) ≈ 0.05 Mbps
Tracking solution: 27x8 bytes
o The lower the lag, the greater the loop rate. The effect of lag on loop rate can be
mitigated with pipelining.
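The payload figures above follow from straightforward arithmetic; the following sketch recomputes the per-frame sizes and the resulting bit rates at 30 FPS:

```python
FPS = 30

# Client -> Server: RGB image, registered depth map, previous tracking solution
rgb_bytes = 64 * 64 * 3        # 12288 bytes
depth_bytes = 64 * 64 * 2      # 8192 bytes
pose_bytes = 27 * 8            # 27 doubles = 216 bytes
upstream_frame = rgb_bytes + depth_bytes + pose_bytes   # 20696 bytes per frame

# Server -> Client: tracking solution only
downstream_frame = pose_bytes                           # 216 bytes per frame

upstream_mbps = upstream_frame * FPS * 8 / 1e6          # ~4.97 Mbps
downstream_mbps = downstream_frame * FPS * 8 / 1e6      # ~0.05 Mbps

print(upstream_frame, downstream_frame)        # 20696 216
print(round(upstream_mbps, 2), round(downstream_mbps, 3))
```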
cudaThreadSynchronize
__cudaRegisterFatBinary
__cudaUnregisterFatBinary
cudaGetErrorString
cudaMemcpy
__cudaRegisterFunction
cudaConfigureCall
cudaD3D9SetDirect3DDevice
cudaDeviceReset
cudaFree
cudaFuncGetAttributes
cudaGLSetGLDevice
cudaGetDevice
cudaGetDeviceProperties
cudaGetLastError
cudaGraphicsD3D9RegisterResource
cudaGraphicsGLRegisterBuffer
cudaGraphicsGLRegisterImage
cudaGraphicsMapResources
cudaGraphicsResourceGetMappedPointer
cudaGraphicsResourceSetMapFlags
cudaGraphicsSubResourceGetMappedArray
cudaGraphicsUnmapResources
cudaGraphicsUnregisterResource
cudaLaunch
cudaMalloc
cudaMemcpy2DFromArray
cudaSetupArgument
cudaThreadExit
cudaMallocPitch
cudaMemset
Table 3: The entry points of the CUDA runtime which 3D Hand Tracking software invokes.
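Before launching the application, a deployment could sanity-check a CUDA backend (e.g. a GVirtuS instance) against this list. The helper below is a hypothetical illustration, not part of the deliverable; only a few of the Table 3 entry points are spelled out:

```python
# Subset of the CUDA runtime entry points from Table 3 (remaining
# entries elided here for brevity; a real check would list all of them)
REQUIRED_ENTRY_POINTS = {
    "cudaMalloc", "cudaFree", "cudaMemcpy", "cudaLaunch",
    "cudaConfigureCall", "cudaSetupArgument", "cudaThreadSynchronize",
    "cudaGetLastError", "cudaGetErrorString", "cudaGetDeviceProperties",
}

def missing_entry_points(exported: set[str],
                         required: set[str] = REQUIRED_ENTRY_POINTS) -> set[str]:
    """Entry points of the required set that the backend does not export."""
    return required - exported

# A backend lacking kernel-launch support would be reported as incomplete:
backend = {"cudaMalloc", "cudaFree", "cudaMemcpy", "cudaThreadSynchronize"}
print(sorted(missing_entry_points(backend)))
```

A full backend (one exporting every entry point of Table 3) would yield an empty set, i.e. no missing functionality.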
4. Antivirus Application
The ever-increasing amount of malicious software in today's connected world poses a tremendous
challenge to network operators, IT administrators, and ordinary home users. Antivirus software
is one of the most widely used tools for detecting and stopping malicious or unwanted software. For an
effective defense, virus scanning must be performed both at central network traffic ingress points and
at end-host computers. As such, anti-malware applications scan traffic at e-mail gateways and
corporate gateway proxies, as well as on edge devices such as file servers, desktops and laptops.
Unfortunately, the constant growth in storage capacity, number of end-devices and sheer number of
malware samples poses significant challenges to virus scanning applications, which end up requiring
multi-gigabit scanning throughput.
Analysis. Typically, a malware scanner spends the bulk of its time matching data streams against a
large set of known signatures. For instance, the signature set of ClamAV [12], the most popular open-
source antivirus, contains more than 60 thousand string and regular expression signatures that have to
be matched against each incoming data stream.
Pattern matching algorithms analyze the data stream and compare it against the set of signatures to
detect known malware. The signature patterns can be fairly complex, composed of different-size
strings, wild-card characters, range constraints, and sometimes recursive forms. Every year, as the
amount of malware grows, the number of signatures increases proportionally, exposing scaling
problems in anti-malware products.
Design. The antivirus application utilizes the highly parallel capabilities of commodity graphics
processing units to improve the performance of malware scanning programs. From a high-level view,
malware scanning is divided into two phases. First, all files are scanned by the GPU to quickly filter
out the data segments that do not contain any viruses. The GPU uses a prefix of each virus signature to
filter out clean data. Since most data do not contain any viruses, this filtering is quite efficient. It
identifies all potentially malicious files, but a number of clean files as well. The GPU then outputs
the set of suspect files and the corresponding match offsets within those files. In the second phase,
all those files are rescanned using a full pattern-matching algorithm.
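The two-phase design can be illustrated with a minimal sketch. All names and patterns below are illustrative, and the massively parallel GPU first pass is stood in for by a sequential prefix scan on the CPU:

```python
import re

# Illustrative signature database: full patterns keyed by a fixed-length prefix
SIGNATURES = {
    b"EVILCODE": re.compile(rb"EVILCODE[0-9]{4}"),
    b"BADBYTES": re.compile(rb"BADBYTES.{2}END"),
}
PREFIX_LEN = 8

def first_pass(data: bytes) -> list:
    """Cheap filter: report (offset, prefix) hits; stands in for the GPU scan."""
    hits = []
    for i in range(len(data) - PREFIX_LEN + 1):
        chunk = data[i:i + PREFIX_LEN]
        if chunk in SIGNATURES:
            hits.append((i, chunk))
    return hits

def second_pass(data: bytes, hits) -> bool:
    """Full pattern matching, only at the offsets the filter flagged."""
    return any(SIGNATURES[prefix].match(data, i) for i, prefix in hits)

clean = b"nothing suspicious here"     # no prefix hit: never rescanned
suspect = b"xxEVILCODE1234yy"          # prefix and full pattern both match
decoy = b"xxEVILCODEzzzzyy"            # prefix hit, but full pattern rejects it

print(second_pass(clean, first_pass(clean)))      # False
print(second_pass(suspect, first_pass(suspect)))  # True
print(second_pass(decoy, first_pass(decoy)))      # False
```

The decoy case shows why the second phase is needed: the prefix filter admits some clean data, and only the full pattern match separates true positives from false alarms.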
The overall architecture of the antivirus application is shown in Figure 8. The contents of each file
are stored in a buffer in a region of main memory that can be transferred via DMA into the memory of
the GPU. The SPMD operation of the GPU is ideal for creating multiple search engine instances that
scan for virus signatures on different data in a massively parallel fashion. If the GPU detects a
suspicious virus, that is, if there is a prefix match, the file is passed to the verification module for
further investigation. If the data stream is clean, no further computation takes place. The GPU is
therefore employed as a first-pass high-speed filter, before any further potential signature-matching
work is completed on the CPU.
Figure 8: Antivirus application. Files are mapped onto pinned memory that can be copied via DMA onto
the graphics card. The matching engine performs a first-pass filtering on the GPU and returns potential
true positives for further checking on the CPU.
The current implementation of the antivirus application has the following technical requirements:
A CUDA-enabled GPU.
Optionally, TCP/IP network connectivity, in order to scan data received from the network.
5. Application Requirements and Validation Scenarios
This section provides the formal list of the system requirements as well as the validation scenarios of
RAPID, based on the previous sections. The application requirements fall into six main categories:
Requirements related to the CUDA support derived from the BioSurveillance and 3D
Hand Tracking applications. These requirements are identified with the prefix
"BIOS_CUDA" and "HT3D_CUDA" respectively.
Requirements related to the RAPID devices derived from the BioSurveillance and 3D
Hand Tracking applications. These requirements are identified with the prefix
"BIOS_DEV" and "HT3D_DEV" respectively, and they are focused on the applications
themselves.
Requirements related to the network infrastructure derived from the BioSurveillance and
3D Hand Tracking applications. These requirements are identified with the prefix
"BIOS_NET" and "HT3D_NET".
Requirements related to the performance of the CPU and GPU of the RAPID server
derived from the 3D Hand Tracking application. These requirements are identified with
the prefix "HT3D_CPU".
System-level requirements related to the functionality of the RAPID infrastructure derived
from the 3D Hand Tracking application. These requirements are identified with the prefix
"HT3D_SYS".
More general system requirements which are related to added-value features in RAPID
that could be required by other applications, identified with the prefix “GEN_SYS”.
The application requirements are related to specific components of a RAPID-based system. The main
components of the RAPID infrastructure are ThinkAir [13], which is being developed by UROME and
provides the offloading mechanism, and GVirtuS [14], which is being developed by UNP and provides
the GPU-based virtualization on the server side. Moreover, other important components of a RAPID-
based system are the low-power device on which the accelerated application is executed, the network
infrastructure between the low-power device and the RAPID server, and the RAPID server to which
the tasks are offloaded.
The requirements derived from the two applications as well as from other potential applications are the
following:
# Id BIOS_CUDA_01 Name Support CUDA capabilities 3.0 or higher
Priority High Req. Type Functional
Description RAPID infrastructure must support compute capabilities 3.0
Purpose GPU accelerators are all about performance. RAPID must support at least CC
3.0 to fully exploit modern GPU hardware. The BioSurveillance use case will
not work on GPUs that do not support compute capability 3.0.
Use Cases BioSurveillance use-case scenario, identifying faces by following the defined
workflow.
Validation
scenario
If GVirtuS or underlying GPUs do not support compute capability 3.0, then
BioSurveillance will not work.
Related WPs WP6
Components GVirtuS
Relationships BIOS_CUDA_01, BIOS_CUDA_02, BIOS_CUDA_03, BIOS_CUDA_04,
HT3D_CUDA_04
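The validation check of BIOS_CUDA_01 amounts to a numeric comparison of compute-capability versions. The helper below is an illustrative sketch; in a real deployment the device capability would be queried through cudaGetDeviceProperties, possibly via GVirtuS:

```python
def meets_compute_capability(device_cc: str, minimum: str = "3.0") -> bool:
    """Numeric comparison of compute-capability strings such as '3.5' or '2.1'.

    String comparison would be wrong (e.g. '10.0' < '3.0' lexically), so the
    major/minor parts are compared as integers.
    """
    parse = lambda cc: tuple(int(part) for part in cc.split("."))
    return parse(device_cc) >= parse(minimum)

print(meets_compute_capability("3.5"))  # True: Kepler-class or newer is accepted
print(meets_compute_capability("2.1"))  # False: BioSurveillance would refuse to run
```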
# Id BIOS_CUDA_02 Name Support CUDA version 6.5
Priority High Req. Type Functional
Description RAPID must support at least CUDA toolchain 6.5 (this
includes cudaMallocManaged and variants)
Purpose GPU accelerators are all about performance. RAPID must support at least
CUDA libraries 6.5 to fully exploit modern GPU hardware. BioSurveillance
makes use of advanced memory allocation for improving performance on
unified-memory devices (cudaMallocManaged).
Use Cases BioSurveillance scenario, identifying faces by following the defined
workflow.
Validation
scenario
If cudaMallocManaged or other CUDA 6.5 features are not supported,
BioSurveillance software will not work.
Related WPs WP6
Components GVirtuS
Relationships BIOS_CUDA_01, BIOS_CUDA_02, BIOS_CUDA_03, BIOS_CUDA_04,
HT3D_CUDA_04
# Id BIOS_CUDA_03 Name PTX Assembly instructions required
Priority High Req. Type Functional
Description Assembly instructions shfl.down and shfl.bfly are required to implement the
__shfl_down and __shfl_xor intrinsics. These intrinsics are used in the
BioSurveillance use case for efficient information exchange inside CUDA
warps. Any CUDA-compatible card which supports compute capability 3.0
and beyond must implement these assembly instructions.
Purpose These assembly instructions allow fast and efficient data exchange between
threads in a warp without using shared memory.
Use Cases BioSurveillance scenario, identifying faces by following the defined
workflow.
Validation
scenario
Run the BioSurveillance application on the RAPID infrastructure. If the GPU
cards support these assembly instructions, the application will run; if not, it
will fail to load.
Related WPs WP6
Components GVirtuS
Relationships BIOS_CUDA_01, BIOS_CUDA_02, BIOS_CUDA_03, BIOS_CUDA_04,
HT3D_CUDA_04
# Id HT3D_CUDA_04 Name CUDA runtime requirements
Priority High Req. Type Functional
Description The CUDA runtime should be of at least version 6.5 and should support the
runtime API subset listed in Table 3.
Purpose Set the minimum CUDA runtime requirements for 3D Hand Tracking.
Use Cases Online or offline 3D Hand Tracking.
Validation
scenario
CUDA-powered aspects of 3D Hand Tracking, e.g. tracking and
visualization, work successfully.
Related WPs WP6
Components GVirtuS
Relationships BIOS_CUDA_01, BIOS_CUDA_02, BIOS_CUDA_03, BIOS_CUDA_04,
HT3D_CUDA_04
# Id BIOS_DEV_01 Name Sensor devices required
Priority High Req. Type Functional
Description Standard webcams are required as sensing devices for the BioSurveillance
use-case application. Standard Logitech HD webcams are recommended for
the BioSurveillance application.
Purpose BioSurveillance requires real-time video streams as an input to be processed.
RAPID should take into account the kind of data managed by the
applications.
Actors N/A
Use Cases BioSurveillance scenarios
Validation
scenario
Run BioSurveillance in stand-alone mode. The application should work
correctly with the connected webcam devices.
Related WPs WP3
Components Low-power devices
Relationships BIOS_DEV_01, HT3D_DEV_02
# Id HT3D_DEV_02 Name Input for 3D Hand Tracking
Priority Low Req. Type Functional
Description 3D Hand Tracking requires RGBD input, i.e. a stream of RGB images
accompanied by registered depth maps. For online execution, data need to
be acquired from a depth sensor. For offline execution, data can be loaded
from storage.
Purpose Establish input for 3D Hand Tracking. RAPID should take into account the
kind of data managed by the applications.
Use Cases Online or offline 3D Hand Tracking.
Validation
scenario
Run 3D Hand Tracking in stand-alone mode.
Related WPs WP3
Components Low-power devices
Relationships BIOS_DEV_01, HT3D_DEV_02
# Id BIOS_NET_01 Name Support minimum bandwidth
Priority Medium Req. Type Non-functional
Description The network used among RAPID servers and clients must facilitate at least
the aggregated minimum requirements of downstream and upstream
bandwidths for all clients and servers within the deployed RAPID
infrastructure.
Purpose The indicated upstream and downstream network bandwidth requirements are
needed for the online use-case applications to run sufficiently smoothly and
within the expected framerate range.
Use Cases BioSurveillance application
Validation
scenario
Online use-case applications (such as BioSurveillance) running on RAPID
infrastructure will run correctly if minimum bandwidth requirements are
covered. Otherwise, they will experience additional network latencies.
Related WPs WP4-6
Components ThinkAir and Network
Relationships BIOS_NET_01, BIOS_NET_02, HT3D_NET_03, HT3D_NET_04
# Id BIOS_NET_02 Name Work under maximum latency
Priority Medium Req. Type Non-functional
Description The BioSurveillance use-case application needs to work below the maximum
indicated network latency. RAPID has to monitor the network status and deal
with network latency as a Quality of Service (QoS) parameter.
Purpose Given the sensitivity of real-time security systems to the rapid availability of
alarm events, especially in the context of critical infrastructures, maximum
latency has to be respected in order to carry out successful commercial
implementations.
Use Cases BioSurveillance use-case scenario.
Validation
scenario
The network latency has to be empirically estimated for a BioSurveillance
client and server on a RAPID infrastructure, as the time elapsed between the
sending of descriptors and the retrieval of scores. Valid latency values must
be under 1 second.
Related WPs WP4-6
Components ThinkAir and Network
Relationships BIOS_NET_01, BIOS_NET_02, HT3D_NET_03, HT3D_NET_04
# Id HT3D_NET_03 Name Networking
Priority High Req. Type Functional
Description The 3D Hand Tracking server and client require intercommunication in order
to exchange data. The requirements include a Remote Procedure Call
infrastructure, built over TCP/IP. RAPID has to take into account this kind of
communication when doing offloading.
Purpose Establishing the networking infrastructure’s ability to support 3D Hand
Tracking.
Use Cases Online or offline 3D Hand Tracking.
Validation
scenario
Hand Tracking should be successful in offline mode, processing data from
storage, where speed is not a factor.
Related WPs WP6
Components Network
Relationships BIOS_NET_01, BIOS_NET_02, HT3D_NET_03, HT3D_NET_04
# Id HT3D_NET_04 Name Fast networking
Priority Medium Req. Type Non-functional
Description For 3D Hand Tracking to be fast, intercommunication also needs to be
relatively fast. Due to the dimensions of the images used by the application,
RAPID must guarantee it is possible to achieve certain transfer rates.
Otherwise, it will not be possible to perform smooth real-time executions in
the platform. Therefore, network speed must be taken into account as another
QoS parameter.
Latency affects the loop rate, as it lengthens the client's tracking loop in time.
Latency should reflect the specifications of a fast local network, and this has
to be taken into account by RAPID as well.
Purpose Establishing the networking infrastructure’s ability to support real time 3D
Hand Tracking.
Use Cases Online or offline 3D Hand Tracking.
Validation
scenario
The processing rate across distinct nodes should closely approximate the rate
achieved when client and server are executed on the same node.
For real-time execution, data need to be exchanged at a rate of 30 FPS. The
data to be exchanged comprise an RGB image of dimensions 64x64 (64x64x3
= 12288 bytes), a depth map of dimensions 64x64 (64x64x2 = 8192 bytes)
and a vector of 27 doubles (27x8 = 216 bytes). The networking infrastructure
should allow for this payload of 20696 bytes per frame (approximately
5 Mbps at 30 FPS) to be transferred over TCP/IP.
Related WPs WP4-6
Components ThinkAir and Network
Relationships BIOS_NET_01, BIOS_NET_02, HT3D_NET_03, HT3D_NET_04
# Id HT3D_CPU_01 Name Manage CPU/GPU server specifications
Priority High Req. Type Functional
Description The server processing node should be equipped with a multi-core CPU and a
CUDA-enabled GPU. These are requirements that RAPID must take into
account when performing allocation of resources. Therefore, it is necessary
that RAPID will be able to detect and process this kind of requirements.
Purpose Establish that 3D Hand Tracking can be executed on the server by fulfilling a
set of minimum requirements related to the number of CPUs and GPUs.
Use Cases Online or offline 3D Hand Tracking.
Validation
scenario
Proof-of-concept server can handle more than 2 clients/devices that run 3D
Hand Tracking application.
Related WPs WP6
Components RAPID servers
Relationships HT3D_CPU_01, HT3D_CPU_02, HT3D_CPU_03
# Id HT3D_CPU_02 Name Manage server real-time specifications
Priority High Req. Type Non-functional
Description Apart from the type and number of CPU/GPUs, the application has concrete
requirements for this kind of resources, such as clock frequency, number of
cores, memory frequency, memory bandwidth, etc. RAPID has to take into
account that certain applications could have these concrete requirements, in
order to guarantee their correct execution.
Purpose Establish that 3D Hand Tracking can be executed on the server at about 30
FPS.
Use Cases Online or offline 3D Hand Tracking.
Validation
scenario
3D Hand Tracking operates at about 30fps in real-time execution. For doing
so, the assigned server’s CPU and GPU specifications should meet the
following minimum requirements:
CPU: Intel Core i7 CPU 950 @ 3.07 GHz or better.
GPU: NVIDIA GTX 970 or better:
Number of cores: 1664
Clock frequency: 1050 MHz
Memory frequency: 7 Gbps
Memory interface: 256 bit
Memory bandwidth: 224 GB/s
Related WPs WP6
Components RAPID servers
Relationships HT3D_CPU_01, HT3D_CPU_02, HT3D_CPU_03
# Id HT3D_CPU_03 Name Client real-time specifications
Priority Medium Req. Type Non-functional
Description If the client's CPU and GPU specifications meet the server's specifications
(HT3D_CPU_02), hand tracking can be executed at about 30 FPS.
Resource allocation should guarantee this is possible.
Purpose Establish that 3D Hand Tracking can be executed on the client at about 30fps.
Use Cases Online or offline 3D Hand Tracking.
Validation
scenario
3D Hand Tracking operates at about 30 FPS
Related WPs WP6, WP7
Components RAPID servers
Relationships HT3D_CPU_01, HT3D_CPU_02, HT3D_CPU_03
# Id HT3D_SYS_01 Name Number of processing nodes
Priority Low Req. Type Functional
Description The server-client 3D Hand Tracking software may run on a single processing
node or more. The common scenario involves two processing nodes, one for
processing (server) and one for data acquisition and visualization (client). The
manually derived server-client decomposition may run on a single PC or two
PCs. In RAPID, the heavy computations devoted to the server may be further
delegated to even more processing nodes, if required.
Purpose Investigate the execution model of server-client 3D Hand Tracking.
Use Cases Online or offline 3D Hand Tracking.
Validation
scenario
The decomposition across processing nodes should maximize tracking
throughput. Ideally, real-time execution (30 FPS) should be achieved on
powerful processing nodes.
Related WPs WP5, WP6
Components RAPID infrastructure and ThinkAir Server
Relationships N/A
# Id GEN_SYS_01 Name Manage available resources
Priority High Req. Type Functional
Description In order to perform an adequate management of resources, RAPID needs to
maintain a kind of model of the devices available and the resources they
provide to the applications which will be running. This model should reflect
the status of the infrastructure in terms of applications running, capability of
the nodes (CPUs and GPUs available, memory available…) and already
allocated resources.
Moreover, nodes will be classified as Class-1/2/3/4/5, depending on their
nature (from ultra-low-power devices to public clouds).
Purpose Maintain control of the available infrastructure, so that it will be possible to
perform the best possible management of resources.
Use Cases All.
Validation
scenario
In every scenario, different nodes will be available for execution and
offloading. RAPID will identify with 100% accuracy these nodes and the
resources they provide.
Related WPs WP5, WP6
Components RAPID infrastructure and ThinkAir Server
Relationships HT3D_CPU_01, HT3D_CPU_02, BIOS_NET_01
# Id GEN_SYS_02 Name Support application requirements
Priority Medium Req. Type Functional
Description RAPID must provide an easy way for applications to declare their
infrastructure requirements, so that the applications deployed on the
infrastructure can be managed adequately.
These requirements may include the number of CPUs/GPUs required and
their characteristics, memory, etc.
This requirement implies defining a format for expressing these infrastructure
requirements in a simple but effective way.
Purpose In order to perform the right allocation of resources, RAPID must be able to
retrieve infrastructure requirements from the applications, when these are
crucial for providing a minimum performance.
Use Cases All.
Validation
scenario
In each scenario, applications will provide these requirements and RAPID
will be able to extract all of them correctly.
Related WPs WP5, WP6
Components RAPID infrastructure and ThinkAir Server
Relationships HT3D_CPU_01, HT3D_CPU_02, BIOS_NET_01, BIOS_NET_02,
HT3D_NET_03, HT3D_NET_04
# Id GEN_SYS_03 Name Support Quality of Service
Priority High Req. Type Functional
Description In certain cases, applications will have some quality-related requirements that
must be taken into account, such as network latency or minimum bandwidth.
RAPID will deal with these requirements as Quality of Service (QoS)
requirements that should be agreed and monitored. This means that, when
deploying or offloading an application, RAPID will check it is possible to
provide the required levels of quality, trying to negotiate if it is not the case,
and providing the best solution available.
Moreover, these QoS parameters will be monitored whenever possible, in
order to guarantee that the agreed quality levels are met.
Purpose Some of the requirements applications may have concern not the resources to
be provided but non-functional aspects, and these also have to be taken into
account.
Use Cases All.
Validation
scenario
In each scenario, applications will provide these requirements and RAPID
will deal with them. Scenarios will check that, whenever it is possible to
fulfill the QoS aspects, they will be fulfilled 100%.
Related WPs WP5, WP6
Components RAPID infrastructure, ThinkAir Server and ThinkAir Clients
Relationships BIOS_NET_01, BIOS_NET_02, HT3D_NET_03, HT3D_NET_04,
GEN_SYS_02
# Id GEN_SYS_04 Name Resources allocation mechanism
Priority High Req. Type Functional
Description There must be an adequate infrastructure management mechanism which will
facilitate applications deployment and offloading to the right devices.
Required infrastructure and other QoS parameters will be taken into account
when performing these operations.
The mechanism will take the infrastructure model into account, applying
certain policies for improving performance (e.g. offloading cannot be
done to nodes belonging to a lower class).
Purpose Applications need RAPID to perform adequate resource allocation, so that
they perform as expected while the infrastructure resources are exploited
in an optimal way.
Use Cases All.
Validation
scenario
In each scenario, applications will provide these requirements and RAPID
will perform the resources allocation according to the requirements they have,
guaranteeing applications performance is adequate. This will be validated by
HT3D_CPU_01, HT3D_CPU_02, BIOS_NET_01, BIOS_NET_02,
HT3D_NET_03 and HT3D_NET_04.
Related WPs WP5, WP6
Components RAPID infrastructure, ThinkAir Server and ThinkAir Clients
Relationships GEN_SYS_01, GEN_SYS_02, GEN_SYS_03
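The class-based policy referenced by GEN_SYS_04 (offloading cannot target a node of a lower class than the source) can be sketched as follows. The node model and the selection rule are illustrative assumptions, not a specification of the RAPID scheduler:

```python
from dataclasses import dataclass

# Class-1 = ultra-low-power device ... Class-5 = public cloud (per GEN_SYS_01)
@dataclass
class Node:
    name: str
    node_class: int   # 1..5
    free_gpus: int

def eligible_targets(source, nodes):
    """Candidate offloading targets: same or higher class, with a free GPU."""
    return [n for n in nodes
            if n is not source
            and n.node_class >= source.node_class
            and n.free_gpus > 0]

phone = Node("phone", node_class=1, free_gpus=0)
server = Node("server", node_class=4, free_gpus=2)
cloud = Node("cloud", node_class=5, free_gpus=8)

print([n.name for n in eligible_targets(phone, [phone, server, cloud])])
# ['server', 'cloud']
```

A real allocation mechanism would additionally weigh the QoS parameters of GEN_SYS_03 (latency, bandwidth) when ranking the eligible targets.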
# Id GEN_SYS_05 Name Smart offloading
Priority Medium Req. Type Functional
Description RAPID will enable offloading through the ThinkAir functionality, by
indicating the code which can be executed remotely. RAPID has to guarantee
that offloading operations are performed in an optimal way, by selecting the
adequate resources and devices.
As certain aspects are monitored (such as some QoS aspects), these will also
be taken into account to re-formulate resource allocation in case potential
issues are detected.
Purpose In certain cases it is necessary to offload part of the code of an application for
execution on more capable machines. It is important that this offloading is
done adequately, so that performance increases.
Use Cases All.
Validation
scenario
In each scenario, applications will offload certain parts of their code in a
smart way, so QoS requirements will be fulfilled. This will be validated by
HT3D_CPU_01, HT3D_CPU_02, BIOS_NET_01, BIOS_NET_02,
HT3D_NET_03 and HT3D_NET_04.
Related WPs WP5, WP6
Components RAPID infrastructure, ThinkAir Server and ThinkAir Clients
Relationships GEN_SYS_01, GEN_SYS_02, GEN_SYS_03, GEN_SYS_04
References
[1] FFmpeg: https://www.ffmpeg.org/
[2] Open source audio and video processing tools: http://libav.org/
[3] Wireshark: https://www.wireshark.org/
[4] Mark Schneider and Charles Stevens. Development and Testing of a New Magnetic-tracking
Device for Image Guidance. SPIE Medical Imaging, pages 65090I–65090I–11, 2007.
[5] Robert Y. Wang and Jovan Popovic. Real-time Hand-tracking With a Color Glove. ACM
Transactions on Graphics, 28(3):1, July 2009.
[6] Nikolaos Kyriazis, Iason Oikonomidis, and Antonis A. Argyros, "A GPU-powered
Computational Framework for Efficient 3D Model-based Vision", Technical Report TR420,
ICS-FORTH, July 2011.
[7] P. Padeleris, X. Zabulis and A. Argyros, “Head pose estimation on depth data based on
Particle Swarm Optimization”, HAU3D, 2012.
[8] A. Shimada, K. Kondo, D. Deguchi, G. Morin and H. Stern, "Kitchen Scene Context based
Gesture Recognition", ICPR, 2012.
[9] Matthias Wölfel, "Kinetic Space", http://code.google.com/p/kineticspace/
[10] Alexey Kurakin, Zhengyou Zhang, Zicheng Liu, "A Real-Time System for Dynamic Hand
Gesture Recognition with a Depth Sensor", EUSIPCO, 2012.
[11] Kinect 3D Hand Tracking, First prize at the CHALEARN Gesture Recognition competition, 11
Nov 2012.
[12] ClamAV Antivirus: http://www.clamav.net
[13] Sokol Kosta, Andrius Aucinas, Pan Hui, Richard Mortier, and Xinwen Zhang,
"ThinkAir: Dynamic resource allocation and parallel execution in cloud for mobile code
offloading", INFOCOM, pages 945-953, IEEE, 2012.
[14] GVirtuS: https://code.google.com/p/gvirtus/
[15] RAPID description: http://www.rapid-project.eu/project-description.html