
Project funded by the European Union’s Horizon 2020 Research and Innovation Programme (2014 – 2020)

Coordination and Support Action

Big Data Europe – Empowering Communities with Data Technologies

Deliverable 6.1: Piloting Plan and Evaluation

Dissemination Level: Public

Due Date of Deliverable: March 31, 2016

Actual Submission Date: April 15, 2016

Work Package: WP6, Real-Life Deployment & User Evaluation

Task: T6.1

Type: Report

Approval Status: Approved

Version: 1.0

Number of Pages: 22

Filename: D6.1_PilotingPlan_and_Evaluation.pdf

Abstract: This report documents the plan and process for the execution of the pilot sessions, indicating the required resources and materials that will be used. Furthermore, it defines the methodology for evaluating the results of the pilots, taking into account the general and domain-specific requirements expressed. The requirement analysis documentation delivered as part of T3.1 and T3.2 will be the guideline for the specification of what will be evaluated.

The information in this document reflects only the author’s views and the European Community is not liable for any use that may be made of the information contained therein. The information in this document is provided “as is” without guarantee or warranty of any kind, express or implied, including but not limited to the fitness of the information for a particular purpose. The user thereof uses the information at his/her sole risk and liability.

Project Number: 644564 Start Date of Project: 01/01/2015 Duration: 36 months


History

Version | Date | Reason | Revised by
0.0 | 02/11/2015 | Initial version | Ronald Siebes (VUA)
0.1 | 12/12/2015 | Outline by (sub)section | Ronald Siebes (VUA)
0.2 | 02/01/2016 | Generic infra questionnaire update and planning according to the Vienna meeting | Ronald Siebes (VUA)
0.3 | 25/02/2016 | Incorporating pilot feedback and generic platform component choices after the Athens technical meeting | Ronald Siebes (VUA), Erika Pauwels (Tenforce), Stasinos Konstantopoulos (NCSR-D)
0.4 | 09/03/2016 | Final draft | Ronald Siebes (VUA)
0.5 | 14/03/2016 | Reviewed version | Erika Pauwels (Tenforce) and Aad Versteden (Tenforce)
0.6 | 15/03/2016 | Final draft corrected according to review comments | Ronald Siebes (VUA)

Author List

Organisation | Name | Contact Information
Open PHACTS | Bryn Williams-Jones | [email protected]
VUA | Victor de Boer | [email protected]
VUA | Ronald Siebes | [email protected]
NCSR-D | S. Konstantopoulos, A. Charalambidis, I. Mouchakis, G. Stavrinos | [email protected]


Executive Summary

This document details the first version of the plan and process for the execution of the pilot sessions, indicating the required resources and materials that will be used. Furthermore, this report defines the methodology for evaluating the results of the pilots, taking into account the general and domain-specific requirements expressed. In other words, the generic evaluation criteria, which can be applied to every possible instance on the BDE infrastructure, are complemented with a set of evaluation criteria specific to the unique characteristics of each of the seven Societal Challenges.


Abbreviations and Acronyms

LOD Linked Open Data

SC Societal Challenge

BDE BigData Europe

GB Gigabyte

TB Terabyte

PB Petabytes

JSON JavaScript Object Notation

SC1 Societal Challenge 1

API Application programming interface

PDF Portable Document Format

JPEG Joint Photographic Experts Group

PNG Portable Network Graphics

GIF Graphics Interchange Format

GML Geography Markup Language

GeoTIFF Geographic Tagged Image File Format

TIF Tagged Image Files

GMLJP2 Geography Markup Language JPEG 2000

CCTV Closed-circuit television

CMIP5 Coupled Model Intercomparison Project Phase 5

CMIP6 Coupled Model Intercomparison Project Phase 6

CORDEX Coordinated Regional Climate Downscaling Experiment

SPECS Seasonal-to-decadal climate Prediction for the improvement of European Climate Services

HDF Hierarchical Data Format

NetCDF Network Common Data Form

ASCII American Standard Code for Information Interchange

MRI Magnetic resonance imaging

INSPIRE Infrastructure for Spatial Information in Europe

PPT PowerPoint templates

FURPS Functionality, Usability, Reliability, Performance, Supportability

IMS Open PHACTS Instance Mapping Service

OPS Open PHACTS

ECMF European Centre for Medium-Range Weather Forecasts

ESGF Earth System Grid Federation

FTP File Transfer Protocol

GIS Geographic Information System


Table of Contents

1. Introduction
2. Rationale for the Choice of Evaluation Methodologies
3. Evaluation Strategy for the Generic BDE Infrastructure
3.1 BDE Infrastructure Questionnaire
4. Pilot-Specific Evaluation Plans
4.1 SC1: Life Sciences and Health
4.2 SC3: Energy
4.3 SC5: Climate
4.4 SC7: Security
5. Conclusion


List of Figures

Figure 1: Work packages and respective project implementation phases
Figure 2: Pilot development and evaluation cycle
Figure 3: Pilot planning
Figure 4: Architecture of the first SC1 pilot
Figure 5: Architecture of the first SC3 pilot
Figure 6: Architecture of the first SC5 pilot
Figure 7: Architecture of the first SC7 pilot


List of Tables

Table 1: Evaluation questions for the first SC1 pilot
Table 2: Evaluation questions for the first SC3 pilot
Table 3: Evaluation questions for the first SC5 pilot
Table 4: Evaluation questions for the first SC7 pilot


1. Introduction

According to the planning in the Technical Annex, WP6 – Real-Life Deployment & User Evaluation – starts at the end of year one and ends in the third quarter of year three (cf. Figure 1).

Figure 1: Work packages and respective project implementation phases

The pilot deployment and evaluations will be carried out in three cycles, so that each pilot can be improved, adjusted and extended based on the evaluation results of the previous cycle. Given the ambitious goal of the BDE project to provide an infrastructure that facilitates at least three pilots for each of the seven Societal Challenges, the evaluation methodology for this first round of pilots is best served by a descriptive “lessons learned” approach, which will form the prescriptive “must-haves” for the second round of pilots, starting in M18.


Figure 2: Pilot development and evaluation cycle

During year one of the project, the first implementation of the BDE generic infrastructure is delivered in M12. The design and architectural decisions are based on the numerous feedback sessions and interviews with the project partners and the domain experts consulted by the various domain partners. The requirements and design specifications of the platform are described in deliverables 3.3, 3.5 and 4, which form the basis for the generic evaluation criteria. Essentially, for each functional and non-functional requirement we need to specify how we will evaluate whether, and to what extent, it is met during the various phases of the project.

It is important to realise that, due to new insights, the choice of ‘core components’ that make up the BDE infrastructure is subject to major changes within the duration of the project. Therefore it is only possible to deliver a generic evaluation methodology that is independent of the technology chosen for the generic BDE infrastructure. However, we expect that the choice of these components will not change much after M18, which will allow us to perform a more technology-dependent review from that moment on. In conclusion, during the first cycle we can only provide a generic, descriptive evaluation of the BDE platform, which gives insight to the developers of the generic BDE infrastructure, combined with pilot-specific evaluations that give insight to the developers involved in implementing the challenge-specific requirements.

The specification of the first cycle of pilots for each of the seven challenges is worked out in deliverable 5.2. Of the seven challenges, four pilots are mature enough, and have the resources available, to make a realistic plan for a feasible implementation:

SC1 – Life Science & Health

SC3 – Energy

SC5 – Climate

SC7 – Security

The other three challenges are expected to deliver a first pilot description by M18 at the latest, in order to have enough time to develop and evaluate all the required pilot cycles.


Figure 3: Pilot planning

2. Rationale for the Choice of Evaluation Methodologies

Deliverable 2.4.1 provides the results of an extensive requirements elicitation phase which, combined with the technical requirement analysis in deliverable 2.3 and the results of the interviews, contains the functional and non-functional requirements for the BDE Platform. The goal of the evaluation process is to investigate to what extent these requirements are met during the various implementation phases of this project.

For the functional and non-functional requirements of the generic infrastructure, the FURPS model [1] is followed, which classifies software quality attributes:

Functionality - Capability (Size & Generality of Feature Set), Reusability (Compatibility, Interoperability, Portability), Security (Safety & Exploitability)

Usability (UX) - Human Factors, Aesthetics, Consistency, Documentation, Responsiveness

Reliability - Availability (Failure Frequency (Robustness/Durability/Resilience), Failure Extent & Time-Length (Recoverability/Survivability)), Predictability (Stability), Accuracy (Frequency/Severity of Error)

Performance - Speed, Efficiency, Resource Consumption (power, ram, cache, etc.), Throughput, Capacity, Scalability

1 "FURPS - Wikipedia, the free encyclopedia." 2011. 2 Nov. 2015 <https://en.wikipedia.org/wiki/FURPS>

D6.1 – v. 1.0

Page 11

Supportability (Serviceability, Maintainability, Sustainability, Repair Speed) - Testability, Flexibility (Modifiability, Configurability, Adaptability, Extensibility, Modularity), Installability, Localizability

The details of each of these requirements differ for each challenge and need to be addressed as such in our evaluation strategy. However, as mentioned before, the generic BDE infrastructure can also be evaluated independently of these challenges according to the FURPS model. This evaluation strategy for the first development cycle of the BDE infrastructure is described in section 3. The challenge-specific evaluation strategies require a fine-tuned approach, which is described in section 4.

3. Evaluation Strategy for the Generic BDE Infrastructure

Deliverable 3.2 provides the details on the requirements and the initial choice of software components for the generic BDE infrastructure. The functional and non-functional requirements were gathered from all the societal challenges and categorized by the four Big Data challenges, namely Volume, Velocity, Variety and Veracity of data. It was found that the data requirements cover all four features of big data, with a particular focus on volume and velocity. The analysis of the data value chain revealed that each societal challenge has a different set of requirements, resulting in a diverse set of tools and frameworks required for each step of the data value chain.

Deliverable 5.2 outlines the choices of the components that form the BDE infrastructure with respect to the demands specified by the individual pilots. Hence, the evaluation plan for the first pilot cycle regarding the generic infrastructure will mainly be focused on gathering feedback on the selected tools, especially whether or not they fulfill the requirements from the individual pilots.

To achieve this goal, we will ask the leading technical persons and the leading domain experts for each selected pilot to fill out a questionnaire during the first evaluation period (M18). This questionnaire addresses the non-functional and functional requirements for each task that the BDE infrastructure should support.

3.1 BDE Infrastructure Questionnaire

File System

The platform requires a distributed file system which provides storage, fault tolerance, scalability, reliability, and availability to the multitude of SC partners.

This has resulted in the selection of the Apache Hadoop Distributed File System (HDFS).
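To make the questions below more concrete, the following minimal sketch illustrates the kind of store-and-retrieve round trip a pilot could perform against HDFS over WebHDFS, using the third-party Python hdfs client. The namenode address, user name and paths are hypothetical placeholders, not part of the BDE specification.

    from hdfs import InsecureClient

    # Connect to the (hypothetical) WebHDFS endpoint of the namenode.
    client = InsecureClient('http://namenode.example.org:50070', user='bde')

    # Upload a local file into HDFS, overwriting any previous version.
    client.upload('/pilots/example/input.csv', 'input.csv', overwrite=True)

    # Read the file back to verify the round trip.
    with client.read('/pilots/example/input.csv') as reader:
        head = reader.read(1024)
    print(len(head), 'bytes read back from HDFS')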


Evaluation questions:

- How much data did you store?

- Please specify which storage mechanism(s)/tool(s) you used (e.g. HDFS, a structured database like Postgres, an RDF store like Virtuoso, a NoSQL store like Redis)

- For every mechanism you mentioned, please answer the following questions:

o Is the store being managed via the BDE infrastructure?

o If no, why not?

o If yes:

How much effort/time did it take to set up and understand the tool/mechanism?

Was the file system able to store and retrieve the files that you need for running the pilot?

Did you experience any problems related to fault tolerance, scalability, reliability and availability?

Was the upload time satisfactory?

Are there any other points you would like to mention related to the data store?

Resource Manager

The platform should be able to provide resource management capabilities and support schedulers for high utilization and throughput.

This set of properties is delivered by Docker Swarm, which offers resource management for distributed applications.
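Because standalone Docker Swarm exposes the ordinary Docker Remote API, a pilot can address the whole cluster as if it were a single Docker engine. The minimal sketch below illustrates this with the Python Docker SDK; the manager address, image name and environment variable are hypothetical placeholders, not a prescribed setup.

    import docker

    # Point the standard Docker client at the (hypothetical) Swarm manager.
    client = docker.DockerClient(base_url='tcp://swarm-manager.example.org:2375')

    # Ask for a container to be started; Swarm's scheduler picks the node.
    container = client.containers.run(
        'bde/example-worker:latest',            # placeholder image name
        command='python process.py',
        environment={'HDFS_URL': 'hdfs://namenode:8020'},
        detach=True,
    )
    print('Started container', container.short_id, 'somewhere on the swarm')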

Evaluation questions:

- How many applications do you run in parallel during your pilot?

- Did you need an automatic resource manager to delegate the resources for distributed applications?

- If yes, did you use Docker Swarm?

o If no, what did you choose, and why not Docker Swarm?

o If yes

Was it easy to set up for your pilot requirements?

Which type of interface did you use to interact with Docker Swarm (e.g. the HTTP REST API, a web interface, the command-line shell)?



How did you describe the ‘tasks’, and was the ‘matchmaking’ by which Docker Swarm finds the appropriate resources satisfactory?

- Any other remarks?

Scheduler

The scheduler needs to schedule the distributed tasks and offer resources to increase the throughput of the overall system.

Two schedulers, Marathon and Chronos, have been selected for task scheduling in the framework.
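As an illustration of how a long-running pilot task could be handed to such a scheduler, the minimal sketch below posts a hypothetical application definition to Marathon's REST API; a Chronos job would be posted in a similar way, with an ISO 8601 schedule. The host, application id and image name are placeholders.

    import requests

    app = {
        "id": "/bde/example-worker",                  # placeholder application id
        "container": {
            "type": "DOCKER",
            "docker": {"image": "bde/example-worker:latest"},
        },
        "cpus": 0.5,
        "mem": 512,
        "instances": 1,
    }

    # Register the application with the (hypothetical) Marathon master.
    resp = requests.post("http://marathon.example.org:8080/v2/apps", json=app)
    resp.raise_for_status()
    print("Marathon accepted app", resp.json().get("id"))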

Evaluation questions:

Many system administrators are familiar with cron (jobs), which starts and stops processes at certain time intervals or under other conditions. Chronos is the cron for Mesos.

- Do you have experience with Cron or similar schedulers?

- Do you need a scheduler for your first pilot?

- If yes, did you use Marathon and Chronos for your first pilot?

o If you did not use Marathon and Chronos, please explain why and what you chose instead.

o If you did use Marathon and Chronos, what are your experiences?

Coordination

The platform requires an efficient system for managed state, distributed coordination, consensus and lock management in the distributed platform.

ZooKeeper will be used as a decentralized, fault-tolerant coordination framework.
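The minimal sketch below shows, using the Python kazoo client and a placeholder ensemble address, the two coordination primitives pilots are most likely to need from ZooKeeper: an ephemeral node that disappears when its owner dies (which is what makes failures visible) and a distributed lock around a critical section.

    from kazoo.client import KazooClient

    zk = KazooClient(hosts='zookeeper.example.org:2181')   # placeholder ensemble
    zk.start()

    # Register this worker; the ephemeral node vanishes if the process crashes,
    # which is how a coordinator can notice the failure and re-delegate the job.
    zk.ensure_path('/bde/workers')
    zk.create('/bde/workers/worker-', value=b'host-1', ephemeral=True, sequence=True)

    # Only one process at a time may execute the critical section.
    lock = zk.Lock('/bde/locks/ingest', identifier='worker-on-host-1')
    with lock:
        pass  # exclusive work goes here

    zk.stop()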

Evaluation questions:

The most important feature of ZooKeeper is to make sure that the processes keep running and communicating. It responds to node crashes by, for example, delegating the job to another node.

- Does your pilot contain a set of continuously running applications that interact with each other?

- If yes, did you use a coordination framework to keep the processes stable?

- If yes,

o Which framework did you choose, and did it do what you expected?

o Did you experience any problems, such as physical node failures or memory leaks, that required the coordinator to intervene?


o If you chose something other than ZooKeeper, please explain why.

o If you chose ZooKeeper, what were your experiences?

Data Acquisition

Owing to the wide range of input data properties, data acquisition requires a set of tools to support the process of gathering, filtering and cleaning data before it is put into a data warehouse or any other storage solution on which data processing can be carried out.

A set of frameworks including Apache Flume and Apache Sqoop has been chosen, with the ambition of catering for all four properties of Big Data.

Apache Flume: A framework to populate HDFS with streaming event data.

Apache Sqoop: A framework to transfer data between structured data stores such as RDBMS and Hadoop.
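As one possible illustration, if a Flume agent were configured with an HTTP source (the port below is a placeholder) in front of an HDFS sink, a pilot could hand streaming events to it as a JSON list of header/body objects, as sketched here. This assumes the agent's HTTP source uses the standard JSON handler; it is an illustration, not a BDE requirement.

    import json
    import requests

    # One event in the header/body format expected by Flume's JSON handler.
    events = [
        {
            "headers": {"source": "sensor-gateway"},                 # placeholder header
            "body": json.dumps({"sensor": "unit-07", "value": 14.2}),
        }
    ]

    # Post the batch to the (hypothetical) HTTP source of the Flume agent.
    resp = requests.post("http://flume-agent.example.org:44444", json=events)
    resp.raise_for_status()
    print("Flume accepted", len(events), "event(s)")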

Evaluation questions:

Most likely, some of the data you use in your first pilot was already available in one form or another. The ‘data acquisition’ process is the bridge between that data and the BDE storage mechanism.

- Is your data ‘special’ in the sense that it needs some tool to transform or process it so that it can be stored in the respective data store (for example streaming data, XML, data that needs to be transformed to JSON, data that needs to be migrated between databases, etc.)?

- If yes, which tools did you use for this pilot?

- Are any of these tools part of the generic BDE infrastructure?

o If no, would you recommend adding it?

o If yes,

Was it difficult to learn and set up the tool?

What is the weakest link in the pipeline (e.g. the store, the network, the tooling), and was it still performing well enough to be satisfactory for this pilot?

Data Processing

Data Processing Frameworks: The platform requires different frameworks for the various SC instances. Each framework has a different set of strengths and is applicable to a specific set of properties of the underlying data.

A multitude of tools is available, depending on the type of processing to be performed on the underlying data; this includes, but is not limited to, MapReduce for batch processing, Spark GraphX for iterative processing, and Apache Spark and Apache Flink for data-stream and real-time processing.
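For instance, a minimal Spark batch job of the kind the questions below ask about could look as follows (Spark 1.x Python API, placeholder HDFS paths): read raw records, drop noise, and write the cleaned result back for later querying. This is a sketch under those assumptions, not a prescribed pipeline.

    from pyspark import SparkContext

    sc = SparkContext(appName='bde-example-cleaning')

    # Read raw text records from a (hypothetical) HDFS location.
    raw = sc.textFile('hdfs://namenode:8020/pilots/example/raw/*.csv')

    # Drop empty lines and comments, normalise whitespace.
    cleaned = (raw
               .filter(lambda line: line and not line.startswith('#'))
               .map(lambda line: line.strip()))

    # Persist the cleaned dataset back to HDFS for downstream analysis.
    cleaned.saveAsTextFile('hdfs://namenode:8020/pilots/example/cleaned')
    sc.stop()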


Evaluation questions:

- Which tools do you use to transform your (raw) data in order to, for example, perform analysis, filter noise, or make it suitable for sorting and querying?

- Which of these tools are supported by the generic BDE infrastructure, and did you use them?

- Did the tools you used that are supported by the generic infrastructure do what you expected? Were they easy to use? Please give a summary of your experiences with each data transformation tool.

4. Pilot-Specific Evaluation Plans

The first cycle of pilots is specified in Deliverable 5.2. As mentioned before, each pilot will be evaluated on both BDE-generic and pilot-specific requirements. The questionnaire described in section 3 deals with the generic part; the pilot-specific questionnaires in this section cover the latter.

4.1 SC1: Life Sciences and Health

The first pilot in SC1 “Life Sciences and Health” aims to replicate the Open PHACTS functionality on the BDE infrastructure.

The Open PHACTS functionality is twofold: 1) a RESTful interface to 2) an integrated RDF store containing the data relevant to drug discovery. The current Open PHACTS infrastructure uses some commercial components (e.g. the cluster version of Virtuoso for RDF storage and 3Scale for delegating the API requests to the reasoner). The “BDE Open PHACTS” pilot will use open-source solutions for RDF reasoning and will provision the API via the BDE infrastructure (cf. Figure 4).
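As an illustration of the template-filling step referred to in the evaluation questions below, the minimal sketch shows how an HTTP-GET parameter could be substituted into a SPARQL template and executed against the pilot's RDF store via SPARQLWrapper. The endpoint URL and the template itself are placeholders, not the actual Open PHACTS queries.

    from string import Template
    from SPARQLWrapper import SPARQLWrapper, JSON

    # A toy SPARQL template; "$label" stands in for an HTTP-GET parameter.
    TEMPLATE = Template("""
    SELECT ?compound ?label WHERE {
      ?compound <http://www.w3.org/2000/01/rdf-schema#label> ?label .
      FILTER(CONTAINS(LCASE(?label), "$label"))
    } LIMIT 10
    """)

    def run_template(endpoint_url, label):
        sparql = SPARQLWrapper(endpoint_url)
        sparql.setQuery(TEMPLATE.substitute(label=label.lower()))
        sparql.setReturnFormat(JSON)
        return sparql.query().convert()["results"]["bindings"]

    # e.g. run_template("http://rdf-store.example.org/sparql", "aspirin")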


Figure 4: Architecture of the first SC1 pilot

Key evaluation questions:

1. Did you manage to store all the RDF data and answer curated queries in a reasonable time?

2. Were you able to fill the Puella SPARQL templates with the HTTP-GET parameters to execute RDF queries on the 4Store DB?

3. How many of the 21 Open PHACTS research questions [2] were you able to answer?

Other evaluation questions based on the requirements specified in D5.2:

Table 1: Evaluation questions for the first SC1 pilot

R1: The solution should be packaged in a way that makes it possible to combine the Open PHACTS Docker and the BDE Docker to achieve a custom integrated solution.
Evaluation questions: Is SWAGGER suitable for specifying a REST API in a dynamic distributed environment like the BDE infrastructure?

R2: RDF data storage.
Evaluation questions: What are the experiences with the Docker version of the open-source Virtuoso software with respect to this pilot?

R3: Datasets are aligned and linked at data ingestion time, and the transformed data is stored.
Evaluation questions: How does the BDE infrastructure communicate with the external IMS provider? What is the difference in delay between the current OPS system and the pilot version?

R4: Queries are expanded or otherwise processed and the processed query is applied to the data.
Evaluation questions: Were any changes to the SPARQL templates needed due to the transition to another RDF store (it is known that some providers include extra ‘shortcuts’ and functionality beyond the SPARQL standard)?

R6: Data and query security and privacy requirements.
Evaluation questions: Are there currently vulnerabilities in the BDE infrastructure that might reveal any sort of communication to a third party (e.g. queries and results, or IP addresses)?

[2] http://www.openphacts.org/documents/registered/deliverables/D%206.1%20Prioritised%20Research%20Questions_final%20version.pdf

4.2 SC3: Energy

The first pilot in SC3 “Energy” addresses operation, maintenance and production forecasting for wind turbines based on real-time sensor data.
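To make requirement R1 below more concrete, the following minimal sketch (cluster address, keyspace and table schema are hypothetical placeholders) shows how a temporal slice of sensor readings could be written to and read back from Cassandra with the Python driver; it is an illustration only, not the pilot's actual data model.

    from datetime import datetime
    from cassandra.cluster import Cluster

    cluster = Cluster(['cassandra.example.org'])     # placeholder contact point
    session = cluster.connect('sc3_pilot')           # hypothetical keyspace

    # Store one reading; the table is assumed to be keyed by (turbine_id, ts).
    session.execute(
        "INSERT INTO readings (turbine_id, ts, rpm) VALUES (%s, %s, %s)",
        ('turbine-07', datetime(2016, 3, 1, 12, 0), 14.2),
    )

    # Retrieve a slice of recent readings for the same turbine.
    rows = session.execute(
        "SELECT ts, rpm FROM readings WHERE turbine_id = %s LIMIT 100",
        ('turbine-07',),
    )
    for row in rows:
        print(row.ts, row.rpm)

    cluster.shutdown()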

Figure 5: Architecture of the first SC3 pilot


Key evaluation questions:

1. Were you able to store and retrieve the binary blobs of temporal slices of the sensor data by using the HDFS module from the BDE?

2. Were the processing algorithms able to efficiently work with the HDFS module?

3. Please describe the positive and negative points of the first GUI for visualizing and querying the (derived and raw) data.

Other evaluation questions based on the requirements specified in D5.2:

Table 2: Evaluation questions for the first SC3 pilot

R1: The data will be sent (via FTP or otherwise) from the intermediate (local) processing level to BDE.
Evaluation questions: How was the data transferred to the BDE HDFS module and/or Cassandra module?

R2: The application should be able to recover from short outages by collecting the data transmitted during the outage from the data sources.
Evaluation questions: Did you simulate ‘outages’? If yes, was the data connector able to request the missing data?

R3: The analysis software will be developed in Python.
Evaluation questions: Please describe how Spark fulfills your analysis requirements. What are the positive and negative experiences?

R4: Weekly execution of model parameterization and operational statistics.
Evaluation questions: The development of the analysis tool is one of the key components of this pilot. Did you use any of the BDE scheduling tools for initiating the weekly executions? Were the statistical results in line with your expectations?

R5: Near-real-time execution of parameterized models to return operational statistics, including correlation analysis of data across units.
Evaluation questions: Was the BDE infrastructure able to successfully process the near-real-time data analysis? Please elaborate.

R6: Flexibility of data file input formats for future applications.
Evaluation questions: Please specify the formats and data models that the ingestion component supports after delivering this first pilot. How did you test the correctness of these results (e.g. which external tools were able to read the results successfully)?

R7: The GUI supports database querying and data visualization for the analytics results.
Evaluation questions: Please describe your experiences with the GUI. In particular, which tasks were you able to perform via this GUI?


4.3 SC5: Climate

The first pilot in SC5 “Climate” aims to develop a tool for downscaling and retrieving (raw) climate data via user-defined parameters (e.g. geographical areas, time period, physical variables, computational grids, time steps).
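To illustrate the kind of sub-setting operation the pilot needs to evaluate, the minimal sketch below opens a NetCDF dataset with the Python netCDF4 library and extracts a geographic slice for one time step. The file name, variable names and bounding box are placeholders, not the pilot's actual datasets.

    import numpy as np
    from netCDF4 import Dataset

    ds = Dataset('climate_subset.nc')                 # hypothetical input file

    temps = ds.variables['t2m']                       # e.g. 2 m temperature, dims (time, lat, lon)
    lats = ds.variables['latitude'][:]
    lons = ds.variables['longitude'][:]

    # Indices of a rough bounding box (placeholder coordinates).
    lat_idx = np.where((lats >= 45.0) & (lats <= 55.0))[0]
    lon_idx = np.where((lons >= 5.0) & (lons <= 15.0))[0]

    # Slice the first time step down to the selected area.
    subset = temps[0, lat_idx.min():lat_idx.max() + 1, lon_idx.min():lon_idx.max() + 1]
    print('sub-grid shape:', subset.shape)
    ds.close()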

Figure 6: Architecture of the first SC5 pilot

Key evaluation questions:

1. Were you able to search the relevant datasets using thematic keywords? Which keywords did you use?

2. Were you able to set the geographic and temporal scope for creating datasets?

3. Which other parameters were you able to set, and which units could you specify, for creating a down-scaled model?

Other evaluation questions based on the requirements specified in D5.2:

Table 3: Evaluation questions for the first SC5 pilot

R1: Provide a means for querying ECMF, ESGF and local metadata using thematic and geo-temporal criteria.
Evaluation questions: Which metadata services did you integrate via Semagrow? What are your positive and negative experiences with Semagrow?

R2: Download ECMF and ESGF datasets.
Evaluation questions: How was the data added to the BDE data store (e.g. FTP, HTTP POST, …)? Did you use any of the generic BDE components?

R3: Download and upload NetCDF datasets from and to the institutional infrastructure hosting the WRF modeler.
Evaluation questions: Was the Big Data Integrator able to convert the data into NetCDF? Please elaborate.

R4: Orchestrate the remote running of WRF, via submitting jobs or otherwise. Preserve basic lineage information of the resulting datasets.
Evaluation questions: How did you submit jobs to WRF? Did you use any of the generic scheduling modules provided by the BDE infrastructure?

R5: Dataset transformation: selecting and combining datasets.
Evaluation questions: What is the interface for querying sub-sets (e.g. the command line, an API, a GUI)? Were the results satisfactory? Was the processing time in line with your expectations?

R6: Dataset transformation: re-scaling and unit/vocabulary translation.
Evaluation questions: Which re-scaling operators were not supported by the query language but were necessary for your task? Were you able to perform the operation via another tool? If yes, which one and how?

4.4 SC7: Security

The first pilot in SC7 “Security” is focused on gaining insight into man-made surface changes, triggered by automatic detection, news, or social media information. News sites and social media are monitored and processed in order to extract and localize information about events. Events are categorized as indicative of hot spots and localized to a hot spot area. For these areas, current and earlier satellite images are downloaded and processed in order to detect changes. The end-user is notified about detected changes and can view the images and event information for this area.
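To make requirement R7 in Table 4 below (“Is Strabon able to join geographical events?”) more concrete, the minimal sketch shows the kind of spatial join the pilot needs: selecting detected changes whose geometry lies within a hotspot polygon, expressed as a GeoSPARQL query sent through SPARQLWrapper. The endpoint URL, class and property names are placeholders, not the pilot's actual schema.

    from SPARQLWrapper import SPARQLWrapper, JSON

    # A user-defined hotspot area as a WKT polygon (placeholder coordinates).
    HOTSPOT_WKT = "POLYGON((23.6 37.9, 23.8 37.9, 23.8 38.1, 23.6 38.1, 23.6 37.9))"

    query = """
    PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
    PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
    SELECT ?change ?when ?wkt WHERE {
      ?change a <http://example.org/sc7/DetectedChange> ;      # placeholder class
              <http://example.org/sc7/detectedAt> ?when ;      # placeholder property
              geo:hasGeometry/geo:asWKT ?wkt .
      FILTER(geof:sfWithin(?wkt, "%s"^^geo:wktLiteral))
    }
    """ % HOTSPOT_WKT

    sparql = SPARQLWrapper("http://strabon.example.org/endpoint")   # hypothetical endpoint
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    for binding in sparql.query().convert()["results"]["bindings"]:
        print(binding["change"]["value"], binding["when"]["value"])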


Figure 7: Architecture of the first SC7 pilot

Key evaluation questions:

1. Was the generic BDE HDFS module able to store all the satellite images relevant to this pilot? And how did you retrieve them (i.e. annotate them with relevant metadata)?

2. Was Cassandra the right choice for storing news and tweets? How did you index the content?

3. What triggered the ‘event detection’ that led to initiating the relevant satellite image comparison procedures?

4. How satisfied are you with Strabon for storing the meta-data (e.g. the geo-locations)?

5. Did you use Spark and/or Flink for detecting changes? If yes, how? If no, why not and which technology did you use?

Table 4: Evaluation questions for the first SC7 pilot

R1: Monitor multiple text services (Twitter and Reuters). Text is retrieved and stored together with provenance and any metadata provided by the service (notably, location).
Evaluation questions: Were the NOMAD connectors for the Twitter and Reuters data able to successfully store the data in Cassandra, including provenance and other metadata such as locations?

R2: Regularly execute event detection on a single thread over the most recent text batch.
Evaluation questions: Did you use the generic BDE scheduling module to automatically trigger the event detection pipeline?

R3: Download images for a given hotspot from the Sentinel service.
Evaluation questions: What is your experience with the Sentinel data connector for storing image data for a given GIS location?

R4: Hotspots are manually defined by the user by selecting a map area.
Evaluation questions: Please describe your experiences with the GUI developed for selecting hotspots.

R5: Hotspots are automatically defined by event detection.
Evaluation questions: Please look at the false positives and false negatives that the automatic event detector identified. What are your recommendations for improving the detection algorithm?

R6: Change detection will experiment with Spark and Flink implementations.
Evaluation questions: How do the state-of-the-art detection implementations compare to the performance of the Spark and Flink versions? Which of these implementations is the simplest to read, write and debug?

R7: Change detection and event detection store locations of changes in a Strabon database.
Evaluation questions: Is Strabon able to join geographical events?

R8: The end-user interface is based on Sextant.
Evaluation questions: What is your opinion of the presentation layer that combines the text and image analysis output?

5. Conclusion

This report provides the evaluation plan for the first cycle of pilots for the seven Societal Challenges. Deliverable 5.2 identified four challenges as having a feasible implementation plan for this first cycle. For these pilots, this report outlined an evaluation plan, together with an overall evaluation plan for the generic BDE infrastructure. The main goal of this document is to provide a practical, to-the-point and open set of questions whose answers are specific enough to support clear decisions for the next development cycles, with the aim of delivering a generic and useful BDE infrastructure at the end of this project.