WatERP

Water Enhanced Resource Planning

“Where water supply meets demand”

GA number: 318603

WP 7: Pilots

7.3.2: Implementation of the Water Data Warehouse (WDW)

V1.0 11/03/2015

www.waterp-fp7.eu

Ref. Ares(2015)1494710 - 07/04/2015

Ref. 318603 - WatERP D7.3.2_Implementation_of_WDW_v1.0 page 2 of 24

Document Information

Project number: 318603
Acronym: WatERP
Full title: Water Enhanced Resource Planning “Where water supply meets demand”
Project URL: http://www.waterp-fp7.eu
Project officer: Grazyna Wojcieszko
Deliverable number: 7.3.2
Deliverable title: Implementation of the Water Data Warehouse (WDW)
Work package number: 7
Work package title: Pilots
Date of delivery: contractual M28, actual M30
Nature: Prototype [ ], Report [X], Dissemination [ ], Other [ ]
Dissemination level: Public [X], Consortium [ ]
Responsible author: Johannes Kutterer
Email: [email protected]
Partner: DISY
Phone: (+49) 721 1 6006-286

Abstract (for dissemination)

Implementation of the Water Data Warehouse (WDW): This task implements the central database of the WatERP platform in the two pilot areas. The main activity in this task is to put together the methods, reference data models and tools developed in WP3, to apply them to the particularities of the two pilot areas, and to deploy them for the pilot users ACA and SWKA. The deployed technologies are then thoroughly tested and examined with real-world data and usage conditions.

Key words: Water Data Warehouse (WDW), Pilot Integration Manager (PIM), WaterML2.0, SOS, software integration, pilot implementation

Glossary of acronyms

ACA Agencia Catalana del Agua (trans. Catalan Water Agency)

CUAHSI Consortium of the Universities for the Advancement of Hydrological Science Inc.

DMS Demand Management System

DRL Data Retrieval Language

DSS Decision Support System

DMZ Demilitarized Zone

HIS Hydrologic Information System

Hydro DWG Hydrology Domain Working Group

OWL Web Ontology Language

MAS Multi Agent System

O&M Observations and Measurements

ODM Observation Data Model

OGC Open Geospatial Consortium

OMP Open Management Platform

PIM Pilot Integration Manager

REST Representational State Transfer

SOS Sensor Observation Service

SensorML Sensor Markup Language

SWE Sensor Web Enablement

SWKA Stadtwerke Karlsruhe

WaterML Water Markup Language

WDTF Water Data Transfer Format

WDW Water Data Warehouse

XML Extensible Markup Language

Executive Summary

This document describes the activities to implement the WDW on the pilot site. It describes what sensor

observation data is currently stored on the pilot site and how this data is accessed and imported into the

WDW.

To fully understand this document, the following deliverables should be read first.

Number | Title | Description
D1.3 | Generic Ontology for water supply distribution chain | This deliverable summarizes the ontology, including the scope, purpose and implementation language to be used.
D1.4.1 | Inference and Simulation Engine Conceptual Design | This deliverable depicts the architecture of the Decision Support System, including a behavioural definition of the inference and simulation engine.
D2.1 | External System Integration requirement | Comprehensive review of the generic water supply and distribution chain, undertaken to understand the general requirements for system integration and interoperability within the WatERP project development.
D3.1 | WDW Conceptual Design | Report summarizing the architectural design of the water data warehouse used to integrate all data relevant for the WatERP project, and the interfaces used for data integration.
D3.2 | WDW 1st Prototype | Description of the implementation, installation and usage of the Water Data Warehouse.
D3.3 | WDW 2nd Prototype | Description of the implementation, installation and usage of the Water Data Warehouse.
D7.3.1 | Implementation of the Water Data Warehouse (WDW) | Description of the implementation of the WDW on the pilot site.

Table of contents

Document Information
Glossary of acronyms
Executive Summary
Table of contents
Table of figures
1. Introduction
1.1 WatERP overview
1.2 Standards for information interchange
1.3 Scope of consolidation and installation phase
2. Basic Concepts
3. ACA site integration
3.1 Data Statistics
4. SWKA site integration
4.1 Existing data infrastructure
4.2 SWKA Pilot Integration Manager
4.2.1 Data Access
4.2.2 Data Mapping
4.2.3 Data Export
4.3 Data Statistics
4.4 Next Steps
5. Performance
5.1 Candidates for performance testing
5.2 Performance ACA
5.3 Performance SWKA
6. Conclusions and future work

Table of figures

Figure 1 "WatERP architecture from the integration point of view"
Figure 2 "Integration of existing pilot infrastructure"
Figure 3 "Pilot Site Access by the WDW and Further Processing"
Figure 4 "ACA Sensor Data in Excel File"
Figure 5 "Current Installation Scenario for ACA"
Figure 6 "Database table from pilot site SWKA"
Figure 7 "SWKA pump data in Oracle"
Figure 8 "Excel file for SWKA Sensor Information"
Figure 9 "Excel file for SWKA Location Information"
Figure 10 "Flow Chart of Client Application"
Figure 11 "Current Installation Scenario for SWKA"
Figure 12 Response times of the SOS server over dataset size
Figure 13 Comparison of response times over dataset size
Figure 14 Flow chart of the PIM and WDW components working together. The candidates for performance testing are marked as dashed and dotted arrows.

1. Introduction

This document describes the activities carried out to implement the Water Data Warehouse (WDW) at the two pilot sites. It shows which actions have been taken and which tools had to be implemented in order to integrate and make use of the existing data structures. More detailed technical descriptions of the implemented tools can be found in D3.3 “WDW 2nd Prototype” and in D3.4 “WDW Final Prototype”, which is scheduled for delivery in month 30.

The document first gives a brief overview of the WatERP system and explains how the pilots are integrated from the architectural point of view. Next, it describes each pilot’s data infrastructure in detail, together with the strategy implemented to tie the client to the WDW platform. Finally, it summarizes the state reached so far and explains the steps that still have to be taken.

1.1 WatERP overview

As described in D3.2 “WDW 1st Prototype” and D3.3 “WDW 2nd Prototype”, the WDW is the central component of the WatERP software and data infrastructure. Its main function is to serve as a reliable and durable data basis for all other components. It provides both the data needed for analyses or other functions and automated routines to incorporate new data sources and datasets at any given time. The content is available to the analysis clients via the Multi Agent System (MAS). The WDW provides ontology information, semantically derivable inferences, and observation results.

Figure 1 gives an overview of the different layers of the WatERP architecture and, in particular, shows how the layers interact. A more detailed description of the WatERP architecture can be found in D2.3 “Open Interface Specification”, chapter 3 “WatERP architecture”.

The integration of the pilot data is realized via an SOS server that contains both ontology metadata and observation results.

Figure 1 "WatERP architecture from the integration point of view"

1.2 Standards for information interchange

As decided in chapter 5 of D2.1 “External systems”, all exchange of observation data is based on WaterML2 time series. This standard provides an XML-based combination of ontology and observation results. For the pilot integration, this means that the sensor data has to be transformed into WaterML2 and stored in an SOS server, which the WDW accesses to import the time series. This way, the WDW only has to deal with a single data format when it accesses the pilot infrastructure. Every transformation and mapping has to be accomplished by the Pilot Integration Manager (PIM), which is implemented according to each pilot’s needs.
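To illustrate the kind of transformation a PIM has to perform, the following minimal sketch turns a list of time/value pairs into a WaterML 2.0 MeasurementTimeseries fragment. The function name, the series identifier and the sample readings are assumptions made for this example and are not taken from the WatERP implementation; a complete document would additionally wrap the series in an O&M observation with procedure, observed property and feature of interest.

```python
import xml.etree.ElementTree as ET

# Namespaces of WaterML 2.0 and GML 3.2
WML2 = "http://www.opengis.net/waterml/2.0"
GML = "http://www.opengis.net/gml/3.2"
ET.register_namespace("wml2", WML2)
ET.register_namespace("gml", GML)


def to_measurement_timeseries(series_id, readings):
    """Build a wml2:MeasurementTimeseries element from (ISO time, value) pairs."""
    series = ET.Element(f"{{{WML2}}}MeasurementTimeseries",
                        {f"{{{GML}}}id": series_id})
    for timestamp, value in readings:
        point = ET.SubElement(series, f"{{{WML2}}}point")
        tvp = ET.SubElement(point, f"{{{WML2}}}MeasurementTVP")
        ET.SubElement(tvp, f"{{{WML2}}}time").text = timestamp
        ET.SubElement(tvp, f"{{{WML2}}}value").text = str(value)
    return series


# Hypothetical sample readings for one sensor
readings = [("2015-03-11T10:00:00Z", 4.2), ("2015-03-11T10:01:00Z", 4.3)]
element = to_measurement_timeseries("ts.sensor1", readings)
print(ET.tostring(element, encoding="unicode"))
```

Because every pilot delivers this one format, the importing side only needs a single parser regardless of how the source data was stored.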

1.3 Scope of consolidation and installation phase

Whereas the first phase focused on a proof of concept and a first vertical integration of the PIM and WDW, the second phase focused on consolidating the concepts and tools and on testing the complete setting of PIM and WDW. The integration of all components was tested by an installation on a preproduction server. This deliverable covers the consolidation and further development of each pilot. It also discusses and highlights the parts of the concept that have been reworked since the first integration phase.

[Figure 1, diagram labels: WatERP Framework with the WDW (SOS server, SOS client, ontology), MAS and OMP; building block integration of the DSS, DMS and hydrological forecast (each an OGC WPS server); pilot data integration via the SWKA and ACA pilot OGC SOS servers; exchanged formats WaterML2, OWL and DRL.]

2. Basic Concepts

Section 2 of WatERP deliverable D7.3.1 presented the concepts and tools required for installing the WDW software infrastructure at a pilot site.

It also described the role of the Pilot Integration Manager (PIM), which transports data from the pilot’s sensor infrastructure and legacy databases to the WDW server through the SOS communication protocol, using the WaterML2 data model.

Figure 2 "Integration of existing pilot infrastructure"

Then, at the two pilot sites, the following preparatory tasks had to be performed:

(1) Data access: getting access to the operational data without compromising the security, stability and efficiency of the existing operational systems.

(2) Data mapping: transforming the data formats used internally in the pilot’s existing systems into the OGC SOS WaterML2 format, and registering the required SOS sensors.

(3) Data import: importing concrete data from the pilots’ existing systems and moving it to the SOS server.

After these preparatory steps, the following concrete activities are performed in order to access the pilot site, transfer data and process the WaterML2 documents (see Figure 3):

1. The PIM registers the pilots’ sensors in the SOS server and adds the observations.

2. The sensors registered in the SOS server are also registered in the WDW.

3. The WDW polls the SOS server for new observation results and stores the observations in the WDW SOS server.

4. The SOS data is also used to populate the entities in the triple store.
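Steps 1 to 3 can be sketched with two in-memory stand-ins for the pilot SOS server and the WDW SOS server. The class and function names are hypothetical and no real SOS requests are issued; step 4 (populating the triple store) is omitted. The sketch only shows how incremental polling can be driven by the timestamp of the last imported observation per sensor.

```python
from datetime import datetime


class InMemorySos:
    """Stand-in for an SOS server: maps sensor id -> list of (time, value)."""

    def __init__(self):
        self.observations = {}

    def register_sensor(self, sensor_id):
        self.observations.setdefault(sensor_id, [])

    def add_observation(self, sensor_id, time, value):
        self.observations[sensor_id].append((time, value))

    def get_observations(self, sensor_id, after=None):
        return [(t, v) for t, v in self.observations[sensor_id]
                if after is None or t > after]


def poll(pilot_sos, wdw_sos, last_seen):
    """Copy observations newer than last_seen[sensor] into the WDW SOS."""
    imported = 0
    for sensor_id in pilot_sos.observations:
        wdw_sos.register_sensor(sensor_id)                 # step 2
        new = pilot_sos.get_observations(sensor_id, last_seen.get(sensor_id))
        for time, value in new:                            # step 3
            wdw_sos.add_observation(sensor_id, time, value)
        if new:
            last_seen[sensor_id] = max(time for time, _ in new)
        imported += len(new)
    return imported


pilot, wdw, last_seen = InMemorySos(), InMemorySos(), {}
pilot.register_sensor("sensor-1")                          # step 1
pilot.add_observation("sensor-1", datetime(2015, 3, 11, 10, 0), 4.2)
print(poll(pilot, wdw, last_seen))  # first poll imports the existing observation
pilot.add_observation("sensor-1", datetime(2015, 3, 11, 10, 1), 4.3)
print(poll(pilot, wdw, last_seen))  # second poll imports only the new one
```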

Figure 3 "Pilot Site Access by the WDW and Further Processing"

More details on the technical realization of these steps have been given in deliverable D7.3.1.

3. ACA site integration

Deliverable D7.3.1 has already shown:

- How the WaterOneFlow format previously used in ACA has been mapped to WaterML2.

- Which formats are provided by ACA’s WaterOneFlow Web Service interface.

- How the ACA Pilot Integration Manager is configured.

After these first steps of pilot integration had been completed, the conversion from WaterOneFlow to WaterML2 had to be adjusted. For each variable, a mapping to the matching offering and observed property had to be created.
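Such a per-variable mapping can be kept as a simple lookup table. The sketch below is only an illustration: the variable codes, offering names and property identifiers are invented, and the actual ACA configuration is not reproduced here.

```python
from typing import NamedTuple


class SosTarget(NamedTuple):
    offering: str
    observed_property: str


# Hypothetical variable codes and SOS identifiers, for illustration only
VARIABLE_MAP = {
    "FLOW": SosTarget("offering_flow", "urn:example:property:flow"),
    "LEVEL": SosTarget("offering_level", "urn:example:property:level"),
}


def map_variable(variable_code):
    """Return the SOS offering and observed property for a source variable."""
    try:
        return VARIABLE_MAP[variable_code]
    except KeyError:
        raise ValueError(
            f"no SOS mapping configured for variable {variable_code!r}"
        ) from None


target = map_variable("FLOW")
print(target.offering, target.observed_property)
```

Failing loudly on an unmapped variable makes a missing configuration entry visible during conversion instead of producing observations without an offering.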

ACA has introduced an Excel sheet with information about the sensors; a snapshot is shown in Figure 4. This file contains all the information required to fill the ACA database, so the site and variable combinations no longer need to be extracted from the WaterOneFlow web service. The next step is to fill the PostgreSQL database for ACA from this Excel file, as is done for SWKA.

Figure 4: "ACA Sensor Data in Excel File "

The polling start time, which defines the extent of historical data available in the WDW, must be set according to the requirements of the analysis. After installation of the ACA PIM and the pilot SOS server, the system will start to poll and fill the SOS database.

ACA will then be ready to be integrated into the WDW for further analysis. As the WaterOneFlow web service is already available externally, a first integration in WatERP is possible without any installation on the pilot site. Figure 5 shows the current installation scenario.

Figure 5 "Current Installation Scenario for ACA"

3.1 Data Statistics

For ACA, the Excel sheet provides 39 sensors. Each sensor has a Feature of Interest (FOI), Phenomenon, Procedure, type of Observation, Description, Site Code, Variable ID and Network. Within a Network, a site code becomes a siteID, which is passed to the GetSiteInfo web method of the WaterOneFlow web service (as explained in D7.3.1) to obtain data about that site, such as its name, geographic location, and available variable codes with the related variable information. A siteID together with a variable code is passed to the GetValues web method to obtain the time-series data for that site. This time-series data is classified as observations, which are stored in the Observation table on the SOS server. The refresh rate differs from sensor to sensor. ACA produces between 2,000 and 3,000 rows of time-series data per month in the SOS data table.
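How one row of the sensor sheet could be turned into the arguments of a GetValues call is sketched below. The record layout follows the columns listed above, but the colon-joined siteID, the argument names and the sample values are assumptions for this example and should be checked against the WaterOneFlow service actually in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorRow:
    """One row of the sensor sheet (subset of the listed columns)."""
    foi: str
    phenomenon: str
    procedure: str
    site_code: str
    variable_id: str
    network: str

    @property
    def site_id(self):
        # Assumed convention: network prefix and site code joined by a colon
        return f"{self.network}:{self.site_code}"


def get_values_args(row, start_date, end_date):
    """Collect the arguments for one GetValues request (names are illustrative)."""
    return {
        "location": row.site_id,
        "variable": row.variable_id,
        "startDate": start_date,
        "endDate": end_date,
    }


# Hypothetical sensor row
row = SensorRow("Reservoir A", "Level", "proc-1", "0042", "LEVEL", "ACA")
print(get_values_args(row, "2015-01-01", "2015-02-01"))
```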

4. SWKA site integration

4.1 Existing data infrastructure

Deliverable D7.3.1 already discussed the existing data infrastructure at SWKA, which basically consists of a data warehouse into which all sensor-network results are fed.

Up to now, the work to analyze and process the data from the SWKA pilot site has been based on an export in the form of an Oracle database dump file [1], created by SWKA, that contains a snapshot of the observation results. The file was imported into an Oracle [2] database on the development site in order to implement the data access. The example table depicted in Figure 6 shows that the sensor data is a series of time and value pairs, accompanied by other data fields such as quality and rate. In this table, PP_ID is the sensor ID, and each sensor has different readings at different times. The first level of integration is therefore to convert the data into WaterML2 so that it can be stored in an SOS server. To accomplish this, the entities identified in WP4 for SWKA that have to be part of the WaterML2 document are first mapped as described in the next subsection. SWKA has also provided a second Oracle database dump file with the pumping time-series data, shown in Figure 7.

Figure 6 "Database table from pilot site SWKA"

Figure 7 "SWKA pump data in Oracle"

[1] Data Pump Export: http://docs.oracle.com/cd/B28359_01/server.111/b28319/dp_export.htm#i1006388

[2] Oracle Database: http://www.oracle.com/technetwork/database/enterprise-edition/downloads/index.html

SWKA has also provided two Excel files: one with sensor information and one with location information. They are shown in Figure 8 and Figure 9, respectively. The idea is to store the information from these Excel files in PostgreSQL data tables so that the PIM can retrieve it on demand, as explained below.

Figure 8 “Excel file for SWKA Sensor Information”

Figure 9 “Excel file for SWKA Location Information”

The logical model of SWKA and the mapping of the pilot’s internal data structures to WaterML2 have

already been explained in Section 4.2 of deliverable D7.3.1.

4.2 SWKA Pilot Integration Manager

This section describes the whole process of converting the data into WaterML 2.0 and of adding sensors and sensor observation data from the SWKA database to the SOS server by means of the Pilot Integration Manager (PIM). The PIM is a console application with which the user interacts through the Application Initiation wrapper. It offers three main operations: 1) Select Excel files; 2) Register Sensors; 3) Insert Observations. The design of the PIM for SWKA extends the previous version. The PIM performs three main tasks: 1) Data Access; 2) Data Mapping; 3) Data Export. These tasks are explained in the following subsections, and the flow chart in Figure 10 illustrates the implementation steps.

4.2.1 Data Access

The PIM consists of a wrapper module for each step of the process. The database wrapper performs all tasks related to data access and data provision. It contains many classes, each of which performs a separate task. The first task of the database wrapper is to transfer the sensor and location information from the Excel files to the PostgreSQL database tables. For this purpose, the PIM lets the user select the Excel files for both kinds of information from the system. Once this data has been inserted, the PIM asks the user for the names of the data tables in the Oracle database if the user wants to insert observations. On receiving this information, the database wrapper reads the sensorID from the PostgreSQL table along with other sensor information such as FOI, Procedure, Phenomenon, B1BEZ, B2BEZ, B3BEZ, ELEMBEZ and location. With this information it checks whether the sensorID corresponds to a pump. If it does, the wrapper queries the pump table in Oracle and checks whether the pump is on or off. If the pump is on, the wrapper queries the sensor table in Oracle and retrieves the time series of the sensorID for the period during which the pump was on. If the pump is off, it records the time. All this time-series data is then passed to the Data Mapping modules, which are explained in the next subsection.
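The pump-dependent selection of readings can be illustrated without a database. The sketch below assumes that the pump table yields a time-ordered list of state changes and the sensor table a list of time/value pairs; both function names and the sample data are hypothetical, and the Oracle queries themselves are not shown.

```python
from datetime import datetime


def pump_on_intervals(state_changes, end_of_data):
    """Turn time-ordered (time, is_on) state changes into (start, end) intervals."""
    intervals, on_since = [], None
    for time, is_on in state_changes:
        if is_on and on_since is None:
            on_since = time
        elif not is_on and on_since is not None:
            intervals.append((on_since, time))
            on_since = None
    if on_since is not None:
        # Pump still running at the end of the exported data
        intervals.append((on_since, end_of_data))
    return intervals


def readings_while_on(readings, intervals):
    """Keep only the (time, value) readings that fall inside an 'on' interval."""
    return [(t, v) for t, v in readings
            if any(start <= t < end for start, end in intervals)]


# Hypothetical pump states and one-minute sensor readings
states = [(datetime(2015, 3, 11, 10, 0), True),
          (datetime(2015, 3, 11, 10, 2), False)]
readings = [(datetime(2015, 3, 11, 9, 59), 0.0),
            (datetime(2015, 3, 11, 10, 0), 3.1),
            (datetime(2015, 3, 11, 10, 1), 3.2),
            (datetime(2015, 3, 11, 10, 2), 0.0)]
intervals = pump_on_intervals(states, datetime(2015, 3, 11, 10, 5))
print(readings_while_on(readings, intervals))
```

The interval is treated as half-open, so a reading taken at the moment the pump is switched off is not included; this is a design choice of the sketch.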

4.2.2 Data Mapping

Three wrappers perform the Data Mapping in the SWKA PIM: 1) the XML schema wrapper; 2) the XSLT wrapper; and 3) the Transformer wrapper. Each wrapper contains many classes with separate functionality. The XML schema wrapper provides the XML schema files for WaterML 2.0 documents and the RegisterSensor document whenever the Transformer wrapper requires them. The XSLT wrapper provides stylesheets for InsertObservation, RegisterSensor, and WaterML 2.0 documents when the Transformer wrapper requires them in order to replace the values of the objects defined in them. These operations are described in more detail in D3.3 “WDW 2nd Prototype”. The Transformer wrapper takes the objects from the XML schema files and fills them in by replacing values in the XSLT files. In this way, the Transformer wrapper generates the XML files for WaterML 2.0, RegisterSensor and InsertObservation documents. These files are not exported at this stage; the export is described in the next subsection.
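The idea of filling a document template with sensor values can be illustrated as follows. Note that this sketch uses Python's string.Template as a stand-in for the XSLT stylesheets, and the element names form a deliberately simplified, non-schema-conformant document; neither the template nor the function name is taken from the PIM.

```python
import xml.etree.ElementTree as ET
from string import Template
from xml.sax.saxutils import escape

# Simplified stand-in for an InsertObservation stylesheet (not schema-conformant)
INSERT_OBSERVATION_TEMPLATE = Template(
    "<InsertObservation>"
    "<procedure>$procedure</procedure>"
    "<observedProperty>$phenomenon</observedProperty>"
    "<featureOfInterest>$foi</featureOfInterest>"
    "<time>$time</time>"
    "<value>$value</value>"
    "</InsertObservation>"
)


def render_insert_observation(**fields):
    """Fill the template; values are XML-escaped so the result stays well-formed."""
    safe = {key: escape(str(val)) for key, val in fields.items()}
    return INSERT_OBSERVATION_TEMPLATE.substitute(safe)


# Hypothetical field values
document = render_insert_observation(
    procedure="sensor-17", phenomenon="Pressure", foi="Pump 1 & 2",
    time="2015-03-11T10:00:00Z", value=4.2,
)
ET.fromstring(document)  # raises ParseError if the document were malformed
print(document)
```

A missing field raises a KeyError in substitute, which surfaces incomplete sensor metadata before anything is sent to the server.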

Figure 10 "Flow Chart of Client Application"

4.2.3 Data Export

The last task of the SWKA PIM is to export the standardized documents generated during Data Mapping. This is done with the help of the SOS wrapper, which uses HTTP to send a RegisterSensor document to the SOS server. When a sensor has been registered, the console displays the confirmation of the registration. If the sensor is already present on the SOS server, the response states that the sensor is already registered. Once a sensor is registered, observations can be inserted into the SOS server for that sensor by selecting the InsertObservation operation from the console. The first observation is inserted via HTTP; the remaining observations are written with insert and update operations on the SOS server database. If an observation is not yet present on the SOS server, it is inserted; otherwise it is updated.
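The insert-or-update rule can be sketched with an in-memory SQLite table standing in for the SOS server database. The table layout and function names are assumptions made for this example; the real SOS schema is considerably richer.

```python
import sqlite3


def open_store():
    """Create an in-memory table keyed by sensor and observation time."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observation ("
        " sensor_id TEXT NOT NULL,"
        " time TEXT NOT NULL,"
        " value REAL,"
        " PRIMARY KEY (sensor_id, time))"
    )
    return conn


def upsert_observation(conn, sensor_id, time, value):
    """Insert the observation, or overwrite it if (sensor, time) already exists."""
    conn.execute(
        "INSERT OR REPLACE INTO observation (sensor_id, time, value)"
        " VALUES (?, ?, ?)",
        (sensor_id, time, value),
    )
    conn.commit()


store = open_store()
upsert_observation(store, "sensor-17", "2015-03-11T10:00:00Z", 4.2)
upsert_observation(store, "sensor-17", "2015-03-11T10:00:00Z", 4.5)  # update
print(store.execute("SELECT COUNT(*), MAX(value) FROM observation").fetchone())
```

Keying on sensor and time makes repeated exports of the same period idempotent: re-running an import does not create duplicate rows.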

4.3 Data Statistics

SWKA provides 24 sensors in an Excel sheet. Each sensor has an FOI, Phenomenon, Procedure, Water Resource, Observation type and VariableID. The VariableID is a sensor ID that refers to PP_ID in the Oracle database. The FOI indicates whether the sensor is related to a pump or to the MainReservoir. If the FOI is a pump, the time-series data for the pump (pressure, flow, etc.) is taken from both the pump table and the sensor table in Oracle; otherwise it is taken from the sensor table alone. In the sensor table, the observation for each sensor is refreshed every minute. In the pump table, the on and off switching is recorded at irregular times. Observations only need to be downloaded for the periods when the pump is on. The data for all SWKA sensors amounts to approximately 18,000 rows per month.

4.4 Next Steps

The next steps for SWKA include adding all sensors from the pilot SOS to the WDW SOS. For the moment, this step is performed by a script that sends the data through the REST client, one sensor at a time. This procedure will be replaced by a more efficient one that sends all sensors at once.

The strategy adopted for the SWKA PIM differs considerably from the one applied at ACA, where it is reasonable to set up a poller that periodically queries the WaterOneFlow server. At SWKA, new data only becomes available once a new export has been triggered in the data warehouse. Therefore, an application that can be started explicitly after the export has finished successfully is preferable to a system that polls at fixed intervals.

Figure 11 shows the current installation scenario according to the architecture described. The pilot SOS server is installed inside the “Demilitarized Zone” (DMZ) of the SWKA network, whereas the WDW runs on BDIGITAL’s preproduction server.

Figure 11 "Current Installation Scenario for SWKA"

5. Performance

As described in chapter 2, the PIMs have to perform three main tasks that are relevant for performance:

1. Accessing the data from the site infrastructure

2. Mapping the data to the OGC SOS WaterML2 format

3. Storing the mapped data in the PIM SOS server

These tasks have to be executed without jeopardizing the stability, security and efficiency of the PIMs and the data sources. The performance of each step can vary greatly between PIM site integrations.

An example of possible difficulties was identified in the performance review of the WDW in D3.3 “WDW 2nd Prototype”. The WDW performs steps 1 and 3 when importing data from the PIM SOS server into its own SOS server. Figure 12 compares the response times of the SOS server for the WaterML2 and CSV formats as a function of the number of data points.

Figure 12 Response times of the SOS server over dataset size

The generation of WaterML2 is evidently a severe bottleneck in the 52°North SOS server. A more specific analysis showed a large memory consumption caused by holding all data points in memory during the WaterML2 generation. To mitigate this problem, a specific polling strategy was devised that limits the size of the queried data by issuing multiple smaller requests. The impact is illustrated in Figure 13: with this strategy (WaterML2*, green), the response time grows linearly with the dataset size. The packet size is configured manually in the PIM by setting the time interval that is queried for observations.
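The packeting idea, splitting one large query into several requests over shorter time windows, can be sketched as follows. The function names, the fetch callback and the window length are assumptions for this example; the actual poller configuration is not reproduced.

```python
from datetime import datetime, timedelta


def split_interval(start, end, window):
    """Yield consecutive (chunk_start, chunk_end) pairs covering [start, end)."""
    if window <= timedelta(0):
        raise ValueError("window must be positive")
    chunk_start = start
    while chunk_start < end:
        chunk_end = min(chunk_start + window, end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end


def poll_in_chunks(fetch, start, end, window):
    """Call fetch(chunk_start, chunk_end) once per chunk and concatenate results."""
    observations = []
    for chunk_start, chunk_end in split_interval(start, end, window):
        observations.extend(fetch(chunk_start, chunk_end))
    return observations


# Hypothetical fetch callback that returns one marker per request
requests = poll_in_chunks(
    lambda a, b: [(a, b)],
    datetime(2015, 1, 1), datetime(2015, 1, 25), timedelta(days=10),
)
print(len(requests))  # number of requests issued for the 24-day period
```

With a fixed window, the memory needed per response is bounded by the densest window rather than by the full history, which is consistent with the linear behaviour described above.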

Figure 13 Comparison of response times over dataset size

The example of the WDW server makes clear that great care is needed when polling data from external sources. This is especially true when large time series are involved, e.g. when new sensors are added.

5.1 Candidates for performance testing

Figure 14 shows the flow of data from the source to the PIM and finally to the WDW. During PIM-internal communication (where the PIM stores the data in the PIM SOS database), normally no performance-critical operation is performed; the speed of writing the data is limited only by the database performance. However, the communication between the PIM and the pilots’ data source could be a bottleneck. This was already shown in D3.3 for the SOS response in WaterML2 in the WDW (see also Figures 13 and 14). This potential bottleneck arises when large numbers of observations (e.g. several hundred thousand) are imported at once, which typically happens only when the system is set up and the legacy database is imported, or when a new sensor is registered that has already collected data for some period of time. The potentially performance-critical connections are discussed in the next two sections for ACA and SWKA.

Figure 14 Flow chart of the PIM and WDW components working together. The candidates for performance testing are marked as dashed and dotted arrows.

5.2 Performance ACA

For the ACA PIM, all time-series data is downloaded from a web service delivering WaterOneFlow data (communication path (1) in Figure 14). From the WatERP project point of view, this web service is operational ACA software that is used as a black box and cannot be modified by the WatERP team. One can only speculate about its internal implementation, strengths and weaknesses, but any polling behaviour that might destabilise this server must be avoided. Furthermore, the server connection should not be expected to be completely reliable at all times. Consequently, data polling from the WaterOneFlow server employs the same packeting mechanism as the WDW service. This realizes a conservative communication strategy that does not overstrain the web server on the ACA side.

In what follows, one should bear in mind that the WaterOneFlow service is an external web service, so

tests of the PIM must be run without overloading it. Therefore, five concurrent threads

currently poll data, each of them processing a single sensor at a time. Only after retrieving

all available data for that sensor does the thread advance to the next sensor. For large time series

(which normally occur only when a new sensor is registered and legacy data is imported) this may take

a significant amount of time. If all five threads are occupied by such time-consuming jobs (i.e. five or more new

sensors with several tens of thousands of observations each), retrieval of data from other

sensors would be blocked by these operations for some time. This may become a problem especially

when a running WatERP system with important real-time analyses is extended on the fly by new

sensors with large datasets: the import of the legacy data then blocks the polling of already

registered operational sensors, and the real-time usage of their data is delayed for some

time. The thread count of the poller can be increased manually in the configuration of the PIM.

Further, one may increase the size of the data blocks to be retrieved (which leads to less

communication and administration overhead, but may lead to the above-explained problems with the

memory management of the SOS server). Such actions need careful adjustment so that the web

service is not overstrained. All such fine-tuning depends on the concrete hardware and software

situation in the actual running environment; the settings have been chosen cautiously for the prototypes in the

pilot environments. For an operational solution, they have to be adjusted to the conditions of that

environment.
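
The polling scheme described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the PIM's actual code: the names `poll_sensor`, `poll_all`, `fake_fetch` and `POLLER_THREADS` are hypothetical, and the stand-in `fake_fetch` replaces the real web service request. A fixed-size thread pool polls the sensors; each worker retrieves all packets of one sensor before it takes the next sensor.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

POLLER_THREADS = 5  # thread count reported in the text; assumed to be configurable


def poll_sensor(sensor_id: str, packets: int,
                fetch_packet: Callable[[str, int], List[float]]) -> int:
    """Retrieve all packets of one sensor in sequence; return the observation count."""
    total = 0
    for index in range(packets):
        total += len(fetch_packet(sensor_id, index))
    return total


def poll_all(sensors: Dict[str, int],
             fetch_packet: Callable[[str, int], List[float]],
             threads: int = POLLER_THREADS) -> Dict[str, int]:
    """Poll every sensor with at most `threads` sensors in progress at once."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = {sensor_id: pool.submit(poll_sensor, sensor_id, count, fetch_packet)
                   for sensor_id, count in sensors.items()}
        # Further sensors wait in the pool's queue until a worker becomes free.
        return {sensor_id: future.result() for sensor_id, future in futures.items()}


def fake_fetch(sensor_id: str, index: int) -> List[float]:
    """Hypothetical stand-in for one web service request; returns 100 dummy values."""
    return [0.0] * 100


# Hypothetical sensors mapped to the number of packets each one needs.
counts = poll_all({"s1": 3, "s2": 1, "s3": 10}, fake_fetch)
```

In this sketch, a sensor with many packets occupies its worker until all packets are retrieved, which is the blocking effect discussed above. Raising `threads` shortens the waiting time for other sensors but increases the load on the polled service.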

During the first test runs with a fully operational PIM-WDW set-up and real-life ACA data, one sensor

was registered in the PIM. In one hour, about 50,000 observations could be transferred. Hence,

all existing ACA legacy data can be imported in a reasonable amount of time (for the ACA data

volume, refer to Subsection 3.1).

5.3 Performance SWKA

For the SWKA PIM all time-series data resides in a local database. Therefore no performance issue is

expected when retrieving the data. Here the main issue lies in mapping the data and storing it into the

SOS server. Again this import is a database operation which does not have the limitations of a web

service.

During the first test runs with a fully operational PIM-WDW set-up and artificial test data for SWKA,

simulating 50 observations for each of the 24 sensors, the import took about 2 seconds in total. Hence,

integrating the full SWKA legacy database (see Subsection 4.3) is not expected to pose a problem.
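
The figures from the two test runs can be put on a common scale with a short calculation. The following Python sketch is illustrative only: the function names and the example volume of one million observations are assumptions, and the rates come from brief first test runs, so they may not scale linearly to larger imports.

```python
def throughput(observations: int, seconds: float) -> float:
    """Observations transferred per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return observations / seconds


def estimated_hours(total_observations: int, rate_per_second: float) -> float:
    """Hours needed to transfer `total_observations` at a constant rate."""
    return total_observations / rate_per_second / 3600.0


# ACA test run: about 50,000 observations in one hour (one registered sensor).
aca_rate = throughput(50_000, 3600)

# SWKA test run: 50 observations for each of 24 sensors in about 2 seconds.
swka_rate = throughput(50 * 24, 2)

# HYPOTHETICAL legacy volume for illustration; not a figure from this report.
example_volume = 1_000_000
aca_hours = estimated_hours(example_volume, aca_rate)
```

With these inputs, the ACA path transfers about 13.9 observations per second and the SWKA path about 600 observations per second. At the ACA rate, the hypothetical one million observations would take about 20 hours for a single sensor polled by one thread.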

6. Conclusions and future work

This section summarizes the activities performed so far and describes the next steps.

In the first version of the WDW piloting, the existing data infrastructures were analysed and the sources

for observation result data were identified for both clients, ACA and SWKA. In this second piloting

phase, access to the pilot data has been implemented for both pilots. The implementation of the

data access layer has been fully completed for both pilots, with small adaptations

compared to the first version of the WDW piloting.

Regarding the mapping of the pilots’ data to the ontology, an additional mapping of the WaterML1 data

to WaterML2 has been realized for ACA. Here, the observed property and the offering in particular had to

be adjusted. Afterwards, the required sensors were defined and added to the sensor list that

controls the polling. For SWKA, the missing mapping information was acquired in the reporting

period and was used to finalize the mapping work, in particular the mapping of the sensor PP_IDs to

WaterML2.

The generation of the WaterML2 documents has been implemented for both pilots and constitutes the

basis for generating the ontology within the WDW. SOS clients for registering new sensors and

inserting observations have been completed for both pilots and have been integrated into the overall

architecture of the pilot hardware and software environments.

In summary, the technical infrastructure for filling the WDW from the pilots’ real-world data has been fully

completed, installed at the pilot sites and tested on sample pilot data. In the last phase of the

project, a regular data flow between the pilots and the WDW will be established and tested for stability and

performance. To find possible performance problems in the PIM, performance tests should cover

retrieving the data from the pilots and storing it in the SOS server database. Therefore, some

performance measurements for the SWKA PIM are planned. The data from SWKA should be read from a

database on the local machine. The PIM should then be monitored while it writes the data to the

PostgreSQL database of the SOS server. Regarding the WDW piloting, the last project phase will

focus not only on storing data from the pilot sites, but also on accessing it for analysis purposes and

on serving as a data layer for other WatERP tools developed in WP4 and WP5.