This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or reproduced without the formal approval of the DataBio Management Committee.

Project Acronym: DataBio

Grant Agreement number: 732064 (H2020-ICT-2016-1 – Innovation Action)

Project Full Title: Data-Driven Bioeconomy

Project Coordinator: INTRASOFT International

DELIVERABLE

D6.3 – State of the Art

Dissemination level PU -Public

Type of Document Report

Contractual date of delivery M12 – 31/12/2017

Deliverable Leader VTT

Status - version, date Final – v1.0, 29/12/2017

WP / Task responsible WP6

Keywords: Big data, data analytics, bioeconomy, agriculture, forestry, fishery, earth observation

D6.3 – State of the Art H2020 Contract No. 732064 Final – v1.0, 29/12/2017

Dissemination level: PU -Public Page 2

Executive Summary

Big data technologies have shown significant benefits in sectors of society as diverse as manufacturing, business management and health science. This report looks at the state of the art of big data technologies and their application in bioeconomy, i.e. the parts of the economy that use renewable biological resources from land and sea – such as crops, forests, fish, animals and micro-organisms – to produce food, materials and energy. The DataBio project, in particular, addresses agriculture, forestry and fishery, where it aims to advance the use of big data technologies by implementing several pilot demonstrations.

The purpose of the document is to provide the general public and non-expert readers with an overview of recent developments in big data and to highlight how it could serve the bioeconomy sector in the near future. The document is structured as follows:

Chapter 3 of the document includes an overview of general big data challenges and opportunities. Chapter 3.1 introduces the concept of big data in general and chapter 3.2 introduces the use of big data in the bioeconomy sector. Big data management, analysis and visualisation are discussed in chapters 3.3, 3.4 and 3.5 respectively. Finally, chapter 3.6 introduces big data frameworks and infrastructures.

Chapters 4, 5 and 6 go into more detail, covering big data in agriculture, forestry and fishery from the perspectives of the DataBio pilots.

Because the backgrounds and target applications differ between pilots, these chapters present the state of the art from slightly different perspectives.

Chapter 7 concludes with an outlook to future opportunities in big data technologies.

Deliverable Leader: Göran Granholm, VTT

Contributors:

Karel Charvat, Lespro

Seppo Huurinainen, MHGS

Per Gunnar Auran, SINTEF Fishery

Juliusz Pukacki, PSNC

Caj Södergård, VTT

Renne Tergujeff, VTT

Javier Hitado Simarro, ATOS

Miguel Angel, ATOS

Fabiana Fournier, IBM

Reviewers: Nikos Marianos, NP

Irene Matzakou, INTRASOFT

Approved by: Athanasios Poulakidas, INTRASOFT

Table of Contents

EXECUTIVE SUMMARY
TABLE OF CONTENTS
TABLE OF FIGURES
LIST OF TABLES
DEFINITIONS, ACRONYMS AND ABBREVIATIONS
1 INTRODUCTION
  1.1 PROJECT SUMMARY
  1.2 DOCUMENT SCOPE
  1.3 DOCUMENT STRUCTURE
2 BACKGROUND AND OBJECTIVES
3 BIG DATA OVERVIEW - STATUS, CHALLENGES AND OPPORTUNITIES
  3.1 INTRODUCTION TO BIG DATA
  3.2 BIG DATA IN BIOECONOMY
  3.3 BIG DATA MANAGEMENT
    3.3.1 Earth Observation data services
  3.4 BIG DATA ANALYTICS
  3.5 BIG DATA VISUALISATION AND USER INTERACTION
    3.5.1 Sensor data
    3.5.2 Earth Observation data
  3.6 BIG DATA FRAMEWORKS
4 BIG DATA IN AGRICULTURE
  4.1 INTRODUCTION
  4.2 STATUS OF BIG DATA IN AGRICULTURE
  4.3 FUTURE DEVELOPMENTS
5 BIG DATA IN FORESTRY
  5.1 INTRODUCTION
    5.1.1 Development/optimization focus in forestry
  5.2 BIG DATA APPLICATIONS IN FORESTRY - SCOPE, IMPACT AND BENEFIT OF DIGITAL FOREST MANAGEMENT
    5.2.1 Forest Big Data platform
    5.2.2 Digiroad
    5.2.3 Metsaan.fi e-Service
    5.2.4 Wuudis Service
  5.3 FUTURE DEVELOPMENTS
    5.3.1 Opportunities and possible big data solutions
6 BIG DATA IN FISHERY
  6.1 INTRODUCTION
    6.1.1 Vessel monitoring systems and fisheries management
    6.1.2 Optimization focus in fishery
    6.1.3 Machine learning applications in fishery
    6.1.4 Big Data information services in fishery
    6.1.5 Open data providers relevant for fishery Big Data analytics
    6.1.6 Fisheries and open source software
  6.2 CONCLUSION
    6.2.1 Current impact of Big Data in fisheries
  6.3 FUTURE DEVELOPMENTS
    6.3.1 User needs and Big Data opportunities
7 THE FUTURE OF BIG DATA
8 CONCLUSIONS
9 REFERENCES

Table of Figures

FIGURE 1. THE EXPONENTIAL GROWTH OF DATA [REF-01].
FIGURE 2. KEY CHARACTERISTICS OF BIG DATA (BASED ON [REF-02]).
FIGURE 3. THE DATA-INFORMATION-KNOWLEDGE-WISDOM HIERARCHY OF ACKOFF [REF-03].
FIGURE 4. BIG DATA MANAGEMENT PROCESS [REF-08]
FIGURE 5. BAR CHART (A) AND PIE CHART (B).
FIGURE 6. HISTOGRAMS (A) AND LINE GRAPH (B).
FIGURE 7. SCATTERPLOT.
FIGURE 8. PCA (A) AND PARALLEL COORDINATES VISUALIZATION (B).
FIGURE 9. PCA (A) AND PARALLEL COORDINATES VISUALIZATION (B).
FIGURE 10. VISUALISATION OF SENSOR DATA.
FIGURE 11. BRUSHING. THE ROUNDED AREA IS HIGHLIGHTED IN THE HISTOGRAM AND ON THE MAP.
FIGURE 12. BDVA REFERENCE ARCHITECTURE WITH NUMBERS OF DATABIO COMPONENTS.
FIGURE 13. NIST BIG DATA REFERENCE ARCHITECTURE.
FIGURE 14. YIELD POTENTIAL APPLICATION.
FIGURE 15. DATA COLLECTED BY FOREST MACHINES HELP TO EVALUATE HARVESTING CONDITIONS, FOR EXAMPLE. PHOTO: ERKKI OKSANEN.
FIGURE 16. FOREST BIG DATA PLATFORM WITH FOREST BIG DATA AND APPLICATION COMPONENTS (http://www.datatointelligence.fi/forest-big-data.html).
FIGURE 17. METSÄÄN.FI SERVICE WITH RELATED OPERATIONS AND USER GROUPS.
FIGURE 18. ENTITY OF FOREST DATA DEVELOPMENT IN METSÄÄN.FI SERVICE. SPECIFIC FOCUS ON IMPROVEMENT OF DATA MOBILITY AND DATA QUALITY, AND E-SERVICE PROMOTION. (METSÄTIETO 2020 - KEHITTÄMISSUUNNITELMA).
FIGURE 19. TRAGSA DRONES USED IN FORESTRY PILOT.
FIGURE 20. GENERATED PRODUCTS (IMAGERIES) IN FORESTRY PILOT.
FIGURE 21. GENERATED INDEXES (IMAGES) IN FORESTRY PILOT.
FIGURE 22. FOREST VALUE CHAIN AND THE EXPECTED BENEFITS OF ‘WUUDIS DATA’ TO ALL SEGMENTS OF THE VALUE CHAIN.
FIGURE 23. CONCEPT OF WUUDIS DATA.
FIGURE 24. THE CONCEPT OF NEW SENOP HYPERSPECTRAL CAMERA, RELEASED IN 2018.
FIGURE 25. EXAMPLE OF CLOUD-FREE REFLECTANCE IMAGE OF THE FORESTS OF CZECH REPUBLIC GENERATED USING BIG DATA SPATIAL-TEMPORAL ANALYSIS UTILIZING ALL-AVAILABLE SENTINEL-2 OBSERVATIONS BETWEEN JUNE AND AUGUST 2016.
FIGURE 26. EXAMPLE OF SATELLITE-DERIVED PRODUCT DESCRIBING FOREST HEALTH STATUS - AMOUNT OF CHLOROPHYLLS IN FOREST CANOPIES. RED AREAS ARE IDENTIFIED AS FORESTS WITH LOW CHLOROPHYLL CONTENT. CLOUD-FREE IMAGE MOSAIC GENERATED ABOVE SENTINEL-2 BIG DATA WAS USED AS AN INPUT IN THE ALGORITHM.
FIGURE 27. ILLUSTRATION OF VMS (FROM EC COMMISSION, FISHERIES POLICY – CONTROL TECHNOLOGIES).
FIGURE 28. MARINETRAFFIC INFORMATION PORTAL SHOWING VESSEL TRAFFIC IN NORTHERN EUROPE BASED ON AIS DATA (FROM www.marinetraffic.com).
FIGURE 29. NORWEGIAN SEA FISHING ACTIVITY ACCORDING TO GLOBAL FISHING WATCH (JUL/AUG 2017).
FIGURE 30. INFORMATION SERVICES IN BARENTSWATCH (barentswatch.no, ACCESSED 29/11/2017).
FIGURE 31. THE FISHINFO SERVICE - EXAMPLE SHOWING FISHING ACTIVITY WITH NETS (BLUE), LINES (RED) AND PURSE SEINERS (PURPLE) AS WELL AS RESTRICTED (BLACK POLYGONS) AND CLOSED (FILLED POLYGONS) FISHING AREAS (FROM THE fiskinfo.no WEBSITE).
FIGURE 32. THE FLUX STANDARDS AND STATUS (FROM UN ESCAP PRESENTATION OF DR HEINER LEHR) [REF-37].
FIGURE 33. SUMMARY AND CONTEXT OF THE FISHERY PILOTS IN DATABIO.
FIGURE 34. ELECTRICITY CONSUMPTION: COUNTRIES COMPARED TO IT SECTOR.

List of Tables

TABLE 1: THE DATABIO CONSORTIUM PARTNERS
TABLE 2. OPEN DATA PROVIDERS RELEVANT FOR FISHERIES.

Definitions, Acronyms and Abbreviations

Acronym/Abbreviation Title

AEF Agricultural Industry Electronics Foundation

ALS Airborne Laser Scanning

BDT Big data technology

BDVA European Big Data Value Association

CEMA European Agricultural Machinery organisation

CEN Comité Européen de Normalisation

CEOS Committee on Earth Observation Satellites

CRM Customer Relationship Management

D2I Data to Intelligence

EO Earth Observation

EP Exploration platform

ERS Electronic Reporting System

ESA European Space Agency

EUMOFA European Market Observatory for Fisheries and Aquaculture

FLUX Fisheries Language for Universal eXchange

FOCUS Fisheries Open source CommUnity Software

GLONASS Russian navigation satellite system

GNSS Global Navigation Satellite System

GPS Global Positioning System

HTTP Hypertext Transfer Protocol

IaaS Infrastructure as a service

ICES International Council for the Exploration of the Sea

IOPS Input/output operations per second

IoT Internet of Things

ISO International Organization for Standardization

Mha One million hectares, 10 000 km²

NAS Network-attached Storage

NIR Near infra-red

NIST National Institute of Standards and Technology

NOAA National Oceanic and Atmospheric Administration

OECD Organisation for Economic Co-operation and Development

OGC Open Geospatial Consortium

PCI-e PCI Express (Peripheral Component Interconnect Express)

PF Precision farming

PLIS Land Parcel Information System

RFID Radio Frequency Identification

RTK Real Time Kinematic

SAM State-space Assessment Model

TEP Thematic Exploration Platform of ESA

TLS Terrestrial Laser Scanning

URL Uniform Resource Locator

VMS Vessel Monitoring System

VTS Vessel Traffic Service

WMS Web Map Service

ZB Zettabyte, 10^21 bytes, 1 billion terabytes

UAV Unmanned Aerial Vehicle

NDVI Normalized difference vegetation index

CARI Chlorophyll Absorption Ratio Index

GNDVI Green Normalized Difference Vegetation Index

NGRVI Normalized Difference Green/Red index (normalized green-red difference index)

JRC Joint Research Centre

Term Definition

Cassandra Apache Cassandra is a free and open-source distributed NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure

EMODnet The European Marine Observation and Data Network

EU GDPR General Data Protection Regulation

Excel A spreadsheet developed by Microsoft. It features calculation, graphing tools, pivot tables, and a macro programming language

Hadoop Apache Hadoop, an open-source software framework used for distributed storage and processing of big data sets using the MapReduce programming model

Landsat The Landsat program is the longest-running enterprise for acquisition of satellite imagery of Earth.

MapReduce MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster

R Open source programming language and software environment for statistical computing and graphics that is supported by the R Foundation for Statistical Computing

SAS Software suite developed by SAS Institute for advanced analytics, multivariate analyses, business intelligence, data management, and predictive analytics

Sentinel-2 Sentinel-2 is an Earth observation mission developed by ESA as part of the Copernicus Programme to perform terrestrial observations in support of services such as forest monitoring, land cover changes detection, and natural disaster management

SPSS Statistics is a software package used for logical batched and non-batched statistical analysis.

Statgraphics Statgraphics is a statistics package that performs and explains basic and advanced statistical functions.

1 Introduction

1.1 Project Summary

The data-intensive target sector selected for the DataBio project is the Data-Driven Bioeconomy. DataBio focuses on utilizing Big Data to contribute to the production of the best possible raw materials from agriculture, forestry and fishery/aquaculture for the bioeconomy industry, in order to output food, energy and biomaterials, also taking into account various responsibility and sustainability issues.

DataBio will deploy state-of-the-art big data technologies and the partners’ existing infrastructure and solutions, linked together through the DataBio Platform. These will aggregate Big Data from the three identified sectors (agriculture, forestry and fishery), process it intelligently and allow the three sectors to selectively utilize numerous platform components, according to their requirements. The execution will be through continuous cooperation of end user and technology provider companies, bioeconomy and technology research institutes, and stakeholders from the big data value PPP programme.

DataBio is driven by the development, use and evaluation of a large number of pilots in the three identified sectors, in which associated partners and additional stakeholders are also involved. The selected pilot concepts will be transformed into pilot implementations utilizing co-innovative methods and tools. The pilots select and utilize the most suitable market-ready or almost market-ready ICT, Big Data and Earth Observation methods, technologies, tools and services to be integrated into the common DataBio Platform.

Based on the pilot results and the new DataBio Platform, new solutions and new business opportunities are expected to emerge. DataBio will organize a series of trainings and hackathons to support its take-up and to enable developers outside the consortium to design and develop new tools, services and applications based on and for the DataBio Platform.

The DataBio consortium is listed in Table 1. For more information about the project see www.databio.eu.

Table 1: The DataBio consortium partners

Number Name Short name Country

1 (CO) INTRASOFT INTERNATIONAL SA INTRASOFT Belgium

2 LESPROJEKT SLUZBY SRO LESPRO Czech Republic

3 ZAPADOCESKA UNIVERZITA V PLZNI UWB Czech Republic

4 FRAUNHOFER GESELLSCHAFT ZUR FOERDERUNG DER ANGEWANDTEN FORSCHUNG E.V. Fraunhofer Germany

5 ATOS SPAIN SA ATOS Spain

6 STIFTELSEN SINTEF SINTEF ICT Norway

7 SPACEBEL SA SPACEBEL Belgium

8 VLAAMSE INSTELLING VOOR TECHNOLOGISCH ONDERZOEK N.V. VITO Belgium

9 INSTYTUT CHEMII BIOORGANICZNEJ POLSKIEJ AKADEMII NAUK PSNC Poland

10 CIAOTECH Srl CiaoT Italy

11 EMPRESA DE TRANSFORMACION AGRARIA SA TRAGSA Spain

12 INSTITUT FUR ANGEWANDTE INFORMATIK (INFAI) EV INFAI Germany

13 NEUROPUBLIC AE PLIROFORIKIS & EPIKOINONION NP Greece

14 Ústav pro hospodářskou úpravu lesů Brandýs nad Labem UHUL FMI Czech Republic

15 INNOVATION ENGINEERING SRL InnoE Italy

16 Teknologian tutkimuskeskus VTT Oy VTT Finland

17 SINTEF FISKERI OG HAVBRUK AS SINTEF Fishery Norway

18 SUOMEN METSAKESKUS-FINLANDS SKOGSCENTRAL METSAK Finland

19 IBM ISRAEL - SCIENCE AND TECHNOLOGY LTD IBM Israel

20 MHG SYSTEMS OY - MHGS MHGS Finland

21 NB ADVIES BV NB Advies Netherlands

22 CONSIGLIO PER LA RICERCA IN AGRICOLTURA E L'ANALISI DELL'ECONOMIA AGRARIA CREA Italy

23 FUNDACION AZTI - AZTI FUNDAZIOA AZTI Spain

24 KINGS BAY AS KingsBay Norway

25 EROS AS Eros Norway

26 ERVIK & SAEVIK AS ESAS Norway

27 LIEGRUPPEN FISKERI AS LiegFi Norway

28 E-GEOS SPA e-geos Italy

29 DANMARKS TEKNISKE UNIVERSITET DTU Denmark

30 FEDERUNACOMA SRL UNIPERSONALE Federu Italy

31 CSEM CENTRE SUISSE D'ELECTRONIQUE ET DE MICROTECHNIQUE SA - RECHERCHE ET DEVELOPPEMENT CSEM Switzerland

32 UNIVERSITAET ST. GALLEN UStG Switzerland

33 NORGES SILDESALGSLAG SA Sildes Norway

34 EXUS SOFTWARE LTD EXUS United Kingdom

35 CYBERNETICA AS CYBER Estonia

36 GAIA EPICHEIREIN ANONYMI ETAIREIA PSIFIAKON YPIRESION GAIA Greece

37 SOFTEAM Softeam France

38 FUNDACION CITOLIVA, CENTRO DE INNOVACION Y TECNOLOGIA DEL OLIVAR Y DEL ACEITE CITOLIVA Spain

39 TERRASIGNA SRL TerraS Romania

40 ETHNIKO KENTRO EREVNAS KAI TECHNOLOGIKIS ANAPTYXIS CERTH Greece

41 METEOROLOGICAL AND ENVIRONMENTAL EARTH OBSERVATION SRL MEEO Italy

42 ECHEBASTAR FLEET SOCIEDAD LIMITADA ECHEBF Spain

43 NOVAMONT SPA Novam Italy

44 SENOP OY Senop Finland

45 UNIVERSIDAD DEL PAIS VASCO/ EUSKAL HERRIKO UNIBERTSITATEA EHU/UPV Spain

46 OPEN GEOSPATIAL CONSORTIUM (EUROPE) LIMITED LBG OGCE United Kingdom

47 ZETOR TRACTORS AS ZETOR Czech Republic

48 COOPERATIVA AGRICOLA CESENATE SOCIETA COOPERATIVA AGRICOLA CAC Italy

1.2 Document Scope

The purpose of this document is to give an overview of the state of the art of big data technology in the bioeconomy sector. It especially targets actors and stakeholders within agriculture, forestry and fishery, including end-users and other operators who may not have in-depth ICT knowledge or expertise in big data technologies.

1.3 Document Structure

This document comprises the following chapters:

Chapter 1 presents an introduction to the project and the document.

Chapter 2 presents motivation, background and objectives regarding the use of big data in bioeconomy.

Chapter 3 gives an overview of big data.

Chapter 4 describes the status of big data in agriculture.

Chapter 5 describes the status of big data in forestry.

Chapter 6 describes the status of big data in fishery.

Chapter 7 looks into some future developments in big data technologies relevant to bioeconomy.

Chapter 8 provides the conclusions based on the previous chapters.

Chapter 9 lists the references used in the document.

2 Background and objectives The “Data-Driven Bioeconomy” (DataBio) project as a Large Scale Pilot (LSP) action address

domains of strategic importance for EU industry because the share of bioeconomy is

remarkably large in the national economy in EU countries. The European bioeconomy is

already worth more than €2 trillion annually and employs over 22 million people, often in

rural or coastal areas and in Small and Medium Sized Enterprises (SMEs).

The sectorial demonstrations are large scale efforts of the Data-Driven Bioeconomy

ecosystem containing not only the project partners but also a large number of other

cooperation parties creating different supply chains and value chains for the pilots.

In the demonstrations the following three sectors of bioeconomy are covered: 1. Forestry, 2.

Agriculture and 3. Fishery. The results can be replicated because standard technologies and

best practice solutions are used on different domain independent system levels:

1. Data gathering/data sets,

2. Platforms and interfaces,

3. Big Data tools and services.

In this LSP the sectorial demonstrations are large and the European coverage wide but not

complete. However, the international networks and the activities of the partners and of other

sectorial cooperation organizations participating actively in Big Data demonstrations make it

easy to transfer the results across the EU. The big data technologies, platforms and data source interfaces of the project are domain-agnostic, generic solutions. That is why the results can be easily utilized in other contexts.

The objective of this state-of-the-art document is to support the transfer of knowledge across

various domains by providing an overview of current implementations and future outlook of

big data technologies in three key sectors of bioeconomy: agriculture, forestry and fishery.

3 Big Data overview - status, challenges and opportunities

This chapter gives an overview of current developments in the generation, management and use of big data, and of how the challenges of volume, velocity, variety and veracity are being tackled.

3.1 Introduction to big data

Big data refers to large and complex data sets that are challenging for conventional computer hardware and software to handle. A range of Big Data technologies is therefore needed for capturing, managing, processing, analysing, visualising and communicating the data. Gartner coined three basic dimensions of Big Data, the three V’s: Volume, Velocity and Variety. Often a fourth V, Veracity, and a fifth, Value, are added; the latter underlines the aim of the computations: to extract value from data.

The volume of data is growing exponentially, doubling every 12 months (source: Data Alliance, 2015). This growth is enabled by numerous low-priced IoT devices and data sources such as mobile phones, aerial and satellite imagery, cameras, and temperature and humidity sensors. In addition, a lot of data is created as a by-product (footprint) of digital interaction. Typically, the data volumes range from terabytes to many petabytes (1 petabyte = 10^15 bytes). At the same time, such large data sets tend to have low information density. The large data sets relevant for DataBio are primarily satellite images.

Figure 1. The exponential growth of data [REF-01].
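To make the stated doubling time concrete, the following minimal sketch computes how a data volume would develop if it doubled every 12 months. The function name and the starting volume of 1 PB are illustrative assumptions and are not taken from [REF-01].

```python
def projected_volume(initial_volume, months, doubling_months=12.0):
    """Volume after `months`, assuming it doubles every `doubling_months`."""
    return initial_volume * 2 ** (months / doubling_months)


# Hypothetical starting point: 1 petabyte today.
for years in (1, 5, 10):
    print(years, "years:", projected_volume(1.0, 12 * years), "PB")
```

Under this assumption, ten years of growth correspond to ten doublings, that is, roughly a thousand-fold increase.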

The velocity comes from the need for real-time or near-to-real-time response and delivery of

datasets and data streams. This goes for IoT data coming from cameras and other sensors as

well as media streams and simulation data in digital models. Typical cases for DataBio are

sensor data coming from fishing ship engines or from tractors driving on the fields.

Variety arises from the need to process texts, images, audio, video as well as fused data

sources. Relevant cases for DataBio are again series of satellite, aerial and drone images.

As the number of data sources increases, the importance of “volume” as the key characteristic of big data is diminishing. In a survey on big data adoption targeting leading industry, a majority of respondents saw variety as the main characteristic [REF-02]. This perception also

seems to grow over time (Figure 2).

Figure 2. Key characteristics of big data (based on [REF-02]).

For every discussion about knowledge or information management, it is important to

understand basic terms such as data, information and knowledge. For a better explanation,

we will use Ackoff’s Data-Information-Knowledge-Wisdom hierarchy [REF-03] (Figure 3):

• Data: as symbols;

• Information: as data that are processed to be useful; provides answers to "who",

"what", "where", and "when" questions;

• Knowledge: as application of data and information; answers "how" questions;

• Wisdom: as evaluated understanding.

Figure 3. The Data-Information-Knowledge-Wisdom hierarchy of Ackoff [REF-03].

A further elaboration of Ackoff's [REF-03] definitions follows:

1. Data... data is raw. It simply exists and has no significance beyond its existence (in and

of itself). It can exist in any form, usable or not. It does not have meaning of itself. In

computer parlance, a spreadsheet generally starts out by holding data.

2. Information... information is data that has been given meaning by way of relational

connection. This "meaning" can be useful, but does not have to be. In computer

parlance, a relational database makes information from the data stored within it.

3. Knowledge... knowledge is the appropriate collection of information, such that its

intent is to be useful. Knowledge is a deterministic process.

4. Wisdom... wisdom is an extrapolative and non-deterministic, non-probabilistic

process. It calls upon all the previous levels of consciousness, and specifically upon

special types of human programming (moral, ethical codes, etc.). It beckons to give us

understanding about which there has previously been no understanding, and in doing

so, goes far beyond understanding itself. It is the essence of philosophical probing.

From the management point of view, we have defined three levels of management:

• Data Management;

• Information Management;

• Knowledge management.

Data management includes:

• Data governance;

• Data Architecture, Analysis and Design;

• Database Management;

• Data Security Management;

• Data Quality Management;

• Metadata Management;

• Document, Record and Content Management;

• Reference and Master Data Management.

Information management includes:

• Records management;

• Can be stored, catalogued, organized;

• Align with corporate goals and strategies;

• Set up a database;

• Utilize for day-to-day optimum decision-making;

• Aims for efficiency;

• Data with a purpose.

Knowledge management includes:

• A framework for designing an organization’s goals, structures, and processes to add

value;

• Collect, disseminate, utilize information;

• Align with corporate goals and strategies;

• Focus on cultivating, sharing, and strategizing;

• Connecting people to gain a competitive advantage;

• Information with a purpose [REF-04].

3.2 Big data in bioeconomy

In line with the European Commission, we define bioeconomy as comprising those parts of

the economy that use renewable biological resources from land and sea – such as crops,

forests, fish, animals and micro-organisms – to produce food, materials and energy [REF-05].

In particular, the DataBio project focuses on key sectors: agriculture, forestry and fishery.

Through history, these traditional sectors have gone through phases of continuous and

sometimes disruptive development that have affected the whole value chain from producer

to consumer and end user. The advent of information technology and the still ongoing

digitalisation of industry, and society in general, has been the most significant development

since the industrial revolution. The exponential growth of data through new data sources and

advanced analytics may help in coping with the challenge of increasing productivity in a

sustainable way. Due to the nature and complexity of the bioeconomy sectors, this will

require a variety of different data sources ranging from large-scale earth observation to fine-

scale sensor input. A key challenge is to process this data to generate new knowledge and

deliver these insights as meaningful information, forecasts and recommendations to users in

an accessible way.

In agriculture, big data technology (BDT) is implemented under the banner of precision

farming (PF) [REF-06]. BDT builds on geo-coded maps of agricultural fields and the real-time

monitoring of activities on the farm in order to increase the efficiency of resource use and reduce the uncertainty of management decisions [REF-07]. Under PF, yield is increased due

particularly to the precise selection and application of exact types and doses of agricultural

inputs (crop varieties, fertilizers, pesticides, herbicides, irrigation water) for optimum crop

growth and development.

In terms of Technology Readiness Level (TRL), the current implementations in agriculture are

mostly positioned at TRL 6 and 7. Improved technologies such as new elite varieties have been developed; data on weather, soil, crops (phenotypic data) and other environmental factors are routinely collected and meta-analysed; and technological and managerial services are already offered to farmers in a few nations for a number of crops, although not yet at the level of big data analytics. There also exist experiences with farm

telemetry or utilization of satellite data (Earth Observation) in some countries. In addition,

the required skills are available to the organizations participating in the pilots, and there is a

good level of readiness of organizations to change their internal and external business

processes, which is a key factor for adopting the new technology.

It is envisioned that a big data analytics system will provide pilot managers with highly localized descriptive plans (a better and more advanced way of looking at an operation), prescriptive plans (timely recommendations for operation improvement, i.e. application rates for seed, fertilizer and other agricultural inputs, soil analysis, and localized weather and disease/pest reports, based on real-time and historical data) and predictive plans (using current and historical data sets to forecast future localized events and returns). In addition, tracking of the machinery fleet will allow farm vehicles to be located in real time.

In most European countries, traditional methods for forest management are based on

“static” management plans, created at the planting stage and reviewed every 10 years. In

recent years, these management plans have become a declaration of intentions, including

objectives for multifunctional forests (non-wood products and services). However, these

plans often lack effective implementation and monitoring methods that allow forest owners,

managers and regulators to validate the progress in achieving the target objectives set out in

the management plan.

Big data methods bring the possibility to both increase the value of the forests as well as to

decrease the costs within sustainability limits set by natural growth and ecological aspects.

The key is to gather more, and more accurate, information about the trees from a host of sensors and sources, including a new generation of satellites, UAV images, laser scanning, mobile devices through crowdsourcing, and machines operating in the forests. This enables a

characterization of even single trees. Once accurate forest information has been gathered,

the following step is to employ tools that mobile and cloud technology recently have made

available and deposit the measured data onto digital platforms that can be accessed by a

variety of user devices. The precise databases enable a sustainable growth of timber

extraction, an optimized use of the tree raw material and a higher long-range growth of the

biomass by precise support actions. At the same time, the costs for management, labour and

timber transport can be significantly reduced, which gives gains also in the short run. There

will be a variety of new services e.g. relating to timber sales, working and transport

assignments, that create economic growth.

In the fisheries sector, companies are ramping up their digitalisation efforts to start

harvesting the benefits from applying Big Data technology to optimize their business.

Although large efforts have been made to make scientific marine data sets available, e.g.

EMODnet, NOAA, Copernicus, the main problem is that much of the industrial data in fisheries

is either not recorded (for example hydroacoustics, operational data, energy consumption) or

is business sensitive and therefore not shared or openly available (detailed catch and price

data). Vessel equippers as well as manufacturers of fish finding and catch equipment are

increasing the capacity and functionality in their systems to store and collate data. As a result,

major industrial players are focusing on building their own Big Data platforms to gain a

business edge for their products through more advanced analytics and services, rather than

opening up their systems and sharing data. In contrast to this, there is a lot of effort in

scientific communities related to fisheries to open up data sets and leverage machine learning

analytics to provide open services for the common good, with Global Fishing Watch being an

excellent global showcase. Chapter 6 gives an introduction to the state of the art of Big Data in fishery, covering topics such as vessel monitoring systems, optimization focus and machine learning in fishery, relevant information services and data providers, and open source software and initiatives relevant for Big Data in fishery. The current state of Big Data in fisheries is summarized at the end of that chapter, followed by an outline of some near-future development opportunities.

3.3 Big data management

The characteristics defining big data (volume, velocity and variety) put high demands on the process whereby this data is managed. Big data management is a discipline in which data management techniques, tools and platforms, including storage, pre-processing, processing and security, are applied [REF-08]. The process flow is illustrated in Figure 4.

Figure 4. Big data management process [REF-08]. Elements shown: big data sources, network management, storage management, pre-processing, processing, classification, decision making and security.

Big Data technology introduces additional requirements on hardware infrastructure and influences the design and construction of data centres.

From the point of view of hardware infrastructure, most of the issues relate to the data storage architecture: bringing the computation to the data and making processing more efficient.

The key requirement of big data storage is to handle very large amounts of data and to keep

scaling with data growth. What is also important is to provide the input/output operations

per second (IOPS) level necessary to deliver data to analytics tools and to avoid performance

degradation with increasing storage space. Keeping processing time constant and short, while

data volumes increase, along with meeting real-time demands–all in an affordable way –are

serious challenges. These in turn have a strong impact on the infrastructure for Big Data.

Traditional approaches with relational databases and server scale up and server scale out will

eventually reach their limits.

Currently, many organizations already possess enough storage in-house to support a Big Data initiative. However, they may decide to invest in storage solutions that are optimized for

Big Data. While not necessary for all Big Data deployments, flash storage is especially

attractive due to its performance advantages and high availability.

Large users of Big Data — companies such as Google and Facebook — utilize hyperscale

computing environments, which are made up of commodity servers with direct-attached

storage, run frameworks like Hadoop or Cassandra and often use PCIe-based flash storage to

reduce latency. Smaller organizations, meanwhile, often utilize object storage or clustered

network-attached storage (NAS).

Cloud storage is an option for disaster recovery and backups of on-premises Big Data

solutions. While the cloud is also available as a primary source of storage, many organizations

— especially large ones — find that the expense of constantly transporting data to the cloud

makes this option less cost-effective than on-premises storage.

In the context of data centre design principles, the electrical infrastructure is one of the

major concerns for handling big data. Big data has an indirect impact on data centre power

consumption. As the electrical infrastructure expands, the electrical power consumption

increases many-fold. Reliability of electrical infrastructure is also important while considering

data volume and its processing.

The other issue is the cooling system that needs to perform well and scale with increasing

load of the computer systems.

Big data repositories may be built by integration of data coming from different, geographically

distributed sources. Given this, the network infrastructure is a very important element of a data centre. Because traffic generated by automatic data streaming is much higher than that of human-generated requests, high-performance network connections based on fibre channels are essential. Big data sources can send huge volumes of data to data centres, which increases inbound bandwidth requirements. The data centre network infrastructure must therefore be prepared to support both the volume and the velocity of the data.

Last but not least, security is a key component of data centre infrastructure. Big data is all about

data, so its security at the storage level is a critical challenge to overcome. The data has to be

secured because it can contain an organization's confidential information. Organizations are

working on different approaches to avoid security threats. Data centre security has to be

implemented at the network level, storage level and application level.

3.3.1 Earth Observation data services

In recent years, a lot of effort has been spent on standardising EO data management.

The interfaces for which widely accepted standards exist and are deployed include:

• EO dataset/product metadata,

• EO dataset/product discovery,

• Online data access and

• Viewing

The current EO metadata standard supported by the European and Canadian Space Agencies is OGC 10-157r4 “Earth Observation Profile of Observations and Measurements (O&M)” [REF-09]. Through the ESA FedEO endpoint (http://fedeo.esa.int), this type of metadata is available

from several backend systems, e.g. Sentinels Data Hub, ESA CDS (Copernicus CSCDA) and ESA

Virtual Archive-4.

Discovery of datasets and products is defined in CEOS OpenSearch Best Practice v1.1.2 [REF-10]. These specifications can be used to allow for discovery of collections. Collection metadata

returned by the collection discovery service may be returned in various metadata formats.

Discovery of products is performed via a similar OpenSearch interface. The collection-specific

search responses are made available as defined in the CEOS Best Practices.

Current practice at many data providers’ product facilities is to make the products available for online access via HTTP. The ESA facilities LDS, OADS and others use this approach. The dataset metadata and the catalogue search responses then include the (HTTP) download URL for the product.

EO dataset metadata or search responses typically contain a link to a view or quick look

image. This can be a static image or a reference to a View service implemented as an OGC

Web Map Service (WMS).
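As an illustration of what a reference to a WMS view service can look like, the sketch below builds a WMS 1.3.0 GetMap request URL from the standard request parameters. The endpoint `https://example.org/wms`, the layer name `quicklook` and the bounding box are hypothetical placeholders and do not refer to an actual ESA service.

```python
from urllib.parse import urlencode


def wms_getmap_url(endpoint, layer, bbox, width, height, image_format="image/png"):
    """Build a WMS 1.3.0 GetMap request URL for a single layer.

    `bbox` is (min_lon, min_lat, max_lon, max_lat) in CRS:84 (longitude/latitude).
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the default style
        "CRS": "CRS:84",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and layer, for illustration only.
url = wms_getmap_url("https://example.org/wms", "quicklook", (24.0, 60.0, 25.0, 61.0), 512, 512)
print(url)
```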

To allow efficient querying of a large data repository, a so-called MapReduce architecture (invented by Google) is often used to split and distribute the queries across parallel processing nodes (Map step), after which the results are gathered and delivered (Reduce step). MapReduce is implemented in an Apache open-source project called Hadoop. Apache Spark extends the MapReduce model with a richer set of operations that can be chained together and executed in memory. Apache Flink

(https://flink.apache.org) is a data processing system and an alternative to Hadoop’s

MapReduce component. It comes with its own runtime rather than building on top of

MapReduce. As such, it can work completely independently of the Hadoop ecosystem.
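The Map and Reduce steps described above can be sketched in a few lines of plain Python. This is a single-process illustration of the programming model only, not of Hadoop, Spark or Flink themselves; the record layout (a `sensor` field per EO product) and the partitioning are assumptions made for the example.

```python
from collections import defaultdict
from functools import reduce


def map_step(record):
    """Emit (key, value) pairs for one input record."""
    return [(record["sensor"], 1)]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between the two steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_step(key, values):
    """Combine all values of one key into a single result."""
    return key, reduce(lambda a, b: a + b, values)


def map_reduce(partitions):
    # On a cluster each partition would be mapped on a separate node;
    # here the partitions are simply processed one after another.
    pairs = [pair for part in partitions for record in part for pair in map_step(record)]
    return dict(reduce_step(key, values) for key, values in shuffle(pairs).items())


# Hypothetical product records, split into two partitions.
partitions = [
    [{"sensor": "optical"}, {"sensor": "radar"}],
    [{"sensor": "optical"}, {"sensor": "optical"}],
]
print(map_reduce(partitions))
```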

3.4 Big data analytics

Data analysis has been studied intensively and numerous algorithms exist. It has applications

in different business, science, and social science domains. A wide range of tools and

commercial applications are available, some of which are highly competitive in markets, such

as Customer Relationship Management (CRM). There are also numerous statistics programs

and packages available, both for casual users and specialists (Excel, SAS, SPSS, R).

Some of the most common data analytics methods are introduced here. They cover methods

for data exploration, descriptive methods, predictive methods and methods for anomaly

detection. This introduction omits a large number of analytics methods, including methods for analysing text and other unstructured data.

The purpose of data exploration is to gain a better understanding of the characteristics of

data [REF-11]. The central methods are summary statistics and visualizations. Summary

statistics are numbers that summarize properties of the data. Amar et al. [REF-12] have

classified the statistical methods as:

• computing derived values: average, median, count, correlations,

• finding extrema: finding data cases having the highest and lowest value,

• determining range: finding a span of values of an attribute of data cases, and

• characterizing distributions: creating a distribution of a set of data cases with a

quantitative attribute, e.g. to understand “normality”
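The four groups of summary statistics listed above can be illustrated with a short sketch that uses only the Python standard library. The temperature and humidity readings are invented for the example, and the helper names `pearson` and `summarize` are not taken from [REF-12].

```python
import math
from collections import Counter
from statistics import mean, median


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(values, bin_width):
    """Derived values, extrema, range and a simple histogram of `values`."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "histogram": dict(sorted(bins.items())),  # lower bin edge -> frequency
    }


# Invented sensor readings, for illustration only.
temperature = [12, 15, 14, 18, 21, 19, 30]
humidity = [80, 74, 76, 65, 60, 63, 41]
print(summarize(temperature, bin_width=10))
print(pearson(temperature, humidity))
```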

The goal of descriptive methods is to discover patterns and rules in data. The methods focus

on finding clusters, patterns and associations from data [REF-11]. Clustering looks for groups

of objects such that the objects in a group will be similar (or related) to one another and

different from (or unrelated to) the objects in other groups. The similarity of objects is defined

based on similarity (or distance) measures. Market segmentation, for example, is an application of

clustering.
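One common way to form such groups is k-means clustering, which is used here purely as an example; the source does not prescribe a particular algorithm. The sketch below uses Euclidean distance as the distance measure and, as a simplifying assumption, takes the first k points as initial centroids. The sample points are invented.

```python
import math


def k_means(points, k, iterations=10):
    """Lloyd's algorithm with Euclidean distance, starting from the first k points."""
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Invented two-dimensional observations forming two obvious groups.
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

In practice the result depends on the choice of k and on the initial centroids, so library implementations usually repeat the procedure from several random starting points.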

Pattern detection involves finding combinations of items that occur frequently in data.

Sequential pattern discovery finds rules that predict strong sequential dependencies among

different events. Association rule mining involves the prediction of occurrences of an item

based on occurrences of other items. It produces dependency rules such as “buyers of milk

and diapers are likely to buy beer”.
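The strength of such a dependency rule is usually expressed through its support and confidence. The sketch below computes both for the rule quoted above; the five shopping baskets are invented for the example.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Share of transactions with the antecedent that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Invented shopping baskets, for illustration only.
transactions = [
    {"milk", "diapers", "beer"},
    {"milk", "diapers", "beer", "bread"},
    {"milk", "diapers"},
    {"milk", "bread"},
    {"beer"},
]
print(support(transactions, {"milk", "diapers"}))
print(confidence(transactions, {"milk", "diapers"}, {"beer"}))
```

Association rule mining algorithms search for all rules whose support and confidence exceed chosen thresholds, rather than checking a single given rule as done here.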

The purpose of predictive modelling is to build models that predict the value of one variable

from the known values of other variables [REF-13]. The predicted objects are predefined.

Regression and classification are two widely used predictive methods.

Regression predicts a value of a continuous variable based on other variables using linear or

nonlinear models [REF-11]. Linear regression is easy to visualize, often shown as a line on a

scatterplot diagram. The area is studied extensively and has its origins in statistics. Application

examples include predicting stock markets, or wind speed as a function of temperature or

humidity.
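For the simplest case, a linear model with one explanatory variable, the fitted line can be computed directly with ordinary least squares. The sketch below assumes this textbook method; the temperature and wind speed values are invented and carry no physical meaning.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented observations: temperature (degrees C) and wind speed (m/s).
temperature = [5, 10, 15, 20, 25]
wind_speed = [3.0, 4.1, 4.9, 6.1, 7.0]

slope, intercept = fit_line(temperature, wind_speed)
print("predicted wind speed at 18 degrees:", slope * 18 + intercept)
```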

Classification creates a model for a class attribute as a function of the values of other

attributes, learned from a training set. Unseen records are then assigned to one of the classes. The accuracy of the

models is evaluated with a test set. Several techniques have been developed including

decision trees, Bayesian methods, rule-based classifiers and neural networks. Classification is

a much-used method and commercial applications are also available. Examples include

classification of credit card transactions as legitimate or fraudulent, classification of e-mails

as spam, or classification of news stories [REF-11].
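The train-and-test procedure can be illustrated with the smallest possible decision tree, a single threshold on one numeric attribute (a so-called decision stump). This is a deliberately simplified sketch: the transaction amounts and labels are invented, and real fraud detection would rely on many attributes rather than the amount alone.

```python
def train_stump(samples):
    """Choose the threshold on one numeric attribute that minimises training errors.

    `samples` holds (value, label) pairs with labels 0 or 1 and at least two
    distinct values; values above the threshold are predicted as class 1.
    """
    values = sorted({x for x, _ in samples})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        return sum((x > threshold) != bool(label) for x, label in samples)

    return min(candidates, key=errors)


def predict(threshold, x):
    return int(x > threshold)


def accuracy(threshold, samples):
    """Fraction of samples whose predicted class matches the label."""
    return sum(predict(threshold, x) == label for x, label in samples) / len(samples)


# Invented transactions: (amount, 1 = fraudulent, 0 = legitimate).
training_set = [(20, 0), (35, 0), (50, 0), (400, 1), (650, 1), (900, 1)]
test_set = [(30, 0), (700, 1), (150, 1), (100, 0)]

threshold = train_stump(training_set)
print("threshold:", threshold)
print("test accuracy:", accuracy(threshold, test_set))
```

The model is fitted on the training set only; the separate test set gives an estimate of how well it assigns unseen records to a class.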

Anomalies are observations whose characteristics differ significantly from the normal profile.

Methods of anomaly detection look for sets of data points that are considerably different

from the remainder of the data. The methods build a profile of “normal” behaviour and detect

significant deviations from it. The profile can be patterns or summary statistics for the overall

population. Types of anomaly detection schemes can be graphical-based, statistical-based,

distance-based or model-based. Credit card fraud detection, telecommunication fraud

detection, network intrusion detection and fault detection are examples of application areas

[REF-11].
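A minimal example of a statistical-based scheme is the z-score test sketched below: the "normal" profile is the mean and standard deviation of the overall population, and values that deviate by more than a chosen number of standard deviations are flagged. The readings and the threshold of 2.5 are assumptions made for the example.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold):
    """Return the values whose z-score against the whole population exceeds `threshold`."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Invented sensor readings with one obvious outlier.
readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 10.1, 9.9, 10.0, 25.0]
print(find_anomalies(readings, threshold=2.5))
```

Because the outlier itself inflates the mean and the standard deviation, the threshold has to be chosen with the sample size in mind; robust variants based on the median are often preferred for small samples.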

Real-time analytics and stream processing relate to the velocity aspects of Big Data applications. Streaming analytics enables businesses to respond appropriately and in real time to context-aware insights delivered from fast data [REF-14].

Forrester [REF-15] defines streaming analytics as: “Software that provides analytical

operators to orchestrate data flow, calculate analytics, and detect patterns on event data

from multiple, disparate live data sources to allow developers to build applications that sense,

think, and act in real time”. As pointed out in the same report, streaming analytics is about finding and acting on insights from event data in real time. Event data represents something that has happened, whether physical or digital. It encompasses any data that enterprise

applications, mobile apps, websites, infrastructure, external feeds, and IoT devices emit.

Streaming analytics solutions identify patterns on these events in real-time. Insights

generated using streaming solutions are immediate but not valuable unless they are used to

take action.

Drivers of the adoption and expansion of streaming analytics include:

• Internet of Things (IoT) growth and pervasiveness - Streaming analytics solutions are particularly well suited to IoT applications, because such applications are by nature real-time and emit sensor data that can be analyzed in real time. As pointed

out by Gartner in [REF-16], “…much of the growth in streaming processing usage

during the next 10 years will come from the IoT”. Streaming analytics is the core

technology enabler for The Internet of Things [REF-14]. Characteristics of streaming

analytics are particularly suited to the processing of sensor data: the combination of

time-based and location-based data analysis in real-time over short time windows, the

ability to filter, aggregate and transform live data, and to do so across a range of

platforms from small edge appliances to distributed, fault-tolerant cloud clusters.

Sensor data volumes have already reached a level where streaming analytics is a

necessity, not an option.

• Need to meet the coming massive shortfall in storage capacity - In a recent report

[REF-17], IDC pointed out that, of the 160 ZB of data forecast to be generated by 2025, about a quarter will be real-time in nature (generated, processed and instantly accessible), up from around 5 percent today, and most of that real-time

data (95 percent) will come from the world of IoT. Another interesting observation is

that only between 3 and 12 percent of it, depending on the source, can be stored, as data storage will not be able to cope with this vast amount of data.

Therefore, the logical conclusion is that this data must be collected, processed and

analysed in-memory, in real-time, close to where the data is generated.

• Improves the quality of decision making by presenting information that could

otherwise be overlooked [REF-16].

• Enables smarter anomaly detection and faster responses to threats and opportunities

[REF-16].

• Helps shield business people from data overload by eliminating irrelevant information

and presenting only alerts and distilled versions of the most important information

[REF-16].

• Vendors are bringing out new products, many of them open source, to handle

established and emerging use cases [REF-16].

• Business is demanding analytical support for better situation awareness and faster,

more-precise decisions [REF-16].

As Gartner pointed out in [REF-16], event streaming processing technology is maturing rapidly

and will eventually be adopted by multiple departments within every large company. Some

of the most prominent markets and use cases are listed below.

Capital markets remain a strong sector for those with an event processing heritage. Use cases include automated, algorithmic trading and real-time trade compliance and audit, along with a growing number of deployments for fraud detection and trading analytics as a service.

Preventative maintenance could well be the silver bullet for streaming analytics in IoT. The

value to the customer is clear: to reduce operational and equipment cost by minimizing

unplanned outages, and to reduce the requirement for expensive site and maintenance visits

that could be avoided. For example, an IoT predictive maintenance application may monitor

temperature and vibration data streamed from a conveyer belt. The streaming analytics

solution could detect a spike in either temperature or vibration to indicate a looming

shutdown. The solution could then push an alert to an operator or trigger an automatic

shutdown of the machine. In addition, if the cadence of the streaming data is interrupted,

that may also indicate a problem with the sensors on the machine.
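
The conveyor belt example can be sketched as a small in-memory stream processor. This is only an illustration under stated assumptions: the function name `detect_events`, the window size, the z-score threshold and the maximum gap are hypothetical choices, not values from any product mentioned in this chapter.

```python
from collections import deque
from statistics import mean, pstdev


def detect_events(readings, window=5, z_threshold=3.0, max_gap_s=2.0):
    """Yield (timestamp, kind) alerts from an iterable of (timestamp_s, value).

    kind is "spike" when a value deviates strongly from the recent window,
    or "gap" when the time since the previous reading exceeds max_gap_s.
    All thresholds are illustrative assumptions.
    """
    recent = deque(maxlen=window)   # in-memory sliding window of recent values
    last_ts = None
    for ts, value in readings:
        # Cadence check: an unusually long silence may point to a sensor fault.
        if last_ts is not None and ts - last_ts > max_gap_s:
            yield (ts, "gap")
        last_ts = ts

        # Spike check: compare the new value with the window statistics.
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            if sigma > 0 and abs(value - mu) / sigma > z_threshold:
                yield (ts, "spike")
                continue  # keep the outlier out of the baseline window
        recent.append(value)


# Hypothetical temperature stream: one reading per second, a spike, then a gap.
stream = [(0, 60.0), (1, 60.5), (2, 59.8), (3, 60.2), (4, 60.1),
          (5, 60.3), (6, 75.0), (7, 60.0), (12, 60.2)]
alerts = list(detect_events(stream))
```

Only the alerts leave the function; the raw readings are discarded once they fall out of the window, which mirrors the idea of processing data in memory close to where it is generated.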

Retail - Real-time inventory updates are helping to drive business processes for inventory and

pricing optimisation, and for optimisation of the supply chain, logistics and just-in-time

delivery. This is also a market where wearables and the consumer market are poised for even

greater growth.

Industrial automation combines streaming and predictive analytics to optimize

manufacturing processes and product quality. Streaming analytics enables statistical analysis

of the manufacturing process, with alerting and automated shutdown when quality levels are

breached.


Smart Energy - Scenarios from real-time monitoring of smart meters, smart pricing models

for electricity, to real-time sensor monitoring of wind farms (which produce a vast volume of

sensor data and where streaming analytics can drive a significant increase in efficiency and

energy output).

Healthcare - Smart sensors will play a pivotal role in exploiting the potential hidden in this

market. For example, where an SMS message can only remind a patient to take a pill, a smart

sensor on a pill bottle can continuously report whether and when a pill has been taken, and even whether the

storage temperature is correct.

Summary

Streaming analytics will be adopted in tomorrow’s organizations in almost every domain. The

pervasiveness of IoT will drive both the necessity and exploitation of real-time analytics for

actionable decision making. The domains of Fishery, Agriculture, and Forestry studied in the

scope of DataBio have been enriched in the past years by applying sensors and therefore are

not an exception. This is a unique opportunity to demonstrate streaming-driven applications

in these emerging domains.

3.5 Big data visualisation and user interaction

Information visualization can be defined as “The use of computer-supported, interactive,

visual representations of abstract data to amplify cognition” [REF-18]. The goal is to improve

understanding of the data with graphical presentations. The principle behind information

visualization is to utilize the powerful image processing capabilities of the human brain.

Visualizations increase the human cognitive resources. They extend the working memory,

reduce the search of information and enhance the recognition of patterns.

Data visualization may handle abstract, non-physical information using abstract but well

understood visualization structures like trees or graphs. It has applications with measurement

data, business information, document collections, web content and other big data assets that

cannot be understood without highlighting the important characteristics. Big data visualization

renders visible properties of the objects of interest and can be combined with interactive

information access techniques.

The interactive visualization process is a vital part of the big data analysis framework described in

Chapter 3.4. Complex and heterogeneous data from different sources and various types and

levels of quality need to be curated and transformed to suitable format. The data sources can

range from well-organized databases to continuous input data streams. The data is analysed

using mathematical, statistical and data mining algorithms and models. Visualizations

highlight the important features, including commonalities and anomalies, making it easy for

users to perceive new aspects of the data. Visualizations are optimized for efficient human

perception, taking into account the capabilities and limitations of the human visual system.

Interactivity in visualizations allows users to explore the data and achieve new knowledge and

insight.


The Visual Analytics Agenda [REF-19] introduces three principles for selecting a

visualization method, adapted from [REF-20]: appropriateness, naturalness and matching.

Appropriateness states that visual representations should provide neither more nor less

information than that needed for the task at hand. Naturalness calls for visual representations

that most closely match the information being presented; new visual metaphors are only

useful for representing information when they match the user’s cognitive model of the

information. The matching principle states that representations are most effective when they

match the task to be performed by the user.

Characteristics of some of the most common visualization types are briefly described here.

For more detailed information, see e.g. [REF-21]. Bar charts and pie charts (Figure 5) are the

basic methods used to visualize univariate ordinal data, i.e. data which consists of

observations that have natural, ordered categories on only a single attribute.

Figure 5. Bar chart (a) and pie chart (b)

Histograms are similar to bar charts, but are used to represent quantitative data (Figure 6 a).

The histogram defines a sequence of breaks and then counts the number of observations in

the bins formed by the breaks. Line graphs are used for displaying quantitative data as a

continuous function of a single variable (Figure 6 b). Common uses are showing frequency

distributions and time series.
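
The histogram construction described above (define breaks, then count the observations per bin) can be written in a few lines. The sketch below is illustrative only: the function name `histogram`, the half-open bin convention and the sample diameters are assumptions made for the example.

```python
from bisect import bisect_right


def histogram(values, breaks):
    """Count observations per bin; bin i covers [breaks[i], breaks[i + 1])."""
    counts = [0] * (len(breaks) - 1)
    for v in values:
        # Values outside the outermost breaks are ignored.
        if breaks[0] <= v < breaks[-1]:
            counts[bisect_right(breaks, v) - 1] += 1
    return counts


# Hypothetical tree diameters in centimetres.
diameters = [12.5, 18.0, 22.3, 25.1, 25.9, 31.4, 38.7, 19.9]
counts = histogram(diameters, breaks=[10, 20, 30, 40])
```

The resulting counts are what a plotting library would draw as bar heights; choosing different breaks changes the shape of the histogram, so the breaks should be reported together with the chart.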


Figure 6. Histograms (a) and line graph (b).

The basic visual method for analysing bivariate data, i.e. data for two (usually related) data

variables, is the scatterplot (Figure 7). Scatterplots are a good means of finding correlations,

clusters and outliers between two attributes. A third dimension can also be added by using a

visual effect such as colour and size of plot (bubble charts), or animations (animated bubble

charts).

Figure 7. Scatterplot.

Often, real-world data is multidimensional, consisting of many data items or without a clear

hierarchy. Dimension reduction methods aim at projecting data into a low dimensional space

(1D-3D) while maintaining the correct relations between the nodes. There are several

methods with different optimization goals and complexities. One of the best known is

Principal Component Analysis (PCA, Figure 8a). It tries to find a linear subspace that has


maximal variance. In parallel coordinates visualization each of the dimensions corresponds to

a vertical axis and each data element is displayed as a series of connected points along the

dimensions/axes (Figure 8 b).
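
To make the idea of a maximal-variance linear subspace concrete, the following sketch finds the first principal component of a small data set with power iteration on the covariance matrix. It is a minimal illustration, not a production implementation: the function name and the sample data are hypothetical, and power iteration is just one of several ways to compute PCA.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit direction, variance along it) for equal-length numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue,
    # i.e. the direction of maximal variance. Assumption: the start vector is
    # not orthogonal to that direction.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, variance


# Hypothetical 2-D data lying close to the line y = x.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
direction, variance = first_principal_component(data)
```

For this data the direction is close to the diagonal and captures almost all of the total variance, so a one-dimensional plot along it would lose little information.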

Figure 8. PCA (a) and parallel coordinates visualization (b).

If there are more than two variables in the dataset, correlation matrices or correlation

networks can be used to show pairwise correlations for all variable combinations (Figure 9).
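
A correlation matrix of this kind can be computed as sketched below, using the Pearson coefficient for every pair of variables. The column names and values are hypothetical, and Pearson correlation is an assumption; other correlation measures could be used in the same way.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Map each (name_a, name_b) pair to its Pearson correlation."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Hypothetical field measurements.
table = {
    "temperature": [10.0, 12.0, 14.0, 16.0, 18.0],
    "humidity":    [80.0, 76.0, 72.0, 68.0, 64.0],
    "yield":       [2.0, 2.5, 2.4, 3.1, 3.0],
}
corr = correlation_matrix(table)
# A correlation network keeps only the strong pairs as edges (threshold assumed).
edges = sorted((a, b) for (a, b), r in corr.items() if a < b and abs(r) > 0.9)
```

The matrix view shows every pair, whereas the network view drops weak relations, which keeps the picture readable when the number of variables grows.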

Figure 9. Correlation matrix and correlation network visualizations.

3.5.1 Sensor data

Sensor data is the output of a device that detects and responds to some type of input from the

physical environment. In other words, sensors ‘listen’ to the physical world, converting energy

into electric signals. Sensors are embedded into devices that we use daily, like


timers, thermostats, remote controls, etc. The data they produce can be used

to provide information or input to other systems.

In recent years, there have been enormous advances in hardware technology, such as the

development of sensors, which allow many different kinds of real-time data to be collected. In

addition, the cost of sensor hardware has been decreasing, allowing

sensors to become key players in several application domains, e.g.

environmental monitoring for weather detection and climate trends.

With the introduction of sensors into daily work, new challenges for ICT have

arisen, principally related to collection, storage, processing and visualisation. Although these

challenges require efficient systems and technologies, using them correctly

adds value to the sensor data gathered. This

chapter focuses on the visualisation of sensor data.

For both end users and application developers, the

information derived from the sensor data is of interest. One example is the validation of

sensor measurements through visual tools in web browsers or on mobile devices.

Nowadays, there are different solutions on the market, in the form of IoT platforms, that make it easy

to connect sensor data to a wide range of applications, from analytics to

visualisation. Some of the most relevant are:

• Microsoft Azure: For sensor data visualisation, it offers the Microsoft

Power BI tool, which visualises real-time data and other types of data

coming from heterogeneous sources through a complete set of dashboards and

charts. Results can be displayed on mobile devices as well

as in more complete user interfaces such as web applications.

• IBM Watson: IBM offers an IoT platform that covers the full cycle of IoT

device management, from connectivity to storage, processing and visualisation. For

visualisation, boards are available on which custom dashboards can be built.

Some of the capabilities that can be implemented with the dashboards

are:

o charts visualising real-time data from devices.

o gauges for visualising physical quantities like temperature and pressure.

o donut and bar charts displaying the current value of the data points.

o views of the data and storage consumption of devices.

• FIWARE: Around the FIWARE ecosystem there is a wide range of solutions that can

support the exploitation of sensor data, for example:

o cartodb – displays the location of the data producers on a map

o ducksboard – a widget-based solution showing the historic evolution of

entities

o freeboard – a very simple-to-use tool providing a complete set of

functionalities to control the life-cycle of sensor data producers.


Figure 10 shows an example of how the actual visualisation capabilities can be presented. Of

course, the presentation will vary from one vendor to another, but in every case the intention

is the same: to offer the most complete key information in a simple and attractive

manner.

Figure 10. Visualisation of sensor data.

3.5.2 Earth Observation data

The interaction tasks vary depending on the visualization representations. Different

operations are required for spatio/temporal visualizations or hierarchical and network

structures. Shneiderman [REF-22] introduces seven tasks for information seeking when

interacting with large data sets: overview, zoom, filter, details-on-demand, relate, history,

and extract.

In visual interaction, there are two basic interaction techniques: Direct manipulation, which

allows the user to filter or select elements of visualizations, and dynamic queries where the

user interacts with sliders, menus and buttons. Direct manipulation techniques are

recommended because they do not distract attention from the analysis process. The menus,

buttons and sliders are often scattered around the user interface, and using them requires

extra effort [REF-23].

A popular technique in visual analytics is using coordinated multiple views [REF-23] which is

a specific exploratory visualization technique. Data is represented in multiple windows and

operations in the views are coordinated. This means that data elements which are selected

and highlighted in one view are highlighted concurrently in all other views that include the

same data element. This operation is often called brushing. The user can change the style of


brush, the bounding region and the brushing effects. The method is effective for discovering

outliers. An example of brushing is shown in Figure 11.
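
The coordination behind brushing can be sketched as one shared selection that every view is notified about. The class `LinkedViews`, the view callbacks and the sample records are hypothetical and only illustrate the mechanism; real visual analytics toolkits provide their own linking APIs.

```python
class LinkedViews:
    """Share one selection (the 'brush') between several views of the same records."""

    def __init__(self, records):
        self.records = records   # record id -> attribute dict
        self.selected = set()    # ids currently brushed
        self.views = []          # callbacks notified on every brush

    def add_view(self, render):
        self.views.append(render)

    def brush(self, predicate):
        """Select all records matching predicate and refresh every view."""
        self.selected = {rid for rid, rec in self.records.items() if predicate(rec)}
        for render in self.views:
            render(self.records, self.selected)


# Hypothetical observations with a location and a measured value.
records = {
    1: {"lon": 24.9, "lat": 60.2, "height_m": 18.0},
    2: {"lon": 25.7, "lat": 62.6, "height_m": 22.5},
    3: {"lon": 27.7, "lat": 62.9, "height_m": 35.0},
}
highlighted = {}


def map_view(recs, selected):
    # A real map would recolour markers; here we only record the highlighted ids.
    highlighted["map"] = sorted(selected)


def histogram_view(recs, selected):
    # A real histogram would recolour bars; here we record the highlighted values.
    highlighted["histogram"] = sorted(recs[r]["height_m"] for r in selected)


views = LinkedViews(records)
views.add_view(map_view)
views.add_view(histogram_view)
# Brush a region on the map: everything north of 62 degrees latitude.
views.brush(lambda rec: rec["lat"] > 62.0)
```

Because the selection is stored once and pushed to all views, an element highlighted on the map is highlighted in the histogram at the same time, which is the behaviour described above.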

Figure 11. Brushing. The rounded area is highlighted in the histogram and on the map.

3.6 Big data frameworks

Big data processing involves a series of data collection, storage and preparation stages

before the data can be analysed. In fact, even though the big data analytics is what everybody

is talking about, data preparation (e.g. collecting, curating and organising data) accounts for

up to 80% of the data scientist’s work [REF-24]. Therefore, frameworks and platforms

to help manage the complete big data processing chain are needed.

An effort to model a reference framework that describes the logical components of a generic big

data system has been made by the European Big Data Value Association (BDVA), whose

framework has been used to guide DataBio platform development. Figure 12 shows the

BDVA Reference Architecture, where the numbers indicate how many tools the DataBio

project’s software vendors are providing for each part of the framework.


Figure 12. BDVA Reference Architecture with numbers of DataBio components.

Another reference architecture for a big data interoperability framework has been published

by the National Institute of Standards and Technology (NIST) [REF-25] (Figure 13). The framework

defines broad level data and service use flows between the framework components, denoting

needs for application interfaces.


Figure 13. NIST Big Data Reference Architecture.

There are several existing big data platforms that implement, at least partially, the above-

mentioned frameworks. One of the main issues in big data management is scalability and

distributed data management. The Hadoop framework [REF-26] has been a very successful

distributed data processing framework and it has been widely adopted in industry and

research. The Hadoop project includes a distributed file system that provides high-throughput

access to data, a job scheduling and cluster resource management system and a system for

parallel processing of large data sets. Hadoop is widely used by the biggest data analytics

users, such as Amazon, Facebook, Google, IBM and Twitter.
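
The parallel processing model associated with Hadoop is MapReduce. The sketch below imitates its three phases (map, shuffle, reduce) inside a single Python process; it does not use any Hadoop API, and the function names and the sensor data are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit (key, value) pairs for one input record: here (sensor id, reading)."""
    sensor_id, reading = record
    yield sensor_id, reading


def reduce_phase(key, values):
    """Combine all values that share a key: here the maximum reading."""
    return key, max(values)


def run_job(partitions):
    # Map: in a cluster, each partition could be processed on a different node.
    mapped = chain.from_iterable(map_phase(r) for part in partitions for r in part)
    # Shuffle: group intermediate values by key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce: one call per key.
    return dict(reduce_phase(k, v) for k, v in groups.items())


# Hypothetical sensor readings split over two partitions.
partitions = [[("s1", 20.5), ("s2", 18.0)], [("s1", 23.1), ("s2", 17.2)]]
result = run_job(partitions)
```

The point of the model is that the map and reduce functions contain no knowledge of how the data is distributed, so the same logic scales from one machine to a cluster.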

The Hadoop framework is often complemented with other big data processing platforms, such as

the Spark processing engine [REF-27], or new types of databases, often referred to as NoSQL

databases. Hadoop and the other open source platforms are supported for industry use by

several service providers, such as Cloudera and Hortonworks as well as all the major software

vendors e.g. Microsoft, Oracle and IBM. In addition to Hadoop based systems, broad-based

data-management vendors offer big data analytics tools from data-integration and database-

management systems to business intelligence, with integration to their own applications.

An Earth Observation (EO) exploitation platform [REF-28] is a collaborative, virtual work

environment providing access to EO data and the tools, processors, and Information and

Communication Technology resources required to work with them, through one coherent

interface. As such, the exploitation platform may be seen as a new ground segment

operations approach, complementary to the traditional operations concept.


OGC Testbed-13 [REF-29] supports the development of ESA’s Thematic Exploitation Platforms

(TEP) by exercising envisioned workflows for data integration, processing, and analytics based

on algorithms developed by users. These algorithms are initially developed by TEP users in

their local environments and afterwards tested on the Exploitation Platform. The goal is to

put an application into an Exploitation Platform (EP) Application Package, upload this package

to the Exploitation Platform, and deploy it on infrastructure that is provided as a service (IaaS)

for testing and execution. An Application Deployment and Execution Service acts as a front

end to cloud platforms, and is used by clients to deploy and execute application packages.


4 Big data in agriculture

4.1 Introduction

The agriculture sector is of strategic importance for European society and economy. Due to

its complexity, agri-food operators have to manage many different and heterogeneous

sources of information. Agriculture requires collection, storage, sharing and analysis of large

quantities of spatially and non-spatially referenced data. These data flows currently present

a hurdle to uptake of precision agriculture as the multitude of data models, formats,

interfaces and reference systems in use result in incompatibilities. In order to plan and make

economically and environmentally sound decisions a combination and management of

information is needed [REF-30].

Big data technology (BDT) is a new technological paradigm that is driving the entire economy,

including low-tech industries such as agriculture, where it is implemented under the banner

of precision farming (PF). It should be noted that farmers are primarily interested not

in (big) data as such, but in the knowledge generated from this data.

4.2 Status of big data in agriculture

Big data is moving into agriculture in a big way. A number of new technologies are now

influencing farming:

• Sensors on fields and crops are starting to provide data points on soil conditions, as

well as detailed info on wind, fertilizer requirements, water availability and pest

infestations.

• GPS units on tractors can help determine the optimal usage of agricultural machinery.

• Unmanned aerial vehicles, or drones, can patrol fields and alert farmers to crop

ripeness or potential problems.

• RFID-based traceability systems can provide a constant data stream on farm products

as they move through the supply chain, from the farm to the compost or recycle bin.

Individual plants can be monitored for nutrients and growth rates [REF-31].

• There has been an explosive growth in the use of Remote Sensing data in recent years

in terms of volume and also velocity. Such data-collection possibilities are of

significant benefit to several application domains, including atmosphere/marine/land

monitoring, emergency management, and security etc. It is estimated that the

European Copernicus programme alone should bring 13.5 billion Euros and provide

around 28’000 jobs between 2008 and 2020 [REF-32].

Operative aerial remote sensing covers the whole area of interest, mapping fields at high

spatial resolution but at a low frequency (also known as temporal resolution). The aim is to

prepare prescription maps for spatially variable applications of fertilizers and pesticides,

estimated by the spectral measurement of crop parameters. The frequency of the survey

depends on the crop type, agronomical operations, crop management intensity, and weather

conditions. Aerial imaging is usually carried out using a multispectral camera by an external


provider of photogrammetric services. Analyses may be performed through interpretative

algorithms after pre-processing of the acquired images, i.e. radiometric and geometric

corrections.

Periodic satellite remote sensing is used for the wide-ranging identification of spatial variability and

for simultaneously capturing the dynamics of vegetation growth, both at a medium

level of spatial resolution, as in the case of Landsat 8 images at 30 metres per pixel, acquired once

every 16 days. European Sentinel-2 data are a valuable additional source for periodic satellite

remote sensing, significantly shortening the revisit interval, e.g. to about 6 days for

most of Central Europe when combining Landsat and Sentinel data. The main information lies

in the vegetation indices determined from the R (red), NIR (near infra-red), and red-edge bands.

Absolute values of vegetation indices, their relative-to-mean values for the field, and the

detection of changes in these values are used for the assessment of crop stands and for

delineating management zones. Yield potential zones are areas with the same yield level

within the fields. Yield is the integrator of landscape and climatic variability and therefore

provides useful information for identifying management zones [REF-33]. This presents a basic

delineation of management zones for site specific crop management, which is usually based

on yield maps over the past few years. Similar to the evaluation of yield variation from

multiple yield data described by Blackmore [REF-34], the aim is to identify high yielding (above

the mean) and low yielding areas related as the percentage to the mean value of the field. In

addition, the inter-year spatial variance of yield data is important for agronomists to

distinguish between areas with stable or unstable yields. Complete series of

yield maps for all fields are rarely available, so remotely sensed data are analysed to determine the in-field

variability of crops through vegetation indices.
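
As an illustration of a vegetation index derived from the red and NIR bands, the sketch below computes the widely used Normalised Difference Vegetation Index, NDVI = (NIR - R) / (NIR + R), and expresses each pixel relative to the field mean. The text above does not name a specific index, so the choice of NDVI, the function names and the reflectance values are assumptions made for the example.

```python
def ndvi(red, nir):
    """Normalised Difference Vegetation Index for one pixel's reflectances."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def relative_to_mean(values):
    """Express each value as a percentage of the field mean."""
    field_mean = sum(values) / len(values)
    return [100.0 * v / field_mean for v in values]


# Hypothetical (red, NIR) reflectances for four pixels of one field.
pixels = [(0.08, 0.40), (0.10, 0.30), (0.05, 0.45), (0.12, 0.28)]
index = [ndvi(r, n) for r, n in pixels]
relative = relative_to_mean(index)
```

Pixels with a relative value above 100 are greener than the field average; comparing such values between acquisition dates gives the change detection mentioned above.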

Figure 14. Yield potential application.
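
The classification of yield levels and their inter-year stability can be sketched as follows. This is a simplified illustration and not the procedure of [REF-34]: the function name, the use of the coefficient of variation as stability measure and the 15 % limit are assumptions.

```python
from statistics import mean, pstdev


def classify_cells(yield_maps, cv_limit=0.15):
    """Classify grid cells from several years of yield maps.

    yield_maps: list of per-year lists, one yield value per grid cell.
    Returns a list of (level, stability) tuples, one per cell.
    """
    normalised = []
    for year in yield_maps:
        year_mean = mean(year)
        # Percentage of the field mean removes year-to-year level differences.
        normalised.append([100.0 * y / year_mean for y in year])

    zones = []
    for cell in zip(*normalised):           # all years for one cell
        level = "high" if mean(cell) >= 100.0 else "low"
        cv = pstdev(cell) / mean(cell)      # inter-year variability (assumed measure)
        stability = "stable" if cv <= cv_limit else "unstable"
        zones.append((level, stability))
    return zones


# Hypothetical yields (t/ha) for four cells over three years.
maps = [[6.0, 4.0, 5.0, 5.0],
        [7.0, 5.0, 8.0, 4.0],
        [6.5, 4.5, 3.0, 6.0]]
zones = classify_cells(maps)
```

Where yield maps are missing, the same classification could be applied to vegetation index values from remote sensing, at the cost of an indirect link to actual yield.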

Machinery monitoring data is typically obtained through the Global Navigation Satellite System

(GNSS), no matter whether it is the American Global Positioning System (GPS NAVSTAR), the

European Galileo system, the Russian GLONASS system, the Chinese BeiDou system, the

Indian NAVIC system (officially named as the Indian Regional Navigation Satellite System), or


any other. The basic principles are the same for any of the abovementioned systems, even

though technical details may vary. A GNSS receiver is mounted on a moving vehicle, typically

a tractor and/or an application machine. Both the position and trajectory of a

tractor/application machine may then be tracked. Using RTK GNSS methods, a geospatial

accuracy of approx. 0.03 m can be achieved; see e.g. [REF-34]. Experiments involving cell

phone-based monitoring have also been performed, such monitoring approaching a

geospatial accuracy of up to 0.20 m, according to the type of cell-phone network. Machinery

management is focused mainly on collecting telemetry data from machinery and analysing

them in relation to other farm data. The main challenge is access to data and data

integration when a farmer uses tractors and equipment from various manufacturers with

different telematics solutions and different data ownership/sharing policies. In many cases

farms or agricultural service organizations own tractors of more than one brand/family.

Although the communication protocols used in control units of farm machinery and data

collection are subject to standardization, the telematics solutions, including data

ownership/usage policy, are usually specific to each tractor brand/family.

Furthermore, attention should be paid to ISO and CEN standards regulating data sharing in

agriculture, based on input from industry organizations like CEMA and AEF.

Although this is not an issue, and can even be desirable, for the tractor producer’s

customer care responsible for solving technical problems on the tractor, for farmers it can be hard

or impossible to connect the data coming from the tractor with other farm data relevant for

agronomical / economical evaluation of machinery usage. Even when the tractor

has a telematics solution, the farmer sometimes needs to use a third-party device and software

to obtain data for field-specific analysis. Zetor Company is currently developing and testing

a modular telematics solution which is intended to be part of all Zetor tractors. The solution

will provide several levels of functionality ranging from basic telematics for customer care and

basic location information for customer to field specific economic analysis and precision

agriculture. The highest level of the modular solution will offer connection to other data relevant

for farm management, like field boundaries obtained from the Land Parcel Identification System (LPIS),

an elevation model and possibly yield potential maps derived from EO data.
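
Connecting machine positions with field boundaries, as needed for field-specific analysis, can be sketched with a point-in-polygon test. The example below is illustrative only: the function names, the coordinates (assumed to be in a projected metric system) and the simple rule of attributing each time interval to the field of the earlier fix are all assumptions, and it does not reflect any vendor's telematics solution.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon given as [(x, y), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray starting at the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def seconds_per_field(track, fields):
    """Sum time between consecutive fixes, attributed to the field of the earlier fix."""
    totals = {name: 0.0 for name in fields}
    for (t0, x0, y0), (t1, _, _) in zip(track, track[1:]):
        for name, polygon in fields.items():
            if point_in_polygon(x0, y0, polygon):
                totals[name] += t1 - t0
                break
    return totals


# Hypothetical field boundaries in projected metres.
fields = {
    "field_a": [(0, 0), (100, 0), (100, 100), (0, 100)],
    "field_b": [(100, 0), (200, 0), (200, 100), (100, 100)],
}
# Hypothetical telemetry: (timestamp_s, easting_m, northing_m).
track = [(0, 10, 50), (60, 60, 50), (120, 110, 50), (180, 160, 50), (240, 250, 50)]
usage = seconds_per_field(track, fields)
```

Once positions are tied to fields in this way, other telemetry values such as fuel consumption could be aggregated per field with the same loop.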

Meteorological monitoring at farm level captures the detailed dynamics of weather

conditions on the ground. Weather data together with the positions at which they are

collected are recorded at specific localities at a high frequency (every 10 to 15 minutes). The

main goal is to obtain data for the modelling of crop growth and to support decision making

by agronomists with respect to plant protection (the prediction of plant pests and

infestation), plant nutrition (crop growth and nutrient supply), soil tillage (soil moisture

regime), and irrigation (soil moisture).
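
High-frequency station readings are usually aggregated before they feed a crop growth model. The sketch below computes daily statistics and a simple growing-degree-day value; the growing-degree-day formula, the base temperature of 5 °C and the function name are assumptions chosen for illustration and are not prescribed by the text above.

```python
from collections import defaultdict


def daily_summary(readings, base_temp_c=5.0):
    """Aggregate (day, temperature_c) readings into per-day statistics.

    Returns {day: (t_min, t_max, t_mean, growing_degree_days)}.
    """
    by_day = defaultdict(list)
    for day, temp in readings:
        by_day[day].append(temp)

    summary = {}
    for day, temps in sorted(by_day.items()):
        t_min, t_max = min(temps), max(temps)
        t_mean = sum(temps) / len(temps)
        # Simple growing-degree-day formula: mean of min and max above a base.
        gdd = max(0.0, (t_min + t_max) / 2.0 - base_temp_c)
        summary[day] = (t_min, t_max, t_mean, gdd)
    return summary


# Hypothetical readings; a station logging every 10 to 15 minutes would
# deliver roughly 96 to 144 values per day.
readings = [("2017-05-01", 4.0), ("2017-05-01", 10.0), ("2017-05-01", 16.0),
            ("2017-05-02", 2.0), ("2017-05-02", 6.0)]
summary = daily_summary(readings)
```

Accumulating the daily growing-degree-day values over a season gives a simple thermal time axis against which crop development stages can be compared.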

4.3 Future developments

To transform data into knowledge, a big data analytics system is needed, which will then

provide pilot managers with highly localized descriptive (a better and more advanced way of

looking at an operation), prescriptive (timely recommendations for operation improvement


i.e., seed, fertilizer and other agricultural inputs application rates, soil analysis, and localized

weather and disease/pest reports, based on real-time and historical data) and predictive

plans (use current and historical data sets to forecast future localized events and returns).


5 Big data in forestry

5.1 Introduction

The EU-28 had close to 182 Mha of forest. Within the EU-28, Sweden reported the largest

wooded area in 2015 (30.5 Mha, 16.8% of the total area of the EU-28), followed by Spain (27.6

Mha, 15.2% of the total area of the EU-28), Finland (23.0 Mha, 12.7% of the total area of the

EU-28), and France (17.6 Mha). In 2010, 60.3 % of the EU-28’s forests were privately owned.

Regular engagement between different forest stakeholders paves the way towards a digital

forest ecosystem that takes advantage of big data models.

5.1.1 Development/optimization focus in forestry

Forestry is developing rapidly with the help of new technologies and procedures. Numerous

methods provide information on forests each with their own time cycles, granularities,

accuracies, costs, and viewpoints. Effective utilization of available forest resources is thus not

only based on short-cycled, increasingly accurate and cost-effective data inventory

methods. In addition, providing easy access to the best available up-to-date information on

forests is expected to generate new applications and businesses and to bring together varying

users, thus enhancing the utilization of forest resources. Better data enables more efficient

and higher quality planning and operations in the entire wood supply chain.

The Data to Intelligence (D2I) research program, which aims to build the foundations for the next

generation forest resource management system in Finland, recognised the following

development opportunities:

• Terrestrial laser scanning (TLS) in particular can provide the tools to measure and predict

single-tree-level above-ground biomass (AGB) components in high detail using metrics describing the shape

and size of the trees. Airborne laser scanning (ALS) could be used to extend these

predictions to larger areas.

• Timber assortments can be accurately predicted using TLS or a multisource approach.

Also, tree quality features can be measured accurately to further improve the value of

forest resource information.

• Automatic processing of TLS data was demonstrated to be effective and accurate and

could be utilized to make future TLS measurements more efficient.

• Multisource approaches provide new possibilities to improve the accuracy of single-

tree measurements but also for predicting values for larger areas.

One of the main objectives of D2I was to study the potential of operational harvester data for updating

forest resource information and as reference data for airborne laser scanning.


Figure 15. Data collected by forest machines help to evaluate harvesting conditions, for example. Photo: Erkki Oksanen.

5.2 Big Data applications in forestry - scope, impact and benefit of digital forest management

5.2.1 Forest Big Data platform

There has already been a strong effort in Finland to build up a general Forest Big Data

platform. The goal of the research task of Metsäteho was to specify and demonstrate a

platform providing data inquiry services for users and applications to easily access available

forest data sources. Forest Big Data (FBD) covers forest resource, forest condition, and wood

procurement process data. The FBD Platform connects and refines data from various data

sources and delivers refined data to application suppliers. The application suppliers, for their

part, sell various services to end users (i.e. actors in the wood procurement chains).

Applications are divided into two groups: the FBD Applications that are developed in

cooperation with the FBD Platform and other applications that are developed without

cooperation.

Figure 16. Forest Big Data Platform with forest big data and application components (http://www.datatointelligence.fi/forest-big-data.html).

The aim of the Forest Big Data platform is to provide a uniform view of heterogeneous forest data

sources by specifying a common data inquiry interface and a data structure for representing

data and the required metadata, in particular the uncertainty. To provide easy access to the data

sources, the platform offers basic services for updating data with growth prediction models

and for combining several up-to-date data estimates by means of Bayesian data fusion. The

main aim of the FBD business is to bring added value to the end users of the FBD Applications.

Therefore, the success of the FBD business is measured by the performance of the FBD end

users rather than by actual FBD business transactions. The FBD Platform is envisaged to be

operational in 2020.
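
The combination of a growth update and data fusion described above can be illustrated with a minimal sketch. It assumes that each source reports a stand attribute (here a hypothetical stem volume in m3/ha) as an independent Gaussian estimate with a known standard deviation; under that assumption and a flat prior, the Bayesian posterior is the inverse-variance weighted mean. The growth update is a placeholder constant-rate model. The function names, parameters and numbers are illustrative assumptions and are not taken from the FBD Platform specification.

```python
import math

def grow(mean, sd, years, rate=0.03, model_sd=2.0):
    """Update an old estimate to the present with a toy growth model.

    Hypothetical model: compound growth at `rate` per year. The model's own
    error (`model_sd` per year, assumed independent) inflates the uncertainty.
    """
    factor = (1.0 + rate) ** years
    new_mean = mean * factor
    new_var = (sd * factor) ** 2 + years * model_sd ** 2
    return new_mean, math.sqrt(new_var)

def fuse(estimates):
    """Fuse independent Gaussian estimates given as [(mean, sd), ...].

    With a flat prior the posterior is Gaussian, and its precision is the
    sum of the input precisions (inverse-variance weighting).
    """
    precisions = [1.0 / sd ** 2 for _, sd in estimates]
    total = sum(precisions)
    mean = sum(m * p for (m, _), p in zip(estimates, precisions)) / total
    return mean, math.sqrt(1.0 / total)

# Illustrative inputs (m3/ha): a five-year-old ALS inventory brought up to
# date, fused with a fresh harvester-based estimate for the same stand.
als_now = grow(180.0, 25.0, years=5)
fused_mean, fused_sd = fuse([als_now, (215.0, 15.0)])
print(f"fused volume: {fused_mean:.1f} +/- {fused_sd:.1f} m3/ha")
```

The fused standard deviation is always smaller than that of the most accurate input, which is why carrying the uncertainty as metadata alongside each estimate matters for this kind of service.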

5.2.2 Digiroad

One ongoing initiative in Finland is to establish and develop a comprehensive Forest Digiroad

Service, which collects real-time data on the operating condition and accessibility of forest roads

all over Finland. The Forest data forum defines the data contents and the rules for providing,

sharing and using the service. Trucks and drivers travelling on forest roads and other

timber transportation routes continuously produce data about road conditions in an automated

way. This will help to save money in forest road maintenance, enhance traffic safety and

decrease the risk of road damage. Precise information on forest roads, the forest and its soil

helps in targeting the work. This is called precision forestry. The work can already be

optimized when saplings are planted.

5.2.3 Metsään.fi e-Service

Metsään.fi is an eService provided by the governmental body, the Finnish Forest Centre

(METSAK), to make forest resource information available for citizens free of charge.

Metsään.fi as an eService serves forest owners and forestry service providers. Metsään.fi is a

portal through which people who own forest property in Finland can conduct business related

to their forests from their own desktops. The portal connects owners with related third

parties, including providers of forestry services. This makes it easy to manage forestry work

and to be in touch with forestry professionals.

Metsään.fi is a portal which offers the latest information to forest owners on their properties.

As soon as they log in, users can see what should be done in their forests right now.

Information is displayed for each forest stand compartment, broken down by soil type, tree

type and natural occurrence, and possible logging or forestry actions are suggested, including

income and cost estimates. Maps and aerial photographs clearly show where properties are

located and what they look like. Users log in securely using their online banking codes. The

service is offered in Finnish and Swedish.

The portal saves service providers the cost and effort of visiting sites to obtain the latest data

on which to base plans. It also contains up-to-date contact details for forest owners. The aerial

photographs and maps are important tools for professionals, and for small businesses

Metsään.fi may replace the need to have their own geographic information system or CRM

system entirely. Most private Finnish forest owners are either in employment or retired, and

a growing proportion live far from the forests they own. For most owners the forests are not

a major source of income, and only a small fraction have professional forestry skills.

The portal draws information from a national forest resource database, which is continuously

updated with data obtained by laser scanning, aerial photography, sample plot

measurements and site visits. This sort of data collection is a statutory task of the Finnish

Forest Centre. Between surveys, information is maintained based on notifications received by

the Forest Centre from forest owners and forestry organisations. Now, reports on completed

work can also be submitted via Metsään.fi. Tree growth is factored into the data in the portal,

and suggested actions are updated annually.

Development of the portal is funded by the Finnish Ministry of Agriculture and Forestry.

Metsään.fi supports the fulfilment of many strategies and EU directives, including the EU

Forest Strategy, the PSI and INSPIRE directives, the development of rural livelihoods and the

promotion of biodiversity.

Metsään.fi is provided by the Finnish Forest Centre, which is a state-funded organisation for

promoting sustainable forestry and forest-based livelihoods. The portal is free of charge.

Businesses can define in which areas they want to operate.

Figure 17. Metsään.fi service with related operations and user groups.

The Finnish government has recently decided to invest 13 M€ in forestry digitalization in

2016-2018 by establishing a “Key” project for forestry. After a long time, political opinion is

now very positive towards the promotion of forest business. The forest industry is today again seen

as a business of the future - not a business of the past.

Figure 18. Entity of forest data development in Metsään.fi Service. Specific focus on improvement of data mobility and data quality, and e-service promotion. (Metsätieto 2020 - Kehittämissuunnitelma).

5.2.4 Wuudis Service

Wuudis is a full-service digital forest property management platform for forest owners, forest

contractors and forest authority experts. It is a network service that enables data and near-real-time

information sharing between forest owners, contractors, timber buyers, manufacturers,

forest insurance companies and authority experts in the forestry sector. It also acts as a

marketplace for selling timber/biomass and forest care work (harvesting, reforestation, fertilization

etc.). It enables easy and remote forest management for forest owners. It guides the planning of the

next forest activities that need to be performed to obtain the maximum economic benefit from

timber harvests. It can save costs and increase margins for contractors via easy scouting and

via promotion of sustainable forest management practices and increased mobilization of

available biomass resources for the needs of the biomass industry.

Forest Health Monitoring and AIS Control

Spain faces alarming situations caused by several pests that pose serious threats to the

health of very important species in the Iberian Peninsula, among others Quercus ilex,

Quercus suber and Eucalyptus sp.

The tasks carried out in the DataBio project in this regard have been related to the development

of a methodology based on remote sensing images (satellite + aerial + UAV) and field data for

monitoring the health status of forests in large areas of the Iberian Peninsula. The work has

been particularly focused on the monitoring of Quercus sp. forests affected by Phytophthora

cinnamomi Rands and of the damage in eucalyptus plantations affected by the coleopteran

Gonipterus scutellatus Gyllenhal.

Specifically for the Eucalyptus and Gonipterus use case, and in order to test the validity of

UAV data, this pilot used an eBee RPAS (fixed wing) and a hexacopter (rotary wing) with three

different cameras: SODA (RGB), Sequoia Micasense (multispectral) and Thermomap

(thermal).

Figure 19. TRAGSA Drones used in Forestry pilot.

By processing the obtained images, TRAGSA has generated several products, such as RGB,

multispectral, NDVI and thermal reflectance orthomosaics.

Figure 20. Generated products (imageries) in Forestry pilot.

Using the different spectral bands provided by the cameras used in the tests, TRAGSA is

developing a model explaining the relation between several EO indexes (NDVI, CARI, GNDVI,

NGRDI, ...) and the optical properties of vegetation, pigment concentration and chlorophyll

concentration. Eventually, those data will be cross-checked with field data.
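
As an illustration of how such indexes are derived from the spectral bands, the following minimal sketch computes three of the normalized-difference indexes named above from per-pixel reflectances, using their standard definitions: NDVI = (NIR - R)/(NIR + R), GNDVI = (NIR - G)/(NIR + G) and NGRDI = (G - R)/(G + R). CARI is omitted because its definition involves additional band-specific terms. The function names and the reflectance values are hypothetical and are not taken from the TRAGSA processing chain.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None where the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else None

def vegetation_indices(green, red, nir):
    """Compute per-pixel indexes from band reflectances in the range 0..1."""
    return {
        "NDVI": normalized_difference(nir, red),
        "GNDVI": normalized_difference(nir, green),
        "NGRDI": normalized_difference(green, red),
    }

# Illustrative (green, red, nir) reflectances for two canopy pixels.
pixels = {"healthy": (0.08, 0.04, 0.50), "stressed": (0.10, 0.12, 0.30)}
for name, (g, r, n) in pixels.items():
    idx = vegetation_indices(g, r, n)
    print(name, {k: round(v, 3) for k, v in idx.items()})
```

In this made-up example the stressed pixel yields lower NDVI and GNDVI and a negative NGRDI, which is the kind of contrast that the cross-check against field data is meant to confirm or reject.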

Of course, those indexes have been generated as Big Data multi-table datasets, but a visual

example of the results can be seen in the following image:

Figure 21. Generated indexes (images) in Forestry pilot.

Although the acquisition and processing tools have proved successful, the main

problem that the pilots are currently dealing with is related to the selected tree species.

The canopy density (crown density) of eucalyptus is very low, and the trees usually appear

mixed with bushes. This makes the isolation of selected trees difficult.

Currently, TRAGSA is developing geometric isolation methodologies in order to double-check

the produced statistical data, which are to be analyzed using R or Statgraphics.

Regarding the AIS control pilot, TRAGSA is developing more conventional and traditional Big

Data operations based on gathering several datasets and processing them. In this specific

pilot, the most relevant aspect is the use of large-scale images as source datasets, such as

WORLDCLIM or GHS, the population grid developed by JRC.

5.3 Future developments

Traditionally, forestry is driven by local producers with their own practices and culture of doing

business. Currently there is significant interest from both the national governments of EU

member states and the European Commission in the rapid digitalization of forestry in order to improve

profitability and competitiveness while ensuring sustainability. Big data technologies can

provide long-term and sustainable solutions for the management of the whole sector. A

critical challenge is making the benefits available to the wide range of actors and end users

within the forestry sector.

5.3.1 Opportunities and possible big data solutions

5.3.1.1 Metsään.fi e-Service

In the DataBio project, Big Data partners will integrate their existing market-ready or almost

market-ready technologies into the forest databases together with METSAK, and the resulting solutions

will be piloted with the forestry sector partners, with associated partners and with other

stakeholders, e.g. in public policies related to nature conservation,

infrastructure/landscape/town plans.

The existing technical environment of Metsään.fi eService concerning big data consists of

multiple data sources and big data types including remote sensors, geospatial information,

images and text. The DataBio project utilizes these data sources and generates new types of data

structures and data analytics methods. Metsään.fi eService uses the big data

through a publishing database and other existing interfaces. The data is not saved in

Metsään.fi eService itself. The Big Data volume at METSAK was 200 GB of forest resource data

at the beginning of 2017. The volume is expected to increase by around 100 GB per year during

this project, reaching around 500 GB at the end of 2019.

In Finland, there are vast amounts of passively owned forests that could serve both financial

and environmental needs for forest management more effectively. Also, many novel forest

health problems are likely to occur in the future without innovative forest management

solutions that can enable appropriate management activities. A major concern of forest

authorities is how to encourage forest owners to better manage their assets. Metsään.fi

eService for forest owners and forestry operators supports the management of privately

owned forests and enhances the use of forest resource data. Metsään.fi eService is constantly

being developed by increasing the forest data and the functionalities related to it. The main

goal in the DataBio project and related pilots is to enhance the use of Metsään.fi eService and the

use of METSAK’s forest resource data. One key opportunity is to offer Metsään.fi users more

information and tools, for instance on storm damage and quality control, to support better

forest management. This can be enabled by crowdsourcing solutions, which will be piloted in

the DataBio project.

5.3.1.2 Wuudis Service

Since the launch of the ‘Wuudis Forest’ service early this year, it has gained around 400 users.

The service has been validated in Finland and the number of users is increasing every day. As part

of the DataBio project, MHG Systems is aiming to develop the most innovative and holistic forestry

big data solution, named ‘Wuudis Data’, by collaborating with SENOP, Forestry TEP, and the

Metsään.fi service. In addition, analyzing the large volume of user behavioural data to provide

customized information to relevant forest stakeholders is also an important feature of this

service. The ‘Wuudis Data’ service will be the final outcome of the three forestry pilots and has

immense commercialization potential with significant benefits to the whole forest business

value chain. The expected indicative benefits of the ‘Wuudis Data’ service covering the whole

forest value chain are shown in Figure 22.

Figure 22. Forest value chain and the expected benefits of ‘Wuudis Data’ to all segments of the value chain.

5.3.1.3 Concept of ‘Wuudis Data’ service

The ‘Wuudis Data’ service aims to become the most holistic service in the forestry business by

integrating multiple forestry data sources into a single web service. It provides the required

tools for easy forest management and the necessary customized data to all forest stakeholders.

The concept of the ‘Wuudis Data’ service is provided in Figure xx. The black boxes in Figure xx are

the key features and functionality of the future ‘Wuudis Data’ service, which are under

development as part of the DataBio forestry pilots. A very effective business-oriented approach

is used in this development project, as the ‘Wuudis Data’ service will be built on top of the ‘Wuudis

Forest’ service and integrated with various data sources such as Forestry TEP, SENOP

drone-based monitoring and the Metsään.fi service. In short, the approach (i) integrates multiple

forestry data sources into a single web service, (ii) analyses and visualizes the combined data for

the end users based on their needs and (iii) helps users to focus on relevant up-to-date

information on forests and forest owners.

Figure 23. Concept of Wuudis Data.

Hyperspectral Imaging Systems for Drone Remote Sensing Platforms

Hyperspectral imaging from small unmanned aerial vehicles (UAV) offers an agile type of remote

sensing. In forestry monitoring, data has mainly been captured from manned aircraft and

satellites, focusing more on the forest or plot level. UAV imaging enables higher spatial

resolution, improving the resolution of photogrammetric point clouds and the acquisition of

three-dimensional (3D) structural data from the forest. In this sense, the satellite data can be

locally magnified by UAV hyperspectral data to get information about individual trees,

including their species and health status via a more accurate radiometric image, and accurate

heights via a more precise canopy height model.

For the growing UAV remote sensing market, Senop has done pioneering work by manufacturing a

small, lightweight camera that can be easily mounted on a drone. The Senop camera is a unique

frame-based hyperspectral imaging device built on a variable air gap Fabry-Pérot

interferometer (FPI) operating in the visible to near-infrared spectral range (500-900 nm).

Within the forestry 2.3.1 pilot, the spectral data produced by the Senop hyperspectral camera has a

spatial resolution of less than 10 cm: 1 pixel equals less than 10 centimeters. This data will be

processed into georeferenced spectral maps, i.e. radiometric orthorectified image mosaics,

and into 3D point clouds and digital surface models (DSM) with EnsoMOSAIC Fusion image

processing software provided by MosaicMill Ltd. Furthermore, these maps can be joined with

other 3D point clouds and digital surface models of complex 3D structures of forests.

The obtained radiometric image mosaics and 3D point clouds will be analyzed with the algorithms

provided by Simosol Ltd. Their forestry simulators and growth modelling tools enable us to

provide a unique ecosystem service within tree-wise monitoring and mapping for studying

e.g. effects of fertilization and infestation.

Figure 24. The concept of the new Senop hyperspectral camera, released in 2018.

5.3.1.4 Sentinel-2 based monitoring system

Within the forestry pilot by FMI, various big data analyses of optical Sentinel-2 satellite data

will be performed. With high spatial and temporal resolution data such as Sentinel-2,

an enormous amount of data is generated every five days, allowing for near-real-time

monitoring of forest ecosystems at country or continent scale. This is, however, only possible

by utilizing big data approaches to pre-process and interpret the data. In FMI’s pilot, two

separate satellite big data tasks are addressed: 1) automated generation of time series of

cloud-free reflectance images covering the area of the Czech Republic in the peak vegetation

growing season, 2) interpretation of cloud-free images with regard to forest health

conditions.

Using all available observations from the pair of Sentinel-2 satellites, the quality of each image pixel can

be assessed independently in a selected time interval, and synthetic reflectance images can be

generated on a per-pixel basis (so-called spatial-temporal analysis or L3 product). Once the data

archive of pre-processed Sentinel-2 images (big data) is established, the end user may generate

such reflectance images for any selected time interval in a fully automated manner. This may

cover key phenological vegetation stages (spring leaf emergence, peak growing season,

autumn senescence), or generate a timely cloud-free image to assess rapid forest changes (e.g.

windfall, insect infestation). For an example of the output see Figure 25.
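
The per-pixel generation of a cloud-free synthetic image can be sketched as follows. The sketch assumes that every acquisition comes with a per-pixel validity mask (for example from cloud screening) and that the composite value is the median of the valid observations within the selected time interval. The median rule, the data layout and the function name are assumptions made for illustration; the text does not state which selection rule the FMI pilot uses.

```python
from datetime import date
from statistics import median

def composite(observations, start, end):
    """Build a per-pixel composite from a time series of small images.

    observations: list of (acquisition_date, reflectance_grid, valid_grid),
    where both grids are equally sized lists of rows and valid_grid marks
    cloud-free pixels with True. Returns a grid holding the median of the
    valid values per pixel, or None where no clear observation exists.
    """
    selected = [(r, v) for d, r, v in observations if start <= d <= end]
    if not selected:
        return []
    rows, cols = len(selected[0][0]), len(selected[0][0][0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            clear = [r[i][j] for r, v in selected if v[i][j]]
            row.append(median(clear) if clear else None)
        out.append(row)
    return out

# Three hypothetical 1 x 3 acquisitions; False marks a cloudy pixel.
series = [
    (date(2016, 6, 10), [[0.30, 0.90, 0.28]], [[True, False, True]]),
    (date(2016, 7, 5), [[0.32, 0.31, 0.95]], [[True, True, False]]),
    (date(2016, 8, 20), [[0.34, 0.33, 0.97]], [[True, True, False]]),
]
print(composite(series, date(2016, 6, 1), date(2016, 8, 31)))
```

Narrowing the interval to July and August in this toy series leaves the third pixel without any clear observation, which shows the trade-off between timeliness and completeness that a user faces when choosing the time interval.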

Figure 25. Example of cloud-free reflectance image of the forests of the Czech Republic generated using big data spatial-temporal analysis utilizing all available Sentinel-2 observations between June and August 2016.

Interpretation of resulting cloud-free images will be based on the analysis of time series of

vegetation indices and forest quantitative products. For this, extensive in-situ ground truth

data collection will be performed to sample forest structural parameters. The sensitivity of

satellite-derived products will be studied based on these in-situ data and best performing

products will be used in the time series analysis (see Figure 26).

Figure 26. Example of satellite-derived product describing forest health status: the amount of chlorophylls in forest canopies. Red areas are identified as forests with low chlorophyll content. The cloud-free image mosaic generated from Sentinel-2 big data was used as an input to the algorithm.

6 Big data in fishery

6.1 Introduction

The fishing fleet is increasingly sophisticated, with numerous sensors installed on each vessel

for finding fish, navigating and communicating with the outside world. The temporal and

spatial variability of fisheries stocks has given fishermen an inherent need to share,

restrict and seek out information from each other. Sharing of information between the

fishermen may be bartered, given on a friendship basis or obtained from public sources such

as auction results, buyer reports and publicly available tracking services and statistics. Several

separate groups benefit from data collection and sharing between fishing vessels: the

fishermen, the managing companies, research institutes, and government bodies.

The current method to obtain information about the fisheries activities—where the fishing is

good, which species are caught, and which vessels are active—is through utilization of

communication technologies. Sales data are routinely accessed to obtain information about

which vessels deliver what quantities of different species where. AIS (Automatic Identification

System) tracking portals are used to get an overview of the regions where vessels operate

if the regions are within AIS coverage (refer to MarineTraffic's information service below for an

example of such a portal), and telephone (both mobile- and satellite-based) may be

used to contact specific vessels or companies to get a first-hand account of how the conditions

are and to obtain bits of information used for trip planning. This process is manual and the

access to information is limited by availability of industry contacts and the willingness to

share.

The vessels are operated by businesses where the shipowner controls both the vessel and the

resource base of each vessel. The catch is landed from vessels per arrangement with buyers

or by habit and location. Each fishing company reports its catch diaries to the regulatory

bodies and catch information is accumulated on each vessel (or company) while sales and

deliveries of fish are collected into publicly available statistics. The businesses maintain their

own experience data of past fisheries while landing statistics are available online from various

sales organizations.

6.1.1 Vessel monitoring systems and fisheries management

Vessel monitoring systems (VMS) are defined by Wikipedia as systems used in commercial

fishing to allow environmental and fisheries regulatory organizations to track and monitor the

activities of fishing vessels both in a country's territorial waters and the Exclusive Economic

Zone extending 200 nautical miles from each country’s coasts. VMS are used to

improve the management and sustainability of the marine environment by ensuring proper

fishing practices and helping to prevent illegal fishing. VMS relates to the specific application of

monitoring commercial fishing boats and should not be confused with VTS (Vessel Traffic

Service), which aims to monitor marine traffic primarily for safety and efficiency in ports and

busy waterways (of the information services mentioned here, MarineTraffic is more of a VTS,

while Global Fishing Watch is more of a "global" VMS). VMS implementations,

requirements and protocols vary between countries. The EU, including Norway through the EEA,

requires VMS and Electronic Reporting Systems (ERS) aboard all fishing vessels longer than 15

meters (above 12 meters since 2012). Figure 27 outlines the components of a VMS.

Figure 27. Illustration of VMS (from EC commission, Fisheries policy – control technologies).

Furthermore, the governing bodies (the EC and the national EU and EEA member states) require

fishermen and landing sites by law to report catch data back to them for monitoring purposes.

Prior to the landing, the catch volume is also reported to the proper sales association for the

catch species and auctioned to determine the landing site. Fishery shipping companies are

increasingly replacing paper logbooks with ERS integrated with each vessel in their

fleet to support efficient quota management. Dualog is one example of a company providing

ERS software for catch journals and quota management with their eCatch application

(www.dualog.no) which is used by many shipping companies in Norway.

ICES, the International Council for the Exploration of the Sea, is the governing body that

determines the status of fish populations and recommends sustainable quotas for the next

year through its annual meeting of fishery biologists from Europe and North America. The

ICES advice is part of the EU negotiations and helps set quotas for both EU and ICES member

states. ICES's advice carries heavy weight as input for settling the quotas at international,

bilateral and national level.

6.1.2 Optimization focus in fishery

Fuel consumption is a challenge for most fisheries, as it represents 60-70% of the total annual

cost of a vessel's activity ([REF-35], [REF-36], [REF-37], [REF-38]). Nowadays, decisions about

vessel routes are taken by expert fishermen in a subjective way based on their own

experience, technological devices (sonar, meteorological forecasts) and increasing

communication with local scientists (e.g. presence/absence forecasts from habitat models).

Apart from the initial planning based on the best areas to fish in the past and current

meteorological forecasts, the presence or absence of fish at each attempted fishing point (spatial

correlation) as well as unforeseen events (bad weather, instrument failures, etc.) need to be

considered. This has been approached in the past using interactive optimization [REF-39],

[REF-40], which has also been used in maritime transportation planning [REF-41]. A critical

task is the definition of a fitness function that accurately represents the real world,

which often requires an iterative process of eliciting the fitness function from the expert [REF-40].

This explains why so far there are only a few attempts or proofs of concept aimed at

optimizing some elements of fishing activities ([REF-42], [REF-43]). However, they focus on

a single activity or destination (e.g. routing to the fishing area; [REF-44], [REF-42])

or a single decision driver (e.g. meteorological conditions; [REF-42], [REF-45]). None of those

works has the overall objective of maximizing benefits and reducing costs for an entire fishing

fleet.
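
To make the notion of a fitness function concrete, the following minimal sketch scores a candidate trip as the expected catch value minus the fuel cost of the sailed distance, using great-circle distances between waypoints. It is not the formulation used in the cited works. The consumption rate, fuel price, coordinates and expected catch values are hypothetical placeholders that in practice would have to be elicited from experts or estimated from data.

```python
import math

EARTH_RADIUS_NM = 3440.065  # mean Earth radius (6371 km) in nautical miles

def distance_nm(a, b):
    """Great-circle (haversine) distance between (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(h))

def trip_fitness(port, grounds, fuel_l_per_nm=12.0, fuel_price=0.6):
    """Hypothetical fitness: expected catch value minus fuel cost.

    grounds: ordered list of ((lat, lon), expected_catch_value) to visit.
    The vessel leaves from and returns to `port`.
    """
    stops = [port] + [pos for pos, _ in grounds] + [port]
    sailed = sum(distance_nm(p, q) for p, q in zip(stops, stops[1:]))
    revenue = sum(value for _, value in grounds)
    return revenue - sailed * fuel_l_per_nm * fuel_price

# Illustrative coordinates and catch values only.
port = (43.32, -1.98)
ground_a = ((44.0, -3.0), 9000.0)
ground_b = ((45.0, -2.5), 7000.0)
for plan in ([ground_a], [ground_b], [ground_a, ground_b]):
    print(len(plan), "ground(s):", round(trip_fitness(port, plan), 1))
```

A fleet-level objective would sum such scores over all vessels and add the interactions between them, which is the step that the works cited above do not yet take.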

6.1.3 Machine learning applications in fishery

Machine learning based approaches using satellite data have been successful in the past, for

example in forecasting species recruitment and identifying new potential predictors ([REF-46],

[REF-47], [REF-48], [REF-49]). In particular, further time-series analysis of anchovy

recruitment forecasting showed that a new predictor based on climate patterns could explain

a seasonal behaviour [REF-46]. These methods can also be combined with expert knowledge.

For example, novel machine learning methods can take advantage of

suspected interactions between species, doubling the chance of being right in predicting

all of them simultaneously [REF-47], or can be combined with mechanistic models to take advantage of

both modelling approaches [REF-50].

Recent advances in image analysis have shown promising results for automated classification

of marine samples. This methodology is based on taking a digital image of zooplankton

samples by a scanner [REF-51] or a digital camera [REF-52], and using machine learning

algorithms to identify the zooplankton individuals in the image, classify them into

taxonomic groups (defined by the user), and measure each of these specimens separately

to obtain estimates of abundance, biomass, and size spectrum per taxon ([REF-53], [REF-54],

[REF-55]). These methodologies allowed several thousand samples to be processed in [REF-54].

A major advantage of this methodology is that it only requires inexpensive equipment and,

after the initial setup and training [REF-56], it can be very fast and operated by non-specialist

personnel. It can estimate the plankton abundance and biomass from large amounts of

samples quickly and thus cost-effectively ([REF-54], [REF-55], [REF-57]), albeit with lower

taxonomic accuracy [REF-52]. However, the application of such methodologies is still a

challenge to phytoplankton classification and abundance estimation due to the small sizes of

the individuals from 5um of Pseudo-nitzschia species to 50 um of other species (Dinophysis,

Alexandrium, Lingulodinium). Nowadays there are few systems that can digitalize it, but those

systems are big, expensive and require constant attention from a human operator. All this has limited the number of species and samples addressed in past studies ([REF-58], [REF-59]).

Sonars and echo sounders are widely used for remote sensing of life in the marine environment. Preliminary work shows the potential of automated analysis of commercial medium-range sonar signals on fishing vessels for detecting the presence or absence of tuna, as a proof of concept for increasing data acquisition capacity in a cost-effective way [REF-60]. Scientific surveys are very costly and of limited coverage [REF-61]. The approach in [REF-60] uses image processing techniques to analyse sonar screenshots. For each sonar image, measurable regions are extracted and their characteristics analysed. Scientific data were used to assign each region to a class (tuna or no-tuna) and to build a dataset for training and evaluating classification models with supervised learning.
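The region-based pipeline described above (extract regions from an image, describe each region by measurable characteristics, then classify it with a supervised model) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of [REF-60]: the image is a small grid of intensities, regions are 4-connected pixels above a hypothetical threshold, the features are area and mean intensity, and a nearest-centroid rule stands in for whichever classification model was actually used. All values are invented.

```python
from collections import deque
from statistics import mean


def extract_regions(image, threshold):
    """Return 4-connected regions (lists of (row, col)) with intensity >= threshold."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] < threshold:
                continue
            queue, region = deque([(r, c)]), []
            seen[r][c] = True
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def features(image, region):
    """Two simple region descriptors: area in pixels and mean intensity."""
    return (float(len(region)), float(mean(image[y][x] for y, x in region)))


def train_centroids(samples):
    """samples: list of (feature_tuple, label). Returns label -> mean feature vector."""
    by_label = {}
    for feats, label in samples:
        by_label.setdefault(label, []).append(feats)
    return {label: tuple(mean(col) for col in zip(*rows))
            for label, rows in by_label.items()}


def classify(centroids, feats):
    """Assign the label of the nearest centroid (squared Euclidean distance)."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(centroids[lab], feats)))


# Hypothetical labelled training regions (area, mean intensity).
training = [((5.0, 8.0), "tuna"), ((3.0, 9.0), "tuna"),
            ((1.0, 2.0), "no-tuna"), ((2.0, 4.0), "no-tuna")]
centroids = train_centroids(training)

# Hypothetical sonar image: one large strong echo and one small weak echo.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0, 0],
    [0, 9, 9, 0, 0, 3],
    [0, 0, 0, 0, 0, 0],
]
regions = extract_regions(image, threshold=2)
labels = [classify(centroids, features(image, reg)) for reg in regions]
print(labels)
```

In practice the features would be richer (shape, texture, position on the sonar screen) and the classifier would be evaluated on held-out labelled data.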

[REF-62] and [REF-63] used backscatter energy levels at multiple frequencies, e.g. discrete frequency analysis, as features for classification of fish species based on echo sounder data. The Institute of Marine Research in Norway has over the years conducted many research projects together with SIMRAD (Kongsberg Maritime) to quantify and identify fish schools from hydroacoustic data ([REF-64], [REF-63]). Furuno and Simrad are among the top professional fish-finding instrument brands globally today, with commercial product lines for sonars and echo sounders dating back to the 1940s and 1950s. Both are currently positioning themselves to improve their business by applying big data technology to provide more sophisticated analyses and services that increase the value of their fish-finding instruments. However, as of October 2017, neither company has sonars or echo sounders with this technology commercially available.

6.1.4 Big Data information services in fishery

Olex is a successful Norwegian company selling a system for combining data from GPS and

echo sounders to provide detailed bathymetric maps based on crowdsourcing data from their

customers and sharing collated data among them. This has worked very well for two decades

(established in 1996), and their system is highly popular with more than 2500 users

contributing data in north-west Europe (refer to www.olex.no for the full list of vessel

installations). Olex have shown that a collective of fishermen sharing their data is capable of

producing results far beyond what could be imagined by the mapping community. Their

system is highly relevant in that it already records and shares data from echo sounders. If

their system is extended to record and report observed biomass estimates in addition to

seabed depth, Olex’s popularity can efficiently boost the expansion of hydroacoustic data

gathering from the fishing fleet.

MarineTraffic is a very popular information service for finding location and other information

about vessels, ports, stations and offshore installations, including arrivals and departure

times. They have more than 6 million monthly users visiting their site.

Figure 28. MarineTraffic information portal showing vessel traffic in Northern Europe based on AIS data (from www.marinetraffic.com).

The Global Fishing Watch [REF-65] (www.globalfishingwatch.org) is a large project that maps fishing activity based on machine learning of vessel motion patterns [REF-66]. It is based on massive AIS (Automatic Identification System) data sets dating back to 2012, with a latency of 72 hours for AIS data increments. The project's strength is the massive, global analysis of AIS data, but AIS alone is very sparse data on which to base global monitoring of fishery activity, and more data partners are joining the project as it moves forward. The transparent data sharing policy makes the project stand out as a unique global resource. Although the global map of fishing activity requires a steady high-bandwidth connection, the open data portal and source code website give access to highly relevant fisheries data that can be accessed and processed onshore as part of the planning phase. An example of one month of fishing activity (July-August 2017) in the Norwegian Sea and Barents Sea is shown in Figure 29.
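To illustrate what a "vessel motion pattern" derived from AIS data can look like, the sketch below computes the speed between consecutive AIS position reports and flags slow segments. This is not the Global Fishing Watch model, which is a trained machine learning classifier; the speed band of 1 to 5 knots is a hypothetical rule of thumb chosen for illustration, and the positions are invented.

```python
import math

EARTH_RADIUS_KM = 6371.0
KM_PER_NAUTICAL_MILE = 1.852


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def segment_speeds_knots(pings):
    """pings: list of (unix_time_s, lat, lon) sorted by strictly increasing time.
    Returns the average speed in knots for each consecutive pair of pings."""
    speeds = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(pings, pings[1:]):
        hours = (t1 - t0) / 3600.0
        nautical_miles = haversine_km(la0, lo0, la1, lo1) / KM_PER_NAUTICAL_MILE
        speeds.append(nautical_miles / hours)
    return speeds


def flag_possible_fishing(speeds, low=1.0, high=5.0):
    """Flag segments whose speed lies in a hypothetical 'fishing' band."""
    return [low <= s <= high for s in speeds]


# Hypothetical track: one hour of transit followed by one hour of slow movement.
pings = [(0, 70.00, 20.0), (3600, 70.20, 20.0), (7200, 70.25, 20.0)]
speeds = segment_speeds_knots(pings)
flags = flag_possible_fishing(speeds)
print([round(s, 1) for s in speeds], flags)
```

A learned model would replace the fixed speed band with features such as speed, course changes and time of day, trained on labelled vessel tracks.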

Figure 29. Norwegian Sea fishing activity according to Global Fishing Watch (Jul/Aug 2017).

BarentsWatch is a comprehensive monitoring and information system with a public portal for large parts of the northern seas, focusing on the North Atlantic from Scotland to the Arctic waters (www.barentswatch.no). It was launched in 2012 and includes the set of services shown in Figure 30. The FishInfo service is especially relevant as it shows where fishing activity is ongoing and which areas have been closed or restricted for fishing (see Figure 31). It also includes information about the ice edge and ice concentration, seabed types including coral reefs, offshore subsea facilities, and active and planned seismic surveys. Hence, it is already a comprehensive portal with relevant information for fishery, and more information layers are continuously being integrated, prioritized by their usefulness. Map files can be downloaded and are compatible with the Olex system and several chart plotters.

Figure 30. Information services in BarentsWatch (barentswatch.no, accessed 29/11/2017).

Figure 31. The FishInfo service - Example showing fishing activity with nets (blue), lines (red) and purse seiners (purple) as well as restricted (black polygons) and closed (filled polygons) fishing areas (from the fiskinfo.no website).

6.1.5 Open data providers relevant for fishery Big Data analytics

EMODnet (the European Marine Observation and Data Network, www.emodnet.eu) is an organization supported by the EU's maritime policy which aims to be the gateway to marine data in Europe. A central challenge is to make available the marine data collected by many different institutions and research projects across Europe, which have often been carried out in a fragmented way over many years. EMODnet provides access to European marine data across seven discipline-based themes as seen in Figure X, with each theme having a specific gateway with access to standardized observations, data quality indicators and processed data products. As examples of data relevant for fishery, the Human Activities portal includes catch statistics per port, the Biology portal has field observation data for many marine species, and the Physics portal has sea surface and depth profile temperature data. These are just examples of the data sets that have been made available by EMODnet; much more data is available. However, while the data are diverse and large, work is still needed on standardisation of data access and filtering functionality (lat/long rectangles and time) as well as on data collation and integration support, especially across themes (as experienced at the www.opensealab.eu event, where DataBio was represented by Team CLP, see https://github.com/EMODnet/OpenSeaLab). EMODnet is stimulating marine innovation through open data sharing and by encouraging developers to provide their marine applications as open source through GitHub.com.
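The filtering functionality mentioned above (lat/long rectangles and time) can be sketched on already downloaded records. This is a minimal illustration, assuming observations are held as dictionaries with lat, lon and time keys; the record layout, the function name select_observations and the values are hypothetical and do not represent an EMODnet API.

```python
from datetime import datetime


def select_observations(observations, south, north, west, east, start, end):
    """Keep observations inside a lat/long rectangle and a time window.

    Bounds are inclusive. The rectangle is assumed not to cross the
    antimeridian (west <= east).
    """
    return [
        obs for obs in observations
        if south <= obs["lat"] <= north
        and west <= obs["lon"] <= east
        and start <= obs["time"] <= end
    ]


# Hypothetical sea surface temperature records (values invented for illustration).
observations = [
    {"lat": 70.5, "lon": 20.1, "time": datetime(2017, 7, 15), "sst_c": 9.8},
    {"lat": 62.0, "lon": 5.0, "time": datetime(2017, 7, 20), "sst_c": 13.1},
    {"lat": 71.2, "lon": 25.3, "time": datetime(2016, 7, 15), "sst_c": 8.9},
]

selected = select_observations(
    observations, south=68.0, north=75.0, west=15.0, east=35.0,
    start=datetime(2017, 7, 1), end=datetime(2017, 8, 31),
)
print(selected)
```

Applying the same rectangle and time window to records from different themes is one simple way to collate data across portals once they share a common coordinate and time representation.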

Other open data portals of high relevance for fishery include the UN Comtrade Database and the World Bank Open Data portal for international trade statistics, and the European Market Observatory for Fisheries and Aquaculture (EUMOFA) and Eurostat for EU-specific statistics. NOAA (the National Oceanic and Atmospheric Administration) in the US and the Copernicus Marine Environment Monitoring Service in the EU are highly relevant information hubs for weather, climate and EO observations. There are also national services that give more detailed insight into each country's fishery and economic statistics. A summary of relevant open data providers for fisheries with hyperlinks is given in Table 2.

Table 2. Open data providers relevant for fisheries.

Open Data Provider | Description | Hyperlinks
World Bank (API overview; WDI Indicators; Third party apps) | Statistics; Global & Country Economics; CSV, Excel, XML, JSON ++ | https://datahelpdesk.worldbank.org/, http://data.worldbank.org/developers/api-overview, https://data.worldbank.org/data-catalog/world-development-indicators, https://data.worldbank.org/products/third-party-apps
UN Comtrade Database (Data availability; API) | Merchandise; Services | http://comtrade.un.org, https://comtrade.un.org/data/da, https://comtrade.un.org/data/doc/api/#DataRequests
Eurostat (Web API) | EU statistics; SDMX, JSON | http://ec.europa.eu/eurostat/, http://ec.europa.eu/eurostat/data/web-services
EUMOFA | EU market data: most important fishery data | http://www.eumofa.eu/, http://www.eumofa.eu/macroeconomic (example dashboard). Extracts data from Eurostat and national databases from member states
EMODnet & EurOBIS | 7 themes, from physics to biology | http://www.emodnet.eu/portals, http://www.eurobis.org/dataset_list
Copernicus (EU) and NOAA (US) | EO, weather, climate, waves, temperatures | http://copernicus.eu/, http://marine.copernicus.eu (marine data sets), http://www.noaa.gov
Nature.com | Database of commercial, small-scale and illegal catch | https://www.nature.com/articles/sdata201739
University of Tasmania - IMAS (Institute for Marine and Antarctic Studies) | Global fisheries landings | http://metadata.imas.utas.edu.au/geonetwork/srv/eng/metadata.show?uuid=c1fefb3d-7e37-4171-b9ce-4ce4721bbc78
National portals example (Fisheries Directorate; Seafood Council) | Norwegian catch regulations; export data | https://www.fiskeridir.no, http://seafood.no
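As an example of how the APIs listed in Table 2 can be used, the sketch below builds a request URL for the World Bank API without sending it. The base URL and path pattern, and the indicator code ER.FSH.PROD.MT (assumed here to denote total fisheries production), are assumptions that are not stated in this document and should be checked against the World Bank API overview page listed in the table.

```python
from urllib.parse import urlencode

# Assumed base URL of the World Bank Indicators API (version 2); verify it
# against the API overview page listed in Table 2 before relying on it.
BASE_URL = "https://api.worldbank.org/v2"


def indicator_url(country: str, indicator: str, start_year: int, end_year: int) -> str:
    """Build a request URL for one country and one indicator over a year range."""
    query = urlencode(
        {"format": "json", "date": f"{start_year}:{end_year}"},
        safe=":",  # keep the colon of the year range readable in the URL
    )
    return f"{BASE_URL}/country/{country}/indicator/{indicator}?{query}"


# "ER.FSH.PROD.MT" is assumed to be the WDI code for total fisheries production.
url = indicator_url("NOR", "ER.FSH.PROD.MT", 2010, 2016)
print(url)
```

The resulting URL can be passed to any HTTP client; the response format and paging behaviour should be taken from the provider's documentation rather than from this sketch.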

6.1.6 Fisheries and open source software

There is a comprehensive open source community with software relevant for marine research and fisheries. A search on GitHub reveals:

• 1570 repositories related to "marine"

• 943 repositories related to "fisheries" or "fishery"

• 4417 repositories with "sonar" in the title

It is hard to say how many of these projects are relevant for fisheries, but even if the percentage is quite low there will be several interesting applications worth investigating. An established open-source community developing applications for the marine environment and fisheries already exists, and easy access to open data, as in the EMODnet initiative, will help accelerate its growth. The aim here is not to give a comprehensive overview of open-source fisheries projects, but rather to acknowledge this community's existence and highlight especially relevant projects.

FOCUS (Fisheries Open source CommUnity Software) is an open-source community with the goal of offering a free suite of tools that help fisheries management organisations contribute to sustainable fisheries (www.focus.fish). The project has signed an SDG partnership with the

UN and is also supported by the European Commission (see https://ec.europa.eu/fisheries/cfp/control/technologies_en). The main open source contributor so far is the Swedish Agency for Marine and Water Management with the Union VMS co-op project (refer to https://github.com/UnionVMS). The community was established only about a year ago (September 2016) with the high ambition to "be the global reference for standards and innovative open source solutions for sustainable fisheries management". A key challenge is the integration of very diverse data sets, and FOCUS supports the UN/CEFACT FLUX (Fisheries Language for Universal eXchange) standards for information exchange to overcome the barrier of diverse national reporting standards.

Figure 32. The FLUX standards and status (from UN ESCAP presentation of Dr Heiner Lehr) [REF-37].

The types of data exchanged include:

• Information between stakeholders on stocks, quotas and catches

• Real time monitoring of vessel positions (VMS) and on-going fishing activities

• Reporting of fish landed and sales

• Vessel data and characteristics

• License and fishing authorisation requests

FOCUS is a recent but important initiative with strong support from the UN and EC, and it has the momentum to become the focal point in open source fisheries development for implementing the FLUX standard for data exchange and increasing transparency in fisheries.

Furthermore, it is also important to mention that SAM (State-space Assessment Model) by Anders Nielsen and Casper W. Berg from DTU Aqua is used by ICES to estimate the development of at least ten of the most economically important fisheries in Europe [REF-38]. The model is web-based: anyone can enter data and check the intermediate results and figures used to generate a result, and it is also possible to rewind all results to see the data used to reach a specific conclusion. This provides high transparency and easier insight among the researchers themselves as well as between ICES and the fishermen.

6.2 Conclusion

There is a lot of activity in the marine sector related to Big Data and fishery, and it is difficult to get a full overview of the diverse initiatives within the short period of time mandated for this overview. A key observation from established and ongoing work is that the priorities are open data access, standardisation and integration of very diverse data sets, and reporting and visualization of fisheries activity. In short, current services report and monitor what is going on in the fisheries.

Global Fishing Watch is the prime example above of leveraging machine learning on a global scale for detecting past and recent fishing activity. However, although the project is working on integrating more fishery data sets, the service is based mainly on AIS data, which contain only vessel identity, position and destination information. The same comment can be made for the MarineTraffic service. While both are great services and contribute to transparency about what goes on in the marine sector, they also highlight the importance of and need for data integration. When multidisciplinary data can be analysed together in new ways with Big Data technology, important extensions to existing knowledge as well as new insights can be extracted.

6.2.1 Current impact of Big Data in fisheries

Current fishery services and portals go a long way in aiding fishermen, shipping companies and authorities:

● VMS and VTS systems show where fishing activity goes on and where vessel traffic and offshore installations are, helping to increase transparency and making it harder for illegal, unreported and unregulated (IUU) fishing activities to go unnoticed.

● Weather forecast services and systems like FishInfo aid fishermen in planning routes to the fishing grounds and deciding where to fish, taking into account weather, environment, ongoing fishing activities and regulations.

● Open data portals are doing a great job in making data available and discoverable, while much work remains to make the data interoperable and to facilitate data collation across different scientific domains. Important standardization work for data integration is ongoing, but the different data exchange services also need to implement the resulting standards.

6.3 Future developments

The fisheries industry is currently at the starting point of leveraging Big Data technology. While the European Commission and other international and national governance and research organizations have a strong focus on maximizing the potential of open data access for innovation, many industrial stakeholders are positioning themselves to secure future revenue from Big Data, often in conflict with this goal. Traditionally the marine sector, including vessel equippers and instrument makers, has been dominated by a "vendor lock-in policy", and today one can still see this way of thinking continued in more restricted access to instrument data that used to be available. Many industry players are moving from selling products to selling services, and more often than not this includes aggressive licensing policies that give end users less control of their data than before. Access to data is fundamental to stimulate innovation and knowledge discovery, and the EU GDPR (General Data Protection Regulation), which comes into effect in 2018, is also an important step towards securing end users' right to their data and to making it available to third-party processors of their choice [REF-69].

The OECD (Organisation for Economic Co-operation and Development) report The Ocean Economy in 2030 [REF-70] describes the development of the ocean economy towards 2030. The blue economy is growing strongly from the 2010 estimate of 1500 billion USD (2.5% of the world economy), and the OECD suggests that if the current rate of growth continues, it will more than double by 2030. This is a conservative estimate, as it does not include a good number of ocean-related sectors for which adequate data are lacking (e.g. new innovations). On the other hand, the ongoing deterioration of the seas (e.g. pollution and climatic changes) puts important restrictions on the development of the ocean economy. Globally sustainable and responsible management, together with increased knowledge of the implications for the marine environment, is paramount to harvest the growth potential of the oceans.
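As a rough check on the OECD projection quoted above, the growth rate implied by a doubling can be worked out. The calculation assumes constant compound growth from the 2010 baseline over 20 years, which is a simplification made here for illustration and not a figure taken from the report; g denotes the average annual growth rate.

```latex
V_{2030} = V_{2010}\,(1+g)^{20}, \qquad
\frac{V_{2030}}{V_{2010}} = 2
\;\Longrightarrow\;
g = 2^{1/20} - 1 \approx 0.035
```

With a 2010 value of 1500 billion USD, "more than double" therefore corresponds to a 2030 value above 3000 billion USD, or sustained growth of more than roughly 3.5% per year under this assumption.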

6.3.1 User needs and Big Data opportunities

The current catch technology is extremely efficient and the fleet rarely has trouble filling its catch quotas during the fishing season. The key question is rather how to leverage Big Data technology to optimize the operation, planning and management of the fisheries to secure the best possible profit with minimal environmental impact on oceans and climate:

• Reduced energy consumption and emissions through efficient fishery planning and

operation with better information services.

• Improved oceanographic models and multispecies stock estimation models.

• Avoid species overfishing and IUU fishing activities.

• More careful catch technology with respect to habitats, seabed, coral reefs and other

species.

• Catch technology for species lower in the food chain, e.g. mesopelagic fish (200-1000 m water depth).

• Ocean clean-up technology, e.g. plastics and microplastics.

A key challenge for the fishermen and shipping companies is to locate the fish as efficiently as possible, to reduce the time and energy needed to fill the quota at a time when prices are good. This challenge is becoming harder as pelagic species have increasingly changing migration patterns, which is especially noticeable in Arctic waters. There is a strong need to explain and understand why this happens: is it directly related to temperature and other climate changes, or do catch and other human activities like offshore oil production, marine traffic and seismic surveys affect where the species move?

There is great potential here for leveraging Big Data technology, especially descriptive and predictive analytics, to optimize the catch process both in terms of where the fish is and what the expected market value will be, while also increasing the understanding of how the different elements of the marine environment affect each other when analysed in a holistic, multidisciplinary way. Open data and standardized exchange formats are key prerequisites for making this feasible.

The fishery pilots in DataBio are summarized in Figure 33 and focus on starting to address the first two key challenges listed above.

Figure 33. Summary and context of the Fishery Pilots in DataBio.

7 The future of big data

The exponential growth in data volumes is expected to continue for at least the next few years. As an example, the number of Internet-connected sensors and devices is estimated to reach 30 billion in 2020, up from 16 billion in 2016 [REF-71]. As the devices gather and handle data at increasing resolution both spatially (e.g. drones capturing 4K video and ultra-high-definition satellite images) and temporally (e.g. continuous monitoring), the data volume can be forecast to keep doubling every year.

Similarly, through added parallelism and other technologies, computing power is commonly estimated to keep following Moore's law (proposed as early as 1965) during the next few years, which means that it doubles every second year [REF-72]. The same doubling rate seems to hold for chip speeds, computer speeds and computations per unit of energy. However, this comes at a price: it takes more and more resources to keep up the pace. The development of quantum computers beyond the recent 16 qubits has the potential to significantly speed up the solution of certain categories of problems, including numerical simulation and machine learning. This will most probably lead to a leap beyond Moore's law, at least for certain categories of computation. Communication speed is also increasing with 5G and new Wi-Fi technologies. Pattern recognition, be it spotting anomalies in time series or certain crops in aerial images, is expected to advance rapidly, especially through new developments in deep learning and reinforcement learning. Data fusion, e.g. combining separate datasets like satellite images and map data, is becoming increasingly straightforward through the use of standards like Linked Data.
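The growth figures quoted in this chapter can be put side by side with a few lines of arithmetic. The sketch below is illustrative only: it assumes constant compound growth, uses the device counts and doubling periods stated in the text, and picks a ten-year horizon purely as an example. The function names are hypothetical.

```python
def annual_growth_rate(start_value: float, end_value: float, years: float) -> float:
    """Constant yearly growth rate that takes start_value to end_value in `years`."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def growth_factor(doubling_period_years: float, years: float) -> float:
    """Total growth factor after `years` for a quantity with the given doubling period."""
    return 2.0 ** (years / doubling_period_years)


# Connected devices: 16 billion (2016) to 30 billion (2020), as quoted in the text.
device_growth = annual_growth_rate(16e9, 30e9, 4)

# Illustrative ten-year horizon (an assumption, not a forecast from the text).
data_factor = growth_factor(1.0, 10)      # data volume doubling every year
compute_factor = growth_factor(2.0, 10)   # computing power doubling every second year
gap = data_factor / compute_factor

print(f"device growth per year: {device_growth:.1%}")
print(f"data x{data_factor:.0f}, compute x{compute_factor:.0f}, gap x{gap:.0f}")
```

Under these assumptions, data volume grows much faster than computing power, which is one way to see why more efficient algorithms and architectures matter as much as hardware.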

However, there is a limiting factor: the use of electric energy. Already in 2012, the world's ICT systems consumed more electricity than any country in the world except China and the US (see Figure 34). This consumption is increasing, especially with distributed architectures like blockchain, even if the energy consumption per chip does not necessarily grow. It is clear that new energy technologies have to be developed to allow for sustained ICT growth. It is indicative that leading ICT companies like Google, Amazon, Apple and Microsoft are using exclusively renewable energy in their data centres.

Figure 34. Electricity consumption: countries compared to IT sector.

8 Conclusions

Current technologies indicate how new ICTs and information flows will emerge around the use of sensors. Earth Observation (EO) from satellites produces vast amounts of data and is playing an increasingly important role as a regular and reliable high-quality data source for farmers, foresters and fisheries, but also for related industries. This unprecedented amount of data available for operational use is creating new challenges for the agriculture, forestry and fishery sectors.

Discovery and access are the focal points, bringing together companies and increasing the use of EO data, sensor data and other data to support decision-making. With the new generation of EO satellites and the emergence of key players from industry come new challenges and crossroads for future knowledge management. The resulting explosive growth of data poses far-reaching dilemmas regarding the fragmentation of data infrastructures at the international level. The time is ripe for expanding the operational capability of global monitoring from space and in situ, and this opens a unique opportunity to build a sustainable Big Data infrastructure that supports user services exploiting archived and newly acquired derived datasets.

However, as the capacity of computing, data transfer and storage increases, variety rather than volume has become the key characteristic of big data. New algorithms and automated reasoning will be needed to deal with this challenge efficiently. Efficient implementation of big data technologies also requires cooperation between researchers and developers across different domains. Common big data frameworks and digital platforms have been developed to facilitate the work, promote compatible approaches and enhance knowledge transfer between domains.

The agriculture sector is of strategic importance for European society and economy. Due to its complexity, agri-food operators have to manage many different and heterogeneous sources of information, which requires the collection, storage, sharing and analysis of large quantities of spatially and non-spatially referenced data. The management of this data is implemented under the banner of precision farming (PF).

Forestry is developing rapidly with the help of new technologies and procedures. Numerous

methods provide information on forests, each with their own time cycles, granularities,

accuracies, costs, and viewpoints. Easy access to best available up-to-date information on

forests is expected to generate new applications and businesses and bring together varying

users, thus enhancing the utilization of forest resources.

The fisheries industry is currently at the starting point of leveraging Big Data technology. The

fishing fleet is increasingly sophisticated with numerous sensors installed on each vessel for

finding fish, navigating and communicating with the outside world. The governments of

coastal nations that manage fisheries resources have a parallel need for information, and

catch statistics are collected together with financing of research cruises to statistically sample

the fish in the ocean at regular time and positional intervals. The need for fish stocks and

active fisheries information is at the same time at odds with the availability of real-time data

from the fishing fleets, which is a result of the scarcity of communication resources and high

cost of limited bandwidth at sea.

Big data technologies continue to develop and expand to new areas. In the bioeconomy, one of the challenges is the diversity of operators in the value chain. Advanced analytics and visualisation technologies will help bring the benefits of data-based solutions to a broader audience, from primary producers to end users. This, in turn, will contribute to a more efficient use of resources and help reach sustainability goals.

The new data and IT industry, scientists, the private commercial sector and value-adding institutions in general now expect open access to big data sources and to tools that enable efficient exploitation of multidisciplinary data for developing value-added products and downstream services in agriculture, forestry and fishery.

9 References

Reference Name of document

[REF-01] Peter ffoulkes. (2017). InsideBIGDATA Guide to the Intelligent Use of Big Data

on an Industrial Scale

[REF-02] NewVantage Partners LLC. (2016). Big Data Executive Survey 2016 - An

Update on the Adoption of Big Data in the Fortune 1000. Big Data Executive

Survey.

[REF-03] DIKW - Russell Ackoff's view,

http://paradigmas2006.blogspot.cz/2006/05/dikw-russell-ackoffs-view.html

[REF-04] Karel Charvat, Sarka Horakova, Sjaak Wolfert, Henri Holster, Otto Schmid, Liisa Pesonen, Daniel Martini, Esther Mietzsch, Tomas Mildorf. ‘Final Strategic Research Agenda (SRA): Common Basis for policy making for introduction of innovative approaches on data exchange in agri-food industry’, agriXchange, 26.11.2012.

[REF-05] European Commission, Research and Innovation, Bioeconomy.

http://ec.europa.eu/research/bioeconomy/index.cfm

[REF-06] Schellberg J, Hill MJ, Gerhards R et al., 2008. Precision agriculture on grassland: Applications, perspectives and constraints. Europ. J. Agronomy 29: 59-71.

[REF-07] Segarra E, 2002. Precision agriculture initiative for Texas high plains. Annual

Comprehensive Report. Lubbock, Texas, Texas A&M University Research and

Extension Center.

[REF-08] Siddiqa et al. 2016, A survey of big data management: Taxonomy and state-of-the-art, Journal of Network and Computer Applications 71 (2016) 151–166.

[REF-09] OGC 10-157r4, Earth Observation Metadata profile of Observations &

Measurements, Version 1.1, 09/06/2016,

http://docs.opengeospatial.org/is/10-157r4/10-157r4.html.

[REF-10] CEOS OpenSearch Best Practice, Issue 1.1.2, 13/06/2017, http://ceos.org/document_management/Working_Groups/WGISS/Interest_Groups/OpenSearch/CEOS-OPENSEARCH-BP-V1.2.pdf.

[REF-11] Tan, P.N., Steinbach, M. and Kumar, V. (2006) Introduction to data mining, First edition, Addison Wesley.

[REF-12] Amar, R., Eagan, J. and Stasko, J. (2005) "Low-level components of analytic

activity in information visualization", IEEE Symposium of Information

Visualization (INFOVIS) 2005, eds. J.T. Stasko and M.O. Ward, IEEE Computer

Society, Minneapolis, MN, USA, 23-25 Oct., pp. 111.

[REF-13] Hand, D.J., Mannila, H. and Smyth, P. (2001) Principles of data mining, First

edition, MIT press.

[REF-14] Ronnie Beggs. Market Report Paper by Bloor. 2016-09-01.

[REF-15] Mike Gualtieri. The Forrester Wave™: Streaming Analytics, Q3 2017, Use This Technology To Make Your Enterprise Applications Sense, Think, And Act In Real Time. 2017-09-01. https://www.forrester.com/report/The+Forrester+Wave+Streaming+Analytics+Q3+2017/-/E-RES136545?objectid=RES136545#endnote1. Retrieved 2017-11-01.

[REF-16] Alfonso Velosa, W. Roy Schulte, and Benoit J. Lheureux. Hype Cycle for the

Internet of Things, 2017. Gartner report # G00314298. 2017-07-24.

[REF-17] David Reinsel, John Gantz, and John Rydning. Data Age 2025: The Evolution of Data to Life-Critical. That’s Big; IDC White paper. 2017-04-01. https://www.seagate.com/www-content/our-story/trends/files/Seagate-WP-DataAge2025-March-2017.pdf. Retrieved 2017-11-01.

[REF-18] Card, S. K., Mackinlay, J. D. & Shneiderman, B. 1999. Readings in information visualization, Using Vision to Think. Academic Press Inc. 686 p. ISBN 1-55860-533-9.

[REF-19] Thomas, J.J. and Cook, K.A. (2005) Illuminating the path: The research and

development agenda for visual analytics, 1st edn., IEEE Computer Society, Los

Alamitos, CA.

[REF-20] Norman, D. and Dunaeff, T. (1994) Things that make us smart: Defending

human attributes in the age of the machine, Basic Books, USA

[REF-21] Järvinen, P., (2013) Licentiate thesis. Aalto University, Department of

Information and Computer Science, 135 p. + app. 20 p

[REF-22] Shneiderman, B. (1996) "The eyes have it: A task by data type taxonomy for

information visualizations", Proceedings, IEEE Symposium on Visual

Languages, IEEE, September 3-6, pp. 336.

[REF-23] Roberts, J.C. (2007) "State of the art: Coordinated and multiple views in exploratory visualization", Fifth International Conference on Coordinated and Multiple Views in Exploratory Visualization, CMV'07, IEEE, 2-2 July, pp. 61.

[REF-24] Press, G., Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says. Forbes, March 23, 2016. Available online at https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says. Retrieved 2017-11-08.

[REF-25] NIST Special Publication 1500-6. NIST Big Data Interoperability Framework: Volume 6, Reference Architecture, 2015. Available online at https://bigdatawg.nist.gov/_uploadfiles/NIST.SP.1500-6.pdf. Retrieved 2017-11-08.

[REF-26] The Apache Hadoop Project. http://hadoop.apache.org, 2009. Retrieved

2017-11-08.

[REF-27] Zaharia, M., Chowdhury, M., Franklin, M. J., Shenker, S., Stoica, I. Spark: cluster computing with working sets. In USENIX conference on Hot topics in cloud computing, pages 10-10 (2010).

[REF-28] OGC 10-157r4, Earth Observation Metadata profile of Observations &

Measurements, Version 1.1, 09/06/2016,

http://docs.opengeospatial.org/is/10-157r4/10-157r4.html.

[REF-29] OGC Testbed 13 – ESA Sponsored Threads – Exploitation Platform, Technical

Architecture, December 09, 2016, PDGS-EVOL-CGI-TN-16/1570, Issue 1.0.

[REF-30] Karel Charvat, Tomas Reznik, Vojtech Lukas, Sarka Horakova, Karel Charvat Jr, Michal Kepka, Marek Splichal, Simon Leitgeb, Jan Shanel, Karel Jedlicka, Jaroslav Smejkal. Big Data in Agriculture – From FOODIE towards Data Bio, abstract for 7 ACPA conference.

[REF-31] Tim Sparapani, How Big Data And Tech Will Improve Agriculture, From Farm To Table, https://www.forbes.com/sites/timsparapani/2017/03/23/how-big-data-and-tech-will-improve-agriculture-from-farm-to-table/#5c40ba075989.

[REF-32] Copernicus Market report prepared by PriceWaterhouseCoopers for the European Commission. Available online: http://www.copernicus.eu/sites/default/files/library/Copernicus_Market_Report_11_2016.pdf. Retrieved 2017-07-03.

[REF-33] Kleinjan, J., Clay, D. E., Carlson, C. G., & Clay, S. A. (2007). Productivity zones

from multiple years of yield monitor data. In F. J. Pierce, & D. C. Clay, GIS

applications in agriculture. CRC Press, Boca Raton.

[REF-34] Blackmore, S., Godwin, R. J., & Fountas, S. (2003). The Analysis of Spatial and

Temporal Trends in Yield Map Data over Six Years. Biosystems Engineering

[REF-35] Suuronen, P., Chopin, F., Glass, C., Løkkeborg, S., Matsushita, Y., Queirolo, D.,

& Rihan, D. (2012). Low impact and fuel efficient fishing—looking beyond the

horizon. Fisheries Research, 119, 135-146.

[REF-36] Rojon, I., and Smith, T. 2014. On the attitudes and opportunities of fuel consumption monitoring and measurement within the shipping industry and the identification and validation of energy efficiency and performance interventions. 18 pp.

[REF-37] Parker, R. W. & Tyedmers, P. H. (2014) Fuel consumption of global fishing

fleets: current understanding and knowledge gaps. Fish and Fisheries, 16(4),

684-696.

[REF-38] Fernandes, J. A., Santos, L., Vance, T., Fileman, T., Smith, D., Bishop, J. D., ... & Austen, M. C. (2016). Costs and benefits to European shipping of ballast-water and hull-fouling treatment: Impacts of native and non-indigenous species. Marine Policy, 64, 148-155.

[REF-39] Klau, G. W., Lesh, N., Marks, J., & Mitzenmacher, M. (2010). Human-guided

search. Journal of Heuristics, 16(3), 289-310.

[REF-40] Ibarbia, I., Mendiburu, A., Santos, M., & Lozano, J. A. (2012). An interactive

optimization approach to a real-world oceanographic campaign planning

problem. Applied Intelligence, 36(3), 721-734.

[REF-41] Kang, M. H., Choi, H. R., Kim, H. S., & Park, B. J. (2012). Development of a

maritime transportation planning support system for car carriers based on

genetic algorithm. Applied Intelligence, 36(3), 585-604.

[REF-42] Palenzuela, J. M. T., Vilas, L. G., Spyrakos, E., Dominguez, L. R., & CETMAR, F.

(2010). Routing optimization using neural networks and oceanographic

models from remote sensing data. In Proceedings of the 1st International

Symposium on Fishing Vessel Energy Efficiency E-Fishing, Vigo, Spain.

[REF-43] Vettor, R., Tadros, M., Ventura, M., & Soares, C. G. (2016). Route planning of

a fishing vessel in coastal waters with fuel consumption restraint. Maritime

Technology and Engineering, 3, 167-173.

[REF-44] Groba, C., Sartal, A., & Vázquez, X. H. (2015). Solving the dynamic traveling

salesman problem using a genetic algorithm with trajectory prediction: An

application to fish aggregating devices. Computers & Operations Research,

56, 22-32.

[REF-45] Walther, L., Rizvanolli, A., Wendebourg, M., & Jahn, C. (2016). Modeling and Optimization Algorithms in Ship Weather Routing. International Journal of e-Navigation and Maritime Economy, 4, 31-45.

[REF-46] Fernandes, J. A., Irigoien, X., Goikoetxea, N., Lozano, J. A., Inza, I., Pérez, A., &

Bode, A. (2010). Fish recruitment prediction, using robust supervised

classification methods. Ecological Modelling, 221(2), 338-352.

[REF-47] Fernandes, J. A., Lozano, J. A., Inza, I., Irigoien, X., Pérez, A., & Rodríguez, J. D.

(2013). Supervised pre-processing approaches in multiple class variables

classification for fish recruitment forecasting. Environmental modelling &

software, 40, 245-254.

[REF-48] Fernandes, J. A., Irigoien, X., Lozano, J. A., Inza, I., Goikoetxea, N., & Pérez, A.

(2015). Evaluating machine-learning techniques for recruitment forecasting of

seven North East Atlantic fish species. Ecological Informatics, 25, 35-42.

[REF-49] Trifonova, N., Kenny, A., Maxwell, D., Duplisea, D., Fernandes, J., & Tucker, A.

(2015). Spatio-temporal Bayesian network models with latent variables for

revealing trophic dynamics and functional networks in fisheries ecology.

Ecological Informatics, 30, 142-158.

[REF-50] Andonegi, E., Fernandes, J. A., Quincoces, I., Irigoien, X., Uriarte, A., Pérez, A.,

... & Stefánsson, G. (2011). The potential use of a Gadget model to predict

stock responses to climate change in combination with Bayesian networks:

the case of Bay of Biscay anchovy. ICES Journal of Marine Science, 68(6),

1257-1269.

[REF-51] Grosjean, P., Picheral, M., Warembourg, C., & Gorsky, G. (2004). Enumeration, measurement, and identification of net zooplankton samples using the ZOOSCAN digital imaging system. ICES Journal of Marine Science, 61, 518-525. doi:10.1016/j.icesjms.2004.03.012.

[REF-52] Bachiller, E., Fernandes, J. A., & Irigoien, X. (2012). Improving semiautomated

zooplankton classification using an internal control and different imaging

devices. Limnology and Oceanography: Methods, 10(1), 1-9.

[REF-53] Gislason, A., & Silva, T. (2009). Comparison between automated analysis of

zooplankton using ZooImage and traditional methodology. Journal of

Plankton Research, 31(12), 1505-1516.

[REF-54] Irigoien, X., Fernandes, J. A., Grosjean, P., Denis, K., Albaina, A., & Santos, M.

(2009). Spring zooplankton distribution in the Bay of Biscay from 1998 to

2006 in relation with anchovy recruitment. Journal of plankton research,

31(1), 1-17.

[REF-55] Di Mauro, R., Cepeda, G., Capitanio, F., & Viñas, M. D. (2011). Using ZooImage

automated system for the estimation of biovolume of copepods from the

northern Argentine Sea. Journal of sea research, 66(2), 69-75.

[REF-56] Fernandes, J. A., Irigoien, X., Boyra, G., Lozano, J. A., & Inza, I. (2009).

Optimizing the number of classes in automated zooplankton classification.

Journal of Plankton Research, 31(1), 19-29.

[REF-57] Manríquez, K., Escribano, R., & Riquelme-Bugueño, R. (2012). Spatial

structure of the zooplankton community in the coastal upwelling system off

central-southern Chile in spring 2004 as assessed by automated image

analysis. Progress in oceanography, 92, 121-133.

[REF-58] Zarauz, L., Irigoien, X., & Fernandes, J. A. (2008). Changes in plankton size

structure and composition, during the generation of a phytoplankton bloom,

in the central Cantabrian sea. Journal of plankton research, 31(2), 193-207.

[REF-59] Ali, N., Wacquet, G., Didry, M., Hamad, D., Artigas, L. F., & Grosjean, P. (2014).

Utilisation conjointe de FlowCAM/ZooPhytoImage et de la cytométrie en flux.

Premiers résultats et perspectives. Action 9. FlowCam ZooPhytoImage.

Livrable n° 4. Rapport final, 23 Septembre 2014.

[REF-60] Uranga, J., Arrizabalaga, H., Boyra, G., Hernandez, M. C., Goñi, N., Arregui, I.,

... & Santiago, J. (2017). Detecting the presence-absence of bluefin tuna by

automated analysis of medium-range sonars on fishing vessels. PloS one,

12(2), e0171382.

[REF-61] Mayer, L., Li, Y., & Melvin, G. (2002). 3D visualization for pelagic fisheries

research and assessment. ICES Journal of Marine Science, 59(1), 216-225.

[REF-62] Gorska, N., Korneliussen, R. J., and Ona, E. 2007. Acoustic backscatter by schools of adult Atlantic mackerel. ICES Journal of Marine Science, 64: 1145–1151.

[REF-63] Korneliussen, Rolf J., Heggelund, Y., Macaulay, G.J., Patel, D., Johnsen, E. and

Eliassen, I.K. (2016). Acoustic identification of marine species using a feature

library. Methods in Oceanography 17: 187-205.

[REF-64] Foote, K.G., Knudsen, H.K., Korneliussen, R.J., Nordbø, P.E. and Røang, K. (1991) Postprocessing system for echosounder data. Journal of the Acoustical Society of America, Vol 90, No 1, pp 37-47.

[REF-65] Hess, D. and Savitz, J. (2016) OCEANA Global Fishing Watch report. Available

from www.globalfishingwatch.org.

[REF-66] de Souza E.N., Boerder K., Matwin S., Worm, B. (2016) Improving Fishing

Pattern Detection from Satellite AIS Using Data Mining and Machine Learning.

PLoS ONE 11(7): e0158248. doi:10.1371/journal.pone.0158248

[REF-67] Lehr, Heiner. Electronic management and exchange of fishery information, http://www.unescap.org/sites/default/files/03%20-%20Electronic%20management%20and%20Exchange%20of%20Fishery%20Information%20V151210a.pdf

[REF-68] http://www.aqua.dtu.dk/english/News/2014/03/140313_Fisheries_management_as_open_source

[REF-69] https://www.eugdpr.org

[REF-70] http://www.oecd.org/sti/futures/the-ocean-economy-in-2030-9789264251724-en.htm

[REF-71] https://spectrum.ieee.org/tech-talk/telecom/internet/popular-internet-of-things-forecast-of-50-billion-devices-by-2020-is-outdated

[REF-72] https://cacm.acm.org/magazines/2017/1/211094-exponential-laws-of-computing-growth/fulltext