This document is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 732064. It is the property of the DataBio consortium and shall not be distributed or reproduced without the formal approval of the DataBio Management Committee.
Project Acronym: DataBio
Grant Agreement number: 732064 (H2020-ICT-2016-1 – Innovation Action)
Project Full Title: Data-Driven Bioeconomy
Project Coordinator: INTRASOFT International
DELIVERABLE
D6.3 – State of the Art
Dissemination level PU - Public
Type of Document Report
Contractual date of delivery M12 – 31/12/2017
Deliverable Leader VTT
Status - version, date Final – v1.0, 29/12/2017
WP / Task responsible WP6
Keywords: Big data, data analytics, bioeconomy, agriculture,
forestry, fishery, earth observation
D6.3 – State of the Art H2020 Contract No. 732064 Final – v1.0, 29/12/2017
Dissemination level: PU -Public Page 2
Executive Summary

Big data technologies have shown significant benefits in many sectors of society, as diverse as manufacturing, business management and health science. This report looks at the state of the art of big data technologies and their application in the bioeconomy, i.e. the parts of the economy that use renewable biological resources from land and sea – such as crops, forests, fish, animals and micro-organisms – to produce food, materials and energy. The DataBio project, in particular, addresses agriculture, forestry and fishery, where it aims to advance the use of big data technologies by implementing several pilot demonstrations.
The purpose of the document is to give the general public and non-expert readers an overview of recent developments in big data and to highlight how these could serve the bioeconomy sector in the near future. The document is structured as follows:
Chapter 3 of the document includes an overview of general big data challenges and
opportunities. Chapter 3.1 introduces the concept of big data in general and chapter 3.2
introduces the use of big data in the bioeconomy sector. Big data management, analysis and
visualisation are discussed in chapters 3.3, 3.4 and 3.5 respectively. Finally, chapter 3.6
introduces big data frameworks and infrastructures.
Chapters 4, 5 and 6 go into more detail covering big data in agriculture, forestry and fishery
from the perspectives of the DataBio pilots.
Because each pilot has a different background and target application, these chapters present the state of the art from slightly different perspectives.
Chapter 7 concludes with an outlook to future opportunities in big data technologies.
Deliverable Leader: Göran Granholm, VTT
Contributors:
Karel Charvat, Lespro
Seppo Huurinainen, MHGS
Per Gunnar Auran, SINTEF Fishery
Juliusz Pukacki, PSNC
Caj Södergård, VTT
Renne Tergujeff, VTT
Javier Hitado Simarro, ATOS
Miguel Angel, ATOS
Fabiana Fournier, IBM
Reviewers: Nikos Marianos, NP
Irene Matzakou, INTRASOFT
Approved by: Athanasios Poulakidas, INTRASOFT
Table of Contents

EXECUTIVE SUMMARY ..... 2
TABLE OF CONTENTS ..... 4
TABLE OF FIGURES ..... 5
LIST OF TABLES ..... 6
DEFINITIONS, ACRONYMS AND ABBREVIATIONS ..... 7
1 INTRODUCTION ..... 10
1.1 PROJECT SUMMARY ..... 10
1.2 DOCUMENT SCOPE ..... 13
1.3 DOCUMENT STRUCTURE ..... 13
2 BACKGROUND AND OBJECTIVES ..... 14
3 BIG DATA OVERVIEW - STATUS, CHALLENGES AND OPPORTUNITIES ..... 15
3.1 INTRODUCTION TO BIG DATA ..... 15
3.2 BIG DATA IN BIOECONOMY ..... 18
3.3 BIG DATA MANAGEMENT ..... 20
3.3.1 Earth Observation data services ..... 22
3.4 BIG DATA ANALYTICS ..... 23
3.5 BIG DATA VISUALISATION AND USER INTERACTION ..... 27
3.5.1 Sensor data ..... 30
3.5.2 Earth Observation data ..... 32
3.6 BIG DATA FRAMEWORKS ..... 33
4 BIG DATA IN AGRICULTURE ..... 37
4.1 INTRODUCTION ..... 37
4.2 STATUS OF BIG DATA IN AGRICULTURE ..... 37
4.3 FUTURE DEVELOPMENTS ..... 39
5 BIG DATA IN FORESTRY ..... 41
5.1 INTRODUCTION ..... 41
5.1.1 Development/optimization focus in forestry ..... 41
5.2 BIG DATA APPLICATIONS IN FORESTRY - SCOPE, IMPACT AND BENEFIT OF DIGITAL FOREST MANAGEMENT ..... 42
5.2.1 Forest Big Data platform ..... 42
5.2.2 Digiroad ..... 43
5.2.3 Metsaan.fi e-Service ..... 44
5.2.4 Wuudis Service ..... 46
5.3 FUTURE DEVELOPMENTS ..... 49
5.3.1 Opportunities and possible big data solutions ..... 49
6 BIG DATA IN FISHERY ..... 55
6.1 INTRODUCTION ..... 55
6.1.1 Vessel monitoring systems and fisheries management ..... 55
6.1.2 Optimization focus in fishery ..... 56
6.1.3 Machine learning applications in fishery ..... 57
6.1.4 Big Data information services in fishery ..... 58
6.1.5 Open data providers relevant for fishery Big Data analytics ..... 62
6.1.6 Fisheries and open source software ..... 63
6.2 CONCLUSION ..... 65
6.2.1 Current impact of Big Data in fisheries ..... 65
6.3 FUTURE DEVELOPMENTS ..... 66
6.3.1 User needs and Big Data opportunities ..... 66
7 THE FUTURE OF BIG DATA ..... 68
8 CONCLUSIONS ..... 70
9 REFERENCES ..... 72
Table of Figures

FIGURE 1. THE EXPONENTIAL GROWTH OF DATA [REF-01] ..... 15
FIGURE 2. KEY CHARACTERISTICS OF BIG DATA (BASED ON [REF-02]) ..... 16
FIGURE 3. THE DATA-INFORMATION-KNOWLEDGE-WISDOM HIERARCHY OF ACKOFF [REF-03] ..... 17
FIGURE 4. BIG DATA MANAGEMENT PROCESS [REF-08] ..... 21
FIGURE 5. BAR CHART (A) AND PIE CHART (B) ..... 28
FIGURE 6. HISTOGRAMS (A) AND LINE GRAPH (B) ..... 29
FIGURE 7. SCATTERPLOT ..... 29
FIGURE 8. PCA (A) AND PARALLEL COORDINATES VISUALIZATION (B) ..... 30
FIGURE 9. PCA (A) AND PARALLEL COORDINATES VISUALIZATION (B) ..... 30
FIGURE 10. VISUALISATION OF SENSOR DATA ..... 32
FIGURE 11. BRUSHING. THE ROUNDED AREA IS HIGHLIGHTED IN THE HISTOGRAM AND ON THE MAP ..... 33
FIGURE 12. BDVA REFERENCE ARCHITECTURE WITH NUMBERS OF DATABIO COMPONENTS ..... 34
FIGURE 13. NIST BIG DATA REFERENCE ARCHITECTURE ..... 35
FIGURE 14. YIELD POTENTIAL APPLICATION ..... 38
FIGURE 15. DATA COLLECTED BY FOREST MACHINES HELP TO EVALUATE HARVESTING CONDITIONS, FOR EXAMPLE. PHOTO: ERKKI OKSANEN ..... 42
FIGURE 16. FOREST BIG DATA PLATFORM WITH FOREST BIG DATA AND APPLICATION COMPONENTS (HTTP://WWW.DATATOINTELLIGENCE.FI/FOREST-BIG-DATA.HTML) ..... 43
FIGURE 17. METSÄÄN.FI SERVICE WITH RELATED OPERATIONS AND USER GROUPS ..... 45
FIGURE 18. ENTITY OF FOREST DATA DEVELOPMENT IN METSÄÄN.FI SERVICE. SPECIFIC FOCUS ON IMPROVEMENT OF DATA MOBILITY AND DATA QUALITY, AND E-SERVICE PROMOTION. (METSÄTIETO 2020 - KEHITTÄMISSUUNNITELMA) ..... 46
FIGURE 19. TRAGSA DRONES USED IN FORESTRY PILOT ..... 47
FIGURE 20. GENERATED PRODUCTS (IMAGERIES) IN FORESTRY PILOT ..... 47
FIGURE 21. GENERATED INDEXES (IMAGES) IN FORESTRY PILOT ..... 48
FIGURE 22. FOREST VALUE CHAIN AND THE EXPECTED BENEFITS OF ‘WUUDIS DATA’ TO ALL SEGMENTS OF THE VALUE CHAIN ..... 50
FIGURE 23. CONCEPT OF WUUDIS DATA ..... 51
FIGURE 24. THE CONCEPT OF NEW SENOP HYPERSPECTRAL CAMERA, RELEASED IN 2018 ..... 52
FIGURE 25. EXAMPLE OF CLOUD-FREE REFLECTANCE IMAGE OF THE FORESTS OF CZECH REPUBLIC GENERATED USING BIG DATA SPATIAL-TEMPORAL ANALYSIS UTILIZING ALL AVAILABLE SENTINEL-2 OBSERVATIONS BETWEEN JUNE AND AUGUST 2016 ..... 53
FIGURE 26. EXAMPLE OF SATELLITE-DERIVED PRODUCT DESCRIBING FOREST HEALTH STATUS - AMOUNT OF CHLOROPHYLLS IN FOREST CANOPIES. RED AREAS ARE IDENTIFIED AS FORESTS WITH LOW CHLOROPHYLL CONTENT. CLOUD-FREE IMAGE MOSAIC GENERATED ABOVE SENTINEL-2 BIG DATA WAS USED AS AN INPUT IN THE ALGORITHM ..... 54
FIGURE 27. ILLUSTRATION OF VMS (FROM EC COMMISSION, FISHERIES POLICY – CONTROL TECHNOLOGIES) ..... 56
FIGURE 28. MARINETRAFFIC INFORMATION PORTAL SHOWING VESSEL TRAFFIC IN NORTHERN EUROPE BASED ON AIS DATA (FROM WWW.MARINETRAFFIC.COM) ..... 59
FIGURE 29. NORWEGIAN SEA FISHING ACTIVITY ACCORDING TO GLOBAL FISHING WATCH (JUL/AUG 2017) ..... 60
FIGURE 30. INFORMATION SERVICES IN BARENTSWATCH (BARENTSWATCH.NO, ACCESSED 29/11/2017) ..... 61
FIGURE 31. THE FISHINFO SERVICE - EXAMPLE SHOWING FISHING ACTIVITY WITH NETS (BLUE), LINES (RED) AND PURSE SEINERS (PURPLE) AS WELL AS RESTRICTED (BLACK POLYGONS) AND CLOSED (FILLED POLYGONS) FISHING AREAS (FROM THE FISKINFO.NO WEBSITE) ..... 61
FIGURE 32. THE FLUX STANDARDS AND STATUS (FROM UN ESCAP PRESENTATION OF DR HEINER LEHR) [REF-37] ..... 64
FIGURE 33. SUMMARY AND CONTEXT OF THE FISHERY PILOTS IN DATABIO ..... 67
FIGURE 34. ELECTRICITY CONSUMPTION: COUNTRIES COMPARED TO IT SECTOR ..... 69

List of Tables

TABLE 1: THE DATABIO CONSORTIUM PARTNERS ..... 10
TABLE 2. OPEN DATA PROVIDERS RELEVANT FOR FISHERIES ..... 62
Definitions, Acronyms and Abbreviations

Acronym/Abbreviation Title
AEF Agricultural Industry Electronics Foundation
ALS Airborne Laser Scanning
BDT Big data technology
BDVA Big Data Value Association
CEMA European Agricultural Machinery organisation
CEN Comité Européen de Normalisation
CEOS Committee on Earth Observation Satellites
CRM Customer Relationship Management
D2I Data to Intelligence
EO Earth Observation
EP Exploration platform
ERS Electronic Reporting System
ESA European Space Agency
EUMOFA European Market Observatory for Fisheries and Aquaculture
FLUX Fisheries Language for Universal eXchange
FOCUS Fisheries Open source CommUnity Software
GLONASS Russian navigation satellite system
GNSS Global Navigation Satellite System
GPS Global Positioning System
HTTP Hyper Text Transfer Protocol
IaaS Infrastructure as a service
ICES International Council for the Exploration of the Sea
IOPS Input/output operations per second
IoT Internet of Things
ISO International Organization for Standardization
Mha One million hectares (10,000 km²)
NAS Network-attached Storage
NIR Near infra-red
NIST National Institute of Standards and Technology
NOAA National Oceanic and Atmospheric Administration
OECD Organisation for Economic Co-operation and Development
OGC Open Geospatial Consortium
PCI-e PCI Express (Peripheral Component Interconnect Express)
PF Precision farming
LPIS Land Parcel Information System
RFID Radio Frequency Identification
RTK Real Time Kinematic
SAM State-space Assessment Model
TEP Thematic Exploration Platform of ESA
TLS Terrestrial Laser Scanning
URL Uniform Resource Locator
VMS Vessel Monitoring System
VTS Vessel Traffic System
WMS Web Map Service
ZB Zettabyte, 10^21 bytes (1 billion terabytes)
UAV Unmanned Aerial Vehicle
NDVI Normalized difference vegetation index
CARI Chlorophyll Absorption Ratio Index
GNDVI Green Normalized Difference Vegetation Index
NGRVI Normalized Green/Red Difference Index
JRC Joint Research Centre
Term Definition
Cassandra Apache Cassandra is a free and open-source distributed NoSQL database
management system designed to handle large amounts of data across many
commodity servers, providing high availability with no single point of failure
EModnet The European Marine Observation and Data Network
EU GDPR General Data Protection Regulation
Excel A spreadsheet developed by Microsoft. It features calculation, graphing tools,
pivot tables, and a macro programming language
Hadoop Apache Hadoop, an open-source software framework used for distributed storage and processing of big data sets using the MapReduce programming model
Landsat The Landsat program is the longest-running enterprise for acquisition of
satellite imagery of Earth.
MapReduce MapReduce is a programming model and an associated implementation for
processing and generating big data sets with a parallel, distributed algorithm
on a cluster
R Open source programming language and software environment for statistical
computing and graphics that is supported by the R Foundation for Statistical
Computing
SAS software suite developed by SAS Institute for advanced analytics, multivariate
analyses, business intelligence, data management, and predictive analytics
Sentinel-2 Sentinel-2 is an Earth observation mission developed by ESA as part of the
Copernicus Programme to perform terrestrial observations in support of
services such as forest monitoring, land cover changes detection, and natural
disaster management
SPSS Statistics is a software package used for logical batched and non-batched
statistical analysis.
Statgraphics Statgraphics is a statistics package that performs and explains basic and
advanced statistical functions.
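The MapReduce model defined in this section can be illustrated with a minimal, single-machine sketch in Python. This is only an illustration of the map, shuffle and reduce phases; a real deployment (e.g. with Hadoop) would distribute these phases across a cluster, and the word-count task and sample documents below are invented for the example:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["big data in forestry", "big data in fishery"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
print(counts["big"])  # 2
```

Because the map and reduce steps operate independently on their inputs, each phase can be parallelised across many commodity servers, which is what makes the model suitable for big data.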
1 Introduction

1.1 Project Summary

The data-intensive target sector selected for the DataBio project is the Data-Driven Bioeconomy. DataBio focuses on utilizing Big Data to contribute to the production of the best possible raw materials from agriculture, forestry and fishery/aquaculture for the bioeconomy industry, in order to output food, energy and biomaterials, while also taking into account various responsibility and sustainability issues.
DataBio will deploy state-of-the-art big data technologies and existing partners’ infrastructure
and solutions, linked together through the DataBio Platform. These will aggregate Big Data
from the three identified sectors (agriculture, forestry and fishery), intelligently process them
and allow the three sectors to selectively utilize numerous platform components, according
to their requirements. The execution will be through continuous cooperation of end user and
technology provider companies, bioeconomy and technology research institutes, and
stakeholders from the big data value PPP programme.
DataBio is driven by the development, use and evaluation of a large number of pilots in the three identified sectors, which also involve associated partners and additional stakeholders. The selected pilot concepts will be transformed into pilot implementations using co-innovative methods and tools. The pilots select and utilize the most suitable market-ready or near-market-ready ICT, Big Data and Earth Observation methods, technologies, tools and services, to be integrated into the common DataBio Platform.
Based on the pilot results and the new DataBio Platform, new solutions and new business
opportunities are expected to emerge. DataBio will organize a series of trainings and
hackathons to support its take-up and to enable developers outside the consortium to design
and develop new tools, services and applications based on and for the DataBio Platform.
The DataBio consortium is listed in Table 1. For more information about the project see
www.databio.eu.
Table 1: The DataBio consortium partners
Number Name Short name Country
1 (CO) INTRASOFT INTERNATIONAL SA INTRASOFT Belgium
2 LESPROJEKT SLUZBY SRO LESPRO Czech Republic
3 ZAPADOCESKA UNIVERZITA V PLZNI UWB Czech Republic
4 FRAUNHOFER GESELLSCHAFT ZUR FOERDERUNG DER ANGEWANDTEN FORSCHUNG E.V. Fraunhofer Germany
5 ATOS SPAIN SA ATOS Spain
6 STIFTELSEN SINTEF SINTEF ICT Norway
7 SPACEBEL SA SPACEBEL Belgium
8 VLAAMSE INSTELLING VOOR TECHNOLOGISCH ONDERZOEK N.V. VITO Belgium
9 INSTYTUT CHEMII BIOORGANICZNEJ POLSKIEJ AKADEMII NAUK PSNC Poland
10 CIAOTECH Srl CiaoT Italy
11 EMPRESA DE TRANSFORMACION AGRARIA SA TRAGSA Spain
12 INSTITUT FUR ANGEWANDTE INFORMATIK (INFAI) EV INFAI Germany
13 NEUROPUBLIC AE PLIROFORIKIS & EPIKOINONION NP Greece
14 Ústav pro hospodářskou úpravu lesů Brandýs nad Labem UHUL FMI Czech Republic
15 INNOVATION ENGINEERING SRL InnoE Italy
16 Teknologian tutkimuskeskus VTT Oy VTT Finland
17 SINTEF FISKERI OG HAVBRUK AS SINTEF Fishery Norway
18 SUOMEN METSAKESKUS-FINLANDS SKOGSCENTRAL METSAK Finland
19 IBM ISRAEL - SCIENCE AND TECHNOLOGY LTD IBM Israel
20 MHG SYSTEMS OY - MHGS MHGS Finland
21 NB ADVIES BV NB Advies Netherlands
22 CONSIGLIO PER LA RICERCA IN AGRICOLTURA E L'ANALISI DELL'ECONOMIA AGRARIA CREA Italy
23 FUNDACION AZTI - AZTI FUNDAZIOA AZTI Spain
24 KINGS BAY AS KingsBay Norway
25 EROS AS Eros Norway
26 ERVIK & SAEVIK AS ESAS Norway
27 LIEGRUPPEN FISKERI AS LiegFi Norway
28 E-GEOS SPA e-geos Italy
29 DANMARKS TEKNISKE UNIVERSITET DTU Denmark
30 FEDERUNACOMA SRL UNIPERSONALE Federu Italy
31 CSEM CENTRE SUISSE D'ELECTRONIQUE ET DE MICROTECHNIQUE SA - RECHERCHE ET DEVELOPPEMENT CSEM Switzerland
32 UNIVERSITAET ST. GALLEN UStG Switzerland
33 NORGES SILDESALGSLAG SA Sildes Norway
34 EXUS SOFTWARE LTD EXUS United Kingdom
35 CYBERNETICA AS CYBER Estonia
36 GAIA EPICHEIREIN ANONYMI ETAIREIA PSIFIAKON YPIRESION GAIA Greece
37 SOFTEAM Softeam France
38 FUNDACION CITOLIVA, CENTRO DE INNOVACION Y TECNOLOGIA DEL OLIVAR Y DEL ACEITE CITOLIVA Spain
39 TERRASIGNA SRL TerraS Romania
40 ETHNIKO KENTRO EREVNAS KAI TECHNOLOGIKIS ANAPTYXIS CERTH Greece
41 METEOROLOGICAL AND ENVIRONMENTAL EARTH OBSERVATION SRL MEEO Italy
42 ECHEBASTAR FLEET SOCIEDAD LIMITADA ECHEBF Spain
43 NOVAMONT SPA Novam Italy
44 SENOP OY Senop Finland
45 UNIVERSIDAD DEL PAIS VASCO/ EUSKAL HERRIKO UNIBERTSITATEA EHU/UPV Spain
46 OPEN GEOSPATIAL CONSORTIUM (EUROPE) LIMITED LBG OGCE United Kingdom
47 ZETOR TRACTORS AS ZETOR Czech Republic
48 COOPERATIVA AGRICOLA CESENATE SOCIETA COOPERATIVA AGRICOLA CAC Italy
1.2 Document Scope

The purpose of this document is to give an overview of the state of the art of big data technology in the bioeconomy sector. It especially targets actors and stakeholders within agriculture, forestry and fishery, including end users and other operators who may not have in-depth ICT knowledge or expertise in big data technologies.
1.3 Document Structure
This document comprises the following chapters:
Chapter 1 presents an introduction to the project and the document.
Chapter 2 presents motivation, background and objectives regarding the use of big data in
bioeconomy.
Chapter 3 gives an overview of big data.
Chapter 4 describes the status of big data in agriculture.
Chapter 5 describes the status of big data in forestry.
Chapter 6 describes the status of big data in fishery.
Chapter 7 looks into some future developments in big data technologies relevant to
bioeconomy.
Chapter 8 provides the conclusions based on the previous chapters.
Chapter 9 lists the references used in the document.
2 Background and objectives

The “Data-Driven Bioeconomy” (DataBio) project, as a Large Scale Pilot (LSP) action, addresses domains of strategic importance for EU industry, because the bioeconomy's share of the national economy is remarkably large in EU countries. The European bioeconomy is already worth more than €2 trillion annually and employs over 22 million people, often in rural or coastal areas and in Small and Medium-Sized Enterprises (SMEs).
The sectorial demonstrations are large-scale efforts of the Data-Driven Bioeconomy ecosystem, involving not only the project partners but also a large number of other cooperating parties that create the different supply chains and value chains for the pilots.
In the demonstrations, the following three sectors of the bioeconomy are covered: 1. Forestry, 2. Agriculture and 3. Fishery. The results can be replicated because standard technologies and best-practice solutions are used on the following domain-independent system levels:
1. Data gathering/data sets,
2. Platforms and interfaces,
3. Big Data tools and services.
In this LSP the sectorial demonstrations are large and the European coverage wide, but not complete. However, the international networks and activities of the partners and of the other sectorial cooperation organizations participating actively in the Big Data demonstrations make it easy to transfer the results across the EU. The big data technologies, platforms and data source interfaces of the project are domain-agnostic, generic solutions, which is why the results can easily be utilized in other contexts.
The objective of this state-of-the-art document is to support the transfer of knowledge across
various domains by providing an overview of current implementations and future outlook of
big data technologies in three key sectors of bioeconomy: agriculture, forestry and fishery.
3 Big Data overview - status, challenges and opportunities

This chapter gives an overview of current developments in the generation, management and use of big data, and of how the challenges of volume, velocity, variety and veracity are being tackled.
3.1 Introduction to big data

Big data refers to large and complex data sets that are challenging for normal computer hardware and software to handle. A range of Big Data Technologies is therefore needed for capturing, managing, processing, analysing, visualising and communicating the data. Gartner coined three basic dimensions of Big Data - the three V's: Volume, Velocity and Variety. Often a fourth V, Veracity, and a fifth, Value, are added; the latter underlines the aim of the computations: to extract value from data.
The volume of data is growing exponentially, doubling roughly every 12 months (source: Data Alliance, 2015), driven by numerous low-price IoT devices such as mobile phones, cameras, and temperature and humidity sensors, as well as aerial and satellite imaging. In addition, a lot of data is created as a by-product (a "footprint") of digital interaction. Data volumes typically range from terabytes to many petabytes (10^15 bytes). At the same time, such large data sets have low information density. The large data sets most relevant for DataBio are primarily satellite images.
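Doubling every 12 months means the volume after n years is the starting volume multiplied by 2^n. The short sketch below makes this concrete; the starting volume of 8 ZB in 2015 is an illustrative assumption, not a figure from this report:

```python
def projected_volume(start_zb, years, doubling_period_years=1.0):
    """Project a data volume forward, assuming it doubles every doubling period."""
    return start_zb * 2 ** (years / doubling_period_years)

# Illustrative only: assume 8 ZB in 2015 and annual doubling
for year in (2015, 2018, 2020):
    print(year, projected_volume(8, year - 2015))  # 8.0, 64.0, 256.0 ZB
```

Even with a generous margin of error in the starting figure, the exponential term dominates: five years of annual doubling multiplies the volume by 32.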
Figure 1. The exponential growth of data [REF-01].
The velocity comes from the need for real-time or near-real-time response and delivery of datasets and data streams. This applies to IoT data coming from cameras and other sensors as well as to media streams and simulation data in digital models. Typical cases for DataBio are sensor data coming from fishing ship engines or from tractors driving on the fields.
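High-velocity streams such as these are commonly handled with incremental, windowed computations that never hold the full stream in memory. A minimal sketch of a rolling average over arriving sensor readings (the engine-temperature values are invented for illustration):

```python
from collections import deque

def rolling_mean(stream, window=3):
    """Yield the mean of the last `window` readings as each new reading arrives."""
    buf = deque(maxlen=window)  # old readings fall out automatically
    for reading in stream:
        buf.append(reading)
        yield sum(buf) / len(buf)

# Simulated engine-temperature readings (°C) from a ship's sensor
readings = [80.0, 82.0, 90.0, 95.0, 93.0]
means = list(rolling_mean(readings))
print(means[-1])  # mean of the last three readings, ≈ 92.67
```

The same windowing idea underlies stream-processing frameworks: each reading is processed once, on arrival, with only a small fixed amount of state retained.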
Variety arises from the need to process texts, images, audio and video, as well as fused data sources. Relevant cases for DataBio are again series of satellite, aerial and drone images.
As the number of data sources increases, the importance of "volume" as the key characteristic of big data is diminishing. In a survey on big data adoption targeting leading industry players, a majority of respondents saw variety as the main characteristic [REF-02]. This perception also seems to strengthen over time (Figure 2).
Figure 2. Key characteristics of big data (based [REF-02]).
In any discussion of knowledge or information management, it is important to understand the basic terms data, information and knowledge. To explain them, we use Ackoff's Data-Information-Knowledge-Wisdom hierarchy [REF-03] (Figure 3):
• Data: as symbols;
• Information: as data that are processed to be useful; provides answers to "who",
"what", "where", and "when" questions;
• Knowledge: as application of data and information; answers "how" questions;
• Wisdom: as evaluated understanding.
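The first two levels of the hierarchy can be illustrated in code: raw numbers (data) carry no meaning on their own, but once given relational context and aggregated they answer a "what" question and become information. The field name and soil-temperature readings below are invented for illustration:

```python
# Data: raw symbols with no meaning of their own
raw = [21.5, 22.5, 35.0, 21.0]

# Information: the same data given relational context
# (which field plot, which quantity, which unit)
observations = [{"field": "plot-7", "soil_temp_c": t} for t in raw]

# A processed answer to a "what" question:
# what was the mean soil temperature on plot-7?
mean_temp = sum(o["soil_temp_c"] for o in observations) / len(observations)
print(mean_temp)  # 25.0
```

Knowledge and wisdom, in Ackoff's sense, lie beyond such processing: deciding whether 25.0 °C is normal for this plot, and what to do about it, requires applied experience and judgement.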
Figure 3. The Data-Information-Knowledge-Wisdom hierarchy of Ackoff [REF-03].
A further elaboration of Ackoff's definitions [REF-03] follows:
1. Data... data is raw. It simply exists and has no significance beyond its existence (in and
of itself). It can exist in any form, usable or not. It does not have meaning in itself. In
computer parlance, a spreadsheet generally starts out by holding data.
2. Information... information is data that has been given meaning by way of relational
connection. This "meaning" can be useful, but does not have to be. In computer
parlance, a relational database makes information from the data stored within it.
3. Knowledge... knowledge is the appropriate collection of information, such that its
intent is to be useful. Knowledge is a deterministic process.
4. Wisdom... wisdom is an extrapolative, non-deterministic, non-probabilistic process. It
calls upon all the previous levels of consciousness, and specifically upon special types
of human programming (moral and ethical codes, etc.). It beckons to give us
understanding about which there has previously been no understanding and, in doing
so, goes far beyond understanding itself. It is the essence of philosophical probing.
From a management point of view, three levels of management can be distinguished:
• Data management;
• Information management;
• Knowledge management.
Data management includes:
• Data governance;
• Data Architecture, Analysis and Design;
• Database Management;
• Data Security Management;
• Data Quality Management;
• Metadata Management;
• Document, Record and Content Management;
• Reference and Master Data Management.
Information management includes:
• Records management;
• Storing, cataloguing and organizing information;
• Aligning information with corporate goals and strategies;
• Setting up databases;
• Using information for optimal day-to-day decision-making;
• Aiming for efficiency;
• Data with a purpose.
Knowledge management includes:
• A framework for designing an organization’s goals, structures, and processes to add
value;
• Collect, disseminate, utilize information;
• Align with corporate goals and strategies;
• Focus on cultivating, sharing, and strategizing;
• Connecting people to gain a competitive advantage;
• Information with a purpose [REF-04].
3.2 Big data in bioeconomy
In line with the European Commission, we define bioeconomy as comprising those parts of
the economy that use renewable biological resources from land and sea – such as crops,
forests, fish, animals and micro-organisms – to produce food, materials and energy [REF-05].
In particular, the DataBio project focuses on three key sectors: agriculture, forestry and fishery.
Throughout history, these traditional sectors have gone through phases of continuous and
sometimes disruptive development that have affected the whole value chain from producer
to consumer and end user. The advent of information technology, and the still ongoing
digitalisation of industry and society in general, has been the most significant development
since the industrial revolution. The exponential growth of data through new data sources and
advanced analytics may help cope with the challenge of increasing productivity in a
sustainable way. Due to the nature and complexity of the bioeconomy sectors, this will
require a variety of different data sources ranging from large-scale earth observation to fine-
scale sensor input. A key challenge is to process this data to generate new knowledge and
deliver these insights as meaningful information, forecasts and recommendations to users in
an accessible way.
In agriculture, big data technology (BDT) is implemented under the banner of precision
farming (PF) [REF-06]. BDT builds on geo-coded maps of agricultural fields and real-time
monitoring of activities on the farm in order to increase the efficiency of resource use and to
reduce the uncertainty of management decisions [REF-07]. Under PF, yields increase in
particular through the precise selection and application of the exact types and doses of
agricultural inputs (crop varieties, fertilizers, pesticides, herbicides, irrigation water) required
for optimum crop growth and development.
In terms of Technology Readiness Level (TRL), current implementations in agriculture are
mostly positioned at TRL 6 and 7. Improved technologies such as new elite varieties have been
developed; big data such as weather, soil, crop (phenotypic) and other environmental data
are routinely collected and meta-analysed; and technological and managerial services are
already offered to farmers in a few countries for a number of crops, although not yet at the
level of big data analytics. In some countries there is also experience with farm telemetry and
the utilization of satellite data (Earth Observation). In addition, the required skills are available
to the organizations participating in the pilots, and organizations show a good level of
readiness to change their internal and external business processes, which is a key factor in
adopting the new technology.
It is envisioned that big data analytics systems will provide pilot managers with highly localized
descriptive plans (a better and more advanced way of looking at an operation), prescriptive
plans (timely recommendations for operational improvement, i.e. application rates for seed,
fertilizer and other agricultural inputs, soil analysis, and localized weather and disease/pest
reports, based on real-time and historical data) and predictive plans (using current and
historical data sets to forecast future localized events and returns). Tracking of the machinery
fleet will additionally allow localization of farm vehicles in real time.
In most European countries, traditional methods for forest management are based on
“static” management plans, created at the planting stage and reviewed every 10 years. In
recent years, these management plans have become a declaration of intentions, including
objectives for multifunctional forests (non-wood products and services). However, these
plans often lack effective implementation and monitoring methods that allow forest owners,
managers and regulators to validate the progress in achieving the target objectives set out in
the management plan.
Big data methods bring the possibility both to increase the value of forests and to decrease
costs, within the sustainability limits set by natural growth and ecological aspects. The key is
to gather ever more accurate information about the trees from a host of sensors, including a
new generation of satellites, UAV images, laser scanning, mobile devices (through
crowdsourcing) and machines operating in the forests. This enables characterization of even
single trees. Once accurate forest information has been gathered,
the next step is to employ tools that mobile and cloud technology have recently made
available, and to deposit the measured data onto digital platforms that can be accessed by a
variety of user devices. Such precise databases enable sustainable growth of timber
extraction, optimized use of the tree raw material and higher long-range growth of the
biomass through precise support actions. At the same time, the costs of management, labour
and timber transport can be significantly reduced, which yields gains in the short run as well.
A variety of new services will emerge, e.g. relating to timber sales and working and transport
assignments, that create economic growth.
In the fisheries sector, companies are ramping up their digitalisation efforts to start
harvesting the benefits of applying Big Data technology to optimize their business. Although
large efforts have been made to make scientific marine data sets available (e.g. EMODnet,
NOAA, Copernicus), the main problem is that much of the industrial data in fisheries is either
not recorded (for example hydroacoustics, operational data, energy consumption) or is
business sensitive and therefore not shared or openly available (detailed catch and price
data). Vessel outfitters as well as manufacturers of fish-finding and catch equipment are
increasing the capacity and functionality of their systems to store and collate data. As a result,
major industrial players are focusing on building their own Big Data platforms to gain a
business edge for their products through more advanced analytics and services, rather than
opening up their systems and sharing data. In contrast, there is a lot of effort in scientific
communities related to fisheries to open up data sets and leverage machine learning
analytics to provide open services for the common good, with Global Fishing Watch being an
excellent global showcase. Chapter 6 gives an introduction to the state of the art of Big Data
in fishery, covering topics such as vessel monitoring systems, optimization focus and machine
learning in fishery, relevant information services and data providers, and open source
software and initiatives relevant for Big Data in fishery. The current state of Big Data in
fisheries is summarized at the end of the chapter, before some near-future development
opportunities are outlined.
3.3 Big data management
The characteristics defining big data (volume, velocity and variety) put high demands on
the process whereby this data is managed. Big data management is a discipline, where data
management techniques, tools and platforms including storage, pre-processing, processing
and security can be applied [REF-08]. The process flow is illustrated in Figure 4.
Figure 4. Big data management process [REF-08]: big data sources feed storage management,
pre-processing, processing, classification and decision making, with network management
and security as supporting layers.
Big Data technology introduces additional requirements on hardware infrastructure and
influences the design and construction of data centres.
From the point of view of hardware infrastructure, most of the issues relate to the data
storage architecture: bringing the computation to the data, and making processing more
efficient. The key requirement for big data storage is to handle very large amounts of data
and to keep scaling with data growth. It is also important to provide the input/output
operations per second (IOPS) level necessary to deliver data to analytics tools and to avoid
performance
degradation with increasing storage space. Keeping processing time constant and short while
data volumes increase, and meeting real-time demands, all in an affordable way, are serious
challenges. These in turn have a strong impact on the infrastructure for Big Data.
Traditional approaches with relational databases and server scale up and server scale out will
eventually reach their limits.
Many organizations already possess enough storage in-house to support a Big Data
initiative. However, they may decide to invest in storage solutions that are optimized for
Big Data. While not necessary for all Big Data deployments, flash storage is especially
attractive due to its performance advantages and high availability.
Large users of Big Data — companies such as Google and Facebook — utilize hyperscale
computing environments, which are made up of commodity servers with direct-attached
storage, run frameworks like Hadoop or Cassandra and often use PCIe-based flash storage to
reduce latency. Smaller organizations, meanwhile, often utilize object storage or clustered
network-attached storage (NAS).
Cloud storage is an option for disaster recovery and backups of on-premises Big Data
solutions. While the cloud is also available as a primary source of storage, many organizations
— especially large ones — find that the expense of constantly transporting data to the cloud
makes this option less cost-effective than on-premises storage.
In the context of data centre design principles, the electrical infrastructure is one of the
major concerns for handling big data. Big data has an indirect impact on data centre power
consumption: as the infrastructure expands, electrical power consumption increases
many-fold. The reliability of the electrical infrastructure is also important given the data
volumes and their processing.
Another issue is the cooling system, which needs to perform well and scale with the
increasing load of the computing systems.
Big data repositories may be built by integrating data from different, geographically
distributed sources. A very important element of the data centre is therefore its network
infrastructure. Because traffic generated by automatic data streaming is much higher than
that generated by human requests, high-performance network connections, e.g. based on
fibre channel, are essential. Big data sources can send huge volumes of data to data centres,
increasing inbound bandwidth requirements. The data centre network infrastructure must
therefore be prepared to support both the volume and the velocity of the data.
Last but not least, security is a key component of data centre infrastructure. Big data is all
about data, so securing it at the storage level is a critical challenge. The data has to be
secured because it can contain an organization's confidential information. Organizations are
working on different approaches to counter security threats. Data centre security has to be
implemented at the network, storage and application levels.
3.3.1 Earth Observation data services
A lot of effort has been spent in recent years on standardising EO data management.
The interfaces for which widely accepted standards exist and are deployed include:
• EO dataset/product metadata,
• EO dataset/product discovery,
• Online data access, and
• Viewing.
The current EO metadata standard supported by European and Canadian Space Agencies is
OGC 10-157r4 “Earth Observation Profile of Observations and Measurements (O&M)” [REF-
09]. Through the ESA FedEO endpoint (http://fedeo.esa.int), this type of metadata is available
from several backend systems, e.g. Sentinels Data Hub, ESA CDS (Copernicus CSCDA) and ESA
Virtual Archive-4.
Discovery of datasets and products is defined in the CEOS OpenSearch Best Practice v1.1.2
[REF-10]. These specifications can be used to allow for discovery of collections. Collection
metadata returned by the collection discovery service may be provided in various metadata
formats. Discovery of products is performed via a similar OpenSearch interface, and the
collection-specific search responses are made available as defined in the CEOS Best Practices.
Current practice at many data providers' product facilities is to make the products available
for online access via HTTP; the ESA facilities LDS, OADS and others use this approach. The
dataset metadata and catalogue search responses then include this download URL as part of
the search response and metadata description, and the product search response typically
includes an (HTTP) download URL for the product.
EO dataset metadata or search responses typically contain a link to a view or quick-look
image. This can be a static image or a reference to a view service implemented as an OGC
Web Map Service (WMS).
To allow efficient querying of a large data repository, a so-called MapReduce architecture
(invented by Google) is often used to split and distribute the queries across parallel processing
nodes (Map step), after which the results are gathered and delivered (Reduce step).
MapReduce is implemented in the Apache open-source project Hadoop. Apache Spark
extends the MapReduce model with in-memory processing and a richer set of operations.
Apache Flink (https://flink.apache.org) is a data processing system and an alternative to
Hadoop's MapReduce component. It comes with its own runtime rather than building on top
of MapReduce and, as such, can work completely independently of the Hadoop ecosystem.
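The split-and-gather idea behind MapReduce can be illustrated with a minimal word-count sketch in plain Python. This does not use any Hadoop API; the phase names and the example documents are purely illustrative:

```python
from collections import defaultdict

def map_phase(documents):
    """Map step: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce step: aggregate the grouped values per key."""
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data in forestry", "big data in fishery"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
# counts["big"] == 2, counts["forestry"] == 1
```

In a real cluster, the map and reduce steps run on many nodes in parallel and the shuffle moves data between them over the network; the logical structure, however, is the same.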
3.4 Big data analytics
Data analysis has been studied intensively and numerous algorithms exist. It has applications
in different business, science and social science domains. A wide range of tools and
commercial applications is available, some of which are highly competitive in markets such
as Customer Relationship Management (CRM). There are also numerous statistics programs
and packages available, both for casual users and specialists (Excel, SAS, SPSS, R).
Some of the most common data analytics methods are introduced here. They cover methods
for data exploration, descriptive methods, predictive methods and methods for anomaly
detection. This introduction omits a large number of analytics methods, including those for
analysing text and other unstructured data.
The purpose of data exploration is to gain a better understanding of the characteristics of
data [REF-11]. The central methods are summary statistics and visualizations. Summary
statistics are numbers that summarize properties of the data. Amar et al. [REF-12] have
classified the statistical methods as:
• computing derived values: average, median, count, correlations;
• finding extrema: finding the data cases having the highest and lowest values;
• determining range: finding the span of values of an attribute of the data cases; and
• characterizing distributions: creating a distribution of a set of data cases with a
quantitative attribute, e.g. to understand “normality”.
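As an illustration, the summary statistics listed above can be computed with Python's standard library; the measurement values here are invented for the example:

```python
import statistics

# Hypothetical daily soil-moisture readings (%), one of them an outlier.
measurements = [4.1, 4.8, 5.0, 5.2, 4.9, 12.7, 5.1, 4.7]

mean_value = statistics.mean(measurements)              # derived value
median_value = statistics.median(measurements)          # robust against the outlier
lowest, highest = min(measurements), max(measurements)  # extrema
value_range = highest - lowest                          # range of the attribute
```

Note how the outlier (12.7) pulls the mean well above the median, which is exactly the kind of property data exploration is meant to reveal.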
The goal of descriptive methods is to discover patterns and rules in data. The methods focus
on finding clusters, patterns and associations in data [REF-11]. Clustering looks for groups
of objects such that the objects in a group are similar (or related) to one another and
different from (or unrelated to) the objects in other groups. The similarity of objects is defined
based on similarity (or distance) measures. Market segmentation, for example, is an
application of clustering.
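A minimal sketch of the clustering idea is a tiny one-dimensional k-means in plain Python; the yield values and starting centroids are invented for the example:

```python
def kmeans_1d(values, centroids, iterations=10):
    """A minimal 1-D k-means sketch: assign each value to the nearest
    centroid, then move each centroid to the mean of its group."""
    for _ in range(iterations):
        groups = {c: [] for c in centroids}
        for v in values:
            nearest = min(centroids, key=lambda c: abs(c - v))
            groups[nearest].append(v)
        centroids = [sum(g) / len(g) if g else c for c, g in groups.items()]
    return sorted(centroids)

# Two obvious groups of hypothetical field yields (t/ha): low near 2, high near 8.
yields = [1.8, 2.1, 2.3, 7.9, 8.2, 8.0]
clusters = kmeans_1d(yields, centroids=[0.0, 10.0])
# the centroids converge to roughly 2.07 and 8.03
```

Real clustering tools generalize this to many dimensions and more robust distance measures, but the assign-then-update loop is the core of the method.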
Pattern detection involves finding combinations of items that occur frequently in data.
Sequential pattern discovery finds rules that capture strong sequential dependencies among
different events. Association rule mining predicts the occurrence of an item based on
occurrences of other items, producing dependency rules such as “buyers of milk
and diapers are likely to buy beer”.
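The support and confidence behind such a rule can be computed directly; the toy transactions below are invented to mirror the milk-diapers-beer example:

```python
transactions = [
    {"milk", "diapers", "beer"},
    {"milk", "bread"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "diapers"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """P(consequent | antecedent): how often the rule actually holds."""
    return support(antecedent | consequent) / support(antecedent)

# Rule: buyers of milk and diapers are likely to buy beer.
conf = confidence({"milk", "diapers"}, {"beer"})
```

Association rule miners such as Apriori enumerate candidate itemsets efficiently, but each discovered rule is ultimately scored with exactly these two quantities.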
The purpose of predictive modelling is to build models that predict the value of one variable
from the known values of other variables [REF-13]. The predicted targets are predefined.
Regression and classification are two widely used predictive methods.
Regression predicts the value of a continuous variable based on other variables, using linear
or nonlinear models [REF-11]. Linear regression is easy to visualize, often shown as a line on
a scatterplot. The area has been studied extensively and has its origins in statistics.
Application examples include predicting stock markets, or wind speed as a function of
temperature or humidity.
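A least-squares fit of the kind described above can be sketched in a few lines of Python; the temperature and wind-speed pairs are invented for the example:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on paired observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical readings: temperature (deg C) vs. wind speed (m/s).
temps = [10, 12, 14, 16, 18]
winds = [5.1, 5.9, 7.2, 7.8, 9.0]
a, b = fit_line(temps, winds)
predicted = a * 20 + b  # extrapolated wind speed at 20 deg C
```

Statistical packages add diagnostics (residuals, confidence intervals, nonlinear terms), but the fitted line itself comes from this closed-form formula.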
Classification creates a model for a class attribute as a function of the values of other
attributes, using a training set. Unseen records are then assigned to a class, and the accuracy
of the model is evaluated with a test set. Several techniques have been developed, including
decision trees, Bayesian methods, rule-based classifiers and neural networks. Classification is
a much-used method and commercial applications are available. Examples include classifying
credit card transactions as legitimate or fraudulent, classifying e-mails as spam, or classifying
news stories [REF-11].
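As a sketch of the classification idea, the following is a tiny k-nearest-neighbours classifier in plain Python; the transaction features, labels and distances are invented for the example:

```python
from collections import Counter

def knn_classify(point, training, k=3):
    """Label a point by majority vote among its k nearest training examples.
    Each training example is a ((features...), label) pair."""
    def distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    nearest = sorted(training, key=lambda ex: distance(point, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy training set: (amount in EUR, hour of day) -> transaction label.
training = [
    ((12.0, 10), "legitimate"), ((30.0, 14), "legitimate"),
    ((25.0, 9), "legitimate"), ((950.0, 3), "fraud"),
    ((990.0, 2), "fraud"), ((870.0, 4), "fraud"),
]
label = knn_classify((900.0, 3), training, k=3)
```

A production classifier would normalize the features and evaluate accuracy on a held-out test set, as described above, but the assign-by-similarity principle is the same.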
Anomalies are observations whose characteristics differ significantly from the normal profile.
Methods of anomaly detection look for sets of data points that are considerably different
from the remainder of the data. The methods build a profile of “normal” behaviour and detect
significant deviations from it. The profile can be patterns or summary statistics for the overall
population. Anomaly detection schemes can be graphical, statistical, distance-based or
model-based. Credit card fraud detection, telecommunication fraud detection, network
intrusion detection and fault detection are examples of application areas [REF-11].
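A statistical scheme of this kind can be sketched by flagging observations whose z-score (deviation from the mean in units of standard deviation) exceeds a threshold; the sensor readings below are invented for the example:

```python
import statistics

def zscore_anomalies(values, threshold=3.0):
    """Flag observations deviating more than `threshold` standard
    deviations from the mean of the sample."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical engine-temperature readings with one abnormal spike.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 35.0, 20.2]
anomalies = zscore_anomalies(readings, threshold=2.0)
```

Here the "normal profile" is simply the sample mean and standard deviation; more elaborate schemes replace it with learned models or distance structures, but the detect-deviation logic is the same.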
Real-time analytics and stream processing relate to the velocity aspect of big data
applications. Streaming analytics enables businesses to respond appropriately, and in real
time, to context-aware insights delivered from fast data [REF-14].
Forrester [REF-15] defines streaming analytics as: “Software that provides analytical
operators to orchestrate data flow, calculate analytics, and detect patterns on event data
from multiple, disparate live data sources to allow developers to build applications that sense,
think, and act in real time”. As pointed out in this same report, streaming analytics is about
finding and acting on insights from event data in real-time. It represents something that has
happened, whether it be physical or digital. It encompasses any data that enterprise
applications, mobile apps, websites, infrastructure, external feeds, and IoT devices emit.
Streaming analytics solutions identify patterns on these events in real-time. Insights
generated using streaming solutions are immediate but not valuable unless they are used to
take action.
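A core building block of streaming analytics is the sliding window, which keeps only the most recent events and computes aggregates over them as new events arrive; a minimal sketch, with invented readings:

```python
from collections import deque

class SlidingWindow:
    """A minimal sketch of windowed stream analytics: retain the last
    `size` events and expose an aggregate over that window."""
    def __init__(self, size):
        self.events = deque(maxlen=size)  # old events fall out automatically

    def push(self, value):
        self.events.append(value)
        return self.average()

    def average(self):
        return sum(self.events) / len(self.events)

window = SlidingWindow(size=3)
for reading in [10, 12, 14, 40]:
    current = window.push(reading)
# after the last push the window holds [12, 14, 40]
```

Engines such as Flink or Spark Streaming provide distributed, fault-tolerant versions of this pattern, with time-based as well as count-based windows.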
Drivers of the adoption and expansion of streaming analytics include:
• Internet of Things (IoT) growth and pervasiveness - Streaming analytics solutions are
particularly well suited to IoT applications, because these are by nature real-time and
emit sensor data that can be analyzed in real time. As pointed out by Gartner in
[REF-16], “…much of the growth in streaming processing usage during the next 10
years will come from the IoT”. Streaming analytics is the core technology enabler for
the Internet of Things [REF-14]. The characteristics of streaming analytics are
particularly suited to the processing of sensor data: the combination of time-based
and location-based data analysis in real time over short time windows; the ability to
filter, aggregate and transform live data; and the ability to do so across a range of
platforms, from small edge appliances to distributed, fault-tolerant cloud clusters.
Sensor data volumes have already reached a level where streaming analytics is a
necessity, not an option.
• Need to meet the coming massive shortfall in storage capacity - In a recent report
[REF-17], IDC pointed out that, of the 160 ZB of data forecast to be generated by
2025, about a quarter will be real-time in nature (generated, processed and instantly
accessible), up from around 5 percent today, and most of that real-time data
(95 percent) will come from the world of IoT. Another interesting observation is
that only between 3 and 12 percent, depending on the source, will be able to be
stored, as data storage will not be able to cope with this vast amount of data. The
logical conclusion is therefore that this data must be collected, processed and
analysed in memory, in real time, close to where it is generated.
• Improves the quality of decision making by presenting information that could
otherwise be overlooked [REF-16].
• Enables smarter anomaly detection and faster responses to threats and opportunities
[REF-16].
• Helps shield business people from data overload by eliminating irrelevant information
and presenting only alerts and distilled versions of the most important information
[REF-16].
• Vendors are bringing out new products, many of them open source, to handle
established and emerging use cases [REF-16].
• Business is demanding analytical support for better situation awareness and faster,
more-precise decisions [REF-16].
As Gartner pointed out in [REF-16], event streaming processing technology is maturing rapidly
and will eventually be adopted by multiple departments within every large company. Some
of the most prominent markets and use cases are listed below.
Capital markets remain a strong sector for those with an event processing heritage. Use
cases include automated algorithmic trading and real-time trade compliance and audit,
along with an increase in deployments for fraud detection and trading analytics as a service.
Preventative maintenance could well be the silver bullet for streaming analytics in IoT. The
value to the customer is clear: reducing operational and equipment costs by minimizing
unplanned outages, and reducing the need for expensive site and maintenance visits that
could be avoided. For example, an IoT predictive maintenance application may monitor
temperature and vibration data streamed from a conveyor belt. The streaming analytics
solution could detect a spike in either temperature or vibration indicating a looming
shutdown, and then push an alert to an operator or trigger an automatic shutdown of the
machine. In addition, if the cadence of the streaming data is interrupted, that may also
indicate a problem with the sensors on the machine.
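The detection logic described above can be sketched as a simple rule-based check on each streamed event; all thresholds, readings and alert names below are invented for the example:

```python
def check_event(temperature, vibration, seconds_since_last,
                temp_limit=80.0, vib_limit=5.0, cadence_limit=10.0):
    """Hypothetical rule-based check for one streamed machine event:
    returns a list of alert strings (empty when everything looks normal)."""
    alerts = []
    if temperature > temp_limit:
        alerts.append("temperature spike")
    if vibration > vib_limit:
        alerts.append("vibration spike")
    if seconds_since_last > cadence_limit:
        # A long gap between events may mean the sensor itself has failed.
        alerts.append("sensor cadence interrupted")
    return alerts

alerts = check_event(temperature=92.5, vibration=1.2, seconds_since_last=31.0)
```

A production system would learn the thresholds from historical data rather than hard-coding them, and would route the alerts to operators or to an automatic shutdown action.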
Retail - Real-time inventory updates are helping to drive business processes for inventory and
pricing optimisation, and for optimisation of the supply chain, logistics and just-in-time
delivery. This is also a market where wearables and the consumer segment are poised for
even greater growth.
Industrial automation combines streaming and predictive analytics to optimize
manufacturing processes and product quality. Streaming analytics enables statistical analysis
of the manufacturing process, with alerting and automated shutdown when quality levels are
breached.
Smart Energy - Scenarios range from real-time monitoring of smart meters and smart pricing
models for electricity to real-time sensor monitoring of wind farms (which produce vast
volumes of sensor data and where streaming analytics can drive a significant increase in
efficiency and energy output).
Healthcare - Smart sensors will play a pivotal role in exploiting the potential hidden in this
market. For example, where an SMS message can only remind a patient to take a pill, a smart
sensor on a pill bottle can continuously report whether and when a pill has been taken, and
even whether the storage temperature is correct.
Summary
Streaming analytics will be adopted in tomorrow's organizations in almost every domain. The
pervasiveness of IoT will drive both the necessity and the exploitation of real-time analytics
for actionable decision making. The domains of fishery, agriculture and forestry studied in
the scope of DataBio have in recent years been enriched by the application of sensors and
are therefore no exception. This is a unique opportunity to demonstrate streaming-driven
applications in these emerging domains.
3.5 Big data visualisation and user interaction
Information visualization can be defined as “The use of computer-supported, interactive,
visual representations of abstract data to amplify cognition” [REF-18]. The goal is to improve
understanding of the data with graphical presentations. The principle behind information
visualization is to utilize the powerful image processing capabilities of the human brain.
Visualizations increase human cognitive resources: they extend working memory, reduce
the search for information and enhance the recognition of patterns.
Data visualization may handle abstract, non-physical information using abstract but
well-understood visualization structures such as trees or graphs. It has applications in
measurement data, business information, document collections, web content and other big
data assets that cannot be understood without highlighting their important characteristics.
Big data visualization renders visible the properties of the objects of interest and can be
combined with interactive information access techniques.
The interactive visualization process is a vital part of the big data analysis framework
described in Chapter 3.4. Complex and heterogeneous data from different sources, of
various types and levels of quality, need to be curated and transformed into a suitable
format. The data sources can range from well-organized databases to continuous input
data streams. The data is analysed
using mathematical, statistical and data mining algorithms and models. Visualizations
highlight the important features, including commonalities and anomalies, making it easy for
users to perceive new aspects of the data. Visualizations are optimized for efficient human
perception, taking into account the capabilities and limitations of the human visual system.
Interactivity in visualizations allows users to explore the data and achieve new knowledge and
insight.
The Visual Analytics Agenda [REF-19] introduces three principles for selecting a
visualization method, adapted from [REF-20]: appropriateness, naturalness and matching.
Appropriateness states that visual representations should provide neither more nor less
information than that needed for the task at hand. Naturalness calls for visual representations
that most closely match the information being presented; new visual metaphors are only
useful for representing information when they match the user’s cognitive model of the
information. The matching principle states that representations are most effective when they
match the task to be performed by the user.
Characteristics of some of the most common visualization types are briefly described here;
for more detailed information, see e.g. [REF-21]. Bar charts and pie charts (Figure 5) are the
basic methods used to visualize univariate ordinal data, i.e. data consisting of observations
that have natural, ordered categories on a single attribute.
Figure 5. Bar chart (a) and pie chart (b).
Histograms are similar to bar charts, but are used to represent quantitative data (Figure 6 a).
A histogram defines a sequence of breaks and then counts the number of observations in
the bins formed by the breaks. Line graphs are used to display quantitative data as a
continuous function of a single variable (Figure 6 b). Common uses are showing frequency
distributions and time series.
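The bin-counting step behind a histogram can be sketched in a few lines of Python; the break points and tree-height values are invented for the example:

```python
def histogram_counts(values, breaks):
    """Count observations falling into the bins defined by consecutive
    breaks; the last bin also includes its upper edge."""
    counts = [0] * (len(breaks) - 1)
    for v in values:
        for i in range(len(breaks) - 1):
            last = i == len(breaks) - 2
            if breaks[i] <= v < breaks[i + 1] or (last and v == breaks[-1]):
                counts[i] += 1
                break
    return counts

# Hypothetical tree heights (m) binned into [1, 2) and [2, 3].
heights = [1.2, 1.5, 1.7, 2.1, 2.4, 2.9, 3.0]
counts = histogram_counts(heights, breaks=[1.0, 2.0, 3.0])  # [3, 4]
```

The choice of breaks is the main design decision when drawing a histogram: too few bins hide structure, too many show noise.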
Figure 6. Histograms (a) and line graph (b).
The basic visual method for analysing bivariate data, i.e. data on two (usually related)
variables, is the scatterplot (Figure 7). Scatterplots are a good means of finding correlations,
clusters and outliers between two attributes. A third dimension can be added using a visual
effect such as the colour or size of the plotted points (bubble charts), or animations
(animated bubble charts).
Figure 7. Scatterplot.
Often, real-world data is multidimensional, consisting of many data variables without a clear
hierarchy. Dimension reduction methods aim at projecting data into a low-dimensional space
(1D-3D) while maintaining the correct relations between the data points. There are several
methods with different optimization goals and complexities. One of the best known is
Principal Component Analysis (PCA, Figure 8 a). It tries to find a linear subspace that has
maximal variance. In parallel coordinates visualization each of the dimensions corresponds to
a vertical axis and each data element is displayed as a series of connected points along the
dimensions/axes (Figure 8 b).
Figure 8. PCA (a) and parallel coordinates visualization (b).
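To illustrate the idea behind PCA, the sketch below (assuming NumPy is available; the data are hypothetical) projects centred data onto the eigenvectors of its covariance matrix, i.e. the directions of maximal variance:

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project data onto the directions of maximal variance (PCA).

    X: (n_samples, n_features) array. Returns the low-dimensional coordinates.
    """
    Xc = X - X.mean(axis=0)                   # centre the data
    cov = np.cov(Xc, rowvar=False)            # covariance of the features
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :n_components]  # pick largest-variance directions
    return Xc @ top

# 3-D points that actually lie near a 1-D line: PCA recovers that structure
rng = np.random.default_rng(0)
t = rng.normal(size=100)
X = np.column_stack([t, 2 * t, -t]) + 0.01 * rng.normal(size=(100, 3))
Y = pca_project(X, n_components=1)
print(Y.shape)  # (100, 1)
```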
If there are more than two variables in the dataset, correlation matrixes or correlation
networks can be used to show pairwise correlations for all variable combinations (Figure 9).
Figure 9. Correlation matrix (a) and correlation network (b).
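The pairwise correlations such a matrix or network visualizes can be computed directly; the sketch below uses hypothetical example data and assumes NumPy is available:

```python
import numpy as np

# Three variables: temperature, ice-cream sales (strongly related), and an
# unrelated noise column (hypothetical illustration data)
temperature = np.array([14.0, 18.0, 21.0, 25.0, 30.0])
sales       = np.array([120.0, 150.0, 180.0, 210.0, 260.0])
noise       = np.array([5.0, -3.0, 7.0, 1.0, -2.0])

data = np.vstack([temperature, sales, noise])
corr = np.corrcoef(data)   # 3x3 matrix of pairwise Pearson correlations
print(np.round(corr, 2))
```

Each off-diagonal entry is the correlation between one pair of variables; the diagonal is always 1.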
3.5.1 Sensor data
Sensor data is the output of a device that detects and responds to some type of input from a
physical element; in other words, sensors 'listen' to the physical world, converting energy
into electric signals. Sensors are embedded in devices that we use daily, such as
timers, thermostats, remote controls, etc. The data or output they produce may be used
to provide information or input to other systems.
In recent years, there have been enormous advances in hardware technology, such as the
development of sensors, which allow collecting many kinds of real-time data. In
addition, the cost of sensor hardware has been decreasing, allowing sensors
to become key players in several application domains, e.g. environmental monitoring of
weather and climate trends.
With the introduction of sensors into our daily work, new challenges for ICT technologies have
arisen, principally related to data collection, storage, processing and visualisation. Although these
challenges require efficient systems and technologies, using them correctly
adds value to the gathered sensor data. Accordingly, the scope of
this chapter is the visualisation of sensor data.
Both final users and application developers are interested in the information derived from
sensor data; one example of this interest is the validation of sensor measurements through
visual tools such as web browsers or mobile devices.
Nowadays, different solutions exist on the market, in the form of IoT platforms, that
allow sensor data to be easily connected to a whole range of applications, from analytics to
visual features. Some of the most relevant are:
• Microsoft Azure: For sensor data visualisation, it offers the Microsoft
Power BI tool. This tool visualises real-time data and other types of data
coming from heterogeneous sources through a complete set of dashboards and
charts. It can display the results on mobile devices as well
as in more complete user interfaces such as web applications.
• IBM Watson: IBM offers an IoT platform covering the full life-cycle of IoT
device management: from connectivity to storage, processing and visualisation. For
visualisation, boards are available on which custom dashboards can be built.
Capabilities that can be implemented with the dashboards include:
o charts for visualising real-time data from devices;
o gauges for visualising physical quantities such as temperature and pressure;
o donut and bar charts to display the current value of data points;
o views of the data and storage consumption of devices.
• FIWARE: Around the FIWARE ecosystem there exists a wide range of solutions that can
support the exploitation of sensor data, for example:
o cartodb – displays the location of the data producers on a map;
o ducksboard – a widget-based solution showing the historic evolution of
entities;
o freeboard – a very simple-to-use tool providing a complete set of
functionalities to control the life-cycle of sensor data producers.
Figure 10 shows an example of how these visualisation capabilities can be presented. The
presentation will of course vary from one vendor to another, but in every case the intention
is the same: to offer the key information in a simple and attractive
manner.
Figure 10. Visualisation of sensor data.
3.5.2 Earth Observation data
The interaction tasks vary depending on the visualization representations. Different
operations are required for spatio/temporal visualizations or hierarchical and network
structures. Shneiderman [REF-22] introduces seven tasks for information seeking when
interacting with large data sets: overview, zoom, filter, details-on-demand, relate, history,
and extract.
In visual interaction, there are two basic interaction techniques: Direct manipulation, which
allows the user to filter or select elements of visualizations, and dynamic queries where the
user interacts with sliders, menus and buttons. Direct manipulation techniques are
recommended because they do not distract attention from the analysis process. The menus,
buttons and sliders are often scattered around the user interface, and using them requires
extra effort [REF-23].
A popular technique in visual analytics is using coordinated multiple views [REF-23] which is
a specific exploratory visualization technique. Data is represented in multiple windows and
operations in the views are coordinated. This means that data elements which are selected
and highlighted in one view are highlighted concurrently in all other views that include the
same data element. This operation is often called brushing. The user can change the style of
brush, the bounding region and the brushing effects. The method is effective for discovering
outliers. An example of brushing is shown in Figure 11.
Figure 11. Brushing. The rounded area is highlighted in the histogram and on the map.
3.6 Big data frameworks
Big data processing involves a series of data collection, storage and preparation stages
before the data can be analysed. In fact, even though big data analytics is what everybody
is talking about, data preparation (e.g. collecting, curating and organising data) accounts for
up to 80% of the data scientist's work [REF-24]. Therefore, frameworks and platforms
to help manage the complete big data processing chain are needed.
An effort to model a reference framework that describes the logical components of a generic big
data system has been made by the European Big Data Value Association (BDVA), whose
framework has been used to guide DataBio platform development. Figure 12 shows the
BDVA Reference Architecture, where the numbers indicate how many tools the DataBio
project's software vendors are providing for each part of the framework.
Figure 12. BDVA Reference Architecture with numbers of DataBio components.
Another reference architecture for a big data interoperability framework has been published
by the National Institute of Standards and Technology (NIST) [REF-25] (Figure 13). The framework
defines broad level data and service use flows between the framework components, denoting
needs for application interfaces.
Figure 13. NIST Big Data Reference Architecture.
There are several existing big data platforms that implement, at least partially, the
above-mentioned frameworks. One of the main issues in big data management is scalability and
distributed data management. The Hadoop framework [REF-26] has been a very successful
distributed data processing framework and it has been widely adopted in industry and
research. The Hadoop project includes a distributed file system that provides high-throughput
access to data, a job scheduling and cluster resource management system and a system for
parallel processing of large data sets. Hadoop is widely used by the biggest data analytics
users, such as Amazon, Facebook, Google, IBM and Twitter.
The Hadoop framework is often complemented with other big data processing platforms, such as
the Spark processing engine [REF-27] or new types of databases, often referred to as NoSQL
databases. Hadoop and the other open source platforms are supported for industry use by
several service providers, such as Cloudera and Hortonworks, as well as all the major software
vendors, e.g. Microsoft, Oracle and IBM. In addition to Hadoop-based systems, broad-based
data-management vendors offer big data analytics tools, from data-integration and
database-management systems to business intelligence, with integration to their own applications.
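The parallel processing model popularised by Hadoop (MapReduce) can be illustrated with a minimal word-count sketch in plain Python; this is a conceptual stand-in for the pattern, not the actual Hadoop API:

```python
from collections import Counter
from functools import reduce

# MapReduce pattern: a map step produces partial results per input chunk,
# and a reduce step merges them. In Hadoop the chunks would be distributed
# across a cluster; here everything runs locally for illustration.

def map_chunk(text):
    """Map: count words within one chunk of input."""
    return Counter(text.lower().split())

def reduce_counts(a, b):
    """Reduce: merge two partial word counts."""
    return a + b

chunks = ["Big data in forestry", "big data in agriculture"]
total = reduce(reduce_counts, map(map_chunk, chunks), Counter())
print(total["big"], total["data"])  # 2 2
```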
An Earth Observation (EO) exploitation platform [REF-28] is a collaborative, virtual work
environment providing access to EO data and the tools, processors, and Information and
Communication Technology resources required to work with them, through one coherent
interface. As such, the exploitation platform may be seen as a new ground segment
operations approach, complementary to the traditional operations concept.
OGC Testbed-13 [REF-29] supports the development of ESA’s Thematic Exploitation Platforms
(TEP) by exercising envisioned workflows for data integration, processing, and analytics based
on algorithms developed by users. These algorithms are initially developed by TEP users in
their local environments and afterwards tested on the Exploitation Platform. The goal is to
put an application into an Exploitation Platform (EP) Application Package, upload this package
to the Exploitation Platform, and deploy it on infrastructure that is provided as a service (IaaS)
for testing and execution. An Application Deployment and Execution Service acts as a front
end to cloud platforms, and is used by clients to deploy and execute application packages.
4 Big data in agriculture
4.1 Introduction
The agriculture sector is of strategic importance for European society and economy. Due to
its complexity, agri-food operators have to manage many different and heterogeneous
sources of information. Agriculture requires collection, storage, sharing and analysis of large
quantities of spatially and non-spatially referenced data. These data flows currently present
a hurdle to uptake of precision agriculture as the multitude of data models, formats,
interfaces and reference systems in use result in incompatibilities. In order to plan and make
economically and environmentally sound decisions, a combination and management of
information is needed [REF-30].
Big data technology (BDT) is a new technological paradigm that is driving the entire economy,
including low-tech industries such as agriculture, where it is implemented under the banner
of precision farming (PF). It should be noted here that farmers are primarily focused not
on the (big) data itself, but on the knowledge generated from this data.
4.2 Status of big data in agriculture
Big data is moving into agriculture in a big way. A number of new technologies are now
influencing farming:
• Sensors on fields and crops are starting to provide data points on soil conditions, as
well as detailed info on wind, fertilizer requirements, water availability and pest
infestations.
• GPS units on tractors can help determine optimal usage of agricultural machinery.
• Unmanned aerial vehicles, or drones, can patrol fields and alert farmers to crop
ripeness or potential problems.
• RFID-based traceability systems can provide a constant data stream on farm products
as they move through the supply chain, from the farm to the compost or recycle bin.
Individual plants can be monitored for nutrients and growth rates [REF-31].
• There has been explosive growth in the use of Remote Sensing data in recent years,
in terms of both volume and velocity. Such data-collection possibilities are of
significant benefit to several application domains, including atmosphere/marine/land
monitoring, emergency management, and security. It is estimated that the
European Copernicus programme alone should bring 13.5 billion Euros and provide
around 28,000 jobs between 2008 and 2020 [REF-32].
Operative aerial Remote Sensing covers the whole area of interest when mapping fields at high
spatial resolution but with low frequency (also known as temporal resolution). The aim is to
prepare prescription maps for spatially variable application of fertilizers and pesticides,
estimated by the spectral measurement of crop parameters. The frequency of the survey
depends on the crop type, agronomic operations, crop management intensity, and weather
conditions. Aerial imaging is usually carried out using a multispectral camera by an external
provider of photogrammetric services. Analyses may be performed through interpretative
algorithms after pre-processing of the acquired images, i.e. radiometric and geometric
corrections.
Periodic satellite Remote Sensing is used for the wide-ranging identification of spatial variability
and for simultaneously capturing the dynamics of vegetation growth, both at a medium
level of spatial resolution, as in the case of Landsat 8 images at 30 metres per pixel, once
per 14 days. European Sentinel-2 data seem to be a valuable data source for periodic satellite
Remote Sensing, significantly shortening the revisit time, e.g. to about 6 days for
most of Central Europe when combining Landsat and Sentinel data. The main information lies
in the vegetation indices determined from the R (red), NIR (near infra-red), and R-edge bands.
Absolute values of vegetation indices, their relative-to-mean values for the field, and the
detection of changes in these values are used for the assessment of crop stands and for
delineating management zones. Yield potential zones are areas with the same yield level
within the fields. Yield is the integrator of landscape and climatic variability and therefore
provides useful information for identifying management zones [REF-33]. This presents a basic
delineation of management zones for site specific crop management, which is usually based
on yield maps over the past few years. Similar to the evaluation of yield variation from
multiple yield data described by Blackmore [REF-34], the aim is to identify high yielding (above
the mean) and low yielding areas related as the percentage to the mean value of the field. In
addition, the inter-year spatial variance of yield data is important for agronomists to
distinguish between areas with stable or unstable yields. Complete series of
yield maps for all fields are rare; thus remotely sensed data are analysed to determine the
in-field variability of crops through vegetation indices.
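As an illustration, NDVI, the most common of the vegetation indices mentioned above, is computed per pixel as (NIR - R) / (NIR + R). The sketch below uses hypothetical reflectance values and assumes NumPy is available; the relative-to-mean step mirrors the percentage-of-field-mean evaluation described above:

```python
import numpy as np

def ndvi(nir, red):
    """Normalised Difference Vegetation Index: (NIR - R) / (NIR + R).

    Values near +1 indicate dense healthy vegetation; values near zero
    or below indicate bare soil or water.
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)

# Hypothetical per-pixel reflectances for a small field patch
nir = np.array([[0.60, 0.55], [0.30, 0.10]])
red = np.array([[0.10, 0.12], [0.25, 0.09]])
index = ndvi(nir, red)
# Relative-to-mean values (percent of field mean), as used when
# delineating management zones
relative = index / index.mean() * 100.0
print(np.round(index, 2))
```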
Figure 14. Yield potential application.
Machinery monitoring is typically performed through the Global Navigation Satellite System
(GNSS), no matter whether it is the American Global Positioning System (GPS NAVSTAR), the
European Galileo system, the Russian GLONASS system, the Chinese BeiDou system, the
Indian NAVIC system (officially named the Indian Regional Navigation Satellite System), or
any other. The basic principles are the same for any of the abovementioned systems, even
though technical details may vary. A GNSS receiver is mounted on a moving vehicle, typically
a tractor and/or an application machine. Both the position and trajectory of a
tractor/application machine may then be tracked. Using RTK GNSS methods, a geospatial
accuracy of approx. 0.03 m can be achieved; see e.g. [REF-34]. Experiments involving cell
phone-based monitoring have also been performed; such monitoring approaches a
geospatial accuracy of up to 0.20 m, depending on the type of cell-phone network. Machinery
management is focused mainly on collecting telemetry data from machinery and analysing
it in relation to other farm data. The main challenge is access to data and data
integration, when a farmer uses tractors and equipment from various manufacturers with
different telematics solutions and different data ownership/sharing policies. In many cases
farms or agricultural service organizations own tractors of more than one brand/family.
Although the communication protocols used in the control units of farm machinery and data
collection are subject to standardization, the telematics solutions, including the data
ownership/usage policy, are usually specific to each tractor brand/family and level.
Furthermore, attention shall be paid to ISO and CEN standards regulating data sharing in
agriculture, based on input coming from industry organizations such as CEMA and AEF.
Although this is not an issue, and can even be desirable, for the tractor producer's
customer care responsible for solving technical problems on the tractor, for farmers it can be hard
or impossible to connect the data coming from the tractor with other farm data relevant for the
agronomic/economic evaluation of machinery usage. Even when the tractor
has a telematics solution, the farmer sometimes needs to use a third-party device and software
to obtain data for field-specific analysis. The Zetor Company is currently developing and testing a
modular telematics solution which is supposed to become part of all Zetor tractors. The solution
will provide several levels of functionality, ranging from basic telematics for customer care and
basic location information for the customer, to field-specific economic analysis and precision
agriculture. The highest level of the modular solution will offer connection to other data relevant
for farm management, such as field boundaries obtained from the Land Parcel Information System (LPIS),
an elevation model and possibly yield potential maps derived from EO data.
Meteorological monitoring at farm level captures the detailed dynamics of weather
conditions on the ground. Weather data, together with the positions at which they are
collected, are recorded at specific localities at a high frequency (every 10 to 15 minutes). The
main goal is to obtain data for the modelling of crop growth and to support decision making
by agronomists with respect to plant protection (the prediction of plant pests and
infestation), plant nutrition (crop growth and nutrient supply), soil tillage (soil moisture
regime), and irrigation (soil moisture).
4.3 Future developments
To transform data into knowledge, a big data analytics system is needed, which will then
provide pilot managers with highly localized descriptive (a better and more advanced way of
looking at an operation), prescriptive (timely recommendations for operation improvement,
i.e. seed, fertilizer and other agricultural input application rates, soil analysis, and localized
weather and disease/pest reports, based on real-time and historical data) and predictive
plans (which use current and historical data sets to forecast future localized events and returns).
5 Big data in forestry
5.1 Introduction
The EU-28 had close to 182 Mha of forest. Among the EU-28, Sweden reported the largest
wooded area in 2015 (30.5 Mha, 16.8% of the total area of the EU-28), followed by Spain (27.6
Mha, 15.2% of the total area of the EU-28), Finland (23.0 Mha, 12.7% of the total area of the
EU-28), and France (17.6 Mha). In 2010, 60.3% of the EU-28's forests were privately owned.
Regular engagement between different forest stakeholders drives the creation of a digital
forest ecosystem that takes advantage of big data models.
5.1.1 Development/optimization focus in forestry
Forestry is developing rapidly with the help of new technologies and procedures. Numerous
methods provide information on forests, each with their own time cycles, granularities,
accuracies, costs, and viewpoints. Effective utilization of available forest resources is thus not
based only on short-cycled, increasingly accurate and cost-effective data inventory
methods. Instead, providing easy access to the best available up-to-date information on
forests is expected to generate new applications and businesses and bring together various
users, thus enhancing the utilization of forest resources. Better data enables more efficient
and higher-quality planning and operations in the entire wood supply chain.
The Data to Intelligence (D2I) research program, which aimed to build the foundations for the
next-generation forest resource management system in Finland, recognised the following
development opportunities:
• Terrestrial laser scanning (TLS) in particular can provide the tools to measure and predict
single-tree-level above-ground biomass (AGB) components in high detail, using metrics
describing the shape and size of the trees, and Airborne Laser Scanning (ALS) could be
used to extend this information to larger areas.
• Timber assortments can be accurately predicted using TLS or a multisource approach.
Also, tree quality features can be measured accurately to further improve the value of
forest resource information.
• Automatic processing of TLS data was demonstrated to be effective and accurate and
could be utilized to make future TLS measurements more efficient.
• Multisource approaches provide new possibilities not only for improving the accuracy of
single-tree measurements but also for predicting values for larger areas.
One of the main objectives of D2I was to study the potential of operational harvester data in
updating forest resource information and as reference data for airborne laser scanning.
Figure 15. Data collected by forest machines help to evaluate harvesting conditions, for example. Photo: Erkki Oksanen.
5.2 Big Data applications in forestry - scope, impact and benefit of
digital forest management
5.2.1 Forest Big Data platform
There has already been a strong effort in Finland to build up a general Forest Big Data
platform. The goal of the research task of Metsäteho was to specify and demonstrate a
platform providing data inquiry services for users and applications to easily access available
forest data sources. The Forest Big Data covers forest resource, forest condition, and wood
procurement process data. The FBD Platform connects and refines data from various data
sources and delivers refined data to application suppliers. The application suppliers, for their
part, sell various services to end users (i.e. actors in the wood procurement chains).
Applications are divided into two groups: the FBD Applications that are developed in
cooperation with the FBD Platform and other applications that are developed without
cooperation.
Figure 16. Forest Big Data Platform with forest big data and application components (http://www.datatointelligence.fi/forest-big-data.html).
The aim of the Forest Big Data platform is to provide a uniform view of heterogeneous forest data
sources by specifying a common data inquiry interface and a data structure for representing
data and the required metadata, in particular the uncertainty. To provide easy access to the data
sources, the platform offers basic services for updating data with growth prediction models
and for combining several up-to-date data estimates by means of Bayesian data fusion. The
main aim of the FBD business is to bring added value to the end users of the FBD Applications.
Therefore, the success of the FBD business is measured by the performance of the FBD end
users rather than by actual FBD business transactions. The FBD Platform is envisaged to be
operational in 2020.
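The Bayesian data fusion mentioned above can be illustrated in its simplest form: for independent Gaussian estimates of the same quantity, the fused estimate is the precision-weighted average. The sketch below uses hypothetical stand-volume figures and is a minimal illustration, not the platform's actual fusion method:

```python
def fuse_gaussian(estimates):
    """Fuse independent Gaussian estimates (mean, variance) of one quantity
    by precision weighting - the simplest case of Bayesian data fusion.

    Returns the fused (mean, variance); the fused variance is always
    smaller than any input variance.
    """
    precisions = [1.0 / var for _, var in estimates]
    fused_var = 1.0 / sum(precisions)
    fused_mean = fused_var * sum(m / var for m, var in estimates)
    return fused_mean, fused_var

# Hypothetical stand-volume estimates (m3/ha) from two inventory sources:
# airborne laser scanning (more precise) and an aged field plot (less precise)
fused = fuse_gaussian([(210.0, 15.0 ** 2), (190.0, 40.0 ** 2)])
print(round(fused[0], 1))  # closer to the more precise ALS estimate
```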
5.2.2 Digiroad
One ongoing initiative in Finland is to establish and develop a comprehensive Forest Digiroad
Service which collects, in real time, condition and accessibility data on forest roads
all over Finland. The Forest Data Forum defines the contents of the data and the rules for providing,
sharing and utilizing the service. Trucks and drivers driving on forest roads and other
timber transportation routes continuously produce data about road conditions in an automated
way. This will help to save money in forest road maintenance and enhance traffic safety, as
well as decrease the risk of road damage. Precise information on forest roads, the forest and its soil
helps in targeting the work. This is called precision forestry. The work can already be
optimized when saplings are planted.
5.2.3 Metsaan.fi e-Service
Metsään.fi is an eService provided by a governmental body, the Finnish Forest Centre
(METSAK), to make forest resource information available to citizens free of charge.
The eService serves forest owners and forestry service providers. Metsään.fi is a
portal through which people who own forest property in Finland can conduct business related
to their forests from their own desktops. The portal connects owners with related third
parties, including providers of forestry services. This makes it easy to manage forestry work
and to be in touch with forestry professionals.
Metsään.fi is a portal which offers the latest information to forest owners on their properties.
As soon as they log in, users can see what should be done in their forests right now.
Information is displayed for each forest stand compartment, broken down by soil type, tree
type and natural occurrence, and possible logging or forestry actions are suggested, including
income and cost estimates. Maps and aerial photographs clearly show where properties are
located and what they look like. Users log in securely using their online banking codes. The
service is offered in Finnish and Swedish.
The portal saves service providers the cost and effort of visiting sites to obtain the latest data
on which to base plans. It also contains up-to-date contact details for forest owners. The aerial
photographs and maps are important tools for professionals, and for small businesses
Metsään.fi may replace the need to have their own geographic information system or CRM
system entirely. Most private Finnish forest owners are either in employment or retired, and
a growing proportion live far from the forests they own. For most owners the forests are not
a major source of income, and only a small fraction have professional forestry skills.
The portal draws information from a national forest resource database, which is continuously
updated with data obtained by laser scanning, aerial photography, sample plot
measurements and site visits. This sort of data collection is a statutory task of the Finnish
Forest Centre. Between surveys, information is maintained based on notifications received by
the Forest Centre from forest owners and forestry organisations. Now, reports on completed
work can also be submitted via Metsään.fi. Tree growth is factored into the data in the portal,
and suggested actions are updated annually.
Development of the portal is funded by the Finnish Ministry of Agriculture and Forestry.
Metsään.fi supports the fulfilment of many strategies and EU directives, including the EU
Forest Strategy, the PSI and INSPIRE directives, the development of rural livelihoods and the
promotion of biodiversity.
Metsään.fi is provided by the Finnish Forest Centre, which is a state-funded organisation for
promoting sustainable forestry and forest-based livelihoods. The portal is free of charge.
Businesses can define in which areas they want to operate.
Figure 17. Metsään.fi service with related operations and user groups.
The Finnish government has recently decided to invest 13 M€ in forestry digitalization in
2016-2018 by establishing a "Key" project for forestry. Political opinion is now, after a
long time, very positive towards forest business promotion. The forest industry is today again seen
as a business of the future, not a business of the past.
Figure 18. Entity of forest data development in Metsään.fi Service. Specific focus on improvement of data mobility and data quality, and e-service promotion. (Metsätieto 2020 - Kehittämissuunnitelma).
5.2.4 Wuudis Service
Wuudis is a full-service digital forest property management platform for forest owners, forest
contractors and forest authority expert. It is a network service enabling data and close to real
time information sharing between forest owners, contractors, timber buyers, manufacturers,
forest insurance companies and authority expert of forestry sector. It also acts as a market
place for selling timber/biomass and forest care works (harvesting, reforestation, fertilization
etc.). It enables easy and remote forest management for forest owners. It guides for planning
next forest activities that needs to be performed to exploit maximum economic benefit from
timber harvest. It can save costs and increased margin for contractors via easy scouting and
connection between forest owners and contractors. It has societal and environmental value
via promotion of sustainable forest management practices and increased mobilization of
available biomass resources for the needs of the biomass industry.
5.2.5 Forest Health Monitoring and AIS Control
Spain faces alarming situations due to several pests that seriously threaten the
health of very important species in the Iberian Peninsula, among others: Quercus ilex,
Quercus suber and Eucalyptus sp.
The tasks carried out in the DataBio project in this regard have been related to the development
of a methodology based on remote sensing images (satellite + aerial + UAV) and field data for
monitoring the health status of forests in large areas of the Iberian Peninsula. The work has
been particularly focused on the monitoring of Quercus sp. forests affected by Phytophthora
cinnamomi Rands and of the damage in eucalyptus plantations affected by the coleopteran
Gonipterus scutellatus Gyllenhal.
Specifically, for the use case of Eucalyptus and Gonipterus, and in order to test the validity of
UAV data, this pilot used the RPAS eBee (fixed wing) and a hexacopter (rotary wing) with three
different cameras: SODA (RGB), Sequoia Micasense (multispectral) and Thermomap
(thermal).
Figure 19. TRAGSA Drones used in Forestry pilot.
By processing the obtained images, TRAGSA has generated several products, such as RGB,
multispectral, NDVI and thermal reflectance ortho-mosaics.
Figure 20. Generated products (imageries) in Forestry pilot.
Using the different spectral bands provided by the cameras used in the tests, TRAGSA is
developing a model explaining the relation between several EO indices (NDVI, CARI, GNDVI,
NGRDI, ...) and the optical properties of vegetation, pigment concentration and chlorophyll
concentration. Eventually, those data will be cross-checked with field data.
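The indices named above are simple band-ratio formulas. As an illustrative sketch (not TRAGSA's actual processing chain), NDVI, GNDVI and NGRDI can be computed per pixel from the camera bands with NumPy:

```python
import numpy as np

def vegetation_indices(red, green, nir, eps=1e-9):
    """Per-pixel NDVI, GNDVI and NGRDI from reflectance bands of equal shape.

    eps guards against division by zero over dark, non-reflective pixels.
    """
    red, green, nir = (np.asarray(b, dtype=float) for b in (red, green, nir))
    return {
        "NDVI": (nir - red) / (nir + red + eps),
        "GNDVI": (nir - green) / (nir + green + eps),
        "NGRDI": (green - red) / (green + red + eps),
    }

# Healthy vegetation reflects strongly in NIR and weakly in red.
idx = vegetation_indices(red=np.array([0.05]), green=np.array([0.10]),
                         nir=np.array([0.50]))
```

For healthy canopy pixels NDVI approaches 1, while bare soil and stressed vegetation score lower; CARI involves chlorophyll-specific band positions and is omitted from this sketch.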
These indices have been generated as Big Data multi-table datasets, but a visual
example of the results can be seen in the following image:
Figure 21. Generated indices (images) in the Forestry pilot.
Although the acquisition and processing tools have proved successful, the main
problem the pilots are currently dealing with relates to the selected tree species:
the canopy (crown) density of eucalyptus is very low, and the trees usually appear
mixed with bushes, which makes isolating the selected trees difficult.
Currently, TRAGSA is developing geometric isolation methodologies in order to double-check
the produced statistical data, which will be analyzed using R or StatsGraphics.
Regarding the AIS control pilot, TRAGSA is carrying out more conventional Big
Data operations based on gathering several datasets and processing them. In this specific
pilot, the most relevant aspect is the use of large-scale images as source datasets, such as
WORLDCLIM or GHS, the population grid developed by the JRC.
5.3 Future developments
Traditionally, forestry is driven by local producers with their own practices and business
culture. Currently there is significant interest from both EU member states' national
governments and the European Commission in rapid digitalization of forestry in order to improve
profitability and competitiveness while ensuring sustainability. Big data technologies can
provide long-term and sustainable solutions for the management of the whole sector. A
critical challenge is making the benefits available to the wide range of actors and end users
within the forestry sector.
5.3.1 Opportunities and possible big data solutions
5.3.1.1 Metsään.fi e-Service
In the DataBio project, Big Data partners will integrate their existing market-ready or almost
market-ready technologies with the forest databases of METSAK, and the resulting solutions
will be piloted with the forestry sector partners, associated partners and other
stakeholders, e.g. those responsible for public policies related to nature conservation and
infrastructure/landscape/town planning.
The existing technical environment of the Metsään.fi eService already involves big data: it
draws on multiple data sources and big data types, including remote sensors, geospatial
information, images and text. The DataBio project utilizes these data sources and generates
new types of data structures and data analytics methods. The Metsään.fi eService uses the big
data through a publishing database and other existing interfaces; the data is not stored in
the eService itself. The big data volume at METSAK was 200 GB of forest resource data
at the beginning of 2017. The amount is expected to increase by around 100 GB per year during
this project, reaching around 500 GB by the end of 2019.
In Finland, there are vast amounts of passively owned forests that could serve financial
and environmental needs more effectively under active forest management. Also, many novel
forest health problems are likely to occur in the future without innovative forest management
solutions that enable appropriate management activities. A major concern of forest
authorities is how to encourage forest owners to better manage their assets. The Metsään.fi
eService for forest owners and forestry operators supports the management of privately
owned forests and enhances the use of forest resource data. The eService is continuously
developed by expanding the forest data and the functionalities related to it. The main
goal in the DataBio project and related pilots is to enhance the use of the Metsään.fi eService
and of METSAK's forest resource data. One key opportunity is to offer Metsään.fi users more
information and tools, for instance on storm damage and quality control, to support better
forest management. This can be enabled by crowdsourcing solutions, which will be piloted in
the DataBio project.
5.3.1.2 Wuudis Service
Since its launch early this year, the 'Wuudis Forest' service has gained around 400 users;
the service is validated in Finland and the number of users is increasing every day. As part
of the DataBio project, MHG Systems is aiming to develop an innovative and holistic forestry
big data solution named 'Wuudis Data' by collaborating with SENOP, Forestry TEP and the
Metsään.fi service. In addition, analyzing the large volume of user behavioural data to provide
customized information to the relevant forest stakeholders is an important feature of this
service. The 'Wuudis Data' service will be the final outcome of the three forestry pilots and
has significant commercialization potential, with benefits to the whole forest business
value chain. The expected indicative benefits of the 'Wuudis Data' service across the whole
forest value chain are shown in Figure 22.
Figure 22. Forest value chain and the expected benefits of ‘Wuudis Data’ to all segments of the value chain.
5.3.1.3 Concept of ‘Wuudis Data’ service
The 'Wuudis Data' service aims to become the most holistic service in the forestry business by
integrating multiple forestry data sources into a single web service. It provides the tools
required for easy forest management and the necessary customized data for all forest
stakeholders. The concept of the 'Wuudis Data' service is shown in Figure 23; the black boxes
are the key features and functionality of the future 'Wuudis Data' service, which are under
development as part of the DataBio forestry pilots. A business-oriented approach is used in
this development project, as the 'Wuudis Data' service will be built on top of the 'Wuudis
Forest' service and integrated with various data sources such as Forestry TEP, SENOP drone-based
monitoring and the Metsään.fi service. In short, the approach (i) integrates multiple
forestry data sources into a single web service, (ii) analyses and visualizes the combined data
for the end users based on their needs, and (iii) helps users to focus on relevant, up-to-date
information about forests and forest owners.
Figure 23. Concept of Wuudis Data.
Hyperspectral Imaging Systems for Drone Remote Sensing Platforms
Hyperspectral imaging from small unmanned aerial vehicles (UAVs) offers an agile type of remote
sensing. In forestry monitoring, the data has mainly been captured from manned aircraft and
satellites, focusing on the forest or plot level. UAV imaging enables higher spatial
resolution, improving the resolution of photogrammetric point clouds and the acquisition of
three-dimensional (3D) structural data from the forest. In this sense, satellite data can be
locally magnified by UAV hyperspectral data to obtain information about individual trees,
including their species and health status via a more accurate radiometric image, and accurate
heights via a more precise canopy height model.
For the growing UAV remote sensing market, Senop has done pioneering work by manufacturing a
small, lightweight camera that can be easily mounted on a drone. The Senop camera is a unique
frame-based hyperspectral imaging device based on a variable air gap Fabry-Pérot
interferometer (FPI) operating in the visible to near-infrared spectral range (500-900 nm).
Within the forestry 2.3.1 pilot, the spectral data produced by the Senop hyperspectral camera
has a spatial resolution of less than 10 cm per pixel. This data will be
processed into georeferenced spectral maps, i.e. radiometric orthorectified image mosaics,
and into 3D point clouds and digital surface models (DSMs) with the EnsoMOSAIC Fusion image
processing software provided by MosaicMill Ltd. Further, these maps can be joined with
other 3D point clouds and digital surface models of the complex 3D structures of forests.
The obtained radiometric image mosaics and 3D point clouds will be analyzed with algorithms
provided by Simosol Ltd. Their forestry simulators and growth modelling tools enable a
unique ecosystem service of tree-wise monitoring and mapping for studying
e.g. the effects of fertilization and infestation.
Figure 24. The concept of the new Senop hyperspectral camera, released in 2018.
5.3.1.4 Sentinel-2 based monitoring system
Within the forestry pilot by FMI, various big data analyses of optical Sentinel-2 satellite
data will be performed. With high spatial and temporal resolution data like Sentinel-2, an
enormous amount of data is generated every five days, allowing near-real-time
monitoring of forest ecosystems at country or continent scale. This is, however, only possible
by utilizing big data approaches to pre-process and interpret the data. FMI's pilot addresses
two separate satellite big data tasks: 1) automated generation of time series of
cloud-free reflectance images covering the Czech Republic in the peak vegetation
growing season, and 2) interpretation of the cloud-free images with regard to forest health
conditions.
Using all available observations from the pair of Sentinel-2 satellites, the quality of each
image pixel can be assessed independently in a selected time interval and synthetic reflectance
images can be generated on a per-pixel basis (so-called spatial-temporal analysis, or an L3
product). Once the data archive of pre-processed Sentinel-2 images (big data) is established,
the end user may generate such reflectance images for any selected time interval in a fully
automated manner. This may cover key phenological vegetation stages (spring leaf emergence,
peak growing season,
autumn senescence), or generate a timely cloud-free image to assess rapid forest changes (e.g.
wind-fall, insect infestation). For an example of the output, see Figure 25.
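The per-pixel selection described above can be sketched as follows: given a time stack of reflectance images and a per-pixel quality measure (here, an assumed cloud probability per observation), keep the clearest observation for each pixel. This is a simplified stand-in for the actual L3 spatial-temporal processing, not FMI's implementation:

```python
import numpy as np

def l3_composite(reflectance, cloud_prob):
    """Per-pixel synthetic reflectance: for each pixel, keep the observation
    with the lowest cloud probability across the time stack.

    reflectance: array (T, H, W) of reflectance values over T acquisition dates.
    cloud_prob:  array (T, H, W) of per-pixel cloud probabilities in [0, 1].
    """
    best = np.argmin(cloud_prob, axis=0)       # (H, W) index of clearest date
    rows, cols = np.indices(best.shape)
    return reflectance[best, rows, cols]

# Toy stack: 3 dates over a 2x2 tile; date 1 is clear except at pixel (0, 0).
refl = np.array([[[0.9, 0.9], [0.9, 0.9]],    # cloudy scene (bright)
                 [[0.2, 0.1], [0.1, 0.1]],    # mostly clear scene
                 [[0.1, 0.8], [0.8, 0.8]]])   # clear only at (0, 0)
cloud = np.array([[[0.9, 0.9], [0.9, 0.9]],
                  [[0.6, 0.1], [0.1, 0.1]],
                  [[0.2, 0.8], [0.8, 0.8]]])
composite = l3_composite(refl, cloud)
```

The resulting 2x2 composite takes each pixel from whichever date was clearest there, which is the essence of generating a cloud-free mosaic over any chosen time window.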
Figure 25. Example of a cloud-free reflectance image of the forests of the Czech Republic, generated using big data spatial-temporal analysis utilizing all available Sentinel-2 observations between June and August 2016.
Interpretation of the resulting cloud-free images will be based on the analysis of time series
of vegetation indices and quantitative forest products. For this, extensive in-situ ground
truth data collection will be performed to sample forest structural parameters. The sensitivity
of the satellite-derived products will be studied based on these in-situ data, and the best
performing products will be used in the time series analysis (see Figure 26).
Figure 26. Example of a satellite-derived product describing forest health status: the amount of chlorophyll in forest canopies. Red areas are identified as forests with low chlorophyll content. A cloud-free image mosaic generated from Sentinel-2 big data was used as input to the algorithm.
6 Big data in fishery
6.1 Introduction
The fishing fleet is increasingly sophisticated, with numerous sensors installed on each vessel
for finding fish, navigating and communicating with the outside world. The temporal and
spatial variability of fish stocks gives fishermen an inherent need to share,
restrict and seek out information from each other. Information shared between
fishermen may be bartered, given on a friendship basis or obtained from public sources such
as auction results, buyer reports and publicly available tracking services and statistics. Several
separate groups benefit from data collection and sharing between fishing vessels: the
fishermen, the managing companies, research institutes, and government bodies.
The current way to obtain information about fisheries activities (where the fishing is
good, which species are caught, and which vessels are active) is through
communication technologies. Sales data are routinely accessed to learn
which vessels deliver what quantities of which species where. AIS (Automatic Identification
System) tracking portals are used to get an overview of the regions where vessels operate,
provided those regions are within AIS coverage; see MarineTraffic's information service below
for an example of such a portal. Telephone (both mobile- and satellite-based) may be
used to contact specific vessels or companies to get a first-hand account of the conditions
and to obtain bits of information used for trip planning. This process is manual, and
access to information is limited by the availability of industry contacts and the willingness to
share.
The vessels are operated by businesses in which the shipowner controls both the vessel and its
resource base. The catch is landed per arrangement with buyers or by habit and location. Each
fishing company reports its catch diaries to the regulatory bodies; catch information
accumulates on each vessel (or in each company), while sales and deliveries of fish are
collected into publicly available statistics. The businesses maintain their own experience data
from past fisheries, while landing statistics are available online from various sales
organizations.
6.1.1 Vessel monitoring systems and fisheries management
Vessel monitoring systems (VMS) are defined by Wikipedia as systems used in commercial
fishing to allow environmental and fisheries regulatory organizations to track and monitor the
activities of fishing vessels, both in a country's territorial waters and in the Exclusive
Economic Zone extending 200 nautical miles from its coasts. VMS is used to
improve the management and sustainability of the marine environment by ensuring proper
fishing practices and helping prevent illegal fishing. VMS denotes the specific application of
monitoring commercial fishing boats and should not be confused with VTS (Vessel Traffic
System), which aims to monitor marine traffic primarily for safety and efficiency in ports and
busy waterways (of the information services mentioned here, MarineTraffic is more of a VTS,
while Global Fishing Watch is more of a "global" VMS). VMS implementations,
requirements and protocols vary between countries. The EU, including Norway through the EEA,
requires VMS and Electronic Reporting Systems (ERS) aboard all fishing vessels longer than 15
meters (above 12 meters since 2012). Figure 27 outlines the components of a VMS.
Figure 27. Illustration of VMS (from EC commission, Fisheries policy – control technologies).
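The length thresholds quoted above can be expressed as a one-line rule. The function below is purely illustrative and ignores the further conditions and exemptions of the actual regulations:

```python
def requires_vms_ers(length_m, year):
    """Return True if an EU fishing vessel must carry VMS and ERS equipment,
    per the thresholds quoted in the text: longer than 15 m before 2012,
    longer than 12 m from 2012 onwards. (Illustrative rule only; the real
    regulations contain further conditions.)
    """
    threshold = 12.0 if year >= 2012 else 15.0
    return length_m > threshold

# A 13 m vessel falls under the requirement only after the 2012 change.
before = requires_vms_ers(13.0, 2010)
after = requires_vms_ers(13.0, 2015)
```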
Furthermore, the governing bodies (the EC and the national authorities of EU and EEA member
states) require fishermen and landing sites by law to report catch data back to them for
monitoring purposes. Prior to landing, the catch volume is also reported to the appropriate
sales association for the catch species and auctioned to determine the landing site. Fishing
companies are increasingly replacing paper logbooks with ERS systems integrated with each
vessel in their fleet to support efficient quota management. Dualog is one example of a company
providing ERS software for catch journals and quota management with their eCatch application
(www.dualog.no), which is used by many shipping companies in Norway.
ICES, the International Council for the Exploration of the Sea, is the governing body that
determines the status of fish populations and recommends sustainable quotas for the coming
year through its annual meeting of fishery biologists from Europe and North America. The
ICES advice is part of the EU negotiations and helps set quotas for both EU and ICES member
states. ICES's advice carries heavy weight as input for settling quotas at international,
bilateral and national level.
6.1.2 Optimization focus in fishery
Fuel consumption is a challenge for most fisheries, as it represents 60-70% of the total annual
cost of operating a vessel ([REF-35], [REF-36], [REF-37], [REF-38]). Nowadays, decisions about
vessel routes are taken subjectively by expert fishermen based on their own
experience, technological devices (sonar, meteorological forecasts) and increasing
communication with local scientists (e.g. presence/absence forecasts from habitat models).
Apart from the initial planning based on the best past fishing areas and current
meteorological forecasts, the presence or absence of fish at each attempted fishing point
(spatial correlation) as well as unforeseen events (bad weather, instrument failures, etc.)
need to be considered. This has been approached in the past using interactive optimization
[REF-39], [REF-40], which has also been used in maritime transportation planning [REF-41]. A
critical task is the definition of a fitness function that accurately represents the real
world, which often requires an iterative process of eliciting the fitness function from the
expert [REF-40]; this explains why so far there are only a few attempts or proofs of concept
aiming at optimizing some elements of fishing activities ([REF-42], [REF-43]). Moreover, they
focus on a single activity or destination (e.g. routing to the fishing area; [REF-44],
[REF-42]) or a single decision driver (e.g. meteorological conditions; [REF-42], [REF-45]).
None of those works has the overall objective of maximizing benefits and reducing costs for an
entire fishing fleet.
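As a hedged illustration of what such a fitness function might look like (the field names, prices and weights below are invented for the example, not taken from the cited works), candidate fishing areas could be scored by expected revenue against round-trip fuel cost, discounted by weather risk:

```python
def fitness(area, fuel_price=0.65, fish_price=1.8):
    """Hypothetical fitness of steaming to one candidate fishing area:
    expected revenue minus round-trip fuel cost, scaled down by weather risk.

    area: dict with distance_nm, expected_catch_kg, fuel_per_nm (litres/nm)
    and weather_risk in [0, 1] (chance the trip is aborted). All values
    and prices here are illustrative.
    """
    revenue = area["expected_catch_kg"] * fish_price
    fuel_cost = 2 * area["distance_nm"] * area["fuel_per_nm"] * fuel_price
    return (1.0 - area["weather_risk"]) * revenue - fuel_cost

# The nearer area wins here: the far area's extra catch is eaten up
# by fuel cost and a higher weather risk.
areas = [
    {"name": "A", "distance_nm": 40, "expected_catch_kg": 3000,
     "fuel_per_nm": 25, "weather_risk": 0.1},
    {"name": "B", "distance_nm": 120, "expected_catch_kg": 5000,
     "fuel_per_nm": 25, "weather_risk": 0.4},
]
best = max(areas, key=fitness)
```

In interactive optimization, the expert would inspect rankings like this one and the weights would be adjusted iteratively until the function reflects real decision making.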
6.1.3 Machine learning applications in fishery
Machine learning approaches using satellite data have been successful in the past, for
example in forecasting species recruitment and identifying new potential predictors ([REF-
46], [REF-47], [REF-48], [REF-49]). In particular, further time-series analysis of anchovy
recruitment forecasting showed that a new predictor based on climate patterns could explain
a seasonal behaviour [REF-46]. These methods can also be combined with expert knowledge:
for example, novel machine learning methods can take advantage of suspected interactions
between species, doubling the chance of correctly predicting all of them simultaneously
[REF-47], or be combined with mechanistic models to take advantage of both modelling
approaches [REF-50].
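The kind of predictor-based recruitment forecast described above can be sketched with ordinary least squares on a synthetic climate index (a toy stand-in, not the models of the cited works):

```python
import numpy as np

def fit_recruitment_model(climate_index, recruitment):
    """Fit recruitment ~ a + b * climate_index by ordinary least squares.

    climate_index and recruitment are 1-D arrays of yearly values; this is
    a minimal stand-in for the machine-learning forecasts discussed above.
    """
    X = np.column_stack([np.ones_like(climate_index), climate_index])
    coeffs, *_ = np.linalg.lstsq(X, recruitment, rcond=None)
    return coeffs  # (intercept, slope)

def predict(coeffs, climate_index):
    return coeffs[0] + coeffs[1] * climate_index

# Synthetic yearly series in which recruitment tracks a climate anomaly.
anomaly = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
recruit = np.array([2.0, 3.0, 4.0, 5.0, 6.0])  # arbitrary units
coeffs = fit_recruitment_model(anomaly, recruit)
```

A real forecasting pipeline would of course validate such a model out of sample and compare several candidate predictors, as the cited studies do.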
Recent advances in image analysis have shown promising results for automated classification
of marine samples. The methodology is based on taking a digital image of zooplankton
samples with a scanner [REF-51] or a digital camera [REF-52], and using machine learning
algorithms to identify the zooplankton individuals in the image, classify them into
taxonomic groups (defined by the user), and measure each specimen separately
to obtain estimates of abundance, biomass, and size spectrum per taxon ([REF-53]; [REF-54];
[REF-55]). These methodologies allowed several thousand samples to be processed in [REF-54].
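The specimen identification and measurement step can be illustrated with a minimal connected-components pass over a thresholded sample image. Real systems use far richer features and trained classifiers; this sketch only reports blob sizes in pixels:

```python
from collections import deque

import numpy as np

def measure_specimens(binary):
    """Count and size the connected bright blobs in a thresholded sample
    image (4-connectivity). Blob sizes in pixels stand in for specimen area.
    """
    binary = np.asarray(binary, dtype=bool)
    seen = np.zeros_like(binary)
    sizes = []
    h, w = binary.shape
    for i in range(h):
        for j in range(w):
            if binary[i, j] and not seen[i, j]:
                # Breadth-first flood fill of one specimen blob.
                size, queue = 0, deque([(i, j)])
                seen[i, j] = True
                while queue:
                    y, x = queue.popleft()
                    size += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w \
                                and binary[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
                sizes.append(size)
    return sorted(sizes)

# Tiny thresholded image with three "specimens" of sizes 2, 2 and 1 pixels.
img = np.array([[1, 1, 0, 0],
                [0, 0, 0, 1],
                [0, 1, 0, 1]])
sizes = measure_specimens(img)
```

From such per-specimen sizes, abundance and a size spectrum per taxon follow directly once each blob has also been assigned a taxonomic class.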
A major advantage of this methodology is that it only requires inexpensive equipment and,
after the initial setup and training [REF-56], it can be very fast and operated by non-specialist
personnel. It can estimate plankton abundance and biomass from large numbers of
samples quickly and thus cost-effectively ([REF-54], [REF-55], [REF-57]), albeit with lower
taxonomic accuracy [REF-52]. However, applying such methodologies to phytoplankton
classification and abundance estimation remains a challenge due to the small sizes of
the individuals, from 5 µm for Pseudo-nitzschia species to 50 µm for other species (Dinophysis,
Alexandrium, Lingulodinium). Nowadays there are few systems that can digitize them, and those
systems are big, expensive and require constant attention from a human operator. All this has
limited the number of species and samples targeted in past studies ([REF-58]; [REF-59]).
Sonars and echo-sounders are widely used for remote sensing of life in the marine
environment. Preliminary work shows the potential of automated analysis of commercial
medium-range sonar signals for detecting the presence or absence of tuna around fishing
vessels, as a proof of concept to increase data acquisition capacity in a cost-effective way
[REF-60]; scientific surveys are very costly and of limited coverage [REF-61]. The approach in
[REF-60] uses image processing techniques to analyse sonar screenshots: for each sonar image,
measurable regions are extracted and their characteristics analysed. Scientific data was used
to label each region (tuna or no-tuna) and build a dataset to train and evaluate
classification models using supervised learning.
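A minimal sketch of that supervised-learning step might look as follows, assuming each sonar region has already been reduced to a small feature vector (the area and echo-level features here are invented), and using a deliberately simple nearest-centroid classifier in place of the models actually evaluated in [REF-60]:

```python
import numpy as np

def train_centroids(features, labels):
    """Nearest-centroid training: one mean feature vector per class.

    features: (n, d) array of per-region descriptors (e.g. area, echo level);
    labels: length-n array of class ids (here 0 = no-tuna, 1 = tuna).
    """
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}

def classify(centroids, x):
    """Assign region descriptor x to the class with the closest centroid."""
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

# Toy training set: tuna regions are larger and brighter on the sonar image.
feats = np.array([[5.0, 0.2], [6.0, 0.3], [40.0, 0.8], [45.0, 0.9]])
labs = np.array([0, 0, 1, 1])
centroids = train_centroids(feats, labs)
```

Evaluation against held-out, scientifically labelled regions would then measure how well such a model generalizes, which is the crux of the cited proof of concept.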
[REF-62] and [REF-63] used backscatter energy levels at multiple frequencies (discrete
frequency analysis) as features for classifying fish species from echosounder data.
Over the years, the Institute of Marine Research in Norway has conducted many research
projects together with SIMRAD (Kongsberg Maritime) to quantify and identify fish schools
through hydroacoustic data ([REF-64]; [REF-63]). Furuno and Simrad are among the
top professional fish-finding instrument brands globally today, with commercial product lines
for sonars and echo sounders dating back to the 1940s and 1950s. Both are currently
positioning themselves to improve their business by applying big data technology to
provide more sophisticated analyses and services that increase the value of their fish-finding
instruments. However, as of October 2017, neither company has sonars or echo
sounders with this technology commercially available.
6.1.4 Big Data information services in fishery
Olex is a successful Norwegian company selling a system that combines data from GPS and
echo sounders to provide detailed bathymetric maps, based on crowdsourcing data from their
customers and sharing the collated data among them. This has worked very well for two decades
(the company was established in 1996), and the system is highly popular, with more than 2500
users contributing data in north-west Europe (see www.olex.no for the full list of vessel
installations). Olex has shown that a collective of fishermen sharing their data can produce
results far beyond what the mapping community could achieve alone. Their
system is highly relevant in that it already records and shares data from echo sounders. If
the system were extended to record and report observed biomass estimates in addition to
seabed depth, Olex's popularity could efficiently boost the expansion of hydroacoustic data
gathering from the fishing fleet.
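The core of such crowdsourced bathymetry can be sketched as binning soundings into grid cells and averaging. This illustrates the idea only and is not Olex's actual algorithm (cell size and field names are assumptions):

```python
import numpy as np

def grid_depths(lat, lon, depth, cell=0.01):
    """Average crowdsourced depth soundings into a regular lat/lon grid.

    cell is the grid spacing in degrees; returns a dict mapping (row, col)
    grid cells to the mean depth in metres of the soundings falling there.
    """
    rows = np.floor(np.asarray(lat) / cell).astype(int)
    cols = np.floor(np.asarray(lon) / cell).astype(int)
    cells = {}
    for r, c, d in zip(rows, cols, depth):
        cells.setdefault((r, c), []).append(d)
    return {rc: float(np.mean(ds)) for rc, ds in cells.items()}

# Two soundings fall in the same cell; one lands in a neighbouring cell.
grid = grid_depths(lat=[63.001, 63.002, 63.015],
                   lon=[8.001, 8.002, 8.001],
                   depth=[100.0, 110.0, 200.0])
```

With many vessels contributing, such averaging smooths out per-sounding noise, which is exactly why the collective map exceeds what any single vessel could chart.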
MarineTraffic is a very popular information service for finding the location of, and other
information about, vessels, ports, stations and offshore installations, including arrival and
departure times. The service has more than 6 million monthly users.
Figure 28. MarineTraffic information portal showing vessel traffic in Northern Europe based on AIS data (from www.marinetraffic.com).
Global Fishing Watch [REF-65] (www.globalfishingwatch.org) is a large project that maps
fishing activity based on machine learning of vessel motion patterns [REF-66]. It builds on
massive AIS (Automatic Identification System) datasets dating back to 2012, with a 72-hour
latency for AIS data increments. The project's strength is its massive, global analysis of AIS
data; although AIS alone is sparse data on which to base global monitoring of fishery
activity, more data partners are joining the project as it moves forward. The transparent data
sharing policy makes the project stand out as a unique global resource. Although the
global map of fishing activity requires a steady high-bandwidth connection, the open data
portal and source code website give access to highly relevant fisheries data that can be
processed onshore as part of the planning phase. An example of one month of fishing
activity (July-August 2017) in the Norwegian and Barents Seas is shown in Figure 29.
Figure 29. Norwegian Sea fishing activity according to Global Fishing Watch (Jul/Aug 2017).
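A crude, speed-only sketch of detecting fishing activity from AIS tracks is shown below. The thresholds are invented for illustration; Global Fishing Watch's actual models learn far richer motion patterns than speed alone:

```python
def classify_track_points(speeds_kn, low=0.5, high=4.5):
    """Label each AIS track point as likely 'fishing' or 'transit' from
    speed alone: trawling and longlining typically happen at a few knots,
    while transit speeds are higher. Thresholds here are illustrative.
    """
    return ["fishing" if low <= s <= high else "transit" for s in speeds_kn]

# A vessel steams out at ~11 kn, fishes at ~3 kn, then steams home.
labels = classify_track_points([11.0, 10.5, 3.2, 2.8, 3.0, 12.0])
```

Aggregating such per-point labels over millions of tracks is what turns raw AIS feeds into the global activity maps the project publishes.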
BarentsWatch is a comprehensive monitoring and information system with a public portal for
large parts of the northern seas, focusing on the North Atlantic from Scotland to the Arctic
waters (www.barentswatch.no). It was launched in 2012 and includes the set of services
shown in Figure 30. The FishInfo service is especially relevant, as it shows where fishing
activity is ongoing and which areas have been closed or restricted for fishing (see Figure 31).
It also includes information about the ice edge and ice concentration, seabed bottom types
including coral reefs, offshore subsea facilities, and active and planned seismic surveys.
Hence, it is already a comprehensive portal with relevant information for fishery, and more
information layers are continuously being integrated based on a prioritization of their
usefulness. Map files can be downloaded and are compatible with the Olex system and several
chart plotters.
Figure 30. Information services in BarentsWatch (barentswatch.no, accessed 29/11/2017).
Figure 31. The FishInfo service: example showing fishing activity with nets (blue), lines (red) and purse seiners (purple) as well as restricted (black polygons) and closed (filled polygons) fishing areas (from the fiskinfo.no website).
6.1.5 Open data providers relevant for fishery Big Data analytics
EMODnet (the European Marine Observation and Data Network, www.emodnet.eu) is an
organization supported by the EU's maritime policy which aims to be the gateway to marine
data in Europe. A central challenge is making available the marine data collected by many
different institutions and research projects across Europe, which for many years have often
been carried out in a fragmented way. EMODnet provides access to European marine data
across seven discipline-based themes as seen in Figure X, with each theme having a specific
gateway with access to standardized observations, data quality indicators and processed data
products. As examples of data relevant for fishery, the human activities portal includes catch
statistics per port, the biology portal has field observation data for many marine species,
and the physics portal has sea surface and depth profile temperature data; much more data is
available beyond these examples. However, while the data are diverse and large, work is still
needed on standardisation of data access and filtering functionality (lat/long rectangles and
time) as well as data collation and integration support, especially across themes (as
experienced at the www.opensealab.eu event, where DataBio was represented by Team
CLP, see https://github.com/EMODnet/OpenSeaLab). EMODnet is stimulating marine
innovation through open data sharing and by encouraging developers to provide their marine
applications as open source through GitHub.com.
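The lat/long-rectangle-and-time filtering called for above can be sketched as a simple predicate over observation records (the record field names and values here are assumptions for the example, not an EMODnet schema):

```python
from datetime import date

def filter_observations(obs, lat_min, lat_max, lon_min, lon_max, start, end):
    """Keep the observation records falling inside a lat/lon rectangle and a
    date window. Each record is a dict with lat, lon and a datetime.date.
    """
    return [o for o in obs
            if lat_min <= o["lat"] <= lat_max
            and lon_min <= o["lon"] <= lon_max
            and start <= o["date"] <= end]

# Toy sea-surface-temperature observations; only the first matches the query.
obs = [
    {"lat": 60.1, "lon": 4.2, "date": date(2017, 6, 1), "sst": 11.2},
    {"lat": 70.5, "lon": 19.0, "date": date(2017, 6, 3), "sst": 7.8},
    {"lat": 60.3, "lon": 4.5, "date": date(2016, 6, 1), "sst": 10.9},
]
hits = filter_observations(obs, 59, 61, 4, 5,
                           date(2017, 1, 1), date(2017, 12, 31))
```

Standardising exactly this access pattern across all seven themes is what would make cross-theme collation straightforward for portal users.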
Other open data portals of high relevance for fishery include the UN Comtrade Database and
the World Bank Open Data portal for international trade statistics, and the European Market
Observatory for Fisheries and Aquaculture (EUMOFA) and Eurostat for EU-specific statistics.
NOAA (the National Oceanic and Atmospheric Administration) in the US and the
Copernicus Marine Environment Monitoring Service in the EU are highly relevant information
hubs for weather, climate and EO observations. There are also national services that give
more detailed insight into each country's fishery and economic statistics. A
summary of relevant open data providers for fisheries, with hyperlinks, is given in Table 2.
Table 2. Open data providers relevant for fisheries.
World Bank Statistics (global & country economics; CSV, Excel, XML, JSON and more):
• https://datahelpdesk.worldbank.org/
• API overview: http://data.worldbank.org/developers/api-overview
• WDI Indicators: https://data.worldbank.org/data-catalog/world-development-indicators
• Third-party apps: https://data.worldbank.org/products/third-party-apps
UN Comtrade Database (merchandise and services trade):
• http://comtrade.un.org
• Data availability: https://comtrade.un.org/data/da
• API: https://comtrade.un.org/data/doc/api/#DataRequests
Eurostat (EU statistics; SDMX, JSON):
• http://ec.europa.eu/eurostat/
• Web API: http://ec.europa.eu/eurostat/data/web-services
EUMOFA (EU market data: the most important fishery data; extracts data from Eurostat and national databases of member states):
• http://www.eumofa.eu/
• Example dashboard: http://www.eumofa.eu/macroeconomic
EMODnet & EurOBIS (seven themes, from physics to biology):
• http://www.emodnet.eu/portals
• http://www.eurobis.org/dataset_list
Copernicus (EU) and NOAA (US) (EO, weather, climate, waves, temperatures):
• http://copernicus.eu/
• Marine data sets: http://marine.copernicus.eu
• http://www.noaa.gov
Nature.com (database of commercial, small-scale and illegal catch):
• https://www.nature.com/articles/sdata201739
University of Tasmania, IMAS - Institute for Marine and Antarctic Studies (global fisheries landings):
• http://metadata.imas.utas.edu.au/geonetwork/srv/eng/metadata.show?uuid=c1fefb3d-7e37-4171-b9ce-4ce4721bbc78
National portals, Norwegian example (catch regulations, export data):
• Fisheries Directorate: https://www.fiskeridir.no
• Seafood Council: http://seafood.no
6.1.6 Fisheries and open source software
There is a comprehensive open source community with software relevant for marine research
and fisheries; a search on GitHub reveals:
• 1570 repositories related to "marine"
• 943 repositories related to "fisheries" or "fishery"
• 4417 repositories with "sonar" in the title
It is hard to say how many of these projects are relevant for fisheries, but even if the
percentage is quite low, there will be several interesting applications worth investigating. An
established open-source community developing applications for the marine environment and
fisheries already exists, and easy access to open data, as through the EMODnet initiative, will
help accelerate its growth. The scope here is not to give a comprehensive overview of open-
source fisheries projects, but rather to acknowledge this community's existence and highlight
especially relevant projects.
FOCUS (Fisheries Open source CommUnity Software) is an open-source community with the
goal of offering a free suite of tools that support fisheries management organisations in
contributing to sustainable fisheries (www.focus.fish). The project has signed an SDG
partnership with the
UN and is also supported by the European Commission (see
https://ec.europa.eu/fisheries/cfp/control/technologies_en). The main open source
contributor so far is the Swedish Agency for Marine and Water Management with the Union
VMS co-op project (see https://github.com/UnionVMS). The community was established
only a year ago (September 2016) with the high ambition to "be the global reference for
standards and innovative open source solutions for sustainable fisheries management". A key
challenge is the integration of very diverse data sets, and FOCUS supports the UN/CEFACT
FLUX (Fisheries Language for Universal eXchange) standards for information exchange to
overcome the barrier posed by diverse national reporting standards.
Figure 32. The FLUX standards and status (from the UN ESCAP presentation of Dr Heiner Lehr) [REF-67].
The types of data exchanged include:
• Information between stakeholders on stocks, quotas and catches
• Real-time monitoring of vessel positions (VMS) and ongoing fishing activities
• Reporting of fish landed and sales
• Vessel data and characteristics
• License and fishing authorisation requests
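To make the exchange idea concrete, the sketch below builds a simplified vessel-position report as XML. The element names (`VesselPositionReport`, `VesselID`, etc.) are hypothetical illustrations and do not follow the actual UN/CEFACT FLUX schema.

```python
# Illustrative only: a simplified VMS-style position report serialized as XML.
# Element names are invented for this sketch, not taken from the FLUX standard.
import xml.etree.ElementTree as ET

def build_position_report(vessel_id: str, lat: float, lon: float, timestamp: str) -> str:
    """Serialize one vessel-position record as a small XML document."""
    report = ET.Element("VesselPositionReport")      # hypothetical root element
    ET.SubElement(report, "VesselID").text = vessel_id
    pos = ET.SubElement(report, "Position")
    ET.SubElement(pos, "Latitude").text = f"{lat:.5f}"
    ET.SubElement(pos, "Longitude").text = f"{lon:.5f}"
    ET.SubElement(report, "Timestamp").text = timestamp
    return ET.tostring(report, encoding="unicode")

xml_doc = build_position_report("NOR-001234", 62.47215, 6.15492, "2017-11-01T12:00:00Z")
print(xml_doc)
```

The value of a common schema is that any national authority can parse such a message without bilateral agreements on format.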
FOCUS is a recent but important initiative with strong support from the UN and EC, and it
has the momentum to become the focal point of open-source fisheries development,
implementing the FLUX standard for data exchange and bringing more transparency to fisheries.
Furthermore, it is important to mention that SAM (the State-space Assessment Model),
developed by Anders Nielsen and Casper W. Berg of DTU Aqua, is used by ICES to estimate
stock development in at least ten of the most economically important fisheries in Europe [REF-68].
The model is web-based: anyone can enter data, inspect the intermediate results and
figures used to generate a result, and rewind all results to see the data
used to reach a specific conclusion. This provides high transparency and easier insight,
both among the researchers themselves and between ICES and the fishermen.
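The state-space idea behind models like SAM can be illustrated with a minimal sketch: treat (log) stock biomass as a random walk observed through noisy survey indices, and recover it with a one-dimensional Kalman filter. This is a toy illustration of the general technique, not the SAM model itself, and all numbers are invented.

```python
# Minimal state-space sketch: the true (log) biomass is assumed to drift as a
# random walk; surveys observe it with noise; a 1-D Kalman filter estimates it.

def kalman_filter(observations, process_var=0.05, obs_var=0.2):
    """Filter noisy observations of a random-walk state; return state means."""
    estimate, variance = observations[0], 1.0    # initialise at first observation
    means = []
    for y in observations:
        variance += process_var                  # predict: state may have drifted
        gain = variance / (variance + obs_var)   # weight given to new observation
        estimate += gain * (y - estimate)        # update toward the observation
        variance *= (1.0 - gain)
        means.append(estimate)
    return means

# Invented noisy log-biomass survey indices for a slowly declining stock
surveys = [2.00, 2.10, 1.85, 1.95, 1.70, 1.80, 1.60]
smoothed = kalman_filter(surveys)
print([round(m, 2) for m in smoothed])
```

The filtered series is smoother than the raw survey indices, which is exactly the property that makes such models useful for stock assessment.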
6.2 Conclusion
There is a lot of activity in the marine sector related to Big Data and fishery, and it is difficult
to get a full overview of the diverse initiatives within the limited time available for this
overview. A key observation from established and ongoing work is that the key priorities are
open data access, standardisation and integration of very diverse data sets, and reporting
and visualization of fisheries activity. In short, current services focus on reporting and
monitoring what is going on in the fisheries.
Global Fishing Watch is the prime example above of leveraging machine learning on a global
scale for detecting past and recent fishing activity. However, although the project is working
on integrating more fishery data sets, the service is based mainly on AIS data, which contains
only vessel identity, position and destination information. The same comment applies to the
MarineTraffic service. While both are valuable services and contributors to transparency
about what goes on in the marine sector, they also highlight the importance of and
need for data integration. When multidisciplinary data can be analysed together in new ways,
leveraged by Big Data technology, important extensions to existing knowledge as well as new
insights can be extracted.
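As a rough illustration of how fishing activity can be inferred from AIS data, the sketch below applies a simple speed-band rule: trawling typically shows sustained low speeds, while transit is faster. Global Fishing Watch uses far more sophisticated machine learning on full vessel tracks; the thresholds here are assumptions for illustration only.

```python
# Toy classifier for AIS track segments: sustained speeds in a low band
# (assumed here to be roughly 2-5 knots) suggest trawling; higher sustained
# speeds suggest transit. Real systems learn such patterns from labelled tracks.

def classify_segment(speeds_knots, low=2.0, high=5.0, min_fraction=0.7):
    """Label a segment 'fishing' if most position fixes fall in the trawl band."""
    in_band = sum(1 for s in speeds_knots if low <= s <= high)
    return "fishing" if in_band / len(speeds_knots) >= min_fraction else "transit"

print(classify_segment([3.1, 3.4, 2.8, 4.0, 3.6]))      # sustained trawl speeds
print(classify_segment([10.2, 11.0, 9.8, 10.5, 10.9]))  # steaming to the grounds
```

Even this crude rule shows why AIS alone is limited: it can say *that* a vessel is probably fishing, but not what it catches, which is why integration with catch and licence data matters.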
6.2.1 Current impact of Big Data in fisheries
Current fishery services and portals go a long way in aiding fishermen, shipping
companies and authorities:
● VMS and VTS systems show where fishing activity takes place and where vessel traffic
and offshore installations are, helping to increase transparency and making it harder
for illegal, unreported and unregulated (IUU) fishing activities to go unnoticed.
● Weather forecast services and systems like fishinfo aid fishermen in planning routes
to the fishing grounds and deciding where to fish, taking into account weather,
environment, ongoing fishing activities and regulations.
● Open data portals are doing a great job in making data available and discoverable,
while much work remains to make it interoperable and to facilitate data collation
across different scientific domains. Important standardization work for data
integration is ongoing, but the different data exchange services also need to
implement the standards.
6.3 Future developments
The fisheries industry is currently at the starting point of leveraging Big Data technology.
While the European Commission and other international and national governance and
research organizations have a strong focus on maximizing the innovation potential of open
data access, many industrial stakeholders are positioning themselves to secure future
revenue from Big Data, often in conflict with this goal. Traditionally, the marine sector,
including vessel equippers and instrument makers, has been dominated by a "vendor lock-in
policy", and today this way of thinking continues in the form of more restricted access to
instrument data that used to be freely available. Many industry players are moving from
selling products to selling services, and more often than not this includes aggressive licensing
policies that give end users less control over their data than before. Access to data is
fundamental to stimulating innovation and knowledge discovery, and the EU GDPR (General
Data Protection Regulation), which comes into effect in 2018, is an important step in securing
end users' right to their data and making it available to third-party processors of their
choice [REF-69].
The OECD (Organisation for Economic Co-operation and Development) report The Ocean
Economy in 2030 [REF-70] describes the development of the ocean economy towards 2030.
The blue economy is growing strongly from the 2010 estimate of 1,500 billion USD (2.5% of
the world economy), and the OECD suggests that if the current rate of growth continues, it
will more than double by 2030. This is a conservative estimate, as it excludes ocean-related
sectors without adequate data (e.g. new innovations). On the other hand, the ongoing
deterioration of the seas (e.g. pollution and climate change) puts important restrictions on
the development of the ocean economy. Globally sustainable and responsible management,
building knowledge of the implications for the marine environment, is paramount to
harvesting the growth potential of the oceans.
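The implied growth rate behind "more than double by 2030" can be checked with simple compound-growth arithmetic; the figures below are taken from the OECD estimate quoted above.

```python
# Doubling USD 1,500 billion (2010) by 2030 implies a compound annual growth
# rate of 2**(1/20) - 1, i.e. roughly 3.5% per year over the 20-year period.

value_2010, years = 1500e9, 20
value_2030 = 2 * value_2010                        # "more than double"
implied_cagr = (value_2030 / value_2010) ** (1 / years) - 1
print(f"implied growth rate: {implied_cagr:.2%} per year")
```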
6.3.1 User needs and Big Data opportunities
Current catch technology is extremely efficient, and the fleet rarely has trouble filling its
catch quotas during the fishing season. The key question is rather how to leverage Big Data
technology to optimize the operation, planning and management of the fisheries to secure the
best possible profit with minimal environmental impact on oceans and climate:
• Reduced energy consumption and emissions through efficient fishery planning and
operation with better information services.
• Improved oceanographic models and multispecies stock estimation models.
• Avoidance of overfishing and IUU fishing activities.
• More careful catch technology with respect to habitats, seabed, coral reefs and other
species.
• Catch technology for species lower in the food chain, e.g. mesopelagic fish (200–1000 m
water depth).
• Ocean clean-up technology, e.g. plastics and microplastics.
A key challenge for fishermen and shipping companies is to locate the fish as efficiently as
possible, to reduce the time and energy needed to fill the quota at a time when prices are
good. This challenge is becoming harder as pelagic species show increasingly changing
migration patterns, an observation especially noticeable in Arctic waters. There is a
strong need to explain and understand why this happens: is it directly related to temperature
and other climate changes, or do catch and other human activities, such as offshore oil
production, marine traffic and seismic surveys, affect where the species move?
There is great potential here for leveraging Big Data technology, and especially descriptive
and predictive analytics, to optimize the catch process, both in terms of where the fish are
and what the expected market value will be, while also increasing the understanding of how
the different elements of the marine environment affect each other when analysed in a
holistic, multidisciplinary way. Open data and standardized exchange formats are key
prerequisites for making this feasible.
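A minimal sketch of this predictive-analytics idea, using invented data: estimate the expected catch per unit effort (CPUE) at a candidate fishing ground by weighting past hauls according to their similarity in position and sea-surface temperature. All values and the weighting scheme are assumptions for illustration.

```python
# Toy CPUE predictor: inverse-distance weighting of past hauls in a simple
# (latitude, longitude, SST) feature space. Fabricated numbers, illustration only.

def predict_cpue(candidate, past_hauls):
    """Weight past CPUE values by closeness to the candidate ground's features."""
    total, weight_sum = 0.0, 0.0
    for features, cpue in past_hauls:
        dist = sum((a - b) ** 2 for a, b in zip(candidate, features)) ** 0.5
        w = 1.0 / (dist + 1e-6)                  # closer, more similar hauls count more
        total += w * cpue
        weight_sum += w
    return total / weight_sum

# (latitude, longitude, SST in deg C) -> tonnes per haul, invented examples
hauls = [((62.0, 5.0, 8.5), 4.0), ((62.5, 5.5, 8.8), 5.0), ((64.0, 8.0, 6.0), 1.0)]
estimate = predict_cpue((62.2, 5.2, 8.6), hauls)
print(round(estimate, 2))
```

Replacing the hand-built weighting with a trained model, and adding market-price forecasts, is exactly where integrated, standardized data sets become the prerequisite.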
The fishery pilots in DataBio are summarized in Figure 33 and focus on addressing the first
two key challenges listed above.
Figure 33. Summary and context of the Fishery Pilots in DataBio.
7 The future of big data
The exponential growth in data volumes is expected to continue at least for the next several
years. As an example, the number of Internet-connected sensors and devices is estimated to
reach 30 billion in 2020, up from 16 billion in 2016 [REF-71]. As devices gather and handle
data in increasing resolution, both spatially (e.g. drones capturing 4K video and ultra-high-
definition satellite images) and temporally (e.g. continuous monitoring), the data volume can
be forecast to keep doubling every year.
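A quick back-of-the-envelope check of these figures, using the numbers cited above:

```python
# 16 billion connected devices in 2016 growing to ~30 billion in 2020 implies
# roughly 17% annual device growth, while data volume doubling every year
# grows 16-fold over the same four years.

devices_2016, devices_2020, years = 16e9, 30e9, 4
annual_device_growth = (devices_2020 / devices_2016) ** (1 / years) - 1
data_multiplier = 2 ** years           # doubling every year for four years

print(f"device growth: {annual_device_growth:.1%} per year")
print(f"data volume multiplier 2016->2020: {data_multiplier}x")
```

The contrast between the two numbers makes the point of this chapter: data volume grows much faster than the device count alone would suggest, because each device produces ever richer data.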
Similarly, by adding parallelism and other technologies, computing power is commonly
estimated to keep conforming to Moore's law (proposed back in 1965) during the next years,
which means it doubles every second year [REF-72]. The same doubling speed seems to
hold for chip speeds, computer speeds, and computations per unit of energy. However,
this comes at a price: it takes more and more resources to keep up the pace. The
development of quantum computers beyond the recent 16 qubits has the potential to speed
up solving certain categories of problems significantly, including numerical simulation and
machine learning. This will most probably lead to a leap beyond Moore's law at least for
certain categories of computation. Communication speed also increases with 5G and
new Wi-Fi technologies. Pattern recognition, be it spotting anomalies in time series or certain
crops in aerial images, is expected to advance rapidly, utilising especially new developments
in deep and reinforcement learning. Data fusion, e.g. combining separate datasets like
satellite images and map data, is becoming increasingly straightforward through the use of
standards like Linked Data.
However, there is a limiting factor: the use of electric energy. Already in 2012, the world's
ICT systems consumed more electricity than any single country except China and the US
(see Figure 34). This consumption keeps increasing, especially with distributed architectures
like blockchain, even if the energy consumption per chip does not necessarily grow. It is clear
that new energy technologies have to be developed to allow for sustained ICT growth. It is
indicative that leading ICT companies like Google, Amazon, Apple and Microsoft use
exclusively renewable energy in their data centres.
Figure 34. Electricity consumption: countries compared to IT sector.
8 Conclusions
Current technologies indicate how new ICTs and information flows will emerge around the
use of sensors. Earth Observation (EO) from satellites produces vast amounts of data and is
playing an increasingly important role as a regular and reliable high-quality data source for
farmers, foresters and fisheries, but also for related industries. This unprecedentedly large
amount of data available for operational use is creating new challenges for the agriculture,
forestry and fishery sectors.
Discovery and access are the focal points, bringing together companies and increasing the use
of EO data, sensors and other data to support decision-making. With the new generation of
EO satellites and the emergence of key industry players come new challenges and
crossroads for future knowledge management. The resulting explosive growth of data poses
far-reaching dilemmas regarding the fragmentation of data infrastructures at the
international level. The time has come to expand the operational capability of global
monitoring from space and in situ, and this opens a unique opportunity to build a sustainable
Big Data infrastructure that supports user services exploiting archived and newly acquired
derived datasets.
As the capacity of computing, data transfer and storage increases, variety instead of volume
has become the key characteristic of big data. New algorithms and automated reasoning will
be needed to deal with this challenge in an efficient way. Efficient implementation of big data
technologies also requires cooperation between researchers and developers across different
domains. Common big data frameworks and digital platforms have been developed to
facilitate the work, promote compatible approaches and enhance knowledge transfer
between domains.
The agriculture sector is of strategic importance for European society and economy. Due to
its complexity, agri-food operators have to manage many different and heterogeneous
sources of information, which requires the collection, storage, sharing and analysis of large
quantities of spatially and non-spatially referenced data. The management of this data is
implemented under the banner of precision farming (PF).
Forestry is developing rapidly with the help of new technologies and procedures. Numerous
methods provide information on forests, each with their own time cycles, granularities,
accuracies, costs, and viewpoints. Easy access to best available up-to-date information on
forests is expected to generate new applications and businesses and bring together varying
users, thus enhancing the utilization of forest resources.
The fisheries industry is currently at the starting point of leveraging Big Data technology. The
fishing fleet is increasingly sophisticated, with numerous sensors installed on each vessel for
finding fish, navigating and communicating with the outside world. The governments of
coastal nations that manage fisheries resources have a parallel need for information: catch
statistics are collected, and research cruises are financed to statistically sample the fish in the
ocean at regular time and positional intervals. The need for information on fish stocks and
active fisheries is, however, at odds with the limited availability of real-time data from the
fishing fleets, a result of scarce communication resources and the high cost of limited
bandwidth at sea.
Big data technologies continue to develop and expand into new areas. In the bioeconomy,
one of the challenges is the diversity of operators in the value chain. Advanced analytics and
visualisation technologies will help bring the benefits of data-based solutions to a broader
audience, from primary producers to end users. This, in turn, will contribute to a more
efficient use of resources and help reach sustainability goals.
The new data IT industry, scientists, the private commercial sector and value-adding
institutions in general now expect open access to big data sources and tools, enabling
efficient exploitation of multidisciplinary data for developing value-added products and
downstream services in agriculture, forestry and fishery.
9 References
Reference Name of document
[REF-01] Peter ffoulkes. (2017). InsideBIGDATA Guide to the Intelligent Use of Big Data
on an Industrial Scale
[REF-02] NewVantage Partners LLC. (2016). Big Data Executive Survey 2016 - An
Update on the Adoption of Big Data in the Fortune 1000. Big Data Executive
Survey.
[REF-03] DIKW - Russell Ackoff's view,
http://paradigmas2006.blogspot.cz/2006/05/dikw-russell-ackoffs-view.html
[REF-04] Karel Charvat, Sarka Horakova, Sjaak Wolfert, Henri Holster, Otto Schmid,
Liisa Pesonen, Daniel Martini, Esther Mietzsch, Tomas Mildorf. Final Strategic
Research Agenda (SRA): Common Basis for policy making for introduction of
innovative approaches on data exchange in agri-food industry. agriXchange,
26.11.2012.
[REF-05] European Commission, Research and Innovation, Bioeconomy.
http://ec.europa.eu/research/bioeconomy/index.cfm
[REF-06] Schellberga J, Hill MJ, Gerhards R et al., 2008. Precision agriculture on
grassland: Applications, perspectives and constraints. Europ. J. Agronomy 29:
59-71.
[REF-07] Segarra E, 2002. Precision agriculture initiative for Texas high plains. Annual
Comprehensive Report. Lubbock, Texas, Texas A&M University Research and
Extension Center.
[REF-08] Siddiqa et al. 2016, A survey of big data management: Taxonomy and state-
of-the-art, Journal of Network and Computer Applications 71 (2016) 151–166.
[REF-09] OGC 10-157r4, Earth Observation Metadata profile of Observations &
Measurements, Version 1.1, 09/06/2016,
http://docs.opengeospatial.org/is/10-157r4/10-157r4.html.
[REF-10] CEOS OpenSearch Best Practice, Issue 1.1.2, 13/06/2017,
http://ceos.org/document_management/Working_Groups/WGISS/Interest_G
roups/OpenSearch/CEOS-OPENSEARCH-BP-V1.2.pdf.
[REF-11] Tan, P.N., Steinbach, M. and Kumar, V. (2006) Introduction to data mining,
First edition edn., Addison Wesley.
[REF-12] Amar, R., Eagan, J. and Stasko, J. (2005) "Low-level components of analytic
activity in information visualization", IEEE Symposium of Information
Visualization (INFOVIS) 2005, eds. J.T. Stasko and M.O. Ward, IEEE Computer
Society, Minneapolis, MN, USA, 23-25 Oct., pp. 111.
[REF-13] Hand, D.J., Mannila, H. and Smyth, P. (2001) Principles of data mining, First
edition, MIT press.
[REF-14] Ronnie Beggs. Market Report Paper by Bloor. 2016-09-01.
[REF-15] Mike Gualtieri. The Forrester Wave™: Streaming Analytics, Q3
2017: Use This Technology To Make Your Enterprise Applications Sense,
Think, And Act In Real Time. 2017-09-01.
https://www.forrester.com/report/The+Forrester+Wave+Streaming+Analytic
s+Q3+2017/-/E-RES136545?objectid=RES136545#endnote1. Retrieved 2017-
11-01.
[REF-16] Alfonso Velosa, W. Roy Schulte, and Benoit J. Lheureux. Hype Cycle for the
Internet of Things, 2017. Gartner report # G00314298. 2017-07-24.
[REF-17] David Reinsel, John Gantz, and John Rydning. Data Age 2025: The Evolution of
Data to Life-Critical. That’s Big; IDC White paper. 2017-04-01.
https://www.seagate.com/www-content/our-story/trends/files/Seagate-WP-
DataAge2025-March-2017.pdf. Retrieved 2017-11-01.
[REF-18] Card, S. K., Mackinlay, J. D. & Schneidermann, B. 1999. Readings in
information visualization, Using Vision to Think. Academic Press Inc. 686 p.
ISBN 1-55860-533-9.
[REF-19] Thomas, J.J. and Cook, K.A. (2005) Illuminating the path: The research and
development agenda for visual analytics, 1st edn., IEEE Computer Society, Los
Alamitos, CA.
[REF-20] Norman, D. and Dunaeff, T. (1994) Things that make us smart: Defending
human attributes in the age of the machine, Basic Books, USA
[REF-21] Järvinen, P., (2013) Licentiate thesis. Aalto University, Department of
Information and Computer Science, 135 p. + app. 20 p
[REF-22] Shneiderman, B. (1996) "The eyes have it: A task by data type taxonomy for
information visualizations", Proceedings, IEEE Symposium on Visual
Languages, IEEE, September 3-6, pp. 336.
[REF-23] Roberts, J.C. (2007) "State of the art: Coordinated and multiple views in
exploratory visualization", Fifth International Conference on Coordinated and
Multiple Views in Exploratory Visualization, CMV'07.IEEE, 2-2 July, pp. 61.
[REF-24] Press, G., Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data
Science Task, Survey Says. Forbes, March 23, 2016. Available online at
https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-
time-consuming-least-enjoyable-data-science-task-survey-says. Retrieved
2017-11-08.
[REF-25] NIST Special Publication 1500-6. NIST Big Data Interoperability Framework:
Volume 6, Reference Architecture, 2015. Available online at
https://bigdatawg.nist.gov/_uploadfiles/NIST.SP.1500-6.pdf. Retrieved 2017-
11-08.
[REF-26] The Apache Hadoop Project. http://hadoop.apache.org, 2009. Retrieved
2017-11-08.
[REF-27] Zaharia, M., Chowdhury M., Franklin J. M., Shenker S., Stoica I. Spark: cluster
computing with working sets. In USENIX conference on Hot topics in cloud
computing, pages 10-10 (2010).
[REF-28] OGC 10-157r4, Earth Observation Metadata profile of Observations &
Measurements, Version 1.1, 09/06/2016,
http://docs.opengeospatial.org/is/10-157r4/10-157r4.html.
[REF-29] OGC Testbed 13 – ESA Sponsored Threads – Exploitation Platform, Technical
Architecture, December 09, 2016, PDGS-EVOL-CGI-TN-16/1570, Issue 1.0.
[REF-30] Karel Charvat, Tomas Reznik, Vojtech Lukas, Sarka Horakova, Karel Charvat Jr,
Michal Kepka, Marek Splichal, Simon Leitgeb, Jan Shanel, Karel Jedlicka,
Jaroslav Smejkal. Big Data in Agriculture – From FOODIE towards DataBio.
Abstract for the 7th ACPA conference.
[REF-31] Tim Sparapani , How Big Data And Tech Will Improve Agriculture, From Farm
To Table, https://www.forbes.com/sites/timsparapani/2017/03/23/how-big-
data-and-tech-will-improve-agriculture-from-farm-to-table/#5c40ba075989.
[REF-32] Copernicus Market report prepared by PriceWaterhouseCoopers for the
European Commission. Available online:
http://www.copernicus.eu/sites/default/files/library/Copernicus_Market_Re
port_11_2016.pdf. Retrieved 2017-07-03.
[REF-33] Kleinjan, J., Clay, D. E., Carlson, C. G., & Clay, S. A. (2007). Productivity zones
from multiple years of yield monitor data. In F. J. Pierce, & D. C. Clay, GIS
applications in agriculture. CRC Press, Boca Raton.
[REF-34] Blackmore, S., Godwin, R. J., & Fountas, S. (2003). The Analysis of Spatial and
Temporal Trends in Yield Map Data over Six Years. Biosystems Engineering
[REF-35] Suuronen, P., Chopin, F., Glass, C., Løkkeborg, S., Matsushita, Y., Queirolo, D.,
& Rihan, D. (2012). Low impact and fuel efficient fishing—looking beyond the
horizon. Fisheries Research, 119, 135-146.
[REF-36] Rojon, I., and Smith, T. 2014. On the attitudes and opportunities of fuel
consumption monitoring and measurement within the shipping industry
and the identification and validation of energy efficiency and
performance interventions. 18 pp.
[REF-37] Parker, R. W. & Tyedmers, P. H. (2014) Fuel consumption of global fishing
fleets: current understanding and knowledge gaps. Fish and Fisheries, 16(4),
684-696.
[REF-38] Fernandes, J. A., Santos, L., Vance, T., Fileman, T., Smith, D., Bishop, J. D., ... &
Austen, M. C. (2016). Costs and benefits to European shipping of ballast-
water and hull-fouling treatment: Impacts of native and non-indigenous
species. Marine Policy, 64, 148-155.
[REF-39] Klau, G. W., Lesh, N., Marks, J., & Mitzenmacher, M. (2010). Human-guided
search. Journal of Heuristics, 16(3), 289-310.
[REF-40] Ibarbia, I., Mendiburu, A., Santos, M., & Lozano, J. A. (2012). An interactive
optimization approach to a real-world oceanographic campaign planning
problem. Applied Intelligence, 36(3), 721-734.
[REF-41] Kang, M. H., Choi, H. R., Kim, H. S., & Park, B. J. (2012). Development of a
maritime transportation planning support system for car carriers based on
genetic algorithm. Applied Intelligence, 36(3), 585-604.
[REF-42] Palenzuela, J. M. T., Vilas, L. G., Spyrakos, E., Dominguez, L. R., & CETMAR, F.
(2010). Routing optimization using neural networks and oceanographic
models from remote sensing data. In Proceedings of the 1st International
Symposium on Fishing Vessel Energy Efficiency E-Fishing, Vigo, Spain.
[REF-43] Vettor, R., Tadros, M., Ventura, M., & Soares, C. G. (2016). Route planning of
a fishing vessel in coastal waters with fuel consumption restraint. Maritime
Technology and Engineering, 3, 167-173.
[REF-44] Groba, C., Sartal, A., & Vázquez, X. H. (2015). Solving the dynamic traveling
salesman problem using a genetic algorithm with trajectory prediction: An
application to fish aggregating devices. Computers & Operations Research,
56, 22-32.
[REF-45] Walther, L., Rizvanolli, A., Wendebourg, M., & Jahn, C. (2016). Modeling and
Optimization Algorithms in Ship Weather Routing. International Journal of e-
Navigation and Maritime Economy, 4, 31-45.
[REF-46] Fernandes, J. A., Irigoien, X., Goikoetxea, N., Lozano, J. A., Inza, I., Pérez, A., &
Bode, A. (2010). Fish recruitment prediction, using robust supervised
classification methods. Ecological Modelling, 221(2), 338-352.
[REF-47] Fernandes, J. A., Lozano, J. A., Inza, I., Irigoien, X., Pérez, A., & Rodríguez, J. D.
(2013). Supervised pre-processing approaches in multiple class variables
classification for fish recruitment forecasting. Environmental modelling &
software, 40, 245-254.
[REF-48] Fernandes, J. A., Irigoien, X., Lozano, J. A., Inza, I., Goikoetxea, N., & Pérez, A.
(2015). Evaluating machine-learning techniques for recruitment forecasting of
seven North East Atlantic fish species. Ecological Informatics, 25, 35-42.
[REF-49] Trifonova, N., Kenny, A., Maxwell, D., Duplisea, D., Fernandes, J., & Tucker, A.
(2015). Spatio-temporal Bayesian network models with latent variables for
revealing trophic dynamics and functional networks in fisheries ecology.
Ecological Informatics, 30, 142-158.
[REF-50] Andonegi, E., Fernandes, J. A., Quincoces, I., Irigoien, X., Uriarte, A., Pérez, A.,
... & Stefánsson, G. (2011). The potential use of a Gadget model to predict
stock responses to climate change in combination with Bayesian networks:
the case of Bay of Biscay anchovy. ICES Journal of Marine Science, 68(6),
1257-1269.
[REF-51] Grosjean, Philippe & Picheral, Marc & Warembourg, Caroline & Gorsky,
Gabriel. (2004). Enumeration, measurement, and identification of net
zooplankton samples using the ZOOSCAN digital imaging system. Ices Journal
of Marine Science - ICES J MAR SCI. 61. 518-525.
10.1016/j.icesjms.2004.03.012.
[REF-52] Bachiller, E., Fernandes, J. A., & Irigoien, X. (2012). Improving semiautomated
zooplankton classification using an internal control and different imaging
devices. Limnology and Oceanography: Methods, 10(1), 1-9.
[REF-53] Gislason, A., & Silva, T. (2009). Comparison between automated analysis of
zooplankton using ZooImage and traditional methodology. Journal of
Plankton Research, 31(12), 1505-1516.
[REF-54] Irigoien, X., Fernandes, J. A., Grosjean, P., Denis, K., Albaina, A., & Santos, M.
(2009). Spring zooplankton distribution in the Bay of Biscay from 1998 to
2006 in relation with anchovy recruitment. Journal of plankton research,
31(1), 1-17.
[REF-55] Di Mauro, R., Cepeda, G., Capitanio, F., & Viñas, M. D. (2011). Using ZooImage
automated system for the estimation of biovolume of copepods from the
northern Argentine Sea. Journal of sea research, 66(2), 69-75.
[REF-56] Fernandes, J. A., Irigoien, X., Boyra, G., Lozano, J. A., & Inza, I. (2009).
Optimizing the number of classes in automated zooplankton classification.
Journal of Plankton Research, 31(1), 19-29.
[REF-57] Manríquez, K., Escribano, R., & Riquelme-Bugueño, R. (2012). Spatial
structure of the zooplankton community in the coastal upwelling system off
central-southern Chile in spring 2004 as assessed by automated image
analysis. Progress in oceanography, 92, 121-133.
[REF-58] Zarauz, L., Irigoien, X., & Fernandes, J. A. (2008). Changes in plankton size
structure and composition, during the generation of a phytoplankton bloom,
in the central Cantabrian sea. Journal of plankton research, 31(2), 193-207.
[REF-59] Ali, N., Wacquet, G., Didry, M., Hamad, D., Artigas, L. F., & Grosjean, P. (2014).
Utilisation conjointe de FlowCAM/ZooPhytoImage et de la cytométrie en flux.
Premiers résultats et perspectives. Action 9. FlowCam ZooPhytoImage.
Livrable n° 4. Rapport final, 23 Septembre 2014.
[REF-60] Uranga, J., Arrizabalaga, H., Boyra, G., Hernandez, M. C., Goñi, N., Arregui, I.,
... & Santiago, J. (2017). Detecting the presence-absence of bluefin tuna by
automated analysis of medium-range sonars on fishing vessels. PloS one,
12(2), e0171382.
[REF-61] Mayer, L., Li, Y., & Melvin, G. (2002). 3D visualization for pelagic fisheries
research and assessment. ICES Journal of Marine Science, 59(1), 216-225.
[REF-62] Gorska, N., Korneliussen, R. J., and Ona, E. 2007. Acoustic backscatter by
schools of adult Atlantic mackerel. – ICES Journal of Marine Science,64: 1145–
1151.
[REF-63] Korneliussen, Rolf J., Heggelund, Y., Macaulay, G.J., Patel, D., Johnsen, E. and
Eliassen, I.K. (2016). Acoustic identification of marine species using a feature
library. Methods in Oceanography 17: 187-205.
[REF-64] Foote, K.G, Knudsen, H.K. Korneliussen, R.J., Nordbø P.E. and Røang, K. (1991)
Postprocessing system for echosounder data. Journal of Acoustic Society of
America, Vol 90, No1, pp 37-47.
[REF-65] Hess, D. and Savitz, J. (2016) OCEANA Global Fishing Watch report. Available
from www.globalfishingwatch.org.
[REF-66] de Souza E.N., Boerder K., Matwin S., Worm, B. (2016) Improving Fishing
Pattern Detection from Satellite AIS Using Data Mining and Machine Learning.
PLoS ONE 11(7): e0158248. doi:10.1371/journal.pone.0158248
[REF-67] Lehr, Heiner. Electronic management and exchange of fishery information,
http://www.unescap.org/sites/default/files/03%20-
%20Electronic%20management%20and%20Exchange%20of%20Fishery%20Inf
ormation%20V151210a.pdf
[REF-68] http://www.aqua.dtu.dk/english/News/2014/03/140313_Fisheries_manage
ment_as_open_source
[REF-69] https://www.eugdpr.org
[REF-70] http://www.oecd.org/sti/futures/the-ocean-economy-in-2030-
9789264251724-en.htm
[REF-71] https://spectrum.ieee.org/tech-talk/telecom/internet/popular-internet-of-
things-forecast-of-50-billion-devices-by-2020-is-outdated
[REF-72] https://cacm.acm.org/magazines/2017/1/211094-exponential-laws-of-
computing-growth/fulltext