making agricultural knowledge globally discoverable: are we there yet?

making agricultural knowledge globally discoverable (and hopefully usable) Nikos Manouselis CEO Agro-Know www.agroknow.gr

Upload: nikos-manouselis

Post on 27-Jan-2015

DESCRIPTION

Slides of talk at TGI tutorial series at IFPRI, Washington DC, July 11, 2014.

TRANSCRIPT

Page 1: Making agricultural knowledge globally discoverable: are we there yet?

making agricultural knowledge globally discoverable (and hopefully usable)

Nikos Manouselis
CEO, Agro-Know
www.agroknow.gr

Page 2: Making agricultural knowledge globally discoverable: are we there yet?

background

Page 3: Making agricultural knowledge globally discoverable: are we there yet?

An extraordinary company that captures, organizes and adds value to the rich information available in agricultural and biodiversity sciences, in order to make it universally accessible, useful and meaningful.

http://www.agroknow.gr

Page 4: Making agricultural knowledge globally discoverable: are we there yet?

Our way of doing things

We put our people at the centre of what we do

We have a culture of shared, co-defined values

We are based on trust and transparency

We see beyond profit by serving our users and customers so that they create societal impact

Page 5: Making agricultural knowledge globally discoverable: are we there yet?

We develop and put into practice solutions that transform data into meaningful knowledge and services

We help people solve problems, informed by data

Page 6: Making agricultural knowledge globally discoverable: are we there yet?

data aggregation & sharing solutions

[Diagram: data platform]
• Inputs: unorganized content in local and remote sites; organized and structured content in local and remote DBs
• Content types: educational, bibliographic, other
• Platform labels: ingestion, translation, publication, enrichment; harvesting, cultivation, blossom
• Services: widgets, authoring services, data discovery services, analytics services
• Callouts: aggregate data from diverse sources; works with different types of data; prepare data for meaningful services

Page 7: Making agricultural knowledge globally discoverable: are we there yet?

working with high profile partners & clients

• Food and Agriculture Organization (FAO) of the United Nations
• World Bank Group
• UK’s Dept for International Development (DFID)
• Michigan State University (MSU)
• Wageningen University & Research (WUR)
• French Institute of Agricultural Research (INRA)
• Creative Commons

Page 8: Making agricultural knowledge globally discoverable: are we there yet?

large scale data-related projects

• agINFRA: a data infrastructure to support agricultural scientific communities (2011 - now)
– EU, $5.2M, 12 partners (incl. FAO); tech coordinator, evaluation, sustainability
– in G8 Open Data in Agriculture Action Plan for Europe

• SemaGrow: Data intensive techniques to boost the real-time performance of global agricultural data infrastructures (2012 - now)
– EU, $3.1M, 8 partners (incl. FAO, WUR); tech coordinator, evaluation, sustainability
– in G8 Open Data in Agriculture Action Plan for Europe

• Organic.Lingua: Demonstrating the potential of multilingual Web Portal for Sustainable Agricultural & Environmental Education (2011-2014)
– EU, $2.4M, 11 partners (incl. INRA); tech+data coordinator, evaluation

Page 9: Making agricultural knowledge globally discoverable: are we there yet?

data interoperability work

• Agricultural Interoperability Interest Group (IG) at Research Data Alliance (RDA)

• Database Subgroup, Knowledge & Learning Systems Group, Global Food Safety Partnership (GFSP)

Page 10: Making agricultural knowledge globally discoverable: are we there yet?

context

Page 11: Making agricultural knowledge globally discoverable: are we there yet?

“Knowledge is the engine of our economy. And data is its fuel”

Neelie Kroes, Vice President of the European Commission

http://ec.europa.eu/digital-agenda/en/news/economic-and-social-benefits-big-data

Page 12: Making agricultural knowledge globally discoverable: are we there yet?

“By improving our ability to extract knowledge and insights from large and complex collections of digital data, the initiative promises to help solve some the Nation’s most pressing challenges.”

Big Data Research & Development Initiative

http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release_final_2.pdf

Page 13: Making agricultural knowledge globally discoverable: are we there yet?

policy

• USA’s National Research Council on Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age
– “researchers to make all research data, methods, and other information underlying results publicly accessible in a timely manner”
– “the stewardship of research data is a critical long-term task for the research enterprise and its stakeholders”

http://www.nap.edu/catalog.php?record_id=12615

Page 14: Making agricultural knowledge globally discoverable: are we there yet?

internationally

• joint USA, EU, Australia, Research Data Alliance (RDA) vision
– “researchers and innovators openly sharing data across technologies, disciplines, and countries to address the grand challenges of society”

https://rd-alliance.org/about.html

Page 15: Making agricultural knowledge globally discoverable: are we there yet?

CIARD’s manifesto

• “towards a Knowledge Commons on Agricultural Research for Development”
• “agricultural knowledge is freely accessible and contributes to reducing hunger and poverty”
• “open knowledge makes it easier to provide better solutions”

http://www.ciard.net/about/manifesto

Page 16: Making agricultural knowledge globally discoverable: are we there yet?

GODAN’s statement of purpose

• “support global efforts to make agricultural and nutritionally relevant data available, accessible, and usable for unrestricted use worldwide”
• “advocate for the release and re-usability of data in support of Innovation and Economic Growth, Improved Service Delivery and Effective Governance, and Improved Environmental and Social Outcomes”

http://godan.info/statement.html

Page 17: Making agricultural knowledge globally discoverable: are we there yet?

IFPRI & open access

• “…research is an international public good, that should be freely disseminated to the extent possible…”
• “IFPRI is committed to the principle of free access to the knowledge it generates”

Page 18: Making agricultural knowledge globally discoverable: are we there yet?

CGIAR & open access

• “CGIAR regards the results of its research and development activities as international public goods and is committed to their widespread dissemination and use to achieve the maximum impact to advantage the poor…”

Page 19: Making agricultural knowledge globally discoverable: are we there yet?

agricultural knowledge: globally accessible?

a “good enough” case study

Page 20: Making agricultural knowledge globally discoverable: are we there yet?

agricultural bibliography

• bibliography on agricultural sciences
• several efforts in putting together (aggregating/indexing) metadata records on agricultural publications & grey literature
• FAO’s AGRIS service: a prominent example
– quite advanced data ingestion workflow & infrastructure
– semantic backbone with AGROVOC as LOD & triple store with all aggregated records
– more than 7.5 million publications indexed & made discoverable

Page 21: Making agricultural knowledge globally discoverable: are we there yet?

elaborated, automated workflow

Metadata harvester

Filtering component

Stores

File system (DC, IEEE LOM, MODS XML)

File system (DC, IEEE LOM, MODS XML)

Stores

Identification and de-duplication component

MySQL

Duplicates

Stores

Transformation component (to AKIF)

Store metadata in JSON (Internal Format)

Link checking component

PostProcessing/Enrichment component

File system (XMLs)

Get unique ID

Records with

Broken Links

Indexing mechanism

API
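
The workflow stages named on this slide (filtering, identification and de-duplication, transformation to an internal JSON format) can be sketched roughly as follows. This is a minimal illustration, not the AGRIS or Agro-Know implementation: the function names, the record fields (`title`, `url`) and the SHA-1-based ID are all assumptions. Harvesting, link checking, enrichment and indexing are left out because they need network access or a search index.

```python
import hashlib
import json


def filter_records(records):
    """Filtering step: keep only records with a non-empty title and URL."""
    return [r for r in records if r.get("title") and r.get("url")]


def record_id(record):
    """Identification step: derive a stable ID from normalised title and URL.

    Hashing these two fields is an assumption made for this sketch.
    """
    key = record["title"].strip().lower() + "|" + record["url"].strip().lower()
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


def deduplicate(records):
    """De-duplication step: split records into unique ones and duplicates."""
    unique, duplicates, seen = [], [], set()
    for record in records:
        rid = record_id(record)
        if rid in seen:
            duplicates.append(record)
        else:
            seen.add(rid)
            unique.append(dict(record, id=rid))
    return unique, duplicates


def to_internal_json(record):
    """Transformation step: serialise a record to a JSON internal format."""
    doc = {
        "id": record["id"],
        "title": record["title"].strip(),
        "url": record["url"].strip(),
    }
    return json.dumps(doc, sort_keys=True)


def run_pipeline(harvested):
    """Chain the steps; returns (JSON documents, duplicate records)."""
    unique, duplicates = deduplicate(filter_records(harvested))
    return [to_internal_json(r) for r in unique], duplicates
```

Keeping the duplicates as a separate output mirrors the "Duplicates" store in the diagram, so that discarded records can still be inspected.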

Page 22: Making agricultural knowledge globally discoverable: are we there yet?

AGRIS search service

Page 23: Making agricultural knowledge globally discoverable: are we there yet?

results mashing up more info

Page 24: Making agricultural knowledge globally discoverable: are we there yet?

similar/relevant efforts

• PubAg: forthcoming service by National Agricultural Library (NAL) for discovering USDA publications – and beyond

• LGU community of ag knowledge: forthcoming service federating institutional repositories of Land Grant Universities

• CGIAR open: (to be) federating & providing access to all CG center repositories

• …and more to come

Page 25: Making agricultural knowledge globally discoverable: are we there yet?

but we are not there yet

a) each initiative replicating technical & data processing effort (harvesting, transforming, indexing…)

b) coverage is not complete – transferring the discovery problem to the level of aggregators

c) still not focusing on the needs of each specific subject, group, region, project, …

d) agriculture is multi-disciplinary: relevant publications may be found in other domains (health, economics, environment, … )

Page 26: Making agricultural knowledge globally discoverable: are we there yet?

agricultural knowledge: globally accessible?

a more demanding case study

Page 27: Making agricultural knowledge globally discoverable: are we there yet?

CSPI

• the organized voice of the American public on nutrition, food safety, health and other issues
– “improve food safety laws and reduce the incidence of foodborne illness”
• has tracked foodborne illness outbreaks since 1997
– events where two or more people become ill from eating the same food
– outbreaks where both the food and pathogen can be identified
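
The two tracking criteria above translate directly into a small check. The sketch below assumes a hypothetical `IllnessEvent` record with three fields; CSPI's actual data model is not described in the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IllnessEvent:
    """Hypothetical record for one reported event (fields are assumptions)."""
    cases: int                # number of people who became ill
    food: Optional[str]       # implicated food, if identified
    pathogen: Optional[str]   # implicated pathogen, if identified


def is_outbreak(event: IllnessEvent) -> bool:
    """Two or more people ill from eating the same food."""
    return event.cases >= 2


def is_tracked(event: IllnessEvent) -> bool:
    """An outbreak where both the food and the pathogen are identified."""
    return is_outbreak(event) and bool(event.food) and bool(event.pathogen)
```

For example, `is_tracked(IllnessEvent(cases=5, food="eggs", pathogen=None))` is `False`, because the pathogen is unknown.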

Page 28: Making agricultural knowledge globally discoverable: are we there yet?

US Outbreak Alert Database (until 2011)

http://cspinet.org/foodsafety/outbreak/pathogen.php

Page 29: Making agricultural knowledge globally discoverable: are we there yet?

US Outbreak Report (after 2011)

http://cspinet.org/foodsafety/outbreak_report.html

Page 30: Making agricultural knowledge globally discoverable: are we there yet?

Safe Food International

http://regionalnews.safefoodinternational.org

Page 31: Making agricultural knowledge globally discoverable: are we there yet?

data sources of interest

• CDC - Foodborne Outbreak Online Database (FOOD)
– http://wwwn.cdc.gov/foodborneoutbreaks/
• ProMED mail
– http://www.promedmail.org
• Kansas FS-net
– blogging at http://barfblog.com
– posting news at http://bites.ksu.edu
– archive at http://www.safefoodhandler.com/fsnet.htm
• Project TYCHO
– https://www.tycho.pitt.edu

Page 32: Making agricultural knowledge globally discoverable: are we there yet?

some of the challenges

a) time-consuming & laborious primary data identification and documentation (by hand)

b) incomplete coverage: problematic data collection and sharing

c) multiple & outdated databases for secondary/processed data storage and curation

d) time-consuming & expensive processed data visualization & publication

Page 33: Making agricultural knowledge globally discoverable: are we there yet?

improving curation of data

• focus on making data documentation, storage, management easier
a) migrate the multiple existing databases into a single data repository
b) improve data organization & classification schemes (e.g. by pathogen, food, geographical location, time reported, etc.)
c) improve data curation & filtering workflows (document & store data once, feed multiple sites/access points; US vs. international sites)
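
Points (b) and (c) can be pictured as one repository that is classified along several schemes and filtered into per-site feeds. The sketch below is only an illustration: the field names, the `site_feed` rule (US vs. everything else) and the three sample records are invented for the example and are not real outbreak data.

```python
from collections import defaultdict

# Made-up records, purely for illustration; not real outbreak data.
REPOSITORY = [
    {"pathogen": "Salmonella", "food": "eggs", "country": "US", "year": 2010},
    {"pathogen": "Salmonella", "food": "poultry", "country": "FR", "year": 2011},
    {"pathogen": "Listeria", "food": "cheese", "country": "US", "year": 2011},
]


def classify(records, scheme):
    """Group records by one classification scheme (a field name)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[scheme]].append(record)
    return dict(groups)


def site_feed(records, site):
    """Filter the single repository into the feed for one site."""
    if site == "us":
        return [r for r in records if r["country"] == "US"]
    if site == "international":
        return [r for r in records if r["country"] != "US"]
    raise ValueError(f"unknown site: {site}")
```

Because every site reads from the same list, a record is documented once and appears wherever its fields match.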

Page 34: Making agricultural knowledge globally discoverable: are we there yet?

modernize outbreak data repository

Page 35: Making agricultural knowledge globally discoverable: are we there yet?

advanced data organisation & classification

Page 36: Making agricultural knowledge globally discoverable: are we there yet?

use single data repository for all CSPI sites

Page 37: Making agricultural knowledge globally discoverable: are we there yet?

improving discovery & processing

• focus on foodborne illness outbreak reports & product recalls
a) automate the report-processing workflow as much as possible (feeding directly into CSPI data repository)
b) extend coverage of data types (include food product recalls)
c) extend coverage of data sources (include more sites with outbreak reports & product recalls)

Page 38: Making agricultural knowledge globally discoverable: are we there yet?

auto extract structured data from text
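
As a rough idea of what extracting structured data from report text could look like, the sketch below matches a report against small keyword lists and a case-count pattern. It is a deliberately naive assumption, not the method shown on the slide: the vocabularies, the regular expression and the output fields are all invented for illustration, and a production system would need far more robust text analysis.

```python
import re

# Tiny illustrative vocabularies; a real system would use curated lists.
PATHOGENS = ("Salmonella", "Listeria", "E. coli", "Norovirus")
FOODS = ("eggs", "spinach", "cheese", "chicken")


def extract(text):
    """Pull pathogen, food and case count out of a free-text report."""
    lower = text.lower()
    pathogen = next((p for p in PATHOGENS if p.lower() in lower), None)
    food = next(
        (f for f in FOODS if re.search(r"\b" + re.escape(f) + r"\b", lower)),
        None,
    )
    match = re.search(r"(\d+)\s+(?:people|persons|cases)", lower)
    cases = int(match.group(1)) if match else None
    return {"pathogen": pathogen, "food": food, "cases": cases}
```

Fields that cannot be found come back as `None`, so incomplete reports can be routed to manual review rather than silently dropped.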

Page 39: Making agricultural knowledge globally discoverable: are we there yet?

include & link to food recall data

Page 40: Making agricultural knowledge globally discoverable: are we there yet?

include waterborne illness data

Page 41: Making agricultural knowledge globally discoverable: are we there yet?

add more (relevant) data sources

Page 42: Making agricultural knowledge globally discoverable: are we there yet?

improving visualization & publication

• focus on making processed & validated data accessible immediately online
a) automate as much as possible the workflows for generating filtered reports (feed diagrams & tables for CSPI publications, present directly online through CSPI & SFI web sites)
b) offer opportunities for the public to interact with data online (play with parameters and generate new data reports & visualizations)
c) share data openly for research, education and awareness (through CSPI & SFI web sites)

Page 43: Making agricultural knowledge globally discoverable: are we there yet?

enhance search/discovery of data

Landing page | Search and filter page | View details and access page

Page 44: Making agricultural knowledge globally discoverable: are we there yet?

use of advanced data visualizations

Page 45: Making agricultural knowledge globally discoverable: are we there yet?

allow users to customize data reports
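
A customizable report boils down to letting the user pick the grouping field and the filters. The function below is a minimal sketch under that assumption; `build_report` and its record fields are hypothetical names, not part of any CSPI system.

```python
from collections import Counter


def build_report(records, group_by, **filters):
    """Count records per `group_by` value, after applying equality filters.

    Returns (value, count) pairs, most frequent first, ties in name order.
    """
    selected = [
        r for r in records
        if all(r.get(field) == value for field, value in filters.items())
    ]
    counts = Counter(r[group_by] for r in selected)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

A call such as `build_report(records, "food", year=2011)` would give a per-food tally restricted to one reporting year, which is the kind of table that could feed an online chart.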

Page 46: Making agricultural knowledge globally discoverable: are we there yet?

provide multi-channel access to data

Page 47: Making agricultural knowledge globally discoverable: are we there yet?

shaping a bigger & hairier goal…

Page 48: Making agricultural knowledge globally discoverable: are we there yet?

let’s imagine that

• we have a very big, open, scalable platform that…
– …will catalog all relevant information entities
– …will make all information machine readable and discoverable
– …will allow information providers to express how, with whom, under which license and for which purposes they share this info
– …will help people utilize the collective power of information to solve more societal challenges, better
– …will make funding & resource use transparent for donors and the public
– …will coordinate, consolidate and harmonize data & technology sharing among agri-food sectors and user communities

Page 49: Making agricultural knowledge globally discoverable: are we there yet?

for example: CIARD RING

Page 50: Making agricultural knowledge globally discoverable: are we there yet?

catalogues (some) data

Page 51: Making agricultural knowledge globally discoverable: are we there yet?

catalogues (some) solutions

Page 52: Making agricultural knowledge globally discoverable: are we there yet?

catalogues (some) organisations

Page 53: Making agricultural knowledge globally discoverable: are we there yet?

could federate & include: more data

Page 54: Making agricultural knowledge globally discoverable: are we there yet?

could federate & include: more software

Page 55: Making agricultural knowledge globally discoverable: are we there yet?

could federate & include: donors

Page 56: Making agricultural knowledge globally discoverable: are we there yet?

could federate & include: funding

Page 57: Making agricultural knowledge globally discoverable: are we there yet?

scale up, per federated info type

Meta-registry platform federating all existing registries & making information discoverable

Registries of data sources

Federated data registry

Federated information providers

Registries of organisations’ catalogs

Federated org registry

Registries of software apps/components

Federated solution registry

…etc
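
The meta-registry idea (federate existing registries, then make their entries discoverable in one place) can be sketched as a merge keyed on a normalised URL. Everything here is an assumption for illustration: the entry fields (`name`, `url`, `type`), the normalisation rule and the function names are hypothetical, not taken from the CIARD RING.

```python
def federate(registries):
    """Merge entries from several registries, keyed by normalised URL.

    `registries` maps a registry name to a list of entry dicts. Entries
    sharing a URL are merged, and each merged entry remembers which
    registries list it.
    """
    merged = {}
    for registry_name, entries in registries.items():
        for entry in entries:
            key = entry["url"].rstrip("/").lower()
            item = merged.setdefault(
                key,
                {
                    "name": entry["name"],
                    "url": entry["url"],
                    "type": entry["type"],
                    "listed_in": [],
                },
            )
            item["listed_in"].append(registry_name)
    return merged


def discover(merged, info_type):
    """List the names of federated entries of one information type."""
    return sorted(e["name"] for e in merged.values() if e["type"] == info_type)
```

Recording `listed_in` keeps provenance visible, so the meta-registry points back to the source registries instead of replacing them.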

Page 58: Making agricultural knowledge globally discoverable: are we there yet?

evolving technology further

[Diagram: two architectures compared]

NOW (2012): CASE OF AGRICULTURAL INFRASTRUCTURES
• OAI-PMH Service Provider #1 … #n, each with its own schema (Schema #1 … Schema #n)
• HARVESTER
• Aggregated XML Repository
• INDEXER
• Web Portals: Open AGRIS (FAO), AgLR/GLN (ARIADNE), Organic.Edunet (UAH), VOA3R (UAH), ...

2015 (AgINFRA): CASE OF AGRICULTURAL INFRASTRUCTURES
• SPARQL endpoint (Data Source #1) … SPARQL endpoint (Data Source #n)
• RDF Triple Store, Common Schema
• INDEXER
• SPARQL endpoint
• Web Portals

Schemas shown: AGRIS AP Schema, IEEE LOM Schema, DC Schema, ...

How Many?

Big Data Problem!

Is it feasible?

http://semagrow.eu
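
The harvesting side of the diagram above relies on OAI-PMH, where a harvester issues `ListRecords` requests and follows resumption tokens until the provider has no more pages. The sketch below shows only the request URL construction and response parsing with the Python standard library. The helper names are assumptions and no network call is made, so the HTTP fetching loop is left to the reader.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    present, metadataPrefix must not be sent again.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (Dublin Core titles, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    titles = [t.text for t in root.iter(DC + "title")]
    token = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
    next_token = token.text if token is not None and token.text else None
    return titles, next_token
```

Each provider has to be paged through like this before anything can be indexed, which is the replicated effort that querying SPARQL endpoints in place is meant to avoid.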

Page 59: Making agricultural knowledge globally discoverable: are we there yet?

wrapping up

Page 60: Making agricultural knowledge globally discoverable: are we there yet?
Page 61: Making agricultural knowledge globally discoverable: are we there yet?

which are the real problems that we are trying to solve?

information & technology are just enablers