big data europe - fot-netfot-net.eu/.../van-nuffelen-big-data-europe-fotnet.pdf · stakeholders...
TRANSCRIPT
BIG DATA EUROPE Integrating Big Data, Software & Communities for Addressing Europe’s Societal Challenges
Partners
Mission
Lower the barrier to using big data technologies
o Required effort and resources
o Required data science skills
Assist in establishing cross-lingual / cross-organizational / cross-domain Data Value Chains
Show societal value of Big Data
16 March 2015 www.big-data-europe.eu
Societal Domain / Preliminary Big Data Focus Area / Selected Key Data Assets

Life Sciences & Health
Focus area: Heterogeneous data linking & integration; biomedical semantic indexing & QA
Key data assets: ACD Labs / ChemSpider, ChEBI, ChEMBL, ConceptWiki, DrugBank, ENZYME, Gene Ontology, GO Annotation, SwissProt, UniProt, WikiPathways, PubMed, MeSH, Disease Ontology (DO), Joint Chemical Dictionary (Jochem), BioASQ datasets

Food & Agriculture
Focus area: Large-scale distributed data integration
Key data assets: INFOODS, AQUASTAT, Green Learning Network (GLN), Agricultural Bibliography Network (ABN), AGRIS, AquaMaps, Fishbase

Energy
Focus area: Real-time monitoring, stream processing, data analytics, and decision support
Key data assets: European Energy Exchange data, smart meter measurement data, gas/fuels/energy market/price data, consumption statistics, equipment condition monitoring data

Transport
Focus area: Streaming sensor network & geo-spatial data integration
Key data assets: GTFS data, OSM / LinkedGeoData, MobilityMaps, transport sensor data, ROSATTE road safety attributes, European Road Data Infrastructure (EuroRoadS)

Climate
Focus area: Real-time monitoring, stream processing, and data analytics
Key data assets: European Grid Infrastructure (EGI), databases hosting atmospheric data, several software frameworks for simulation, calibration and reconstruction

Social Sciences
Focus area: Statistical and research data linking & integration
Key data assets: Federated social sciences data catalogs, statistical data from public data portals and statistical offices (e.g. Eurostat, UNESCO, World Bank)

Security
Focus area: Real-time monitoring, stream processing, and data analytics; image data analysis
Key data assets: Earth Observation data (e.g. Very High Resolution Satellite Imagery acquired from commercial providers and governmental systems) and collateral data for supporting CFSP/CSDP missions and operations; databases hosting atmospheric data; experimental and simulation data concerning dispersion of hazardous substances
Project Summary
Two clearly defined coordination and support measures:
Coordination: Engaging with a diverse range of stakeholder groups representing particularly the Horizon
2020 societal challenges Health, Food & Agriculture, Energy, Transport, Climate, Social Sciences and
Security; Collecting requirements for the ICT infrastructure needed by data-intensive science practitioners
tackling a wide range of societal challenges; covering all aspects of publishing and consuming semantically
interoperable, large-scale data and knowledge assets;
Support: Designing, realizing and evaluating a Big Data Aggregator platform infrastructure that meets
requirements, minimises disruption to current workflows, and maximises the opportunities to take advantage
of the latest European RTD developments (incl. multilingual data harvesting, data analytics & visualisation).
BigDataEurope will implement and apply two main instruments to successfully realize these measures:
Build Societal Big Data Interest Groups modelled on the W3C interest group scheme, involving a large number of
stakeholders from the Horizon 2020 societal challenges as well as technical Big Data experts;
Design, integrate and deploy a cloud-deployment-ready Big Data aggregator platform comprising key
open-source Big Data technologies for real-time and batch processing, such as Hadoop, Cassandra and
Storm.
Orthogonal Dimensions of Big Data Ecosystems
Generic Big Data Enabling Technologies
Data Value Chain: Data Generation & Acquisition; Data Analysis & Processing; Data Storage & Curation; Data Visualization & Usage; Data-driven Services
Societal Challenges (Domain Specific Data Assets & Technology): Healthcare; Food Security; Energy; Intelligent Transport; Climate & Environment; Inclusive & Reflective Societies; Secure Societies
BigDataEurope Platform
Work Packages & Implementation Phases
Timeline: M1-M12, M13-M24, M25-M36
Phases: Community Building; Enabling Technologies; Component Integration; Uptake; Integrator Deployment; Community Assessment
WP2 – Community Building & Requirements
WP3 – Big Data Generic Enabling Technologies & Architecture
WP4 – Big Data Integrator Platform
WP5 – Big Data Integrator Instances
WP6 – Real-life Deployment & User Evaluation
WP7 – Dissemination & Communication
BDE platform covers complete data-landscape
Data processing with human organized information
Similar data processing steps applied on a large quantity
Similar data processing steps applied on a stream of data
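The distinction between applying the same steps to a large quantity of data and to a stream can be sketched as follows. This is a minimal illustration, not the BDE implementation: the `normalise` step, the record layout and both driver functions are hypothetical names introduced here.

```python
from typing import Iterable, Iterator


def normalise(record: dict) -> dict:
    """One processing step (hypothetical): trim and lower-case the 'name' field."""
    return {**record, "name": record["name"].strip().lower()}


def process_bulk(records: list[dict]) -> list[dict]:
    """Bulk mode: the whole dataset is available up front."""
    return [normalise(r) for r in records]


def process_stream(records: Iterable[dict]) -> Iterator[dict]:
    """Stream mode: records arrive one at a time and are emitted as processed."""
    for r in records:
        yield normalise(r)


sample = [{"name": " Alpha "}, {"name": "BETA"}]
bulk_result = process_bulk(sample)
stream_result = list(process_stream(iter(sample)))
```

Both modes reuse the same step; only the way records are delivered differs.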
Blueprint BDE platform
[Figure labels: Dissemination API; aggregated data; Search index; Dataset Meta data; SPARQL; JSON; LOD; search; Dissemination storage; JSON-LD; Real time aggregator; Bulk data aggregator; Background aggregator; Reporting API; Bulk database; Background knowhow]
Blueprint BDE platform (same figure labels as on the previous slide, with two additional labels: Deployment; Coordination)
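The blueprint lists dataset metadata and JSON-LD among its labels. As a rough sketch of what a JSON-LD dataset description could look like, the snippet below builds one with the W3C DCAT and Dublin Core Terms vocabularies. The choice of DCAT, the `dataset_metadata` helper and the example.org identifier are assumptions for illustration; the slides do not say which vocabulary the platform uses.

```python
import json


def dataset_metadata(identifier: str, title: str, keywords: list[str]) -> dict:
    """Build a JSON-LD description of a dataset (DCAT is an assumed vocabulary)."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": identifier,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dcat:keyword": keywords,
    }


# Hypothetical dataset; the identifier is a placeholder, not a real BDE URL.
doc = dataset_metadata(
    "http://example.org/dataset/transport-sensors",
    "Transport sensor data",
    ["transport", "sensor"],
)
serialised = json.dumps(doc, indent=2)
```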
Networking partners
www.big-data-europe.eu
Health, demographic change and wellbeing
Food, Agriculture, Forestry, Water and Bioeconomy
Inclusive, innovative and Reflective Societies
Secure, clean and efficient energy
Climate, environment, resource efficiency and raw materials
Smart, green and integrated transport
Secure Societies
Envisioned societal stakeholder engagement cycle
Community building and supporting
◎ Establish 7 Societal Big Data Interest Groups
o modelled after the W3C interest groups
o involving a large number of stakeholders from the H2020 societal challenges as well as technical Big Data experts
o each group has a domain and a technical chair
◎ Building a European network and multiplier organization per societal challenge to
o engage with stakeholders in the particular societal challenge area and raise awareness
o support the requirements elicitation, definition and prioritization
o assemble a library of data sources and datasets
o provide a comprehensive test bed for the evaluation of the BDE Aggregator Platform
o select pilot use cases, across different domains
o promote the showcase developed for the societal domain and support the dissemination of the BDE results
o provide appropriate academic and training curricula for training future researchers and practitioners.
27 February 2015 www.big-data-europe.eu
Workshops
◎ 7 × 3 workshops (at least 3 per Societal Challenge)
◎ First series of workshops in the next months will focus on requirements definition
o analyse workshop results and create a 1st draft per societal challenge
o also examine the use of other tools such as
❖ surveys (asking a broad audience about (big) data management needs)
❖ interviews with Big Data experts
❖ interviews with an EC representative per societal challenge
◎ Second series of workshops in the 2nd year will focus on a review of the architecture and first prototype implementation
◎ Third series of workshops in the 3rd year will focus on the platform evaluation and showcases for the societal domains
OPEN PHACTS - BIG DATA
AND DRUG DISCOVERY
BRYN WILLIAMS-JONES, CEO THE OPEN PHACTS FOUNDATION
[Figure labels: Literature; PubChem; Genbank; Patents; Databases; Downloads; Data Integration; Data Analysis; Firewalled Databases; Repeat @ each company x]
Lowering industry firewalls: pre-competitive informatics in drug discovery
Nature Reviews Drug Discovery (2009) 8, 701-708 doi:10.1038/nrd2944
Pre-competitive Informatics:
Pharma companies are all accessing, processing, storing & re-processing external open research data
The Innovative Medicines Initiative
• EC funded public-private partnership for pharmaceutical research
• Focus on key problems
– Efficacy, Safety, Education & Training, Knowledge Management
The Open PHACTS Project
• Create a semantic integration hub (“Open Pharmacological Space”)…
• Runs 2011-2014, ENSO till 2016
• Deliver services to support on-going drug discovery programs in pharma and public domain
• Leading academics in semantics, pharmacology and informatics, driven by solid industry business requirements
• 10 EFPIA companies, 15 academics, 6 SMEs
• Focus on sustainability and long term impact of the Open PHACTS infrastructure
Open PHACTS Mission
Integrate Multiple Research Biomedical Data Resources Into A Single Open & Free Access Point
[Figure labels: ChEMBL; DrugBank; Gene Ontology; WikiPathways; UniProt; ChemSpider; UMLS; ConceptWiki; ChEBI; TrialTrove; GVKBio; GeneGo; TR Integrity]
What do research scientists want
to know?
‘Business Questions’
Number  Sum  Nr of 1  Question
15  12  9  All oxidoreductase inhibitors active <100nM in both human and mouse
18  14  8  Given compound X, what is its predicted secondary pharmacology? What are the on- and off-target safety concerns for a compound? What is the evidence and how reliable is that evidence (journal impact factor, KOL) for findings associated with a compound?
24  13  8  Given a target find me all actives against that target. Find/predict polypharmacology of actives. Determine ADMET profile of actives.
32  13  8  For a given interaction profile, give me compounds similar to it.
37  13  8  The current Factor Xa lead series is characterised by substructure X. Retrieve all bioactivity data in serine protease assays for molecules that contain substructure X.
38  13  8  Retrieve all experimental and clinical data for a given list of compounds defined by their chemical structure (with options to match stereochemistry or not).
41  13  8  A project is considering Protein Kinase C Alpha (PRKCA) as a target. What are all the compounds known to modulate the target directly? What are the compounds that may modulate the target directly? i.e. return all cmpds active in assays where the resolution is at least at the level of the target family (i.e. PKC) both from structured assay databases and the literature.
44  13  8  Give me all active compounds on a given target with the relevant assay data
46  13  8  Give me the compound(s) which hit most specifically the multiple targets in a given pathway (disease)
59  14  8  Identify all known protein-protein interaction inhibitors
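To make the first question concrete ("all oxidoreductase inhibitors active <100nM in both human and mouse"), here is a small sketch of how it might be phrased as a filter once the data is integrated. Everything in it is hypothetical: the `Activity` record, the `potency_nm` field, the function name and the toy compounds are invented for illustration and are not Open PHACTS data or its API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    compound: str
    target_class: str
    species: str
    potency_nm: float  # assumed potency measure, in nanomolar


def active_in_all_species(activities, target_class, species, threshold_nm):
    """Return compounds below the threshold against target_class in every listed species."""
    seen_species = {}
    for a in activities:
        if a.target_class == target_class and a.potency_nm < threshold_nm:
            seen_species.setdefault(a.compound, set()).add(a.species)
    required = set(species)
    return sorted(c for c, seen in seen_species.items() if required <= seen)


# Toy, made-up records purely to exercise the filter.
toy = [
    Activity("cmpd-A", "oxidoreductase", "human", 40.0),
    Activity("cmpd-A", "oxidoreductase", "mouse", 75.0),
    Activity("cmpd-B", "oxidoreductase", "human", 20.0),
    Activity("cmpd-B", "oxidoreductase", "mouse", 350.0),
    Activity("cmpd-C", "kinase", "human", 5.0),
]
hits = active_in_all_species(toy, "oxidoreductase", ["human", "mouse"], 100.0)
```

The filter itself is trivial; the hard part the slides point to is getting compound, target and species data from different sources into one comparable form.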
The Open PHACTS Discovery Platform
http://dx.doi.org/10.1016/j.websem.2014.03.003
[Architecture figure labels: Apps; Linked Data API (RDF/XML, TTL, JSON); Domain Specific Services; Semantic Workflow Engine; Core Platform; Data Cache (Virtuoso Triple Store); Identity Resolution Service; Identifier Management Service; Chemistry Registration Normalisation & Q/C; Indexing; Nanopub; Db; VoID; Public Content; Commercial; Public Ontologies; User Annotations. Example identifiers: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”]
Sustaining Impact
“Software is free like puppies are free - they both need money for maintenance”
…and more resource for future development
How do we move data about and integrate it?
http://imgs.xkcd.com/comics/standards.png
Data Standardisation is vital
P12047 X31045
GB:29384
Yet the bioscience world really struggles to agree on names
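One common way to cope with several names for the same entity is to record pairwise equivalences and resolve them transitively. The sketch below does this with a small union-find structure. It is an illustration only, not the Open PHACTS Identity Resolution Service; the `IdentifierMap` class is hypothetical, and treating the three identifiers shown on the slide as naming one entity is an assumption made for the example.

```python
class IdentifierMap:
    """Groups identifiers that are asserted to name the same entity (union-find)."""

    def __init__(self):
        self._parent = {}

    def _find(self, ident):
        # Unknown identifiers start as their own group.
        self._parent.setdefault(ident, ident)
        while self._parent[ident] != ident:
            # Path halving keeps lookups short.
            self._parent[ident] = self._parent[self._parent[ident]]
            ident = self._parent[ident]
        return ident

    def link(self, a, b):
        """Record that identifiers a and b refer to the same entity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same(self, a, b):
        """True if a and b are connected through any chain of links."""
        return self._find(a) == self._find(b)


ids = IdentifierMap()
ids.link("P12047", "X31045")    # assumed equivalence, for illustration
ids.link("X31045", "GB:29384")  # assumed equivalence, for illustration
```

With these two links, `ids.same("P12047", "GB:29384")` holds even though that pair was never linked directly.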
@Open_PHACTS
Open PHACTS Practical Semantics
Acknowledgements
GlaxoSmithKline – Coordinator
Universität Wien – Managing entity
Technical University of Denmark
University of Hamburg, Center for Bioinformatics
BioSolveIT GmbH
Consorci Mar Parc de Salut de Barcelona
Leiden University Medical Centre
Royal Society of Chemistry
Vrije Universiteit Amsterdam
Novartis
Merck Serono
H. Lundbeck A/S
Eli Lilly
Netherlands Bioinformatics Centre
Swiss Institute of Bioinformatics
ConnectedDiscovery
EMBL-European Bioinformatics Institute
Janssen
Esteve
Almirall
OpenLink
Scibite
The Open PHACTS Foundation
Spanish National Cancer Research Centre
University of Manchester
Maastricht University
Aqnowledge
University of Santiago de Compostela
Rheinische Friedrich-Wilhelms-Universität Bonn
AstraZeneca
Pfizer