![Page 1: ELIXIR: Data Challenges in the Life Sciencese-irg.eu/documents/10920/260645/elixir_e-irg_andy_final.pdf · 2014-12-02 · skills to analyse data are scarce and will become scarcer](https://reader034.vdocuments.net/reader034/viewer/2022050100/5f3f771fa6ef6e363609ecec/html5/thumbnails/1.jpg)
European Life Sciences Infrastructure for Biological Information www.elixir-europe.org
ELIXIR: Data Challenges in the Life Sciences
e-IRG workshop, Athens, 9-10 June 2014
Andrew Smith, ELIXIR Hub
ELIXIR’s mission
To build a sustainable European infrastructure for biological information, supporting life science research and its translation to:
• medicine
• environment
• bioindustries
• society
The potential
Genome-wide analysis of crop plants
• Population growth and climate change are major challenges to food security.
• Traditional routes to crop improvement are too slow to keep up with this increase in demand.
• Understanding plant genomes helps us identify which species will be most tolerant to drought, salt and pests while still providing optimum nutrition.
Matching the treatment to the cancer
• One in 10 women in the EU-27 will develop breast cancer before the age of 80.
• If we can identify patterns of genes that are active in different tumours, we can diagnose and treat cancers earlier.
The challenges
Growing data
[Figure: archive size in bytes (log scale, 1E+08 to 1E+16) against date (2002 to 2016) for EGA, ENA, PRIDE, MetaboLights and ArrayExpress, annotated with doubling times of 3, 4, 12 and 18 months.]
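The doubling times annotated on the growth chart can be turned into yearly growth factors with 2^(12/T), where T is the doubling time in months. The sketch below only illustrates that arithmetic. The function names are hypothetical, and the slide text does not say which archive each doubling time belongs to.

```python
def annual_growth_factor(doubling_months: float) -> float:
    """Factor by which a quantity grows in 12 months, given its doubling time."""
    return 2 ** (12 / doubling_months)


def projected_bytes(start_bytes: float, doubling_months: float, elapsed_months: float) -> float:
    """Size after elapsed_months, assuming steady exponential growth."""
    return start_bytes * 2 ** (elapsed_months / doubling_months)


# Doubling times (in months) quoted on the slide.
for t in (3, 4, 12, 18):
    print(f"{t:>2} month doubling: x{annual_growth_factor(t):.2f} per year")
```

A 3 month doubling time corresponds to 16-fold growth per year, while an 18 month doubling time corresponds to roughly 1.6-fold growth per year.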
The data challenge: geography
• Data production is increasing at sites across Europe
• European Illumina sales up 20% in 2013
Source: http://omicsmaps.com
Data resources in life science
• Diverse
• Many
• Dispersed
~1800 molecular biology data resources, by category:
• Genomics Databases (non-vertebrate) (17.9%)
• Protein sequence databases (12.9%)
• Human Genes and Diseases (9.8%)
• Structure Databases (9.7%)
• Metabolic and Signaling Pathways (9.3%)
• Nucleotide Sequence Databases (8.8%)
• Human and other Vertebrate Genomes (7.1%)
• Plant databases (7.1%)
• RNA sequence databases (4.9%)
• Microarray and other Gene Expression Databases (4.5%)
• Other Molecular Biology Databases (3.3%)
• Immunological databases (1.8%)
• Organelle databases (1.6%)
• Proteomics Resources (1.2%)
• Cell biology (0.2%)
Source: Nucleic Acids Research annual Database Issue and the NAR online Molecular Biology Database Collection in 2012. MY Galperin, GR Cochrane – Nucleic Acids Research, 2011
Users are global
Source: EMBL-EBI Live Data Map
The policy drivers: Open Access to data
• Open access to life science data is essential for advances in many areas of research
• It provides a valuable path to discovery, one that in many other areas of research is limited by commercial confidentiality
• National funders increasingly require researchers to make data open
• EC’s H2020 pilot on Open Research Data and Data Management Plans
The response
Infrastructure for Life Sciences
• Tools: Access, Search, Analysis
• Standards: Formats, Ontologies, Guidelines
• Data: Integration, Optimization, Privacy
• Compute: Storage, Network, Computing
• Training
ELIXIR’s structure
• Tools • Standards • Data • Compute • Training • Industry
ELIXIR Nodes
Training
For Big Data to become huge, however, there are still hurdles to leap. For one thing, the tools to analyse data are not yet good enough. And people with the skills to analyse data are scarce and will become scarcer. By 2018 there will be a “talent gap” of between 140,000 and 190,000 people, …
ELIXIR pilots addressed key challenges in biomedical research
1. Cloud computing “Embassy cloud”: Access reference data in a virtual environment – work as though you are at EMBL-EBI or SIB, Switzerland
2. Authentication & Authorisation: Improved methods and processes for access to clinical data
Identifying new drug targets
ELIXIR pilot: Interoperability of high-resolution protein data at EMBL-EBI and HPA, Sweden
The Human Protein Atlas portal is a publicly available database with millions of high-resolution images showing the spatial distribution of proteins in 46 different normal human tissues and 20 different cancer types, as well as 47 different human cell lines.
European ELIXIR Data: “LightPath” (EBI / CSC)
• To explore the replication of large-scale (petabyte-scale) archives to remote sites
• To create a separate source of data files for challenging data I/O projects
• Selection of pilot data transfer technology between EBI and CSC
• Established a dedicated light path between datacenters in London and Kajaani
• Development of a model for future I/O needs in the life sciences in Europe
Cross-site VM Operation - pilot
• Perform analysis via cloud infrastructures and VMs
• Transfer VMs between computing centers to allow researchers to perform analyses that they could not otherwise do locally
• Supported by 5 NRENs and in collaboration with
Cross-site VM Operation
[Diagram: three sites (CSC, EMBL-EBI, University of Groningen) linked by 1GB lightpaths over the Funet, Janet and SURFnet networks. VMs move between the sites. Labelled components: Chipster (200GB), NBIC Galaxy (50GB), GoNL (60TB), ENA (3.2PB). Layers shown: data, analysis tools, computation.]
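For a rough sense of scale, the sketch below estimates how long a bulk copy of a dataset would take over a single link, using the sizes shown in the diagram. It rests on assumptions the slide does not confirm: that "1GB lightpath" means 1 gigabit per second, that sizes are decimal (1PB = 10^15 bytes), and that the link runs at full utilisation unless an efficiency factor is supplied. The function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(size_bytes: float, link_bits_per_second: float, efficiency: float = 1.0) -> float:
    """Days needed to move size_bytes over a link of the given raw speed.

    efficiency is the assumed fraction of raw link speed actually achieved (0 to 1].
    """
    seconds = size_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / SECONDS_PER_DAY


# Sizes as labelled in the diagram, read as decimal units (an assumption).
datasets = {"ENA": 3.2e15, "GoNL": 60e12, "Chipster": 200e9, "NBIC Galaxy": 50e9}

for name, size in datasets.items():
    # 1e9 bit/s is the assumed reading of "1GB lightpath".
    print(f"{name}: {transfer_days(size, 1e9):.3f} days at 1 Gbit/s")
```

Under these assumptions, copying 3.2PB takes about 296 days at 1 Gbit/s, or about 37 days if the link is instead read as 1 gigabyte per second.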
European Research Infrastructures
[Diagram: LS (life sciences) research infrastructures alongside ICT e-infrastructures.]
Knowledge exchange workshop
• Discussion of big data challenges in life sciences
• Focus on a few representative domains
• Looking 5 years ahead
• Jointly identify potential solutions to our problems
[Diagram: Data at the interface between ICT e-infrastructures (transfer, computation, storage) and LS life sciences (physical facilities, scientific information).]
Thank you