Analyzing Big Data in Medicine with Virtual Research Environments and Microservices
Ola Spjuth
Department of Pharmaceutical Biosciences, Science for Life Laboratory, Uppsala University

Post on 14-Apr-2017

TRANSCRIPT

Page 1: Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

Ola Spjuth

Department of Pharmaceutical Biosciences
Science for Life Laboratory
Uppsala University

Page 2: Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

Today: We have access to high-throughput technologies to study biological phenomena

Page 3: Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

New challenges: Data management and analysis

• Storage
• Analysis methods, pipelines
• Scaling
• Automation
• Data integration, security
• Predictions
• …

Page 4: Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

European Open Science Cloud (EOSC)

• The vast majority of all data in the world (in fact up to 90%) has been generated in the last two years.

• Scientific data is in dire need of openness, better handling, careful management, machine actionability and sheer re-use.

• European Open Science Cloud: A vision of a future infrastructure to support Open Research Data and Open Science in Europe
– It should enable trusted access to services, systems and the re-use of shared scientific data across disciplinary, social and geographical borders
– Research data should be findable, accessible, interoperable and re-usable (FAIR)
– Provide the means to analyze datasets of huge sizes

http://ec.europa.eu/research/openscience/index.cfm?pg=open-science-cloud

Page 5: Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

Contemporary Big Data analysis in bioinformatics

• High-Performance Computing with shared storage
– Linux, Terminal, batch queue
• Problems/challenges
– Access to resources is limited
– Dependency management for tools is cumbersome, need help from system administrators to install software
– Privacy-related issues
– Difficult to share/integrate data
– Accessibility issues
• A common approach: Internet-based services
– Retrieve data
– Analysis tools

Page 6: Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

Workflows

Page 7: Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

Service-Oriented Architectures (SOA) in the life sciences

• Standardize
– Agree on e.g. interfaces, data formats, protocols etc.
• Decompose and compartmentalize
– Experts (scientists) should provide services: do one thing and do it well
– Achieve interoperability by exposing data and tools as Web services
• Integrate
– Users should access and integrate remote services

[Figure: a scientist provides a service behind an API; another scientist consumes it]

Page 8: Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

Service-Oriented Architectures (SOA) in the life sciences, ~2005

[Figure: a scientist depending on three remote APIs, labeled "downtime", "API changed" and "Not maintained"]

Difficult to sustain, unreliable solutions

Page 9: Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

Cloud Computing

• Cloud computing offers advantages over contemporary e-infrastructures in the life sciences
– On-demand elastic resources and services
– No up-front costs, pay-per-use
• A lot of businesses (and software development) moving into the cloud
– Vibrant ecosystem of frameworks and tools, including for big data
• High potential for science

Page 10: Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

Virtual Machines and Containers

Virtual machines:
• Package entire systems (heavy)
• Completely isolated
• Suitable in cloud environments

Containers:
• Share OS
• Smaller, faster, portable
• Docker!

Page 11: Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

Microservices

• Similar to Web services: Decompose functionality into smaller, loosely coupled services communicating via APIs
– “Do one thing and do it well”
• Preferably smaller, light-weight and fast to instantiate on demand
• Easy to replace, language-agnostic
– Suitable for loosely coupled teams (which we have in science)
– Portable - easy to deploy and scale
– Maximize agility for developers
• Suitable to deploy as containers in cloud environments
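
As a rough illustration of "do one thing and do it well", the sketch below wraps a single small function in an HTTP API using only the Python standard library. The GC-content calculation, the `/gc` endpoint, the `seq` query parameter and the helper names `start_service` and `call_service` are all hypothetical choices made for this example; they are not taken from the talk or from any PhenoMeNal tool.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse
from urllib.request import urlopen


def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence (0.0 if empty)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


class GCHandler(BaseHTTPRequestHandler):
    """Exposes gc_content as GET /gc?seq=... and answers with JSON."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/gc":
            self.send_error(404, "unknown endpoint")
            return
        seq = parse_qs(url.query).get("seq", [""])[0]
        body = json.dumps({"sequence": seq, "gc": gc_content(seq)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the example quiet; a real service would log requests.
        pass


def start_service(port: int = 0) -> HTTPServer:
    """Start the service in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), GCHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_service(server: HTTPServer, seq: str) -> dict:
    """Consume the service the way any client would: over HTTP, via the API."""
    port = server.server_address[1]
    with urlopen(f"http://127.0.0.1:{port}/gc?seq={seq}") as resp:
        return json.load(resp)
```

A client only needs the URL and the JSON shape, so the service could be reimplemented in another language or replaced without the client changing. Calling `start_service()` followed by `call_service(server, "GGCCAATT")` exercises the round trip; `server.shutdown()` stops it.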

Page 12: Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

Scaling microservices

http://martinfowler.com/articles/microservices.html

Page 13: Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

Shipping containers?

Page 14: Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

Orchestrating containers

Page 15: Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

Kubernetes: Orchestrating containers

• Origin: Google
• A declarative language for launching containers
• Start, stop, update, and manage a cluster of machines running containers in a consistent and maintainable way
• Suitable for microservices
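
To make "declarative" concrete: with Kubernetes you describe the desired state (which image, how many replicas) and the cluster works towards it. The sketch below builds a minimal Deployment manifest as a Python dictionary and prints it as JSON, a format that `kubectl apply -f` accepts alongside YAML. The tool name, the image reference `example.org/example-tool:0.1` and the port are placeholders invented for this example, and `apps/v1` is the current Deployment API version rather than the one in use when the talk was given.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int = 2, port: int = 8080) -> dict:
    """Build a minimal Kubernetes Deployment manifest as a plain dictionary."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # Desired state: how many identical copies should be running.
            "replicas": replicas,
            # The selector must match the labels on the pod template below.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,  # placeholder image reference
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


# Hypothetical tool name and image, used only for illustration.
manifest = deployment_manifest("example-tool", "example.org/example-tool:0.1", replicas=3)
print(json.dumps(manifest, indent=2))
```

Scaling then amounts to changing `replicas` and re-applying the manifest; nothing in the description says how or where the containers are started.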

[Figure: containers are scheduled and packed onto nodes]
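
The idea of packing containers onto nodes can be sketched as a bin-packing problem. The example below uses a first-fit decreasing heuristic over CPU requests given in millicores. This is an assumption-laden toy and not the algorithm the Kubernetes scheduler uses, which also weighs memory, affinity rules and other constraints; the container names and sizes are made up.

```python
from typing import Dict, List


def pack_containers(cpu_requests: Dict[str, int], node_capacity: int) -> List[List[str]]:
    """Assign containers to nodes with first-fit decreasing.

    cpu_requests maps a container name to its CPU request in millicores.
    Returns one list of container names per node that was needed.
    """
    nodes: List[List[str]] = []  # container names placed on each node
    free: List[int] = []         # remaining capacity of each node

    # Place the largest requests first; this tends to leave fewer gaps.
    ordered = sorted(cpu_requests.items(), key=lambda item: item[1], reverse=True)
    for name, cpu in ordered:
        if cpu > node_capacity:
            raise ValueError(f"{name} requests more than a whole node")
        for i, remaining in enumerate(free):
            if cpu <= remaining:
                nodes[i].append(name)
                free[i] -= cpu
                break
        else:
            # No existing node has room, so open a new one.
            nodes.append([name])
            free.append(node_capacity - cpu)
    return nodes


# Hypothetical workload: five containers on nodes with 1000 millicores each.
placement = pack_containers({"a": 600, "b": 500, "c": 400, "d": 300, "e": 200}, 1000)
print(placement)  # [['a', 'c'], ['b', 'd', 'e']]
```

Five containers that would otherwise each occupy their own machine fit on two nodes here, which is the resource-efficiency argument for scheduling containers centrally.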

Page 16: Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

Virtual Research Environment (VRE)

• Virtual (online) environments for research
– Easy and user-friendly access to computational resources, tools and data, commonly for a scientific domain
• Multi-tenant VRE: log into a shared system
• Private VRE
– Deploy on your favorite cloud provider

Page 17: Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

PhenoMeNal

• Horizon 2020 project, €8 M, 2015-2018
– “standardized e-infrastructure for the processing, analysis and information-mining of the massive amount of medical molecular phenotyping and genotyping data generated by metabolomics applications.”
• Enable users to provision their own virtual infrastructure (VRE)
– Public cloud, private cloud, local servers
– Easy access to compatible tools exposed as microservices
– Will in minutes set up and configure a complete data center (compute nodes, storage, networks, DNS, firewall etc.)
– Can achieve high availability, scalability and fault tolerance
• Use modern and established tools and frameworks supported by industry
– Reduce risk and improve sustainability
• Offer an agile and scalable environment to use, and a straightforward platform to extend

http://phenomenal-h2020.eu/

Page 18: Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

Users should not see this…

Page 19: Analyzing Big Data in Medicine with Virtual Research Environments and Microservices
Page 20: Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

Deployment and user access

Launch on reference installation

Launch on public cloud

Private VRE

Page 21: Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

In-house deployment scenarios

MRC-NIHR Phenome Centre

• Medium-sized IT-infrastructure

• Dedicated IT-personnel

• Users: ICL staff

Hospital environment

• Dedicated server

• No IT-personnel

• User: Clinical researcher

Private VRE

Page 22: Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

Build and test tools, images, infrastructure

Docker Hub

PhenoMeNal Jenkins

PhenoMeNal Container Hub

Development: Container lifecycle

Source code repositories

Page 23: Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

Two proofs of concept so far

Kultima group

Pablo Moreno

Page 24: Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

Implications

• Improve sustainability
– Not dependent on specific data centers
• Improve reliability and security
– Users can run their own service environments (VREs) within isolated environments
– High availability and fault tolerance
• Scalability
– Deploy in elastic environments
• Agile development
– Automate “from develop to deploy”
• Agile science
– Simple access to discoverable, scalable tools on elastic compute resources with no up-front costs
• NB: Many problems of interoperability remain!
– Data
– APIs
– etc.

Page 25: Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

Ongoing research on VREs

• Data federation
• Compute federation
• Privacy preservation
• Workflows
• Big Data frameworks
• Data management and modeling

Page 26: Analyzing Big Data in Medicine with Virtual Research Environments and Microservices

Acknowledgements

Wesley Schaal, Jonathan Alvarsson, Staffan Arvidsson, Arvid Berg, Samuel Lampa, Marco Capuccini, Martin Dahlö, Valentin Georgiev, Anders Larsson, Polina Georgiev, Maris Lapins

AstraZeneca: Lars Carlsson, Ernst Ahlberg

University Vienna: David Kreil, Maciej Kańduła

SNIC Science Cloud: Andreas Hellander, Salman Toor

Caramba.clinic: Kim Kultima, Stephanie Herman, Payam Emami

ToxHQ team: Barry Hardy, Thomas Exner, Joh Dokler, Daniel Bachler