big data in genomics: opportunities and challenges

22
Big Data in Genomics: Opportunities and Challenges Dr. Matthieu-P. Schapranow Bio Data World Congress, Cambridge, UK Oct 22, 2015

Upload: matthieu-schapranow

Post on 13-Apr-2017

638 views

Category:

Technology


0 download

TRANSCRIPT

Page 1: Big Data in Genomics: Opportunities and Challenges

Big Data in Genomics: Opportunities and Challenges

Dr. Matthieu-P. Schapranow Bio Data World Congress, Cambridge, UK

Oct 22, 2015

Page 2: Big Data in Genomics: Opportunities and Challenges

■  Online: Visit we.analyzegenomes.com for latest research results, tools, and news

■  Offline: Read more about it, e.g. High-Performance In-Memory Genome Data Analysis: How In-Memory Database Technology Accelerates Personalized Medicine, In-Memory Data Management Research, Springer, ISBN: 978-3-319-03034-0, 2014

■  In Person: Join us for “Festival of Genomics” Jan 19-21, 2016 in London, UK

Important things first: Where do you find additional information?

Schapranow, Bio Data, Cambridge, UK, Oct 22, 2015

Big Data in Genomics: Opportunities and Challenges

2

Page 3: Big Data in Genomics: Opportunities and Challenges

What is the Hasso Plattner Institute, Potsdam, Germany?

Schapranow, HPI, Oct 13, 2015

Analyze Genomes: A Federated In-Memory Database Computing Platform

3

Page 4: Big Data in Genomics: Opportunities and Challenges

Prof. Dr. h.c. Hasso Plattner

■ Research focuses on the technical aspects of enterprise software and design of complex applications

□  In-Memory Data Management for Enterprise Applications

□ Enterprise Application Programming Model

□ Scientific Data Management

□ Human-Centered Software Design and Engineering

■  Industry cooperations, e.g. SAP, Siemens, Audi, and EADS

■ Research cooperations, e.g. Stanford, MIT, and Berkeley

Hasso Plattner Institute Enterprise Platform and Integration Concepts Group

Schapranow, HPI, Oct 13, 2015

Analyze Genomes: A Federated In-Memory Database Computing Platform

4

Partner of Stanford Center for Design

Research

Partner of MIT in Supply Chain

Innovation and CSAIL

Partner at UC Berkeley

RAD / AMP Lab

Partner of SAP AG

Page 5: Big Data in Genomics: Opportunities and Challenges

■  Since 2009 Program Manager E-Health & Life Sciences

■  2006-2014 Strategic Projects SAP HANA

■  Visiting Scientist at V.A., Boston, MA and Charité, Berlin

■  Software Engineer by training (PhD, M.Sc., B.Sc.)

With whom are you dealing?

Schapranow, HPI, Oct 13, 2015

Analyze Genomes: A Federated In-Memory Database Computing Platform

5

Page 6: Big Data in Genomics: Opportunities and Challenges

■  Patients

□  Individual anamnesis, family history, and background

□  Require fast access to individualized therapy

■  Clinicians

□  Identify root and extent of disease using laboratory tests

□  Evaluate therapy alternatives, adapt existing therapy

■  Researchers

□  Conduct laboratory work, e.g. analyze patient samples

□  Create new research findings and come-up with treatment alternatives

The Setting Actors in Oncology

Schapranow, Bio Data, Cambridge, UK, Oct 22, 2015 6

Big Data in Genomics: Opportunities and Challenges

Page 7: Big Data in Genomics: Opportunities and Challenges

IT Challenges Distributed Heterogeneous Data Sources

7

Human genome/biological data 600GB per full genome 15PB+ in databases of leading institutes

Prescription data 1.5B records from 10,000 doctors and 10M Patients (100 GB)

Clinical trials Currently more than 30k recruiting on ClinicalTrials.gov

Human proteome 160M data points (2.4GB) per sample >3TB raw proteome data in ProteomicsDB

PubMed database >23M articles

Hospital information systems Often more than 50GB

Medical sensor data Scan of a single organ in 1s creates 10GB of raw data Cancer patient records

>160k records at NCT Big Data in Genomics: Opportunities and Challenges

Schapranow, Bio Data, Cambridge, UK, Oct 22, 2015

Page 8: Big Data in Genomics: Opportunities and Challenges

Schapranow, HPI, Oct 13, 2015

Our Approach Analyze Genomes: Real-time Analysis of Big Medical Data

8

In-Memory Database

Extensions for Life Sciences

Data Exchange, App Store

Access Control, Data Protection

Fair Use

Statistical Tools

Real-time Analysis

App-spanning User Profiles

Combined and Linked Data

Genome Data

Cellular Pathways

Genome Metadata

Research Publications

Pipeline and Analysis Models

Drugs and Interactions

Analyze Genomes: A Federated In-Memory Database Computing Platform

Drug Response Analysis

Pathway Topology Analysis

Medical Knowledge Cockpit Oncolyzer

Clinical Trial Recruitment

Cohort Analysis

...

Indexed Sources

Page 9: Big Data in Genomics: Opportunities and Challenges

Case Vignette

■  Patient: 48 years, female, non-smoker, smoke-free environment

■  Diagnosis: Non-Small Cell Lung Cancer (NSCLC), stage IV

■  Markers: KRAS, EGFR, BRAF, NRAS, (ERBB2)

■  Initial treatment: Surgery

■  Therapy: Palliative chemotherapy

Schapranow, Bio Data, Cambridge, UK, Oct 22, 2015

Big Data in Genomics: Opportunities and Challenges

9

Page 10: Big Data in Genomics: Opportunities and Challenges

Cloud-based Services for Processing of DNA Data

■  Control center for processing of raw DNA data, such as FASTQ, SAM, and VCF

■  Personal user profile guarantees privacy of uploaded and processed data

■  Supports reproducible research process by storing all relevant process parameters

■  Implements prioritized data processing and fair use, e.g. per department or per institute

■  Supports additional service, such as data annotations, billing, and sharing for all Analyze Genomes services

■  Honored by the 2014 European Life Science Award

Big Data in Genomics: Opportunities and Challenges

Standardized Modeling and runtime environment for

analysis pipelines

10

Schapranow, Bio Data, Cambridge, UK, Oct 22, 2015

Page 11: Big Data in Genomics: Opportunities and Challenges

■  Query-oriented search interface

■  Seamless integration of patient specifics, e.g. from EMR

■  Parallel search in international knowledge bases, e.g. for biomarkers, literature, cellular pathway, and clinical trials

Medical Knowledge Cockpit for Patients and Clinicians Linking Patient Specifics with International Knowledge

Big Data in Genomics: Opportunities and Challenges

11

Schapranow, Bio Data, Cambridge, UK, Oct 22, 2015

Page 12: Big Data in Genomics: Opportunities and Challenges

Medical Knowledge Cockpit for Patients and Clinicians

■  Search for affected genes in distributed and heterogeneous data sources

■  Immediate exploration of relevant information, such as

□  Gene descriptions,

□  Molecular impact and related pathways,

□  Scientific publications, and

□  Suitable clinical trials.

■  No manual searching for hours or days: In-memory technology translates searching into interactive finding!

Big Data in Genomics: Opportunities and Challenges

Automatic clinical trial matching build on text

analysis features

Unified access to structured and un-structured data

sources

12

Schapranow, Bio Data, Cambridge, UK, Oct 22, 2015

Page 13: Big Data in Genomics: Opportunities and Challenges

Schapranow, Bio Data, Cambridge, UK, Oct 22, 2015

Medical Knowledge Cockpit for Patients and Clinicians Pathway Topology Analysis

■  Search in pathways is limited to “is a certain element contained” today

■  Integrated >1,5k pathways from international sources, e.g. KEGG, HumanCyc, and WikiPathways, into HANA

■  Implemented graph-based topology exploration and ranking based on patient specifics

■  Enables interactive identification of possible dysfunctions affecting the course of a therapy before its start

Big Data in Genomics: Opportunities and Challenges

Unified access to multiple formerly disjoint data sources

Pathway analysis of genetic variants with graph engine

13

Page 14: Big Data in Genomics: Opportunities and Challenges

Real-time Data Analysis and Interactive Exploration

Drug Response Analysis Data Sources

Schapranow, Bio Data, Cambridge, UK, Oct 22, 2015

Big Data in Genomics: Opportunities and Challenges

Smoking status, tumor classification

and age (1MB - 100MB)

Raw DNA data and genetic variants

(100MB - 1TB)

Medication efficiency and wet lab results

(10MB - 1GB)

14

Patient-specific Data

Tumor-specific Data

Compound Interaction Data

Page 15: Big Data in Genomics: Opportunities and Challenges

Schapranow, Bio Data, Cambridge, UK, Oct 22, 2015

Big Data in Genomics: Opportunities and Challenges

15

Page 16: Big Data in Genomics: Opportunities and Challenges

Showcase

Schapranow, Bio Data, Cambridge, UK, Oct 22, 2015

Big Data in Genomics: Opportunities and Challenges

16 Calculating Drug Response… Predict Drug Response

Page 17: Big Data in Genomics: Opportunities and Challenges

Schapranow, Bio Data, Cambridge, UK, Oct 22, 2015

Big Data in Genomics: Opportunities and Challenges

17

cetuximab might be more beneficial for the current case

Page 18: Big Data in Genomics: Opportunities and Challenges

Our Methodology Design Thinking

Schapranow, Bio Data, Cambridge, UK, Oct 22, 2015

Big Data in Genomics: Opportunities and Challenges

18

Page 19: Big Data in Genomics: Opportunities and Challenges

Our Methodology Design Thinking

Schapranow, Bio Data, Cambridge, UK, Oct 22, 2015

Big Data in Genomics: Opportunities and Challenges

19

Desirability

■  Portfolio of integrated services for clinicians, researchers, and patients

■  Include latest treatment option, e.g. most effective therapies

Viability

■  Enable precision medicine also in far-off regions and developing countries

■  Involve word-wide experts (cost-saving)

■  Combine latest international data (publications, annotations, genome data)

Feasibility

■  HiSeq 2500 enables high-coverage whole genome sequencing in 20h

■  IMDB enables allele frequency determination of 12B records within <1s

■  Cloud-based data processing services reduce TCO

Page 20: Big Data in Genomics: Opportunities and Challenges

Combined column and row store

Map/Reduce Single and multi-tenancy

Lightweight compression

Insert only for time travel

Real-time replication

Working on integers

SQL interface on columns and rows

Active/passive data store

Minimal projections

Group key Reduction of software layers

Dynamic multi-threading

Bulk load of data

Object-relational mapping

Text retrieval and extraction engine

No aggregate tables

Data partitioning Any attribute as index

No disk

On-the-fly extensibility

Analytics on historical data

Multi-core/ parallelization

Our Technology In-Memory Database Technology

+

+++

+

P

v

+++t

SQL

xx

T

disk

20

Schapranow, Bio Data, Cambridge, UK, Oct 22, 2015

Big Data in Genomics: Opportunities and Challenges

Page 21: Big Data in Genomics: Opportunities and Challenges

■  For patients

□  Identify relevant clinical trials and medical experts

□  Become an informed patient

■  For clinicians

□  Identify pharmacokinetic correlations

□  Scan for similar patient cases, e.g. to evaluate therapy efficiency

■  For researchers

□  Enable real-time analysis of medical data, e.g. assess pathways to identify impact of detected variants

□  Combined mining in structured and unstructured data, e.g. publications,

diagnosis, and EMR data

What to Take Home? Test it Yourself: AnalyzeGenomes.com

Schapranow, Bio Data, Cambridge, UK, Oct 22, 2015 21

Big Data in Genomics: Opportunities and Challenges

Page 22: Big Data in Genomics: Opportunities and Challenges

Keep in contact with us!

Hasso Plattner Institute Enterprise Platform & Integration Concepts (EPIC)

Program Manager E-Health Dr. Matthieu-P. Schapranow

August-Bebel-Str. 88 14482 Potsdam, Germany

Dr. Matthieu-P. Schapranow [email protected] http://we.analyzegenomes.com/

Schapranow, Bio Data, Cambridge, UK, Oct 22, 2015

Big Data in Genomics: Opportunities and Challenges

22