big data in pharma - overview and use cases

Big Data Analyses in Pharma An Overview Josef Scheiber, PhD Managing Director July 2015

Upload: josef-scheiber

Post on 14-Aug-2015


TRANSCRIPT

Page 1: Big Data in Pharma - Overview and Use Cases

Big Data Analyses in Pharma: An Overview

Josef Scheiber, PhD, Managing Director

July 2015

Page 2: Big Data in Pharma - Overview and Use Cases

Geography

Startup Center in Waldsassen

Main site: Data Analyses and Software Development

Westpark Center, Garmischer Str., Munich: Scientific Activities, since Jan 1, 2015

Basel/Switzerland: Data Curation and customer-related activities

Prague: 150 km

Munich: 200 km

Berlin: 300 km

Frankfurt: 250 km

Page 3: Big Data in Pharma - Overview and Use Cases

BioVariance at a Glance: Get the most out of your complex data

Curate. Integrate.

Analyze. Model.

Visualize. Explore.

DECIDE

Page 4: Big Data in Pharma - Overview and Use Cases

Overview

• Background
• Strategy
• Examples

Page 5: Big Data in Pharma - Overview and Use Cases

Background

Page 6: Big Data in Pharma - Overview and Use Cases

Courtesy: M. Zeinab, slideshare

Page 7: Big Data in Pharma - Overview and Use Cases

What do we need out of Big Data?

1. What are the inhibitors of kinase X and the five most similar kinases with IC50 < 1 μM and with MW < 500 from all internal and external data sources?

2. What assay technologies have been used against my kinase? Which cell lines?

3. What other proteins are in the same kinase branch as target X, where there were validated chemical hits from external or internal sources?

4. If I hit a particular kinase, what would the potential side-effect profile look like? Which known inhibitor of this kinase has the best safety profile and the fewest known IC50s?

5. Have I identified other compounds with a bioactivity profile similar to compound X and with the same core substructure?

6. Can we create a phylochemical tree of kinases and for a new kinase target place it into the tree on the basis of activity against a reference panel of compounds?

7. Have I identified all kinases with an x-ray structure (in-house or external) that are in pathway X?
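Question 1 above boils down to a filter over integrated activity records. The following is a minimal sketch, assuming the internal and external sources have already been merged into one list of records and that the set of similar kinases is known; the `Activity` record, the compound names and the kinase names are hypothetical and only illustrate the shape of such a query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    compound: str
    target: str
    ic50_um: float     # IC50 in micromolar
    mol_weight: float  # molecular weight in g/mol
    source: str        # "internal" or "external"


def potent_small_inhibitors(records, targets, max_ic50_um=1.0, max_mw=500.0):
    """Return sorted compound names hitting any of `targets`
    with IC50 below `max_ic50_um` and MW below `max_mw`."""
    wanted = set(targets)
    hits = {
        r.compound
        for r in records
        if r.target in wanted and r.ic50_um < max_ic50_um and r.mol_weight < max_mw
    }
    return sorted(hits)


# Hypothetical toy records standing in for the merged data sources.
records = [
    Activity("cmpd-A", "KINASE_X", 0.05, 420.0, "internal"),
    Activity("cmpd-B", "KINASE_X", 3.2, 380.0, "external"),   # too weak
    Activity("cmpd-C", "KINASE_Y", 0.8, 510.0, "external"),   # too heavy
    Activity("cmpd-D", "KINASE_Y", 0.3, 450.0, "internal"),
    Activity("cmpd-E", "KINASE_Z", 0.1, 300.0, "external"),   # target not requested
]

hits = potent_small_inhibitors(records, ["KINASE_X", "KINASE_Y"])
print(hits)
```

The hard part in practice is not the filter but getting all sources to agree on compound and target names, which is what the vocabulary work later in the deck addresses.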

Bridging Chemical and Biological Data: Implications for Pharmaceutical Drug Discovery. JL Jenkins, J Scheiber, D Mikhailov, A Bender, A Schuffenhauer, B Cornett, V Chan, J Kondracki, B Rohde, JW Davies (2012). In: Computational Approaches in Cheminformatics and Bioinformatics. Edited by: A Bender, R Guha. 25-56. John Wiley & Sons, Inc.

ANSWERS

Page 8: Big Data in Pharma - Overview and Use Cases

Context matters!

metabolites, drugs

targets, pathways

diseases (phenotypes)

Page 9: Big Data in Pharma - Overview and Use Cases

Context matters

RNA, DNA

It's not that simple …

Page 10: Big Data in Pharma - Overview and Use Cases

Descriptive: What happened?

Diagnostic: Why did it happen?

Predictive: What will happen?

Prescriptive: How can we make it happen?

Better data for better analytics

Hindsight Insight Foresight

Page 11: Big Data in Pharma - Overview and Use Cases

Need for interpretation

[Figure: stacked bar chart, y-axis from 0% to 100%. Series: Data Analysis, Experiment, Experimental Design. Eras on the x-axis: Before molecular biology, Molecular biology golden age, Genomics age, Deep sequencing age, Very soon.]

Page 12: Big Data in Pharma - Overview and Use Cases

Big Data?

Page 13: Big Data in Pharma - Overview and Use Cases

Volume

Page 14: Big Data in Pharma - Overview and Use Cases

Genome Sequencing

Slide adapted from George Church

Page 15: Big Data in Pharma - Overview and Use Cases

Genome Sequencing

Slide adapted from George Church

Page 16: Big Data in Pharma - Overview and Use Cases

Cost Reduction - Example

Ferrari 458 Spider: $398,000 in 2006, 40 cents now (if its price had fallen like the cost of genome sequencing)!

Page 17: Big Data in Pharma - Overview and Use Cases

Much more data for way less money

Page 18: Big Data in Pharma - Overview and Use Cases

Challenges for Informatics? 1 genome is roughly 500 GB of data

2011: several hundred exomes
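To make the informatics challenge concrete, here is a back-of-the-envelope sketch that scales the slide's figure of roughly 500 GB per genome to a cohort. The cohort sizes are hypothetical, and the estimate deliberately ignores compression, replication and derived files.

```python
GB_PER_GENOME = 500  # from the slide: one genome is roughly 500 GB of data


def cohort_storage_tb(n_genomes, gb_per_genome=GB_PER_GENOME):
    """Raw storage in terabytes (decimal units, 1 TB = 1000 GB)."""
    return n_genomes * gb_per_genome / 1000


# Hypothetical cohort sizes, chosen only to show how quickly storage grows.
for n in (1, 100, 10_000):
    print(f"{n:>6} genomes: {cohort_storage_tb(n):,.1f} TB")
```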

Page 19: Big Data in Pharma - Overview and Use Cases

Drug Discovery Pipeline

Target Finding → Lead Finding → Lead Optimization → … → Phase 1 → … → Market

Drug candidates Patients

Page 20: Big Data in Pharma - Overview and Use Cases

Velocity

Page 21: Big Data in Pharma - Overview and Use Cases

Velocity

• Mutations in tumor
• Resistance mechanisms in patients
• Long-term/short-term AE
• Compliance
• Nutrition and microbiome
• Data from wearables relevant for drugs

Page 22: Big Data in Pharma - Overview and Use Cases

For each patient

Page 23: Big Data in Pharma - Overview and Use Cases

Variety

Page 24: Big Data in Pharma - Overview and Use Cases

Variety

Page 25: Big Data in Pharma - Overview and Use Cases

Variety

• Bioinformatics
• Clinical
• Social network
• E-health
• Also text/patents

Page 26: Big Data in Pharma - Overview and Use Cases

A simplified overview: Molecules in Man

Adapted from Gohlke JM, Portier CJ. Environ. Health Perspect. 115:1261-1263 (2007)

Page 27: Big Data in Pharma - Overview and Use Cases

A question of complexity: They all interact …

Biology

Chemistry

Physics

Page 28: Big Data in Pharma - Overview and Use Cases

Dealing with a very complex environment, i.e. many opportunities

DNA, RNA, Protein, Interactions, Clinical parameters, Treatment history, Tissue anatomy, Surgical history, Epigenetic profiles, from many patients at different timepoints

Target, Off-targets, Metabolites, Additional indications, Unspecific effects, Similar drugs

Adapted from: J. Scheiber; How can we enable drug discovery informatics for personalized healthcare? Expert Opinion on Drug Discovery, 1-6; 2/2011

Page 29: Big Data in Pharma - Overview and Use Cases

… individual polypharmacology

Page 30: Big Data in Pharma - Overview and Use Cases

Sequences, Expression, Proteomics, Biological networks (but also: Cells, Tissues, Organs)

POPULATION

Page 31: Big Data in Pharma - Overview and Use Cases

Veracity

Page 32: Big Data in Pharma - Overview and Use Cases

Veracity

• Chemogenomics data
• Gene expression data

Imputation?

Page 33: Big Data in Pharma - Overview and Use Cases

Veracity - Chemogenomics

Adapted from Tanrikulu et al. Missing Value Estimation for Compound-Target Activity Data, J. Mol. Inf
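Compound-target activity matrices are mostly empty, which is why imputation comes up here. The sketch below shows only a simple baseline, not the method of the cited paper: each missing cell is filled with the average of its row mean and column mean over the observed values. The activity values are hypothetical pIC50 numbers.

```python
from statistics import mean


def impute_missing(matrix):
    """Fill None cells with the average of the row mean and the column mean
    of the observed values (falling back to the global mean if needed)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    observed = [v for row in matrix for v in row if v is not None]
    global_mean = mean(observed)

    def mean_or_global(values):
        known = [v for v in values if v is not None]
        return mean(known) if known else global_mean

    row_means = [mean_or_global(row) for row in matrix]
    col_means = [
        mean_or_global([matrix[i][j] for i in range(n_rows)]) for j in range(n_cols)
    ]
    return [
        [
            v if v is not None else (row_means[i] + col_means[j]) / 2
            for j, v in enumerate(row)
        ]
        for i, row in enumerate(matrix)
    ]


# Rows are compounds, columns are targets; None marks an unmeasured pair.
activity = [
    [7.0, None, 5.0],
    [6.0, 8.0, None],
    [None, 6.0, 4.0],
]

filled = impute_missing(activity)
for row in filled:
    print(row)
```

Any imputed value should stay flagged as such downstream, since it is an estimate rather than a measurement.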

Page 34: Big Data in Pharma - Overview and Use Cases

Veracity - Interactomics

A Proteome-Scale Map of the Human Interactome Network

Rolland, Thomas et al. Cell, Volume 159, Issue 5, 1212-1226

Page 35: Big Data in Pharma - Overview and Use Cases

Veracity – Social Media

Page 36: Big Data in Pharma - Overview and Use Cases
Page 37: Big Data in Pharma - Overview and Use Cases

Strategy

Page 38: Big Data in Pharma - Overview and Use Cases

Biological/Pharmacological Understanding

drugs

targets pathways

diseases (phenotypes)

Page 39: Big Data in Pharma - Overview and Use Cases

Data integration strategy

a) A central vocabulary/pointer server (information stored: preferred names and synonyms, plus pointers to data servers, i.e. where to find what)

b) A semantic integration layer with domain-specific terminology and referential data

c) A database for each data type collected, storing only preferred names along with raw measurements

d) Clearly defined APIs for further integration with public data sources and to enable large-scale analyses
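Point a) can be pictured as two lookup tables: one from any known term to its preferred name, and one from a preferred name to the data servers that hold facts about it. The following is a minimal in-memory sketch of that idea; the class, its method names and the server names are hypothetical, while the adalimumab names and codes are the ones used later in the deck.

```python
class VocabularyServer:
    """Toy vocabulary/pointer server: resolves synonyms and locates data."""

    def __init__(self):
        self._preferred = {}  # lowercase term -> preferred name
        self._pointers = {}   # preferred name -> list of data server names

    def register(self, preferred, synonyms=(), servers=()):
        for term in (preferred, *synonyms):
            self._preferred[term.lower()] = preferred
        self._pointers.setdefault(preferred, []).extend(servers)

    def resolve(self, term):
        """Return the preferred name for a term, or None if unknown."""
        return self._preferred.get(term.lower())

    def locate(self, term):
        """Return the data servers that store facts about a term."""
        return list(self._pointers.get(self.resolve(term), []))


vocab = VocabularyServer()
vocab.register(
    "Adalimumab",
    synonyms=["Humira", "Exemptia", "331731-18-1", "L04AB04"],
    servers=["bioactivity_db", "adverse_event_db"],  # hypothetical server names
)

print(vocab.resolve("humira"))
print(vocab.locate("L04AB04"))
```

Because the data servers under c) store only preferred names, every query first passes through `resolve`, which keeps synonym handling in one place.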

Page 40: Big Data in Pharma - Overview and Use Cases

Vocabularies needed

• Genes, Drugs, Proteins
• Diseases
• Organisms
• Microbiome species & genes
• Localization & source
• Phenotype
• Metabolite common names

Page 41: Big Data in Pharma - Overview and Use Cases

Answering workflow

Vocabulary

Vocabulary server acts as translator, aggregator and locator, i.e. knows where the respective facts can be found

Firmicutes produce alpha-Linolein and thereby cause gut irritation

species

metabolite

Further

Data of each type is stored in a specific database to enhance performance of large-scale analyses.

Expert tools talk to data directly or via web services.

API (one per database)

End user interface and visualization
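The workflow on this slide can be sketched as a tagging step: the vocabulary recognises the terms in a statement, assigns each an entity type, and points to the database holding that type of data. The sketch below uses naive case-insensitive substring matching. The types "species" and "metabolite" come from the slide; the "phenotype" type for "gut irritation" and all database names are assumptions.

```python
def annotate(text, vocabulary):
    """Return (term, entity type, database) for each vocabulary term
    found in the text, in order of first appearance."""
    lowered = text.lower()
    found = []
    for term, (entity_type, database) in vocabulary.items():
        position = lowered.find(term.lower())
        if position >= 0:
            found.append((position, term, entity_type, database))
    return [(term, etype, db) for _, term, etype, db in sorted(found)]


# Hypothetical vocabulary entries: term -> (entity type, database to query).
vocabulary = {
    "alpha-Linolein": ("metabolite", "metabolite_db"),
    "gut irritation": ("phenotype", "phenotype_db"),
    "Firmicutes": ("species", "microbiome_db"),
}

statement = "Firmicutes produce alpha-Linolein and thereby cause gut irritation"
annotations = annotate(statement, vocabulary)
for term, entity_type, database in annotations:
    print(f"{term}: {entity_type} -> {database}")
```

Each database would then be queried through its own API, and the results combined in the end user interface.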

Page 42: Big Data in Pharma - Overview and Use Cases

Examples

Page 43: Big Data in Pharma - Overview and Use Cases

Genome data at scale

Page 44: Big Data in Pharma - Overview and Use Cases

Workflow

Identify drug targets (primary and off-targets, from DrugBank)

Call variants on a per-individual basis

Page 45: Big Data in Pharma - Overview and Use Cases

Workflow

Analyse mutation rates in the targets and, in particular, in the drug binding pockets
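One way to read this step: given per-individual variant calls mapped to residue positions of a target protein, count how many individuals carry a change inside the drug binding pocket. The sketch below assumes that mapping has already been done; the individual IDs, residue positions and pocket definition are hypothetical and do not describe any real protein.

```python
def pocket_mutation_rate(variants_by_individual, pocket_residues):
    """Fraction of individuals with at least one variant at a pocket residue."""
    if not variants_by_individual:
        return 0.0
    pocket = set(pocket_residues)
    carriers = sum(
        1 for positions in variants_by_individual.values() if pocket & set(positions)
    )
    return carriers / len(variants_by_individual)


# Hypothetical variant calls: individual -> mutated residue positions in the target.
variants_by_individual = {
    "ind1": [25, 300],
    "ind2": [],
    "ind3": [10],
    "ind4": [410],
}
pocket_residues = {10, 25, 40}  # hypothetical binding-pocket residues

rate = pocket_mutation_rate(variants_by_individual, pocket_residues)
print(f"{rate:.0%} of individuals carry a pocket variant")
```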

Page 46: Big Data in Pharma - Overview and Use Cases

Example: Donepezil / Acetylcholinesterase

• PDB 4EY7

Image extracted from Cheung et al., 2012 [2]

Page 47: Big Data in Pharma - Overview and Use Cases

Example: Donepezil / Acetylcholinesterase

Page 48: Big Data in Pharma - Overview and Use Cases

Example: Acetylcholinesterase

Integrative Genomics Viewer

Page 49: Big Data in Pharma - Overview and Use Cases

Not very successful

Alignment of the 3D structures of mutant number 52 (yellow) and the PDB 4EY7 AChE protein (green). The only changed residue is Y150 (magenta), mutated to H150 (red). The white surface represents the molecular surface of donepezil.

Page 50: Big Data in Pharma - Overview and Use Cases

Why is this a bad example?

AChE is a key enzyme in human biology; such enzymes are the most highly conserved, even across species.

Lesson learned: check how conserved the target is before investing time.

Page 51: Big Data in Pharma - Overview and Use Cases

Generating Vocabularies

Page 52: Big Data in Pharma - Overview and Use Cases

Vocabulary generation

Extensive mapping of terms from various sources
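A minimal sketch of such a mapping step, assuming each source delivers (source, identifier, name) records and that records sharing an identifier describe the same entity. Choosing the most frequently used name as the preferred name is an assumption made for illustration, not the rule used in the talk; the source names are hypothetical.

```python
from collections import Counter, defaultdict


def build_vocabulary(records):
    """Group names by identifier and pick one preferred name per identifier."""
    names_by_id = defaultdict(Counter)
    for _source, identifier, name in records:
        names_by_id[identifier][name] += 1

    vocabulary = {}
    for identifier, counts in names_by_id.items():
        # Most frequent name wins; ties are broken alphabetically.
        preferred = min(counts, key=lambda n: (-counts[n], n))
        synonyms = sorted(n for n in counts if n != preferred)
        vocabulary[identifier] = {"preferred": preferred, "synonyms": synonyms}
    return vocabulary


# Hypothetical source names; the identifier is the CAS number of adalimumab.
records = [
    ("source_a", "331731-18-1", "Adalimumab"),
    ("source_b", "331731-18-1", "Humira"),
    ("source_c", "331731-18-1", "Adalimumab"),
    ("source_b", "331731-18-1", "Exemptia"),
]

vocabulary = build_vocabulary(records)
print(vocabulary["331731-18-1"])
```

Real sources rarely share one identifier scheme, so identifiers themselves have to be cross-mapped first, which is where most of the effort goes.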

Page 53: Big Data in Pharma - Overview and Use Cases

Vocabulary generation

397211 preferred names

598532 synonyms

102086 identifiers

The chevron diagram shows the number of samples annotated with names. Already by looking at the numbers you can see that mapping everything is non-trivial.

A Big Data exercise in itself …

Page 54: Big Data in Pharma - Overview and Use Cases

Tweet mining

Page 55: Big Data in Pharma - Overview and Use Cases

Mining Twitter for side effects

Needed: drug name and synonyms, for example

Adalimumab, Humira, Exemptia, 331731-18-1 (CAS number), L04AB04 (ATC code)

MedDRA vocabulary

Page 56: Big Data in Pharma - Overview and Use Cases

Many birds tweet lots of noise … BUT …

[1] "Lipitor headache 0"
[1] "Lipitor rash 1"
[1] "Lipitor pain 27"
[1] "Lipitor bleeding 0"
[1] "Lipitor cough 0"
[1] "Lisinopril headache 0"
[1] "Lisinopril rash 0"
[1] "Lisinopril pain 8"
[1] "Lisinopril bleeding 0"
[1] "Lisinopril cough 7"
[1] "Simvastatin headache 0"
[1] "Simvastatin rash 0"
[1] "Simvastatin pain 0"
[1] "Simvastatin bleeding 0"
[1] "Simvastatin cough 0"
[1] "Plavix headache 0"
[1] "Plavix rash 0"
[1] "Plavix pain 0"
[1] "Plavix bleeding 1"
[1] "Plavix cough 0"
[1] "Crestor headache 0"
[1] "Crestor rash 0"
[1] "Crestor pain 0"
[1] "Crestor bleeding 0"
[1] "Crestor cough 0"
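Counts of this kind can be produced by a simple co-occurrence tally: a tweet counts for a drug and side-effect pair if it mentions any name of the drug together with the side-effect term. The sketch below is written in Python, uses naive substring matching, and runs on a handful of made-up tweets; it is not the script behind the numbers above, and a real run would take its side-effect terms from MedDRA.

```python
def count_mentions(tweets, drugs, side_effects):
    """Count tweets mentioning a drug (by any name) together with a side effect."""
    lowered = [t.lower() for t in tweets]
    counts = {}
    for drug, synonyms in drugs.items():
        names = [drug.lower(), *(s.lower() for s in synonyms)]
        for effect in side_effects:
            counts[(drug, effect)] = sum(
                1
                for t in lowered
                if effect.lower() in t and any(name in t for name in names)
            )
    return counts


# Made-up example tweets, for illustration only.
tweets = [
    "Started Lipitor last week, muscle pain is real",
    "atorvastatin gave me a rash",
    "Lisinopril cough is driving me crazy",
    "Pharmacy deal: Lipitor 20% off",
    "No pain today, feeling great",
]
drugs = {"Lipitor": ["atorvastatin"], "Lisinopril": []}
side_effects = ["headache", "rash", "pain", "cough"]

counts = count_mentions(tweets, drugs, side_effects)
for (drug, effect), n in counts.items():
    print(drug, effect, n)
```

Substring matching is crude ("pain" also matches "painting"), and advertising tweets still have to be filtered out separately, which is part of the noise problem named in the slide title.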

Page 57: Big Data in Pharma - Overview and Use Cases

Top 200 drugs

- The cutoff is at 1500 tweets, which a few drugs easily surpass (although it's mostly just pharmacies advertising).
- Others are not mentioned once (probably a synonym issue, as I restricted the search to English).
- Top drugs are tweeted more often, but e.g. Tarceva (in 2006), at the very bottom, also reaches the top number of tweets (109 on list).

Page 58: Big Data in Pharma - Overview and Use Cases

089 – 189 6582 – 80
Garmischer Str. 4/V
80339 München

[email protected]: 09632 – 9248 325
Konnersreuther Str. 6g
95652 Waldsassen

Questions?