Predicting Adverse Drug Reactions Using PubChem Screening Data


Uploaded by yannick-pouliot on 06-Jul-2015


Page 1: Predicting Adverse Drug Reactions Using PubChem Screening Data


Yannick Pouliot (with significant contributions from Annie Chiang)

8/31/2010

It’s Back: Predicting Adverse Drug Reactions Using PubChem Screening Data

Page 2

Motivation

Short-term: Determine feasibility of predicting specific classes of adverse drug reactions (ADRs) using machine learning and compound screening data

Long-term: Use a collection of simple screens to assess the likelihood of tissue-specific ADRs

Page 3

Understanding the “BioAssay” Notion

• Usually, BioAssay = collection of activity measurements for compounds screened against a specific target in a cell type at one or more concentrations

• However, the scope of the BioAssay DB goes beyond compound screening:

▫ Cell-free assays

▫ In vivo assays

Page 4

What’s a SOC?

• SOC = System Organ Class

• A SOC groups “… adverse reaction Preferred Terms pertaining to the same system-organ”.

• Example: SOC C0236104 - “Resistance Mechanism Disorders”

Page 5

Knowns

• Drugs often show elevated rates of tissue-specific ADRs, beyond generic liver and kidney damage.

• The PubChem BioAssay DB offers a large number of assays covering many protein targets.

Page 6

Hypothesis

H1: Drugs with increased frequency of SOC-specific ADRs can be identified from patterns of reactivity in PubChem BioAssay screens.

H0: Reactivity patterns in PubChem BioAssay do not distinguish drugs with increased frequency of tissue-specific ADRs.

Page 7

Data Features

• For a given SOC, matrix of

▫ PRR

▫ drug CUI

▫ BioAssay ID (“AID”)

• Sparse matrix: most compounds have been screened in only a few assays

▫ limited overlap between CVAR and BioAssay

• Very large data sets (more later)
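The slides do not show how the per-SOC matrix was stored. As an illustration only, the sketch below keeps the sparse drug-by-assay matrix as a dict of dicts. Unmeasured (drug, AID) cells are simply absent, and each drug row is paired with its PRR. All CUIs, AIDs, activity values, and PRRs are hypothetical placeholders.

```python
from collections import defaultdict

# Toy (drug CUI, AID, activity) triples for one SOC. All identifiers and
# values are hypothetical placeholders, not data from the study.
observations = [
    ("C0000001", 119, 1.0),
    ("C0000001", 885, 0.0),
    ("C0000002", 885, 1.0),
    ("C0000003", 1672, 1.0),
]

# Hypothetical SOC-specific PRR per drug (the outcome to be predicted).
prr_by_drug = {"C0000001": 3.1, "C0000002": 0.0, "C0000003": 1.4}


def build_sparse_matrix(triples):
    """Store only the (drug, assay) cells that were actually measured."""
    matrix = defaultdict(dict)
    for cui, aid, activity in triples:
        matrix[cui][aid] = activity
    return dict(matrix)


def density(matrix):
    """Fraction of drug x assay cells that hold a measurement."""
    assays = {aid for row in matrix.values() for aid in row}
    filled = sum(len(row) for row in matrix.values())
    return filled / (len(matrix) * len(assays))


matrix = build_sparse_matrix(observations)
# One row per drug: (CUI, PRR, {AID: activity}); unmeasured AIDs are absent.
rows = [(cui, prr_by_drug[cui], matrix[cui]) for cui in sorted(matrix)]
print(rows[0])                    # ('C0000001', 3.1, {119: 1.0, 885: 0.0})
print(round(density(matrix), 3))  # 0.444 (4 of 3 x 3 cells filled)
```

At PubChem scale the same idea would usually be handled by a database table or a dedicated sparse-matrix library rather than Python dicts.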

Page 8

Data Integration

Page 9

Analytical Process

Binarized PRR (1 if PRR >= 2, else 0)
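The binarization step turns each drug's SOC-specific PRR into a class label for modeling. Below is a minimal sketch, assuming the PRRs have already been computed. The drug names and values are placeholders.

```python
def binarize_prr(prr, threshold=2.0):
    """Class label used for modeling: 1 if PRR >= threshold, else 0."""
    return 1 if prr >= threshold else 0


# Hypothetical SOC-specific PRRs for three drugs.
prrs = {"drug_a": 5.0, "drug_b": 2.0, "drug_c": 0.0}
labels = {drug: binarize_prr(value) for drug, value in prrs.items()}
print(labels)  # {'drug_a': 1, 'drug_b': 1, 'drug_c': 0}
```

A PRR of exactly 2 counts as positive, matching the ">=" in the slide.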

Page 10

Selected Statistic: Proportional Reporting Ratio (PRR)

                     Drug of interest    Other drugs
Event of interest           A                 B
Other events                C                 D

• PRR = OBS/EXP = [A / (A+C)] / [B / (B+D)]

• Serious ADR threshold: PRR >= 2, with at least 3 cases reported
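The PRR formula and threshold can be written directly from the 2x2 table. This sketch takes the four cell counts and checks the PRR >= 2, at-least-3-cases rule. The counts in the example are made up for illustration.

```python
def prr(a, b, c, d):
    """Proportional reporting ratio from the 2x2 table.

    a: reports of the event of interest for the drug of interest
    b: reports of the event of interest for all other drugs
    c: reports of other events for the drug of interest
    d: reports of other events for all other drugs
    """
    if a + c == 0 or b == 0:
        raise ValueError("PRR is undefined when a + c == 0 or b == 0")
    observed = a / (a + c)  # share of the drug's reports that are the event
    expected = b / (b + d)  # share of other drugs' reports that are the event
    return observed / expected


def is_signal(a, b, c, d, min_prr=2.0, min_cases=3):
    """Threshold from the slide: PRR >= 2 with at least 3 cases."""
    return a >= min_cases and prr(a, b, c, d) >= min_prr


# Hypothetical counts, not taken from CVAR.
print(round(prr(10, 100, 90, 9800), 2))  # 9.9
print(is_signal(10, 100, 90, 9800))      # True
print(is_signal(2, 10, 8, 9980))         # False: fewer than 3 cases
```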

Page 11

Results: Max PRR by SOC for Statins

Active ingredient | PRR (no. cases) | SOC
Atorvastatin | 6.2 (958) | Musculo-skeletal
Cerivastatin | 10.47 (284) | Musculo-skeletal
Fluvastatin | 5.12 (7) | Musculo-skeletal
Lovastatin | 4.62 (11) | Musculo-skeletal
Pravastatin | 5.13 (104) | Musculo-skeletal
Rosuvastatin | 7.34 (803) | Musculo-skeletal
Simvastatin | 5.79 (186) | Musculo-skeletal

Page 12

Addressing Zero ADRs

• Many drugs do not have a SOC-specific PRR

▫ Unclear whether this means they are unusually safe; it could instead reflect, e.g., low prescription volume

▫ Approach: Assign SOC-specific PRR = 0 if at least 10 ADR reports exist overall
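The zero-PRR rule can be expressed as a small per-drug function. The slide only states the PRR = 0 case. Dropping drugs with fewer than 10 reports in total is an assumption here, loosely suggested by the >= 10 report filter used later in the deck.

```python
def soc_prr_for_model(soc_prr, total_reports, min_reports=10):
    """Apply the zero-ADR rule from this slide to one drug.

    soc_prr: the drug's PRR for the SOC, or None if none was computed.
    total_reports: the drug's total number of ADR reports, all SOCs.
    Returns the PRR if present, 0.0 if the drug has >= min_reports reports
    overall, and None otherwise (assumption: the drug is left out).
    """
    if soc_prr is not None:
        return soc_prr
    if total_reports >= min_reports:
        return 0.0
    return None


print(soc_prr_for_model(4.6, 11))    # 4.6
print(soc_prr_for_model(None, 250))  # 0.0
print(soc_prr_for_model(None, 4))    # None
```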

Page 13

Results Since Last Meeting

Page 14

Properties of CVAR drug ingredients

Filter (each row adds one condition to the row above) | Number of ingredients
Ingredients with drug reports in CVAR | 2,901
+ `health_product_role` = 'suspect' and `reaction_type` = 'Adverse Reaction' | 2,746
+ `whoart_soc_cui` is not null | 2,731
+ `total_number_reports` >= 10 | 1,550
+ present in PUBCHEM_BIOASSAY | 485
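The cascade of filters above can be reproduced as cumulative SQL conditions, counting distinct ingredients at each step. This is a sketch only. The table layout (`cvar_reports`, `bioassay_ingredients`) is hypothetical, the rows are toy data, and Python's built-in sqlite3 stands in for the actual database. Only the column names and filter values come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened CVAR extract plus a list of BioAssay ingredients.
conn.executescript(
    """
    CREATE TABLE cvar_reports (
        ingredient TEXT,
        health_product_role TEXT,
        reaction_type TEXT,
        whoart_soc_cui TEXT,
        total_number_reports INTEGER
    );
    CREATE TABLE bioassay_ingredients (ingredient TEXT);
    """
)
# Toy rows; names and counts are placeholders, not study data.
conn.executemany(
    "INSERT INTO cvar_reports VALUES (?, ?, ?, ?, ?)",
    [
        ("drug_a", "suspect", "Adverse Reaction", "C0000001", 120),
        ("drug_b", "concomitant", "Adverse Reaction", "C0000002", 40),
        ("drug_c", "suspect", "Adverse Reaction", None, 25),
        ("drug_d", "suspect", "Adverse Reaction", "C0000003", 4),
        ("drug_e", "suspect", "Adverse Reaction", "C0000004", 15),
    ],
)
conn.execute("INSERT INTO bioassay_ingredients VALUES ('drug_e')")

# Each condition is added on top of all previous ones, as in the table rows.
conditions = [
    "health_product_role = 'suspect' AND reaction_type = 'Adverse Reaction'",
    "whoart_soc_cui IS NOT NULL",
    "total_number_reports >= 10",
    "ingredient IN (SELECT ingredient FROM bioassay_ingredients)",
]


def count_ingredients(conn, conds):
    """Count distinct ingredients satisfying all conditions in conds."""
    where = " AND ".join(f"({c})" for c in conds) or "1 = 1"
    sql = f"SELECT COUNT(DISTINCT ingredient) FROM cvar_reports WHERE {where}"
    return conn.execute(sql).fetchone()[0]


counts = [count_ingredients(conn, conditions[:k]) for k in range(len(conditions) + 1)]
print(counts)  # [5, 4, 3, 2, 1]
```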

Page 15

BioAssay Subset Properties

Assays and drugs in PubChem BioAssay with SOC-identified CVAR drug ingredients and >= 10 ADR reports

Assay type | Number of assays | Number of CVAR compounds
confirmatory | 545 | 664
in vivo screening | 81 | 341
other | 93 | 790
screening | 466 | 629
summary | 6 | 202
Total | 1,191 | 2,626

Page 16

Mapping Results

All SIDs: 913,742

CVAR drug ingredients mapped to SIDs: 7,913

CVAR drug ingredients with SOC-identified ADRs mapped to SIDs: 4,382

CVAR drug ingredients with SOC-identified ADRs and >= 10 reports mapped to SIDs: 3,136

Page 17

Predictive Modeling Results - 1

SOC ID | SOC name | Avg. model AUC | Initial compounds | Compounds retained
C0236104 | resistance mechanism disorders | 0.92 (0.000593) | 468 | 70
C0221016 | red blood cell disorders | 0.79 (0.000318) | 468 | 185
C0236099 | reproductive disorders - male | 0.77 (0.000167) | 468 | 271
C0027651 | neoplasms | 0.76 (0.000802) | 468 | 115
C0035204 | respiratory system disorders | 0.76 (0.000465) | 468 | 177
C0027765 | central & peripheral nervous system disorders | 0.74 (0.000272) | 468 | 376
C0014130 | endocrine disorders | 0.72 (0.000721) | 468 | 126
C0042790 | vision disorders | 0.72 (0.000174) | 468 | 286
C0037272 | skin and appendages disorders | 0.7 (0.000196) | 468 | 250
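The slides report an average model AUC per SOC but do not show the model or how the AUC was computed. The sketch below shows one standard way to compute a ROC AUC from 0/1 labels (for example, binarized PRR) and model scores: the pairwise-ranking (Mann-Whitney) formulation. The labels and scores are toy values.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative).

    labels: 0/1 class labels (e.g. binarized PRR); scores: model outputs.
    Ties between a positive and a negative count as one half.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and scores for four compounds.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The double loop compares every positive-negative pair. That is adequate for a few hundred compounds, such as the 468 initial compounds per SOC above. A sort-based rank computation scales better for larger sets.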

Page 18

SOC name | Avg. model AUC | AID1 | Assay type | Objective | Target | Avg. p-value (AID1) | Avg. coeff. (AID1)

resistance mechanism disorders | 0.92 (0.000593) | AID119 | confirmatory | Small molecule inhibitors of tumor cell growth in implanted CCRF-CEM leukemia cells in mice | - | 2.95E-004 (1.05E-005) | 1.15E+000 (5.08E-003)

red blood cell disorders | 0.79 (0.000318) | AID330 | in vivo screening | Small molecule inhibitors of tumor cell growth in implanted P388 leukemia CD2F1 (CDF1) tumors in mice | - | 1.15E-004 (1.30E-006) | 2.34E-001 (6.33E-004)

reproductive disorders - male | 0.77 (0.000167) | AID1461 | confirmatory | Small molecule inhibitors of neuropeptide S receptor (NPSR) signaling | G protein-coupled receptor for asthma susceptibility isoform A (NPSR A) [Homo sapiens] | 7.49E-008 (1.45E-009) | 5.55E-001 (3.87E-004)

neoplasms | 0.76 (0.000802) | AID543 | confirmatory | Small molecules cytotoxic to H-4-II-E rat hepatoma cell line | - | 2.16E-005 (8.19E-007) | 9.39E-001 (2.21E-003)

respiratory system disorders | 0.76 (0.000465) | AID774 | other | Small molecule inhibitors of enzymes frequently used to reach an NAD/NADH endpoint | - | 2.14E-003 (3.85E-005) | -9.21E+000 (2.31E-002)

central & peripheral nervous system disorders | 0.74 (0.000272) | AID1672 | screening | Small molecule inhibitors of inward-rectifying potassium ion channel Kir2.1 in HEK293 cells (human embryonic kidney) | potassium inwardly-rectifying channel J2 [Mus musculus] | 1.70E-007 (5.27E-009) | 3.08E-001 (1.70E-004)

endocrine disorders | 0.72 (0.000721) | AID885 | confirmatory | Small molecule inhibitors of cytochrome P450 3A4 (cell-free) | cytochrome P450, subfamily IIIA, polypeptide 4 [Homo sapiens] | 1.22E-003 (6.53E-005) | 8.75E-001 (2.65E-003)

vision disorders | 0.72 (0.000174) | AID2553 | screening | Small molecule inhibitors of transient receptor potential cation channel C6 (TRPC6) in HEK293 cells | short transient receptor potential channel 6 [Mus musculus] | 5.04E-004 (1.09E-005) | 1.97E-001 (2.23E-004)

skin and appendages disorders | 0.7 (0.000196) | AID781 | screening | Small molecule inhibitors of 14-3-3/Bad interactions (cell-free) | tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein-zeta polypeptide [Bos taurus] | 6.23E-005 (1.37E-006) | 4.95E-001 (4.66E-004)

Page 19

ROC AUC For C0236104 - “resistance mechanism disorders”

Page 20

LOOCV Validation For C0236104 - “resistance mechanism disorders”
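The deck names leave-one-out cross-validation (LOOCV) but not the classifier behind it. Below is a minimal sketch of the LOOCV loop itself. The nearest-centroid model is a hypothetical stand-in, chosen only to keep the example dependency-free, and the two-feature data are toy values.

```python
def fit_centroids(X, y):
    """Hypothetical stand-in model: mean feature vector for each class."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def loocv_predictions(X, y, fit, predict):
    """Leave-one-out CV: refit on all samples but one, predict the held-out one."""
    preds = []
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(model, X[i]))
    return preds


# Toy data: two assay features per compound, binarized PRR as the label.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]
preds = loocv_predictions(X, y, fit_centroids, predict_centroid)
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(preds, accuracy)  # [0, 0, 0, 1, 1, 1] 1.0
```

With n compounds, LOOCV refits the model n times. This is cheap for a few hundred compounds and makes the most of a small labeled set.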

Page 21

Universe of Data For SOC C0236104

Page 22

Disorders Associated With SOC C0236104 (“Resistance Mechanism Disorders”)

Allergic conditions

Autoimmune disorders

Immune disorders NEC

Immunodeficiency syndromes

Ancillary infectious topics

Bacterial infectious disorders

Chlamydial infectious disorders

Ectoparasitic disorders

Fungal infectious disorders

Helminthic disorders

Infections - pathogen unspecified

Mycobacterial infectious disorders

Mycoplasmal infectious disorders

Protozoal infectious disorders

Rickettsial infectious disorders

Viral infectious disorders

Page 23

Indications For Drugs Correlated With Model For SOC C0236104 (“Resistance Mechanism Disorders”)

Antineoplastic Agents

Anti-Bacterial Agents

Anti-inflammatory Agents

Anticholesteremic Agents

Anti-Inflammatory Agents, Non-Steroidal

Anti-Allergic Agents

Analgesics

Anti-Dyskinesia Agents

Page 24

Lessons Learned

• Limitation of relational databases without partitioning

▫ Queries won’t return when they involve >50M rows

• Sneaky MySQL loader

▫ Can fail to load records without reporting an error

▫ A problem when one can’t easily verify the expected number of records from the XML files

▫ Solution: Write your own loader (can include data validation)

• BMIR cluster has serious NFS problems

▫ Couldn’t run more than a few parsing jobs at the same time

• … and my favorite: The dreaded NCBI surprise!
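The “write your own loader” advice can be sketched as a loader that counts the records it parsed and confirms that exactly that many rows reached the database. This illustration rests on assumptions. Python's built-in sqlite3 stands in for MySQL, and the XML element names are hypothetical, not the PubChem schema.

```python
import sqlite3
import xml.etree.ElementTree as ET


def load_with_validation(xml_text, conn, record_tag="record"):
    """Insert every <record> in xml_text and fail loudly on any mismatch.

    The <record>, <id>, and <value> element names are hypothetical; they do
    not reproduce the PubChem XML schema.
    """
    records = ET.fromstring(xml_text).findall(f".//{record_tag}")
    rows = []
    for rec in records:
        rec_id, value = rec.findtext("id"), rec.findtext("value")
        if rec_id is None or value is None:
            raise ValueError("malformed record: " + ET.tostring(rec, encoding="unicode"))
        rows.append((int(rec_id), float(value)))

    before = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
    with conn:  # commits on success, rolls back on exception
        conn.executemany("INSERT INTO results (id, value) VALUES (?, ?)", rows)
    loaded = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0] - before
    if loaded != len(records):
        raise RuntimeError(f"expected {len(records)} rows, loaded {loaded}")
    return loaded


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (id INTEGER, value REAL)")
xml_text = (
    "<results>"
    "<record><id>1</id><value>0.5</value></record>"
    "<record><id>2</id><value>1.5</value></record>"
    "</results>"
)
print(load_with_validation(xml_text, conn))  # 2
```

Malformed records are rejected before anything is inserted, and the count check catches silent drops during the insert itself.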

Page 25

The Case Of The Missing Atorvastatin

• Problem: Why were some statins missing from my dataset?

▫ E.g.: Atorvastatin

• Answer: It is present, but there is no way to identify it as such

• Example from AID 881: Atorvastatin, SID = 29215408

… and no synonyms!

Page 26

And Now For Some Test Marketing

Page 27

Acknowledgements

• Alex and Chirag, for contributing secret R knowledge

• Atul, for being helpfully skeptical and patient

• Alex S for quickly addressing DB issues

• NCBI, for providing DBs and messing up my life

Page 28

Need To Standardize And Normalize Assay Activity Metrics

Types of activity metrics (first 12 characters of each metric name)

% Cell Viabi

% cellular A

% CPE Inhibi

% Inhibition

%Activity at

%displacemen

%Efficacy at

%Inhibition

%Response of

Activity at

AF_20uM

AreaNm

AreaoftheNuc

Ave %Efficac

Ave %Inhibit

AverageInteg

AverageInten

AverageSpots

Baseline-Act

Cell-Activit

CellCount

CellsNucInte

Donor-Activi

Fed-Activity

FP-Activity

F_Ratio

GFP-Activity

Mean High

Mean Low

Mean_NC

Mean_PC

MPIPiCm

MPIPiNm

MS % Inhibit

NucleiNucAre

NumberofCell

Parental-Act

PercentagePo

PiNmbyPiCm

Primary % In

Rate-Activit

Ratio-Activi

RatioofSpoti

RFP-Activity

STD Deviatio

Std.Err(Repe

StdDev_NC

StdDev_PC

TIINiNM

TotalCytopla

TotalIntegra

TotalSpotInt

Total_fluore

TSHR-Activit

W460-Activit

W530-Activit

ZScore

ZScore at 10

ZScore at 20
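The metrics listed above use different names and scales. One common way to make them comparable is to z-score each metric within its own assay. This is offered only as a sketch, not as the author's method. The AIDs, compound IDs, and values below are toy data.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_within_assay(rows):
    """Z-score each (AID, metric) group separately.

    rows: iterable of (aid, metric_name, compound_id, value).
    Returns {(aid, metric_name, compound_id): z}. A group with zero spread
    gets z = 0.0 for every compound.
    """
    groups = defaultdict(list)
    for aid, metric, cid, value in rows:
        groups[(aid, metric)].append((cid, value))
    z = {}
    for (aid, metric), items in groups.items():
        values = [v for _, v in items]
        mu, sd = mean(values), pstdev(values)
        for cid, v in items:
            z[(aid, metric, cid)] = 0.0 if sd == 0 else (v - mu) / sd
    return z


# Toy rows: two assays that report different metrics on different scales.
rows = [
    (1, "% Inhibition", "cmpd_a", 10.0),
    (1, "% Inhibition", "cmpd_b", 20.0),
    (1, "% Inhibition", "cmpd_c", 30.0),
    (2, "CellCount", "cmpd_a", 1200.0),
    (2, "CellCount", "cmpd_b", 1200.0),
]
z = zscore_within_assay(rows)
print(round(z[(1, "% Inhibition", "cmpd_c")], 3))  # 1.225
print(z[(2, "CellCount", "cmpd_a")])               # 0.0
```

Z-scoring only puts metrics on a common scale. It does not align their direction: a high cell-viability percentage and a high percent inhibition mean opposite things. A per-metric sign convention would still be needed.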