An attempt to use literature curated pathway DBs Charles Vaske Stuart Lab Meeting May 6th, 2009


Page 1

An attempt to use literature curated pathway DBs

Charles Vaske

Stuart Lab Meeting

May 6th, 2009

Page 2

Structured Pathways

• Lots of cancer research/genes/data

• Consequently, we know a lot about the pathways active in cancer

• Can we use this structured knowledge?

Page 3

Modeling clinical samples

[Figure, panels "Pathway Diagram" and "Pathway Perturbation Model": the pathway MDM2, TP53, Apoptosis is expanded so that each gene has four nodes, Gene, mRNA, Protein, and Activity (MDM2g, MDM2m, MDM2p, MDM2a; TP53g, TP53m, p53p, p53a). TCGA data enter as observed nodes: CNV (CNVMDM2, CNVTP53), Methylation (MethylMDM2, MethylTP53), Gene Expression (GEMDM2, GETP53), and Proteomics (ProMDM2, ProTP53).]

Page 4

Use in clinical samples

[Figure with panels A, B, and C. Pathway Perturbation Scoring: for each of Patients 1, 2, 3, ..., a table of observations for MDM2 and TP53 (genomics: copy number variation and mutation; proteomics; sample description; "labels" such as Estrogen Receptor, PCR, and 6 mo. TTR) is scored against a small pathway model (nodes M, T, S, A; Senescence and Apoptosis are the downstream processes) in a wild-type (WT) setting and in perturbed settings. The scores form a matrix of pathway perturbation probabilities (Pathway 1, Pathway 2, Pathway 3, ... by Samples 1, 2, 3), which is passed together with the labels (Label 1, Label 2) to Classifier Training. Outputs: classifier for prediction, breast cancer target pathways, interventional targets.]

Page 5

Outline

1. Get pathways (ugly, 50%-95% done)

2. Convert to graphical model

3. Add evidence from patient

4. Infer the value of hidden variables (e.g. Apoptosis, Chemotaxis)

5. Solve cancer (finally)

Page 6

• Proteins

• Complexes

• Abstract processes

• Reactions/modifications/translocations

• Activation vs. participants

Page 7

BioPAX

• Based on OWL (Web Ontology Language)

• Based on RDF (Resource Description Framework)

• Not human-readable

• Must use tools!

• I love to complain about it

Page 8

BioPAX

• Three levels (versions), people only use level 2 (I think)

• Defines “things” which have various properties, including a “class”

• Each “thing” is a URI, which looks like a URL

Page 9

RDF/OWL/BioPAX Tools

• Protege: from Stanford, designed more for creating an ontology such as BioPAX than for looking at “data” in that “format”

• SPARQL/roqet: sort of like SQL for RDF. Don’t use XML tools; you may miss things because the same RDF can be serialized in different ways.

Page 10

Caveat

• All this dense typing and formatting is extremely expressive

• However, that much expressiveness impedes programmatic understanding

• Test, test, test

This shows the “transcription” of a complex. The meaning is obvious to a human, but befuddling to my naive scripts.

Page 11

Example query: find abstract processes that are parents

PREFIX bp: <http://www.biopax.org/release/biopax-level2.owl#>

SELECT ?goname ?entity ?activation
WHERE {
  # a control, its type, and its name
  ?mod bp:CONTROL-TYPE ?activation .
  ?mod bp:NAME ?goname .
  # the reaction it controls, and the entity on that reaction's right side
  ?mod bp:CONTROLLED ?reaction .
  ?reaction bp:RIGHT ?pep .
  ?pep bp:PHYSICAL-ENTITY ?entity
}
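To make the logic of the query concrete, the sketch below evaluates the same five triple patterns over a handful of in-memory triples in plain Python. It is not a SPARQL engine and has nothing to do with roqet; the triples, the identifiers such as `ctrl1` and `rxn1`, and the helpers `match` and `run_query` are all invented for illustration.

```python
# Toy evaluation of the five triple patterns from the example query.
# All triples are made up; real BioPAX data would come from an RDF parser.
TRIPLES = [
    ("ctrl1", "CONTROL-TYPE", "ACTIVATION"),
    ("ctrl1", "NAME", "mitosis"),
    ("ctrl1", "CONTROLLED", "rxn1"),
    ("rxn1", "RIGHT", "pep1"),
    ("pep1", "PHYSICAL-ENTITY", "H3F3A"),
    ("ctrl2", "CONTROL-TYPE", "INHIBITION"),  # has no NAME, so it drops out
    ("ctrl2", "CONTROLLED", "rxn1"),
]

PATTERNS = [
    ("?mod", "CONTROL-TYPE", "?activation"),
    ("?mod", "NAME", "?goname"),
    ("?mod", "CONTROLLED", "?reaction"),
    ("?reaction", "RIGHT", "?pep"),
    ("?pep", "PHYSICAL-ENTITY", "?entity"),
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if new.setdefault(p, t) != t:
                return None  # variable is already bound to a different value
        elif p != t:
            return None  # constant term does not match
    return new


def run_query(patterns, triples):
    """Join the patterns left to right, keeping every consistent binding."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return [(b["?goname"], b["?entity"], b["?activation"]) for b in bindings]


rows = run_query(PATTERNS, TRIPLES)
print(rows)
```

Because every pattern must hold for the same `?mod`, the second control, which lacks a name, produces no row.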

Page 12

Parsing

• Started by finding the proper queries to extract interactions, names, parts of complexes...

• Want a simple tab-delimited format:

Entity Definitions

abstract    metaphase
abstract    mitosis
complex     AurC/AurB/INCENP
protein     H3F3A
protein     AuroraB
protein     AuroraC
protein     INCENP

Entity Interactions

AurC/AurB/INCENP    H3F3A               -a>
mitosis             H3F3A               -ap>
metaphase           AurC/AurB/INCENP    -ap>
AuroraB             AurC/AurB/INCENP    component>
INCENP              AurC/AurB/INCENP    component>
AuroraC             AurC/AurB/INCENP    component>
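A minimal sketch of reading this format in Python, assuming that entity definitions are two-field lines (type, name) and interactions are three-field lines ending in the interaction type. The function name `parse_pathway` and the field names `left` and `right` are hypothetical; the slides do not say which column is the source and which is the target, so the sketch keeps the columns in file order.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    left: str   # first column, as it appears in the file
    right: str  # second column, as it appears in the file
    kind: str   # e.g. "-a>", "-ap>", "component>"


def parse_pathway(text):
    """Split a pathway file into ({entity name: type}, [Interaction, ...])."""
    entities = {}
    interactions = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) == 2:
            entity_type, name = fields
            entities[name] = entity_type
        elif len(fields) == 3:
            interactions.append(Interaction(*fields))
        else:
            raise ValueError(
                f"line {line_no}: expected 2 or 3 fields, got {len(fields)}"
            )
    return entities, interactions


# The Aurora C example from the slide, written with explicit tabs.
EXAMPLE = "\n".join([
    "abstract\tmetaphase",
    "abstract\tmitosis",
    "complex\tAurC/AurB/INCENP",
    "protein\tH3F3A",
    "protein\tAuroraB",
    "protein\tAuroraC",
    "protein\tINCENP",
    "AurC/AurB/INCENP\tH3F3A\t-a>",
    "mitosis\tH3F3A\t-ap>",
    "metaphase\tAurC/AurB/INCENP\t-ap>",
    "AuroraB\tAurC/AurB/INCENP\tcomponent>",
    "INCENP\tAurC/AurB/INCENP\tcomponent>",
    "AuroraC\tAurC/AurB/INCENP\tcomponent>",
])

entities, interactions = parse_pathway(EXAMPLE)
print(len(entities), "entities,", len(interactions), "interactions")
```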


Page 14

My hopeful monster

• Makefile converted to executable script

• A bit experimental

Page 15

$MAPDIR/Data/Pathways

• Early, molten stage, but useful

• Human/NCIPID has NCI pathways

• Human/KEGG has early KEGG attempts

Page 16

Pathway stats

[Figure: scatter plot of the number of interactions in a pathway (y-axis) against the number of entities in the pathway (x-axis), plus further panels whose titles are not recoverable.]

132 Pathways

3766 Unique Entities

7569 Entity instances

10182 Unique Interactions

Entity Breakdown

1742 protein

1638 complex

296 abstract

90 chemical

Interaction Breakdown

2619 -a> (activation)

278 -a| (inhibition)

874 -ap> (abstract)

103 -ap|

528 -t> (transcription)

104 -t|

5676 component>
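As a quick consistency check, the two breakdowns can be summed and compared with the totals on the slide. The sketch below only uses numbers reported above; the variable names are hypothetical.

```python
# Counts copied from the slide.
unique_entities = 3766
unique_interactions = 10182

entity_breakdown = {
    "protein": 1742,
    "complex": 1638,
    "abstract": 296,
    "chemical": 90,
}
interaction_breakdown = {
    "-a>": 2619,
    "-a|": 278,
    "-ap>": 874,
    "-ap|": 103,
    "-t>": 528,
    "-t|": 104,
    "component>": 5676,
}

# Each breakdown should add up to the corresponding total.
entity_sum = sum(entity_breakdown.values())
interaction_sum = sum(interaction_breakdown.values())

# Share of each interaction type among all unique interactions.
shares = {
    kind: count / unique_interactions
    for kind, count in interaction_breakdown.items()
}

print(entity_sum == unique_entities, interaction_sum == unique_interactions)
for kind, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{kind:12s}{share:6.1%}")
```

Both sums match the totals, and complex membership (`component>`) accounts for more than half of all interactions.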

Page 17

Outline

1. Get pathways (ugly, 50%-95% done)

2. Convert to graphical model

3. Add evidence from patient

4. Infer the value of hidden variables (e.g. Apoptosis, Chemotaxis)

5. Solve cancer (finally)

[Figure: the pathway diagram and pathway perturbation model for MDM2, TP53, and Apoptosis, repeated from the “Modeling clinical samples” slide.]

Page 18

Aurora C Factor Graph

[Figure: the Aurora C pathway file (abstract processes metaphase and mitosis; complex AurC/AurB/INCENP; proteins H3F3A, AuroraB, AuroraC, INCENP, with their -a>, -ap>, and component> interactions) next to the factor graph built from it. The abstract processes and the complex each appear as a single node: metaphase, mitosis, AurC/AurB/INCENP. Each protein is expanded into four nodes: H3F3Ag, H3F3Am, H3F3Ap, H3F3Aa; AuroraBg, AuroraBm, AuroraBp, AuroraBa; AuroraCg, AuroraCm, AuroraCp, AuroraCa; INCENPg, INCENPm, INCENPp, INCENPa.]
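The node names in the factor graph suggest a simple expansion rule, sketched below under the assumption that only entities of type protein get the four suffixes g, m, p, a (gene, mRNA, protein, activity) and every other entity type maps to a single variable. The function name `variables_for` is hypothetical, and the sketch covers naming only, not the factors that connect the variables.

```python
# Suffixes for the four levels of a protein-coding gene:
# gene, mRNA, protein, activity.
SUFFIXES = ("g", "m", "p", "a")

# Entity definitions from the Aurora C example, as {name: type}.
ENTITIES = {
    "metaphase": "abstract",
    "mitosis": "abstract",
    "AurC/AurB/INCENP": "complex",
    "H3F3A": "protein",
    "AuroraB": "protein",
    "AuroraC": "protein",
    "INCENP": "protein",
}


def variables_for(entities):
    """Return the variable names for a pathway.

    Assumption: proteins expand to four variables, everything else to one.
    """
    variables = []
    for name, entity_type in entities.items():
        if entity_type == "protein":
            variables.extend(name + suffix for suffix in SUFFIXES)
        else:
            variables.append(name)
    return variables


variables = variables_for(ENTITIES)
print(len(variables), variables)
```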

Page 19

Aurora C Evidence

AuroraB    genome     0.87082
AuroraB    mRNA       0.37673
AuroraC    genome     0.170729
AuroraC    mRNA       0.045578
INCENP     genome    -0.082277
INCENP     mRNA      -0.060272
H3F3A      genome    -0.411328

[Figure: the Aurora C factor graph, repeated.]

• Data points are signed log p-values

• Right now, I discretize into up/down/same at the 0.05 level

• Therefore, many patients look “identical” on hidden variables
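One possible reading of the discretization step, as a sketch only: it assumes the values are base-10 logs, that the magnitude is -log10(p), and that the sign gives the direction of change. None of these conventions is stated on the slide, and the function name `discretize` is hypothetical.

```python
import math

# Assumption: magnitude is -log10(p), so p < 0.05 means |x| > about 1.301.
THRESHOLD = -math.log10(0.05)


def discretize(signed_log_p, threshold=THRESHOLD):
    """Map a signed log p-value to 'up', 'down', or 'same'."""
    if signed_log_p > threshold:
        return "up"
    if signed_log_p < -threshold:
        return "down"
    return "same"


# Evidence values from the slide.
evidence = {
    ("AuroraB", "genome"): 0.87082,
    ("AuroraB", "mRNA"): 0.37673,
    ("AuroraC", "genome"): 0.170729,
    ("AuroraC", "mRNA"): 0.045578,
    ("INCENP", "genome"): -0.082277,
    ("INCENP", "mRNA"): -0.060272,
    ("H3F3A", "genome"): -0.411328,
}

states = {key: discretize(value) for key, value in evidence.items()}
for (entity, level), state in states.items():
    print(entity, level, state)
```

Under these assumptions every value in this example falls inside the threshold and becomes "same", which illustrates how hard thresholding can make patients indistinguishable on the hidden variables.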

Page 20

Aurora C Inference

• Using the libDAI package, which implements many approximate inference algorithms (and exact inference)

• Using exact inference at the moment

• 128 patients, 132 pathways: about 2 hours
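For intuition about what exact inference computes, here is a brute-force sketch in plain Python. It does not use libDAI or its API. The model is a made-up chain g, m, p, a for a single gene with three states per variable, and the pairwise factor (parent and child agree with weight 0.8) is an invented placeholder, not the potentials used in the talk.

```python
from itertools import product

STATES = (-1, 0, 1)  # down, same, up
VARIABLES = ("g", "m", "p", "a")  # gene, mRNA, protein, activity
EDGES = (("g", "m"), ("m", "p"), ("p", "a"))
AGREE = 0.8  # hypothetical weight for a child matching its parent


def edge_factor(parent, child):
    """Pairwise factor: high weight when the two states agree."""
    return AGREE if parent == child else (1 - AGREE) / 2


def marginal(query, evidence):
    """Exact marginal of `query` given clamped `evidence`, by enumeration."""
    totals = {state: 0.0 for state in STATES}
    for values in product(STATES, repeat=len(VARIABLES)):
        assignment = dict(zip(VARIABLES, values))
        # Skip assignments that contradict the observed values.
        if any(assignment[var] != val for var, val in evidence.items()):
            continue
        weight = 1.0
        for parent, child in EDGES:
            weight *= edge_factor(assignment[parent], assignment[child])
        totals[assignment[query]] += weight
    z = sum(totals.values())  # normalizing constant
    return {state: weight / z for state, weight in totals.items()}


posterior = marginal("a", {"g": 1})
print(posterior)
```

Enumeration visits every joint assignment, so its cost grows exponentially with the number of variables; it is only practical for small models like this one.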

Page 21

Prelim. Pathway results

• 2 data sets

• Glioblastoma 224 samples

• Ovarian Cancer 128 samples

• Still working out kinks in pipeline

• Not satisfied with data treatment

[...] intriguing relationship between the hypermutator phenotype and MGMT methylation status emerged in the treated samples. Specifically, MGMT methylation was associated with a profound shift in the nucleotide substitution spectrum of treated glioblastomas (Fig. 4a). Among the 13 treated samples without MGMT methylation, 29% (29 out of 99) of the validated somatic mutations occurred as G•C to A•T transitions in CpG dinucleotides (characteristic of spontaneous deamination of methylated cytosines), and a comparable 23% (23 out of 99) of all mutations occurred as G•C to A•T transitions in non-CpG dinucleotides. In contrast, in the six treated samples with MGMT methylation, 81% of all mutations (146 out of 181) turned out to be of the G•C to A•T transition type in non-CpG dinucleotides, whereas only 4% (8 out of 181) of all mutations were G•C to A•T transition mutations within CpGs. That pattern is consistent with a failure to repair alkylated guanine residues caused by treatment. In other words, MGMT methylation shifted the mutation spectrum of treated samples to a preponderance of G•C to A•T transition at non-CpG sites.

Notably, the mutational spectra in the MMR genes themselves reflected MGMT methylation status and treatment consequences. All seven mutations in MMR genes found in six MGMT methylated, hypermutated (treated) tumours occurred as G•C to A•T mutations at non-CpG sites (Fig. 4b and Supplementary Table 6), whereas neither MMR mutation in non-methylated, hypermutated tumours was of this characteristic. Hence, these data show that MMR deficiency and MGMT methylation together, in the context of treatment, exert a powerful influence on the overall frequency and pattern of somatic point mutations in glioblastoma tumours, an observation of potential clinical importance.

Integrative analyses define glioblastoma core pathways

To begin to construct an integrated view of common genetic alterations in the glioblastoma genome, we mapped the unequivocal genetic alterations (validated somatic nucleotide substitutions, homozygous deletions and focal amplifications) onto major pathways implicated in glioblastoma [1]. That analysis identified a highly interconnected network of aberrations (Supplementary Figs 7 and 8), including three major pathways: RTK signalling, and the p53 and RB tumour suppressor pathways (Fig. 5).

By copy number data alone, 66%, 70% and 59% of the 206 samples harboured somatic alterations in core components of the RB, TP53 and RTK pathways, respectively (Supplementary Table 8). In the 91 samples for which there was also sequencing data, the frequencies of somatic alterations increased to 87%, 78% and 88%, respectively (Supplementary Table 9). There was a statistical tendency towards mutual exclusivity of alterations of components within each pathway (P-values of 9.3 × 10^-10, 2.5 × 10^-13 and 0.022, respectively, for the p53, RB and RTK pathways; Supplementary Table 10), consistent with the thesis that deregulation of one component in the pathway relieves the selective pressure for additional ones. However, we observed a greater than random chance (one-tailed, P = 0.0018) that a given sample harbours at least one aberrant gene from each of the three pathways (Supplementary Table 10). In fact, 74% harboured aberrations in all three pathways, a pattern suggesting that deregulation of the three pathways is a core requirement for glioblastoma pathogenesis.

As well as frequent deletions and mutations of the PTEN lipid phosphatase tumour suppressor gene, 86% of the glioblastoma samples harboured at least one genetic event in the core RTK/PI3K pathway (Fig. 5a). In addition to EGFR and ERBB2, PDGFRA (13%) and MET (4%) showed frequent aberrations (Supplementary Table 9). A total of 10 of the 91 sequenced samples have amplifications or point mutations in at least 2 of the 4 RTKs catalogued (EGFR, ERBB2, PDGFRA and MET; Supplementary Table 9), suggesting that genomic activation can be a mechanism for co-activated RTKs [44].

[Figure 5, panels a to c. (a) RTK/RAS/PI(3)K signalling, altered in 88%: EGFR, ERBB2, PDGFRA, MET, RAS, NF1, PI(3)K, PTEN, AKT, FOXO; downstream: proliferation, survival, translation. (b) p53 signalling, altered in 87%: CDKN2A (ARF), MDM2, MDM4, TP53; upstream: activated oncogenes; downstream: apoptosis, senescence. (c) RB signalling, altered in 78%: CDKN2A (P16/INK4A), CDKN2B, CDKN2C, CDK4, CDK6, CCND2, RB1; downstream: G1/S progression. Per-gene alteration types and percentages are omitted.]

Figure 5 | Frequent genetic alterations in three critical signalling pathways. a–c, Primary sequence alterations and significant copy number changes for components of the RTK/RAS/PI(3)K (a), p53 (b) and RB (c) signalling pathways are shown. Red indicates activating genetic alterations, with frequently altered genes showing deeper shades of red. Conversely, blue indicates inactivating alterations, with darker shades corresponding to a higher percentage of alteration. For each altered component of a particular pathway, the nature of the alteration and the percentage of tumours affected are indicated. Boxes contain the final percentages of glioblastomas with alterations in at least one known component gene of the designated pathway.

Nature, Vol 455, 23 October 2008, Articles, p. 1065. ©2008 Macmillan Publishers Limited. All rights reserved.