Finding Consistent Subnetworks across Microarray Datasets

Fan Qi, GS5002 Journal Club


Posted on 24 February 2016




OUTLINE

Introduction

Methodology

Results & Discussions

Conclusions


INTRODUCTION

Identify differential gene expression: find the genes that are significant with respect to a phenotype.

Importance:
Testing the effectiveness of treatments
Gaining biological insight into diseases
Developing new treatments
Disease prophylaxis
Any others?


CURRENT METHODS

Individual genes: search for individual differentially expressed genes, for example by fold change, the t-test or SAM (significance analysis of microarrays).

Gene pathway detection: look at a set of genes instead of individual genes, for example by Bayesian learning or Boolean network learning.

Gene classes: add existing biological insight, for example by over-representation analysis (ORA), functional class scoring (FCS), GSEA (gene set enrichment analysis), NEA or ErmineJ.
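To make the gene-class idea concrete, here is a minimal sketch of over-representation analysis (ORA) as it is commonly formulated: a one-sided hypergeometric test of whether a gene set contains more of the selected differentially expressed genes than expected by chance. This is a generic formulation, not code from [1] or [2]; the function name `ora_p_value` and the example numbers are hypothetical.

```python
from math import comb


def ora_p_value(total_genes: int, set_size: int, n_degs: int, overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= overlap).

    total_genes: genes measured on the array
    set_size:    genes in the gene set or pathway
    n_degs:      differentially expressed genes (DEGs) selected
    overlap:     DEGs that fall inside the gene set
    """
    start = max(overlap, 0)
    upper = min(set_size, n_degs)
    # Sum the integer counts first and divide once at the end,
    # so the only rounding happens in the final division.
    hits = sum(
        comb(set_size, i) * comb(total_genes - set_size, n_degs - i)
        for i in range(start, upper + 1)
    )
    return hits / comb(total_genes, n_degs)


# Hypothetical example: 20,000 genes, a 100-gene pathway, 500 DEGs, 12 of them in the pathway
print(ora_p_value(20_000, 100, 500, 12))
```

A small p-value suggests the gene set is over-represented among the DEGs; it says nothing about how the genes in the set relate to one another, which is the gap SNet targets.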


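CHALLENGE

Different datasets of the SAME disease give different results!

Zhang M et al. [1] demonstrated this inconsistency for SAM (DEGs: differentially expressed genes; POG: percentage of overlapping genes; DMD: Duchenne muscular dystrophy):

Dataset | DEGs | POG | nPOG
Prostate cancer | Top 10 | 0.30 | 0.30
Prostate cancer | Top 50 | 0.14 | 0.14
Prostate cancer | Top 100 | 0.15 | 0.15
Lung cancer | Top 10 | 0.00 | 0.00
Lung cancer | Top 50 | 0.20 | 0.19
Lung cancer | Top 100 | 0.31 | 0.30
DMD | Top 10 | 0.20 | 0.20
DMD | Top 50 | 0.42 | 0.42
DMD | Top 100 | 0.54 | 0.54

Reconstructed from Table 1 in [1].

The low overlaps show the inconsistency among datasets.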
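The POG values above compare the top-ranked genes obtained from two datasets of the same disease. A minimal sketch of that comparison, assuming POG is simply the fraction of the top n genes shared by the two ranked lists: the published measure in [1] may also take the direction of change into account, and nPOG is not modelled here. The function `pog` and the gene lists are hypothetical.

```python
def pog(ranked_a: list[str], ranked_b: list[str], top_n: int) -> float:
    """Fraction of the top_n genes of list A that are also in the top_n of list B."""
    if top_n <= 0:
        raise ValueError("top_n must be positive")
    shared = set(ranked_a[:top_n]) & set(ranked_b[:top_n])
    return len(shared) / top_n


# Hypothetical DEG rankings of the same disease from two datasets
dataset_1 = ["A", "B", "C", "D"]
dataset_2 = ["B", "A", "X", "Y"]
for n in (2, 4):
    print(n, pog(dataset_1, dataset_2, n))  # prints "2 1.0", then "4 0.5"
```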


NEW APPROACH: SNet [2]

Proposed in 2011; uses gene-gene relationships in the analysis.

Gene-gene relationships: activates vs. inhibits.

Gene subnetwork: each gene is a vertex and each relationship is an edge.

[Figure from Fig. 1 in [2]]

[Figure: example subnetwork over the genes RHOA, VAV, PIK3R2, ARHGEF1, RAC1 and IQGAP1; partially adapted from Fig. 2 in [2]]
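One way to hold such a gene network in code is a directed graph whose edges carry the relation type. The sketch below assumes only the two relation types named on this slide (activates, inhibits); the class name `GeneNetwork` and the example edges between a1, a2 and a3 are hypothetical and not taken from [2].

```python
from collections import defaultdict
from typing import Optional

# Relation types named on the slide; anything else is rejected.
VALID_RELATIONS = {"activates", "inhibits"}


class GeneNetwork:
    """Directed, signed gene network: genes are vertices, relations are edges."""

    def __init__(self) -> None:
        # source gene -> {target gene: relation}
        self._out: dict[str, dict[str, str]] = defaultdict(dict)

    def add_relation(self, source: str, target: str, relation: str) -> None:
        if relation not in VALID_RELATIONS:
            raise ValueError(f"unknown relation: {relation!r}")
        self._out[source][target] = relation

    def genes(self) -> set[str]:
        """Every gene that appears as a source or a target of some edge."""
        found = set(self._out)
        for targets in self._out.values():
            found.update(targets)
        return found

    def relation(self, source: str, target: str) -> Optional[str]:
        """Relation on the edge source -> target, or None if there is no edge."""
        return self._out.get(source, {}).get(target)


net = GeneNetwork()
net.add_relation("a1", "a2", "activates")  # hypothetical edge
net.add_relation("a3", "a2", "inhibits")   # hypothetical edge
```

Keeping the relation on the edge preserves the activates/inhibits distinction even when later steps only need plain connectivity.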


METHODOLOGY

Input:
Genes labeled with phenotypes, obtained from a microarray experiment
Third-party information: gene pathway information and gene reaction information
Subnetwork attributes: size and score

Output: a set of significant subnetworks

Stages: subnetwork extraction, subnetwork scoring, subnetwork significance


METHODOLOGY – STEP 1

[Figure: ranked gene list of each patient (P1, P2, P3, …), grouped by phenotype]


METHODOLOGY – STEP 1

Only the top-ranked genes are kept for each patient.

Repeat for every phenotype group.
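Step 1 starts by keeping only each patient's top-ranked genes. A minimal sketch, assuming one expression value per gene for each patient and a hypothetical `fraction` parameter for the cut-off (the exact threshold is not stated in this transcript); ties are broken by gene name so that the result is deterministic. The function names are hypothetical.

```python
def top_genes(expression: dict[str, float], fraction: float) -> set[str]:
    """Genes in the top `fraction` of one patient's expression-ranked list."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Number of genes to keep: rounded to the nearest integer, but at least one
    keep = max(1, round(fraction * len(expression)))
    # Highest expression first; ties broken by gene name
    ranked = sorted(expression, key=lambda gene: (-expression[gene], gene))
    return set(ranked[:keep])


def top_genes_per_patient(
    patients: dict[str, dict[str, float]], fraction: float
) -> dict[str, set[str]]:
    """Apply top_genes to every patient: {patient id: set of top genes}."""
    return {pid: top_genes(expr, fraction) for pid, expr in patients.items()}


# Hypothetical expression values for two patients of one phenotype group
patients = {
    "P1": {"g1": 5.0, "g2": 1.0, "g3": 3.0, "g4": 4.0},
    "P2": {"g1": 0.5, "g2": 2.5, "g3": 2.0, "g4": 1.0},
}
top = top_genes_per_patient(patients, 0.5)
```

Calling `top_genes_per_patient` once per phenotype group covers the "repeat for every phenotype group" instruction.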


METHODOLOGY – STEP 1

Select one phenotype as d and treat the others as ¬d.

Select the genes that occur in the top-gene lists of at least β% of the patients in d (β = 50). These genes form the list G_L.
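The selection of G_L can be sketched as a frequency filter over the top-gene sets of the patients in phenotype d, using β = 50 from the slide (written as the share 0.5) and assuming that a gene qualifies when it is in the top list of at least that share of patients. The function name `frequent_genes` and the data are hypothetical.

```python
from collections import Counter


def frequent_genes(top_sets: list[set[str]], beta: float = 0.5) -> set[str]:
    """Genes in the top-gene sets of at least a `beta` share of the patients."""
    if not top_sets:
        return set()
    # How many patients have each gene among their top genes
    counts = Counter(gene for genes in top_sets for gene in genes)
    n_patients = len(top_sets)
    return {gene for gene, c in counts.items() if c / n_patients >= beta}


# Hypothetical top-gene sets of four patients in phenotype d
d_top_sets = [{"a", "b"}, {"a", "c"}, {"a", "b"}, {"d"}]
g_l = frequent_genes(d_top_sets, beta=0.5)
```

Here gene "a" (3 of 4 patients) and gene "b" (2 of 4) reach the 50% share, while "c" and "d" (1 of 4 each) do not.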


METHODOLOGY – STEP 1

Partition G_L into multiple pathways and generate the subnetworks within each pathway.

[Figure: example subnetworks formed from genes a1 to a7]

The result is a list of subnetworks with respect to d.
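Generating subnetworks from pathways can be sketched as taking, for each pathway, the connected components of its gene graph restricted to G_L. This assumes that edge direction and sign are ignored when testing connectivity and that single isolated genes are not reported as subnetworks (the size table in the results starts at 2); `pathway_subnetworks`, `all_subnetworks` and the pathway data are hypothetical.

```python
from collections import defaultdict, deque


def pathway_subnetworks(pathway_edges, selected_genes, min_size=2):
    """Connected components of one pathway, restricted to the selected genes."""
    adjacency = defaultdict(set)
    for a, b in pathway_edges:
        # Keep an edge only if both endpoints are in G_L; direction is ignored
        if a in selected_genes and b in selected_genes:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects one connected component
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            gene = queue.popleft()
            component.add(gene)
            for neighbour in adjacency[gene]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        if len(component) >= min_size:
            components.append(frozenset(component))
    return components


def all_subnetworks(pathways, selected_genes):
    """Subnetworks of every pathway; `pathways` maps a pathway name to its edges."""
    return {
        name: pathway_subnetworks(edges, selected_genes)
        for name, edges in pathways.items()
    }


# Hypothetical pathway over a1..a7 and a hypothetical G_L
pathways = {"pathway_1": [("a1", "a2"), ("a2", "a3"), ("a4", "a5"), ("a6", "a7")]}
g_l = {"a1", "a2", "a3", "a4", "a5", "a6"}
subnets = all_subnetworks(pathways, g_l)
```

The edge a6-a7 is dropped because a7 is not in G_L, which leaves a6 isolated and therefore not reported.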


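METHODOLOGY – STEP 2

For each subnetwork sp in the list for d and each patient k, compute the overall expression level:

$$\mathrm{SNet}_{sp,k} = \sum_{g_i \in sp,\ g_i \text{ highly expressed in } k} \frac{n^{d}_{g_i}}{n^{d}}$$

where g_i is a gene in sp that is highly expressed in patient k, n^d_{g_i} is the number of patients in d who have g_i highly expressed, and n^d is the total number of patients in d.

For the patients in d (patients 1 to n) and in ¬d (patients n+1 to m), compute a t-test between the two score vectors

$$S_{sp,d} = \langle \mathrm{SNet}_{sp,1}, \mathrm{SNet}_{sp,2}, \ldots, \mathrm{SNet}_{sp,n} \rangle$$

$$S_{sp,\neg d} = \langle \mathrm{SNet}_{sp,n+1}, \mathrm{SNet}_{sp,n+2}, \ldots, \mathrm{SNet}_{sp,m} \rangle$$

and assign the resulting t-score to each subnetwork.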
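The scoring step can be sketched as follows, assuming each gene's weight is the fraction of phenotype-d patients in which it is highly expressed (the two patient counts named on this slide) and that the t-test is Welch's two-sample t statistic; the slide does not say which t-test variant SNet uses. All function names and data are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, variance


def gene_weights(top_sets_d):
    """Share of phenotype-d patients in which each gene is highly expressed."""
    counts = Counter(gene for genes in top_sets_d for gene in genes)
    n_patients = len(top_sets_d)
    return {gene: c / n_patients for gene, c in counts.items()}


def subnetwork_score(subnetwork, patient_top_genes, weights):
    """Sum the weights of subnetwork genes that are highly expressed in this patient."""
    return sum(
        weights.get(gene, 0.0)
        for gene in sorted(subnetwork)
        if gene in patient_top_genes
    )


def welch_t(xs, ys):
    """Welch's two-sample t statistic (needs at least two values per group)."""
    se = math.sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    diff = mean(xs) - mean(ys)
    if se == 0:
        # Both groups are constant: report 0 or a signed infinity
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def subnetwork_t(subnetwork, top_d, top_not_d, weights):
    """t-score comparing the subnetwork's scores in d patients with not-d patients."""
    s_d = [subnetwork_score(subnetwork, top, weights) for top in top_d]
    s_not_d = [subnetwork_score(subnetwork, top, weights) for top in top_not_d]
    return welch_t(s_d, s_not_d)


# Hypothetical top-gene sets of patients in d and in not-d
top_d = [{"a", "b"}, {"a"}, {"a", "b"}]
top_not_d = [set(), {"b"}, set()]
weights = gene_weights(top_d)  # "a" -> 1.0, "b" -> 2/3
t_score = subnetwork_t({"a", "b"}, top_d, top_not_d, weights)
```

A large positive t-score means the subnetwork's genes tend to be highly expressed together in d but not in ¬d.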


METHODOLOGY – STEP 3

A. Randomly swap the phenotype labels of the patients, then recreate the subnetworks and their t-scores (steps 1 and 2).

B. Repeat A for 1,000 permutations; the results form a 2-D histogram.

C. Estimate the nominal p-value of each subnetwork.

D. Select the subnetworks with a significant nominal p-value. Null hypothesis: the subnetwork is not significant.

(Fig. 5 in the original paper [2])
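Steps A to D can be sketched as a permutation test. This sketch assumes that the 2-D histogram is indexed by subnetwork size and t-score, so each observed subnetwork is compared only with null subnetworks of the same size, and that larger t-scores are more extreme (a one-sided test); the add-one correction in the p-value is also our choice. `score_all` is a hypothetical callback standing in for steps 1 and 2 on a given labelling, not part of any published SNet code.

```python
import random
from typing import Callable, Optional, Sequence

# One scored subnetwork: (number of genes, t-score)
Scored = tuple[int, float]


def build_null_pool(
    labels: Sequence[str],
    score_all: Callable[[list[str]], list[Scored]],
    n_perm: int = 1000,
    seed: int = 0,
) -> list[Scored]:
    """Re-run steps 1-2 on shuffled labels and pool every (size, t) pair."""
    rng = random.Random(seed)
    pool: list[Scored] = []
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # randomly reassign phenotype labels to patients
        pool.extend(score_all(shuffled))
    return pool


def nominal_p_value(size: int, t_obs: float, null_pool: list[Scored]) -> Optional[float]:
    """Share of same-size null subnetworks whose t-score is at least t_obs."""
    same_size = [t for s, t in null_pool if s == size]
    if not same_size:
        return None  # no null subnetworks of this size were generated
    exceed = sum(1 for t in same_size if t >= t_obs)
    # Adding one to numerator and denominator keeps p strictly above zero
    return (exceed + 1) / (len(same_size) + 1)


pool = [(3, 1.0), (3, 2.0), (3, 3.0), (4, 10.0)]  # hypothetical null pool
print(nominal_p_value(3, 2.5, pool))  # → 0.5
```

Subnetworks whose p-value falls below the chosen significance level would then be kept in step D.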


RESULTS AND DISCUSSIONS

Datasets (two datasets per disease):
Leukemia: Golub vs. Armstrong
ALL: Ross vs. Yeoh
DMD: Haslett vs. Pescatori
Lung: Bhattacharjee vs. Garber

Performance comparison: subnetwork overlap (against GSEA) and gene overlap (against GSEA, SAM and the t-test).

Other comparisons: subnetwork size, and gene validity against the t-test.


RESULTS AND DISCUSSIONS

Subnetwork overlap

Disease | Dataset 1 | Dataset 2 | SNet overlap (%) | GSEA overlap (%) | SNet (#) | GSEA (#)
Leukemia | Golub | Armstrong | 83.33% | 0% | 20 | 0
ALL | Ross | Yeoh | 47.63% | 23.1% | 10 | 6
DMD | Haslett | Pescatori | 58.33% | 55.6% | 7 | 10
Lung | Bhattacharjee | Garber | 90.90% | 0% | 9 | 0

Synthesized from Tables 1 and 2 in [2]; higher overlap is better.


RESULTS AND DISCUSSIONS

Gene overlap

Disease | SNet | GSEA | t-test (p < 0.05) | t-test (top) | SAM (p < 0.05) | SAM (top)
Leukemia | 91.30% | 2.38% | 73.01% | 14.29% | 49.96% | 22.62%
ALL | 93.01% | 4.0% | 60.20% | 57.33% | 81.25% | 49.33%
DMD | 69.23% | 28.9% | 49.60% | 20.00% | 76.98% | 42.22%
Lung | 51.18% | 4.0% | 65.61% | 26.16% | 65.61% | 24.62%

Synthesized from Tables 3, 4 and 5 in [2]; higher overlap is better.


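RESULTS AND DISCUSSIONS

Size of subnetworks (SNet columns give the number of subnetworks of each size)

Disease | t-test | SNet size 2 | 3 | 4 | 5 | 5 | 6 | 7 | >8
Leukemia | 84 | 8 | 1 | 0 | 0 | 2 | 3 | 2 | 1
Subtype | 75 | 5 | 1 | 1 | 1 | 1 | 0 | 1 | 6
DMD | 45 | 3 | 1 | 0 | 0 | 1 | 0 | 0 | 5
Lung | 65 | 3 | 2 | 1 | 0 | 5 | 3 | 0 | 1

Reconstructed from Table 6 in [2].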


RESULTS AND DISCUSSIONS

Validity

Compare the genes in EACH subnetwork with those selected by the t-test.

For most of the selected subnetworks, 70% to 100% of their genes also appear in the t-test results; a few fall lower (for example, MLLBCR_ACAA1 at 28.67%).

Selected results (the full tables are too large to present):

Subnetwork | Percentage | Subnetwork | Percentage
Leukaemia_B Cell-VAV1 | 81.82% | SNET_CTNNB1 | 100%
Leukaemia_UBC | 100% | SNET_TNFSF10 | 60%
Leukaemia_RAC1 | 57.15% | SNET_PYGM | 60%
DMD_RHOA | 75% | DMD_ACTB | 83.33%
DMD_SDC3 | 88.89% | Leukaemia_POU2F2 | 75.00%
MLLBCR_ACAA1 | 28.67% | BCR_T_RASA1 | 44.44%
MLLBCR_BLNK | 72.73% | BCR_ABL1 | 75.00%
SNET_NOTCH3 | 100% | DMD_CALM1 | 80%

Selected from Tables 7, 8, 9 and 10 in [2].
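The validity check is a simple set overlap: the share of a subnetwork's genes that the t-test also selects. A minimal sketch, assuming the t-test gene list is already available as a set of gene symbols; `validity_percentage`, `validity_table` and the example genes are hypothetical and not the subnetworks from [2].

```python
def validity_percentage(subnetwork_genes, ttest_genes) -> float:
    """Percentage of a subnetwork's genes that also appear in the t-test gene list."""
    genes = set(subnetwork_genes)
    if not genes:
        raise ValueError("subnetwork has no genes")
    return 100.0 * len(genes & set(ttest_genes)) / len(genes)


def validity_table(subnetworks, ttest_genes):
    """{subnetwork name: percentage} for a dict of named subnetworks."""
    return {
        name: validity_percentage(genes, ttest_genes)
        for name, genes in subnetworks.items()
    }


# Hypothetical subnetworks and t-test gene list
ttest = {"a", "b", "c", "x"}
table = validity_table({"SUB_1": ["a", "b", "c", "d"], "SUB_2": ["x", "y"]}, ttest)
```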


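CONCLUSIONS

Traditional methods suffer from an inconsistency problem across different datasets of the same disease.

SNet uses biological insight to narrow this gap: gene-to-gene relationships and gene pathway knowledge.

SNet gives more consistent results than established algorithms: its subnetwork overlap is higher than GSEA's for every disease, and its gene overlap is the highest of all methods for leukemia and ALL.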


REFERENCES

[1] Zhang M, Zhang L, Zou J, Yao C, Xiao H, Liu Q, Wang J, Wang D, Wang C, Guo Z: Evaluating reproducibility of differential expression discoveries in microarray studies by considering correlated molecular changes.

[2] Soh D, Dong D, Guo Y, Wong L: Finding consistent disease subnetworks across microarray datasets.


THANK YOU!!