association rule mining on rnaseq data trends in ... · association rule mining on rnaseq data 2....

50
Association Rule Mining on RNAseq Data Trends in Bioinformatics Lukas Ehrig Jan 23 th , 2019

Upload: others

Post on 22-Jun-2020

12 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Association Rule Mining on RNAseq DataTrends in Bioinformatics

Lukas Ehrig

Jan 23th, 2019

Page 2: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Association Rule Mining on RNAseq Data

Agenda

1. Motivation2. Method

2.1. Filtering Association Rules2.2. Interestingness Measures2.3. Weighting of Interestingness Measures

3. Experiments4. Conclusion

Chart 2

Page 3: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Association Rule Mining on RNAseq Data

1. Motivation

Image: [2]

Chart 3

Page 4: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Association Rule MiningDiscovering significant relations between attributes (“items”) in large datasets.

Association Rules

Gene1↑ → Gene2↓

Illness → Gene1↑, Gene2↓Gene1↑, Gene2↓ → Illness

Association Rule Mining on RNAseq Data

1. Motivation

Chart 4

Page 5: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Association Rule Mining on RNAseq Data

1. Motivation

Chart 5

Number of features infeasible Exponential number of possible frequent item sets→ Runtime complexity

Huge number of possible / output rules n frequent items yield 2n − 2 association rules→ which rules are relevant/interesting?

⇒ Select relevant subset of rules

Page 6: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Association Rule Mining on RNAseq Data

2. Method

Chart 6

Page 7: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Association Rule Mining on RNAseq Data

2. Method

Chart 7

Page 8: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Association Rule Mining on RNAseq Data

2. Method - Filtering of Association Rules

Filter association rules that can be removed for formal reasons or because of what the researcher wants to find out or use the results for.

Chart 8

Examples

Item-FilterFilter association rules containing wanted/unwanted items, e.g. Gene1― → Gene2―.

Redundancy-FilterRemove rules that can be deduced from other rules.

If using multiple filters in a row, the order might matter!

Page 9: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Association Rule Mining on RNAseq Data

2. Method - Min-Max Filter

Min-Max Filter

Idea The antecedent of a rule should be as small, the consequent as large as possible.

Min-Max1: ensure, that or hold for rule .

Min-Max2: Given two rules, and , remove , if and . If , we do not lose information!

Example

r0: X,Y,Z → Cr1: X,Y → C Chart 9

Page 10: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

(Strong) Monotonicity

If a rule X → Y holds on a statistically large subset of a dataset D, then X → ¬Y does not hold on any statistically large subset of D.

Example

G2↓ → G4↑G2↓, G3↑ → G4↓

Association Rule Mining on RNAseq Data

2. Method - Monotonicity

Chart 10

Page 11: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Association Rule Mining on RNAseq Data

2. Method - Monotonicity Filter

Graph creation Rule Filtering

Chart 11

Page 12: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Data: TCGA-LAML_TCGA-GBM

Page 13: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed
Page 14: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed
Page 15: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed
Page 16: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed
Page 17: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed
Page 18: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Association Rule Mining on RNAseq Data

2. Method

Chart 18

Page 19: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Association Rule Mining on RNAseq Data

2. Method

Chart 19

Page 20: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Interestingness Measures

Association Rule Mining on RNAseq Data

2. Method - Interestingness Measures

Interestingness Measure

Objective Measure Subjective Measure

Chart 20

Page 21: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Interestingness Measures

ObjectiveDepend only on the data and patterns, no knowledge of user or application required. Mostly based on theories in probability, statistics or information theory.

Examples

SupportConfidenceLift

Association Rule Mining on RNAseq Data

2. Method - Objective Interestingness

Chart 21

Page 22: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Interestingness Measures

SubjectiveTake into account both the data and the user of these data.Definition requires domain knowledge (or background knowledge about the data) and its explicit representation.

Examples

NoveltyUnexpectednessActionability

Association Rule Mining on RNAseq Data

2. Method - Subjective Interestingness

Chart 22

Page 23: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

External domain knowledge

Genes associated with medical condition:http://www.disgenet.org/https://www.targetvalidation.org/

Co-Expression of genes:http://coxpresdb.jp/http://www.genefriends.org

Association Rule Mining on RNAseq Data

2. Method - Subjective Interestingness

Figure: Disgenet searchChart 23

Page 24: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

First idea:

Jaccard-Index:

Jaccard distance:

Association Rule Mining on RNAseq Data

2. Method - Subjective Interestingness

Chart 24

Intuition:

Index measures “overlap” / similarity of sets. If there are lots of genes associated with a medical condition (B), or a rule (A) is very long, an overlap is more likely, but the denominator also increases.

Page 25: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

First idea:

For a rule X → Y and a set B of genes associated with a medical condition: measures how much a rule corresponds to the domain knowledge.

measures the novelty of an association rule.

Association Rule Mining on RNAseq Data

2. Method - Subjective Interestingness

Chart 25

Page 26: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

First idea:

For a rule X → Y and a set B of genes associated with a medical condition: measures how much a rule corresponds to the domain knowledge.

measures the novelty of an association rule.

Problems:

Scores for different medical conditions hardly comparableIf B is small, is almost always 0!

Association Rule Mining on RNAseq Data

2. Method - Subjective Interestingness

Chart 26

Page 27: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Next idea:

Define a subjective interestingness measure based on gene co-expression.

E.g. how strongly do the genes in the antecedent of the association rulecorrelate with with the genes in the consequent of the association rule?

Association Rule Mining on RNAseq Data

2. Method - Subjective Interestingness

Chart 27

planned

Page 28: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Association Rule Mining on RNAseq Data

2. Method

Chart 28

Page 29: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Association Rule Mining on RNAseq Data

2. Method

Chart 29

Page 30: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Given interestingness measures in a vector and an -dimensional weight vector , a combined interesting score can be computed by:

Scalar product

Geometric Method

Association Rule Mining on RNAseq Data

2. Method - Weighting

Setting weights

Value range of e.g. support , confidence

Effect of weights in geometric method

increases weight of , but decreases weight of

Chart 30

Page 31: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Setup

DataRNAseq data from The Cancer Genome Atlas (TCGA)DSI: TCGA-BRCA_TCGA-PAAD (307 samples; selected 2.7k genes by variance)DSII: TCGA-LAML_TCGA-GBM (1,28k samples; selected 2.5k genes by variance)

All data sets were discretized by threshold.

Association Rule Miner

Weka 3.9 Apriori Associator

Association Rule Mining on RNAseq Data

3. Experiments

Chart 31

Page 32: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Experiment #1Number of filtered rules

1.1 Min-Max1 filter

Association Rule Mining on RNAseq Data

3. Experiments

Chart 32

Page 33: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Experiment #1Number of filtered rules

1.2 Min-Max2 filter 1.3 Monotonicity filter

Association Rule Mining on RNAseq Data

3. Experiments

Chart 33

Page 34: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Association Rule Mining on RNAseq Data

3. Experiments

Chart 34

Experiment #2 (planned)Compute average change in rank for association rules after weighting with different subjective interestingness measures.

Experiment #3 (planned)Evaluate new subjective novelty measure(s) by holding back domain knowledge and see, whether it is discovered.

Experiment #4 (planned)Retrieve gene network from additional external domain knowledge source and compare to (weighted) association rules / generated graph.

...

Page 35: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Filtering techniques are powerful to reduce number of association rules.

Assessing the biological relevance of rules by using subjective interestingness measures proved to be challenging at first try.

Discussion

How to improve the first subjective interestingness measure based on the Jaccard index?

What other subjective interestingness measures are conceivable?

Other ideas regarding experiments and their evaluation?

Association Rule Mining on RNAseq Data

4. Conclusion

Chart 35

Page 36: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Association Rule Mining on RNAseq DataTrends in Bioinformatics

Lukas Ehrig

Jan 23th, 2019

Page 37: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Title/Final Slide:Boehm Konstruktion http://www.boehm-konstruktion.com/referenzen/hasso-plattner-high-tech-park/

[2] Central Dogma of Molecular Biology:Khan Academyhttps://www.khanacademy.org/science/high-school-biology/hs-molecular-genetics/hs-rna-and-protein-synthesis/a/intro-to-gene-expression-central-dogma

The BPMN model of the method was made using Signavio Academic Initiative:https://academic.signavio.com/

Association Rule Mining on RNAseq Data

Appendix: Image Sources

Page 38: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Association Rule Mining on RNAseq Data

Appendix: Gene Expression Data

Chart 38

Page 39: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Chart 39

Gene

Association Rule Mining on RNAseq Data

Appendix: Gene Expression Data

Page 40: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Chart 40

Sample

Association Rule Mining on RNAseq Data

Appendix: Gene Expression Data

Page 41: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Chart 41

Discretize

Association Rule Mining on RNAseq Data

Appendix: Gene Expression Data

Page 42: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Chart 42

Association Rule Mining on RNAseq Data

Appendix: Gene Expression Data

Page 43: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Chart 43

{Headset,Laptop}

Association Rule Mining on RNAseq Data

Appendix: Association Rule Mining

Page 44: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Chart 44

A: Laptop ⇒ Headsetsup(A) = 4conf(A) = 0.8B: Headset ⇒ Laptopsup(B) = 4 _conf(B) = 0.6

Association Rule Mining on RNAseq Data

Appendix: Association Rule Mining

Page 45: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Chart 45

Association Rule Mining on RNAseq Data

Appendix: Association Rule Mining

Page 46: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Chart 46

Transaction

Item: ENSG0..03=-1

Transaction ID

Association Rule Mining on RNAseq Data

Appendix: Association Rule Mining

Page 47: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Redundant Association Rules

An association rule is redundant if its structure and statistical measures can be deduced from another rule.

Different notions of redundancy. Based on Galois-closure according to Zaki (for approximate association rules):

Association Rule Mining on RNAseq Data

Appendix: Redundancy Filter

[Mohammed J. Zaki. 2004. Mining Non-Redundant Association Rules. Data Min. Knowl. Discov. 9, 3 (Nov. 2004), 223-248.]

Page 48: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Chart 48

Association Rule Mining on RNAseq Data

Appendix: Non-redundant Association Rules

[Mohammed J. Zaki. 2004. Mining Non-Redundant Association Rules. Data Min. Knowl. Discov. 9, 3 (Nov. 2004), 223-248.]

Page 49: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Chart 49

Association Rule Mining on RNAseq Data

Appendix: Non-redundant Association Rules

[Mohammed J. Zaki. 2004. Mining Non-Redundant Association Rules. Data Min. Knowl. Discov. 9, 3 (Nov. 2004), 223-248.]

Page 50: Association Rule Mining on RNAseq Data Trends in ... · Association Rule Mining on RNAseq Data 2. Method - Filtering of Association Rules Filter association rules that can be removed

Chart 50

Association Rule Mining on RNAseq Data

Appendix