
De novo discovery of mutated driver pathways in cancer

Discussion leader: Matthew Bernstein
Scribe: Kun-Chieh Wang
Computational Network Biology BMI 826/Computer Sciences 838
https://compnetbiocourse.discovery.wisc.edu

by Fabio Vandin, Eli Upfal, and Benjamin J. Raphael
Genome Research, 2012


Problem overview

• Cancer is caused by a genetic mutation, or set of mutations, that leads to uncontrolled growth and division

• A driver pathway is any pathway such that a mutation in the pathway leads to cancer. A mutation in a driver pathway is called a driver mutation

• Other mutations are called passenger mutations
• Problem statement:

Given: A set of cancer genomes
Goal: Find the driver mutations


Challenges

• Passenger mutations are difficult to discern from driver mutations

• Cancer genomes are highly heterogeneous with respect to both passenger and driver mutations
– Many combinations of driver mutations may lead to cancer
– We cannot test all combinations of genes


Assumptions

• As is often done in computational biology, we make some assumptions to make the problem well defined:
– Driver mutations tend to be rare and thus can be assumed to be mutually exclusive, meaning that if a cancer genome has one driver mutation in a pathway, it does not have another in the same pathway
– A set of driver mutations should “explain” the global set of cancer genomes, meaning that each cancer genome should have one driver mutation in the set


Formulating the objective function

• With these assumptions, we search for a set of mutations with high:
– Coverage: most patients have at least one mutation in the set of driver mutations
– Exclusivity: a patient has only one driver mutation

Given: A set of cancer genomes
Goal: Find a set of mutations with maximum coverage while maintaining exclusivity


Formulating the objective function

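• $n$: number of genes
• $m$: number of patients
• $A$: an $m \times n$ matrix where $A_{ij} = 1$ if gene $j$ is mutated in patient $i$, and $A_{ij} = 0$ otherwise
• $\Gamma(g) = \{ i : A_{ig} = 1 \}$: the set of patients for which gene $g$ is mutated
• $\Gamma(M) = \bigcup_{g \in M} \Gamma(g)$: the set of patients that have a mutation in some gene in the set of genes $M$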
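To make these definitions concrete, here is a minimal Python sketch that computes the set of patients in which one gene is mutated (written Gamma(g) below) and the set of patients covered by a set of genes (Gamma(M)). The toy matrix `A` and the helper names `gamma_gene` and `gamma_set` are hypothetical, chosen only for illustration; patients and genes are 0-based row and column indices.

```python
# Hypothetical toy mutation matrix: rows are patients (m = 6), columns are genes (n = 4).
# A[i][j] == 1 means gene j is mutated in patient i.
A = [
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]


def gamma_gene(A, g):
    """Gamma(g): the set of patients (row indices) in which gene g is mutated."""
    return {i for i, row in enumerate(A) if row[g] == 1}


def gamma_set(A, M):
    """Gamma(M): the set of patients with a mutation in at least one gene of M."""
    covered = set()
    for g in M:
        covered |= gamma_gene(A, g)
    return covered


print(gamma_gene(A, 0))         # {0, 1}
print(gamma_set(A, [0, 1, 2]))  # {0, 1, 2, 3, 4}
```

Patient 5 has no mutations at all, so no set of genes can cover it.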


Formulating the objective function

• A set of genes $M$ is mutually exclusive if for all pairs of genes $g, g' \in M$ with $g \neq g'$, the following holds: $\Gamma(g) \cap \Gamma(g') = \emptyset$

• An $m \times k$ submatrix $M$ of $A$ is mutually exclusive if each row of the submatrix contains at most one value of 1
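The two characterizations of mutual exclusivity on this slide (pairwise-disjoint patient sets, and at most one 1 per row of the column submatrix) are equivalent. A small Python sketch, reusing a hypothetical toy matrix, checks both and confirms they agree; the function names are illustrative only.

```python
from itertools import combinations

# Hypothetical toy mutation matrix (rows = patients, columns = genes).
A = [
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]


def gamma_gene(A, g):
    """Set of patients in which gene g is mutated."""
    return {i for i, row in enumerate(A) if row[g] == 1}


def exclusive_pairwise(A, M):
    """Gene-set view: the patient sets of every pair of distinct genes are disjoint."""
    return all(gamma_gene(A, g).isdisjoint(gamma_gene(A, h))
               for g, h in combinations(M, 2))


def exclusive_rows(A, M):
    """Submatrix view: every row restricted to the columns in M has at most one 1."""
    return all(sum(row[g] for g in M) <= 1 for row in A)


print(exclusive_pairwise(A, [0, 1, 2]), exclusive_rows(A, [0, 1, 2]))  # True True
print(exclusive_pairwise(A, [0, 3]), exclusive_rows(A, [0, 3]))        # False False
```

Genes 0 and 3 are both mutated in patient 1, which is why the second check fails under either view.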


Formulating the objective function

• The problem can now be restated mathematically:

Given: A mutation matrix $A$ and an integer $k > 0$
Goal: Find a mutually exclusive $m \times k$ submatrix $M$ of $A$ with the largest number of non-zero rows

• PROBLEM: driver mutations may not appear mutually exclusive in the measured data because of experimental error. Furthermore, passenger mutations may occur in the genes of a driver pathway, in patients that already carry a driver mutation in that pathway.
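For a tiny matrix, this strict problem can be solved exactly by trying every set of $k$ columns. The Python sketch below does that on a hypothetical toy matrix; `max_coverage_exclusive` is an illustrative name, and exhaustive search is only a reference point, not the method proposed in the paper.

```python
from itertools import combinations

# Hypothetical toy mutation matrix (rows = patients, columns = genes).
A = [
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]


def is_exclusive(A, M):
    """True if every row of the column submatrix M contains at most one 1."""
    return all(sum(row[g] for g in M) <= 1 for row in A)


def covered_rows(A, M):
    """Number of non-zero rows of the column submatrix M."""
    return sum(1 for row in A if any(row[g] for g in M))


def max_coverage_exclusive(A, k):
    """Try every set of k columns; keep the exclusive one with the most non-zero rows."""
    n = len(A[0])
    best, best_cov = None, -1
    for M in combinations(range(n), k):
        if is_exclusive(A, M):
            cov = covered_rows(A, M)
            if cov > best_cov:
                best, best_cov = M, cov
    return best, best_cov


print(max_coverage_exclusive(A, 2))  # ((0, 1), 4)
print(max_coverage_exclusive(A, 3))  # ((0, 1, 2), 5)
```

Note how brittle the strict version is: flipping a single 0 to 1 in a row already covered by the best set can make it non-exclusive and remove it from consideration entirely.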


Formulating the objective function

• We must reformulate the problem, because our current formulation is too strict.

• Instead of strictly mutually exclusive mutations, we'll attempt to find approximately exclusive mutations:
– most patients have no more than 1 mutation in $M$

• This introduces a tradeoff: increasing coverage can decrease exclusivity
Why? Adding a new gene to our set of driver mutations never decreases coverage, but the new gene's mutations might overlap heavily with those already in the set, making the set highly non-exclusive.


Formulating the objective function

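• To make this problem mathematically well-defined, we need to formalize this tradeoff

• We measure the coverage overlap using the following equation:
$$\omega(M) = \sum_{g \in M} |\Gamma(g)| - |\Gamma(M)|$$

• Given 2 genes $g_1$ (red) and $g_2$ (blue), we can visualize this equation as a Venn diagram of $\Gamma(g_1)$ and $\Gamma(g_2)$:

The area of the overlap is $\omega(\{g_1, g_2\}) = |\Gamma(g_1) \cap \Gamma(g_2)|$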
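A minimal Python sketch of the coverage overlap (called `coverage_overlap` here, an illustrative name): the sum of the per-gene patient counts minus the number of distinct patients covered. For two genes it reduces to the size of the intersection, matching the Venn-diagram picture. The toy matrix is hypothetical.

```python
# Hypothetical toy mutation matrix (rows = patients, columns = genes).
A = [
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]


def gamma_gene(A, g):
    """Set of patients in which gene g is mutated."""
    return {i for i, row in enumerate(A) if row[g] == 1}


def gamma_set(A, M):
    """Set of patients with a mutation in at least one gene of M."""
    covered = set()
    for g in M:
        covered |= gamma_gene(A, g)
    return covered


def coverage_overlap(A, M):
    """omega(M) = sum over g in M of |Gamma(g)|, minus |Gamma(M)|."""
    return sum(len(gamma_gene(A, g)) for g in M) - len(gamma_set(A, M))


print(coverage_overlap(A, [0, 3]))                # 1
print(len(gamma_gene(A, 0) & gamma_gene(A, 3)))   # 1 (the Venn-diagram overlap)
print(coverage_overlap(A, [0, 1, 2]))             # 0 (an exclusive set)
```

With more than two genes, a patient carrying $r$ mutations in $M$ contributes $r - 1$ to the overlap, so heavily co-mutated patients are penalized more.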


Formulating the objective function

• We measure the tradeoff between coverage and exclusivity with the following measure:
$$W(M) = |\Gamma(M)| - \omega(M)$$
– $|\Gamma(M)|$ measures coverage. The higher the better.
– $\omega(M)$ penalizes non-exclusivity. The lower the better.

Given: A mutation matrix $A$ and an integer $k > 0$
Goal: Find an $m \times k$ submatrix $M$ of $A$ that maximizes $W(M)$
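The weight is coverage minus overlap, which simplifies to twice the coverage minus the total number of mutations in the set. A minimal Python sketch on a hypothetical toy matrix computes it and finds the maximizing set by brute force; `weight` and `max_weight_exhaustive` are illustrative names, and exhaustive search is only feasible for very small $n$.

```python
from itertools import combinations

# Hypothetical toy mutation matrix (rows = patients, columns = genes).
A = [
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]


def gamma_gene(A, g):
    """Set of patients in which gene g is mutated."""
    return {i for i, row in enumerate(A) if row[g] == 1}


def gamma_set(A, M):
    """Set of patients with a mutation in at least one gene of M."""
    covered = set()
    for g in M:
        covered |= gamma_gene(A, g)
    return covered


def weight(A, M):
    """W(M) = |Gamma(M)| - omega(M), i.e. 2|Gamma(M)| - sum over g in M of |Gamma(g)|."""
    cover = len(gamma_set(A, M))
    overlap = sum(len(gamma_gene(A, g)) for g in M) - cover
    return cover - overlap


def max_weight_exhaustive(A, k):
    """Exact maximizer of W over all C(n, k) gene sets (tiny n only)."""
    n = len(A[0])
    return max(combinations(range(n), k), key=lambda M: weight(A, M))


print(weight(A, (0, 3)))            # 2  (coverage 3, overlap 1)
print(max_weight_exhaustive(A, 2))  # (0, 1)
```

Unlike the strict formulation, a set with a little overlap is still scored; it is simply penalized in proportion to the overlap.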


Formulating the objective function


Maximizing the objective function

• The authors prove that solving this problem is NP-hard
• Roughly, this means that no known algorithm is guaranteed to find the optimal set efficiently; in the worst case we may have to try essentially every combination of $k$ genes to find the one that maximizes $W(M)$
• Thus, we require either an algorithm for finding an approximate solution, or a heuristic
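To give a sense of scale for exhaustive search, here is a back-of-the-envelope count. It assumes roughly $n = 20{,}000$ genes (about the number of human protein-coding genes; an assumption, not a figure from these slides) and sets of only $k = 3$ genes:

$$\binom{20000}{3} = \frac{20000 \cdot 19999 \cdot 19998}{6} = 1{,}333{,}133{,}340{,}000 \approx 1.3 \times 10^{12}$$

The count grows roughly like $n^k / k!$, so every additional gene in the set multiplies the work by a factor of thousands.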


The Greedy Approach

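• Greedily add genes to the current set of driver mutations, each time choosing the gene that gives the largest value of the objective function, until $k$ genes are added:
1. $M \leftarrow \{g_1, g_2\}$, the pair of genes that maximizes $W(\{g_1, g_2\})$
2. for $i = 3, \ldots, k$:
   1. $g^* \leftarrow \arg\max_{g \notin M} W(M \cup \{g\})$
   2. $M \leftarrow M \cup \{g^*\}$
3. return $M$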
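A minimal Python sketch of the greedy procedure, assuming it starts from the best pair and then repeatedly adds the gene that maximizes the weight of the enlarged set (coverage minus overlap). The toy matrix and the names `weight` and `greedy` are hypothetical; the sketch assumes $k \geq 2$.

```python
from itertools import combinations

# Hypothetical toy mutation matrix (rows = patients, columns = genes).
A = [
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]


def gamma_gene(A, g):
    """Set of patients in which gene g is mutated."""
    return {i for i, row in enumerate(A) if row[g] == 1}


def weight(A, M):
    """W(M) = 2|Gamma(M)| - sum over g in M of |Gamma(g)|."""
    covered = set().union(*(gamma_gene(A, g) for g in M))
    return 2 * len(covered) - sum(len(gamma_gene(A, g)) for g in M)


def greedy(A, k):
    """Start from the highest-weight pair, then add the best single gene until |M| = k."""
    n = len(A[0])
    M = set(max(combinations(range(n), 2), key=lambda pair: weight(A, pair)))
    while len(M) < k:
        candidates = [g for g in range(n) if g not in M]
        g_star = max(candidates, key=lambda g: weight(A, M | {g}))
        M.add(g_star)
    return M


print(sorted(greedy(A, 3)))  # [0, 1, 2]
```

Each step costs $O(n)$ weight evaluations instead of the $\binom{n}{k}$ of exhaustive search, but an early poor choice can never be undone.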


Results – Greedy approach

• Even with this very naïve approach, we can make interesting guarantees on its accuracy under the gene independence model:
– Gene mutations are independent
– Driver genes have high coverage
– Each driver mutation contributes to the value of $W(M)$

• The authors prove that under this model, we would need 2,400 patients for the greedy algorithm to maximize the objective function with probability $1 - 10^{-4}$
– This number of patients is not currently available


Better idea: MCMC

• Markov chain Monte Carlo (MCMC) is a method for sampling from a complicated joint probability distribution

• Problem:
Given: A joint distribution
Goal: Generate a sample from it

• Solution: Form a Markov chain whose stationary distribution is the distribution of interest, then run the chain and record the states it visits


Quick review: Markov chains

• A Markov chain is a basic model of a stochastic process. It consists of a set of states and probabilities for transitioning from state to state

• Example: [figure: a small Markov chain with its states and transition probabilities]

• The stationary distribution gives the probability of being in each state if we let the random process move from state to state for an infinite amount of time
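As an illustration of the stationary distribution, the Python sketch below repeatedly applies a transition matrix to a starting distribution (power iteration) until it stops changing. The two-state transition probabilities are hypothetical. For this chain, solving $\pi P = \pi$ by hand gives $\pi = (5/6, 1/6)$, which the iteration recovers.

```python
# Hypothetical two-state chain: P[i][j] is the probability of moving from state i to state j.
P = [[0.9, 0.1],
     [0.5, 0.5]]


def step(dist, P):
    """One step of the chain: new_dist[j] = sum over i of dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def stationary(P, iters=1000):
    """Power iteration from 'start in state 0'; converges for irreducible, aperiodic chains."""
    dist = [1.0] + [0.0] * (len(P) - 1)
    for _ in range(iters):
        dist = step(dist, P)
    return dist


pi = stationary(P)
print([round(p, 4) for p in pi])  # [0.8333, 0.1667]
```

The same limit is the long-run fraction of time a single simulated trajectory spends in each state, which is what MCMC exploits.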


The MCMC Approach

• Sample sets of genes $M$ with probability proportional to $e^{cW(M)}$, where $c > 0$ is a constant
• We do so by forming a Markov chain such that each state in the Markov chain is associated with a set of $k$ genes
• Stochastically transition from state to state. The most frequently visited state is most likely to have the highest $W(M)$


The MCMC Approach

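• More specifically, given the current state $M_t$, we obtain $M_{t+1}$ as follows:
1. Choose a gene $v$ uniformly at random from the global set of genes
2. Choose a gene $w$ uniformly at random from $M_t$
3. Let $M' = (M_t \setminus \{w\}) \cup \{v\}$ if $v \notin M_t$, and $M' = M_t$ otherwise
4. With probability $P(M_t, M') = \min\left(1, \frac{e^{cW(M')}}{e^{cW(M_t)}}\right)$ set $M_{t+1} = M'$; otherwise set $M_{t+1} = M_t$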
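A minimal Metropolis-Hastings sketch of these steps in Python, targeting a distribution over sets of $k$ genes proportional to $\exp(c \cdot W(M))$, where $W$ is coverage minus overlap and $c > 0$ is a constant. Assumptions, all hypothetical: the toy matrix, the parameter values (`c`, `iters`, `thin`, `seed`), and the choice to stay put when the drawn gene $v$ is already in the current set. The `thin` parameter records only every `thin`-th state, in the spirit of sampling every $10^4$ iterations in the simulations.

```python
import math
import random
from collections import Counter

# Hypothetical toy mutation matrix (rows = patients, columns = genes).
A = [
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]


def gamma_gene(A, g):
    """Set of patients in which gene g is mutated."""
    return {i for i, row in enumerate(A) if row[g] == 1}


def weight(A, M):
    """W(M) = 2|Gamma(M)| - sum over g in M of |Gamma(g)|."""
    covered = set().union(*(gamma_gene(A, g) for g in M))
    return 2 * len(covered) - sum(len(gamma_gene(A, g)) for g in M)


def mcmc(A, k, c=1.0, iters=20_000, thin=10, seed=0):
    """Metropolis-Hastings over k-gene sets; returns visit counts of the thinned samples."""
    rng = random.Random(seed)
    n = len(A[0])
    M = frozenset(rng.sample(range(n), k))  # arbitrary starting set
    visits = Counter()
    for t in range(1, iters + 1):
        v = rng.randrange(n)                # 1. gene drawn uniformly from all n genes
        w = rng.choice(sorted(M))           # 2. gene drawn uniformly from the current set
        # 3. propose swapping w for v; if v is already in M, stay put (assumption)
        M_new = M if v in M else (M - {w}) | {v}
        # 4. accept with probability min(1, exp(c * (W(M_new) - W(M))))
        delta = weight(A, M_new) - weight(A, M)
        if delta >= 0 or rng.random() < math.exp(c * delta):
            M = M_new
        if t % thin == 0:                   # record only every `thin`-th state
            visits[M] += 1
    return visits


visits = mcmc(A, k=3)
best, count = visits.most_common(1)[0]
print(sorted(best), count)
```

The most frequently recorded set is the sampler's estimate of the highest-weight set; larger `c` concentrates the samples more sharply on high-weight sets but can slow movement between them.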


The MCMC Approach

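• With this definition of the transition matrix, the stationary distribution is
$$\pi(M) = \frac{e^{cW(M)}}{\sum_{M'} e^{cW(M')}}$$
where the sum is over all sets $M'$ of $k$ genes

• The authors prove that this Markov chain approaches its stationary distribution quickly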
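Why this target is stationary can be seen from a standard detailed-balance argument (a sketch, not necessarily the authors' proof; it shows stationarity only, not the fast-convergence claim). Assume the proposal draws $v$ uniformly from all $n$ genes and $w$ uniformly from the $k$ genes of the current set, and accepts with probability $\min(1, e^{cW(M')}/e^{cW(M)})$. For $M' = (M \setminus \{w\}) \cup \{v\}$ with $v \notin M$, the reverse move draws $w$ from the $n$ genes and $v$ from $M'$, so the proposal is symmetric. Writing $Z = \sum_{M''} e^{cW(M'')}$:

$$q(M, M') = \frac{1}{n} \cdot \frac{1}{k} = q(M', M)$$

$$\pi(M)\, q(M, M') \min\!\left(1, \frac{e^{cW(M')}}{e^{cW(M)}}\right) = \frac{q(M, M')}{Z} \min\!\left(e^{cW(M)}, e^{cW(M')}\right) = \pi(M')\, q(M', M) \min\!\left(1, \frac{e^{cW(M)}}{e^{cW(M')}}\right)$$

The middle expression is symmetric in $M$ and $M'$, so $\pi(M) P(M, M') = \pi(M') P(M', M)$ for every pair of states; summing over $M$ gives $\sum_M \pi(M) P(M, M') = \pi(M')$.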


Results – Simulated data

• Generated 2 simulated datasets
– A dataset starting from a set of 6 genes
– A dataset consisting of 2 driver pathways and
• Control coverage and exclusivity
• Simulate passenger mutations using observed characteristics in Glioblastoma data
• Simulated both single-nucleotide mutations and copy-number aberrations (CNAs)
• Ran the MCMC algorithm for $10^7$ iterations and sampled every $10^4$ iterations on each dataset


Results – Simulated data


Results – Simulated data


Results – real data

• Built mutation matrices from various cancer genome studies
• Searched for sets of size $k$
• Once a statistically significant set of mutations was found, they removed it from the matrix and re-ran the algorithm to find new sets

• Performed a statistical test. The test statistic was $W(M)$, and the null model was obtained by independently permuting the mutations for each mutation group among the patients
– This preserved the mutation frequency
– The reason for doing this is to assess the significance of the coverage and exclusivity given a fixed mutation frequency
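A minimal Python sketch of a permutation test in this spirit. Assumptions, all hypothetical: each gene column is treated as one "mutation group"; the statistic is the weight (coverage minus overlap) of a fixed gene set `M` re-evaluated on each permuted matrix, rather than re-running the search on permuted data; and the p-value uses the common add-one correction. Shuffling a column among patients keeps that gene's mutation count fixed while destroying any coverage or exclusivity pattern.

```python
import random

# Hypothetical toy mutation matrix (rows = patients, columns = genes).
A = [
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]


def gamma_gene(A, g):
    """Set of patients in which gene g is mutated."""
    return {i for i, row in enumerate(A) if row[g] == 1}


def weight(A, M):
    """W(M) = 2|Gamma(M)| - sum over g in M of |Gamma(g)|."""
    covered = set().union(*(gamma_gene(A, g) for g in M))
    return 2 * len(covered) - sum(len(gamma_gene(A, g)) for g in M)


def permute_columns(A, rng):
    """Shuffle each gene's column among patients independently (column sums are preserved)."""
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    for col in cols:
        rng.shuffle(col)
    return [[cols[j][i] for j in range(n)] for i in range(m)]


def permutation_pvalue(A, M, n_perm=1000, seed=0):
    """Fraction of permuted matrices where W(M) is at least the observed W(M), add-one corrected."""
    rng = random.Random(seed)
    observed = weight(A, M)
    hits = sum(weight(permute_columns(A, rng), M) >= observed for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)


print(permutation_pvalue(A, (0, 1, 2)))
```

On a matrix this small, a perfectly exclusive set arises by chance fairly often, so the p-value will not be small; with hundreds of patients the same pattern becomes far less likely under the null.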


Results – multiple cancer types


Results – Lung adenocarcinoma


Results – Glioblastoma multiforme


Discussion

• Is there an underlying network model?
• In contrast to nearly every other method that we have discussed in this class, this method does not utilize a biological network such as a protein-protein or protein-DNA interaction network
– Can we incorporate such a network into this method?

• Are coverage and exclusivity the best metrics for finding driver mutations?

• Does their objective function correctly capture coverage and exclusivity?

• What other methods could they have tried in order to solve their combinatorial optimization problem?

• How can this method be validated with biological experiments?