De novo discovery of mutated driver pathways in cancer
Discussion leader: Matthew Bernstein
Scribe: Kun-Chieh Wang
Computational Network Biology BMI 826/Computer Sciences 838
https://compnetbiocourse.discovery.wisc.edu
by Fabio Vandin, Eli Upfal, and Benjamin J. Raphael
Genome Research, 2012
Problem overview
• Cancer is caused by a genetic mutation, or set of mutations, that leads to uncontrolled cell growth and division
• A driver pathway is any pathway such that a mutation in the pathway leads to cancer. A mutation in a driver pathway is called a driver mutation
• Other mutations are called passenger mutations
• Problem statement:
Given: A set of cancer genomes
Goal: Find the driver mutations
Challenges
• Passenger mutations are difficult to discern from driver mutations
• Cancer genomes are highly heterogeneous with respect to both passenger and driver mutations
– Many combinations of driver mutations may lead to cancer
– Cannot test all combinations of genes
Assumptions
• As is often done in computational biology, we make some assumptions to make the problem well defined:
– Driver mutations tend to be rare and thus can be assumed to be mutually exclusive, meaning that if a cancer genome has one driver mutation it does not have another
– A set of driver mutations should “explain” the global set of cancer genomes, meaning that each cancer genome should have one driver mutation
Formulating the objective function
• With these assumptions we search for a set of mutations with high:
– Coverage – most patients have at least one mutation in the set of driver mutations
– Exclusivity – a patient has only one driver mutation
Given: A set of cancer genomes
Goal: Find a set of mutations with maximum coverage while maintaining exclusivity
Formulating the objective function
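• $n$ – number of genes
• $m$ – number of patients
• $A$ – an $m \times n$ matrix where $A_{ij} = 1$ if gene $j$ is mutated in patient $i$
• $\Gamma(g) = \{ i : A_{ig} = 1 \}$ – the set of patients for which gene $g$ is mutated
• $\Gamma(M) = \bigcup_{g \in M} \Gamma(g)$ – the set of patients that have a mutation in some gene in the set of genes $M$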
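A minimal Python sketch of these definitions, assuming the mutation matrix is stored as a list of patient rows with one 0/1 entry per gene. The toy matrix `A` and the helper names `gamma_gene` and `gamma_set` are illustrative assumptions, not taken from the paper or the slides.

```python
# Toy mutation matrix (hypothetical data): rows are patients, columns are genes.
A = [
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]


def gamma_gene(A, g):
    """Set of patients (row indices) in which gene g is mutated."""
    return {i for i, row in enumerate(A) if row[g] == 1}


def gamma_set(A, M):
    """Set of patients with a mutation in at least one gene of M."""
    covered = set()
    for g in M:
        covered |= gamma_gene(A, g)
    return covered


print(gamma_gene(A, 3))      # patients 1, 3 and 4
print(gamma_set(A, [0, 1]))  # patients 0, 1, 2 and 3
```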
Formulating the objective function
• A set of genes $M$ is mutually exclusive if for all pairs of distinct genes $g, g' \in M$ the following holds: $\Gamma(g) \cap \Gamma(g') = \emptyset$
• An $m \times k$ submatrix of $A$ is mutually exclusive if each row of the submatrix contains at most one value of 1
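The row-based form of the definition is easy to check directly. The sketch below assumes a 0/1 matrix stored as a list of patient rows; the toy data and the function name `is_mutually_exclusive` are hypothetical.

```python
# Toy mutation matrix (hypothetical data): rows are patients, columns are genes.
A = [
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]


def is_mutually_exclusive(A, M):
    """True if no patient (row) has more than one mutation among the genes in M."""
    return all(sum(row[g] for g in M) <= 1 for row in A)


print(is_mutually_exclusive(A, [0, 1, 2]))  # no patient carries two of these genes
print(is_mutually_exclusive(A, [0, 3]))     # patient 1 carries both gene 0 and gene 3
```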
Formulating the objective function
• The problem can now be restated mathematically:
Given: A mutation matrix $A$ and an integer $k > 0$
Goal: Find a mutually exclusive $m \times k$ submatrix of $A$ with the largest number of non-zero rows
• PROBLEM: driver mutations may not be measured as mutually exclusive due to experimental error. Furthermore, passenger mutations may co-occur in driver pathways.
Formulating the objective function
• We must reformulate the problem. Our current formulation is too strict.
• Instead of strictly mutually exclusive mutations, we’ll attempt to find approximately exclusive mutations:
– most patients have no more than 1 mutation in $M$
• This introduces a tradeoff: increasing coverage tends to decrease exclusivity
– Why? We can always increase coverage by adding a new mutation to our set of driver mutations. But this mutation might be highly non-exclusive
Formulating the objective function
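• To make this problem mathematically well-defined, we need to formalize this tradeoff
• We measure the coverage overlap using the following equation: $\omega(M) = \sum_{g \in M} |\Gamma(g)| - |\Gamma(M)|$
• Given 2 genes $g_1$ (red) and $g_2$ (blue), we can visualize this equation as two overlapping sets of patients: the area of the overlap is $\omega(\{g_1, g_2\}) = |\Gamma(g_1) \cap \Gamma(g_2)|$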
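As a small worked example, the coverage overlap can be computed as the total number of mutations in the chosen genes minus the number of patients covered. This is a sketch on hypothetical toy data; `coverage_overlap` is an illustrative name.

```python
# Toy mutation matrix (hypothetical data): rows are patients, columns are genes.
A = [
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]


def coverage_overlap(A, M):
    """Sum over genes in M of patients mutated in that gene, minus patients covered by M."""
    total = sum(row[g] for row in A for g in M)
    covered = sum(1 for row in A if any(row[g] for g in M))
    return total - covered


# For two genes the overlap is the number of patients mutated in both.
print(coverage_overlap(A, [0, 3]))     # only patient 1 has both gene 0 and gene 3
print(coverage_overlap(A, [0, 1, 2]))  # these genes never co-occur
```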
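Formulating the objective function
• We measure the tradeoff between coverage and exclusivity with the following measure: $W(M) = |\Gamma(M)| - \omega(M) = 2|\Gamma(M)| - \sum_{g \in M} |\Gamma(g)|$
– $|\Gamma(M)|$ measures coverage. The higher the better.
– $\omega(M)$ penalizes non-exclusivity. The lower the better.
Given: A mutation matrix $A$ and an integer $k > 0$
Goal: Find an $m \times k$ submatrix $M$ of $A$ that maximizes $W(M)$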
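A short sketch of the weight, written as coverage minus coverage overlap, which simplifies to twice the number of covered patients minus the total number of mutations in the chosen genes. The toy matrix and the name `weight` are assumptions for illustration.

```python
# Toy mutation matrix (hypothetical data): rows are patients, columns are genes.
A = [
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]


def weight(A, M):
    """Coverage minus coverage overlap for the gene set M."""
    covered = sum(1 for row in A if any(row[g] for g in M))
    total = sum(row[g] for row in A for g in M)
    overlap = total - covered
    return covered - overlap


print(weight(A, [0, 1, 2]))  # 5 patients covered, no overlap
print(weight(A, [0, 1, 3]))  # 5 patients covered, but an overlap of 2
```

Swapping gene 2 for gene 3 keeps coverage the same but lowers the weight, which is the tradeoff the measure is meant to capture.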
Maximizing the objective function
• The authors prove that solving this problem is NP-hard
• Roughly, this means that no efficient (polynomial-time) algorithm is known for finding the exact optimum; in the worst case we would need to try every combination of $k$ genes to find the one that maximizes $W(M)$
• Thus, we require either an algorithm for finding an approximate solution, or a heuristic
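To see why trying every combination does not scale, the sketch below runs an exhaustive search on a hypothetical toy matrix and then counts the candidate sets for a larger gene count. The figure of 20,000 genes is an assumed order of magnitude, and `best_set_exhaustive` is an illustrative name.

```python
from itertools import combinations
from math import comb

# Toy mutation matrix (hypothetical data): rows are patients, columns are genes.
A = [
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]


def weight(A, M):
    """Twice the number of covered patients minus the total mutations in M."""
    covered = sum(1 for row in A if any(row[g] for g in M))
    total = sum(row[g] for row in A for g in M)
    return 2 * covered - total


def best_set_exhaustive(A, k):
    """Try every set of k genes and return the one with the largest weight."""
    n = len(A[0])
    return max(combinations(range(n), k), key=lambda M: weight(A, M))


best = best_set_exhaustive(A, 3)
print(best, weight(A, best))

# Number of candidate sets of size 3 among an assumed 20,000 genes.
print(comb(20000, 3))
```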
The Greedy Approach
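• Greedily add mutations to the current set of driver mutations as long as the objective function increases until $k$ genes are added:
1. $M \leftarrow \{g_1, g_2\}$, the pair of genes that maximizes $W(\{g_1, g_2\})$
2. for $i = 3, \dots, k$:
   1. $g^* \leftarrow \arg\max_{g} W(M \cup \{g\})$
   2. $M \leftarrow M \cup \{g^*\}$
3. return $M$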
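The greedy procedure can be sketched as follows, assuming it starts from the best-scoring pair of genes and then adds one gene at a time until the set has k genes. This sketch always adds k genes and does not stop early; it also assumes 2 <= k <= number of genes. The toy data and function names are hypothetical.

```python
from itertools import combinations

# Toy mutation matrix (hypothetical data): rows are patients, columns are genes.
A = [
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]


def weight(A, M):
    """Twice the number of covered patients minus the total mutations in M."""
    covered = sum(1 for row in A if any(row[g] for g in M))
    total = sum(row[g] for row in A for g in M)
    return 2 * covered - total


def greedy_driver_set(A, k):
    """Greedy heuristic: best pair first, then add the best single gene each round."""
    n = len(A[0])
    # Step 1: start from the pair of genes with the largest weight.
    M = list(max(combinations(range(n), 2), key=lambda pair: weight(A, pair)))
    # Step 2: add the gene that yields the largest weight until |M| = k.
    while len(M) < k:
        candidates = [g for g in range(n) if g not in M]
        best = max(candidates, key=lambda g: weight(A, M + [g]))
        M.append(best)
    # Step 3: return the selected genes.
    return M


print(greedy_driver_set(A, 3))
```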
Results – Greedy approach
• Even with this very naïve approach, we can make interesting guarantees on its accuracy under the gene independence model:
– Gene mutations are independent
– Driver genes have high coverage
– Each driver mutation contributes to the value $W(M)$
• Can prove that under this model, we would need 2,400 patients to maximize the objective function with probability $1 - 10^{-4}$
– This number of patients is not currently available
Better idea: MCMC
• Markov Chain Monte Carlo (MCMC) is a method for sampling from a complicated joint probability distribution
• Problem:
Given: A joint distribution $P(x_1, \dots, x_n)$
Goal: Generate a sample $(x_1, \dots, x_n) \sim P$
• Solution: Form a Markov chain such that its stationary distribution is the distribution of interest
Quick review: Markov chains
• A Markov chain is a simple model of a stochastic process. It consists of a set of states and probabilities for transitioning from state to state
• The stationary distribution gives the long-run probability of being in each state if we let the random process move from state to state for an infinite amount of time
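As a small illustration, the sketch below takes a hypothetical two-state chain and approximates its stationary distribution by repeatedly applying the transition probabilities to a starting distribution. The transition matrix `P` and the function names are assumptions chosen for this example.

```python
# Hypothetical two-state chain: P[i][j] is the probability of moving from state i to state j.
P = [
    [0.9, 0.1],
    [0.5, 0.5],
]


def step_distribution(dist, P):
    """Advance a probability distribution over states by one transition."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def stationary_distribution(P, num_steps=200):
    """Approximate the stationary distribution by iterating from state 0."""
    dist = [1.0] + [0.0] * (len(P) - 1)
    for _ in range(num_steps):
        dist = step_distribution(dist, P)
    return dist


pi = stationary_distribution(P)
print(pi)  # close to [5/6, 1/6]
```

Balancing the flow between the two states gives 0.1 * pi[0] = 0.5 * pi[1], which is where the 5/6 and 1/6 come from.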
The MCMC Approach
• Sample sets of genes $M$ in proportion to $e^{cW(M)}$ (for a constant $c > 0$), so that sets with higher weight are sampled more often
• We do so by forming a Markov chain such that each state in the Markov chain is associated with a set of $k$ genes
• Stochastically transition from state to state. The most frequently visited state is most likely to have the highest $W(M)$
The MCMC Approach
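• More specifically, given current state $M_t$ we obtain $M_{t+1}$ as follows:
1. Choose a gene $w$ uniformly at random from the global set of genes $G$
2. Choose gene $v$ uniformly at random from $M_t$
3. Let $P(M_t, w, v) = \min\left[1, e^{cW((M_t \setminus \{v\}) \cup \{w\}) - cW(M_t)}\right]$
4. With probability $P(M_t, w, v)$ set $M_{t+1} = (M_t \setminus \{v\}) \cup \{w\}$, otherwise $M_{t+1} = M_t$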
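One transition of this chain can be sketched as a Metropolis-style swap: propose replacing a gene in the current set with a gene drawn from all genes, and accept with a probability that depends on the change in weight. In this sketch the weight is taken to be twice the number of covered patients minus the total mutations in the set, and `c` is a positive constant. Treating a proposal that picks a gene already in the set as "stay put" is an assumption of this sketch; the toy data and names are hypothetical.

```python
import math
import random

# Toy mutation matrix (hypothetical data): rows are patients, columns are genes.
A = [
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]


def weight(A, M):
    """Twice the number of covered patients minus the total mutations in M."""
    covered = sum(1 for row in A if any(row[g] for g in M))
    total = sum(row[g] for row in A for g in M)
    return 2 * covered - total


def mcmc_step(A, M, c, rng):
    """Return the next state given the current gene set M (a frozenset)."""
    n = len(A[0])
    w = rng.randrange(n)        # gene drawn from all genes
    v = rng.choice(sorted(M))   # gene drawn from the current set
    if w in M:
        return M                # assumption: no move if w is already in the set
    proposal = (M - {v}) | {w}
    delta = c * (weight(A, proposal) - weight(A, M))
    accept = 1.0 if delta >= 0 else math.exp(delta)
    return proposal if rng.random() < accept else M


rng = random.Random(0)
state = frozenset({1, 2, 3})
state = mcmc_step(A, state, c=1.0, rng=rng)
print(sorted(state))
```

Branching on the sign of `delta` avoids calling `math.exp` on a large positive argument, which would otherwise overflow for big weight differences.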
The MCMC Approach
• With this definition of the transition matrix, the stationary distribution is $\pi(M) \propto e^{cW(M)}$ for a constant $c > 0$
• The authors prove that this Markov chain approaches its stationary distribution quickly
Results – Simulated data
• Generated 2 simulated datasets
– A dataset starting from a set of 6 genes
– A dataset consisting of 2 driver pathways and
• Controlled coverage and exclusivity
• Simulated passenger mutations using observed characteristics in Glioblastoma data
• Simulated both single-nucleotide mutations as well as copy-number aberrations (CNAs)
• Ran the MCMC algorithm for $10^7$ iterations and sampled every $10^4$ iterations on each dataset
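The sampling schedule amounts to running the chain for a fixed number of iterations and recording the current gene set at regular intervals; with the iteration counts quoted above that would be 1,000 recorded sets per dataset. The sketch below uses much smaller numbers on a hypothetical toy matrix. The swap move, the handling of already-selected genes, and all names are assumptions of this sketch.

```python
import math
import random
from collections import Counter

# Toy mutation matrix (hypothetical data): rows are patients, columns are genes.
A = [
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]


def weight(A, M):
    """Twice the number of covered patients minus the total mutations in M."""
    covered = sum(1 for row in A if any(row[g] for g in M))
    total = sum(row[g] for row in A for g in M)
    return 2 * covered - total


def mcmc_step(A, M, c, rng):
    """One Metropolis-style swap move on the gene set M."""
    w = rng.randrange(len(A[0]))
    v = rng.choice(sorted(M))
    if w in M:
        return M  # assumption: no move if w is already selected
    proposal = (M - {v}) | {w}
    delta = c * (weight(A, proposal) - weight(A, M))
    return proposal if delta >= 0 or rng.random() < math.exp(delta) else M


def sample_gene_sets(A, k, c, iterations, sample_every, seed=0):
    """Run the chain and count the gene sets seen at every sample_every-th iteration."""
    rng = random.Random(seed)
    state = frozenset(rng.sample(range(len(A[0])), k))
    counts = Counter()
    for t in range(1, iterations + 1):
        state = mcmc_step(A, state, c, rng)
        if t % sample_every == 0:
            counts[state] += 1
    return counts


counts = sample_gene_sets(A, k=3, c=1.0, iterations=20000, sample_every=100)
for gene_set, freq in counts.most_common():
    print(sorted(gene_set), freq)
```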
Results – real data
• Built mutation matrices from various cancer genome studies
• Searched for sets of size $k$
• Once a statistically significant set of mutations was found, the authors removed it from the matrix and re-ran the algorithm to find new sets
• Performed a statistical test. The test statistic was $W(M)$ and the null model was obtained by independently permuting the mutations for each mutation group among the patients
– This preserved the mutation frequency
– The reason for doing this is to assess the significance of the coverage and exclusivity given a fixed mutation frequency
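A permutation test of this kind can be sketched by shuffling each gene's column of the mutation matrix independently, which keeps every gene's mutation frequency fixed, and comparing the weight of the candidate set on the shuffled matrices with its observed weight. The weight here is twice the number of covered patients minus the total mutations in the set. The toy data, the function names, and the add-one correction in the p-value are assumptions of this sketch, not details taken from the paper.

```python
import random

# Toy mutation matrix (hypothetical data): rows are patients, columns are genes.
A = [
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]


def weight(A, M):
    """Twice the number of covered patients minus the total mutations in M."""
    covered = sum(1 for row in A if any(row[g] for g in M))
    total = sum(row[g] for row in A for g in M)
    return 2 * covered - total


def permute_columns(A, rng):
    """Shuffle each gene's mutations among patients, independently per gene."""
    m, n = len(A), len(A[0])
    columns = []
    for g in range(n):
        column = [A[i][g] for i in range(m)]
        rng.shuffle(column)
        columns.append(column)
    return [[columns[g][i] for g in range(n)] for i in range(m)]


def permutation_p_value(A, M, num_permutations, rng):
    """Fraction of permuted matrices on which M scores at least its observed weight."""
    observed = weight(A, M)
    hits = sum(
        weight(permute_columns(A, rng), M) >= observed
        for _ in range(num_permutations)
    )
    return (hits + 1) / (num_permutations + 1)


rng = random.Random(0)
p_value = permutation_p_value(A, [0, 1, 2], num_permutations=500, rng=rng)
print(p_value)
```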
Results – multiple cancer types
Results – Lung adenocarcinoma
Results – Glioblastoma multiforme
Discussion
• Is there an underlying network model?
• In contrast to nearly every other method that we have discussed in this class, this method does not utilize a biological network such as a protein-protein or protein-DNA interaction network
– Can we incorporate such a network into this method?
• Are coverage and exclusivity the best metrics for finding driver mutations?
• Does their objective function correctly capture coverage and exclusivity?
• What other methods could they have tried in order to solve their combinatorial optimization problem?
• How can this method be validated with biological experiments?