Transcript
Page 1: Bioinformatics and Graphical Models: Computation, approximation, and their value MSR: Nebojsa Jojic, Vladimir Jojic, Chris Meek, David Heckerman UW: Jim

Bioinformatics and Graphical Models: Computation, approximation, and their value

MSR: Nebojsa Jojic, Vladimir Jojic, Chris Meek, David Heckerman

UW: Jim Mullins, Mark Jensen, Jerry Learn

Page 2:

Overview

• Computational cost of usual algorithms
  – State of the art
  – Phylogeny + alignment
  – Phylogeny + sequence modeling
  – Approximations and their pitfalls
• Recombination
  – Analogy to other ML domains
  – Graphical model
  – Experiments and computational cost
• Value of the computation
  – Potential applications
  – Drug discovery cycle
  – Value of time and clinical success
  – Market size and growth
• Discussion

Page 3:

Rational vaccine design (Jim Mullins et al.)

• Rational design
  – Analysis of sequences to form a model of virus evolution (phylogenies, etc.)
  – Develop vaccines that target as much variability as possible
• Traditional design
  – Trial and error
  – Educated guesses

Page 4:

State of the art sequence analysis programs

• Example:
  – Rational AIDS vaccine design
  – Analysis of the envelope gene from a single patient in one visit
  – 200 sequences with 600 base pairs each
  – Overnight to align
  – 1-2 hours to 2-3 days to build a tree, depending on how much search you are willing to do
  – This does not include modeling the inter-sequence dependencies or coupling alignment and tree search, and it ignores recombination
• The total length of the HIV genome is about 10,000 base pairs, and the number of samples is practically limited only by cost

Page 5:

Computational cost of a slightly more detailed analysis

• Metropolis search over all trees on 400 sequences of the full genome (10k) would last around 2 years on one machine

• Exact search intractable!
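One way to see why exact search is intractable is to count the candidate trees. The sketch below assumes the search space is unrooted, bifurcating trees with labeled leaves, for which the number of topologies on n leaves is (2n-5)!!; the slide does not say which tree space is meant, and the function name is only illustrative.

```python
def num_unrooted_binary_trees(n_leaves: int) -> int:
    """Count unrooted binary tree topologies on n labeled leaves: (2n-5)!!.

    Assumption: bifurcating, unrooted trees with distinguishable leaves.
    """
    if n_leaves < 3:
        return 1  # one trivial topology for fewer than three leaves
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5.
    for k in range(3, 2 * n_leaves - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 200, 400):
        digits = len(str(num_unrooted_binary_trees(n)))
        print(f"{n} sequences: tree count has {digits} decimal digits")
```

Even before branch lengths or alignment are considered, the count grows faster than exponentially in the number of sequences, which is why stochastic search such as Metropolis sampling is used instead of enumeration.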

Page 6:

Approximation

• Free energy as a bound on negative log-likelihood

• Computation and approximation of the free energy:
  – Iterative conditional modes
  – Mean-field method
  – Structured variational techniques
  – (Loopy) belief propagation
  – Sampling techniques
• How tight is the bound?
• What does the looseness translate to?
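The bound can be checked numerically on a toy model. The sketch below is only an illustration: the joint probabilities are made-up numbers for a single observation x and a hidden variable h with three states, and free_energy is a hypothetical helper. It computes F(q) = sum_h q(h) [log q(h) - log p(x,h)], which is never below -log p(x); the gap equals KL(q || p(h|x)) and vanishes when q is the exact posterior.

```python
import math


def free_energy(q, joint):
    """Variational free energy F(q) = sum_h q(h) * (log q(h) - log p(x, h))."""
    return sum(
        qh * (math.log(qh) - math.log(ph))
        for qh, ph in zip(q, joint)
        if qh > 0  # terms with q(h) = 0 contribute nothing
    )


# Made-up joint probabilities p(x, h) for a fixed x and h in {0, 1, 2}.
joint = [0.10, 0.05, 0.01]
evidence = sum(joint)                      # p(x)
neg_log_lik = -math.log(evidence)          # -log p(x)
posterior = [p / evidence for p in joint]  # p(h | x)
uniform = [1 / 3] * 3                      # a deliberately poor q

gap_uniform = free_energy(uniform, joint) - neg_log_lik
gap_posterior = free_energy(posterior, joint) - neg_log_lik

if __name__ == "__main__":
    print("looseness with uniform q:  ", gap_uniform)
    print("looseness with posterior q:", gap_posterior)
```

The approximation methods listed above differ mainly in how they restrict or search over q, and therefore in how large this gap can remain.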

Page 7:

An example of the approximation issues

Page 8:

An example of the approximation issues

Page 9:

An example of the approximation issues: Tightness of the bounds

[Figure panels: Variational technique; Exact EM algorithm]

Page 10:

Recombination

• In HIV, the rate of recombination has recently been estimated to be ¼ of the rate of mutation!

• Combinatorial explosion in inference

Page 11:

Similar situations in other domains where graphical models work well

• Occlusion in video

• Source interaction in audio

• Composition of images

Page 12:

“Occlusion” in audio

[Figure: Speaker1 * M + Speaker2 * (1-M) = mixed signal; outputs: Retrieved Speaker1, Retrieved Speaker2]
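The masking picture on this slide can be sketched in a few lines. This is only an illustration under assumptions not stated on the slide: each list entry stands for the energy in one time-frequency bin, the mask M is binary and already known, and the function names are hypothetical. In the actual problem the mask is a hidden variable that has to be inferred.

```python
def mix(speaker1, speaker2, mask):
    """Combine two sources bin by bin: M * speaker1 + (1 - M) * speaker2."""
    return [m * a + (1 - m) * b for a, b, m in zip(speaker1, speaker2, mask)]


def separate(mixture, mask):
    """Split a mixture using the mask: M * mixture and (1 - M) * mixture."""
    retrieved1 = [m * x for x, m in zip(mixture, mask)]
    retrieved2 = [(1 - m) * x for x, m in zip(mixture, mask)]
    return retrieved1, retrieved2


# Toy data: each speaker is active in different bins (made-up numbers).
speaker1 = [5, 0, 3, 0]
speaker2 = [0, 4, 0, 2]
mask = [1, 0, 1, 0]  # 1 where speaker 1 occupies the bin

mixture = mix(speaker1, speaker2, mask)
retrieved1, retrieved2 = separate(mixture, mask)

if __name__ == "__main__":
    print("mixture:    ", mixture)
    print("retrieved 1:", retrieved1)
    print("retrieved 2:", retrieved2)
```

The analogy to recombination is that each observed position is explained by exactly one of several sources, and the assignment of positions to sources is what makes inference combinatorial.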

Page 13:

Epitome of an image

Input image

A set of image patches

Epitome

Page 14:

Layers from a single photograph

[Figure: graphical model with nodes e_m, e_s, s1, s2, M, x]

Page 15:

Modeling alignment and recombination by learning a library of gene patterns

[Graphical model over hidden variables s^j_{i-1}, s^j_i, s^j_{i+1} and c^j_{i-1}, c^j_i, c^j_{i+1}, and observed letters x^j_{i-1}, x^j_i, x^j_{i+1}]

• s – pattern position
• c = 1: copy letter (with possible mutation)
• c = 0: draw letter from a distribution unrelated to the patterns

Conditionals:

p(x^j_i | s^j_i = (1,2), c = 1) = f(x^j_i, r1(2)) = f(x^j_i, C)
p(x^j_i | s, c = 0) = g(x^j_i)

Example:

Patterns:

r1 = {ACTGTCAGT}
r2 = {ACGATC}

Observations and a hidden variable assignment:

s1 = {(1,1), (1,2), (1,3), (1,3), (1,3), (1,3), (1,4), (1,5), (1,6), (2,1), (2,2), (2,3)}
c1 = { 1 1 1 0 0 0 1 1 1 1 1 1 }
x1 = { A C T C A T G T A A C G }

s2 = {(2,1), (2,2), (2,3), (2,4), (2,5), (2,6), (1,4), (1,5), (1,6)}
c2 = { 1 1 1 1 1 1 1 1 1 }
x2 = { A C G A T C G T C }

Annotations: copy pattern 1, position 2 (letter C); insertion; mutation
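The example on this slide can be replayed in code. The patterns and the assignments (s, c, x) below are copied from the slide; everything else is an assumption made for illustration: f is taken to be "match with probability 1 - mutation_prob, otherwise a uniformly chosen other letter", g is taken to be uniform over the four nucleotides, and the function names are hypothetical. The slide does not specify f or g, and transition probabilities for s and c are left out.

```python
import math

# Pattern library from the slide (positions are 1-based).
PATTERNS = {1: "ACTGTCAGT", 2: "ACGATC"}


def log_likelihood(x, s, c, patterns=PATTERNS, mutation_prob=0.05, alphabet="ACGT"):
    """log p(x | s, c) under the assumed forms of f (copy) and g (insert)."""
    total = 0.0
    for letter, (pattern_id, pos), copy in zip(x, s, c):
        if copy == 1:
            template = patterns[pattern_id][pos - 1]
            if letter == template:
                p = 1 - mutation_prob                       # f: faithful copy
            else:
                p = mutation_prob / (len(alphabet) - 1)     # f: point mutation
        else:
            p = 1 / len(alphabet)                           # g: unrelated letter
        total += math.log(p)
    return total


def mutated_sites(x, s, c, patterns=PATTERNS):
    """1-based sites where a copied letter differs from its pattern letter."""
    return [
        i
        for i, (letter, (pattern_id, pos), copy) in enumerate(zip(x, s, c), start=1)
        if copy == 1 and patterns[pattern_id][pos - 1] != letter
    ]


# Hidden assignments and observations from the slide.
s1 = [(1, 1), (1, 2), (1, 3), (1, 3), (1, 3), (1, 3),
      (1, 4), (1, 5), (1, 6), (2, 1), (2, 2), (2, 3)]
c1 = [1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1]
x1 = "ACTCATGTAACG"

s2 = [(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (1, 4), (1, 5), (1, 6)]
c2 = [1, 1, 1, 1, 1, 1, 1, 1, 1]
x2 = "ACGATCGTC"

if __name__ == "__main__":
    print("x1 mutated sites:", mutated_sites(x1, s1, c1))  # [9]
    print("x2 mutated sites:", mutated_sites(x2, s2, c2))  # []
    print("log p(x1 | s1, c1):", log_likelihood(x1, s1, c1))
    print("log p(x2 | s2, c2):", log_likelihood(x2, s2, c2))
```

In x1, sites 4 to 6 are an insertion (c = 0), site 9 is a mutation of r1(6) = C into A, and the switch from pattern 1 to pattern 2 at site 10 is a recombination. In x2 the switch goes the other way, from pattern 2 to pattern 1 at site 7, with no mutations.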

Page 16:

Experimental results

Page 17:

Value of computation (from Tufts Center)

Page 18:

Growth

• Human viruses
  – West Nile
  – SARS
  – Hepatitis C
  – Polio
  – …
• Animal viruses
  – FIV
  – Pig, chicken and cow viruses
• Most bacterial diseases
• Parasitic diseases
• The first sign of success of rational design might trigger a great increase in the number of diseases tackled

Page 19:

How can MS/MSR be involved?

• MS: Architecture, platform, tools
  – Storage, transmission, computation
  – E.g., parallelizable computation on a single machine; peer-to-peer networks for parallel computation on multiple machines
• MSR:
  – Helping to speed up the scientific progress leading to the new opportunities for growth
  – Advising MS on the research direction in the community and the future requirements for the platform

