bioinformatics and graphical models: computation, approximation, and their value msr: nebojsa jojic,...
Posted on 15-Jan-2016
Bioinformatics and Graphical Models:
Computation, approximation, and their value
MSR: Nebojsa Jojic, Vladimir Jojic, Chris Meek, David Heckerman
UW: Jim Mullins, Mark Jensen, Jerry Learn
Overview
• Computational cost of usual algorithms
  – State of the art
  – Phylogeny + alignment
  – Phylogeny + sequence modeling
  – Approximations and their pitfalls
• Recombination
  – Analogy to other ML domains
  – Graphical model
  – Experiments and computational cost
• Value of the computation
  – Potential applications
  – Drug discovery cycle
  – Value of time and clinical success
  – Market size and growth
• Discussion
Rational vaccine design (Jim Mullins et al.)
• Rational design
  – Analysis of sequences to form a model of virus evolution (phylogenies, etc.)
  – Development of vaccines that target as much variability as possible
• Traditional design
  – Trial and error
  – Educated guesses
State-of-the-art sequence analysis programs
• Example:
  – Rational AIDS vaccine design
  – Analysis of the envelope gene from a single patient at one visit
  – 200 sequences of 600 base pairs each
  – Overnight to align
  – From 1-2 hours to 2-3 days to build a tree, depending on how much search you are willing to do
  – This does not include modeling the inter-sequence dependencies or coupling alignment with tree search, and it ignores recombination
• The full HIV genome is about 10,000 bases long, and the number of samples is limited in practice only by cost
Computational cost of a slightly more detailed analysis
• A Metropolis search over trees for 400 full-genome sequences (about 10,000 bases each) would take around 2 years on one machine
• Exact search is intractable!
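One way to see why exact search is intractable is to count candidate tree topologies. For n ≥ 3 labelled sequences, the number of distinct unrooted, fully bifurcating topologies is the standard double factorial (2n - 5)!! = 3 · 5 · 7 ⋯ (2n - 5). The Python sketch below computes it exactly with arbitrary-precision integers; the function name is ours, and the count ignores branch lengths, alignment and recombination, so it understates the real search space.

```python
def num_unrooted_topologies(n: int) -> int:
    """Count unrooted, fully bifurcating tree topologies on n labelled leaves: (2n - 5)!!."""
    if n < 3:
        raise ValueError("an unrooted binary tree needs at least 3 leaves")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (empty product, i.e. 1, when n == 3).
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (10, 200, 400):
        c = num_unrooted_topologies(n)
        # len(str(c)) - 1 is the exponent of the leading power of ten.
        print(f"n = {n}: about 10^{len(str(c)) - 1} topologies")
```

For n = 10 this already gives 2,027,025 topologies; for 200 or 400 sequences the count runs to hundreds of decimal digits, so enumeration is out of the question.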
Approximation
• Free energy as an upper bound on the negative log-likelihood
• Computation and approximation of the free energy:
  – Iterated conditional modes
  – Mean-field method
  – Structured variational techniques
  – (Loopy) belief propagation
  – Sampling techniques
• How tight is the bound?
• What does the looseness translate to?
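To make the bound concrete: for any distribution q(h) over the hidden variables h, the free energy F(q) = E_q[log q(h) - log p(x, h)] equals -log p(x) + KL(q(h) ‖ p(h | x)). It is therefore never below the negative log-likelihood, and its looseness is exactly the KL divergence from q to the true posterior. The Python sketch below runs the mean-field method on a toy model with two binary hidden variables; the table `JOINT` and all function names are hypothetical, chosen only so that the posterior is correlated and no factorized q can match it.

```python
import itertools
import math

# Hypothetical joint p(x, h1, h2) for one fixed observation x; two binary hidden
# variables. The numbers are illustrative only and make the posterior correlated.
JOINT = {(0, 0): 0.30, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.20}


def neg_log_likelihood(joint):
    """Exact -log p(x), obtained by summing the joint over all hidden states."""
    return -math.log(sum(joint.values()))


def free_energy(q1, q2, joint):
    """F(q) = E_q[log q(h) - log p(x, h)] for the factorized q(h) = q1(h1) q2(h2)."""
    total = 0.0
    for h1, h2 in itertools.product((0, 1), repeat=2):
        q = q1[h1] * q2[h2]
        if q > 0.0:  # 0 * log 0 is taken as 0
            total += q * (math.log(q) - math.log(joint[(h1, h2)]))
    return total


def softmax(scores):
    """Normalize exp(scores) into a probability vector (shifted for stability)."""
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]


def mean_field(joint, iters=50):
    """Coordinate descent on F: q1(h1) ∝ exp(E_q2[log p(x, h1, h2)]), then q2 likewise."""
    log_p = {h: math.log(v) for h, v in joint.items()}
    q1, q2 = [0.5, 0.5], [0.6, 0.4]  # arbitrary starting point
    for _ in range(iters):
        q1 = softmax([sum(q2[b] * log_p[(a, b)] for b in (0, 1)) for a in (0, 1)])
        q2 = softmax([sum(q1[a] * log_p[(a, b)] for a in (0, 1)) for b in (0, 1)])
    return q1, q2


if __name__ == "__main__":
    q1, q2 = mean_field(JOINT)
    exact = neg_log_likelihood(JOINT)
    bound = free_energy(q1, q2, JOINT)
    print(f"-log p(x) = {exact:.4f}")
    print(f"mean-field F(q) = {bound:.4f} (gap = KL(q || posterior) = {bound - exact:.4f})")
```

Because the toy posterior couples h1 and h2, the mean-field fixed point leaves a strictly positive gap; a structured q that keeps the dependence between h1 and h2 could represent the exact posterior here and close it.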
An example of the approximation issues
An example of the approximation issues: tightness of the bounds
[Figure panels: variational technique vs. exact EM algorithm]
Recombination
• In HIV, the rate of recombination has recently been estimated to be ¼ of the rate of mutation!
• Combinatorial explosion in inference
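A simplified illustration of the combinatorial explosion: suppose each sequence is a mosaic of K aligned patterns of equal length L, with a hidden pattern index at every position that switches (recombines) with probability rho, and a copy step that mutates a letter with probability eps. Summing over all hidden paths naively costs K^L terms, while the forward recursion of a hidden Markov chain gives the same likelihood in O(L·K²) operations. The Python sketch below checks both on a tiny case. The model, its parameters, the function names and the example patterns are all assumptions for illustration; unlike the gene-pattern model in these slides, it has no insertions and no jumps to a different position, and it requires K ≥ 2.

```python
import itertools


def emission(letter, pattern_letter, eps):
    """Copy with possible point mutation: match w.p. 1 - eps, else one of 3 other letters."""
    return 1.0 - eps if letter == pattern_letter else eps / 3.0


def transition(k_prev, k, num_patterns, rho):
    """Stay on the same pattern w.p. 1 - rho; recombine to each other pattern w.p. rho / (K - 1)."""
    return 1.0 - rho if k == k_prev else rho / (num_patterns - 1)


def likelihood_brute_force(x, patterns, rho, eps):
    """p(x) by explicit enumeration of all K**L mosaic paths (exponential in L)."""
    K, L = len(patterns), len(x)
    total = 0.0
    for path in itertools.product(range(K), repeat=L):
        p = emission(x[0], patterns[path[0]][0], eps) / K  # uniform start
        for i in range(1, L):
            p *= transition(path[i - 1], path[i], K, rho)
            p *= emission(x[i], patterns[path[i]][i], eps)
        total += p
    return total


def likelihood_forward(x, patterns, rho, eps):
    """The same p(x) via the forward recursion, O(L * K^2) operations."""
    K = len(patterns)
    alpha = [emission(x[0], patterns[k][0], eps) / K for k in range(K)]
    for i in range(1, len(x)):
        alpha = [
            sum(alpha[j] * transition(j, k, K, rho) for j in range(K))
            * emission(x[i], patterns[k][i], eps)
            for k in range(K)
        ]
    return sum(alpha)


if __name__ == "__main__":
    # Hypothetical aligned patterns and a sequence that switches from pattern 2 to pattern 1.
    patterns = ["ACTGTCAGT", "ACGATCGTC", "TCTGACAGA"]
    x = "ACGATCAGT"
    slow = likelihood_brute_force(x, patterns, rho=0.1, eps=0.05)
    fast = likelihood_forward(x, patterns, rho=0.1, eps=0.05)
    print(f"brute force ({len(patterns) ** len(x)} paths): {slow:.6e}")
    print(f"forward recursion: {fast:.6e}")
```

Both functions return the same number; only the forward version stays feasible as L grows, and richer hidden state (insertions, position jumps) multiplies K² accordingly.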
Similar situations in other domains where graphical models work well
• Occlusion in video
• Source interaction in audio
• Composition of images
“Occlusion” in audio
[Figure: Speaker1 and Speaker2 combined through a mask M, mixture = M * Speaker1 + (1 - M) * Speaker2; panels show the retrieved Speaker1 and retrieved Speaker2]
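In the audio example, each time-frequency bin of the mixture is taken from one speaker, with a binary mask M choosing which, so mixture = M · Speaker1 + (1 - M) · Speaker2 bin by bin. The Python sketch below shows that mixing step and a deliberately crude inference of M: each bin goes to the speaker whose predicted value is closer. The per-speaker predictions and all function names are hypothetical stand-ins; a real graphical model would infer M jointly with the sources rather than by this nearest-prediction rule.

```python
from typing import List, Optional

Grid = List[List[float]]  # [time][frequency] spectrogram values
Mask = List[List[int]]    # 1 = bin belongs to speaker 1, 0 = speaker 2


def mix(s1: Grid, s2: Grid, mask: Mask) -> Grid:
    """x = M * s1 + (1 - M) * s2 bin by bin; with a binary M each bin comes from one speaker."""
    return [[m * a + (1 - m) * b for a, b, m in zip(r1, r2, rm)]
            for r1, r2, rm in zip(s1, s2, mask)]


def infer_mask(x: Grid, pred1: Grid, pred2: Grid) -> Mask:
    """Hard inference of M: give each bin to the speaker whose prediction is closer."""
    return [[1 if abs(v - p1) <= abs(v - p2) else 0 for v, p1, p2 in zip(xr, r1, r2)]
            for xr, r1, r2 in zip(x, pred1, pred2)]


def retrieve(x: Grid, mask: Mask):
    """Split the mixture into the bins attributed to each speaker (None = occluded)."""
    sp1: List[List[Optional[float]]] = [
        [v if m == 1 else None for v, m in zip(xr, mr)] for xr, mr in zip(x, mask)]
    sp2: List[List[Optional[float]]] = [
        [v if m == 0 else None for v, m in zip(xr, mr)] for xr, mr in zip(x, mask)]
    return sp1, sp2


if __name__ == "__main__":
    s1 = [[5.0, 1.0], [4.0, 0.5]]  # toy "speaker 1" spectrogram
    s2 = [[1.0, 6.0], [0.5, 3.0]]  # toy "speaker 2" spectrogram
    true_mask = [[1, 0], [1, 0]]
    x = mix(s1, s2, true_mask)
    # Here the speaker predictions are the true spectrograms, the easiest possible case.
    mask = infer_mask(x, s1, s2)
    print("mixture:", x)
    print("inferred mask:", mask, "correct:", mask == true_mask)
    print("retrieved:", retrieve(x, mask))
```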
Epitome of an image
[Figure: input image, a set of image patches, and the resulting epitome]
Layers from a single photograph
[Figure: layer model of the photograph, with panels labelled e_m, e_s, s1, s2, M and the observed image x]
Modeling alignment and recombination by learning a library of gene patterns
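[Figure: graphical model for sequence j at positions i-1, i, i+1, with hidden pattern positions $s_j^{i-1}, s_j^{i}, s_j^{i+1}$, copy indicators $c_j^{i-1}, c_j^{i}, c_j^{i+1}$, and observed letters $x_j^{i-1}, x_j^{i}, x_j^{i+1}$]
• s – pattern position, written as (pattern, position within the pattern)
• c = 1: copy the letter (with possible mutation)
• c = 0: draw the letter from a distribution unrelated to the patterns
Conditionals:
$p(x_j^i \mid s_j^i = (1,2),\, c_j^i = 1) = f(x_j^i, r_1(2)) = f(x_j^i, \mathrm{C})$
$p(x_j^i \mid s_j^i,\, c_j^i = 0) = g(x_j^i)$
Example:
Patterns:
r1 = {ACTGTCAGT}, r2 = {ACGATC}
Observations and a hidden variable assignment:
s1 = {(1,1), (1,2), (1,3), (1,3), (1,3), (1,3), (1,4), (1,5), (1,6), (2,1), (2,2), (2,3)}
c1 = { 1 1 1 0 0 0 1 1 1 1 1 1 }
x1 = { A C T C A T G T A A C G }
s2 = {(2,1), (2,2), (2,3), (2,4), (2,5), (2,6), (1,4), (1,5), (1,6)}
c2 = { 1 1 1 1 1 1 1 1 1 }
x2 = { A C G A T C G T C }
Annotations: s = (1,2) means "copy pattern 1, position 2 (letter C)"; the run of c = 0 in c1 is an insertion mutation.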
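The conditionals leave f and g unspecified. As a sketch, assume f keeps the pattern letter with probability 1 - eps and otherwise emits one of the other three letters uniformly, and g is uniform over A, C, G, T; both choices and eps = 0.1 are our assumptions, not the authors'. The Python code below scores the example's observed sequences under their given hidden assignments, p(x | s, c) = ∏_i [f(x_i, r_k(p)) if c_i = 1, else g(x_i)] with s_i = (k, p).

```python
import math

# Patterns and hidden assignments copied from the example; positions are 1-based as on the slide.
PATTERNS = {1: "ACTGTCAGT", 2: "ACGATC"}

S1 = [(1, 1), (1, 2), (1, 3), (1, 3), (1, 3), (1, 3),
      (1, 4), (1, 5), (1, 6), (2, 1), (2, 2), (2, 3)]
C1 = [1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1]
X1 = "ACTCATGTAACG"

S2 = [(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (1, 4), (1, 5), (1, 6)]
C2 = [1] * 9
X2 = "ACGATCGTC"


def f(letter, pattern_letter, eps=0.1):
    """Assumed copy model: keep the pattern letter w.p. 1 - eps, else one of the other 3 uniformly."""
    return 1.0 - eps if letter == pattern_letter else eps / 3.0


def g(letter):
    """Assumed background model for c = 0: uniform over A, C, G, T (letter unused)."""
    return 0.25


def log_likelihood(x, s, c, patterns, eps=0.1):
    """log p(x | s, c): copy from pattern k at position p when c = 1, else draw from g."""
    total = 0.0
    for letter, (k, p), copy in zip(x, s, c):
        if copy == 1:
            total += math.log(f(letter, patterns[k][p - 1], eps))  # r_k(p), 1-based
        else:
            total += math.log(g(letter))
    return total


if __name__ == "__main__":
    print("log p(x1 | s1, c1) =", round(log_likelihood(X1, S1, C1, PATTERNS), 4))
    print("log p(x2 | s2, c2) =", round(log_likelihood(X2, S2, C2, PATTERNS), 4))
```

Under these assumptions x1 consists of 8 matched copies, 1 point mutation (position 9 copies the C at (1,6) but shows A) and 3 background letters from the insertion, while x2 is 9 exact copies. Full inference would also have to sum or maximize over the hidden s and c, which is where the computational cost comes from.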
Experimental results
Value of computation (from Tufts Center)
Growth
• Human viruses
  – West Nile
  – SARS
  – Hepatitis C
  – Polio
  – …
• Animal viruses
  – FIV
  – Pig, chicken and cow viruses
• Most bacterial diseases
• Parasitic diseases
• The first sign of success of rational design might trigger a great increase in the number of diseases tackled
How can MS/MSR be involved?
• MS: Architecture, platform, tools
  – Storage, transmission, computation
  – E.g., parallelizable computation on a single machine; peer-to-peer networks for parallel computation on multiple machines
• MSR:
  – Helping to speed up the scientific progress leading to new opportunities for growth
  – Advising MS on the research direction in the community and the future requirements for the platform