
Page 1

Qualifying exam presentation

Badri Adhikari, Student ID: 14155100
Department of Computer Science, University of Missouri, Columbia, MO 65211, USA

10/13/2014

1. Marks, Debora S., Thomas A. Hopf, and Chris Sander. "Protein structure prediction from sequence variation." Nature Biotechnology 30.11 (2012): 1072-1080.

2. My research projects.

3. Wu, Sitao, and Yang Zhang. "A comprehensive assessment of sequence-based and template-based methods for protein contact prediction." Bioinformatics 24.7 (2008): 924-931.

4. Hinton, Geoffrey, Simon Osindero, and Yee-Whye Teh. "A fast learning algorithm for deep belief nets." Neural Computation 18.7 (2006): 1527-1554.

Page 2

"Protein structure prediction from sequence variation."

Page 3

Abstract

• Evolutionary information about functional constraints in genomic sequences can be mined to detect evolutionary couplings between residues in proteins.

• Evolutionary couplings -> protein 3D structures

• Improved understanding of covariation may help identify functional residues involved in ligand binding, protein-complex formation and conformational changes.

• Computation of covariation patterns can complement experimental structural biology in elucidating the full spectrum of protein structures, their functional interactions and evolutionary dynamics.

Page 4

Covariation

Page 5

Covariation

• Several groups have demonstrated that extracting covariation information from sequences is sufficient to:

• Estimate which pairs of residues are close in three-dimensional space

• Fold a protein to reasonable accuracy

• These pairs of covarying residues should also be predictive of functional sites, protein interactions, and alternative conformations.

• To find true evolutionary covariation between residues, one must minimize the effect of transitive correlations.

Page 6

The problem of transitive correlations

• Transitive correlations are the false positive correlations observed. Example: two residues that contact the same third residue but do not actually contact each other.

• If residues A and B contact each other, as do B and C, then in general a transitive influence is observed between residues A and C (a chaining effect).

• Local statistical methods assume that pairs of residue positions are statistically independent of other pairs of residues, and so cannot remove these effects.

Page 7

Transitive correlations removed by global statistical approaches

• Steps for "entropy maximization under data constraints":

1. Create a multiple sequence alignment (MSA) between many members of an evolutionarily related protein family.

2. Calculate the covariance matrix (observed minus expected pair counts) of dimension 20L x 20L, by counting how often a given pair of the 20 amino acids, say alanine and lysine, occurs at a particular pair of positions, say positions 15 and 67, in any one sequence, summing over all sequences in the MSA.

3. To compute a measure of causative correlations, the conditional mutual information, take the inverse of the covariance matrix. This is the numerical estimate of direct pair interactions.
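To make the three steps concrete, here is a minimal sketch of the covariance-inversion idea on a toy alignment. It omits sequence reweighting and uses a simple ridge term so the matrix is invertible; the block-norm scoring is an illustrative convention, not the paper's exact procedure.

```python
# Hedged sketch: MSA -> covariance matrix -> inverse -> coupling scores.
import numpy as np

def coupling_scores(msa, q=20):
    """msa: (n_sequences, L) array of amino-acid indices in 0..q-1."""
    n, L = msa.shape
    onehot = np.eye(q)[msa].reshape(n, L * q)   # one-hot encode each column
    f1 = onehot.mean(axis=0)                    # single-site frequencies
    f2 = onehot.T @ onehot / n                  # pair frequencies
    C = f2 - np.outer(f1, f1)                   # observed minus expected pair counts
    C += 0.1 * np.eye(L * q)                    # ridge term (assumption) for invertibility
    J = -np.linalg.inv(C)                       # inverse covariance ~ direct couplings
    scores = np.zeros((L, L))                   # summarize each 20x20 block by its norm
    for i in range(L):
        for j in range(i + 1, L):
            block = J[i*q:(i+1)*q, j*q:(j+1)*q]
            scores[i, j] = scores[j, i] = np.linalg.norm(block)
    return scores
```

High-scoring position pairs are the candidate evolutionary couplings; real methods add sequence reweighting and stronger regularization before this inversion.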

Page 8

Transitive correlations removed by global statistical approaches

Page 9

Transitive correlations removed by global statistical approaches

• Using a similar approach, Lapedes and Jarzynski reached a first breakthrough in contact prediction in 2002, for 11 small proteins, but they did not compute three-dimensional structures.

• After removal of transitive correlations and other confounding effects, predicted contacts based on the global probability models provide a basis for the computation of three-dimensional folds.

Page 10

From contact predictions to protein folding

• To what extent does improved contact prediction lead to improved de novo prediction of 3D structures?

• A folding protocol, EVfold, was developed.

• Predicted residue contacts from coevolution patterns are translated into detailed atomic coordinates by placing distance restraints on an extended polypeptide.

Page 11

EVfold

• A 3D structure is calculated by constraining the distances between pairs of residues with high covariance scores, using a standard distance geometry algorithm of the kind used to solve 3D structures from experimental constraints derived from NMR spectroscopy data.

• This is followed by simulated annealing with molecular dynamics to ensure correct bond lengths and side-chain conformations.
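As a rough illustration of this step, the sketch below converts top-ranked coupling scores into NMR-style distance restraints; the bounds, the minimum sequence separation, and the (i, j, lower, upper) tuple format are assumptions for illustration, not EVfold's actual restraint format.

```python
# Hedged sketch: predicted contacts -> Calpha-Calpha distance restraints.
def contacts_to_restraints(scores, top_n, min_separation=5,
                           lower=2.0, upper=8.0):
    """scores: (L, L) symmetric coupling-score array; returns restraint tuples."""
    L = scores.shape[0]
    pairs = [(scores[i, j], i, j)
             for i in range(L) for j in range(i + min_separation, L)]
    pairs.sort(reverse=True)                     # strongest couplings first
    # Restrain each selected pair's Calpha-Calpha distance to the contact range.
    return [(i, j, lower, upper) for _, i, j in pairs[:top_n]]
```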

Page 12
Page 13
Page 14

EVfold evaluation

• All-atom 3D structure coordinates were predicted from sequence alone for 15 diverse globular folds of up to 220 amino acids.

• Overall accuracy reached 2.8-5.1 Å Cα r.m.s. deviation relative to experimentally determined structures.

• The accuracy of the atomic coordinates was reported to be best around active sites (down to around 1 Å all-atom deviation over 5-10 residues).

Page 15

Quality of predicted folds is likely to improve over time

• More sequence information tends to lead to higher accuracy of the distance constraints.

• The currently limited atomic accuracy is likely to improve with advanced molecular dynamics refinement methods (for example, CNS, Rosetta, etc.).

Page 16

Structure prediction of membrane proteins

• The structures of membrane proteins are notoriously difficult to determine by crystallography or NMR spectroscopy.

• Another group has tested the ability to predict 3D structures on 25 membrane proteins with up to 487 residues.

• Two notable classes of proteins in the data set are G protein-coupled receptors and membrane transporters. The accuracy ranges from 2.6 Å to 4.8 Å, which is notable.

Page 17

EVfold uses the least structural information

• Other global statistical modeling approaches use fragment-based prediction methods.

• The Jones group's FILM3 predicted the structures of 32 known membrane proteins.

• The Onuchic group's DCAfold predicted structures of 15 bacterial protein domains of up to 133 residues; this is comparable to EVfold.

• EVfold uses the least existing structural information and therefore shows the potential for prediction of unknown folds.

Page 18

Applications of improved structure-prediction methods

Page 19

Limitations

• Many of the predicted contacts involved in protein features (functional sites, homomultimer contacts, alternative conformations) may appear as false positives in the prediction of intradomain residue contacts.

• A challenge for the field will be to develop algorithms that can disambiguate the different functional constraints.

• Detection of evolutionary couplings between residues requires a substantially diverse set of sequences, which is not yet available for many families.

• EVfold needs about 5L sequences in the multiple alignment.

• This may be addressed over time as sequence databases grow.

Page 20

Combining experimental and computational structural biology

• Protein-structure determination by NMR spectroscopy is ideally suited for a hybrid approach, as it is based on the determination of distance constraints.

• Combining reduced X-ray and NMR spectroscopy data sets with predicted three-dimensional models may open a new phase for structural biology, with much more rapid determination of high-accuracy protein structures.

• Using massive sequence data sets, successful decoding of the molecular record of evolutionary constraints could now reveal structural and functional information about proteins at an unprecedented rate.

Page 21

1,250 alpha-helical transmembrane protein families known in mid-2012

Page 22

My Research

“Contact assisted protein structure modeling.”

Page 23

My projects

• Reconstruction of “hard” protein targets using residue contacts. Not much luck.

• Building 3D models using SVMcon-, NNcon-, and DNcon-predicted contacts. Not much luck.

• Fragment assembly based model building. Not much luck.

• Combining various types of contacts. Not much luck.

• Contact filtering to improve model building. Not much luck.

• Beta sheet construction using DGSA protocol. Good news here.

• Folding proteins that can be folded with contacts. Working on a publication.

Page 24
Page 25

Contact-assisted fragment replacement based protein structure prediction

This slide's flowchart describes a Monte Carlo simulated annealing loop that starts from an extended structure and, on each iteration:

1. Randomly picks a fragment (length 9).
2. Predicts features (secondary structure and solvent accessibility) for this fragment.
3. Looks up the fragment database to find the best-matching fragment.
4. Converts the structure into angular space and replaces the fragment.
5. Converts the structure back to Cartesian coordinates.
6. Computes the energy of the new structure (using a contact satisfaction score).

The loop terminates with the final structure.
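A minimal sketch of this loop follows. The fragment move (steps 1-5) is abstracted into a caller-supplied propose_fragment_move function, and the energy is simplified to counting unsatisfied predicted contacts; the names and cooling schedule are illustrative assumptions, not the pipeline's actual settings.

```python
# Hedged sketch of contact-assisted Monte Carlo simulated annealing.
import math, random

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contact_energy(coords, contacts, cutoff=8.0):
    """Energy = number of predicted contacts not satisfied (lower is better)."""
    return sum(1 for i, j in contacts if distance(coords[i], coords[j]) > cutoff)

def mc_sa_fold(coords, contacts, propose_fragment_move,
               n_steps=10000, t_start=2.0, t_end=0.1):
    """propose_fragment_move(coords) -> new coordinates with one 9-residue
    fragment replaced (steps 1-5 above); assumed to be supplied by the caller."""
    energy = contact_energy(coords, contacts)
    for step in range(n_steps):
        t = t_start * (t_end / t_start) ** (step / n_steps)  # geometric cooling
        candidate = propose_fragment_move(coords)
        e_new = contact_energy(candidate, contacts)
        # Metropolis criterion: always accept improvements, sometimes accept worse.
        if e_new <= energy or random.random() < math.exp((energy - e_new) / t):
            coords, energy = candidate, e_new
    return coords
```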

Page 26

Protein folding process for target T0716, demonstrated through a movie: http://www.youtube.com/watch?v=HBONCqN9U4k

Page 27

Current pipeline

Page 28

My pipeline versus EVfold

Page 29

Example: RASH_HUMAN

Page 30

"A comprehensive assessment of sequence-based and template-based methods for

protein contact prediction."

Page 31

Abstract

Method     | Easy / Medium targets | Hard targets
-----------|-----------------------|-------------
SVM-LOMETS | Outperforms SVM-SEQ   |
SVM-SEQ    |                       | Outperforms SVM-LOMETS by 12-25%
Combined   |                       | Contact prediction accuracy improves by 60%

SVM-LOMETS collects consensus contact predictions from multiple threading templates.

SVM-SEQ is a sequence-based machine learning approach trained on a variety of sequence-derived features.

Page 32

Dataset and contact definition

• 500 non-homologous proteins from PDBSELECT, with pair-wise sequence identity < 25% and sizes ranging from 50 to 559 residues.

• Proteins with broken chains, missing entities, or format errors were removed.

• 22k/27k/28k contacts and 87k/107k/112k non-contacts in the short/medium/long ranges, respectively.

• A pair of residues is in contact if their Cα atom distance is < 8 Å.

• Sequence separation: 6-11 -> short range, 12-24 -> medium range, and > 24 -> long range.

• Acc = N_correct / N_predicted

• Pct = N_predicted / L
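A short sketch of these definitions, assuming ca is an (L, 3) numpy array of Cα coordinates and predicted is a list of (i, j) pairs; the names are illustrative.

```python
# Hedged sketch of the contact definition and the Acc/Pct measures.
import numpy as np

def contact_map(ca, cutoff=8.0):
    """True where the Calpha-Calpha distance is below the 8 A cutoff."""
    d = np.linalg.norm(ca[:, None, :] - ca[None, :, :], axis=-1)
    return d < cutoff

def separation_range(i, j):
    s = abs(i - j)
    if 6 <= s <= 11:
        return "short"
    if 12 <= s <= 24:
        return "medium"
    return "long" if s > 24 else None

def acc_and_pct(predicted, true_map, L):
    correct = sum(bool(true_map[i, j]) for i, j in predicted)
    return correct / len(predicted), len(predicted) / L  # Acc, Pct
```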

Page 33

SVM-SEQ

• Residue pairs in the training structures are categorized as 'contacted' or 'non-contacted'.

• In principle, the training data should cover as many residue pairs as possible, but including too many pairs requires a long training CPU time.

• The number of non-contacted pairs is much larger than that of contacted pairs (> 20:1).

• By trial and error, the ratio of non-contacted to contacted residue pairs is kept at 4:1 by randomly selecting residue pairs.

Page 34

SVM-SEQ Local window features

1. Position-specific scoring matrices (PSSM)
• Generated by a PSI-BLAST search of the query against a non-redundant sequence database.

2. Secondary structure predictions
• Predicted by PSIPRED.
• 3 states: alpha-helix -> [0 1 0], beta-strand -> [0 0 1], and coil -> [1 0 0].

3. Solvent accessibility predictions
• Predicted by neural network training.
• Buried -> [0 1], and exposed -> [1 0].

• With a 15-residue window, the total number of local features for a pair is 750 (= 2 * 15 * (20 + 3 + 2)), as sketched below.
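The 750-value arithmetic can be made concrete with a short sketch; the array names (pssm, ss, sa) and zero-padding at the chain ends are assumptions for illustration.

```python
# Hedged sketch: 15-residue windows of PSSM (20) + SS (3) + SA (2) features
# around residues i and j give 2 * 15 * 25 = 750 values per pair.
import numpy as np

def pair_window_features(pssm, ss, sa, i, j, w=15):
    """pssm: (L, 20); ss: (L, 3) one-hot; sa: (L, 2) one-hot."""
    per_res = np.hstack([pssm, ss, sa])               # (L, 25) per-residue features
    half = w // 2
    padded = np.pad(per_res, ((half, half), (0, 0)))  # zero-pad chain ends
    win_i = padded[i:i + w].ravel()                   # 15 * 25 = 375 values
    win_j = padded[j:j + w].ravel()
    return np.concatenate([win_i, win_j])             # 750 values total
```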

Page 35

SVM-SEQ In-between segment feature sets

1. The number of residues between i and j, i.e. |i - j|.

2. The compositional percentage of the three secondary structure elements and the two burial states among the in-between residues.

3. State distributions of the in-between residues, specified by the four moments F_n = ⟨(k − ⟨k⟩)^n⟩, n = 1, 2, 3, 4, where k (= m − i) is the position of the m-th residue relative to i along the chain; each moment is calculated for five specific states: helix, strand, coil, buried, and exposed (see the sketch after this list).

4. The local features of five selected in-between residues that are evenly distributed between i and j.

• The SVM software developed by Joachims (2002) is used to classify the contacted and non-contacted residue pairs.

• The accuracy of a neural network was found to be 30% lower.
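Here is a sketch of the moment feature in item 3. Note that the first central moment is zero by definition, so the n = 1 case presumably refers to the raw mean ⟨k⟩; the state_mask boolean array is an assumed input.

```python
# Hedged sketch of F_n = <(k - <k>)^n> for the in-between residues in one state.
import numpy as np

def state_moments(i, j, state_mask):
    """state_mask[m] is True if residue m is in the given state (e.g. helix)."""
    ks = np.array([m - i for m in range(i + 1, j) if state_mask[m]], dtype=float)
    if ks.size == 0:
        return np.zeros(4)
    centered = ks - ks.mean()
    # n = 1..4 central moments; a real implementation may use <k> itself for n = 1.
    return np.array([np.mean(centered ** n) for n in (1, 2, 3, 4)])
```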

Page 36

LOMETS

• LOMETS is a local meta-threading server that includes nine locally installed threading programs.

• For each target, LOMETS first threads the sequence through the PDB library to identify possible templates.

• The threading programs of LOMETS represent a diverse set of state-of-the-art algorithms using different approaches: sequence profile alignments, structural profile alignments, pair-wise potentials, and hidden Markov models.

• A consensus combination of the meta-server algorithms significantly outperforms the individual threading methods.

Page 37

SVM-LOMETS

• One defect of the LOMETS prediction is the coarse-grained distance cutoff (e.g. distances of 7.9 Å and 8.1 Å result in different contact/non-contact assignments despite the tiny difference).

• In SVM-LOMETS, an SVM algorithm is used to train on the distance cutoff parameters and alignment qualities of the contact map.

• For each pair of residues (i and j), the training features are prepared.

Page 38

SVM-LOMETS Features

1. The frequency of the contact occurring in the top N templates (N = 10, 20, ..., 90).

2. The average and standard deviation of the Cα distance (d_ij) calculated from the templates that have d_ij < 12 Å.

3. The number of continuously aligned residues within a 5-residue window.

4. The burial depth of the residues: the distance from the Cα atoms of i and j to the centroid of the template structure, divided by the radius of gyration (sketched below).

5. The average of the normalized Z-scores.

6. The predicted TM-score of the templates.

These feature sets feed into separate SVM trainings.
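As an illustration of feature 4, the sketch below computes the burial depth from one template's Cα coordinates; the variable names are assumptions.

```python
# Hedged sketch: burial depth = |Calpha_i - centroid| / radius of gyration.
import numpy as np

def burial_depth(ca, i):
    """ca: (L, 3) Calpha coordinates of one threading template."""
    centroid = ca.mean(axis=0)
    rg = np.sqrt(np.mean(np.sum((ca - centroid) ** 2, axis=1)))  # radius of gyration
    return np.linalg.norm(ca[i] - centroid) / rg
```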

Page 39

SVM-LOMETS training

• Nine different training datasets with 10, 20, …, 90 top templates.

• Targets are split into 3 categories (Easy, Medium, and Hard) based on program-specific Z-scores.

• 27 SVM classifiers are trained.

• The number of templates with Z-score > 0.55 * Z0, compared with N, is used to determine which SVM classifier is finally used to generate the contact predictions.

Page 40

Results: SVM-SEQ versus other machine-learning methods

• SVM-SEQ shows some modest advantage (if any) in comparison with current machine learning methods.

Page 41

Results: SVM-LOMETS versus LOMETS

• An improvement of 5.2%.

Page 42

Results: Template-based versus sequence-based methods

Page 43

Results: Template-based versus sequence-based methods

Page 44

Results: dependence of SVM-SEQ on target categories

• Unexpectedly, the results show a dependence of SVM-SEQ on the target categories (Easy, Medium, Hard), even though SVM-SEQ does not exploit template information.

• The larger number of homologous sequences for easier targets helps construct a better PSSM, on which SVM-SEQ has mainly been trained.

• This explains the different performance of SVM-SEQ in the different categories.

Page 45

Results: combining SVM-LOMETS and SVM-SEQ

• Count the portion of contacts correctly predicted by SVM-SEQ that are not predicted by SVM-LOMETS.

• For hard/very hard targets, a combination of SVM-SEQ with SVM-LOMETS can enlarge the total number of correct contact predictions by 62% compared with using SVM-LOMETS alone.

Page 46

Results: New fold targets in CASP7

• Examine the sequence-based and template-based methods on the 15 new fold (NF) targets as categorized in the CASP7 experiment.

• There is no similar structure solved in the PDB library for these targets.

• Compare with SAM-T06-server, the best server predictor in the CASP7 experiment.

Page 47

Results: New fold targets in CASP7

Page 48

Conclusions

• The accuracy of SVM-SEQ is comparable to the top published sequence-based machine learning methods.

• SVM-LOMETS (trained additionally using contact frequencies, Cα distances, and template qualities) generates slightly better contact predictions (by 5.2%) than the original LOMETS method.

• The overall accuracy of template-based contact prediction is much higher than that of sequence-based contact prediction.

• For Hard targets, SVM-SEQ generates contact predictions with an accuracy comparable to or better than the template-based predictions.

• Incorporating the SVM-SEQ contact predictions in the I-TASSER simulation results in about a 5% TM-score increase for the first models of the Hard targets.

• For the new fold targets in CASP7, the accuracy of threading template-based contact prediction is close to random, while SVM-SEQ generates contact predictions with about 20% of them being correct.

Page 49

"A fast learning algorithm for

deep belief nets."

Page 50

Abstract

• Use of “complementary priors” to eliminate explaining away effects that make inference difficult in densely connected belief nets that have many hidden layers.

• Derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time.

• After fine-tuning, a network of three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels.

• The generative model gives better digit classification than the best discriminative learning algorithms.

Page 51

Belief Networks

Source: http://www.cs.toronto.edu/~hinton/nipstutorial/nipstut3.pdf

Page 52

Stochastic binary units

Source: http://www.cs.toronto.edu/~hinton/nipstutorial/nipstut3.pdf

Page 53

Restricted Boltzmann Machines

Source: http://www.cs.toronto.edu/~hinton/nipstutorial/nipstut3.pdf

Page 54

Contrastive Divergence Algorithm

Source: http://www.cs.toronto.edu/~hinton/nipstutorial/nipstut3.pdf

Source: http://cl.naist.jp/~kevinduh/a/deep2014/140116-ResearchSeminar.pdf

Page 55

Gibbs sampling

• Gibbs sampling is a Markov chain Monte Carlo algorithm for obtaining a sequence of observations approximated from a specified multivariate probability distribution (i.e., from the joint probability distribution of two or more random variables) when direct sampling is difficult.
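For intuition, here is a minimal Gibbs sampler for a standard bivariate normal with correlation rho, where each conditional is N(rho * other, 1 - rho^2); this toy example is illustrative and not from the paper.

```python
# Hedged sketch: alternately sample each variable from its conditional.
import random

def gibbs_bivariate_normal(rho, n_samples, burn_in=500):
    x = y = 0.0
    sd = (1.0 - rho ** 2) ** 0.5          # conditional standard deviation
    samples = []
    for t in range(n_samples + burn_in):
        x = random.gauss(rho * y, sd)     # x | y ~ N(rho * y, 1 - rho^2)
        y = random.gauss(rho * x, sd)     # y | x ~ N(rho * x, 1 - rho^2)
        if t >= burn_in:                  # discard pre-convergence samples
            samples.append((x, y))
    return samples
```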

Page 56

Definition of General Complementarity

• Consider a joint distribution over observables x and hidden variables y.

• For a given likelihood function, P(x|y), we define the corresponding family of complementary priors to be those distributions, P(y), for which the joint distribution P(x, y) = P(x|y)P(y) leads to posteriors, P(y|x), that exactly factorize, that is, leads to a posterior that can be expressed as P(y|x) = ∏_j P(y_j|x).

Page 57

Logistic belief net

• A logistic belief net is composed of stochastic binary units.

• When the net is used to generate data, the probability of turning on unit i is a logistic function of the states of its immediate ancestors, j, and of the weights, w_ij, on the directed connections from the ancestors:

p(s_i = 1) = 1 / (1 + exp(-b_i - Σ_j s_j w_ij))

where b_i is the bias of unit i.
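A one-function sketch of this generative rule; the list-based representation of ancestor states and weights is an illustrative assumption.

```python
# Hedged sketch: sample a stochastic binary unit from its logistic probability.
import math, random

def sample_unit(ancestor_states, weights, bias):
    """p(s_i = 1) = 1 / (1 + exp(-bias - sum_j s_j * w_ij))."""
    activation = bias + sum(s * w for s, w in zip(ancestor_states, weights))
    p_on = 1.0 / (1.0 + math.exp(-activation))
    return 1 if random.random() < p_on else 0
```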

Page 58

The phenomenon of explaining away

Even if two hidden causes are independent, they can become dependent when we observe an effect that they can both influence.

If we learn that there was an earthquake, this reduces the probability that the house jumped because of a truck.

Posterior:
p(1,1) = 0.0001
p(1,0) = 0.4999
p(0,1) = 0.4999
p(0,0) = 0.0001

Page 59

Complementary Priors

• The phenomenon of explaining away makes inference difficult in directed belief nets.

• Explaining away in the first hidden layer is eliminated by using extra hidden layers to create a "complementary" prior that has exactly the opposite correlations to those in the likelihood term.

• Then, when the likelihood term is multiplied by the prior, we get a posterior that is exactly factorial.

• The use of tied weights to construct complementary priors is like a trick for making directed models equivalent to undirected ones.

Page 60
Page 61

An Infinite Directed Model with Tied Weights

• We can generate data from an infinite directed net like any other directed acyclic belief net.

• We can sample from the true posterior distribution over all of the hidden layers by starting with a data vector on the visible units and then using the transposed weight matrices to infer the factorial distributions over each hidden layer in turn.

• At each hidden layer, we sample from the factorial posterior before computing the factorial posterior for the layer above.

Page 62

An Infinite Directed Model with Tied Weights

• Computing the derivative for a generative weight w_ij^00, from unit j in layer H0 to unit i in layer V0: in a logistic belief net, the maximum likelihood learning rule for a single data vector, v^0, is

∂ log p(v^0) / ∂w_ij^00 = ⟨h_j^0 (v_i^0 − v̂_i^0)⟩

where ⟨·⟩ denotes an average over the sampled states and v̂_i^0 is the probability that unit i would be turned on if the visible vector was stochastically reconstructed from the sampled hidden states.

Page 63

Restricted Boltzmann Machines and Contrastive Divergence

• The infinite directed net (Fig. 3) is equivalent to a restricted Boltzmann machine (RBM).

• To generate data from an RBM, we start with a random state in one of the layers and then perform alternating Gibbs sampling.

• This is the same process as generating data from the infinite belief net with tied weights.

Page 64

Restricted Boltzmann Machines and Contrastive Divergence

• To perform maximum likelihood learning in an RBM, we can use the difference between two correlations. For each weight w_ij, between visible unit i and hidden unit j, we measure the correlation ⟨v_i^0 h_j^0⟩ when a data vector is clamped on the visible units and the hidden states are sampled from their conditional distribution.

• Then, using alternating Gibbs sampling, we run the Markov chain until it reaches its stationary distribution and measure the correlation ⟨v_i^∞ h_j^∞⟩. The gradient of the log probability of the training data is then:

∂ log p(v^0) / ∂w_ij = ⟨v_i^0 h_j^0⟩ − ⟨v_i^∞ h_j^∞⟩

• This learning rule is the same as the maximum likelihood learning rule for the infinite logistic belief net with tied weights, and each step of Gibbs sampling corresponds to computing the exact posterior distribution in a layer of the infinite logistic belief net.
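Below is a minimal CD-1 sketch for a binary RBM in numpy: the intractable ⟨v_i^∞ h_j^∞⟩ term is approximated by the statistics after a single step of alternating Gibbs sampling, which is the contrastive divergence shortcut. The hyperparameters, and the use of probabilities rather than samples for the reconstruction statistics, are common simplifications, not the paper's exact recipe.

```python
# Hedged sketch: one CD-1 parameter update for a binary RBM.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, a, b, v0, lr=0.1, rng=None):
    """W: (n_vis, n_hid); a, b: visible/hidden biases; v0: (batch, n_vis)."""
    rng = rng or np.random.default_rng()
    ph0 = sigmoid(v0 @ W + b)                         # p(h = 1 | v0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample hidden states
    pv1 = sigmoid(h0 @ W.T + a)                       # one-step reconstruction
    ph1 = sigmoid(pv1 @ W + b)                        # hidden probs for reconstruction
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n          # ~ <v0 h0> - <v1 h1>
    a += lr * (v0 - pv1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)
    return W, a, b
```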

Page 65

Restricted Boltzmann Machines and Contrastive Divergence

Page 66

Restricted Boltzmann Machines and Contrastive Divergence

• Contrastive divergence learning in a restricted Boltzmann machine is efficient enough to be practical.

• Variations that use real-valued units and different sampling schemes are described in the literature.

• However, this efficiency comes at a high price: when applied naively, contrastive divergence learning fails for deep, multilayer networks with different weights at each layer.

Page 67

A Greedy learning algorithm for Transforming Representations

• The equivalence between RBMs and infinite directed nets with tied weights suggests an efficient learning algorithm for multilayer networks in which the weights are not tied.

Page 68

A Greedy learning algorithm for Transforming Representations

Page 69

A Greedy learning algorithm for Transforming Representations

• The top two layers interact via undirected connections, and all of the other connections are directed. There are no intra-layer connections.

• It is possible to learn sensible (though not optimal) values for the parameters W0 by assuming that the parameters between the higher layers will be used to construct a complementary prior for W0.

• The task of learning W0 under this assumption reduces to the task of learning an RBM, and good approximate solutions can be found rapidly by minimizing contrastive divergence.

• Once W0 has been learned, the data can be mapped through W0^T to create higher-level "data" at the first hidden layer.

Page 70

A Greedy learning algorithm for Transforming Representations

• An RBM will not be able to model the data perfectly. We can make the generative model better using the greedy algorithm (sketched below):

1. Learn W0 assuming all the weight matrices are tied.

2. Freeze W0 and commit to using W0^T for inference, even if subsequent changes in the higher-level weights mean that this inference method is no longer correct.

3. Keeping all the higher weight matrices tied to each other, but untied from W0, learn an RBM model of the higher-level "data" that was produced by using W0^T to transform the original data.
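A compact sketch of this greedy procedure: train an RBM on the current data with CD-1, freeze its weights, map the data upward to form the higher-level "data", and repeat for the next layer. The layer sizes, epoch count, and learning rate are illustrative.

```python
# Hedged sketch of greedy layer-wise pretraining with inline CD-1 updates.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def greedy_pretrain(data, layer_sizes, epochs=10, lr=0.1, rng=None):
    rng = rng or np.random.default_rng()
    weights, x = [], data
    for n_hid in layer_sizes:
        W = 0.01 * rng.standard_normal((x.shape[1], n_hid))
        a, b = np.zeros(x.shape[1]), np.zeros(n_hid)
        for _ in range(epochs):                       # CD-1 on this layer's RBM
            ph0 = sigmoid(x @ W + b)
            h0 = (rng.random(ph0.shape) < ph0).astype(float)
            pv1 = sigmoid(h0 @ W.T + a)               # one-step reconstruction
            ph1 = sigmoid(pv1 @ W + b)
            n = x.shape[0]
            W += lr * (x.T @ ph0 - pv1.T @ ph1) / n
            a += lr * (x - pv1).mean(axis=0)
            b += lr * (ph0 - ph1).mean(axis=0)
        weights.append((W, a, b))                     # freeze this layer, then
        x = sigmoid(x @ W + b)                        # map data up for the next RBM
    return weights
```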

Page 71

Performance on the MNIST Database

Page 72

Performance on the MNIST Database

• The database contains 60,000 training images and 10,000 test images.

• The performance of the network was 1.25% errors on the official test set.

• The only standard machine learning technique that comes close to the 1.25% error rate is a support vector machine, which gives an error rate of 1.4%.

Page 73

Performance on the MNIST Database

Page 74

Looking into the Mind of a Neural Network

• To generate samples from the model, perform Gibbs sampling in the top-level associative memory until the Markov chain converges to its equilibrium distribution.

• A sample from this distribution is then used as input to the layers below to generate an image.

Page 75

Looking into the Mind of a Neural Network

Page 76

Conclusions I

• It is possible to learn a deep, densely connected belief network one layer at a time.

• This can be done by assuming that the higher layers exist and have tied weights that are constrained to implement a complementary prior, which makes the true posterior exactly factorial.

• This is equivalent to having an undirected model that can be learned efficiently using contrastive divergence.

Page 77

Conclusions II

• After each layer has been learned, its weights are untied from the weights in higher layers.

• As the higher-level weights change, the priors for lower layers cease to be complementary, so the true posterior distributions in the lower layers are no longer factorial.

• Nevertheless, adapting the higher-level weights improves the overall generative model.

Page 78

Conclusions III

• It might be better to learn an ensemble of larger, deeper networks

• The implemented network has about as many parameters as 0.002 cubic millimeters of mouse cortex, and several hundred networks of this complexity could fit within a single voxel of a high-resolution fMRI scan.

• This suggests that much bigger networks may be required to compete with human shape recognition abilities.

Page 79

Conclusion IV: Advantages over discriminative models

• Generative models can learn low-level features without requiring feedback from the label, and they can learn many more parameters than discriminative models.

• It is easy to see what the network has learnt by generating from its model.

• It is possible to interpret the nonlinear, distributed representations in the deep hidden layers by generating images from them.

Page 80

"Thank you."