CS 59000 Statistical Machine Learning, Lecture 12. Yuan (Alan) Qi, Purdue CS, Oct. 7, 2008.


TRANSCRIPT

CS 59000 Statistical Machine Learning, Lecture 12. Yuan (Alan) Qi, Purdue CS.

Outline
- Review: probit regression, Laplace approximation, BIC, Bayesian logistic regression
- Kernel methods
- Kernel ridge regression
- Kernel principal component analysis (PCA)

Probit Regression
- Probit function: \Phi(a) = \int_{-\infty}^{a} \mathcal{N}(\theta \mid 0, 1)\, d\theta

Labeling Noise Model
- Robust to outliers and labeling errors

Generalized Linear Models
- Generalized linear model: y = f(w^T \phi)
- Activation function: f(\cdot)
- Link function: f^{-1}(\cdot)

Canonical Link Function
- If we choose the canonical link function, the gradient of the error function simplifies to \nabla E(w) = \frac{1}{s} \sum_{n=1}^{N} (y_n - t_n)\, \phi_n

Examples

Laplace Approximation for Posterior
- Gaussian approximation around the mode z_0: q(z) \propto \exp\left(-\tfrac{1}{2}(z - z_0)^T A (z - z_0)\right), with A = -\nabla \nabla \ln p(z) \big|_{z = z_0}

Evidence Approximation

Bayesian Information Criterion
- A cruder form of the Laplace approximation to the evidence: \ln p(D) \approx \ln p(D \mid \theta_{\mathrm{MAP}}) - \tfrac{M}{2} \ln N
- A more accurate evidence approximation is often needed

Bayesian Logistic Regression

Kernel Methods
- Predictions are linear combinations of a kernel function evaluated at the training data points
- A kernel function is defined through a feature-space mapping: k(x, x') = \phi(x)^T \phi(x')
- Linear kernel: k(x, x') = x^T x'
- Stationary kernels: k(x, x') = k(x - x')

Fast Evaluation of Inner Products of Feature Mappings by Kernel Functions
- For 2-D inputs, k(x, z) = (x^T z)^2 corresponds to \phi(x) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)^T
- Computing the inner product explicitly requires six feature values and 3 x 3 = 9 multiplications
- Evaluating the kernel function takes 2 multiplications and a squaring (a numeric check appears after the transcript)

Kernel Trick
1. Reformulate an algorithm so that the input vector enters only in the form of inner products.
2. Replace each input x by its feature mapping \phi(x).
3. Replace the inner product by a kernel function: \phi(x)^T \phi(x') \rightarrow k(x, x').
- Examples: kernel PCA, kernel Fisher discriminant, support vector machines

Dual Representation for Ridge Regression
- Dual variables: a_n = -\tfrac{1}{\lambda}\left(w^T \phi(x_n) - t_n\right), so that w = \Phi^T a

Kernel Ridge Regression
- Using the kernel trick: J(a) = \tfrac{1}{2} a^T K K a - a^T K t + \tfrac{1}{2} t^T t + \tfrac{\lambda}{2} a^T K a
- Minimizing over the dual variables: a = (K + \lambda I_N)^{-1} t, giving predictions y(x) = k(x)^T (K + \lambda I_N)^{-1} t (a code sketch appears after the transcript)

Generate Kernel Matrix
- The kernel matrix must be positive semidefinite
- Consider the Gaussian kernel: k(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right)

Combining Generative and Discriminative Models by Kernels
- Each modeling approach has distinct advantages; how can they be combined?
- Use generative models to construct kernels, then use these kernels in discriminative approaches

Measure Probability Similarity by Kernels
- Simple inner product: k(x, x') = p(x)\, p(x')
- For a mixture distribution: k(x, x') = \sum_i p(x \mid i)\, p(x' \mid i)\, p(i)
- For infinite mixture models: k(x, x') = \int p(x \mid z)\, p(x' \mid z)\, p(z)\, dz
- For models with latent variables (e.g., hidden Markov models), sum over the hidden state sequences Z: k(X, X') = \sum_Z p(X \mid Z)\, p(X' \mid Z)\, p(Z)

Fisher Kernels
- Fisher score: g(\theta, x) = \nabla_\theta \ln p(x \mid \theta)
- Fisher information matrix: F = \mathbb{E}_x\left[ g(\theta, x)\, g(\theta, x)^T \right]
- Fisher kernel: k(x, x') = g(\theta, x)^T F^{-1} g(\theta, x')
- Sample average: F \approx \frac{1}{N} \sum_{n=1}^{N} g(\theta, x_n)\, g(\theta, x_n)^T (a scalar example appears after the transcript)

Principal Component Analysis (PCA)
- Assume zero-mean data; u_i is a normalized eigenvector of the sample covariance: \frac{1}{N} \sum_n x_n x_n^T u_i = \lambda_i u_i, with u_i^T u_i = 1

Feature Mapping
- Eigen-problem in feature space: C v_i = \lambda_i v_i, where C = \frac{1}{N} \sum_n \phi(x_n)\, \phi(x_n)^T

Dual Variables
- Suppose we have v_i = \sum_{n=1}^{N} a_{in}\, \phi(x_n)

Eigen-problem in Feature Space (1)
- Substituting the expansion of v_i gives K^2 a_i = \lambda_i N K a_i

Eigen-problem in Feature Space (2)
- Equivalently, K a_i = \lambda_i N a_i
- Normalization condition: v_i^T v_i = 1 implies \lambda_i N a_i^T a_i = 1
- Projection coefficient: y_i(x) = \phi(x)^T v_i = \sum_{n=1}^{N} a_{in}\, k(x, x_n)

General Case for Non-zero Mean
- Kernel matrix: \tilde{K} = K - 1_N K - K 1_N + 1_N K 1_N, where 1_N is the N x N matrix whose elements all equal 1/N (a kernel PCA sketch appears after the transcript)

Kernel PCA on Synthetic Data
- Contour plots of the projection coefficients in feature space

Limitations of Kernel PCA

Discussion
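
The fast-evaluation claim on the "Fast Evaluation" slide can be checked numerically. Below is a minimal NumPy sketch, assuming the 2-D polynomial kernel k(x, z) = (x^T z)^2 from the slide; the function names phi and k are illustrative, not from the lecture.

```python
import numpy as np

def phi(x):
    # Explicit feature map for the 2-D polynomial kernel (x^T z)^2.
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k(x, z):
    # Direct kernel evaluation: two multiplications and a squaring.
    return (x @ z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])
print(phi(x) @ phi(z))  # inner product in feature space: 121.0
print(k(x, z))          # same value without forming the features: 121.0
```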
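
Likewise, the kernel ridge regression slide reduces to a few lines of linear algebra. A minimal sketch, assuming a Gaussian kernel and a toy sine-wave dataset; sigma, lam, and the data are illustrative choices, not from the lecture.

```python
import numpy as np

def gaussian_kernel(X, Z, sigma=1.0):
    # Gram matrix K[n, m] = exp(-||x_n - z_m||^2 / (2 sigma^2)).
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))                  # training inputs
t = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)   # noisy targets

lam = 0.1
K = gaussian_kernel(X, X)
a = np.linalg.solve(K + lam * np.eye(len(X)), t)      # a = (K + lam I)^-1 t

X_test = np.linspace(-3, 3, 200)[:, None]
y_pred = gaussian_kernel(X_test, X) @ a               # y(x) = k(x)^T a
```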
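
The Fisher kernel definitions also become concrete in the simplest possible model. A sketch assuming a 1-D Gaussian with unknown mean, chosen so the Fisher score, information matrix, and kernel from the slide all reduce to scalars; the model choice is mine, not the lecture's.

```python
import numpy as np

def fisher_kernel(x, x_prime, data, mu, sigma=1.0):
    # Fisher score for a 1-D Gaussian with unknown mean:
    # g(mu, x) = d/dmu ln N(x | mu, sigma^2) = (x - mu) / sigma^2.
    g = lambda v: (v - mu) / sigma ** 2
    # Fisher information estimated by the sample average of g^2.
    F = np.mean(g(data) ** 2)
    # k(x, x') = g(x) F^-1 g(x').
    return g(x) * g(x_prime) / F

rng = np.random.default_rng(1)
data = rng.normal(0.5, 1.0, size=100)
print(fisher_kernel(1.2, -0.3, data, mu=data.mean()))
```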
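
Finally, the kernel PCA slides (eigen-problem, normalization condition, projection coefficients, and non-zero-mean centering) fit in one short function. A minimal sketch, assuming a Gaussian kernel; kernel_pca and its parameters are illustrative names.

```python
import numpy as np

def kernel_pca(X, n_components=2, sigma=1.0):
    # Gaussian Gram matrix K[n, m] = exp(-||x_n - x_m||^2 / (2 sigma^2)).
    N = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma ** 2))
    # Center in feature space: K~ = K - 1N K - K 1N + 1N K 1N.
    one_N = np.full((N, N), 1.0 / N)
    K_t = K - one_N @ K - K @ one_N + one_N @ K @ one_N
    # Solve K~ a_i = (lambda_i N) a_i; eigh returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(K_t)
    idx = np.argsort(eigvals)[::-1][:n_components]
    lam_N, A = np.maximum(eigvals[idx], 1e-12), eigvecs[:, idx]
    # Rescale so the normalization condition lambda_i N a_i^T a_i = 1 holds.
    A = A / np.sqrt(lam_N)
    # Projection coefficients y_i(x_n) = sum_m a_im k~(x_n, x_m).
    return K_t @ A

rng = np.random.default_rng(2)
Y = kernel_pca(rng.standard_normal((100, 2)))
```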