
Computing the log likelihood of the test data point for the novelty test:

➢ Linearization of the manifold parameterization
➢ SVD decomposition of the Jacobian matrix, and the tangent space it defines
➢ Data point representation with respect to local coordinates that define the tangent space, and its orthogonal complement
➢ Simplification of the density, since the matrix U is unitary

(The corresponding equations are sketched below.)
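A minimal reconstruction of the equations these labels refer to, in the paper's notation (f is the decoder/parameterization, g the encoder, x̄ the test point); the exact symbols are an assumption, since the poster's formulas did not survive extraction:

```latex
% Linearization of the manifold parameterization around \bar{z} = g(\bar{x})
f(z) = f(\bar{z}) + J_f(\bar{z})\,(z - \bar{z}) + O\!\left(\|z - \bar{z}\|^2\right)

% SVD decomposition of the Jacobian matrix
J_f(\bar{z}) = U S V^{\top}, \qquad U = [\,U^{\parallel}\ U^{\perp}\,]

% Tangent space of the manifold at f(\bar{z})
\mathcal{T} = \operatorname{span}\!\left(U^{\parallel}\right)

% Data point representation in local coordinates: components parallel
% and orthogonal to the tangent space
w = U^{\top}\bar{x} = \begin{bmatrix} w^{\parallel} \\ w^{\perp} \end{bmatrix},
\qquad w^{\parallel} = U^{\parallel\top}\bar{x}, \quad w^{\perp} = U^{\perp\top}\bar{x}

% Since the matrix U is unitary, the rotation preserves the density
p_X(\bar{x}) = p_W(w)
```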

Generative Probabilistic Novelty Detection with Adversarial Autoencoders
Stanislav Pidhorskyi, Ranya Almohsen, Donald A. Adjeroh, Gianfranco Doretto

Introduction

Existing novelty detection techniques

Our Approach

Training Data

Problem: identify whether a new data point is an inlier or an outlier, assuming that training data is available to describe only the inlier distribution.

Why: From a statistical point of view, prior knowledge of the inlier distribution is usually the only information available. Outliers are often very rare, or even dangerous to experience (e.g., in industrial process fault detection), so there is a need to rely only on inlier training data.

How: By learning the probability distribution of the inliers with adversarial autoencoders, and by learning the parameterized manifold shaping the inlier distribution.

➢ Statistical methods. Model the distribution of inliers; outliers are identified as those having low probability under the learned model.

➢ Distance-based outlier detection methods. Identify outliers by their distance to neighboring examples.

➢ Deep learning based approaches:

❑ Reconstruction-error-based methods with encoder-decoder architectures.

❑ Input perturbation + maximum softmax score thresholding.

Training: inlier images only. Testing: inlier or outlier?

The model defines a probability density at every point of the data space. The novelty test is designed to assert whether a data point was sampled from the model.
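A sketch of the generative model and of the resulting threshold test, assuming the paper's notation; the threshold symbol γ is illustrative:

```latex
% Generative model: inliers live near a parameterized manifold, plus noise
x = f(z) + \xi, \qquad z \in \Omega \subset \mathbb{R}^{n}, \quad x \in \mathbb{R}^{m}

% Novelty test: threshold the log likelihood of the test point \bar{x}
\log p_X(\bar{x}) \ge \gamma \ \Rightarrow\ \text{inlier}, \qquad
\log p_X(\bar{x}) < \gamma \ \Rightarrow\ \text{outlier}
```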

Given a point x̄, it can be projected onto the tangent space and its orthogonal complement. This yields coordinates w∥, which are parallel to the tangent space, and coordinates w⊥, which are orthogonal to it, each described by the PDF of the corresponding random variable.

Independence assumption: w∥ and w⊥ are assumed to be statistically independent, so the density of x̄ factorizes into the two marginals.

Novelty detection test: threshold the resulting log likelihood of x̄.

Neighborhood parameterization: the PDF describing the random variable w∥ follows from a change of variables through the linearized parameterization; since V is unitary, only the singular values contribute to the Jacobian determinant (see the sketch below).
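A sketch of the factorization and of the tangent (parallel) marginal; the notation (z̄ = g(x̄), S∥ the diagonal block of non-zero singular values) is assumed to match the paper:

```latex
% Independence assumption: the density factorizes into the two marginals
p_X(\bar{x}) = p_W(w) = p_{W^{\parallel}}\!\left(w^{\parallel}\right)\,
                        p_{W^{\perp}}\!\left(w^{\perp}\right)

% Tangent (parallel) marginal: change of variables through the linearized map.
% The Jacobian of z -> w^{\parallel} is S^{\parallel} V^{\top}; since V is
% unitary, only |det S^{\parallel}| remains.
p_{W^{\parallel}}\!\left(w^{\parallel}\right) \approx
\frac{p_Z(\bar{z})}{\left|\det S^{\parallel}\right|},
\qquad \bar{z} = g(\bar{x})
```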

The latent PDF p_Z is independent of the linearization and can be learned offline from the training data. Using adversarial autoencoders, it is possible to force the components of z to be i.i.d. variables with a distribution close to normal. Thus the PDF can be learned by fitting a generalized Gaussian distribution.
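A minimal sketch of this fitting step with SciPy's gennorm (the generalized normal distribution); the array names and the stand-in data are illustrative, not the authors' code:

```python
import numpy as np
from scipy.stats import gennorm

# latents: encoder outputs z = g(x) over the inlier training set, shape (N, n).
# A random stand-in is used here so the snippet runs on its own.
latents = np.random.randn(10000, 32)

# Fit a generalized Gaussian to each latent component independently
# (the AAE pushes the components toward i.i.d. near-normal variables).
fits = [gennorm.fit(latents[:, j]) for j in range(latents.shape[1])]

def log_p_z(z):
    """Log density of a latent vector under the factorized generalized Gaussian."""
    return sum(gennorm.logpdf(z[j], *fits[j]) for j in range(len(fits)))
```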

The PDF describing the random variable w⊥ is approximated with its average over the hypersphere of radius ‖w⊥‖. The distribution of the norms can be learned offline by histogramming them over the training data. Since we have already made the assumption that the autoencoder output f(g(x̄)) approximates the projection of x̄ onto the manifold, in order to save computational resources we may approximate ‖w⊥‖ with the norm of the reconstruction residual, which eliminates the need to perform the full SVD (see the sketch below).
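A sketch of the orthogonal marginal and of the resulting log likelihood, reconstructed from the labels above; the hypersphere-surface constant follows the standard derivation and is an assumption about the exact form used on the poster:

```latex
% Orthogonal marginal: average the density over the hypersphere of radius
% ||w^{\perp}|| in the (m - n)-dimensional orthogonal complement
p_{W^{\perp}}\!\left(w^{\perp}\right) \approx
\frac{\Gamma\!\left(\tfrac{m-n}{2}\right)}
     {2\,\pi^{\frac{m-n}{2}}\,\left\|w^{\perp}\right\|^{\,m-n-1}}
\, p_{\|W^{\perp}\|}\!\left(\left\|w^{\perp}\right\|\right)

% Norm approximation that avoids the full SVD
\left\|w^{\perp}\right\| \approx \left\|\bar{x} - f(g(\bar{x}))\right\|

% Log likelihood used by the novelty test
\log p_X(\bar{x}) \approx
-\log\left|\det S^{\parallel}\right| + \log p_Z(\bar{z})
+ \log \frac{\Gamma\!\left(\tfrac{m-n}{2}\right)}
            {2\,\pi^{\frac{m-n}{2}}\,\left\|w^{\perp}\right\|^{\,m-n-1}}
+ \log p_{\|W^{\perp}\|}\!\left(\left\|w^{\perp}\right\|\right)
```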

How to compute the mappings f and g?

Manifold Learning

Architecture of the network for manifold learning. It is based on training an Adversarial Autoencoder (AAE); it has an additional adversarial component to improve the generative capabilities of the decoded images and to achieve better manifold learning.

The mappings g and f represent the projection onto the latent space and the manifold parameterization, and are modeled by an encoder network and a decoder network, respectively.
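A minimal PyTorch sketch of this architecture; the layer sizes, the latent dimension, and the MLP bodies are illustrative assumptions, not the authors' configuration:

```python
import torch
import torch.nn as nn

LATENT = 32  # assumed latent dimension n

class Encoder(nn.Module):
    """Models the mapping g: x -> z."""
    def __init__(self, d_in=784):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, 256), nn.ReLU(),
                                 nn.Linear(256, LATENT))
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Models the mapping f: z -> x."""
    def __init__(self, d_out=784):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT, 256), nn.ReLU(),
                                 nn.Linear(256, d_out), nn.Sigmoid())
    def forward(self, z):
        return self.net(z)

class Critic(nn.Module):
    """Shared shape for the two discriminators."""
    def __init__(self, d_in):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, 256), nn.ReLU(),
                                 nn.Linear(256, 1), nn.Sigmoid())
    def forward(self, v):
        return self.net(v)

enc, dec = Encoder(), Decoder()
d_z = Critic(LATENT)  # latent discriminator: pushes g(x) toward the normal prior
d_x = Critic(784)     # image discriminator: pushes f(z), z ~ N(0, I), toward real data
```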

Discriminators

The latent discriminator encourages the encoder to produce a latent representation with a normal distribution and with components that are independent random variables.

The image discriminator encourages the decoder to produce images close to real data from a vector sampled from the normal distribution.

Objective

Adversarial loss for matching the distribution of the latent space with the prior distribution

Adversarial loss for matching the distribution of the decoded images from z and the known training data distribution

Reconstruction loss

Adversarial losses:

Autoencoder loss:

1. Maximize the latent adversarial loss by updating the weights of the latent discriminator.
2. Minimize the latent adversarial loss by updating the weights of the encoder.
3. Maximize the image adversarial loss by updating the weights of the image discriminator.
4. Minimize the reconstruction loss and the image adversarial loss by updating the weights of the encoder and the decoder (these alternating steps are sketched in code after the full objective below).

Full objective
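A minimal sketch of one training iteration with these alternating updates, reusing the hypothetical modules from the architecture sketch above; the optimizers, the loss weight lam, and the binary-cross-entropy form of the adversarial losses are assumptions, not the authors' exact settings:

```python
import torch
import torch.nn.functional as F

# enc, dec, d_z, d_x: the Encoder, Decoder, and two Critic discriminators above.
# opt_enc, opt_dec, opt_dz, opt_dx: one optimizer per module (e.g., Adam).

def train_step(x, enc, dec, d_z, d_x, opt_enc, opt_dec, opt_dz, opt_dx, lam=1.0):
    ones = torch.ones(x.size(0), 1)
    zeros = torch.zeros(x.size(0), 1)
    n = enc(x).size(1)  # latent dimension

    # 1. Maximize the latent adversarial loss: update the latent discriminator.
    z_fake = enc(x).detach()
    z_real = torch.randn_like(z_fake)  # samples from the normal prior
    loss_dz = (F.binary_cross_entropy(d_z(z_real), ones) +
               F.binary_cross_entropy(d_z(z_fake), zeros))
    opt_dz.zero_grad(); loss_dz.backward(); opt_dz.step()

    # 2. Minimize the latent adversarial loss: update the encoder.
    loss_enc = F.binary_cross_entropy(d_z(enc(x)), ones)
    opt_enc.zero_grad(); loss_enc.backward(); opt_enc.step()

    # 3. Maximize the image adversarial loss: update the image discriminator.
    x_fake = dec(torch.randn(x.size(0), n)).detach()
    loss_dx = (F.binary_cross_entropy(d_x(x), ones) +
               F.binary_cross_entropy(d_x(x_fake), zeros))
    opt_dx.zero_grad(); loss_dx.backward(); opt_dx.step()

    # 4. Minimize the reconstruction loss and the image adversarial loss:
    #    update the encoder and the decoder together.
    loss_ae = (lam * F.mse_loss(dec(enc(x)), x) +
               F.binary_cross_entropy(d_x(dec(torch.randn(x.size(0), n))), ones))
    opt_enc.zero_grad(); opt_dec.zero_grad()
    loss_ae.backward()
    opt_enc.step(); opt_dec.step()
```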

Results on MNIST dataset

Ablation study. Comparison on MNIST of the model components of GPND.

F1 scores on MNIST [37]. Inliers are taken to be images of one category, and outliers are randomly chosen from other categories.

Results on Fashion-MNIST

Results on COIL-100. Inliers are taken to be images of one, four, or seven randomly chosen categories, and outliers are randomly chosen from other categories (at most one from each category).

Comparison with ODIN. ↑ indicates larger value is better, and ↓ indicates lower value is better.

Comparison with baselines. All values are percentages. Up-arrow indicates larger value is better, and down-arrow indicates lower value is better.

Training Phase

Network parameters:

Testing Phase

Alternate update steps:

Computing the Tangent and Orthogonal Marginals

Learned by fitting a generalized normal distribution

Learned by histogramming the norms
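A minimal sketch of the testing-phase computation of the two marginals, assuming a trained encoder and decoder as in the sketches above; torch.autograd.functional.jacobian and torch.linalg.svdvals provide the Jacobian and its singular values, and the histogram lookup is a simple stand-in for the norm density estimator:

```python
import numpy as np
import torch
from math import lgamma
from scipy.stats import gennorm

def novelty_score(x, enc, dec, gg_fits, norm_hist, bin_edges):
    """Log-likelihood-style novelty score for one test point x (1D tensor).
    gg_fits: per-component generalized-normal fits of p_Z (see the fitting sketch).
    norm_hist, bin_edges: density histogram of residual norms over the training data."""
    z = enc(x.unsqueeze(0)).squeeze(0)
    m, n = x.numel(), z.numel()

    # Tangent marginal: Jacobian of the decoder at z; its singular values give
    # |det S_parallel|, and p_Z comes from the fitted generalized Gaussians.
    J = torch.autograd.functional.jacobian(
        lambda v: dec(v.unsqueeze(0)).squeeze(0), z)          # shape (m, n)
    log_det_s = torch.log(torch.linalg.svdvals(J)).sum().item()
    log_p_par = -log_det_s + sum(
        gennorm.logpdf(z[j].item(), *gg_fits[j]) for j in range(n))

    # Orthogonal marginal: approximate ||w_perp|| with the reconstruction
    # residual norm (avoids the full SVD), then use the hypersphere average.
    r = torch.norm(x - dec(z.unsqueeze(0)).squeeze(0)).item()
    idx = int(np.clip(np.searchsorted(bin_edges, r) - 1, 0, len(norm_hist) - 1))
    log_sphere = (lgamma((m - n) / 2.0) - np.log(2.0)
                  - (m - n) / 2.0 * np.log(np.pi)
                  - (m - n - 1) * np.log(max(r, 1e-12)))
    log_p_perp = log_sphere + np.log(max(norm_hist[idx], 1e-12))

    # Threshold log_p_par + log_p_perp against gamma to decide inlier vs. outlier.
    return log_p_par + log_p_perp
```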

Results on COIL-100 dataset