Divergence Estimation via Density-Ratio Estimation

Makoto Yamada ([email protected])


Page 1: Divergence Estimation via Density-Ratio Estimation

Divergence Estimation via Density-Ratio Estimation

Makoto Yamada ([email protected])

Page 2: Divergence Estimation via Density-Ratio Estimation

Research Motivation

- Measuring similarity between data sets is important for various problems:
  - Similarity between documents (ranking)
  - Similarity between access logs (illegal-access detection)
- Divergence is useful for measuring (dis)similarity:
  - Kullback-Leibler divergence
  - Mutual information
- Both can be expressed via the ratio of densities.

Page 3: Divergence Estimation via Density-Ratio Estimation

Density Ratio

- The ratio of probability densities:

  r(x) = p(x) / q(x)

- e.g., the Kullback-Leibler divergence:

  KL(p || q) = ∫ p(x) log( p(x) / q(x) ) dx

- We mainly focus on continuous variables.
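Since KL(p || q) is the expectation of the log density-ratio under p, it can be estimated by Monte Carlo once the ratio is known. A minimal sketch with two unit-variance Gaussians, where the ratio is available in closed form (the means and sample size are my own toy choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two 1-D Gaussians: p = N(0, 1), q = N(1, 1).
mu_p, mu_q = 0.0, 1.0
x = rng.normal(mu_p, 1.0, size=200_000)  # samples from p

# log density ratio log p(x)/q(x) for equal-variance unit Gaussians
log_ratio = ((x - mu_q) ** 2 - (x - mu_p) ** 2) / 2.0

kl_mc = log_ratio.mean()             # Monte Carlo KL estimate
kl_true = (mu_p - mu_q) ** 2 / 2.0   # closed form for unit-variance Gaussians
print(kl_mc, kl_true)
```

With 200,000 samples the Monte Carlo estimate agrees with the closed-form value 0.5 to about two decimal places.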

Page 4: Divergence Estimation via Density-Ratio Estimation

Density-Ratio Applications

- Change-point detection
- Transfer learning
- Speaker identification
- Human pose estimation
- Action recognition
- Dimensionality reduction
- Outlier detection

[Figure: a time series with a change point, and training/test data illustrating transfer learning]

Page 5: Divergence Estimation via Density-Ratio Estimation

Direct Density-Ratio Estimation

- Naïve approach: separately estimate the two densities and take their ratio ⇒ poor accuracy.
- Vapnik's principle: when solving a problem of interest, one should not solve a more general problem as an intermediate step.
  ⇒ Estimating densities is more general than estimating a density ratio: knowing the densities gives the ratio, but knowing the ratio does not give the densities.
- Following the Vapnik principle, methods that directly estimate the density ratio without density estimation have been proposed.

Sugiyama, Suzuki, & Kanamori (2012)

Page 6: Divergence Estimation via Density-Ratio Estimation

Density-Ratio after 2016

- We developed several density-ratio based approaches (mostly kernel-based) by 2012.
- After 2016, density-ratio estimation reappeared in:
  - Generative Adversarial Networks (GANs)
    - "Generative Adversarial Nets from a Density Ratio Estimation Perspective" (arXiv)
    - "Learning in Implicit Generative Models" (arXiv)
  - Approximate Bayesian Computation (ABC)
    - Likelihood-free inference by ratio estimation
  - Mutual information estimation
    - MINE: Mutual Information Neural Estimation

Page 7: Divergence Estimation via Density-Ratio Estimation

Research Motivation

- The density ratio p(x)/q(x) can diverge to infinity even in rather simple settings (e.g., the ratio of two Gaussians with different means).

Cortes et al. (NIPS 2010)

Page 8: Divergence Estimation via Density-Ratio Estimation

Relative Density-Ratio (Yamada et al., NIPS 2011)

- Relative density-ratio (0 ≤ α < 1):

  r_α(x) = p(x) / (α p(x) + (1 − α) q(x))

- The relative density-ratio is bounded above by 1/α (for α > 0).
- Relative Pearson (PE) divergence:

  PE_α = (1/2) ∫ q_α(x) (r_α(x) − 1)² dx,   q_α(x) = α p(x) + (1 − α) q(x)
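A quick numerical check of the boundedness claim (a sketch with Gaussians of my own choosing): the plain ratio p/q explodes in the tail, while the relative ratio never exceeds 1/α.

```python
import numpy as np

def gauss(x, mu):  # unit-variance Gaussian density
    return np.exp(-(x - mu) ** 2 / 2.0) / np.sqrt(2.0 * np.pi)

x = np.linspace(-10.0, 10.0, 2001)
p, q = gauss(x, 0.0), gauss(x, 3.0)  # p = N(0,1), q = N(3,1)

alpha = 0.1
plain = p / q                                      # unbounded in the left tail
relative = p / (alpha * p + (1.0 - alpha) * q)     # bounded by 1/alpha

print(plain.max())     # huge: the plain ratio diverges
print(relative.max())  # stays below 1/alpha = 10
```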

Page 9: Divergence Estimation via Density-Ratio Estimation

Relative unconstrained Least-Squares Importance Fitting (RuLSIF)

- Data: {x_i}_{i=1}^{n} ~ p(x) and {x'_j}_{j=1}^{n'} ~ q(x).
- Key idea: fit a density-ratio model g(x; θ) to the true relative density-ratio r_α(x) under the squared loss:

  J(θ) = (1/2) ∫ q_α(x) (g(x; θ) − r_α(x))² dx
       = (1/2) ∫ q_α(x) g(x; θ)² dx − ∫ p(x) g(x; θ) dx + Const.

  (q_α(x) = α p(x) + (1 − α) q(x); the constant does not depend on θ.)

Page 10: Divergence Estimation via Density-Ratio Estimation

nKernel model:

nCost function with kernel model:

10RuLSIF: Model

Page 11: Divergence Estimation via Density-Ratio Estimation

RuLSIF: Solution

- Optimization problem (with an ℓ2 regularizer):

  θ̂ = argmin_θ [ (1/2) θᵀ Ĥ θ − ĥᵀ θ + (λ/2) θᵀ θ ]

- Solution (analytically obtained):

  θ̂ = (Ĥ + λ I)⁻¹ ĥ

- Cross-validation is possible (for λ and the kernel width σ).
- If α = 0, RuLSIF is equivalent to uLSIF.

Kanamori et al. (JMLR 2009)
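The closed-form solution makes RuLSIF easy to implement. A minimal numpy sketch (the function name, kernel width, and regularization value are my own illustrative choices; kernel centres are placed on the p-samples):

```python
import numpy as np

def rulsif_fit(X, Y, alpha=0.1, sigma=1.0, lam=0.01):
    """Closed-form RuLSIF fit (sketch): Gaussian kernels centred on X.

    X: (n, d) samples from p;  Y: (m, d) samples from q.
    Returns a function approximating the relative ratio r_alpha.
    """
    def k(A):  # design matrix of kernel values K(a, x_l), centres = rows of X
        d2 = ((A[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    Kx, Ky = k(X), k(Y)
    H = alpha * Kx.T @ Kx / len(X) + (1.0 - alpha) * Ky.T @ Ky / len(Y)
    h = Kx.mean(axis=0)
    theta = np.linalg.solve(H + lam * np.eye(len(X)), h)  # (H + lam I)^-1 h
    return lambda A: k(A) @ theta

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(300, 1))  # p = N(0, 1)
Y = rng.normal(0.0, 1.0, size=(300, 1))  # q = p, so r_alpha(x) = 1
g = rulsif_fit(X, Y)
print(g(X).mean())  # close to 1 when p = q
```

In practice σ and λ would be chosen by the cross-validation mentioned above rather than fixed by hand.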

Page 12: Divergence Estimation via Density-Ratio Estimation

Relative PE Divergence Estimators

- Relative PE divergence:

  PE_α = (1/2) ∫ q_α(x) (r_α(x) − 1)² dx = (1/2) ∫ p(x) r_α(x) dx − 1/2

- Two estimators, plugging the fitted model ĝ into the two expressions:

  (A)  PÊ_α = −(α/(2n)) Σ_i ĝ(x_i)² − ((1−α)/(2n')) Σ_j ĝ(x'_j)² + (1/n) Σ_i ĝ(x_i) − 1/2

  (B)  PÊ_α = (1/(2n)) Σ_i ĝ(x_i) − 1/2
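The two estimators can be sketched directly. To keep this self-contained I substitute the true relative ratio of two toy Gaussians (my own choices) for the fitted model ĝ; with a good fit, (A) and (B) agree and are non-negative.

```python
import numpy as np

def gauss(x, mu):  # unit-variance Gaussian density
    return np.exp(-(x - mu) ** 2 / 2.0) / np.sqrt(2.0 * np.pi)

alpha = 0.1
mu_p, mu_q = 0.0, 0.5

def r_alpha(x):  # relative density ratio p / (alpha p + (1-alpha) q)
    p, q = gauss(x, mu_p), gauss(x, mu_q)
    return p / (alpha * p + (1.0 - alpha) * q)

rng = np.random.default_rng(2)
xp = rng.normal(mu_p, 1.0, size=100_000)  # samples from p
xq = rng.normal(mu_q, 1.0, size=100_000)  # samples from q

gp, gq = r_alpha(xp), r_alpha(xq)         # stand-in for g-hat

pe_a = (-alpha * (gp ** 2).mean() / 2.0
        - (1.0 - alpha) * (gq ** 2).mean() / 2.0
        + gp.mean() - 0.5)                # estimator (A)
pe_b = gp.mean() / 2.0 - 0.5              # estimator (B)
print(pe_a, pe_b)
```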

Page 13: Divergence Estimation via Density-Ratio Estimation

Toy Experiments: RuLSIF

[Figure: estimated relative PE on toy data: approx. 0.2865 when p ≠ q and approx. −0.0001 when p = q]

Page 14: Divergence Estimation via Density-Ratio Estimation

Change-Point Detection

- Change-point detection based on the relative PE divergence: slide two adjacent windows over the time series and use the estimated divergence between them as the change-point score; a large score indicates a change point.

Liu, Yamada, Collier & Sugiyama (Neural Networks, to appear)
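The idea can be sketched with a sliding window and the closed-form estimator (a toy 1-D setting of my own; real implementations tune σ and λ by cross-validation and use multi-dimensional sliding subsequences):

```python
import numpy as np

def rulsif_pe(X, Y, alpha=0.1, sigma=1.0, lam=0.01):
    """Plug-in relative-PE score between 1-D samples X ~ p and Y ~ q."""
    def k(A):  # Gaussian kernel matrix, centres on X
        return np.exp(-(A[:, None] - X[None, :]) ** 2 / (2 * sigma ** 2))
    Kx, Ky = k(X), k(Y)
    H = alpha * Kx.T @ Kx / len(X) + (1 - alpha) * Ky.T @ Ky / len(Y)
    theta = np.linalg.solve(H + lam * np.eye(len(X)), Kx.mean(axis=0))
    return (Kx @ theta).mean() / 2.0 - 0.5  # estimator (B)

rng = np.random.default_rng(3)
series = np.concatenate([rng.normal(0, 1, 200), rng.normal(3, 1, 200)])

w = 50  # window length
scores = [rulsif_pe(series[t - w:t], series[t:t + w])
          for t in range(w, len(series) - w + 1)]
cp = w + int(np.argmax(scores))
print(cp)  # the score peaks near the true change point t = 200
```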

Page 15: Divergence Estimation via Density-Ratio Estimation

Covariate Shift Adaptation (Transfer Learning)

- Key idea: reduce the generalization error on the test data set (not on the training data set)!
- Covariate shift adaptation setup:
  - Training data: {(x_i^tr, y_i^tr)}_{i=1}^{n_tr}, with x^tr ~ p_tr(x)
  - Test data: {x_j^te}_{j=1}^{n_te}, with x^te ~ p_te(x)
  - Key idea: learn a function f(x) so that the error on the test data is minimized, under the assumption that p(y|x) is common to training and test while p_tr(x) ≠ p_te(x).

Shimodaira (JSPI, 2000)

Page 16: Divergence Estimation via Density-Ratio Estimation

Exponentially-flattened IW (EIW) empirical error minimization

- Flatten the importance weight w(x) = p_te(x) / p_tr(x) by an exponent τ ∈ [0, 1]:

  min_f (1/n_tr) Σ_i w(x_i^tr)^τ loss(f(x_i^tr), y_i^tr)

  - τ = 0: ordinary empirical error minimization
  - 0 < τ < 1: intermediate
  - τ = 1: importance-weighted empirical error minimization
- Setting τ to an intermediate value is practically useful for stabilizing covariate shift adaptation, even though it cannot give an unbiased model under covariate shift.
- It still needs importance-weight estimation. :(

Shimodaira (JSPI 2000)
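The flattening itself is a one-liner. A tiny sketch (toy weight values of my own) showing that τ interpolates between uniform weights and full importance weights:

```python
import numpy as np

w = np.array([0.2, 1.0, 5.0])  # toy importance weights p_te / p_tr

for tau in (0.0, 0.5, 1.0):
    # tau = 0: all weights become 1 (ordinary ERM);
    # tau = 1: unchanged importance weights (IW-ERM)
    print(tau, w ** tau)
```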

Page 17: Divergence Estimation via Density-Ratio Estimation

Relative importance-weighted (RIW) empirical error minimization

- Use the relative importance weight (RIW):

  w_α(x) = p_te(x) / (α p_te(x) + (1 − α) p_tr(x))

  α controls the adaptiveness to the test distribution (α = 0: full importance weighting; α = 1: no adaptation).
- RIW empirical error minimization:

  min_f (1/n_tr) Σ_i w_α(x_i^tr) loss(f(x_i^tr), y_i^tr)

- An intermediate α works well in practice.

Yamada et al. (NIPS 2011)
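A minimal sketch of RIW empirical error minimization as weighted least squares (toy densities and data of my own; in practice w_α would come from RuLSIF rather than the true densities):

```python
import numpy as np

def gauss(x, mu, s):  # 1-D Gaussian density
    return np.exp(-(x - mu) ** 2 / (2 * s ** 2)) / (s * np.sqrt(2 * np.pi))

rng = np.random.default_rng(4)
x_tr = rng.normal(1.0, 0.5, 200)                      # training inputs ~ p_tr
y_tr = np.sin(np.pi * x_tr) + rng.normal(0, 0.1, 200)

# relative importance weight w_alpha = p_te / (alpha p_te + (1-alpha) p_tr),
# with the test distribution p_te = N(2, 0.5^2) shifted away from p_tr
alpha = 0.5
p_te, p_tr = gauss(x_tr, 2.0, 0.5), gauss(x_tr, 1.0, 0.5)
w = p_te / (alpha * p_te + (1 - alpha) * p_tr)

# weighted linear fit: minimize sum_i w_i (a x_i + b - y_i)^2
A = np.stack([x_tr, np.ones_like(x_tr)], axis=1)
coef = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * y_tr))
print(coef)  # slope and intercept of the RIW fit
```

The weights emphasize training points that lie where the test distribution has mass, so the fit is tilted toward the test region.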

Page 18: Divergence Estimation via Density-Ratio Estimation

Toy Example

- Predicted output by importance-weighted kernel regression (IWKR = RIW-LS).

[Figure: predicted outputs on toy data]

- The RIW method gives smaller error and variance. :)

Yamada et al. (NIPS 2011)

Page 19: Divergence Estimation via Density-Ratio Estimation

Application 1: 3D Pose Estimation

- Given: a large database of image-pose pairs.
- Inference: predict the human pose of a test image from its HoG features (e.g., by kernel regression).

Yamada, Sigal, & Raptis (ECCV 2012)

Page 20: Divergence Estimation via Density-Ratio Estimation

HumanEva-I Experiments: Settings

- Experimental settings:
  - Selection bias (C1-C3): all camera data is used for training and testing.
  - Subject transfer (C1): the test subject is not included in the training phase.
- Error metric: the average distance between the predicted pose and the true pose.

Sigal et al. (TR 2006)

Page 21: Divergence Estimation via Density-Ratio Estimation

Application2: Human Activity Recognition

nHuman Activity Recognition by accelerometer nWalk, run, bicycle riding, and train riding

classification by accelerometer sensor in iPod touchnTraining: 20 subjects data setnTest: A new subject not included in the training set

Relative importance weight performs wellJ

21

Yamada et al.(NIPS 2011)

Page 22: Divergence Estimation via Density-Ratio Estimation

Conclusion

- The relative density-ratio is promising for various types of applications:
  - Change-point detection
  - Transfer learning