
Page 1: Hideitsu Hino

2016/06/06


1 / 74

Page 2: Hideitsu Hino

1

2

3

2 / 74

Page 3: Hideitsu Hino

This talk concerns the (Shannon) entropy.¹ For a density f, the self-information of x is

I_f(x) = −ln f(x),

and the (differential) entropy of f is

H(f) = −∫ f(x) ln f(x) dx.

When X is a random variable with density f, H(f) is also written H(X).

¹Besides the Shannon entropy, the Rényi entropy ((1 − α)⁻¹ log ∫ f(x)^α dx) and the Tsallis entropy ((q − 1)⁻¹ (1 − ∫ f^q(x) dx)) are also used.

3 / 74

Page 4: Hideitsu Hino

Cross entropy and entropy:

H(f, g) = E_f[I_g(X)] = −∫ f(x) ln g(x) dx,

H(f) = E_f[I_f(X)] = −∫ f(x) ln f(x) dx.

Kullback-Leibler divergence:

D_KL(f, g) = E_f[I_g(X)] − E_f[I_f(X)] = ∫ f(x) ln (f(x)/g(x)) dx.

Mutual information:

MI(X, Y) = H(X) + H(Y) − H(X, Y),

where H(X, Y) is the joint entropy of X and Y.

4 / 74
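As a quick numeric illustration of these definitions (not part of the original slides), the KL divergence between two Gaussians can be checked against its closed form; the parameter values below are arbitrary.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu1, s1, mu2, s2 = 0.0, 1.0, 1.0, 2.0          # illustrative parameters
f, g = norm(mu1, s1).pdf, norm(mu2, s2).pdf

# closed form for KL(N(mu1, s1^2) || N(mu2, s2^2))
kl_closed = np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5
# the defining integral of f(x) ln(f(x)/g(x))
kl_numeric, _ = quad(lambda x: f(x) * np.log(f(x) / g(x)), -20.0, 20.0)

print(kl_closed, kl_numeric)                    # both ~0.4431
```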

Page 5: Hideitsu Hino

KL

5 / 74


Page 7: Hideitsu Hino

Independent component analysis: recover an m-dimensional source Y ∈ R^m from an n-dimensional observation X ∈ R^n through a demixing matrix W ∈ R^{m×n}:

Y = WX. (1)

W is estimated so that the joint density f(WX) of Y = WX factorizes into the product of the (m) marginal densities f(w_j X), j = 1, . . . , m, e.g., by minimizing the mutual information among the components of WX [Hyvarinen&Oja, 2000].

6 / 74

Page 8: Hideitsu Hino

7 / 74

Page 9: Hideitsu Hino

k-means clustering minimizes

L(c_1, . . . , c_K) = ∑_{i=1}^n min_{l=1,...,K} ∥x_i − c_l∥².

Fig. from [Faivishevsky&Goldberger, 2010]

8 / 74

Page 10: Hideitsu Hino

k-means clustering minimizes

L(c_1, . . . , c_K) = ∑_{i=1}^n min_{l=1,...,K} ∥x_i − c_l∥².

[Figure 2 of Faivishevsky & Goldberger (2010): comparison of the proposed clustering method NIC with the k-means algorithm on three synthetic cases; (a)-(c) NIC, (d)-(f) k-means.]

[Figure 3 of the same paper: three possible clusterings (into two clusters) of the same dataset; (a) the 'correct' clustering, (b) and (c) erroneous clusterings. With MeanNN as the MI estimator, the MI clustering score favors the correct solution, while the kNN estimator yields the same score for all three clusterings.]

Using the identity ∑_{i: c_i=j} ∥x_i − µ_j∥² = (1/(2n_j)) ∑_{i≠l: c_i=c_l=j} ∥x_i − x_l∥², where µ_j is the mean of all data points in cluster j, the paper compares three clustering scores:

S_kmeans(C) = ∑_{j=1}^{n_c} (1/n_j) ∑_{i≠l: c_i=c_l=j} ∥x_i − x_l∥²,

S_GaussMI(C) = ∑_{j=1}^{n_c} log (1/n_j) ∑_{i≠l: c_i=c_l=j} ∥x_i − x_l∥²,

S_NIC(C) = ∑_{j=1}^{n_c} (1/(n_j − 1)) ∑_{i≠l: c_i=c_l=j} log ∥x_i − x_l∥².

Fig. from [Faivishevsky&Goldberger, 2010]

8 / 74
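The centroid-to-pairwise rewriting used above is easy to verify numerically; a minimal sketch (the cluster data are synthetic and illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))                    # points of one cluster j
mu_j = x.mean(axis=0)
n_j = len(x)

lhs = ((x - mu_j) ** 2).sum()                   # sum_i ||x_i - mu_j||^2
pairwise = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
rhs = pairwise.sum() / (2 * n_j)                # (1/(2 n_j)) sum_{i != l} ||x_i - x_l||^2

print(lhs, rhs)                                 # equal up to floating point
```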

Page 11: Hideitsu Hino

H(X|Y )

Fig. from [Faivishevsky&Goldberger, 2010]

9 / 74


Page 13: Hideitsu Hino

Dimensionality reduction by minimizing the conditional entropy H(X|Y); cf. Fisher discriminant analysis.

[Hino&Murata, 2010]

10 / 74

Page 14: Hideitsu Hino

[Scatter plots (1st axis vs. 2nd axis) comparing the LDA projection with the conditional-entropy-minimizing projection (minH) on two datasets.]

11 / 74

Page 15: Hideitsu Hino

12 / 74


Page 17: Hideitsu Hino

Change-point detection on the TOPIX index (1988-1996). Events marked on the series: the Gulf war, the decay of the bubble economy, the completion of the European single market, and the Great Hanshin-Awaji Earthquake.

[Plot: TOPIX (1000-3000) and change-point score (0.00-0.10) over dates from 1988-02-01 to 1996-04-01.]

The score compares the densities estimated before and after each time point:

score(t) = log f_after(t) / f_before(t).

[Murata+, 2013, Koshijima+, 2015]

13 / 74
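A minimal sketch of this type of score, under assumptions that are mine rather than the talk's: a plain Gaussian KDE on fixed-length sliding windows stands in for the entropy-based machinery of [Murata+, 2013], and the window length is illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde

def change_scores(x, w=50):
    """score(t) = log f_after(x_t) / f_before(x_t), with KDEs on the two windows."""
    scores = np.full(len(x), np.nan)
    for t in range(w, len(x) - w):
        f_before = gaussian_kde(x[t - w:t])
        f_after = gaussian_kde(x[t:t + w])
        scores[t] = np.log(f_after(x[t])[0] / f_before(x[t])[0])
    return scores

rng = np.random.default_rng(1)
series = np.concatenate([rng.normal(0, 1, 200), rng.normal(3, 1, 200)])
print(np.nanargmax(np.abs(change_scores(series))))   # peaks near the change at t = 200
```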

Page 18: Hideitsu Hino

Prediction via the conditional density f(x_{t+1} | x_{t:1}); 50% and 95% predictive intervals are obtained from f(x_{t+1} | x_{t:1}).

14 / 74

Page 19: Hideitsu Hino


Vapnik

15 / 74


Page 22: Hideitsu Hino

16 / 74

Page 23: Hideitsu Hino

17 / 74

Page 24: Hideitsu Hino

18 / 74

Page 25: Hideitsu Hino

1

2

3

19 / 74

Page 26: Hideitsu Hino

Given a sample D = {x_i}_{i=1}^n ⊂ R¹, assume the elements of D are i.i.d.

20 / 74

Page 27: Hideitsu Hino

f(x) = (5/8) φ(x; µ = 0, σ = 1) + (3/8) φ(x; µ = 3, σ = 1)

21 / 74


Page 29: Hideitsu Hino

22 / 74

Page 30: Hideitsu Hino

Kernel density estimator:

f̂(x; h) = (1/(nh)) ∑_{i=1}^n κ((x − x_i)/h),   (2)

where the kernel κ satisfies ∫ κ(x) dx = 1 and h > 0 is the bandwidth. With κ_h(x) = h⁻¹ κ(x/h), equivalently

f̂(x; h) = (1/n) ∑_{i=1}^n κ_h(x − x_i).

23 / 74


Page 32: Hideitsu Hino

Example: the kernel κ is taken to be the N(0, 1) density.

24 / 74

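A minimal sketch of estimator (2) with the N(0,1) kernel, reusing the two-component mixture from the earlier example (the bandwidth h = 0.4 is an arbitrary illustrative choice):

```python
import numpy as np
from scipy.stats import norm

def kde(x, data, h):
    """f_hat(x; h) = (1/(n h)) sum_i kappa((x - x_i) / h) with kappa = N(0,1)."""
    x = np.atleast_1d(x)[:, None]
    return norm.pdf((x - data[None, :]) / h).sum(axis=1) / (len(data) * h)

rng = np.random.default_rng(0)
u = rng.random(500)                       # sample the mixture (5/8)N(0,1) + (3/8)N(3,1)
data = np.where(u < 5 / 8, rng.normal(0, 1, 500), rng.normal(3, 1, 500))

grid = np.linspace(-4.0, 7.0, 200)
f_hat = kde(grid, data, h=0.4)
print(f_hat.sum() * (grid[1] - grid[0]))  # ~1: the estimate integrates to one
```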

Page 35: Hideitsu Hino

Pointwise error at x. The MSE (mean squared error) of an estimator θ̂ of θ is

MSE(θ̂) = E[(θ̂ − θ)²] = Var[θ̂] + (E[θ̂] − θ)².

Since E[f̂(x; h)] = E[κ_h(x − X)] = ∫ κ_h(x − y) f(y) dy, writing the convolution (f ∗ g)(x) = ∫ f(x − y) g(y) dy, the bias of f̂(x; h) is

E[f̂(x; h)] − f(x) = (κ_h ∗ f)(x) − f(x),

and its variance is

Var[f̂(x; h)] = (1/n) [(κ_h² ∗ f)(x) − (κ_h ∗ f)²(x)].

25 / 74

Page 36: Hideitsu Hino

Combining bias and variance, the pointwise MSE at x is

MSE[f̂(x; h)] = (1/n) [(κ_h² ∗ f)(x) − (κ_h ∗ f)²(x)] + {(κ_h ∗ f)(x) − f(x)}².

26 / 74

Page 37: Hideitsu Hino

Global (L²) error: the ISE (integrated squared error)

ISE[f̂(·; h)] = ∫ (f̂(x; h) − f(x))² dx.

27 / 74

Page 38: Hideitsu Hino

f̂(x; h) depends on the sample D = {x_i}_{i=1}^n, so the ISE is itself random. Averaging over D gives the MISE (mean integrated squared error):

MISE[f̂(·; h)] = E_D[ISE[f̂(·; h, D)]]

= ∫ E_D[(f̂(x; h, D) − f(x))²] dx

= ∫ MSE[f̂(x; h, D)] dx.

28 / 74

Page 39: Hideitsu Hino

MISE[f̂(·; h)] = n⁻¹ ∫ [(κ_h² ∗ f)(x) − (κ_h ∗ f)²(x)] dx + ∫ {(κ_h ∗ f)(x) − f(x)}² dx

= (nh)⁻¹ ∫ κ²(x) dx + (1 − n⁻¹) ∫ (κ_h ∗ f)²(x) dx − 2 ∫ (κ_h ∗ f)(x) f(x) dx + ∫ f(x)² dx.

29 / 74

Page 40: Hideitsu Hino

The bandwidth h_MISE minimizing the MISE has no closed form in general, so the MISE is approximated asymptotically in h.

30 / 74

Page 41: Hideitsu Hino

Assumptions:

1. f is twice continuously differentiable (C²) and f″ is square-integrable (L²).

2. The bandwidth h = h_n depends on n and satisfies

lim_{n→∞} h = 0,   lim_{n→∞} nh = ∞.

3. The kernel κ satisfies

∫ κ(x) dx = 1,   ∫ x κ(x) dx = 0,   µ₂(κ) = ∫ x² κ(x) dx < ∞.

31 / 74

Page 42: Hideitsu Hino

Taylor-expand f(x − hz) in E[f̂(x; h)] = ∫ κ(z) f(x − hz) dz:

f(x − hz) = f(x) − hz f′(x) + (1/2) h² z² f″(x) + o(h²),

so

E[f̂(x; h)] = f(x) + (1/2) h² f″(x) ∫ z² κ(z) dz + o(h²),

E[f̂(x; h)] − f(x) = (1/2) h² µ₂(κ) f″(x) + o(h²).   (3)

The bias is driven by the curvature f″ of f.

32 / 74

Page 43: Hideitsu Hino

For a function g, write R(g) = ∫ g²(x) dx. Then

Var[f̂(x; h)] = (nh)⁻¹ R(κ) f(x) + o((nh)⁻¹).   (4)

Combining the bias (3) and the variance (4), the MSE is

MSE[f̂(x; h)] = (nh)⁻¹ R(κ) f(x) + (1/4) h⁴ µ₂²(κ) (f″(x))² + o((nh)⁻¹ + h⁴).

33 / 74

Page 44: Hideitsu Hino

Integrating the MSE over x:

MISE[f̂(·; h)] = AMISE[f̂(·; h)] + o((nh)⁻¹ + h⁴),

AMISE[f̂(·; h)] = (nh)⁻¹ R(κ) + (1/4) h⁴ µ₂²(κ) R(f″).

Minimizing the AMISE in place of the MISE over h:

h_AMISE = ( R(κ) / (µ₂²(κ) R(f″) n) )^{1/5}.

34 / 74
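For the N(0,1) kernel, R(κ) = 1/(2√π) and µ₂(κ) = 1; plugging in a Gaussian reference f = N(0, σ²) recovers the familiar 1.06 σ n^{−1/5} rule. A sketch under exactly those assumptions (the normal-reference choice is mine, not from the slides):

```python
import numpy as np

def h_amise_normal_ref(sigma, n):
    """h_AMISE for the N(0,1) kernel and a N(0, sigma^2) reference density."""
    R_kappa = 1.0 / (2.0 * np.sqrt(np.pi))            # R(kappa)
    mu2 = 1.0                                         # mu_2(kappa)
    R_f2 = 3.0 / (8.0 * np.sqrt(np.pi) * sigma**5)    # R(f'') for the Gaussian reference
    return (R_kappa / (mu2**2 * R_f2 * n)) ** 0.2

print(h_amise_normal_ref(1.0, 300))                   # ~0.339
print(1.06 * 1.0 * 300 ** -0.2)                       # Silverman-style 1.06 sigma n^(-1/5)
```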


Page 46: Hideitsu Hino

k-nearest-neighbor density estimation.

Estimate the density f(z) at z ∈ R^p from the sample D = {x_i}_{i=1}^n.

Let ε_k denote the distance from z to its k-th nearest neighbor. The ball of radius ε around z is b(z; ε) = {x ∈ R^p : ∥z − x∥ < ε}, with volume

|b(z; ε)| = c_p ε^p,   c_p = π^{p/2} / Γ(p/2 + 1),

where Γ(·) is the gamma function.

35 / 74
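The constant c_p is straightforward to compute; a small check of the familiar values (length 2 for p = 1, area π for p = 2, volume 4π/3 for p = 3):

```python
import numpy as np
from scipy.special import gamma

def c_p(p):
    """Volume of the unit ball in R^p: pi^(p/2) / Gamma(p/2 + 1)."""
    return np.pi ** (p / 2) / gamma(p / 2 + 1)

print([round(c_p(p), 4) for p in (1, 2, 3)])      # [2.0, 3.1416, 4.1888]
```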

Page 47: Hideitsu Hino

k-nearest-neighbor density estimation.

[Scatter diagram: sample points x_i ∈ D shown as ◦, query point z ∈ R^p shown as ×.]

36 / 74


Page 49: Hideitsu Hino

k-nearest-neighbor density estimation.

[Diagram: the ball of radius ε around z is grown until it contains k sample points; this ε is the k-th nearest-neighbor distance.]

37 / 74


Page 53: Hideitsu Hino

k-nearest-neighbor density estimation.

[Diagram: sample points and the ε-ball around z.]

The probability mass of the ε-ball around z is

q_z(ε) = ∫_{b(z;ε)} f(x) dx.

At ε = ε_k (the k-th nearest-neighbor distance), the empirical fraction k/n of the sample falling inside the ball approximates q_z(ε_k).

38 / 74

Page 54: Hideitsu Hino

k-nearest-neighbor density estimation. Taylor expansion:

q_z(ε_k) = ∫_{b(z;ε_k)} {f(z) + (z − x)ᵀ∇f(z) + O(ε_k²)} dx

= |b(z; ε_k)| (f(z) + O(ε_k²)) ≃ ε_k^p c_p f(z),

where c_p is the volume of the unit ball in R^p.

39 / 74

Page 55: Hideitsu Hino

Equating the ball mass with the empirical fraction, k/n ≃ ε_k^p c_p f(z), and solving for f(z):

f̂_k(z) = (k / (c_p n)) ε_k^{−p}.   (5)

40 / 74

Page 56: Hideitsu Hino

k-NN density estimator:

f̂_k(z) = (k / (c_p n)) ε_k^{−p},   (6)

where ε_k is the distance from z to its k-th nearest neighbor in D.

41 / 74
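A minimal sketch of estimator (6), using brute-force neighbor search; the sample and the choice k = 20 are illustrative:

```python
import numpy as np
from scipy.special import gamma

def knn_density(z, data, k):
    """f_hat_k(z) = k / (c_p * n * eps_k^p), eps_k the k-th nearest-neighbor distance."""
    n, p = data.shape
    c_p = np.pi ** (p / 2) / gamma(p / 2 + 1)
    eps_k = np.sort(np.linalg.norm(data - z, axis=1))[k - 1]
    return k / (c_p * n * eps_k ** p)

rng = np.random.default_rng(0)
data = rng.normal(size=(2000, 2))                 # standard bivariate normal
print(knn_density(np.zeros(2), data, k=20))       # ~1/(2 pi) ≈ 0.159 at the mode
```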

Page 57: Hideitsu Hino

42 / 74

Page 58: Hideitsu Hino

1

2

3

43 / 74

Page 59: Hideitsu Hino

Goal: estimate H(f) from a sample D = {x_i}_{i=1}^n, x_i ∈ R^p, i = 1, . . . , n, drawn from the density f(x) of X.

44 / 74

Page 60: Hideitsu Hino

The mass of the ε-ball around z:

q_z(ε) = ∫_{x∈b(z;ε)} f(x) dx.   (7)

Expanding the integrand,

q_z(ε) = ∫_{x∈b(z;ε)} {f(z) + (z − x)ᵀ∇f(z) + O(ε²)} dx

= |b(z; ε)| (f(z) + O(ε²)) = c_p ε^p f(z) + O(ε^{p+2}),

so approximating q_z(ε) by k/n incurs an O(ε^{p+2}) error.

45 / 74


Page 62: Hideitsu Hino

Expanding q_z(ε) around z to the next order in ε:

q_z(ε) = c_p f(z) ε^p + (p / (4(p/2 + 1))) c_p ε^{p+2} tr∇²f(z) + O(ε^{p+4}).   (8)

Approximating q_z(ε) by k_ε/n and dividing by c_p ε^p:

k_ε / (n c_p ε^p) = f(z) + C ε² + O(ε⁴),   (9)

where C = p tr∇²f(z) / (4(p/2 + 1)).

46 / 74


Page 64: Hideitsu Hino

Put Y_ε = k_ε / (n c_p ε^p) and X_ε = ε². Neglecting the O(ε⁴) term, Y_ε and X_ε obey the simple linear regression relation

Y_ε ≃ f(z) + C X_ε.   (10)

47 / 74

Page 65: Hideitsu Hino

In Y_ε ≃ f(z) + C X_ε, treat X_ε as the explanatory variable and Y_ε as the response, generating observation pairs by varying ε.

48 / 74

Page 66: Hideitsu Hino

k-nearest-neighbor density estimation [review].

[Diagram: the ball of radius ε around z is grown until it contains k sample points; this ε is the k-th nearest-neighbor distance.]

49 / 74


Page 70: Hideitsu Hino

Fix a set of radii E = {ε₁, . . . , ε_m}, m < n, and compute the pairs {(X_ε, Y_ε)}_{ε∈E}. Minimize the residual sum of squares

R = (1/m) ∑_{ε∈E} (Y_ε − f(z) − C X_ε)²   (11)

with respect to f(z) and C; the fitted intercept gives the density estimate f̂_s(z).

50 / 74

Page 71: Hideitsu Hino

Evaluating f̂_s at each sample point z = x_i in leave-one-out fashion yields the entropy estimate

Ĥ_s(D) = −(1/n) ∑_{i=1}^n ln f̂_{s,i}(x_i),   (12)

where f̂_{s,i}(x_i) is the estimate at x_i computed with x_i removed. Ĥ_s(D) is called the Simple Regression Entropy Estimator (SRE) [Hino+, 2015].

51 / 74
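A minimal sketch of the SRE procedure, under assumptions that are mine rather than the talk's: a fixed illustrative grid E, brute-force neighbor counts, and a small guard against non-positive fitted intercepts in the tails. [Hino+, 2015] specifies the actual design choices.

```python
import numpy as np
from scipy.special import gamma

def sre_density(z, data, eps_grid):
    """Fit Y_eps = f(z) + C * eps^2 over eps_grid; the intercept estimates f(z)."""
    n, p = data.shape
    c_p = np.pi ** (p / 2) / gamma(p / 2 + 1)
    d = np.linalg.norm(data - z, axis=1)
    X = eps_grid ** 2
    Y = np.array([(d <= e).sum() / (n * c_p * e ** p) for e in eps_grid])
    slope, intercept = np.polyfit(X, Y, 1)
    return intercept

def sre_entropy(data, eps_grid):
    """Leave-one-out average of -ln f_hat_s(x_i), as in eq. (12)."""
    logs = []
    for i in range(len(data)):
        fz = sre_density(data[i], np.delete(data, i, axis=0), eps_grid)
        logs.append(np.log(max(fz, 1e-12)))  # guard: intercepts can dip below 0 in the tails
    return -np.mean(logs)

rng = np.random.default_rng(0)
data = rng.normal(size=(300, 1))
print(sre_entropy(data, np.linspace(0.2, 1.0, 10)))  # true H of N(0,1) ≈ 1.4189
```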

Page 72: Hideitsu Hino

SRE: how it works

[Left: fitted density function for the Normal example. Right: Y_ε plotted against ε² with the fitted intercept f̂_s(z = 0.5).]

52 / 74

Page 73: Hideitsu Hino

SRE: how it works

[Left: fitted density function for the Bimodal example. Right: Y_ε plotted against ε² with the fitted intercept f̂_s(z = 0.5).]

53 / 74

Page 74: Hideitsu Hino

For each ε and each sample point x_i ∈ D,

Y_ε ≃ f(x_i) + C X_ε

holds with Y_ε = k_ε / (n c_p ε^p) and C = p tr∇²f(x_i) / (4(p/2 + 1)), both depending on x_i. Writing the point-wise quantities as Y_ε^i and C_i:

Y_ε^i ≃ f(x_i) + C_i X_ε.

54 / 74

Page 75: Hideitsu Hino

Average Y_ε^i = f(x_i) + C_i X_ε over the sample points x_i ∈ D:

−(1/n) ∑_{i=1}^n ln Y_ε^i = −(1/n) ∑_{i=1}^n ln ( f(x_i) + C_i X_ε )

= −(1/n) ∑_{i=1}^n ln f(x_i) (1 + C_i X_ε / f(x_i))

= −(1/n) ∑_{i=1}^n ln f(x_i) − (1/n) ∑_{i=1}^n ln (1 + C_i X_ε / f(x_i))

≃ −(1/n) ∑_{i=1}^n ln f(x_i) − (1/n) ( ∑_{i=1}^n C_i / f(x_i) ) X_ε.

55 / 74

Page 76: Hideitsu Hino

−(1/n) ∑_{i=1}^n ln Y_ε^i ≃ −(1/n) ∑_{i=1}^n ln f(x_i) − (1/n) ( ∑_{i=1}^n C_i / f(x_i) ) X_ε.

Writing Ȳ_ε = −(1/n) ∑_{i=1}^n ln Y_ε^i, H(D) = −(1/n) ∑_{i=1}^n ln f(x_i), and C̄ = −(1/n) ∑_{i=1}^n C_i / f(x_i), this reads, for every ε > 0,

Ȳ_ε = H(D) + C̄ X_ε.   (13)

56 / 74

Page 77: Hideitsu Hino

Fitting (13) over ε ∈ E by least squares,

R_d = (1/m) ∑_{ε∈E} (Ȳ_ε − H(D) − C̄ X_ε)²,

and reading off the intercept gives the Direct Regression Entropy Estimator (DRE) [Hino+, 2015].

57 / 74
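A minimal sketch of the DRE idea along the same lines; the radius grid, anchored at the largest nearest-neighbor distance so that every k_ε ≥ 1, is an illustrative choice, not from the talk.

```python
import numpy as np
from scipy.special import gamma

def dre_entropy(data, m=10):
    """Regress bar{Y}_eps on eps^2 once; the intercept estimates H(D), eq. (13)."""
    n, p = data.shape
    c_p = np.pi ** (p / 2) / gamma(p / 2 + 1)
    d = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                   # exclude each point itself
    eps0 = d.min(axis=1).max()                    # from here on every k_eps >= 1
    eps_grid = np.linspace(eps0, 3 * eps0, m)
    Ybar = np.array([-np.mean(np.log((d <= e).sum(axis=1) / (n * c_p * e ** p)))
                     for e in eps_grid])
    slope, intercept = np.polyfit(eps_grid ** 2, Ybar, 1)
    return intercept

rng = np.random.default_rng(0)
data = rng.normal(size=(300, 1))
print(dre_entropy(data))                          # true H of N(0,1) ≈ 1.4189
```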

Page 78: Hideitsu Hino

Recap:

q_z(ε) = c_p f(z) ε^p + (p / (4(p/2 + 1))) c_p ε^{p+2} tr∇²f(z) + O(ε^{p+4}).

Approximating q_z(ε) by k_ε/n and dividing by c_p ε^p,

k_ε / (n c_p ε^p) = f(z) + C ε² + O(ε⁴),

i.e., Y_ε = f(z) + C X_ε.

58 / 74

Page 79: Hideitsu Hino

SRE: at each point z, solve

min (1/m) ∑_{ε∈E} (Y_ε − f(z) − C X_ε)²,

and set

Ĥ_s(D) = −(1/n) ∑_{i=1}^n ln f̂_i(x_i).

DRE: solve the single regression

min (1/m) ∑_{ε∈E} (Ȳ_ε − H(D) − C̄ X_ε)².

59 / 74

Page 80: Hideitsu Hino

k

60 / 74

Page 81: Hideitsu Hino

From

q_z(ε) = c_p f(z) ε^p + (p / (4(p/2 + 1))) c_p ε^{p+2} tr∇²f(z) + O(ε^{p+4}),

approximating q_z(ε) by k_ε/n and multiplying by n:

k_ε ≃ c_p n f(z) ε^p + c_p n (p / (4(p/2 + 1))) tr∇²f(z) ε^{p+2}.

61 / 74

Page 82: Hideitsu Hino

k_ε ≃ c_p n f(z) ε^p + c_p n (p / (4(p/2 + 1))) tr∇²f(z) ε^{p+2}

is a linear regression Y = βᵀX of the response Y = k_ε on X = (ε^p, ε^{p+2}). Since k_ε is a count, it is modeled as Poisson.

62 / 74

Page 83: Hideitsu Hino

Maximize the Poisson likelihood with identity link,

L(β) = ∏_{i=1}^m e^{−X_iᵀβ} (X_iᵀβ)^{Y_i} / Y_i!.

The coefficient β₁ of ε^p estimates c_p n f(z), so f̂(z) = β₁ / (c_p n). Plugging this density estimate into the leave-one-out average, as for SRE, gives the Entropy Estimator with Poisson-noise structure and Identity-link regression (EPI) [Hino+, under review].

63 / 74
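A minimal sketch of the EPI idea: maximize the identity-link Poisson likelihood above numerically and read f̂(z) off the ε^p coefficient. The optimizer, starting point, and radius grid are illustrative assumptions, not the talk's specification.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gamma, gammaln

def epi_density(z, data, eps_grid):
    """Identity-link Poisson fit of k_eps on (eps^p, eps^{p+2}); f_hat = beta_1/(c_p n)."""
    n, p = data.shape
    c_p = np.pi ** (p / 2) / gamma(p / 2 + 1)
    d = np.linalg.norm(data - z, axis=1)
    k = np.array([(d <= e).sum() for e in eps_grid])          # count responses Y_i
    X = np.column_stack([eps_grid ** p, eps_grid ** (p + 2)])

    def nll(beta):                                # negative Poisson log-likelihood
        mu = X @ beta
        if np.any(mu <= 0):                       # identity link: keep the mean positive
            return np.inf
        return np.sum(mu - k * np.log(mu) + gammaln(k + 1))

    beta0 = np.array([k.mean() / eps_grid.mean() ** p, 0.0])
    res = minimize(nll, beta0, method="Nelder-Mead")
    return res.x[0] / (c_p * n)                   # f_hat(z) = beta_1 / (c_p n)

rng = np.random.default_rng(0)
data = rng.normal(size=(2000, 2))
print(epi_density(np.zeros(2), data, np.linspace(0.2, 0.8, 10)))  # ~1/(2 pi) ≈ 0.159
```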

Page 84: Hideitsu Hino

1

2

3

64 / 74

Page 85: Hideitsu Hino

Compare the true entropy H(f) with an estimate Ĥ(D) by the absolute error

AE = |H(f) − Ĥ(D)|,

averaged over 100 repetitions.

65 / 74

Page 86: Hideitsu Hino

Univariate Case: 15 distributions

[Density plots: Normal, Skewed, Strongly Skewed, Kurtotic, Bimodal, Skewed Bimodal.]

66 / 74

Page 87: Hideitsu Hino

Univariate Case: 15 distributions (continued)

[Density plots: Trimodal, 10 Claw, Standard Power Exponential, Standard Logistic, Standard Classical Laplace, t(df=5).]

67 / 74

Page 88: Hideitsu Hino

Univariate Case: 15 distributions (continued)

[Density plots: Mixed t, Standard Exponential, Cauchy.]

68 / 74

Page 89: Hideitsu Hino

[Estimation results overlaid on the density plots: Normal, Skewed, Strongly Skewed, Kurtotic, Bimodal.]

69 / 74

Page 90: Hideitsu Hino

[Estimation results overlaid on the density plots: Skewed Bimodal, Trimodal, 10 Claw, Standard Power Exponential, Standard Logistic.]

69 / 74

Page 91: Hideitsu Hino

[Estimation results overlaid on the density plots: Standard Classical Laplace, t(df=5), Mixed t, Standard Exponential, Cauchy.]

69 / 74

Page 92: Hideitsu Hino

Univariate Case Results: Curvature and Improvement

The correction term involves the curvature tr∇²f. To control the curvature, use the scaled Cauchy density with parameter γ > 0:

f(x; γ) = 1 / (πγ(1 + (x/γ)²)),

∇²f(x; γ) = (2 / (πγ³)) (3(x/γ)² − 1) / (1 + (x/γ)²)³.

γ is varied from 0.01 to 0.9, with n = 300 and 100 repetitions; for EPI, the improvement over the k-NN baseline is measured by

|Ĥ_k(D) − H(f)| − |Ĥ_s(D) − H(f)|.

70 / 74
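The stated second derivative is easy to sanity-check by finite differences (the step size and evaluation point below are arbitrary):

```python
import numpy as np

def f(x, g):
    return 1.0 / (np.pi * g * (1.0 + (x / g) ** 2))

def f2_formula(x, g):
    return (2.0 / (np.pi * g ** 3)) * (3.0 * (x / g) ** 2 - 1.0) / (1.0 + (x / g) ** 2) ** 3

x0, g0, h = 0.7, 0.3, 1e-5
f2_fd = (f(x0 + h, g0) - 2.0 * f(x0, g0) + f(x0 - h, g0)) / h ** 2
print(f2_formula(x0, g0), f2_fd)   # agree to finite-difference accuracy
```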

Page 93: Hideitsu Hino

Univariate Case Results: Curvature and Improvement

Curvature is summarized by max_{x∈R} log |∇²f(x; γ)|.

[Scatter plot: Improvement (−0.2 to 0.2) against LogMaxCurvature (0.0 to 7.5).]

71 / 74

Page 94: Hideitsu Hino

That's all for k-NN based estimation.

Pros. KDE k-NN

Cons.

72 / 74

Page 95: Hideitsu Hino

I

[Faivishevsky&Goldberger, 2010] Faivishevsky, L. and Goldberger, J. (2010). A nonparametric information theoretic clustering algorithm. In ICML 2010.

[Hino+, 2015] Hino, H., Koshijima, K., and Murata, N. (2015). Non-parametric entropy estimators based on simple linear regression. Computational Statistics & Data Analysis, 89:72-84.

[Hino&Murata, 2010] Hino, H. and Murata, N. (2010). A conditional entropy minimization criterion for dimensionality reduction and multiple kernel learning. Neural Computation, 22(11):2887-2923.

[Hyvarinen&Oja, 2000] Hyvarinen, A. and Oja, E. (2000). Independent component analysis: algorithms and applications. Neural Networks, 13(4-5):411-430.

[Koshijima+, 2015] Koshijima, K., Hino, H., and Murata, N. (2015). Change-point detection in a sequence of bags-of-data. IEEE Transactions on Knowledge and Data Engineering, 27(10):2632-2644.

73 / 74

Page 96: Hideitsu Hino

II

[Murata+, 2013] Murata, N., Koshijima, K., and Hino, H. (2013). Distance-based change-point detection with entropy estimation. In Proceedings of the Sixth Workshop on Information Theoretic Methods in Science and Engineering, pages 22-25.

74 / 74