Classification and clustering methods by probabilistic latent semantic indexing model


Page 1: Classification and clustering methods by probabilistic latent semantic indexing model

No. 1

Classification and clustering methods by probabilistic latent semantic indexing model

A Short Course at Tamkang University, Taipei, Taiwan, R.O.C., March 7-9, 2006

Shigeichi Hirasawa

Department of Industrial and Management Systems Engineering
School of Science and Engineering, Waseda University, Japan

[email protected]

Page 2: Classification and clustering methods by probabilistic latent semantic indexing model

No. 2

1. Introduction

Example of formats in paper archives:

Fixed format (Items):
- The name of authors
- The name of journals
- The year of publication
- The name of publishers
- The name of countries
- The citation link

Free format (Text):
- The text of a paper: Introduction, Preliminaries, ..., Conclusion

A document is described by an item-document matrix G and a term-document matrix H:

G = [gmj] : an item-document matrix, gmj ∈ {0, 1}
H = [hij] : a term-document matrix (T × D), hij ∈ {0, 1, 2, …}

dj : the j-th document
ti : the i-th term
im : the m-th item

gmj : the selected result of the m-th item (im) in the j-th document (dj)
hij : the frequency of the i-th term (ti) in the j-th document (dj)

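To make the two matrices concrete, here is a small Python/numpy sketch that builds H and G for a toy archive; the example documents, terms, and the single item are invented for illustration only.

```python
import numpy as np

# Hypothetical toy archive: two documents, a four-term vocabulary, and one
# fixed-format item ("has a citation link"), all chosen purely for illustration.
docs = [["coding", "theory", "coding"], ["retrieval", "model"]]   # d_1, d_2
terms = ["coding", "theory", "retrieval", "model"]                # t_1 .. t_T
item_values = [[1], [0]]   # g_mj for the single item i_1 in each document

# H = [h_ij]: frequency of the i-th term t_i in the j-th document d_j (T x D)
H = np.array([[doc.count(t) for doc in docs] for t in terms])

# G = [g_mj]: selected result of the m-th item i_m in the j-th document d_j (M x D)
G = np.array(item_values).T

print(H)   # [[2 0] [1 0] [0 1] [0 1]]
print(G)   # [[1 0]]
```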

Page 3: Classification and clustering methods by probabilistic latent semantic indexing model

No. 3

2. Information Retrieval Model

Text Mining: Information Retrieval, including Clustering and Classification

Information Retrieval Models, by base model:

Set theory:
- (Classical) Boolean Model
- Fuzzy Model
- Extended Boolean Model

Algebraic:
- (Classical) Vector Space Model (VSM) [BYRN99]
- Generalized VSM
- Latent Semantic Indexing (LSI) Model [BYRN99]
- Neural Network Model

Probabilistic:
- (Classical) Probabilistic Model
- Extended Probabilistic Model
- Probabilistic LSI (PLSI) Model [Hofmann99]
- Inference Network Model
- Bayesian Network Model

Page 4: Classification and clustering methods by probabilistic latent semantic indexing model

No. 4

2. Information Retrieval Model

The Vector Space Model (VSM)

tf(i, j) = fij : the number of occurrences of the i-th term (ti) in the j-th document (dj) (local weight)

idf(i) = log(D / df(i)) : global weight

df(i) : the number of documents in D for which the term ti appears

The weight wij is given by

wij = tf(i, j) · idf(i) = aij    (2.1)
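A minimal numpy sketch of the weighting in eq.(2.1), assuming a dense matrix F of raw frequencies f_ij; the helper name tfidf_weights is a choice of this sketch, not part of the course material.

```python
import numpy as np

def tfidf_weights(F):
    """Compute a_ij = w_ij = tf(i, j) * idf(i) as in eq.(2.1).

    F : (T, D) matrix of raw term frequencies f_ij.
    """
    T, D = F.shape
    df = (F > 0).sum(axis=1)                 # df(i): number of documents containing t_i
    idf = np.log(D / np.maximum(df, 1))      # global weight idf(i) = log(D / df(i))
    return F * idf[:, None]                  # local weight tf(i, j) times idf(i)
```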

Page 5: Classification and clustering methods by probabilistic latent semantic indexing model

No. 5

2. Information Retrieval Model

(term vector) ti = (ai1, ai2, …, aiD) : the i-th row of A
(document vector) dj = (a1j, a2j, …, aTj)^T : the j-th column of A
(query vector) q = (q1, q2, …, qT)^T

The similarity s(q, dj) between q and dj:

s(q, dj) = q^T dj / (|q| |dj|)    (cosine)    (2.2)
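The cosine similarity of eq.(2.2), again as a small numpy sketch; rank_documents simply sorts the document columns of the weighted matrix A by their similarity to the query and is an illustrative helper, not part of the original slides.

```python
import numpy as np

def cosine_similarity(q, d):
    """s(q, d_j) = q^T d_j / (|q| |d_j|), as in eq.(2.2)."""
    return float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d) + 1e-12))

def rank_documents(A, q):
    """Return document indices sorted by decreasing similarity to the query q."""
    scores = [cosine_similarity(q, A[:, j]) for j in range(A.shape[1])]
    return sorted(range(A.shape[1]), key=lambda j: scores[j], reverse=True)
```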

Page 6: Classification and clustering methods by probabilistic latent semantic indexing model

No. 6

The Latent Semantic Indexing (LSI) Model

2. Information Retrieval Model

(1) Let A = [aij] be the T × D term-document matrix given by eq.(2.1). The LSI model replaces A by a rank-K approximation obtained from its singular value decomposition, keeping the K largest singular values so that the approximation error is minimized (eq.(2.6)).

Page 7: Classification and clustering methods by probabilistic latent semantic indexing model

No. 7

2. Information Retrieval Model

(2) Let the (query vector) q̂ ∈ R^{K×1} be given by eq.(2.4). The similarity is then

s(q̂, d̂j) = q̂^T d̂j / (|q̂| |d̂j|)

where d̂j is the j-th canonical vector.
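As a companion to the LSI description, here is a small numpy sketch of the usual rank-K SVD reduction and query folding that LSI relies on; it illustrates the general technique rather than the exact eqs.(2.4) and (2.6) of the course notes.

```python
import numpy as np

def lsi_reduce(A, K):
    """Rank-K LSI reduction of the T x D matrix A via the singular value decomposition."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_K, s_K, Vt_K = U[:, :K], s[:K], Vt[:K, :]
    D_hat = np.diag(s_K) @ Vt_K                         # columns are the reduced d_hat_j
    fold = lambda q: np.diag(1.0 / s_K) @ U_K.T @ q     # query folding: q -> q_hat in R^K
    return D_hat, fold

def similarity(q_hat, d_hat_j):
    """s(q_hat, d_hat_j) = q_hat^T d_hat_j / (|q_hat| |d_hat_j|)."""
    return float(q_hat @ d_hat_j /
                 (np.linalg.norm(q_hat) * np.linalg.norm(d_hat_j) + 1e-12))
```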

Page 8: Classification and clustering methods by probabilistic latent semantic indexing model

No. 8

The Probabilistic LSI (PLSI) Model

2. Information Retrieval Model

(1) Preliminary

A) A = [aij], aij = fij : the frequency of the term ti in the document dj
B) Reduction of dimension similar to LSI
C) Latent class (a state model based on factor analysis)
D) (i) independence between the pairs (ti, dj); (ii) conditional independence between ti and dj given the state

Z = {z1, z2, …, zK} : a set of states, K ≪ max{T, D}

A latent state z connects each document dj and term ti (d - z - t) through the probabilities Pr(zk), Pr(dj|zk), and Pr(ti|zk):

Pr(dj, ti) = Pr(dj) Pr(ti | dj)    (2.10)

Pr(dj, ti) = Σ_{k=1}^{K} Pr(zk) Pr(dj | zk) Pr(ti | zk)    (2.11)

Page 9: Classification and clustering methods by probabilistic latent semantic indexing model

No. 9

2. Information Retrieval Model

(2) Let A = [aij] be given by eq.(2.1). The parameters Pr(zk), Pr(dj|zk), and Pr(ti|zk) are estimated so as to maximize the log-likelihood

L = Σ_i Σ_j aij log Pr(dj, ti)    (2.13)

Page 10: Classification and clustering methods by probabilistic latent semantic indexing model

No. 10

2. Information Retrieval Model

(3) The probabilities in eq.(2.11) that maximize eq.(2.13) are computed by the tempered EM (TEM) algorithm, which iterates the following steps (β : tempering parameter):

E-step:
Pr(zk | dj, ti) = [Pr(zk) Pr(dj|zk) Pr(ti|zk)]^β / Σ_{k'} [Pr(zk') Pr(dj|zk') Pr(ti|zk')]^β    (2.14)

M-step:
Pr(ti | zk) ∝ Σ_j aij Pr(zk | dj, ti)    (2.15)
Pr(dj | zk) ∝ Σ_i aij Pr(zk | dj, ti)    (2.16)
Pr(zk) ∝ Σ_i Σ_j aij Pr(zk | dj, ti)    (2.17)
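Below is a minimal dense numpy sketch of the tempered EM iteration, following the form of eqs.(2.14)-(2.17) above; the function name plsi_tem, the random initialization, and the dense array layout are choices of this sketch rather than the authors' implementation.

```python
import numpy as np

def plsi_tem(A, K, beta=1.0, n_steps=100, seed=0):
    """Tempered EM for the PLSI model.

    A    : (T, D) term-document matrix of a_ij
    K    : number of latent states ||Z||
    beta : tempering parameter (beta = 1.0 gives plain EM)
    Returns Pr(z), Pr(d|z), Pr(t|z) with shapes (K,), (D, K), (T, K).
    """
    rng = np.random.default_rng(seed)
    T, D = A.shape
    p_z = np.full(K, 1.0 / K)
    p_d_z = rng.random((D, K)); p_d_z /= p_d_z.sum(axis=0)
    p_t_z = rng.random((T, K)); p_t_z /= p_t_z.sum(axis=0)

    for _ in range(n_steps):
        # E-step, eq.(2.14): Pr(z_k | d_j, t_i) with tempering exponent beta.
        joint = (p_z[None, None, :] * p_d_z[None, :, :] * p_t_z[:, None, :]) ** beta
        p_z_dt = joint / (joint.sum(axis=2, keepdims=True) + 1e-12)

        # M-step, eqs.(2.15)-(2.17): re-estimate the parameters.
        weighted = A[:, :, None] * p_z_dt          # a_ij * Pr(z_k | d_j, t_i)
        p_t_z = weighted.sum(axis=1)               # sum over documents j
        p_d_z = weighted.sum(axis=0)               # sum over terms i
        p_z = weighted.sum(axis=(0, 1))
        p_t_z /= p_t_z.sum(axis=0, keepdims=True) + 1e-12
        p_d_z /= p_d_z.sum(axis=0, keepdims=True) + 1e-12
        p_z /= p_z.sum() + 1e-12

    return p_z, p_d_z, p_t_z
```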

Page 11: Classification and clustering methods by probabilistic latent semantic indexing model

No. 11

3. Proposed Method

3.1 Classification method

categories: C1, C2, … , CK

(1) Choose a subset of documents D* (⊆ D) which are already categorized and compute the representative document vectors d*1, d*2, …, d*K:

d*k = (1 / nk) Σ_{dj ∈ Ck} dj    (3.1)

where nk is the number of documents selected from Ck to compute the representative document vector.

(2) Compute the probabilities Pr(zk), Pr(dj|zk), and Pr(ti|zk) which maximize eq.(2.13) by the TEM algorithm, where ||Z|| = K.

(3) Decide the state for dj as

ẑ(dj) = ẑk  such that  Pr(dj | ẑk) = max_k Pr(dj | zk),    (3.2)

and assign dj to the category Ĉk associated with ẑk.
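A small sketch of the two computational pieces of the classification method: the representative vectors of eq.(3.1) and the decision rule of eq.(3.2); the TEM step (2) corresponds to the plsi_tem routine sketched earlier, and the array layouts here are assumptions of this sketch.

```python
import numpy as np

def representative_vectors(A, labels, K):
    """Eq.(3.1): d*_k is the mean of the document vectors d_j selected from C_k.

    A      : (T, D) term-document matrix (columns are document vectors d_j)
    labels : length-D numpy array, labels[j] = k if d_j is a selected document of C_k, else -1
    """
    return np.stack([A[:, labels == k].mean(axis=1) for k in range(K)], axis=1)

def decide_category(p_d_z, j):
    """Eq.(3.2): assign d_j to the category C_k whose state z_k maximizes Pr(d_j | z_k).

    p_d_z : (D, K) array of Pr(d_j | z_k) estimated by the TEM algorithm.
    """
    return int(np.argmax(p_d_z[j]))
```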

Page 12: Classification and clustering methods by probabilistic latent semantic indexing model

No. 12

3. Proposed Method

3.2 Clustering method

||Z|| = K : the number of latent states
S : the number of clusters (S ≤ K)

(1) Choose a proper K (≥ S) and compute the probabilities Pr(zk), Pr(dj|zk), and Pr(ti|zk) which maximize eq.(2.13) by the TEM algorithm, where ||Z|| = K.

(2) Decide the state for dj as

ẑ(dj) = ẑk  such that  Pr(dj | ẑk) = max_k Pr(dj | zk)  (ẑk ↔ ĉk)    (3.3)

If S = K, then dj ∈ ĉk.

Page 13: Classification and clustering methods by probabilistic latent semantic indexing model

No. 13

3. Proposed Method

(3) If S < K, then compute a similarity measure s(zk, zk'):

s(zk, zk') = zk^T zk' / (|zk| |zk'|)    (3.4)

zk = (Pr(t1|zk), Pr(t2|zk), …, Pr(tT|zk))^T    (3.5)

and use the group average distance method with the similarity measure s(zk, zk') to agglomeratively cluster the states zk until the number of clusters becomes S. Then we have S clusters of documents; the members of each cluster are the documents assigned to its constituent states.
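A sketch of step (3): cosine similarity between states as in eqs.(3.4)-(3.5), followed by group-average agglomeration of the K states into S clusters. This is a straightforward toy implementation of the described procedure, not the authors' code.

```python
import numpy as np

def cluster_states(p_t_z, S):
    """Agglomeratively merge K latent states into S clusters using the cosine
    similarity between the vectors z_k = (Pr(t_1|z_k), ..., Pr(t_T|z_k))
    and group-average linkage.

    p_t_z : (T, K) array of Pr(t_i | z_k)
    Returns a list of S clusters, each a list of original state indices.
    """
    T, K = p_t_z.shape
    clusters = [[k] for k in range(K)]

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def group_average(c1, c2):
        # Group-average similarity: mean pairwise cosine between member states.
        return np.mean([cosine(p_t_z[:, k1], p_t_z[:, k2]) for k1 in c1 for k2 in c2])

    while len(clusters) > S:
        # Merge the pair of clusters with the largest average similarity.
        i, j = max(((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
                   key=lambda ij: group_average(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```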

Page 14: Classification and clustering methods by probabilistic latent semantic indexing model

No. 14

4. Experimental Results

4.1 Document sets

Table 4.1: Document sets

(a) contents: articles of the Mainichi newspaper in '94 [Sakai99]; format: free (texts only); # words T: 107,835; amount: 101,058 (see Table 4.2); categorized: Yes (9+1 categories); # selected documents DL+DT: 300 (S=3)

(b) same data as (a); # selected documents DL+DT: 200~300 (S=2~8)

(c) contents: questionnaire (see Table 4.3 for details); format: fixed and free (see Table 4.3); # words T: 3,993; amount: 135+35; categorized: Yes (2 categories); # selected documents DL+DT: 135+35

Page 15: Classification and clustering methods by probabilistic latent semantic indexing model

No. 15

4. Experimental Results

4.2 Classification problem: (a)

• Experimental data: Mainichi Newspaper in '94 (in Japanese), 300 articles, 3 categories (free format only)

Conditions of (a)

Table 4.2: Selected categories of newspaper

category | contents | # articles DL+DT | # used for training DL | # used for test DT
C1 | business | 100 | 50 | 50
C2 | local | 100 | 50 | 50
C3 | sports | 100 | 50 | 50
total | | 300 | 150 | 150

• LSI: K = 81; PLSI: K = 10

Page 16: Classification and clustering methods by probabilistic latent semantic indexing model

No. 16

4. Experimental Results

Results of (a)

Table 4.5: Classified numbers from Ck to Ĉk for each method

method | from Ck | to Ĉ1 | to Ĉ2 | to Ĉ3
VS method | C1 | 17 | 4 | 29
VS method | C2 | 8 | 38 | 4
VS method | C3 | 15 | 4 | 31
LSI method | C1 | 16 | 6 | 28
LSI method | C2 | 6 | 43 | 1
LSI method | C3 | 12 | 5 | 33
PLSI method | C1 | 41 | 0 | 9
PLSI method | C2 | 0 | 47 | 3
PLSI method | C3 | 13 | 6 | 31
Proposed method | C1 | 47 | 0 | 3
Proposed method | C2 | 0 | 50 | 0
Proposed method | C3 | 4 | 2 | 44

Page 17: Classification and clustering methods by probabilistic latent semantic indexing model

No. 17

4. Experimental Results

Table 4.6: Classification error rate

Method | Classification error
VSM | 42.7%
LSI | 38.7%
PLSI | 20.7%
Proposed method | 6.0%

Page 18: Classification and clustering methods by probabilistic latent semantic indexing model

No. 18

4. Experimental Results

Clustering process by EM algorithm

[Figure: scatter plots of p(z1|d) versus p(z2|d) at EM steps 0, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, and 4096; each document is labeled business (経済), local (社会), or sports (スポーツ).]

Page 19: Classification and clustering methods by probabilistic latent semantic indexing model

No. 19

4. Experimental Results

4.3 Classification Problem: (b)

We choose S = 2, 3, …, 8 categories, each of which contains 100~450 randomly chosen articles (DL+DT); half of them (DL) are used for training and the rest (DT) for test.

Condition of (b)

Page 20: Classification and clustering methods by probabilistic latent semantic indexing model

No. 20

4. Experimental Results

Results of (b)

Fig. 4.2: Classification error rate for DL

Page 21: Classification and clustering methods by probabilistic latent semantic indexing model

No. 21

4. Experimental Results

Results of (b)

Fig. 4.3: Classification error rate for S

Page 22: Classification and clustering methods by probabilistic latent semantic indexing model

No. 22

4. Experimental Results

4.4 Clustering Problem: (c)

Student Questionnaire

Table 4.3: Contents of initial questionnaire

Fixed (item) format: 7 major questions [2]
Examples:
- For how many years have you used computers?
- Do you have a plan to study abroad?
- Can you assemble a PC?
- Do you have any license in information technology?
- Write 10 terms in information technology which you know [4]

Free (text) format: 5 questions [3]
Examples:
- Write about your knowledge and experience on computers.
- What kind of job will you have after graduation?
- What do you imagine from the name of the subject?

[2] Each question has 4-21 minor questions.
[3] Each text is written within 250-300 Chinese and Japanese characters.
[4] There is a possibility to improve the performance of the proposed method by elimination of these items.

Page 23: Classification and clustering methods by probabilistic latent semantic indexing model

No. 23

4. Experimental Results

4.4 Clustering Problem: (c)

Table 4.4: Object classes

Name of subject | Course | Number of students
Introduction to Computer Science (Class CS) | Science Course | 135
Introduction to Information Society (Class IS) | Literary Course | 35

Page 24: Classification and clustering methods by probabilistic latent semantic indexing model

No. 24

4. Experimental Results

Condition of (c)

I) First, the documents of the students in Class CS and those in Class IS are merged.

II) Then, the merged documents are divided into two classes (S=2) by the proposed method.

Fig. 4.4: Class partition problem by the clustering method. The true classes (Class CS and Class IS) are merged, the merged set is divided into two clusters by the proposed clustering method, and the result is compared with the true classes to obtain the clustering error C(e).

Page 25: Classification and clustering methods by probabilistic latent semantic indexing model

No. 25

Results of (c): students, S = 2, K = 2

Fig. 4.7: Clustering process by the EM algorithm, K = 2
[Figure: plots of P(z1|d) for each student (学生) in Class CS and Class IS, for tempering parameters beta = 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 1.00 and EM steps 0 (initial value) through 1024.]

Page 26: Classification and clustering methods by probabilistic latent semantic indexing model

No. 26

[Figure: similarity]

Page 27: Classification and clustering methods by probabilistic latent semantic indexing model

No. 27

4. Experimental Results

S = 2, K = 3 (for comparison, the K-means method with S = K = 2 gives C(e) = 0.411)

Fig. 4.7: Clustering process by the EM algorithm, K = 3
[Figure: scatter plots of p(z1|d) versus p(z2|d) for each student in Class CS and Class IS, showing clusters C1, C2, C3, for tempering parameters β = 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 1.00 and EM steps 0 (initial value) through 1024.]

Page 28: Classification and clustering methods by probabilistic latent semantic indexing model

No. 28

C(e) : the ratio of the number of students in the difference set between the two divided classes and the original classes to the total number of students.

4. Experimental Results

Fig.4.5 Clustering error rate C(e) vs. λ

[λ axis endpoints: Item only, Text only]
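A small sketch of the clustering error C(e) defined above for the two-class case (S = 2); matching the two clusters to the two original classes by taking the better of the two possible assignments is a simplification of this sketch.

```python
def clustering_error(true_labels, cluster_labels):
    """C(e): fraction of students whose cluster disagrees with their original
    class, under the better of the two cluster-to-class assignments (S = 2)."""
    n = len(true_labels)
    direct = sum(t != c for t, c in zip(true_labels, cluster_labels))
    return min(direct, n - direct) / n

# Example: true_labels and cluster_labels would be length-170 lists of 0/1 labels
# (135 Class CS students and 35 Class IS students).
```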

Page 29: Classification and clustering methods by probabilistic latent semantic indexing model

No. 29

4. Experimental Results

Fig.4.6 Clustering error rate C(e) vs. K

Page 30: Classification and clustering methods by probabilistic latent semantic indexing model

No. 30

4. Experimental Results

Results of (c)

Statistical analysis by discriminant analysis

zj = a0 + a1 x1j + a2 x2j + … + a5 x5j

zj ≥ 0 : dj → Class CS
zj < 0 : dj → Class IS
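A toy sketch of the linear discriminant above; the coefficients a_0, ..., a_5 and the feature values are made-up numbers, since the fitted values are not given in the slides.

```python
import numpy as np

def discriminant_class(x, a):
    """z_j = a_0 + a_1*x_1j + ... + a_5*x_5j; the sign of z_j decides the class
    (Class CS for z_j >= 0, as reconstructed above)."""
    z = a[0] + float(np.dot(a[1:], x))
    return "Class CS" if z >= 0 else "Class IS"

# Made-up coefficients and features, for illustration only.
a = np.array([-0.3, 0.8, -0.2, 0.5, 0.1, -0.4])
x = np.array([1.0, 0.0, 1.0, 2.0, 0.5])
print(discriminant_class(x, a))
```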

Page 31: Classification and clustering methods by probabilistic latent semantic indexing model

No. 31

4. Experimental Results

Another Experiment

Clustering for the class partition problem, using only the initial questionnaire (IQ): Class CS is divided into Class S and Class G by the proposed clustering method, and the clustering error C(e) is measured.

Fig.: Another class partition problem by the clustering method

S: Specialist
G: Generalist

Page 32: Classification and clustering methods by probabilistic latent semantic indexing model

No. 32

4. Experimental Results

(1) Membership of students in each class

Characteristics of students in each class:

By the student's selection:
- Class S: having a good knowledge of technical terms; hoping to be evaluated by examination
- Class G: having much interest in the use of a computer

By clustering:
- Class S: having much interest in theory; having higher motivation for graduate school
- Class G: having much interest in the use of a computer; having a good knowledge of systems using the computer

Page 33: Classification and clustering methods by probabilistic latent semantic indexing model

No. 33

(2) Membership of students in each class

By discriminant analysis, the two classes obtained from each partition are evaluated and interpreted as in Table 5. The partition most suitable for the characteristics of the students should be chosen.

4. Experimental Results4. Experimental Results

Page 34: Classification and clustering methods by probabilistic latent semantic indexing model

No. 34

5. Concluding Remarks

(1) We have proposed a classification method for a set of documents and extended it to a clustering method.

(2) The classification method exhibits better performance for document sets of comparatively small size by using the idea of the PLSI model.

(3) The clustering method also has good performance. We show that it is applicable to documents with both fixed and free formats by introducing a weighting parameter λ.

(4) As an important related study, it is necessary to develop a method for abstracting the characteristics of each cluster [HIIGS03][IIGSH03-b].

(5) An extension to a method for a set of documents of comparatively large size also remains for further investigation.