Self Organizing Maps



Page 1: Self Organizing Maps

Kohonen’s Self Organizing Maps | October 26, 2004

Self Organizing Maps

Mahendra Mani Ojha 01005024
Pranshu Sharma 01005026
Shivendra S. Meena 01005030

Under the Guidance of:

Prof. Pushpak Bhattacharya

Page 2: Self Organizing Maps


Overview

Terminology used
Introduction of SOM
Components of SOM
Structure of the map
Training algorithms of the map
Advantages and disadvantages
Proof of Convergence
Applications
Conclusion
Reference

Page 3: Self Organizing Maps


Terminology used

Clustering
Unsupervised learning
Euclidean Distance

$$p = (p_1, p_2, \ldots, p_n)$$

$$q = (q_1, q_2, \ldots, q_n)$$

$$ED = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$
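As a small illustration of the Euclidean distance defined above, the following sketch computes it for two vectors of equal length. The function name `euclidean_distance` is chosen here for illustration and does not come from the slides.

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two equal-length vectors p and q."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of components")
    # Square root of the sum of squared component differences.
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```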

Page 4: Self Organizing Maps


Introduction of SOM

Introduced by Prof. Teuvo Kohonen in 1982
Also known as the Kohonen feature map
Unsupervised neural network
Clustering tool for high-dimensional and complex data

Page 5: Self Organizing Maps


Introduction of SOM contd…

Maintains the topology of the dataset
Training occurs via competition between the neurons
Impossible to assign network nodes to specific input classes in advance
Can be used for detecting similarity and degrees of similarity
It is assumed that input patterns fall into sufficiently large distinct groupings
Random weight vector initialization

Page 6: Self Organizing Maps


Components of SOM

Sample data

Weights

Output nodes

Page 7: Self Organizing Maps


Structure of the map

2-dimensional or 1-dimensional grid

Each grid point represents an output node
The grid is initialized with random vectors

Page 8: Self Organizing Maps


Training Algorithm

Initialize Map

For t from 0 to 1

– Select a sample

– Get best matching unit

– Scale neighbors

– Increase t a small amount

End for

$$m_i(t+1) = m_i(t) + \alpha(t)\,[x(t) - m_i(t)], \quad i \in N_c(t)$$
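The loop above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the linear decay of the learning coefficient, the linearly shrinking neighborhood radius, the starting values 0.5 and `max(rows, cols) / 2`, and the names `train_som` and `_distance` are all choices made here, not taken from the slides.

```python
import math
import random

def _distance(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def train_som(samples, rows, cols, steps=1000, seed=0):
    """Train a rows x cols map on a list of equal-length sample vectors."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Initialize map: one random weight vector per grid point.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    max_radius = max(rows, cols) / 2.0
    for step in range(steps):
        t = step / steps                  # t runs from 0 towards 1
        alpha = 0.5 * (1.0 - t)           # assumed linear decay of the learning coefficient
        radius = max_radius * (1.0 - t)   # assumed shrinking neighborhood
        x = rng.choice(samples)           # select a sample
        # Best matching unit: the node whose weights are closest to x.
        bmu = min(weights, key=lambda node: _distance(weights[node], x))
        # Scale neighbors: move every node within the radius towards x.
        for node, m in weights.items():
            if _distance(node, bmu) <= radius:
                weights[node] = [mi + alpha * (xi - mi) for mi, xi in zip(m, x)]
    return weights

som = train_som([[0.0, 0.0], [1.0, 1.0]], rows=3, cols=3, steps=200)
```

Nodes outside the neighborhood are simply left untouched in each step, so their weights carry over unchanged.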

Page 9: Self Organizing Maps


Page 10: Self Organizing Maps


Page 11: Self Organizing Maps


Initializing the weights

SOMs are computationally very expensive.
Good initialization
– Fewer iterations
– Better quality of map

Page 12: Self Organizing Maps


Get Best Matching Unit

Any method for vector distance can be used, e.g.
– Nearest neighbor
– Farthest neighbor
– Distance between means
– Distance between medians
Most common method is Euclidean distance.

$$\sqrt{\sum_{i=0}^{n} x_i^2}$$

If more than one node ties for the best match, choose one at random.
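A possible sketch of the best matching unit search with random tie-breaking follows. It assumes that $x_i$ in the formula above stands for the difference between the i-th components of the sample and of a node's weight vector, which the slide does not state explicitly. The function name and the dictionary layout are illustrative.

```python
import math
import random

def best_matching_unit(sample, weights, rng=random):
    """Return the key of the node whose weight vector is nearest to sample.

    weights maps a node identifier to its weight vector. If several nodes
    are equally close, one of them is chosen at random.
    """
    def distance(w):
        return math.sqrt(sum((s - wi) ** 2 for s, wi in zip(sample, w)))

    best = min(distance(w) for w in weights.values())
    contestants = [node for node, w in weights.items() if distance(w) == best]
    return rng.choice(contestants)

nodes = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [0.0, 0.0]}
print(best_matching_unit([0.9, 0.8], nodes))  # b
```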

Page 13: Self Organizing Maps


Scale Neighbors

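Determining neighbors
– Neighborhood size: decreases over time
– Effect on neighbors

Learning

$$m_i(t+1) = \begin{cases} m_i(t) + \alpha(t)\,[x(t) - m_i(t)], & i \in N_c(t) \\ m_i(t), & \text{otherwise} \end{cases}$$

$$\alpha(t) = \exp\left[-(3/2) \times \lVert r_i - r_m \rVert\right]$$

$\alpha(t)$ = learning coefficient
$r$ = position vector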
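The piecewise learning rule can be illustrated with a short sketch. The function names, the dictionary layout, the fixed `alpha` argument and the linear radius schedule are assumptions made for this example; the slides only say that the neighborhood size decreases over time.

```python
import math

def scale_neighbors(weights, positions, winner, sample, alpha, radius):
    """Apply one update step to every node within radius of the winner.

    weights:   node -> weight vector
    positions: node -> grid position vector
    Nodes outside the neighborhood keep their weights (the "otherwise" case).
    """
    updated = {}
    for node, m in weights.items():
        grid_dist = math.sqrt(sum((a - b) ** 2
                                  for a, b in zip(positions[node], positions[winner])))
        if grid_dist <= radius:
            # Inside the neighborhood: move the weights towards the sample.
            updated[node] = [mi + alpha * (xi - mi) for mi, xi in zip(m, sample)]
        else:
            # Outside the neighborhood: weights are unchanged.
            updated[node] = list(m)
    return updated

def neighborhood_radius(t, initial_radius):
    """Assumed schedule: the radius shrinks linearly as t goes from 0 to 1."""
    return initial_radius * (1.0 - t)

positions = {0: (0, 0), 1: (0, 1), 2: (0, 5)}
weights = {0: [0.0], 1: [0.0], 2: [0.0]}
print(scale_neighbors(weights, positions, 0, [1.0], alpha=0.5, radius=1.0))
# {0: [0.5], 1: [0.5], 2: [0.0]}
```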

Page 14: Self Organizing Maps


Necessary conditions

Amount of training data

Change of weights should be
– In excited neighborhood
– Proportional to activation received

Page 15: Self Organizing Maps


Advantages
– Very easy to understand
– Works well

Disadvantages
– Computationally expensive
– Every SOM is different (results depend on the random initialization)

Page 16: Self Organizing Maps


Proof of convergence

Complete proof only for one dimension.
– Very trivial
Almost all partial proofs are based on
– Markov chains
Difficulties:
– No definition for “A correctly ordered configuration”
– Proved result: It is not possible to associate a “Global decreasing potential function” with this algorithm.

Page 17: Self Organizing Maps


WEBSOM

Page 18: Self Organizing Maps


WebSOM (overview)

Millions of documents to be searched
Keywords or key phrases used for searching
Data is clustered
– According to similarity
– Context
It is a kind of similarity graph of the data
For proper storage, raw text documents must be encoded for mapping.

Page 19: Self Organizing Maps


Feature Vectors / Encoding

Can simply be histograms of the words of the document.

(The histogram can serve as the input vector, but that makes the input vector very large, so some kind of reduction is needed.)

Reduction
– Reduction by random mapping
– Weighted word histogram (based on word frequency)
– By Latent Semantic Analysis
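To make the first reduction option concrete, here is a minimal sketch of a word histogram followed by a random mapping to a lower dimension. The function names, the whitespace tokenization and the Gaussian entries of the projection matrix are assumptions for illustration; the slides do not specify how the random matrix is drawn.

```python
import random
from collections import Counter

def word_histogram(text, vocabulary):
    """Count how often each vocabulary word occurs in text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

def random_projection_matrix(in_dim, out_dim, seed=0):
    """out_dim x in_dim matrix with Gaussian entries (an assumed choice)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

def reduce_by_random_mapping(histogram, matrix):
    """Project the histogram onto the lower-dimensional space."""
    return [sum(r * h for r, h in zip(row, histogram)) for row in matrix]

vocabulary = ["map", "neuron", "weight", "grid", "data"]
hist = word_histogram("the map is a grid and each grid point has a weight", vocabulary)
print(hist)  # [1, 0, 1, 2, 0]

matrix = random_projection_matrix(in_dim=len(vocabulary), out_dim=2)
reduced = reduce_by_random_mapping(hist, matrix)  # 2 components instead of 5
```

With a real vocabulary the histogram has one entry per distinct word, so the reduced vector is what would be fed to the map.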

Page 20: Self Organizing Maps


WebSOM

Architecture
– Word category map
– Document category map
Modes of operation
– Supervised
(some information about the class is given, e.g. in a collection of newsgroup articles the name of the newsgroup may be supplied)
– Unsupervised
(no information provided)

Page 21: Self Organizing Maps


Word Category Map

Preprocessing
– Remove unimportant data (like images, signatures)
– Remove articles, prepositions, etc.
– Words occurring fewer than some fixed number of times are treated as “don’t care”
– Replace synonymous words

Page 22: Self Organizing Maps


Averaging Method

Word code vector
– Each word is represented by a unique vector (with dimension n ~ 100)
– Values may be random
Context vector
– For the word at position i, the word vector is x(i)

where:

– E() = estimate of the expected value of x over the text corpus
– ε = small scalar number
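The averaging step can be sketched as follows. The sketch assumes that the context vector of a word is formed by concatenating the average code of the preceding words, ε times the word's own code, and the average code of the following words, i.e. roughly $X(i) = [E\{x(i-1)\},\ \varepsilon\,x(i),\ E\{x(i+1)\}]$. That form, the value ε = 0.2, the Gaussian codes and all function names are assumptions made for this example.

```python
import random
from collections import defaultdict

def word_codes(vocabulary, dim=100, seed=0):
    """Assign each word a random code vector of dimension dim."""
    rng = random.Random(seed)
    return {w: [rng.gauss(0.0, 1.0) for _ in range(dim)] for w in vocabulary}

def mean(vectors, dim):
    """Component-wise average; a zero vector if there is nothing to average."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]

def context_vectors(tokens, codes, dim, epsilon=0.2):
    """Build one context vector of length 3 * dim for each word in codes."""
    before, after = defaultdict(list), defaultdict(list)
    for i, word in enumerate(tokens):
        if i > 0:
            before[word].append(codes[tokens[i - 1]])   # code of the preceding word
        if i + 1 < len(tokens):
            after[word].append(codes[tokens[i + 1]])    # code of the following word
    return {w: mean(before[w], dim) + [epsilon * v for v in codes[w]] + mean(after[w], dim)
            for w in codes}

tokens = "the cat sat on the mat".split()
codes = word_codes(sorted(set(tokens)), dim=4)
contexts = context_vectors(tokens, codes, dim=4)
print(len(contexts["cat"]))  # 12
```

Words that appear in similar surroundings end up with similar averaged parts, which is what lets the map place them on nearby nodes.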

Page 23: Self Organizing Maps


(contd.)

Training: take words with different x(i)’s

Input the X(i)’s again. At the best matching node, write the corresponding word.

Words with similar contexts come to the same node

Example

Page 24: Self Organizing Maps


Document Category Map

Documents are encoded by mapping their text word by word onto the word category map (WCM).

A histogram is formed based on the hits on the WCM.

This histogram is used as the fingerprint for the document category map (DCM).
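A rough sketch of this encoding step is shown below. It assumes that the word category map has already been trained and that a lookup table from words to node indices is available. The names `document_fingerprint` and `word_to_node`, and the normalization by the number of hits, are assumptions for illustration.

```python
from collections import Counter

def document_fingerprint(tokens, word_to_node, num_nodes):
    """Histogram of word category map hits for one document.

    word_to_node maps each known word to the index of its best matching
    node on the word category map; words missing from it are skipped.
    """
    hits = Counter(word_to_node[w] for w in tokens if w in word_to_node)
    total = sum(hits.values())
    if total == 0:
        return [0.0] * num_nodes
    # Normalizing by the number of hits is an assumption, not stated on the slide.
    return [hits[n] / total for n in range(num_nodes)]

word_to_node = {"map": 0, "grid": 0, "neuron": 1}
fingerprint = document_fingerprint(["map", "grid", "neuron", "the"], word_to_node, 3)
```

The resulting vector has one entry per word category map node and would serve as the input sample when training the document category map.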

Page 25: Self Organizing Maps


Summary:

Page 26: Self Organizing Maps


Demo

Page 27: Self Organizing Maps


References

[1] T. Honkela, S. Kaski, K. Lagus, T. Kohonen. WEBSOM - Self-Organizing Maps of Document Collections. (1997)
[2] T. Honkela, S. Kaski, K. Lagus, T. Kohonen. Exploration of Full-Text Databases with Self-Organizing Maps. (1996)
[3] Teuvo Kohonen. Self-Organization of Very Large Document Collections: State of the Art. (1998)
[4] http://websom.hut.fi/websom