
Hierarchical Neural Network for Text-Based Learning
Janusz A. Starzyk, Basawaraj
Ohio University, Athens, OH

Introduction

A hierarchical neural network structure for text learning is obtained through self-organization. A similar representation for a text-based semantic network was used in [2]. An input layer takes in characters, then learns and activates words stored in memory. Direct activation of words carries a large computational cost for large dictionaries; extending such a network to phrases, sentences, or paragraphs would make it impractical, and the computer memory required would also be very large.

This leads to a sparse hierarchical structure in which the higher layers represent more complex concepts. The basic nodes of this network are capable of differentiating input sequences. Sequence learning is a prerequisite for building spatio-temporal memories; it is performed using laminar minicolumn [3] LTM cells (Fig. 1).

In such networks, the interconnection scheme is obtained naturally through sequence learning and structural self-organization. No prior assumption is made about locality of connections or sparsity of the structure. The machine learns only the inputs useful to its objectives, a process regulated by reinforcement signals and self-organization.
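As a rough illustration of this cost argument (our own sketch, not the poster's implementation; the function name and toy dictionary are made up), a direct word detector has to test every stored word against each input, so the work grows linearly with dictionary size; the hierarchical network described below avoids this by routing activation through shared intermediate nodes.

# Hypothetical illustration (not from the poster): direct word activation scans
# the whole dictionary for every input, so the cost grows with dictionary size.

def direct_activation(text, dictionary):
    """Return the dictionary words found in `text` by brute-force scanning."""
    active = []
    for word in dictionary:      # one pass over the full dictionary ...
        if word in text:         # ... for every piece of input text
            active.append(word)
    return active

toy_dictionary = ["array", "ray", "arc", "car"]   # stand-in for a 6000-word dictionary
print(direct_activation("an array of rays", toy_dictionary))  # ['array', 'ray']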

Hierarchical Network

The traditional approach is to describe the semantic network structure and/or the transition probabilities of an associated Markov model; biological networks, by contrast, learn. Different neural network structures share a common goal: to solve the given problem simply and efficiently. Sparsity is essential, and for large data sets the size of the network and the time needed to train it matter. A hierarchical structure of identical processing units was proposed in [1]. The layered organization and sparse structure are biologically inspired, and neurons on different layers interact through trained links.

References

[1] Mountcastle, V. B., et al., "Response Properties of Neurons of Cat's Somatic Sensory Cortex to Peripheral Stimuli," J. Neurophysiology, vol. 20, 1957, pp. 374-407.
[2] Rogers, T. T., McClelland, J. L., Semantic Cognition: A Parallel Distributed Processing Approach, MIT Press, 2004.
[3] Grossberg, S., "How does the cerebral cortex work? Learning, attention and grouping by the laminar circuits of visual cortex," Spatial Vision, vol. 12, pp. 163-186, 1999.
[4] Starzyk, J. A., Liu, Y., "Hierarchical spatio-temporal memory for machine learning based on laminar minicolumn structure," 11th ICCNS, Boston, 2007.

Network Simplification

The proposed approach uses intermediate neurons to lower the computational cost. Intermediate neurons decrease the number of activations associated with higher-level neurons, and the concept can be extended to associations of words. A small number of rules suitable for concurrent processing are used, and with them we can arrive at a local optimum of network structure and performance. The network topology self-organizes through the addition and removal of neurons and the redirection of neuron connections. Neurons are described by their sets of input and output neurons. Local optimization criteria are checked by searching the set $SL_A$ before the structure is updated when creating or merging neurons:

$SL_A = \bigcup_{X \in OL_A} IL_X$,

where $IL_X$ is the input list of neuron $X$ and $OL_A$ is the output list of neuron $A$.
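A minimal sketch of this bookkeeping (our own Python illustration; the poster gives no code): each neuron keeps its input list IL and output list OL as sets of neuron names, and SL_A is the union of the input lists of A's outputs. Keeping both directions explicit makes the local rules below cheap to evaluate, at the price of updating two sets per link.

# Minimal sketch (assumed representation): each neuron stores its input list (IL)
# and output list (OL) as sets of neuron names.

class Neuron:
    def __init__(self, name):
        self.name = name
        self.IL = set()   # names of neurons feeding this neuron
        self.OL = set()   # names of neurons this neuron feeds

def connect(network, pre, post):
    """Add a directed link pre -> post, keeping both lists consistent."""
    network[pre].OL.add(post)
    network[post].IL.add(pre)

def SL(network, a):
    """SL_A: the union of IL_X over all X in OL_A (the set searched before updates)."""
    result = set()
    for x in network[a].OL:
        result |= network[x].IL
    return result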

Fig. 3: If $|IL_A \cap IL_B| \geq 3$, create a new node $C$ (shared-input rule):

$IL_C = IL_A \cap IL_B$, $OL_C = \{A, B\}$,
$IL_A \leftarrow (IL_A \setminus IL_C) \cup \{C\}$, $IL_B \leftarrow (IL_B \setminus IL_C) \cup \{C\}$,
$OL_{X_i} \leftarrow (OL_{X_i} \setminus OL_C) \cup \{C\}$ for every $X_i \in IL_C$.

Fig. 4: Neuron "A" with a single output is merged with neuron "B", and "A" is removed:

$IL_B \leftarrow (IL_B \setminus \{A\}) \cup IL_A$.

Fig. 5: Neuron "B" with a single input is merged with neuron "A", and "B" is removed:

$OL_A \leftarrow (OL_A \setminus \{B\}) \cup OL_B$.
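A sketch of how the create and merge rules above could act on this representation (our own illustration; the make_id helper and the surrounding bookkeeping, such as re-pointing the neighbours of a removed neuron, are our additions):

# Sketch of the Fig. 3-5 rules on the Neuron/IL/OL representation above.
# The threshold of 3 shared inputs follows the rule statement; helper names are ours.

def shared_input_rule(network, a, b, make_id, threshold=3):
    """Fig. 3: if A and B share >= threshold inputs, insert an intermediate node C."""
    A, B = network[a], network[b]
    common = A.IL & B.IL
    if len(common) < threshold:
        return None
    c = make_id()
    C = Neuron(c)
    C.IL, C.OL = set(common), {a, b}
    network[c] = C
    for n in (A, B):
        n.IL = (n.IL - common) | {c}
    for x in common:                         # the shared inputs now drive C instead
        network[x].OL = (network[x].OL - {a, b}) | {c}
    return c

def merge_single_output(network, a):
    """Fig. 4: neuron A has a single output B; fold A's inputs into B and drop A."""
    A = network[a]
    (b,) = A.OL
    B = network[b]
    B.IL = (B.IL - {a}) | A.IL
    for x in A.IL:                           # A's former inputs now feed B directly
        network[x].OL = (network[x].OL - {a}) | {b}
    del network[a]

# The Fig. 5 merge (a single-input neuron folded into its predecessor) is symmetric,
# with the roles of IL and OL exchanged.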

Implementation

Batch mode: all the words used for training are available at initiation, and network simplification and optimization is done by processing all the words in the training set. The total number of neurons is 23% higher than the reference (6000).

Dynamic mode: the words used for training are added incrementally, one word at a time, and simplification and optimization is done by processing one word at a time. The total number of neurons is 68% higher than the reference (6000).
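The difference between the two modes is essentially one of control flow; a schematic sketch follows (our own, and add_word and simplify_once are placeholders for the network-building and rule-application steps described above):

# Schematic of the two training modes; the two helpers below are placeholders.

def add_word(net, word):
    net.setdefault("words", []).append(word)   # stands in for creating character/word neurons

def simplify_once(net):
    pass                                        # stands in for applying the Fig. 2-5 rules

def batch_mode(words):
    net = {}
    for w in words:                # all training words are available at initiation
        add_word(net, w)
    simplify_once(net)             # one simplification/optimization pass over everything
    return net

def dynamic_mode(words):
    net = {}
    for w in words:                # words arrive one at a time
        add_word(net, w)
        simplify_once(net)         # simplify after every word: extra bookkeeping overhead
    return net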

Results and Conclusion

Tests were run with dictionaries of up to 6000 words. The percentage reduction in the number of interconnections increases (by up to 65-70%) as the number of words increases. The time required to process network activation for all the words used decreases as the number of words increases (a reduction by a factor of 55 in batch mode and 35 in dynamic mode at 6000 words). The dynamic implementation takes longer than the batch implementation, mainly due to the additional overhead required for bookkeeping, and the savings in connections and activations obtained with the dynamic implementation are smaller than with the batch implementation. A combination of both methods is advisable for continuous learning and self-organization.

Rules for Self-Organization

A few simple rules are used for self-organization.

Fig. 2: If $|OL_A \cap OL_B| \geq 3$, create a new node $C$ (shared-output rule). In the accompanying diagram, neurons A and B initially share the outputs $X_1$, $X_2$, $X_3$, while $X_4$ is not shared; after the rule is applied, C collects the shared outputs and A and B feed C instead:

$OL_C = OL_A \cap OL_B$, $IL_C = \{A, B\}$,
$OL_A \leftarrow (OL_A \setminus OL_C) \cup \{C\}$, $OL_B \leftarrow (OL_B \setminus OL_C) \cup \{C\}$,
$IL_{X_i} \leftarrow (IL_{X_i} \setminus IL_C) \cup \{C\}$ for every $X_i \in OL_C$.

The remaining rules (Figs. 3-5 above) continue this set: an analogous rule for shared inputs and two merge rules for neurons with a single output or a single input.
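In the same Neuron/IL/OL sketch used in the Network Simplification section (again our own code, with the threshold of 3 taken from the rule statement), the Fig. 2 rule could look like this:

# Sketch of the Fig. 2 rule: C takes over the outputs shared by A and B,
# and A, B feed C instead of feeding those outputs directly.

def shared_output_rule(network, a, b, make_id, threshold=3):
    A, B = network[a], network[b]
    common = A.OL & B.OL
    if len(common) < threshold:
        return None
    c = make_id()
    C = Neuron(c)
    C.OL, C.IL = set(common), {a, b}
    network[c] = C
    for n in (A, B):
        n.OL = (n.OL - common) | {c}
    for x in common:                       # the shared outputs now listen to C
        network[x].IL = (network[x].IL - {a, b}) | {c}
    return c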

[Plots (Batch vs. Dynamic, 0-6000 words): Decrease in Processing Time vs. No. of Words (T_batch/T_reference and T_dynamic/T_reference, log scale); Final Connections as Percent of Original Connections vs. No. of Words; Network Simplification Time (sec) vs. No. of Words; Decrease in Activation vs. No. of Words.]

Fig. 1: LTM cell with minicolumns [4]. The figure shows the LTM cell "ARRAY" built from character minicolumns (labels "A", "R", "Y" shown) and a row of PN units, organized across Layers 6, 4, 3, and 2.
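The LTM cell itself is specified in [4] and is not reproduced here; as a loose toy stand-in for the idea of a sequence-differentiating node (not the laminar mechanism), one can picture a unit that fires only when its stored character sequence arrives in order:

# Toy stand-in for a sequence-sensitive node (NOT the laminar LTM mechanism of [4]):
# the unit advances an internal pointer on matching characters and fires only
# when its whole stored sequence has been seen in order.

class ToySequenceCell:
    def __init__(self, sequence):
        self.sequence = sequence
        self.pos = 0

    def step(self, ch):
        if ch == self.sequence[self.pos]:
            self.pos += 1
            if self.pos == len(self.sequence):
                self.pos = 0
                return True          # whole sequence recognized: the cell "fires"
        else:
            # simplified restart: keep the match only if ch restarts the sequence
            self.pos = 1 if ch == self.sequence[0] else 0
        return False

cell = ToySequenceCell("ARRAY")
print([cell.step(c) for c in "ARRAY"])   # [False, False, False, False, True]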