
Learning

Ling 411 – 19

Operations in relational networks

Relational networks are dynamic

Activation moves along lines and through nodes

Links have varying strengths
• A stronger link carries more activation, other things being equal

All nodes operate on two principles:
• Integration of incoming activation
• Broadcasting to other nodes

REVIEW

Operation of the Network in terms of cortical columns

The linguistic system operates as distributed processing of multiple individual components
• “Nodes” in an abstract model
• These nodes are implemented as cortical columns

Columnar Functions
• Integration: A column is activated if it receives enough activation from other columns
  • Can be activated to varying degrees
  • Can keep activation alive for a period of time
• Broadcasting: An activated column transmits activation to other columns
  • Excitatory – contribution to higher level
  • Inhibitory – dampens competition at same level
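The two columnar functions above can be sketched in a few lines of Python. This is only an illustrative toy, not the course's model: the `Node` class, its field names, and the numeric strengths and threshold are all assumptions, and the sketch resets activation after each broadcast rather than keeping it alive over time.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node (column) that integrates and broadcasts activation."""
    name: str
    threshold: float = 1.0                      # assumed value, for illustration
    incoming: float = 0.0                       # activation integrated so far
    links: dict = field(default_factory=dict)   # target name -> link strength

    def receive(self, amount: float) -> None:
        # Integration: add up activation arriving from other nodes.
        self.incoming += amount

    def broadcast(self, network: dict) -> bool:
        # Broadcasting: if the threshold is satisfied, send activation along
        # every outgoing link; a stronger link carries more activation.
        fired = self.incoming >= self.threshold
        if fired:
            for target, strength in self.links.items():
                network[target].receive(self.incoming * strength)
        self.incoming = 0.0                     # simplification: no persistence
        return fired


# Toy network: A and B each have a link to C.
net = {
    "A": Node("A", links={"C": 0.6}),
    "B": Node("B", links={"C": 0.5}),
    "C": Node("C"),
}
net["A"].receive(1.0)
net["B"].receive(1.0)
net["A"].broadcast(net)
net["B"].broadcast(net)
c_input = net["C"].incoming        # what C integrated from A and B together
c_fired = net["C"].broadcast(net)  # C's threshold is satisfied only jointly
```

Neither link alone would satisfy C's threshold in this toy setup; only the combined input from A and B does.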

Review

Additional operations: Learning

Links get stronger when they are successfully used (Hebbian learning)
• Learning consists of strengthening them
• Hebb 1949

Threshold adjustment
• When a node is recruited its threshold increases
• Otherwise, nodes would be too easily satisfied

Neural processes for learning

Basic principle: when a connection is successfully used, it becomes stronger
• Successfully used if another connection to same node is simultaneously active

Mechanisms of strengthening
• Biochemical changes at synapses
• Growth of dendritic spines
• Formation of new synapses

Weakening: when neurons fire independently of each other their mutual connections (if any) weaken

Neural processes for learning

If connections AC and BC are active at the same time, and if their joint activation is strong enough to activate C, they both get strengthened

(adapted from Hebb)

Synapses here get strengthened
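A minimal sketch of the AC/BC case, assuming each active source sends one unit of activation and that strengthening is a fixed increment. The function name `hebbian_step`, the `rate` parameter, and all numbers are hypothetical; weakening of unused links is not modeled here.

```python
def hebbian_step(weights, active, threshold=1.0, rate=0.1):
    """Strengthen the active links into C if they jointly activate C.

    weights: dict mapping source name -> strength of its link into C
    active:  set of sources currently sending activation (1.0 each, assumed)
    Returns (fired, new_weights); weights are unchanged if C does not fire.
    """
    joint = sum(weights[source] for source in active)
    if joint < threshold:
        return False, dict(weights)
    strengthened = {
        source: (w + rate if source in active else w)
        for source, w in weights.items()
    }
    return True, strengthened


links_into_c = {"A": 0.6, "B": 0.5, "D": 0.2}

# A and B active together: joint activation reaches the threshold,
# so AC and BC are both strengthened while DC is left alone.
fired_ab, after_ab = hebbian_step(links_into_c, {"A", "B"})

# A active alone: not enough to activate C, so nothing is strengthened.
fired_a, after_a = hebbian_step(links_into_c, {"A"})
```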

Requirements that must be assumed (implied by the Hebbian learning principle)

Prerequisites:
• Initially, connection strengths are very weak
  Term: Latent Links
• They must be accompanied by nodes
  Term: Latent Nodes
• Latent nodes and latent connections must be available for learning anything learnable

The Abundance Hypothesis
• Abundant latent links
• Abundant latent nodes

Abundance is a property of biological systems generally

Cf.: Acorns falling from an oak tree

Cf.: A sea tortoise lays thousands of eggs
• Only a few will produce viable offspring

Cf. Edelman: “silent synapses”
• The great preponderance of cortical synapses are “silent” (i.e., latent)

Electrical activity sent from a cell body to its axon travels to thousands of axon branches, even though only one or a few of them may lead to downstream activation

Learning – The Basic Process

[Diagram labels: latent nodes, latent links, dedicated nodes and links]

Let these links get activated

Then these nodes will get activated

That will activate these links

Learning – The Basic Process

This node gets enough activation to satisfy its threshold

These links get strengthened and the node’s threshold gets raised

This node is therefore recruited

Learning – The Basic Process

This node is now dedicated to function AB

Learning

Next time it gets activated it will send activation on these links to next level
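The recruitment steps on these slides can be sketched as follows. Everything here is an illustrative assumption: the dictionary layout, the `recruit` and `responds` helpers, and the particular strengths, gains, and thresholds. The point is only that a latent node starts with weak links and a low threshold, and that raising the threshold on recruitment keeps the node from being "too easily satisfied" afterwards.

```python
def recruit(node, inputs, link_gain=0.5, threshold_gain=0.6):
    """One learning event: recruit a latent node if its threshold is satisfied."""
    total = sum(node["links"].get(source, 0.0) for source in inputs)
    if total < node["threshold"]:
        return False
    for source in inputs:
        if source in node["links"]:
            node["links"][source] += link_gain   # the used links get strengthened
    node["threshold"] += threshold_gain          # the node's threshold gets raised
    node["dedicated_to"] = "".join(sorted(inputs))
    return True


def responds(node, inputs):
    """Would this node's threshold be satisfied by the given active inputs?"""
    return sum(node["links"].get(s, 0.0) for s in inputs) >= node["threshold"]


# A latent node: very weak links from A, B and C, and a low threshold.
latent = {"threshold": 0.15, "links": {"A": 0.1, "B": 0.1, "C": 0.1}, "dedicated_to": None}

a_alone_before = recruit(latent, {"A"})      # one weak link is not enough
recruited = recruit(latent, {"A", "B"})      # A and B together recruit the node

a_alone_after = responds(latent, {"A"})      # the raised threshold blocks A alone
ab_after = responds(latent, {"A", "B"})      # the learned combination still works
```

After recruitment the node is dedicated to AB: the link from C remains latent, and A by itself no longer satisfies the raised threshold.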

Learning: more terms

Child nodes

Potential Actual

Parent nodes

Learning: Deductions from the basic process

Learning is generally bottom-up

The knowledge structure as learned by the cognitive network is hierarchical: it has multiple layers

Hierarchy and proximity:
• Logically adjacent levels in a hierarchy can be expected to be locally adjacent

Excitatory connections are predominantly from one layer of a hierarchy to the next

Higher levels will tend to have larger numbers of nodes than lower levels

Learning in cortical networks: A Darwinian process

The abundance hypothesis
• Needed to allow flexibility of learning
• Abundant latent nodes
  • Must be present throughout cortex
• Abundant latent connections of a node
  • Every node must have abundant latent links

A trial-and-error process:
• Thousands of connection possibilities available
  • The abundance hypothesis
• Strengthen those few that succeed
  • Cf. natural selection
  • “Neural Darwinism” (Edelman)

Anatomical support for the hypothesis of abundant latent links

A typical pyramidal node has
• thousands of incoming synapses connecting to its dendrites and its cell body
• thousands of output synapses from multiple branches of its axon

But only a very few of these are recruited for a specific function
• For example, the typical node in a functional web has perhaps only dozens or maybe up to 100 or so links

By far the great preponderance of these are latent
• Edelman: “silent synapses”

Learning – Enhanced understanding

This “basic process” is not the full story

The nodes of the above depiction:
• Are they minicolumns, maxicolumns, or what?
• Most likely, a bundle of contiguous columns
• Often a maxicolumn or hypercolumn

Columns of different sizes

Minicolumn
• Basic anatomically described unit
• 70-110 neurons (avg 75-80)
• Diameter barely more than that of pyramidal cell body (30-50 μ)

Maxicolumn (term used by Mountcastle)
• Diameter 300-500 μ
• Bundle of 100 or more contiguous minicolumns

Hypercolumn – up to 1 mm diameter
• Can be long and narrow rather than cylindrical
• Bundle of contiguous maxicolumns

Functional column
• Intermediate between minicolumn and maxicolumn
• A contiguous group of minicolumns

REVIEW

Hypercolumns: Modules of maxicolumns

A homotypical area in the temporal lobe of a macaque monkey

REVIEW

Functional columns vis-à-vis minicolumns and maxicolumns

Maxicolumn
• About 100 minicolumns
• About 300-500 microns in diameter

Functional column
• A group of one to several contiguous minicolumns within a maxicolumn
• Established during learning
• Initially it might be an entire maxicolumn

Learning in a system with columns of different sizes

At early learning stage, maybe a whole hypercolumn gets recruited

Later, maxicolumns for further distinctions

Still later, functional columns as subcolumns within maxicolumns

New term: Supercolumn – a group of minicolumns of whatever size: hypercolumn, maxicolumn, or functional column
• Any supercolumn is potentially divisible

Links between supercolumns will thus consist of multiple fibers

Functional columns in phonological recognition: A hypothesis

Demisyllable (e.g. /de-/) activates a maxicolumn

Different functional columns within the maxicolumn for syllables with this demisyllable
• /ded/, /deb/, /det/, /dek/, /den/, /del/

REVIEW

Functional columns in phonological recognition: A hypothesis

A maxicolumn (ca. 100 minicolumns)

Divided into functional columns

(Note that all respond to /de-/)

ded den de- det del

REVIEW

Functional columns in phonological recognition: A hypothesis

This one learned first

Then, subdivisions are established

ded den de- det del

Adjacent maxicolumns in phonological cortex?

ge- ke-
be- pe-
te- de-

A module of contiguous maxicolumns

Each of these maxicolumns is divided into functional columns

Note that the entire module responds to [-e-]

Hypercolumn

REVIEW

Latent supercolumns

Bundles of latent links

Dedicated supercolumns and links

Revisit the basic learning diagram: Let each node represent a supercolumn

Let these links get activated

Learning – The Basic Process: Refined view

Then these supercolumns get activated

Learning – The Basic Process: Refined view

That will activate these links

Learning – Refined view

This supercolumn gets enough activation to satisfy its threshold

Learning – Refined view

This supercolumn is recruited for function AB

Learning: Refined view

Next time it gets activated it will send activation on these links to next level

Learning: Refined view

Can get subdivided for finer distinctions

Learning: Refined view

Hypercolumn composed of 3 maxicolumns – can get subdivided for finer distinctions

A further enhancement

Minicolumns within a supercolumn have mutual horizontal excitatory connections

Therefore, some minicolumns can get activated from their neighbors even if they don’t receive activation from outside

Learning: refined view

If, later, C is activated along with A and B, then maxicolumn ABC is recruited for ABC

Learning: refined view

And the connection from C to ABC is strengthened – it is no longer latent

Learning phonological distinctions: A hypothesis

ge- ke-
be- pe-
te- de-

1. In learning, this hypercolumn gets established first, responding to [-e-]

2. It gets subdivided into maxicolumns for demisyllables

ded den de- det del

3. The maxicolumn gets divided into functional columns

Remaining question – learning lateral inhibition

When a hypercolumn is first recruited, no lateral inhibition among its internal subdivisions
• (Or very little)

Later, when finer distinctions are learned, they get reinforced by lateral inhibition
• Latent inhibitory neurons become activated

Question: How does this process work?
• I.e., what makes these inhibitory neurons change from latent to active?

“Evolutionary Learning” and the Proximity Principle

Related functions tend to be in close proximity
• If very closely related, they tend to be adjacent

Areas which integrate properties of different subsystems (e.g., different sensory modalities) tend to be in locations intermediate between those subsystems

Evolutionary Learning and the Proximity Principle

Start with the observation:
• Related areas tend to be adjacent to each other
  • Primary auditory and Wernicke’s area
  • V1 and V2, etc.
  • Wernicke’s area and lexical-conceptual information – angular gyrus, MTG
• Thus we have the ‘proximity principle’

Question: Why – How to explain?

How to Explain the Proximity Principle?

Factors responsible for observations of proximity in cortical structure

1. Economic necessity
2. Genetic factors
3. Experience – provides details of localization within the limits imposed by genetic factors

Proximity: Economic necessity

Question: Could a given column be connected to any other column anywhere in the cortex?

That would require a huge number of available latent connections

Way more than are present

Hence there are strict limits on intercolumn connectivity

Therefore, proximity is necessary just for economy of representation

Limits on intercolumn connectivity

Number of cortical minicolumns:
• If 27 billion neurons in entire cortex
• If avg. 77 neurons per minicolumn
• Then 350 million minicolumns in the cortex

Extent of available latent connections to other columns
• Perhaps 35,000 to 350,000
• Do the math...

A given column has available latent connections to between 1/1000 and 1/10000 of the other columns in the cortex
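The arithmetic behind these figures can be checked directly. The short script below uses only the numbers given on the slide; the variable names are chosen here for illustration.

```python
neurons_in_cortex = 27e9          # "27 billion neurons in entire cortex"
neurons_per_minicolumn = 77       # "avg. 77 neurons per minicolumn"

# Roughly 350 million minicolumns.
minicolumns = neurons_in_cortex / neurons_per_minicolumn

# For each estimate of available latent connections, work out what share
# of all minicolumns a single column could reach, expressed as "1 in N".
latent_link_estimates = (35_000, 350_000)
one_in_n = [minicolumns / links for links in latent_link_estimates]

for links, n in zip(latent_link_estimates, one_in_n):
    print(f"{links:,} latent links: about 1 in {round(n):,} minicolumns")
```

With 35,000 latent links a column reaches roughly 1 in 10,000 of the other minicolumns, and with 350,000 roughly 1 in 1,000, which matches the range stated above.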

Locations of available latent connections

Local
• Surrounding area
• Horizontal connections (grey matter)

Intermediate
• Short-distance fibers in white matter
• For example from one gyrus to neighboring gyrus

Long-distance
• Long-distance fiber bundles
• At ends, considerable branching

The role of long-distance fibers

Arcuate fasciculus
• Genetically determined
• Limits location of phonological recognition area

Interhemispheric fibers
• Also genetically determined
• Wernicke’s area – RH homolog of W’s area
• Broca’s area – RH homolog of B’s area
• Etc.

Cortical connectivity properties (Cf. Pulvermüller 2002:17)

Probability of adjacent areas being connected: >70%
• But if we count by columns instead of cells the figure is probably higher, maybe close to 100%

Probability of distant areas being connected: 15-30%
• Distant areas: at least one intervening area
• In macaque monkey, most areas have links to 10 or more other areas within same hemisphere


More cortical connectivity properties

Most areas are connected to homotopic area of opposite hemisphere

Most connections between areas are reciprocal

Primary areas not directly connected to one another, except for motor-somatosensory
• Connections under central sulcus

Degrees of separation between cortical neurons or columns

For neurons of neighboring columns: 1

For distant neurons in same hemisphere
• Range: 1 to about 5 or 6 (estimate)
• Mostly 1, 2, or 3, especially if functionally closely related
• Average about 3 (estimate)

For opposite hemisphere
• Add 1 to figures for same hemisphere

Probably, for any two columns anywhere in the cortex, whether functionally related or not, fewer than 6 degrees of separation
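"Degrees of separation" here is the fewest links that must be traversed to get from one column to another. As a hedged illustration, a breadth-first search computes that number on any connectivity graph; the toy graph below is invented for the example and is not anatomical data.

```python
from collections import deque


def degrees_of_separation(links, start, goal):
    """Fewest links between two columns (breadth-first search), or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        column, steps = queue.popleft()
        if column == goal:
            return steps
        for neighbor in links.get(column, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, steps + 1))
    return None  # no path at all


# Hypothetical reciprocal connections among four columns in a chain.
toy_links = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
}

neighbors = degrees_of_separation(toy_links, "a", "b")   # adjacent columns
distant = degrees_of_separation(toy_links, "a", "d")     # two columns in between
```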

Some long-distance fiber bundles (schematic)

Two Factors in Localization

Genetic factors determine general area for a particular type of knowledge

Within this general area the learning-based proximity factors select a more narrowly defined location

Thus the exact localization depends on experience of the individual

When part of the system is damaged, learning-based factors can take over and result in an abnormal location for a function – plasticity

Genetically determined proximity

Genetically-determined proximity would have developed over a long period of evolution
• Many features are shared with other mammals

This process could be called ‘evolutionary learning’

According to standard evolutionary theory...
• A process of trial-and-error:
  • Trial: Produce varieties
  • Error: Most varieties will not survive/reproduce
  • The others – the best among them – are selected

Other genetic factors supplement proximity
• Long-distance fiber bundles

Some innate factors relating to localization

Primary areas

Long-distance fiber bundles

Innate factors relating to primary areas

Location
• Genetically determined locations
• But there are exceptions
  • Malformation
  • Damage

Structure
• Genetically determined structures adapted to sensory modality (they have to be where they are)
• Heterotypical structures
  • Found in primary areas
    • Primary visual
    • Primary auditory

A Heterotypical (i.e., genetically built-in) structure: Visual motion perception

An area in the posterior bank of the superior temporal sulcus of a macaque monkey (“V-5”)

A heterotypical area

Albright et al. 1984

400-500 μ

REVIEW

A Heterotypical structure: Auditory areas in a cat’s cortex

AAF – Anterior auditory field
A1 – Primary auditory field
PAF – Posterior auditory field
VPAF – Ventral posterior auditory field

REVIEW

Innate factors relating to localization

The primary areas

Long-distance fiber bundles
• Interhemispheric – via corpus callosum
• Longitudinal – from front to back
  • Arcuate fasciculus is part of the superior longitudinal fasciculus

They allow for exceptions to proximity
• Areas closely related yet not neighboring

Implications of the proximity principle

System level
• Functionally related subsystems will tend to be close to one another
• Neighboring subsystems will probably have related functions

Cortical column level
• Nodes for similar functions should be physically close to one another
• Nodes that are physically close to one another probably have similar functions

Therefore...
• Neighboring nodes are likely to be competitors
• They need to have mutually inhibitory connections

Applying the proximity principle

For both types (genetic and experience-based) we can make predictions of where various functions are most likely to be located, based on the proximity principle
• Broca’s area near the inferior precentral gyrus
• Wernicke’s area near the primary auditory area

Such predictions are possible even in cases where we don’t know whether genetics or learning is responsible
• maybe both

Deriving location from proximity hypothesis

The cortex has to provide for “decoding” speech input

Speech input enters the cortex in the primary auditory area

Results of the “decoding” (recognition of syllables etc.) are represented in Wernicke’s area

Why is Wernicke’s area where it is?

Speech Recognition in the Left Hemisphere

Primary Auditory Area

Phonological Recognition

Phonological Production

Wernicke’s Area

Exercise: Location of Wernicke’s area

Why is phonological recognition in the posterior superior temporal gyrus?
• Alternatives to consider:
  • Anterior to primary auditory cortex
    • Advantage: would be close to phonological production
  • Inferior to primary auditory cortex

(There are two reasons)

Answer: Location of Wernicke’s area

Wernicke’s area pretty much has to be where it is to take advantage of the arcuate fasciculus

The location of W.’s area makes it close to angular gyrus, likely area for noun lemmas (morphemes and complex morphemes)

Also, close to SMG, presumed area for phonological monitoring
• (Why? Because it is adjacent to primary somatosensory area)

More exercises

Explaining likely locations of morphemes
• verb morphemes in the frontal lobe
• noun morphemes in the angular gyrus and/or middle temporal gyrus

The dorsal (where) pathway of visual perception

Experience-based proximity

Can be expected to be operative
• more at higher (more abstract) levels, less at lower levels
• for areas of knowledge that have developed too recently for evolution to have played a role
  • Reading
  • Writing
  • Higher mathematics
  • Physics, computer technology, etc.

Innate features that support language

Columnar structure
Coding of frequencies in Heschl’s gyrus
Arcuate fasciculus
Interhemispheric connections (via corpus callosum) – e.g., connect Wernicke’s area with RH homolog
Spread of myelination from primary areas to successively higher levels
Left-hemisphere dominance for grammar etc.

Consequences of the Proximity Principle

Nodes in close competition will tend to be neighbors
• And their mutual competition is preordained even though the properties they are destined to integrate will only be established through the learning process

Therefore, inhibitory connections should exist predominantly among nodes of the same hierarchical level
• Confirmed by neuroanatomy
• The presence of their mutual inhibitory connections is presumably specified genetically
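To illustrate what mutual inhibition among same-level neighbors does to competing nodes, here is a minimal sketch. It assumes a single uniform inhibition strength and a simple subtractive rule; the function name and the activation values are hypothetical, and the slides do not specify any particular formula.

```python
def apply_lateral_inhibition(activations, inhibition=0.5):
    """Subtract a share of the neighbors' activation from each node, floored at 0."""
    total = sum(activations.values())
    return {
        name: max(0.0, own - inhibition * (total - own))
        for name, own in activations.items()
    }


# Hypothetical activation of three neighboring functional columns
# that all respond to some degree to the same input.
raw = {"ded": 1.0, "det": 0.8, "den": 0.3}
sharpened = apply_lateral_inhibition(raw)
```

In this toy case the strongest competitor keeps a clear lead (about 0.45 against 0.15), and the weakest is silenced entirely, so the contrast between neighbors is larger after inhibition than before.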

Variation in threshold strength

Thresholds are not fixed
• They vary as a result of use – learning

Nor are they integral

What we really have are threshold functions, such that
• A weak amount of incoming activation produces no response
• A larger degree of activation results in weak outgoing activation
• A still higher degree of activation yields strong outgoing activation
• S-shaped (“sigmoid”) function

N.B. All of these properties are found in neural structures
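One common S-shaped function is the logistic curve, used below purely as an example of a threshold function with the properties listed above. The choice of the logistic form, the `steepness` parameter, and the sample values are assumptions for illustration; the slides only say the function is sigmoid.

```python
import math


def threshold_function(incoming, threshold=1.0, steepness=6.0):
    """Logistic response between 0 and 1; `threshold` is the curve's midpoint."""
    return 1.0 / (1.0 + math.exp(-steepness * (incoming - threshold)))


weak_input = threshold_function(0.2)     # close to no response
medium_input = threshold_function(0.8)   # weak outgoing activation
strong_input = threshold_function(1.8)   # strong outgoing activation

# Raising the threshold (as happens when a node is recruited) shifts the
# whole curve to the right: the same input now produces a weaker response.
after_learning = threshold_function(0.8, threshold=1.5)
```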

Threshold function

[Plot of the threshold function; horizontal axis: incoming activation]
