Learning
Ling 411 – 19
Operations in relational networks
Relational networks are dynamic
• Activation moves along lines and through nodes
• Links have varying strengths
  • A stronger link carries more activation, other things being equal
• All nodes operate on two principles:
  • Integration of incoming activation
  • Broadcasting to other nodes
REVIEW
Operation of the Network in terms of cortical columns
The linguistic system operates as distributed processing of multiple individual components
• “Nodes” in an abstract model
• These nodes are implemented as cortical columns
Columnar Functions
• Integration: A column is activated if it receives enough activation from other columns
  • Can be activated to varying degrees
  • Can keep activation alive for a period of time
• Broadcasting: An activated column transmits activation to other columns
  • Excitatory – contribution to higher level
  • Inhibitory – dampens competition at same level
Review
Additional operations: Learning
• Links get stronger when they are successfully used (Hebbian learning)
  • Learning consists of strengthening them
  • Hebb 1948
• Threshold adjustment
  • When a node is recruited, its threshold increases
  • Otherwise, nodes would be too easily satisfied
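The two learning operations above (strengthening used links, raising the threshold of a recruited node) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: `Node`, `recruit`, the starting weights, the gain of 0.4, and the rule "new threshold = 90% of the input that recruited the node" are all hypothetical choices for illustration, not values given in the lecture.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy node: integrates weighted input and fires when the sum meets its threshold."""
    threshold: float
    weights: dict[str, float] = field(default_factory=dict)  # source name -> link strength

    def integrate(self, active: set[str]) -> float:
        # Integration: sum the strengths of incoming links whose source is active.
        return sum(w for src, w in self.weights.items() if src in active)

    def fires(self, active: set[str]) -> bool:
        return self.integrate(active) >= self.threshold


def recruit(node: Node, active: set[str], gain: float = 0.4) -> None:
    """Hypothetical learning step: if the node fires, strengthen the links that
    were used, then raise the threshold so the node is no longer satisfied
    by only part of that input."""
    if not node.fires(active):
        return
    for src in active:
        if src in node.weights:
            node.weights[src] += gain  # Hebbian strengthening of the used links
    # Hypothetical rule: new threshold is 90% of the input that just recruited it.
    node.threshold = 0.9 * node.integrate(active)
```

Before recruitment, a node with weak links and a low threshold is easily satisfied, e.g. by A together with an unrelated input X. After `recruit(node, {"A", "B"})`, it responds to A and B together but no longer to A alone or to A with X, which is the point of the threshold increase.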
Neural processes for learning
Basic principle: when a connection is successfully used, it becomes stronger
• Successfully used if another connection to the same node is simultaneously active
Mechanisms of strengthening
• Biochemical changes at synapses
• Growth of dendritic spines
• Formation of new synapses
Weakening: when neurons fire independently of each other, their mutual connections (if any) weaken
Neural processes for learning
[Diagram: cells A and B each connect to cell C; the synapses onto C are the ones that get strengthened]
If connections AC and BC are active at the same time, and if their joint activation is strong enough to activate C, they both get strengthened (adapted from Hebb)
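The A, B, C case, together with the weakening rule from the previous slide, can be sketched as a single update step. This is a minimal sketch under stated assumptions: the function name `step`, the strengthening and weakening rates (0.2 and 0.05), and the clipping of weights to the range 0 to 1 are hypothetical choices, not part of Hebb's formulation or of the lecture.

```python
def step(weights, pre_active, threshold, up=0.2, down=0.05):
    """One Hebbian learning step for the links AC, BC, ... into a single cell C.

    weights:    link strength from each presynaptic cell to C
    pre_active: which presynaptic cells are firing on this step
    Returns the updated weights and whether C fired.
    """
    drive = sum(w for pre, w in weights.items() if pre_active[pre])
    c_fired = drive >= threshold            # integration against C's threshold
    new = {}
    for pre, w in weights.items():
        if pre_active[pre] and c_fired:
            new[pre] = min(1.0, w + up)     # used successfully together: strengthen
        elif pre_active[pre] != c_fired:
            new[pre] = max(0.0, w - down)   # fired independently of C: weaken
        else:
            new[pre] = w                    # both silent: unchanged
    return new, c_fired
```

With AC = BC = 0.3 and a threshold of 0.5 for C, activating A and B together drives C (0.6 ≥ 0.5), so both links grow to 0.5. Activating A alone leaves C silent, so AC weakens slightly while BC is untouched.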
Requirements that must be assumed (implied by the Hebbian learning principle)
Prerequisites:
• Initially, connection strengths are very weak
  • Term: Latent Links
• They must be accompanied by nodes
  • Term: Latent Nodes
• Latent nodes and latent connections must be available for learning anything learnable
The Abundance Hypothesis
• Abundant latent links
• Abundant latent nodes
Abundance is a property of biological systems generally
• Cf.: Acorns falling from an oak tree
• Cf.: A sea turtle lays thousands of eggs
  • Only a few will produce viable offspring
• Cf. Edelman: “silent synapses”
  • The great preponderance of cortical synapses are “silent” (i.e., latent)
Electrical activity sent from a cell body to its axon travels to thousands of axon branches, even though only one or a few of them may lead to downstream activation
Learning – The Basic Process
[Diagram: latent nodes and latent links, together with dedicated nodes and links]
1. Let these links get activated
2. Then these nodes will get activated
3. That will activate these links
4. This node gets enough activation to satisfy its threshold
5. These links get strengthened and the node’s threshold gets raised; this node is therefore recruited
6. This node is now dedicated to function AB
7. Next time it gets activated, it will send activation on these links to the next level
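The basic process can be simulated with a pool of latent nodes, each with weak latent links from every input (the abundance hypothesis). This is a minimal sketch under stated assumptions: the pool size, the random latent link strengths (0.01 to 0.1), the latent threshold of 0.05, and the choice of "the latent node receiving the most activation wins" are all hypothetical simplifications, and the helper names are illustrative only.

```python
import random


def make_latent_pool(inputs, n_nodes, seed=0):
    """Each latent node gets a weak latent link from every input."""
    rng = random.Random(seed)
    return [{"threshold": 0.05,
             "weights": {i: rng.uniform(0.01, 0.1) for i in inputs},
             "function": None}          # None means the node is still latent
            for _ in range(n_nodes)]


def activation(node, active):
    return sum(w for i, w in node["weights"].items() if i in active)


def fires(node, active):
    return activation(node, active) >= node["threshold"]


def learn(pool, active, strong=1.0):
    """Steps 1-6: activate the inputs, let latent nodes integrate, and recruit
    the one that receives the most activation (if it satisfies its threshold)."""
    latent = [n for n in pool if n["function"] is None]
    winner = max(latent, key=lambda n: activation(n, active))
    if activation(winner, active) < winner["threshold"]:
        return None                             # no node satisfied: nothing learned
    for i in active:
        winner["weights"][i] = strong           # strengthen the links that were used
    winner["threshold"] = strong * len(active)  # raise threshold: needs all inputs
    winner["function"] = "".join(sorted(active))
    return winner
```

For example, `learn(make_latent_pool("ABCDEF", 50), {"A", "B"})` dedicates exactly one node to function AB. That node then fires for A and B together but not for A alone, and every other node in the pool stays latent and available for later learning.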
Learning: more terms
• Child nodes
• Parent nodes: potential and actual
[Diagram: the node for AB]
Learning: Deductions from the basic process
• Learning is generally bottom-up
• The knowledge structure as learned by the cognitive network is hierarchical: it has multiple layers
• Hierarchy and proximity:
  • Logically adjacent levels in a hierarchy can be expected to be locally adjacent
• Excitatory connections are predominantly from one layer of a hierarchy to the next
• Higher levels will tend to have larger numbers of nodes than lower levels
Learning in cortical networks: A Darwinian process
The abundance hypothesis
• Needed to allow flexibility of learning
• Abundant latent nodes
  • Must be present throughout cortex
• Abundant latent connections of a node
  • Every node must have abundant latent links
A trial-and-error process:
• Thousands of connection possibilities available (the abundance hypothesis)
• Strengthen those few that succeed
• Cf. natural selection: “Neural Darwinism” (Edelman)
Anatomical support for the hypothesis of abundant latent links
A typical pyramidal node has
• thousands of incoming synapses connecting to its dendrites and its cell body
• thousands of output synapses from multiple branches of its axon
But only a very few of these are recruited for a specific function
• For example, the typical node in a functional web has perhaps only dozens or maybe up to 100 or so links
By far the great preponderance of these are latent
• Edelman: “silent synapses”
Learning – Enhanced understanding
This “basic process” is not the full story
The nodes of the above depiction:
• Are they minicolumns, maxicolumns, or what?
• Most likely, a bundle of contiguous columns
• Often a maxicolumn or hypercolumn
Columns of different sizes
Minicolumn
• Basic anatomically described unit
• 70-110 neurons (avg 75-80)
• Diameter barely more than that of pyramidal cell body (30-50 μm)
Maxicolumn (term used by Mountcastle)
• Diameter 300-500 μm
• Bundle of 100 or more contiguous minicolumns
Hypercolumn – up to 1 mm diameter
• Can be long and narrow rather than cylindrical
• Bundle of contiguous maxicolumns
Functional column
• Intermediate between minicolumn and maxicolumn
• A contiguous group of minicolumns
REVIEW
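As a rough consistency check on the sizes above, the number of minicolumns in a column can be estimated from the ratio of cross-sectional areas. This is a minimal sketch under stated assumptions: it treats columns as circular and ignores packing gaps, and `units_per_column` is a hypothetical helper, not a measurement method.

```python
def units_per_column(column_diam_um: float, unit_diam_um: float) -> float:
    """Area-ratio estimate (circles, no packing gaps) of how many smaller
    units of diameter unit_diam_um fit in a column of diameter column_diam_um."""
    return (column_diam_um / unit_diam_um) ** 2


# Pairing lower bounds (300 and 30 μm) or upper bounds (500 and 50 μm):
print(units_per_column(300, 30), units_per_column(500, 50))
# A 1 mm hypercolumn made of 300-500 μm maxicolumns:
print(units_per_column(1000, 500), units_per_column(1000, 300))
```

Pairing matching bounds gives about 100 minicolumns per maxicolumn, in line with the figure above; mixing the bounds gives a much wider range (roughly 36 to 280). By the same estimate, a cylindrical 1 mm hypercolumn would hold roughly 4 to 11 maxicolumns.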
Hypercolumns: Modules of maxicolumns
A homotypical area in the temporal lobe of a macaque monkey
REVIEW
Functional columns vis-à-vis minicolumns and maxicolumns
Maxicolumn
• About 100 minicolumns
• About 300-500 microns in diameter
Functional column
• A group of one to several contiguous minicolumns within a maxicolumn
• Established during learning
• Initially it might be an entire maxicolumn
Learning in a system with columns of different sizes
• At an early learning stage, maybe a whole hypercolumn gets recruited
• Later, maxicolumns for further distinctions
• Still later, functional columns as subcolumns within maxicolumns
New term: Supercolumn – a group of minicolumns of whatever size: hypercolumn, maxicolumn, functional column
• Any supercolumn is potentially divisible
Links between supercolumns will thus consist of multiple fibers
Functional columns in phonological recognition: A hypothesis
• Demisyllable (e.g. /de-/) activates a maxicolumn
• Different functional columns within the maxicolumn for syllables with this demisyllable
  • /ded/, /deb/, /det/, /dek/, /den/, /del/
REVIEW
Functional columns in phonological recognition: A hypothesis
[Diagram: a maxicolumn for [de-] (ca. 100 minicolumns), divided into functional columns: deb, ded, den, de-, det, del, dek. Note that all respond to /de-/]
REVIEW
Functional columns in phonological recognition: A hypothesis
[Diagram: the [de-] maxicolumn. This one is learned first; then subdivisions are established: deb, ded, den, de-, det, del, dek]
Adjacent maxicolumns in phonological cortex?
[Diagram: a hypercolumn, i.e. a module of contiguous maxicolumns for ge-, ke-, be-, pe-, te-, de-. Each of these maxicolumns is divided into functional columns. Note that the entire module responds to [-e-]]
REVIEW
Learning – The Basic Process: Refined view
Revisit the basic learning diagram: let each node represent a supercolumn
[Diagram: latent supercolumns, bundles of latent links, and dedicated supercolumns and links]
1. Let these links get activated
2. Then these supercolumns get activated
3. That will activate these links
4. This supercolumn gets enough activation to satisfy its threshold
5. This supercolumn is recruited for function AB
6. Next time it gets activated, it will send activation on these links to the next level
7. It can get subdivided for finer distinctions
  • E.g., a hypercolumn composed of 3 maxicolumns
A further enhancement
Minicolumns within a supercolumn have mutual horizontal excitatory connections
Therefore, some minicolumns can get activated from their neighbors even if they don’t receive activation from outside
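The effect of horizontal excitatory connections can be sketched as activation spreading along a row of minicolumns. This is a minimal sketch under stated assumptions: it uses a one-dimensional row (real columns are arranged in two dimensions), and the coupling strength of 0.3, the cap at 1.0, and the name `spread` are hypothetical.

```python
def spread(activity, lateral=0.3, steps=1):
    """Each step, every minicolumn adds `lateral` times the activation of its
    immediate neighbours in the row (hypothetical coupling), capped at 1.0."""
    a = list(activity)
    for _ in range(steps):
        nxt = []
        for i, x in enumerate(a):
            left = a[i - 1] if i > 0 else 0.0
            right = a[i + 1] if i + 1 < len(a) else 0.0
            nxt.append(min(1.0, x + lateral * (left + right)))
        a = nxt
    return a


# Only the middle minicolumn receives outside input.
print(spread([0.0, 0.0, 1.0, 0.0, 0.0]))           # neighbours become partly active
print(spread([0.0, 0.0, 1.0, 0.0, 0.0], steps=2))  # activation reaches the edges
```

After one step, the two neighbours of the externally driven minicolumn are partly active even though they received no outside input. After a second step, the effect reaches the edges of the row.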
Learning: refined view
If, later, C is activated along with A and B, then maxicolumn ABC is recruited for ABC
And the connection from C to ABC is strengthened: it is no longer latent
[Diagram: supercolumn AB, node C, and maxicolumn ABC]
Learning phonological distinctions: A hypothesis
1. In learning, this hypercolumn (ge-, ke-, be-, pe-, te-, de-) gets established first, responding to [-e-]
2. It gets subdivided into maxicolumns for demisyllables
3. The maxicolumn gets divided into functional columns (for [de-]: deb, ded, den, de-, det, del, dek)
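The three learning stages can be pictured as a tree of supercolumns that gets subdivided over time. This is a minimal sketch under stated assumptions: the class `Supercolumn`, its methods, and the notation in which "-" in a label stands for "any segment" are hypothetical. The check for whether a column responds to a syllable is a simple pattern match standing in for the actual recognition process.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Supercolumn:
    label: str                      # "-" means "any segment" (hypothetical notation)
    children: list["Supercolumn"] = field(default_factory=list)

    def responds_to(self, syllable: str) -> bool:
        return re.fullmatch(self.label.replace("-", "."), syllable) is not None

    def subdivide(self, labels: list[str]) -> None:
        self.children = [Supercolumn(lab) for lab in labels]

    def child(self, label: str) -> "Supercolumn":
        return next(c for c in self.children if c.label == label)

    def responders(self, syllable: str) -> list[str]:
        """Labels of every supercolumn in the tree that responds to `syllable`."""
        if not self.responds_to(syllable):
            return []
        path = [self.label]
        for c in self.children:
            path += c.responders(syllable)
        return path


hyper = Supercolumn("-e-")                                    # 1. hypercolumn for [-e-]
hyper.subdivide(["ge-", "ke-", "be-", "pe-", "te-", "de-"])   # 2. demisyllable maxicolumns
hyper.child("de-").subdivide(["deb", "ded", "den", "det", "del", "dek"])  # 3. functional columns
print(hyper.responders("ded"))
```

After all three stages, /ded/ activates the whole chain: the hypercolumn, the [de-] maxicolumn, and the /ded/ functional column. A syllable such as /kel/ reaches only the [ke-] maxicolumn, which has not yet been subdivided.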
Remaining question – learning lateral inhibition
• When a hypercolumn is first recruited, there is no lateral inhibition among its internal subdivisions
  • (Or very little)
• Later, when finer distinctions are learned, they get reinforced by lateral inhibition
  • Latent inhibitory neurons become activated
Question: How does this process work?
• I.e., what makes these inhibitory neurons change from latent to active?
“Evolutionary Learning” and the Proximity Principle
• Related functions tend to be in close proximity
  • If very closely related, they tend to be adjacent
• Areas which integrate properties of different subsystems (e.g., different sensory modalities) tend to be in locations intermediate between those subsystems
Evolutionary Learning and the Proximity Principle
Start with the observation:
• Related areas tend to be adjacent to each other
  • Primary auditory and Wernicke’s area
  • V1 and V2, etc.
  • Wernicke’s area and lexical-conceptual information (angular gyrus, MTG)
• Thus we have the ‘proximity principle’
Question: Why? How to explain it?
How to Explain the Proximity Principle?
Factors responsible for observations of proximity in cortical structure
1. Economic necessity
2. Genetic factors
3. Experience – provides details of localization within the limits imposed by genetic factors
Proximity: Economic necessity
Question: Could a given column be connected to any other column anywhere in the cortex?
• That would require a huge number of available latent connections
  • Way more than are present
• Hence there are strict limits on intercolumn connectivity
• Therefore, proximity is necessary just for economy of representation
Limits on intercolumn connectivity
Number of cortical minicolumns:
• If 27 billion neurons in entire cortex
• If avg. 77 neurons per minicolumn
• Then 350 million minicolumns in the cortex
Extent of available latent connections to other columns
• Perhaps 35,000 to 350,000
• Do the math:
A given column has available latent connections to between 1/1000 and 1/10000 of the other columns in the cortex
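The “do the math” step can be written out directly. This sketch only uses the estimates given on the slide (27 billion neurons, 77 neurons per minicolumn, 35,000 to 350,000 latent connections). The constant names are illustrative.

```python
# Estimates from the slide.
NEURONS_IN_CORTEX = 27e9
NEURONS_PER_MINICOLUMN = 77
LATENT_LINKS_LOW, LATENT_LINKS_HIGH = 35_000, 350_000

minicolumns = NEURONS_IN_CORTEX / NEURONS_PER_MINICOLUMN
print(f"minicolumns: {minicolumns:,.0f}")  # about 350 million

for links in (LATENT_LINKS_HIGH, LATENT_LINKS_LOW):
    share = links / minicolumns            # fraction of all minicolumns reachable
    print(f"{links:,} latent links reach about 1/{1 / share:,.0f} of all minicolumns")
```

The two ends of the range come out at about 1/1,000 and 1/10,000 of all minicolumns, matching the conclusion above.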
Locations of available latent connections
Local
• Surrounding area
• Horizontal connections (grey matter)
Intermediate
• Short-distance fibers in white matter
• For example from one gyrus to neighboring gyrus
Long-distance
• Long-distance fiber bundles
• At ends, considerable branching
The role of long-distance fibers
Arcuate fasciculus
• Genetically determined
• Limits location of phonological recognition area
Interhemispheric fibers
• Also genetically determined
• Wernicke’s area – RH homolog of W’s area
• Broca’s area – RH homolog of B’s area
• Etc.
Cortical connectivity properties (cf. Pulvermüller 2002: 17)
• Probability of adjacent areas being connected: >70%
  • But if we count by minicolumns instead of cells, the figure is probably higher, maybe close to 100%
• Probability of distant areas being connected: 15-30%
  • Distant areas: at least one intervening area
  • In the macaque monkey, most areas have links to 10 or more other areas within the same hemisphere
More cortical connectivity properties
• Most areas are connected to the homotopic area of the opposite hemisphere
• Most connections between areas are reciprocal
• Primary areas are not directly connected to one another, except for motor-somatosensory
  • Connections under central sulcus
Degrees of separation between cortical neurons or columns
• For neurons of neighboring columns: 1
• For distant neurons in same hemisphere
  • Range: 1 to about 5 or 6 (estimate)
  • Mostly 1, 2, or 3, especially if functionally closely related
  • Average about 3 (estimate)
• For opposite hemisphere
  • Add 1 to figures for same hemisphere
Probably, for any two columns anywhere in the cortex, whether functionally related or not, fewer than 6 degrees of separation
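Degrees of separation are shortest-path lengths in the connection graph, which breadth-first search computes. This is a minimal sketch on a toy graph, not on cortical data. The ring of 100 columns, the local links to the 2 nearest neighbours on each side, and the single long-distance link are hypothetical parameters. They are chosen only to show how one long-distance bundle shortens paths in a network that otherwise has only local links.

```python
from collections import deque


def ring_with_shortcuts(n, k, shortcuts=()):
    """Toy cortex: n columns on a ring, each linked to its k nearest neighbours
    on either side (local links), plus optional long-distance links."""
    graph = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k + 1):
            j = (i + d) % n
            graph[i].add(j)
            graph[j].add(i)
    for a, b in shortcuts:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degrees_of_separation(graph, start):
    """Breadth-first search: fewest links from `start` to every reachable column."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


local_only = degrees_of_separation(ring_with_shortcuts(100, 2), 0)
with_bundle = degrees_of_separation(ring_with_shortcuts(100, 2, [(0, 50)]), 0)
print(max(local_only.values()), max(with_bundle.values()))
```

With local links only, the farthest column is 25 steps away. Adding one long-distance link roughly halves that to 13. Real cortical columns have tens of thousands of latent links each, plus many long-distance bundles, which is consistent with the estimate of very few degrees of separation.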
Some long-distance fiber bundles (schematic)
Two Factors in Localization
Genetic factors determine general area for a particular type of knowledge
Within this general area the learning-based proximity factors select a more narrowly defined location
Thus the exact localization depends on experience of the individual
When part of the system is damaged, learning-based factors can take over and result in an abnormal location for a function – plasticity
Genetically determined proximity
Genetically determined proximity would have developed over a long period of evolution
• Many features are shared with other mammals
This process could be called ‘evolutionary learning’
According to standard evolutionary theory:
• A process of trial-and-error:
  • Trial: produce varieties
  • Error: most varieties will not survive/reproduce
  • The others – the best among them – are selected
Other genetic factors supplement proximity
• Long-distance fiber bundles
Some innate factors relating to localization
Primary areas
Long-distance fiber bundles
Innate factors relating to primary areas
Location
• Genetically determined locations
• But there are exceptions
  • Malformation
  • Damage
Structure
• Genetically determined structures adapted to sensory modality (they have to be where they are)
Heterotypical structures
• Found in primary areas
  • Primary visual
  • Primary auditory
A heterotypical (i.e., genetically built-in) structure: Visual motion perception
An area in the posterior bank of the superior temporal sulcus of a macaque monkey (“V-5”)
[Figure: a heterotypical area (Albright et al. 1984); scale label 400-500 μm]
REVIEW
A Heterotypical structure: Auditory areas in a cat’s cortex
• AAF – Anterior auditory field
• A1 – Primary auditory field
• PAF – Posterior auditory field
• VPAF – Ventral posterior auditory field
REVIEW
Innate factors relating to localization
• The primary areas
• Long-distance fiber bundles
  • Interhemispheric – via corpus callosum
  • Longitudinal – from front to back
    • The arcuate fasciculus is part of the superior longitudinal fasciculus
• They allow for exceptions to proximity
  • Areas closely related yet not neighboring
Implications of the proximity principle
System level
• Functionally related subsystems will tend to be close to one another
• Neighboring subsystems will probably have related functions
Cortical column level
• Nodes for similar functions should be physically close to one another
• Nodes that are physically close to one another probably have similar functions
Therefore:
• Neighboring nodes are likely to be competitors
• They need to have mutually inhibitory connections
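Mutual inhibition among neighbouring competitors can be sketched as a small iterative network. In this network, each node is driven by its own input and inhibited by the summed activity of its competitors. This is a minimal sketch under stated assumptions: the function name `compete`, the inhibition strength of 0.5, the synchronous update, and the 20 iterations are hypothetical choices. It is not a model proposed in the lecture.

```python
def compete(inputs, inhibition=0.5, steps=20):
    """Lateral inhibition among competing nodes: each node's activity is its own
    input minus `inhibition` times the summed activity of the other nodes,
    clipped at zero (activity cannot go negative)."""
    a = list(inputs)
    for _ in range(steps):
        total = sum(a)
        a = [max(0.0, x - inhibition * (total - ai)) for x, ai in zip(inputs, a)]
    return a


# Three neighbouring nodes with inputs 1.0, 0.8 and 0.3.
print(compete([1.0, 0.8, 0.3]))   # approaches [0.8, 0.4, 0.0]
```

The weakest competitor is silenced entirely, and the gap between the two strongest widens, from 1.0 vs 0.8 in the input to about 0.8 vs 0.4 in the output. This sharpening of the contrast is what mutual inhibition among neighbours buys.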
Applying the proximity principle
For both types (genetic and experience-based), we can make predictions of where various functions are most likely to be located, based on the proximity principle
• Broca’s area near the inferior precentral gyrus
• Wernicke’s area near the primary auditory area
Such predictions are possible even in cases where we don’t know whether genetics or learning is responsible
• Maybe both
Deriving location from proximity hypothesis
• The cortex has to provide for “decoding” speech input
• Speech input enters the cortex in the primary auditory area
• Results of the “decoding” (recognition of syllables etc.) are represented in Wernicke’s area
• Why is Wernicke’s area where it is?
Speech Recognition in the Left Hemisphere
[Diagram labels: Primary Auditory Area, Phonological Recognition, Phonological Production, Wernicke’s Area]
Exercise: Location of Wernicke’s area
Why is phonological recognition in the posterior superior temporal gyrus?
• Alternatives to consider:
  • Anterior to primary auditory cortex
    • Advantage: would be close to phonological production
  • Inferior to primary auditory cortex
(There are two reasons)
Answer: Location of Wernicke’s area
• Wernicke’s area pretty much has to be where it is to take advantage of the arcuate fasciculus
• The location of W.’s area makes it close to the angular gyrus, a likely area for noun lemmas (morphemes and complex morphemes)
• Also, close to the SMG (supramarginal gyrus), presumed area for phonological monitoring
  • (Why? Because it is adjacent to the primary somatosensory area)
More exercises
• Explaining likely locations of morphemes
  • verb morphemes in the frontal lobe
  • noun morphemes in the angular gyrus and/or middle temporal gyrus
• The dorsal (where) pathway of visual perception
Experience-based proximity
Can be expected to be operative
• more at higher (more abstract) levels, less at lower levels
• for areas of knowledge that have developed too recently for evolution to have played a role
  • Reading
  • Writing
  • Higher mathematics
  • Physics, computer technology, etc.
Innate features that support language
• Columnar structure
• Coding of frequencies in Heschl’s gyrus
• Arcuate fasciculus
• Interhemispheric connections (via corpus callosum)
  • e.g., connect Wernicke’s area with its RH homolog
• Spread of myelination from primary areas to successively higher levels
• Left-hemisphere dominance for grammar etc.
Consequences of the Proximity Principle
• Nodes in close competition will tend to be neighbors
  • And their mutual competition is preordained, even though the properties they are destined to integrate will only be established through the learning process
• Therefore, inhibitory connections should exist predominantly among nodes of the same hierarchical level
  • Confirmed by neuroanatomy
  • The presence of their mutual inhibitory connections is presumably specified genetically
Variation in threshold strength
Thresholds are not fixed
• They vary as a result of use – learning
Nor are they integral (whole-number) values
What we really have are threshold functions, such that
• A weak amount of incoming activation produces no response
• A larger degree of activation results in weak outgoing activation
• A still higher degree of activation yields strong outgoing activation
• S-shaped (“sigmoid”) function
N.B. All of these properties are found in neural structures
Threshold function
[Figure: S-shaped curve; x-axis: incoming activation; y-axis: outgoing activation]
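An S-shaped threshold function like the one in the figure can be sketched with the logistic function. This is a minimal sketch under stated assumptions. The lecture does not say which sigmoid is meant, so the logistic curve, the `threshold` midpoint of 1.0, and the `steepness` of 8.0 are hypothetical choices. Incoming activation is assumed to be non-negative.

```python
import math


def threshold_function(incoming: float, threshold: float = 1.0,
                       steepness: float = 8.0) -> float:
    """Logistic (S-shaped) mapping from incoming to outgoing activation.
    `threshold` is the midpoint of the curve and `steepness` sets how sharp
    the transition is (both hypothetical). Assumes incoming >= 0."""
    return 1.0 / (1.0 + math.exp(-steepness * (incoming - threshold)))


for x in (0.2, 0.8, 1.0, 1.2, 1.8):
    print(f"incoming {x:.1f} -> outgoing {threshold_function(x):.3f}")
```

Weak input yields outgoing activation near zero, intermediate input yields weak output, and strong input yields output near 1. Raising a node’s threshold through learning shifts the whole curve to the right. For example, `threshold_function(1.0, threshold=1.5)` is only about 0.02, compared with 0.5 at the default threshold.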
end