(1) Learning (2) The proximity principle and “evolutionary learning” Ling 411 – 17

Upload: derek-smith

Post on 26-Dec-2015

(1) Learning
(2) The proximity principle and “evolutionary learning”

Ling 411 – 17

Schedule of Presentations

Delclos: Planum Temp
Banneyer: Categories
Ruby Tso: Writing
Bosley: Synesthesia
Rasmussen: 2nd language
Brown: Bilingualism
Tsai: Tones

Dates: Tu Apr 13, Th Apr 15, Tu Apr 20, Th Apr 22

Operations in relational networks

Relational networks are dynamic
• Activation moves along lines and through nodes
Links have varying strengths
• A stronger link carries more activation, other things being equal
All nodes operate on two principles:
• Integration of incoming activation
• Broadcasting to other nodes

REVIEW

Operation of the Network in terms of cortical columns

The linguistic system operates as distributed processing of multiple individual components
• “Nodes” in an abstract model
• These nodes are implemented as cortical columns
Columnar Functions
• Integration: A column is activated if it receives enough activation from other columns
» Can be activated to varying degrees
» Can keep activation alive for a period of time
• Broadcasting: An activated column transmits activation to other columns
» Excitatory – contribution to higher level
» Inhibitory – dampens competition at same level

Review
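The two columnar functions can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not part of the lecture: the class name `Column`, the numeric threshold, and the use of signed link strengths (positive for excitatory, negative for inhibitory). The sketch does not model a column keeping its activation alive over time.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    """Hypothetical stand-in for a cortical column (a node in the network)."""
    name: str
    threshold: float = 1.0
    activation: float = 0.0
    # Outgoing links: target name -> signed strength
    # (positive = excitatory, negative = inhibitory). Illustrative values only.
    links: dict = field(default_factory=dict)

    def integrate(self, incoming):
        """Integration: sum incoming activation; stay silent below threshold."""
        total = sum(incoming)
        # Graded activation: the column can be active to varying degrees.
        self.activation = total if total >= self.threshold else 0.0
        return self.activation

    def broadcast(self):
        """Broadcasting: send activation along every link, scaled by strength."""
        return {target: self.activation * strength
                for target, strength in self.links.items()}


# A column for /de-/ excites a higher-level column and inhibits a competitor.
col = Column("de-", threshold=1.0, links={"det": 0.8, "be-": -0.5})
col.integrate([0.6, 0.7])   # enough input to reach the threshold
out = col.broadcast()
print(out)
```

A stronger link carries more activation, other things being equal, which is why the outgoing value is simply the activation multiplied by the link strength.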

Additional operations: Learning

Links get stronger when they are successfully used (Hebbian learning)
• Learning consists of strengthening them
• Hebb 1949
Threshold adjustment
• When a node is recruited its threshold increases
• Otherwise, nodes would be too easily satisfied

Requirements that must be assumed (implied by the Hebbian learning principle)

Links get stronger when they are successfully used (Hebbian learning)
• Learning consists of strengthening them
Prerequisites:
• Initially, connection strengths are very weak
» Term: Latent Links
• They must be accompanied by nodes
» Term: Latent Nodes
• Latent nodes and latent connections must be available for learning anything learnable
The Abundance Hypothesis
• Abundant latent links
• Abundant latent nodes

Support for the abundance hypothesis

Abundance is a property of biological systems generally
• Cf.: Acorns falling from an oak tree
• Cf.: A sea tortoise lays thousands of eggs
» Only a few will produce viable offspring
• Cf. Edelman: “silent synapses”
» The great preponderance of cortical synapses are “silent” (i.e., latent)
• Electrical activity sent from a cell body to its axon travels to thousands of axon branches, even though only one or a few of them may lead to downstream activation

Learning – The Basic Process

[Diagram sequence; labels: latent nodes, latent links, dedicated nodes and links, A, B, AB]

1. Let these links get activated
2. Then these nodes will get activated
3. That will activate these links
4. This node gets enough activation to satisfy its threshold
5. These links now get strengthened and the node’s threshold gets raised
6. This node is therefore recruited
7. This node is now dedicated to function AB
8. Next time it gets activated it will send activation on these links to next level
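The recruitment steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `Node`, the starting link strength of 0.6, the learning rate, and the size of the threshold increase are all hypothetical values chosen so that neither A nor B alone satisfies the node, while A and B together do.

```python
class Node:
    """Hypothetical latent node with weak (latent) incoming links."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold
        self.weights = {}        # source name -> link strength
        self.recruited = False

    def add_latent_link(self, source, strength=0.6):
        # Latent links start out weak (0.6 is an assumed value).
        self.weights[source] = strength

    def receive(self, active_sources, rate=0.5, threshold_step=0.5):
        """Integrate input; on success, strengthen used links and raise threshold."""
        used = [s for s in active_sources if s in self.weights]
        total = sum(self.weights[s] for s in used)
        if total < self.threshold:
            return False                   # threshold not satisfied: no learning
        for s in used:
            self.weights[s] += rate        # Hebbian step: used links get stronger
        self.threshold += threshold_step   # keeps the node from being too easily satisfied
        self.recruited = True
        return True


node = Node()
node.add_latent_link("A")
node.add_latent_link("B")

a_alone_before = node.receive({"A"})       # too little activation
a_and_b = node.receive({"A", "B"})         # node is recruited for AB
a_alone_after = node.receive({"A"})        # raised threshold still requires both
print(a_alone_before, a_and_b, a_alone_after)
```

With these assumed numbers, the raised threshold is what keeps the node dedicated to the combination AB: after recruitment the stronger link from A alone still falls short.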

Learning: more terms

[Diagram labels: Child nodes; Potential; Actual; Parent nodes; AB]

Learning: Deductions from the basic process

Learning is generally bottom-up.
The knowledge structure as learned by the cognitive network is hierarchical: it has multiple layers
Hierarchy and proximity:
• Logically adjacent levels in a hierarchy can be expected to be locally adjacent
Excitatory connections are predominantly from one layer of a hierarchy to the next
Higher levels will tend to have larger numbers of nodes than lower levels

Learning in cortical networks: A Darwinian process

A trial-and-error process:
• Thousands of possibilities available
» The abundance hypothesis
• Strengthen those few that succeed
“Neural Darwinism” (Edelman)
The abundance hypothesis
• Needed to allow flexibility of learning
• Abundant latent nodes
» Must be present throughout cortex
• Abundant latent connections of a node
» Every node must have abundant latent links
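The trial-and-error idea can be illustrated with a toy simulation. It is a sketch under invented assumptions, not a model from the lecture: a pool of 1000 latent candidate nodes, each with latent links to 3 of 20 possible sources, wired at random. Only the few candidates that happen to be linked to every active source can succeed and be selected.

```python
import random


def select_candidates(active_sources, n_candidates=1000, n_sources=20,
                      links_per_node=3, seed=0):
    """Return the candidate nodes whose latent links cover all active sources.

    All sizes are hypothetical; the point is only that an abundant pool of
    randomly wired latent nodes contains a few that fit a new combination.
    """
    rng = random.Random(seed)
    sources = [f"s{i}" for i in range(n_sources)]
    winners = []
    for node_id in range(n_candidates):
        latent = set(rng.sample(sources, links_per_node))  # this node's latent links
        if active_sources <= latent:                       # receives from every active source
            winners.append((node_id, latent))
    return winners


winners = select_candidates({"s0", "s1"})
print(f"{len(winners)} of 1000 candidates could be recruited")
```

Most candidates fail, which is the sense in which the process is wasteful but flexible: without the abundant pool there would be nothing to select from when a new combination of inputs occurs.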

Learning – Enhanced understanding

This “basic process” is not the full story
The nodes of this depiction:

•Are they minicolumns, maxicolumns, or what?

•Most likely, a bundle of contiguous columns

•Perhaps usually a maxicolumn or hypercolumn

Columns of different sizes

Minicolumn
• Basic anatomically described unit
• 70-110 neurons (avg 75-80)
• Diameter barely more than that of pyramidal cell body (30-50 μ)
Maxicolumn (term used by Mountcastle)
• Diameter 300-500 μ
• Bundle of 100 or more contiguous minicolumns
Hypercolumn – up to 1 mm diameter
• Can be long and narrow rather than cylindrical
• Bundle of contiguous maxicolumns
Functional column
• Intermediate between minicolumn and maxicolumn
• A contiguous group of minicolumns

REVIEW

Hypercolumns: Modules of maxicolumns

A homotypical area in the temporal lobe of a macaque monkey

REVIEW

Functional columns vis-à-vis minicolumns and maxicolumns

Maxicolumn
• About 100 minicolumns
• About 300-500 microns in diameter
Functional column
• A group of one to several contiguous minicolumns within a maxicolumn
• Established during learning
• Initially it might be an entire maxicolumn

Learning in a system with columns of different sizes

At early learning stage, maybe a whole hypercolumn gets recruited
Later, maxicolumns for further distinctions
Still later, functional columns as subcolumns within maxicolumns
New term: Supercolumn – a group of minicolumns of whatever size: hypercolumn, maxicolumn, or functional column

Links between supercolumns will thus consist of multiple fibers

Question on cortical columns

E-mail from Kelly Banneyer:

…. I understand that a minicolumn is the smallest unit and maxicolumns are composed of minicolumns and functional columns are intermediate in size while hypercolumns are composed of several maxicolumns. I wonder if there can exist a minicolumn or functional column in the brain that is not part of a larger type of column. For example, I know that there exists hierarchical structure, but is there maybe some concept so exact and unrelated to anything else that a mini/functional column exists that is not part of a maxicolumn?

Functional columns in phonological recognition: A hypothesis

Demisyllable (e.g. /de-/) activates a maxicolumn
Different functional columns within the maxicolumn for syllables with this demisyllable
• /ded/, /deb/, /det/, /dek/, /den/, /del/

REVIEW

Functional columns in phonological recognition: A hypothesis

[Diagram: a maxicolumn (ca. 100 minicolumns) for [de-], divided into functional columns labeled deb, ded, den, de-, det, del, dek. Note that all respond to /de-/.]

REVIEW

Phonological hypercolumns (a hypothesis)

Maybe we have
• Hypercolumn of contiguous maxicolumns for /e/
• With maxicolumns for /de-/, /be-/, etc.
• Each such maxicolumn subdivided into functional columns for different finals
» /det/, /ded/, /den/, /deb/, /dem/, /dek/
(N.B.: This is just a hypothesis)
• Maybe someday soon we’ll be able to test with sensitive brain imaging

REVIEW

Adjacent maxicolumns in phonological cortex?

[Diagram: a hypercolumn, i.e. a module of contiguous maxicolumns labeled ge-, ke-, be-, pe-, te-, de-. Each of these maxicolumns is divided into functional columns. Note that the entire module responds to [-e-].]

REVIEW

Adjacent maxicolumns in phonological cortex?

[Diagram: a module of six contiguous maxicolumns labeled ge-, ke-, be-, pe-, te-, de-; the entire module responds to [-e-]. One maxicolumn is shown divided into functional columns labeled deb, ded, den, de-, det, del, dek; the entire maxicolumn responds to [de-].]

REVIEW

Learning – The Basic Process: Refined view

Revisit the diagram: Each node of the diagram represents a group of minicolumns – a supercolumn

[Diagram sequence; labels: latent supercolumns, bundles of latent links, dedicated supercolumns and links, AB]

1. Let these links get activated
2. Then these supercolumns get activated
3. That will activate these links
4. This supercolumn gets enough activation to satisfy its threshold
5. This supercolumn is recruited for function AB
6. Next time it gets activated it will send activation on these links to next level
7. Can get subdivided for finer distinctions

A further enhancement

Minicolumns within a supercolumn have mutual horizontal excitatory connections

Therefore, some minicolumns can get activated from their neighbors even if they don’t receive activation from outside

Learning: Refined view

[Diagram sequence; labels: AB, C, ABC]

1. Hypercolumn composed of 3 maxicolumns
2. Can get subdivided for finer distinctions
3. If, later, C is activated along with A and B, then maxicolumn ABC is recruited for ABC
4. And the connection from C to ABC is strengthened – it is no longer latent

Learning phonological distinctions: A hypothesis

[Diagram: hypercolumn of maxicolumns labeled ge-, ke-, be-, pe-, te-, de-; one maxicolumn shown with functional columns labeled deb, ded, den, de-, det, del, dek]

1. In learning, this hypercolumn gets established first, responding to [-e-]
2. It gets subdivided into maxicolumns for demisyllables
3. The maxicolumn gets divided into functional columns

Remaining problems – lateral inhibition

When a hypercolumn is first recruited, no lateral inhibition among its internal subdivisions

Later, when finer distinctions are learned, they get reinforced by lateral inhibition

Problem: How does this work?

Hypothesis applied to conceptual categories

A whole maxicolumn gets activated for the category
• Example: DRINKING-VESSEL
Different functional columns within the maxicolumn for subcategories
• CUP, GLASS, etc.
Adjacent maxicolumns for categories related to DRINKING-VESSEL
• BOWL, JAR, etc.

REVIEW

Locating Functions: The Proximity Principle

Related functions tend to be in close proximity
• If very closely related, they tend to be adjacent
Areas which integrate properties of different subsystems (e.g., different sensory modalities) tend to be in locations intermediate between those subsystems

Consequences of the Proximity Principle

Nodes in close competition will tend to be neighbors
• And their mutual competition is preordained even though the properties they are destined to integrate will only be established through the learning process
Therefore, inhibitory connections should exist predominantly among nodes of the same hierarchical level
• The presence of their mutual inhibitory connections could be genetically specified

Learning and the Proximity Principle

Start with the observation:
• Related areas tend to be adjacent to each other
» Primary auditory and Wernicke’s area
» V1 and V2, etc.
» Wernicke’s area and lexical-conceptual information – angular gyrus, SMG, MTG
Thus we have the ‘proximity principle’
Question: Why – How to explain?

Two aspects of the proximity principle

1. A node that integrates a combination of properties of different subsystems can be expected to lie in a location intermediate between those subsystems

2. A node that integrates a combination of properties of the same subsystem should be within the same subsystem, and maximally close to the properties it integrates

How to Explain the Proximity Principle?

Factors responsible for observations of proximity in cortical structure
1. Economic necessity
2. Genetic factors
3. Experience – provides details of localization within the limits imposed by genetic factors

Proximity: Economic necessity

Question: Could a given column be connected to any other column anywhere in the cortex?

That would require a huge number of available latent connections
• Way more than are present
Hence there are strict limits on intercolumn connectivity
Therefore, proximity is necessary just for economy of representation

Limits on intercolumn connectivity

Number of cortical minicolumns:
• If 27 billion neurons in entire cortex
• If avg. 77 neurons per minicolumn
• Then 350 million minicolumns in the cortex
Extent of available latent connections to other columns
• Perhaps 35,000 to 350,000
• Do the math...
A given column has available latent connections to between 1/1000 and 1/10000 of the other columns in the cortex
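The arithmetic behind these figures can be checked with a short Python sketch. The input numbers come from the slide; the function and variable names are hypothetical, introduced only for this illustration.

```python
NEURONS_IN_CORTEX = 27e9        # "27 billion neurons in entire cortex"
NEURONS_PER_MINICOLUMN = 77     # "avg. 77 neurons per minicolumn"


def minicolumn_count(neurons=NEURONS_IN_CORTEX, per_column=NEURONS_PER_MINICOLUMN):
    """Estimated number of minicolumns in the cortex."""
    return neurons / per_column


def reachable_fraction(latent_connections, total_columns):
    """Share of all minicolumns that one column could reach via latent links."""
    return latent_connections / total_columns


total = minicolumn_count()                   # roughly 350 million
low = reachable_fraction(35_000, total)      # roughly 1/10,000
high = reachable_fraction(350_000, total)    # roughly 1/1,000

print(f"minicolumns: about {total / 1e6:.0f} million")
print(f"reachable fraction: about 1/{1 / low:,.0f} to 1/{1 / high:,.0f}")
```

Even at the upper estimate, a column can reach only about one in a thousand of the others, which is the quantitative basis for the claim that connectivity is strictly limited.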

Locations of available latent connections

Local
• Surrounding area
• Horizontal connections (grey matter)
Intermediate
• Short-distance fibers in white matter
• For example from one gyrus to neighboring gyrus
Long-distance
• Long-distance fiber bundles
• At ends, considerable branching

The role of long-distance fibers

Arcuate fasciculus
• Genetically determined
• Limits location of phonological recognition area
Interhemispheric fibers
• Also genetically determined
• Wernicke’s area – RH homolog of W’s area
• Broca’s area – RH homolog of B’s area
• Etc.

Two Factors in Localization

Genetic factors determine general area for a particular type of knowledge

Within this general area the learning-based proximity factors select a more narrowly defined location

Thus the exact localization depends on experience of the individual

When part of the system is damaged, learning-based factors can take over and result in an abnormal location for a function – plasticity

Genetically determined proximity

Genetically-determined proximity would have developed over a long period of evolution
• Many features are shared with other mammals
This process could be called ‘evolutionary learning’
According to standard evolutionary theory...
• A process of trial-and-error:
» Trial: Produce varieties
» Error: Most varieties will not survive/reproduce
» The others – the best among them – are selected
Other genetic factors supplement proximity
• Long-distance fiber bundles

Some innate factors relating to localization

Primary areas

Long-distance fiber bundles

Innate factors relating to primary areas

Location
• Genetically determined locations
But there are exceptions
• Malformation
• Damage
Structure
• Genetically determined structures adapted to sensory modality (they have to be where they are)
Heterotypical structures
• Found in primary areas
» Primary visual
» Primary auditory

A Heterotypical (i.e., genetically built-in) structure

Visual motion perception

An area in the posterior bank of the superior temporal sulcus of a macaque monkey (“V-5”)

A heterotypical area

Albright et al. 1984

400-500 μ

REVIEW

A Heterotypical structure: Auditory areas in a cat’s cortex

AAF – Anterior auditory field
A1 – Primary auditory field
PAF – Posterior auditory field
VPAF – Ventral posterior auditory field

REVIEW

Innate factors relating to localization

The primary areas
Long-distance fiber bundles
• Interhemispheric – via corpus callosum
• Longitudinal – from front to back
» Arcuate fasciculus is part of the superior longitudinal fasciculus
They allow for exceptions to proximity
• Areas closely related yet not neighboring

Applying the proximity principle

For both types (genetic and experience-based) we can make predictions of where various functions are most likely to be located, based on the proximity principle
• Broca’s area near the inferior precentral gyrus
• Wernicke’s area near the primary auditory area
Such predictions are possible even in cases where we don’t know whether genetics or learning is responsible
• maybe both

Implications of the proximity principle

System level
• Functionally related subsystems will tend to be close to one another
• Neighboring subsystems will probably have related functions
Cortical column level
• Nodes for similar functions should be physically close to one another
• Nodes that are physically close to one another probably have similar functions
Therefore...
• Neighboring nodes are likely to be competitors
• They need to have mutually inhibitory connections
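Mutual inhibition among neighboring competitors can be sketched as a single subtraction step. This is only one simple way such competition might be written down, not a mechanism given in the lecture: the function name, the inhibition strength of 0.5, and the activation values are all assumptions for illustration.

```python
def lateral_inhibition(activations, inhibition=0.5):
    """One step of mutual inhibition among same-level neighbors.

    Each node's activation is reduced in proportion to the summed activation
    of its competitors and is not allowed to fall below zero.
    """
    total = sum(activations.values())
    return {
        name: max(0.0, a - inhibition * (total - a))
        for name, a in activations.items()
    }


# Three hypothetical neighboring functional columns responding to one input.
before = {"det": 1.0, "ded": 0.6, "den": 0.4}
after = lateral_inhibition(before)
winner = max(after, key=after.get)
print(after, winner)
```

In this toy case the most strongly activated column suppresses its neighbors, so the distinction between close competitors is sharpened rather than blurred.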

Deriving location from proximity hypothesis

The cortex has to provide for “decoding” speech input

Speech input enters the cortex in the primary auditory area

Results of the “decoding” (recognition of syllables etc.) are represented in Wernicke’s area

Why is Wernicke’s area where it is?

Speech Recognition in the Left Hemisphere

[Diagram labels: Primary Auditory Area; Phonological Recognition; Phonological Production; Wernicke’s Area]

Exercise: Location of Wernicke’s area

Why is phonological recognition in the posterior superior temporal gyrus?
• Alternatives to consider:
» Anterior to primary auditory cortex (advantage: would be close to phonological production)
» Inferior to primary auditory cortex
(There are two reasons)

Answer: Location of Wernicke’s area

Wernicke’s area pretty much has to be where it is to take advantage of the arcuate fasciculus

The location of W.’s area makes it close to angular gyrus, likely area for noun lemmas (morphemes and complex morphemes)

Also, close to SMG, presumed area for phonological monitoring
• (Why? Because it is adjacent to primary somatosensory area)

More exercises

Explaining likely locations of morphemes
• verb morphemes in the frontal lobe
• noun morphemes in the angular gyrus and/or middle temporal gyrus

The dorsal (where) pathway of visual perception

Experience-based proximity

Can be expected to be operative
• more at higher (more abstract) levels, less at lower levels
• for areas of knowledge that have developed too recently for evolution to have played a role
» Reading
» Writing
» Higher mathematics
» Physics, computer technology, etc.

Innate features that support language

Columnar structure
Coding of frequencies in Heschl’s gyrus
Arcuate fasciculus
Interhemispheric connections (via corpus callosum) – e.g., connect Wernicke’s area with RH homolog

Spread of myelination from primary areas to successively higher levels

Left-hemisphere dominance for grammar etc.

end