From Neurons to Neural Networks Jeff Knisley East Tennessee State University Mathematics of Molecular and Cellular Biology Seminar Institute for Mathematics and its Applications Institute for Mathematics and its Applications, April 2, 2008


Page 1:

From Neurons to Neural Networks

Jeff Knisley
East Tennessee State University

Mathematics of Molecular and Cellular Biology Seminar

Institute for Mathematics and its Applications, April 2, 2008

Page 2:

Outline of the Talk

Brief Description of the Neuron
A “Hot-Spot” Dendritic Model

Classical Hodgkin-Huxley (HH) Model
A Recent Approach to HH Nonlinearity

Artificial Neural Nets (ANN’s)
1957 – 1969: Perceptron Models
1980’s – soon: MLP’s and Others
1990’s – : Neuromimetic (Spiking) Neurons

Page 3:

Components of a Neuron

[Figure: labeled neuron with soma, axon, synaptic terminals, dendrites, nucleus, and myelin sheaths]

Page 4:

Pre-Synaptic to Post-Synaptic

If threshold exceeded, then neuron “fires,” sending a signal along its axon.

Page 5:

Signal Propagation along Axon

Signal is electrical
Membrane depolarization from resting −70 mV
Myelin acts as an insulator

Propagation is electro-chemical
Sodium channels open at breaks in myelin
Much higher external sodium ion concentrations
Potassium ions “work against” sodium
Chloride, other influences also very important

Rapid depolarization at these breaks
Signal travels faster than if only electrical

Page 6:

Signal Propagation along Axon

[Figure: charge reversal (+ + + / − − −) propagating along the axon membrane]

Page 7:

Action Potentials

Sodium ion channels open and close, which causes potassium ion channels to open and close.

Page 8:

Action Potentials

[Figure: model “spike” vs. actual spike train]

Page 9:

Post-Synaptic may be Subthreshold

Signals decay at the soma if below a certain threshold.

Models begin with a section of a dendrite.

Page 10:

Derivation of the Model

Some Assumptions
Assume the neuron separates R³ into 3 regions: interior (i), exterior (e), and boundary membrane surface (m)
Assume E_l is the electric field and B_l is the magnetic flux density, where l = e, i

Maxwell’s Equations: assume magnetic induction is negligible, so

∇ × E_l = −∂B_l/∂t = 0

and hence E_e = −∇V_e and E_i = −∇V_i for potentials V_l, l = i, e

Page 11:

Current Densities j_i and j_e

Let σ_l = conductivity 2-tensor, l = i, e
Intracellular: homogeneous; small radius
Extracellular: ion populations!

Ohm’s Law (local): j_l = σ_l E_l

Charges (ions) collect on the outside of the boundary surface (especially Na⁺), and ∇ · j_e ∝ I_m, where I_m = membrane currents. Thus,

∇ · j_i = 0, i.e., ∇ · (σ_i ∇V_i) = 0, so ∇²V_i = 0
∇ · (σ_e ∇V_e) ∝ I_m

Page 12:

Assume: Circular Cross-sections

Let V = V_i − V_e − V_rest be the membrane potential difference, and let R_m, R_i, C be the membrane resistance, intracellular resistance, and membrane capacitance, respectively. Let I_syn be a “catch-all” for ion channel activity.

I_m = C ∂V/∂t + V/R_m + I_ion

Cable Equation (Lord Kelvin):

∂/∂x [ d/(4 R_i(x)) ∂V/∂x ] = C ∂V/∂t + V/R_m(x) + I_syn + I_ion
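The cable equation above can be integrated numerically. Below is a minimal sketch of my own (not from the talk): an explicit finite-difference scheme for the dimensionless, input-free cable dV/dT = d²V/dX² − V on a cable of electrotonic length L = 1, with the X = 0 end clamped to a unit voltage step and a sealed far end.

```python
import numpy as np

# Explicit finite differences for dV/dT = d2V/dX2 - V (dimensionless cable).
# Grid size, length, and the clamped-end stimulus are illustrative choices.
nx, L = 51, 1.0
dx = L / (nx - 1)
dt = 0.4 * dx * dx          # stable for the explicit scheme (dt <= dx^2/2)
V = np.zeros(nx)
V[0] = 1.0                  # voltage step applied at the somatic end

for _ in range(2000):
    V_new = V.copy()
    V_new[1:-1] = V[1:-1] + dt * ((V[2:] - 2*V[1:-1] + V[:-2]) / dx**2 - V[1:-1])
    V_new[-1] = V_new[-2]   # sealed end: dV/dX = 0
    V_new[0] = 1.0          # keep the X = 0 end clamped
    V = V_new
```

After enough steps, the profile approaches the steady state cosh(L − X)/cosh(L): the voltage decays monotonically with distance from the clamped end, which is the attenuation the next slides exploit.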

Page 13:

Dimensionless Cables

Let X = x/λ, where λ² = R_m d / (4 R_i), and let τ_m = R_m C (constant). Then

∂²V/∂X² − τ_m ∂V/∂t − V = R_m (I_syn + I_ion)

Tapered cylinders: Z instead of X and a taper constant K:

∂²V/∂Z² + K ∂V/∂Z − τ_m ∂V/∂t − V = R_m (I_syn + I_ion)

Page 14:

Rall’s Theorem for Untapered

If at each branching the parent diameter and the daughter cylinder diameters satisfy

d_parent^(3/2) = Σ_{j ∈ daughters} d_j^(3/2)

then the dendritic tree can be reduced to a single equivalent cylinder.
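The 3/2-power condition is easy to check numerically. The helper below is my own illustration, not part of the talk; the diameters are made-up values.

```python
# Check Rall's 3/2-power condition at a single branch point:
# d_parent^(3/2) should equal the sum of the daughter d^(3/2) values.
def satisfies_rall(d_parent, daughters, tol=1e-9):
    return abs(d_parent**1.5 - sum(d**1.5 for d in daughters)) < tol

# A parent of diameter 2^(2/3) splitting into two unit daughters collapses,
# since (2^(2/3))^(3/2) = 2 = 1^(3/2) + 1^(3/2).
parent = 2 ** (2 / 3)
ok = satisfies_rall(parent, [1.0, 1.0])
```

Applying the check recursively from the distal tips toward the soma tells you whether a whole tree admits an equivalent cylinder.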

Page 15:

Dendritic Models

[Figure: full arbor model vs. tapered equivalent cylinder, each attached to a soma]

Page 16:

Tapered Equivalent Cylinder

Rall’s theorem (modified for taper) allows us to collapse to an equivalent cylinder.

Assume “hot spots” at x_0, x_1, …, x_m

[Figure: cylinder from the soma at 0 to the end at l, with hot spots at x_0, x_1, …, x_m]

Page 17:

Ion Channel Hot Spots

(Poznanski) I_j due to ion channel(s) at the jth hot spot:

(R_m d)/(4 R_i) ∂²V/∂x² − R_m C ∂V/∂t − V = Σ_{j=1}^{n} I_j(t) δ(x − x_j)

Green’s function G(x, x_j, t) is the solution to the hot-spot equation with I_j as a point source and the others = 0, plus boundary conditions and initial conditions. The Green’s function is the solution to the Equivalent Cylinder model.

Page 18:

Equivalent Cylinder Model (I_ion = 0)

For the Tapered Equivalent Cylinder Model, the equation is of the form

∂²V/∂Z² + F(Z) ∂V/∂Z − τ_m ∂V/∂t − V = 0

(for the untapered cylinder, ∂²V/∂X² − τ_m ∂V/∂t − V = 0), with conditions of the form

∂V/∂X (L, t) = 0   (no current through the end)
∂V/∂X (0, t) = (tanh L / ρ) [ V(0, t) + τ_s ∂V/∂t (0, t) ]   (lumped soma)
Soma (voltage clamp): V(0, t) = V_clamp
V(x, 0) = steady state from constant current

Page 19:

Properties

Spectrum is solely non-negative eigenvalues
Eigenvectors are orthogonal in voltage clamp
Eigenvectors are not orthogonal in the original problem

Solutions are multi-exponential decays:

V(X, t) = Σ_{k=1}^{∞} C_k(X) e^{−t/τ_k}

Linear models are useful for subthreshold activation, assuming nonlinearities (I_ion) are not arbitrarily close to the soma (and no electric field (ephaptic) effects).
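A truncated multi-exponential decay is simple to evaluate. The coefficients and time constants below are made-up values for illustration; the point is that for large t the slowest time constant dominates, which is how τ_m is read off somatic recordings.

```python
import math

# Truncated multi-exponential decay V(t) = sum_k C_k exp(-t/tau_k).
# Three illustrative terms; tau[0] is the slowest (membrane) time constant.
C   = [1.0, 0.5, 0.25]
tau = [10.0, 2.0, 0.5]     # ms

def V(t):
    return sum(c * math.exp(-t / tk) for c, tk in zip(C, tau))

# At large t only the slowest exponential survives, so log V decays
# at rate 1/tau[0]; estimate that rate from two late samples.
rate = (math.log(V(30.0)) - math.log(V(40.0))) / 10.0
```

Here `rate` comes out within a fraction of a percent of 1/τ_0 = 0.1, since the faster terms have already decayed away by t = 30 ms.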

Page 20:

Somatic Voltage Recording

[Figure: somatic voltage trace over 0–10 ms, showing an experimental artifact, multi-exponential decay, ionic channel effects, and saturation to steady state]

Page 21:

Hodgkin-Huxley: Ionic Currents

1963 Nobel Prize in Medicine
Cable Equation plus Ionic Currents (I_syn)
From numerous voltage clamp experiments with the squid giant axon (0.5–1.0 mm in diameter)
Produces action potentials

Ionic Channels
n = potassium activation variable
m = sodium activation variable
h = sodium inactivation variable

Page 22:

Hodgkin-Huxley Equations

C ∂V/∂t − (d/(4 R_i)) ∂²V/∂x² = −ḡ_l (V − V_l) − ḡ_K n⁴ (V − V_K) − ḡ_Na m³ h (V − V_Na)

∂n/∂t = α_n (1 − n) − β_n n,   ∂m/∂t = α_m (1 − m) − β_m m,   ∂h/∂t = α_h (1 − h) − β_h h

where any V with a subscript is constant, any g with a bar is constant, and each of the α’s and β’s is of similar form:

α_n(V) = (10 − V) / (100 [e^{(10−V)/10} − 1]),   β_n(V) = (1/8) e^{−V/80}
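The equations above can be integrated directly. Here is a minimal space-clamped sketch (no spatial term) using forward Euler and the classic squid-axon constants; the injected current amplitude, duration, and step size are my own choices for the demo.

```python
import math

# Space-clamped Hodgkin-Huxley, forward Euler. V is depolarization from
# rest in mV, as in the original HH convention.
C, gNa, gK, gL = 1.0, 120.0, 36.0, 0.3        # uF/cm^2, mS/cm^2
VNa, VK, VL = 115.0, -12.0, 10.6              # mV relative to rest

def vtrap(x, y):                              # x/(exp(x/y)-1), singularity-safe
    return y if abs(x / y) < 1e-7 else x / (math.exp(x / y) - 1.0)

def rates(V):
    an, bn = 0.01 * vtrap(10.0 - V, 10.0), 0.125 * math.exp(-V / 80.0)
    am, bm = 0.1 * vtrap(25.0 - V, 10.0), 4.0 * math.exp(-V / 18.0)
    ah, bh = 0.07 * math.exp(-V / 20.0), 1.0 / (math.exp((30.0 - V) / 10.0) + 1.0)
    return an, bn, am, bm, ah, bh

V, dt = 0.0, 0.01                             # start at rest; dt in ms
an, bn, am, bm, ah, bh = rates(V)
n, m, h = an/(an+bn), am/(am+bm), ah/(ah+bh)  # steady-state gating at rest
Vs = []
for step in range(2000):                      # 20 ms of simulated time
    I = 10.0                                  # sustained injected current
    an, bn, am, bm, ah, bh = rates(V)
    dV = (I - gK*n**4*(V-VK) - gNa*m**3*h*(V-VNa) - gL*(V-VL)) / C
    n += dt * (an*(1-n) - bn*n)
    m += dt * (am*(1-m) - bm*m)
    h += dt * (ah*(1-h) - bh*h)
    V += dt * dV
    Vs.append(V)

peak = max(Vs)
```

With this suprathreshold current the trace fires: the peak depolarization is on the order of 100 mV above rest, the model “spike” of the earlier slide.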

Page 23:

HH combined with “Hot Spots”

The solution to the equivalent cylinder with hot spots is

V(x, t) = V_initial + Σ_j ∫_0^t G(x, x_j, t − τ) I_j(τ) dτ

where I_j is the restriction of V to the jth “hot spot”. At a hot spot, V satisfies an ODE of the form

C ∂V/∂t = −ḡ_l (V − V_l) − ḡ_K n⁴ (V − V_K) − ḡ_Na m³ h (V − V_Na)

where m, n, and h are functions of V.

Page 24:

Brief Description of an Approach to HH Ion Channel Nonlinearities

Goal: accessible approximations that still produce action potentials.
Can be addressed using Linear Embedding, which is closely related to the method of Turning Variables.

Maps a finite-degree polynomially nonlinear dynamical system into an infinite-degree linear system. The result is an infinite-dimensional linear system, which is as unmanageable as the original nonlinear equation:

Non-normal operators with continua of eigenvalues
Difficult to project back to the nonlinear system (convergence and stability are thorny)

But still the approach has some value (action potentials).

Page 25:

The Hot-Spot Model “Qualitatively”

V(0, t) = Σ_{j=0}^{n} ∫_0^t G(0, x_j, t − τ) I_j(τ) dτ

The I_j are inputs from other neurons and ion channels; G comes from the subthreshold model (Rall Equivalent Cylinder or Full Arbor).

Key Features: summation of synaptic inputs. If V(0, t) is large, an action potential travels down the axon.

Page 26:

Artificial Neural Network (ANN)

Made of artificial neurons, each of which
Sums inputs x_i from other neurons
Compares the sum to a threshold
Sends a signal to other neurons if above threshold

Synapses have weights
Model relative ion collections
Model efficacy (strength) of synapse

Page 27:

Artificial Neuron

w_ij = synaptic weight between the ith and jth neuron
θ_j = threshold of the jth neuron
σ = “firing” function that maps state to output

s_i = Σ_j w_ij x_j,   output x_i = σ(s_i − θ_i)

[Figure: inputs x_1, …, x_n with weights w_i1, …, w_in feeding a nonlinear firing function]
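The definition above fits in a few lines of code. This sketch is mine; the weights, threshold, and inputs are illustrative values, and it shows both a hard-threshold and a smooth firing function.

```python
import math

# One artificial neuron: weighted sum of inputs, shifted by the threshold,
# passed through a firing function sigma.
def neuron(x, w, theta, sigma):
    s = sum(wj * xj for wj, xj in zip(w, x))
    return sigma(s - theta)

step = lambda u: 1.0 if u >= 0 else 0.0        # hard threshold (perceptron)
logistic = lambda u: 1.0 / (1.0 + math.exp(-u))  # smooth sigmoid

out_hard = neuron([1.0, 0.0, 1.0], [0.5, 0.2, 0.4], 0.8, step)
out_soft = neuron([1.0, 0.0, 1.0], [0.5, 0.2, 0.4], 0.8, logistic)
```

With these numbers s = 0.9 exceeds θ = 0.8, so the hard neuron fires (output 1), while the sigmoid neuron returns a value just above 1/2.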

Page 28:

First Generation: 1957 - 1969

Best understood in terms of classifiers:
Partition a data space into regions containing data points of the same classification. The regions are predictions of the classification of new data points.

Page 29:

Simple Perceptron Model

Given 2 classes – Reference and Sample:

Output = 1 if from sample, 0 if from reference

The firing function (activation function) has only two values, 0 or 1. “Learning” is by incremental updating of the weights w_1, …, w_n using a linear learning rule.

Page 30:

Perceptron Limitations

Cannot do XOR (1969, Minsky and Papert)
Data must be linearly separable

1970’s: ANN’s “Wilderness Experience” – only a handful working, and very “un-neuron-like”

Page 31:

Support Vector Machine: Perceptron on a Feature Space

Data is projected into a high-dimensional feature space and separated with a hyperplane.

Choice of feature space (kernel) is key. Predictions are based on the location of the hyperplane.

Page 32:

Second Generation: 1981 - Soon

Big Ideas from Other Fields
J. J. Hopfield compares neural networks to Ising spin glass models, and uses statistical mechanics to prove that ANN’s minimize a total energy functional.
Cognitive psychology provides new insights into how neural networks learn.

AND

Big Ideas from Math
Kolmogorov’s Theorem

Page 33:

Firing Functions are Sigmoidal

σ(s_j − θ_j) = 1 / (1 + e^{−κ_j (s_j − θ_j)})

[Figure: sigmoid curve centered at threshold θ_j with steepness κ_j]

Page 34:

3 Layer Neural Network

Input → Hidden → Output

The output layer may consist of a single neuron (the hidden layer is usually much larger).

Page 35:

Multilayer Network

ξ_j = σ(w_j^t x − θ_j),   j = 1, …, N

out = Σ_{j=1}^{N} α_j ξ_j = Σ_{j=1}^{N} α_j σ(w_j^t x − θ_j)

[Figure: inputs x_1, …, x_n feeding N hidden units ξ_1, …, ξ_N, combined with weights α_1, …, α_N]
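The forward pass of this one-hidden-layer network is a direct transcription of the formula. The weights, thresholds, and input below are illustrative numbers of my own.

```python
import math

# out = sum_j alpha_j * sigma(w_j . x - theta_j) for a one-hidden-layer net.
def mlp(x, W, thetas, alphas):
    sigma = lambda u: 1.0 / (1.0 + math.exp(-u))
    xis = [sigma(sum(wi * xi for wi, xi in zip(w, x)) - th)
           for w, th in zip(W, thetas)]
    return sum(a * xi for a, xi in zip(alphas, xis))

W = [[1.0, -1.0], [0.5, 0.5]]                 # two hidden units, two inputs
out = mlp([2.0, 1.0], W, thetas=[0.0, 1.0], alphas=[1.0, -1.0])
```

Here the two hidden activations are σ(1) and σ(0.5), and the output is their difference.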

Page 36:

Hilbert’s Thirteenth Problem

Original: “Are there continuous functions of 3 variables that are not representable by a superposition of composition of functions of 2 variables?”

Modern: Can a continuous function of n variables on a bounded domain of n-space be written as sums of compositions of functions of 1 variable?

Page 37:

Kolmogorov’s Theorem

Modified Version: any continuous function f of n variables can be written

f(s_1, …, s_n) = Σ_{i=1}^{2n+1} h( Σ_{j=1}^{n} ω_{ij} g_j(s_j) )

where only h and the ω’s depend on f (that is, the g’s are fixed).

Page 38:

Cybenko (1989)

Let σ be any continuous sigmoidal function, and let x = (x_1, …, x_n). If f is absolutely integrable over the n-dimensional unit cube, then for all ε > 0, there exists a (possibly very large) integer N and vectors w_1, …, w_N such that

| f(x) − Σ_{j=1}^{N} α_j σ(w_j^T x − θ_j) | < ε

where α_1, …, α_N and θ_1, …, θ_N are fixed parameters.

Page 39:

Multilayer Network (MLP’s)

ξ_j = σ(w_j^t x − θ_j),   j = 1, …, N

out = Σ_{j=1}^{N} α_j ξ_j = Σ_{j=1}^{N} α_j σ(w_j^t x − θ_j)

[Figure: the same multilayer network diagram as on Page 35]

Page 40:

ANN as a Universal Classifier

Designs a function f : Data → Classes
Example: f(Red) = 1, f(Blue) = 0
Support of f defines the regions

Data is used to train (i.e., design) the function f

[Figure: supp(f) shaded in the data space]

Page 41:

Example – Predicting Trees that are or are not RNA-like

D        d-t      d-a      d-L      d-D      Lamb-2   E-ratio  Randics
0.333333 0.666667 0.666667 0.5      0.666667 0.2679   0.8      2.914214
0.333333 0.5      0.5      0.5      0.666667 0.3249   1        2.770056
0.5      0.5      0.5      0.5      0.5      0.382    1        2.80806
0.166667 0.333333 0.5      0.833333 0.833333 1        2        2.236068   (RNALike)
0.333333 0.333333 0.333333 0.666667 0.666667 0.4384   1.2      2.642734
0.333333 0.333333 0.333333 0.666667 0.666667 0.4859   1.4      2.56066    (NotRNA)

Construct graphical invariants
Train the ANN using known RNA-trees
Predict the others

Page 42:

2nd Generation: Phenomenal Success

Data mining of micro-array data
Stock and commodities trading: ANN’s are an important part of “computerized trading”
Post office mail sorting

This tiny 3-dimensional artificial neural network, modeled after neural networks in the human brain, is helping machines better visualize their surroundings.

Page 43:

The Mars Rovers

An ANN decides between “rough” and “smooth”, which are ambiguous terms. Learning is via many “examples”.

And a neural network can lose up to 10% of its neurons without significant loss in performance!

Page 44:

ANN Limitations

Overfitting: e.g., if the training set is “unbalanced”, overfitting may produce isolated regions.
Mislabeled data can lead to slow (or no) convergence, or to incorrect results.
Hard margins: no “fuzzing” of the boundary.

Page 45:

Problems on the Horizon

Limitations are becoming very limiting
Trained networks often are poor learners (and self-learners are hard to train)
In real neural networks, more neurons imply better networks (not so in ANN’s)
Temporal data is problematic – ANN’s have no concept, or a poor concept, of time

“Hybridized ANN’s” are becoming the rule
SVM’s are probably the tool of choice at present
SOFM’s, Fuzzy ANN’s, Connectionism

Page 46:

Third Generation: 1997 -

Back to Bio: Spiking Neural Networks (SNN)
Asynchronous, action-potential-driven ANN’s have been around for some time.
SNN’s show “promise,” but results beyond current ANN’s have been elusive.
Simulating actual HH equations (neuromimetic) has to date not been enough.
Time is both a promise and a curse.

A Possible Approach: use current dendritic models to modify existing ANN’s.

Page 47:

ANN’s with Multiple Time Scales

An SNN that reduces to an ANN & preserves the Kolmogorov Theorem. The solution to the equivalent cylinder with hot spots is

V(0, t) = V_initial + Σ_{j=0}^{n} ∫_0^t G(0, x_j, t − τ) I_j(τ) dτ

where I_j is the restriction of V to the jth “hot spot”. Equivalent artificial neuron:

s_i(t) = Σ_{j≠i} ∫_0^t ω_j(t − τ) x_j(τ) dτ
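The state s_i(t) is a temporal convolution and can be discretized with a Riemann sum. The exponential kernel, time step, and sustained unit input below are my own illustrative choices, not values from the talk.

```python
import math

# Discretize s(t) = ∫_0^t w(t - tau) x(tau) d tau with a left Riemann sum.
dt, T = 0.01, 2.0
ts = [k * dt for k in range(int(T / dt))]
kernel = lambda u: math.exp(-u / 0.5) / 0.5   # w(u): unit-area exponential
x = lambda t: 1.0                             # sustained unit input

def s(t):
    return sum(kernel(t - tau) * x(tau) * dt for tau in ts if tau <= t)

val = s(2.0)
```

For a unit step input and this kernel, s(t) should approach 1 − e^{−t/0.5}, so `val` lands just below 1 up to discretization error; a richer input history x_j(τ) simply changes what gets convolved.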

Page 48:

Incorporating MultiExponentials

G(0, x, t) is often a multi-exponential decay. In terms of time constants τ_k,

s(t) = Σ_{j=1}^{n} Σ_k w_jk ∫_0^t (1/τ_k) e^{−(t−u)/τ_k} x_j(u) du

The w_jk are synaptic “weights”; the τ_k come from electrotonic and morphometric data:
Rate of taper, length of dendrites
Branching, capacitance, resistance

Page 49:

Approximation and Simplification

If x_j(u) ≈ 1 or x_j(u) ≈ 0, then

s(t) = Σ_{j=1}^{n} Σ_k w_jk (1 − e^{−t/τ_k}) x_j(t)

A Special Case (k is a constant):

s = Σ_{j=1}^{n} ( w_j + p_j (1 − e^{−kt}) ) x_j

t = 0 yields the standard Neural Net Model
Standard Neural Net as initial steady state
Modify with time-dependent transient
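The special case is easy to compute directly. The weights, transients, and inputs below are illustrative values of mine; the point is the two limits: at t = 0 the sum is the standard w · x, and for large t the effective weights relax to w + p.

```python
import math

# s(t) = sum_j (w_j + p_j * (1 - e^{-k t})) x_j: steady weight w_j plus a
# transient weight p_j that switches on with time constant 1/k.
def s(t, x, w, p, k=1.0):
    gain = 1.0 - math.exp(-k * t)
    return sum((wj + pj * gain) * xj for wj, pj, xj in zip(w, p, x))

x, w, p = [1.0, 0.5], [0.4, 0.2], [0.3, -0.1]
s0 = s(0.0, x, w, p)        # t = 0: the standard neural-net sum w . x
s_inf = s(50.0, x, w, p)    # large t: weights have relaxed to w + p
```

For all t in between, the neuron behaves like a traditional one whose weights interpolate between the two regimes, which is exactly the "steady state plus transient" picture of the following slides.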

Page 50:

Artificial Neuron

θ_j = threshold of the jth neuron
w_ij, p_ij = synaptic weights
σ = “firing” function that maps state to output

s_i = Σ_{j=1}^{n} ( w_ij + p_ij (1 − e^{−kt}) ) x_j,   output x_i = σ(s_i − θ_i)

[Figure: inputs x_1, …, x_n with weight pairs (w_i1, p_i1), …, (w_in, p_in) feeding a nonlinear firing function]

Page 51:

Steady State and Transient

Sensitivity and Soft Margins
t = 0 is a perceptron with weights w_ij
t = ∞ is a perceptron with weights w_ij + p_ij
For all t in (0, ∞), a traditional ANN with weights between w_ij and w_ij + p_ij

The transient is a perturbation scheme
Many predictions over time (soft margins)

Algorithm
Partition the training set into subsets
Train at t = 0 for the initial subset
Train at t > 0 values for the other subsets

Page 52:

Training the Network

Define an energy function

E = ½ Σ_{i=1}^{n} (α_i ξ_i − π_i)²

The π vectors are the information to be “learned”. Neural networks minimize energy: the “information” in the network is equivalent to the minima of the total squared energy function.

Page 53:

Back Propagation

Minimize energy: choose the w_j and α_j so that

∂E/∂w_ij = 0,   ∂E/∂α_j = 0

In practice, this is hard. Back propagation with a continuous sigmoidal:
Feed forward, calculate E, and modify the weights with updates of the form

α_j^new = α_j + λ δ_j ξ_j,   δ_j = y_j (1 − y_j)(π_j − y_j)

w_jk^new = w_jk + λ α_j δ_j ξ_j (1 − ξ_j) x_k

Repeat until E is sufficiently close to 0.
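The feed-forward/update loop can be sketched end to end. This is a bare-bones gradient-descent version of my own: a one-input network out(x) = Σ_j α_j σ(w_j x − θ_j) fit to π = x² on five points, with network size, learning rate, and epoch count all illustrative choices.

```python
import math, random

random.seed(0)
sigma = lambda u: 1.0 / (1.0 + math.exp(-u))
data = [(x / 4.0, (x / 4.0) ** 2) for x in range(5)]   # targets pi = x^2 on [0, 1]
N, lam = 4, 0.1                                        # hidden units, learning rate
w  = [random.uniform(-1, 1) for _ in range(N)]
th = [random.uniform(-1, 1) for _ in range(N)]
al = [random.uniform(-1, 1) for _ in range(N)]

def out(x):
    return sum(al[j] * sigma(w[j] * x - th[j]) for j in range(N))

def energy():                                          # E = 1/2 sum (out - pi)^2
    return 0.5 * sum((out(x) - pi) ** 2 for x, pi in data)

E0 = energy()
for _ in range(5000):                                  # feed forward + update
    for x, pi in data:
        xis = [sigma(w[j] * x - th[j]) for j in range(N)]
        err = sum(al[j] * xis[j] for j in range(N)) - pi
        for j in range(N):
            g = err * al[j] * xis[j] * (1.0 - xis[j])  # backpropagated gradient
            al[j] -= lam * err * xis[j]
            w[j]  -= lam * g * x
            th[j] += lam * g                           # d(w x - th)/d th = -1
E1 = energy()
```

Training drives the energy well below its initial value, which is the "repeat until E is sufficiently close to 0" loop of the slide in miniature.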

Page 54:

Back Propagation with Transient

Train the network initially (choose the w_j and α_j)
Each “synapse” is given a transient weight p_ij, updated by rules of the form

p_{j,output}^new = p_{j,output} + λ δ_j ξ_j (1 − e^{−kt})

p_{j,hidden}^new = p_{j,hidden} + λ (1 − e^{−kt}) α_j δ_j ξ_j (1 − ξ_j) x_k

Algorithm Addressing Over-fitting/Sensitivity
Weights w_ij must be given random initial values
Weights p_ij are also given random initial values
Separate training of the w_ij and α_j and p_ij ameliorates over-fitting during the training sequence

Page 55:

Observations/Results

Spiking does occur
But only if the network is properly “initiated”
Spikes only resemble action potentials

This is one approach to SNN’s
Not likely to be the final word
Other real-neuron features may be necessary (e.g., tapering axons can limit the frequency of action potentials; also, branching!)

This approach does show promise in handling temporal information

Page 56:

Any Questions?

Thank you!