TRANSCRIPT
(TENTATIVE) PLAN OF THE COURSE
Introduction
Chapter 1: Basics of statistical mechanics. The Curie-Weiss model
Chapter 2: Neural networks for associative memory and pattern recognition
Chapter 3: The Hopfield model
- Hopfield model at low load and solution via log-constrained entropy
- Self-averaging, spurious states, phase diagram
- Hopfield model at high load and solution via stochastic stability
Chapter 4: Beyond the Hebbian paradigm
Chapter 5: A gentle introduction to machine learning
- Maximum likelihood
- Rosenblatt and Minsky & Papert perceptrons
Chapter 6: Neural networks for statistical learning and feature discovery
- Supervised Boltzmann machines
- Bayesian equivalence between Hopfield retrieval and Boltzmann learning
Chapter 7: A few remarks on unsupervised learning, “complex” patterns, deep learning
- Unsupervised Boltzmann machines
- Non-Gaussian priors
- Multilayered Boltzmann machines and deep learning
Seminars: Numerical tools for machine learning; non-mean-field neural networks; (bio-)logic gates; maximum entropy approach; Hamilton-Jacobi techniques for mean-field models; …
Hopfield model at low load
Self-consistency equations for the Mattis magnetizations (a straightforward way without SM)
Replace the computation of extensive thermodynamic variables by an average over an ensemble of systems distributed according to the probability distribution of the random variables.
Recalling

$$\langle\sigma_i\rangle = (+1)\,P(+1) + (-1)\,P(-1) = \frac{e^{\beta h_i} - e^{-\beta h_i}}{e^{\beta h_i} + e^{-\beta h_i}} = \tanh(\beta h_i),$$

the Mattis magnetizations read

$$\langle m^\mu\rangle = \frac{1}{N}\sum_{i=1}^{N}\xi_i^\mu\,\langle\sigma_i\rangle,$$

and the internal contribution to the field can be written as

$$h_i^{\mathrm{int}} = \frac{1}{N}\sum_{j=1}^{N}J_{ij}\,\langle\sigma_j\rangle = \sum_{\mu=1}^{P}\xi_i^\mu\,\langle m^\mu\rangle,$$

that is, it scales with $\langle m^\mu\rangle$. Combining the previous expressions,

$$\langle m^\mu\rangle = \frac{1}{N}\sum_{i=1}^{N}\xi_i^\mu\,\tanh\!\Big[\beta\Big(\sum_{\nu=1}^{P}\xi_i^\nu\,\langle m^\nu\rangle + h^{\mathrm{ext}}\Big)\Big].$$
The field acting upon spin i is $h_i = h_i^{\mathrm{ext}} + h_i^{\mathrm{int}}$. Recalling the “Glauber” probability, we get

$$\mathrm{Prob}[\sigma_i] = \frac{1}{2}\big[1 + \sigma_i\tanh(\beta h_i)\big] = \frac{e^{\beta\sigma_i h_i}}{e^{\beta h_i} + e^{-\beta h_i}}.$$
In the TDL, site averages can be replaced by averages over the pattern distribution:

$$\frac{1}{N}\sum_{i=1}^{N}G(\xi_i^\mu) \;\longrightarrow\; \sum_{\xi^\mu}P(\xi^\mu)\,G(\xi^\mu) = \langle G\rangle_\xi.$$
$$\langle m^\mu\rangle = \Big\langle\, \xi^\mu\,\tanh\!\Big[\beta\Big(\sum_{\nu=1}^{P}\xi^\nu\,\langle m^\nu\rangle + h^{\mathrm{ext}}\Big)\Big]\Big\rangle_{\xi}$$
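As a quick numerical illustration (a sketch, not part of the original text): for a single stored pattern and $h^{\mathrm{ext}}=0$, the self-consistency equation reduces to $m = \tanh(\beta m)$, which a short fixed-point iteration solves. The function name and starting point are choices of this sketch.

```python
import math

def mattis_magnetization(beta, m0=0.5, tol=1e-12, max_iter=10000):
    """Solve m = tanh(beta * m) (single-pattern case, h_ext = 0)
    by fixed-point iteration."""
    m = m0
    for _ in range(max_iter):
        m_new = math.tanh(beta * m)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m

# Below the critical temperature (beta > 1) a non-zero solution appears.
print(mattis_magnetization(0.5))  # ~0: paramagnetic phase, T > 1
print(mattis_magnetization(2.0))  # ~0.96: retrieval phase, T < 1
```

The same damped iteration extends directly to the full vector equation for several condensed patterns.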
Self-averaging of the free energy
The free energy in the TDL only depends on the distribution of the patterns, not on the particular set of patterns sampled from this distribution. This holds for both the low- and the high-storage regime.
$$\mathbb{E}\big(f(\xi)\big) = \frac{1}{2^{P}}\sum_{\{\xi\}} f(\xi)$$

Theorem. In the TDL the free energy evaluated for a given pattern configuration converges in probability to the average value with respect to the pattern sampling:

$$P\big(\,|f_N - \mathbb{E}f_N| > \epsilon\,\big) \;\to\; 0.$$
Examples of non-self-averaging observables: the overlaps q in the SK model (see the Parisi plateau) and in the high-load Hopfield model.

If 2^P ≪ N, self-averaging is a consequence of the law of large numbers; if P grows, large deviations can occur → non-self-averaging quantities.
Signal-to-noise. Does the Hebbian rule actually stabilize the inscribed patterns ξ^μ? Would a network in a state coinciding with a stored pattern, σ_i = ξ_i^1, be dynamically stable? Since σ_i = sgn(h_i), the condition is σ_i h_i > 0 (i = 1, 2, …, N).
$$J_{ij} = \frac{1}{N}\sum_{\mu=1}^{P}\xi_i^\mu\xi_j^\mu, \qquad h_i = \frac{1}{N}\sum_{\substack{j=1 \\ j\neq i}}^{N}\sum_{\mu=1}^{P}\xi_i^\mu\xi_j^\mu\,\sigma_j$$

Stability condition when $\sigma = \xi^1$:

$$\xi_1^1 h_1 = 1 + R, \qquad |R| \approx \sqrt{\frac{P}{N}},$$

with a signal term (unitary in the TDL) and a noise term R, a sum of $(N-1)(P-1) \approx NP$ uncorrelated $\pm 1$ bits.
If P is kept constant as N is made very large, the noise becomes negligible in comparison with the signal→ every pattern is a fixed point
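The signal-plus-noise decomposition is easy to see in a small Monte Carlo experiment (an illustrative sketch; the sizes N, P and the seed are arbitrary choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 2000, 20                        # illustrative sizes
xi = rng.choice([-1, 1], size=(P, N))  # P random binary patterns

# Local fields when the network sits exactly on pattern 1 (sigma = xi^1):
# h_i = (1/N) sum_{j != i} sum_mu xi^mu_i xi^mu_j sigma_j
sigma = xi[0]
h = (xi.T @ (xi @ sigma)) / N - (P / N) * sigma  # subtract the j = i terms

stability = xi[0] * h            # signal (~1) plus noise R
R = stability - (N - 1) / N      # noise only
print(stability.mean())          # close to 1
print(R.std(), np.sqrt(P / N))   # noise magnitude ~ sqrt(P/N)
print((stability > 0).all())     # every site satisfies sigma_i h_i > 0
```

With P/N this small, the empirical noise width sits right at the √(P/N) estimate and no site violates the stability condition.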
$$\xi_1^1 h_1 = \underbrace{\frac{N-1}{N}}_{S} \;+\; \underbrace{\frac{1}{N}\sum_{\substack{j=1 \\ j\neq 1}}^{N}\sum_{\mu=2}^{P}\xi_1^1\,\xi_1^\mu\,\xi_j^\mu\,\xi_j^1}_{R} \;>\; 0$$

S: reinforcement due to the fact that the other spins are retrieving pattern 1 as well; R: slow noise due to the other patterns.
In fact, the patterns are very stable fixed points. Suppose a finite fraction d of the spins is flipped away from one of the patterns at random; then S = 1 − 2d, while R ≈ N^{−1/2} is still negligible, so that h_i = (1 − 2d) ξ_i^1 + R → sgn(h_i) = sgn(ξ_i^1). The network will immediately align itself with the pattern ⇒ the patterns have very large basins of attraction.
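The large basins can be probed numerically; the following sketch (sizes, seed, and the flipped fraction d are illustrative choices, not from the text) flips a fraction d of the spins and performs one parallel sign update:

```python
import numpy as np

rng = np.random.default_rng(1)
N, P, d = 2000, 10, 0.2            # flip a fraction d = 0.2 of the spins
xi = rng.choice([-1, 1], size=(P, N))

sigma = xi[0].copy()
flip = rng.random(N) < d           # random set of ~dN sites
sigma[flip] *= -1

# One parallel update with the Hebbian fields h_i = (1/N) sum_j J_ij sigma_j
h = (xi.T @ (xi @ sigma)) / N - (P / N) * sigma
sigma_new = np.sign(h)

overlap = (sigma_new * xi[0]).mean()
print(overlap)  # ~1: the network realigns with the pattern in one sweep
```

The signal 1 − 2d = 0.6 dwarfs the √(P/N) ≈ 0.07 noise, so a single sweep restores essentially perfect overlap.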
[Figure: energy landscape as a function of the Hamming distance from a pattern; the basin of attraction extends up to Hamming distance N/2.]
Spurious states via signal-to-noise. Although the Hebbian kernel has been constructed to guarantee that certain specified patterns are fixed points of the dynamics, the non-linearity of the dynamical process induces additional attractors.
Symmetric 3-mixture (majority rule):

$$\sigma_i = \mathrm{sgn}\big(\xi_i^1 + \xi_i^2 + \xi_i^3\big)$$

Check stability.
Stability is mostly threatened by the sites with the lowest value of the signal (i.e., 0.5). R² ≈ (P − 3)/N ≈ P/N; as N → ∞ with P kept fixed, R → 0 ⇒ the symmetric 3-mixture is stable. It has a rather large basin, but still much reduced compared to the pure patterns (i. stability is guaranteed by signals of magnitude one half only; ii. the initial state must have a large overlap with three patterns rather than one).
For N ≫ 1,

$$\sigma_i h_i = S + R$$

$$S = \sigma_i\sum_{\mu=1}^{3}m^\mu\,\xi_i^\mu = 0.5\,(\xi_i^1 + \xi_i^2 + \xi_i^3)\,\mathrm{sgn}(\xi_i^1 + \xi_i^2 + \xi_i^3)$$

$$R = \frac{1}{N}\sum_{j=1}^{N}\sum_{\mu>3}\sigma_i\,\xi_i^\mu\,\xi_j^\mu\,\sigma_j$$

$$m^\nu = \frac{1}{N}\sum_{i=1}^{N}\sigma_i\,\xi_i^\nu \;\to\; \langle\sigma\,\xi^\nu\rangle_\xi = 0.5, \qquad \text{for } \nu = 1, 2, 3$$
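The value ⟨σ ξ^ν⟩_ξ = 0.5 can be checked by exact enumeration of the 8 equally likely triples (ξ^1, ξ^2, ξ^3); a minimal sketch:

```python
from itertools import product

# Exact average of sigma * xi^1 with sigma = sgn(xi^1 + xi^2 + xi^3),
# enumerating the 8 equally likely triples of binary pattern components.
def sgn(x):
    return 1 if x > 0 else -1

vals = [sgn(x1 + x2 + x3) * x1
        for x1, x2, x3 in product([-1, 1], repeat=3)]
m = sum(vals) / len(vals)
print(m)  # 0.5: the Mattis overlap of the symmetric 3-mixture
```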
Noiseless symmetric spurious states

$$\boldsymbol{m} = m_n\,(1, 1, \ldots, 1, 0, 0, \ldots, 0)$$

The solution is symmetric under permutations of the components, as well as under a change of sign of any number of them: there are overall $2^n\binom{P}{n}$ such states (sign choices × selection of the non-zero components).
$$\boldsymbol{m} = \langle\boldsymbol{\xi}\,\tanh(\beta\,\boldsymbol{\xi}\cdot\boldsymbol{m})\rangle_\xi \;\Rightarrow\; m_n = \frac{1}{n}\,\langle z_n\tanh(\beta m_n z_n)\rangle_\xi$$

In the noiseless limit

$$m_n = \frac{1}{n}\,\langle|z_n|\rangle_\xi = \frac{1}{2^{2k}}\binom{2k}{k}$$
$$E_n = -\frac{1}{2}\,N n\,m_n^2, \qquad E_1 < E_3 < E_5 < \cdots < E_\infty < \cdots < E_6 < E_4 < E_2$$
These mixture states are solutions of the MF equations, but not necessarily attractors → test their stability. Compute the second variation of the energy at each of the stationary points: those that are positive definite correspond to true attractors. All the odd mixtures are attractors (minima); all the even mixtures are unstable (saddle points).
$$z_{n,i} = \sum_{\mu=1}^{n}\xi_i^\mu, \qquad \Pr(z_n = 2k - n) = 2^{-n}\binom{n}{k}$$

(in the closed form for $m_n$, write $n = 2k$ for $n$ even and $n = 2k + 1$ for $n$ odd)
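A short script (illustrative, with a hypothetical helper name) verifies the closed form for m_n against the direct binomial average of |z_n|/n:

```python
from math import comb

def m_noiseless(n):
    """m_n = <|z_n|>/n with z_n the sum of n iid +-1 variables."""
    return sum(abs(2 * k - n) * comb(n, k) for k in range(n + 1)) / (2**n * n)

# Closed form quoted in the text: m_n = 2^{-2k} C(2k, k),
# with n = 2k (even) or n = 2k + 1 (odd).
for k in range(1, 4):
    n = 2 * k + 1
    print(n, m_noiseless(n), comb(2 * k, k) / 4**k)
```

Since m_n decreases with n while n m_n² grows, the energies E_n = −N n m_n²/2 order the odd mixtures as E_1 < E_3 < E_5 < …, in agreement with the text.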
Noiseless Hopfield network, α = 0. The stored patterns, as well as their reversed twins, are absolute minima of an energy that serves as a (Lyapunov) landscape function. Each memory has an enormous basin of attraction. There is also a large number of additional attractors, referred to as spurious states: ≈ 2^n (n ≤ P) symmetric mixture fixed points (even mixtures are unstable, odd mixtures are stable).
Noisy Hopfield network, α = 0. A system endowed with a stochastic dynamical process obeying detailed balance relaxes towards a Boltzmann distribution; the free energy is the potential to inspect for describing the behavior of the system.
- T > 1: noise prevails; there is but a single minimum, at which all overlaps vanish (paramagnetic state).
- T = T_c = 1: second-order phase transition → qualitative modification in the shape of the landscape (recall Landau).
- T < 1: the paramagnetic state becomes unstable (a maximum of F) and 2P “valleys” emerge, each corresponding to a state with a single non-vanishing average overlap.
- T ≪ 1: the valleys deepen and move away from the point m = 0, towards the pure states.
- T < 0.461: the spurious states successively become stable. The pure pattern attractors remain the absolute minima of the landscape all the way down to T = 0; they also always have the largest basins of attraction.
High load and the way to complexity

The main ingredient: random frustration/competition. The Sherrington-Kirkpatrick model is the prototype model for complex systems.

Main differences with classic systems:
- an (at least) extensive (in N) number of free-energy minima
- an emerging ultrametric organization of states
- a dynamical spread over time scales (infinite for N → ∞)
- aging

Def. The Hamiltonian of the mean-field spin-glass model is defined as
$$H_N(\sigma;\xi) = -\frac{1}{N}\sum_{i,j}^{N}\sum_{\mu=1}^{P=\alpha N}\xi_i^\mu\xi_j^\mu\,\sigma_i\sigma_j \;\xrightarrow[\;P\to\infty\;]{}\; -\frac{\sqrt{\alpha}}{\sqrt{N}}\sum_{i,j}^{N}J_{ij}\,\sigma_i\sigma_j, \qquad J_{ij}\sim\mathcal{N}(0,1)$$

$$H_N(\sigma;J) = -\frac{1}{\sqrt{N}}\sum_{i,j}^{N}J_{ij}\,\sigma_i\sigma_j, \qquad J_{ij}\sim\mathcal{N}(0,1),\;\; J_{ii} = 0$$
Wick’s theorem

Let g be a centered Gaussian random variable with variance ν² and let us denote the density function of its distribution by $\gamma(g) = e^{-g^2/2\nu^2}/\sqrt{2\pi\nu^2}$. Given a continuously differentiable function F: ℝ → ℝ, we can formally integrate by parts. Since $\gamma'(g) = -(g/\nu^2)\,\gamma(g)$,

$$\mathbb{E}\big[g\,F(g)\big] = \int g\,F(g)\,\gamma(g)\,dg = \nu^2\int F'(g)\,\gamma(g)\,dg = \nu^2\,\mathbb{E}\big[F'(g)\big],$$

if the limits and the expectations on both sides are finite. This computation can be generalized to Gaussian vectors.
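The identity is easy to check numerically; the sketch below (the test function tanh and the value of ν are arbitrary choices) compares the two sides using a plain midpoint Riemann sum for the Gaussian expectation:

```python
import math

nu = 0.7   # illustrative standard deviation for g ~ N(0, nu^2)
gamma = lambda g: math.exp(-g * g / (2 * nu**2)) / math.sqrt(2 * math.pi * nu**2)

def E(h, lo=-8.0, hi=8.0, n=100000):
    """Expectation over the Gaussian density gamma via a midpoint sum."""
    dg = (hi - lo) / n
    s = 0.0
    for i in range(n):
        g = lo + (i + 0.5) * dg
        s += h(g) * gamma(g) * dg
    return s

F  = math.tanh
dF = lambda t: 1.0 / math.cosh(t)**2

lhs = E(lambda g: g * F(g))   # E[g F(g)]
rhs = nu**2 * E(dF)           # nu^2 E[F'(g)]
print(lhs, rhs)               # the two sides coincide (Gaussian integration by parts)
```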
A new (set of) order parameters: the overlap

$$q_{ab} = \frac{1}{N}\sum_{i=1}^{N}\sigma_i^a\sigma_i^b$$

describes the correlation between spins belonging to two “replicas”, namely two different systems endowed with the same couplings J. Under the replica-symmetry ansatz (which, by the way, is proven to be false),

$$q_{\alpha\beta} = q \;\;(\alpha\neq\beta), \qquad q_{\alpha\beta} = 1 \;\;(\alpha=\beta)$$

$$\Rightarrow\;\lim_{N\to\infty}P(\langle q_{ab}\rangle) = \delta(\langle q_{ab}\rangle - \bar q), \qquad \forall (a,b),\; a\neq b.$$

[Figure: free-energy landscape with metastable states and deep valleys.]

Let us prove that

$$\langle H_N^{SK}(\sigma;J)\rangle = -\frac{\beta N}{4}\big(1 - \langle q_{ab}^2\rangle\big).$$
Each RSB step adds a new set of valleys hierarchically embedded in the existing ones.
The replica trick
$$\log(x) = \lim_{n\to 0}\frac{x^n - 1}{n}$$

$$-\beta\,\mathbb{E}f(\beta) = \lim_{N\to\infty}\frac{1}{N}\,\mathbb{E}\log Z_N(\beta,J)$$

$$-\beta\,\mathbb{E}f(\beta) = \lim_{N\to\infty}\lim_{n\to 0}\frac{\mathbb{E}Z_N^n(\beta,J) - 1}{Nn}$$
The trick consists in calculating Z_N^n for n ∈ ℕ and in making an analytic continuation for n → 0. Serious mathematical problems:
- the analytic continuation is not ensured in the TDL
- uniqueness of the limit (Carlson’s theorem does not hold)
- the interchange of the limits
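The n → 0 identity behind the trick is elementary and easy to check numerically (a toy illustration, not part of the derivation):

```python
import math

# log x = lim_{n -> 0} (x^n - 1)/n, checked for decreasing n
x = 5.0
for n in (1e-1, 1e-3, 1e-6):
    print(n, (x**n - 1.0) / n)   # approaches log(5) = 1.6094... as n -> 0
print(math.log(x))
```

The subtlety in the replica trick is not this limit itself, but continuing an expression known only at integer n down to n → 0.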
$$\mathbb{E}Z_N^n(\beta,J) = \exp\Big\{\frac{J\beta^2}{4N}\big(nN^2 - n^2N\big)\Big\}\int\prod_{a<b}dq_{ab}\,\sqrt{\frac{\beta^2 JN}{2\pi}}\;\cdot$$
$$\cdot\;\exp\Big\{-\frac{\beta^2 JN}{2}\sum_{a<b}q_{ab}^2 + N\ln\sum_{\sigma}e^{\,J\beta^2\sum_{a<b}q_{ab}\,\sigma^{(a)}\sigma^{(b)}}\Big\} = \int\prod_{a<b}dq_{ab}\,\sqrt{\frac{2\pi}{\beta^2 JN}}\;e^{-N H_{\mathrm{eff}}(q_{ab})}.$$

$$\lim_{N\to\infty}Z_N^n(\beta,J) = \exp\big(-N\min_q[H_{\mathrm{eff}}(q)]\big)$$

$$H_{\mathrm{eff}}(q) = -\frac{\beta^2 J(N-n)}{4N} + \frac{\beta^2 J}{2}\sum_{a<b}q_{ab}^2 - \log\sum_{\{\sigma\}}\exp\Big(J\beta^2\sum_{a<b}q_{ab}\,\sigma^{(a)}\sigma^{(b)}\Big)$$
RS ansatz:

$$\bar q_{RS} = \int\frac{dz}{\sqrt{2\pi}}\,e^{-z^2/2}\,\tanh^2\!\big(z\beta\sqrt{J\,\bar q_{RS}}\big)$$
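The RS self-consistency equation can be solved by fixed-point iteration, evaluating the Gaussian integral with Gauss-Hermite quadrature; a minimal sketch (the function name, node count, and starting point are choices of this illustration):

```python
import numpy as np

def q_RS(beta, J=1.0, tol=1e-10, max_iter=10000):
    """Solve q = E_z[tanh^2(beta*sqrt(J*q)*z)], z ~ N(0,1),
    by fixed-point iteration with Gauss-Hermite quadrature."""
    x, w = np.polynomial.hermite.hermgauss(60)   # weight exp(-x^2)
    z = np.sqrt(2.0) * x                         # rescale to N(0,1)
    q = 0.5
    for _ in range(max_iter):
        q_new = np.sum(w * np.tanh(beta * np.sqrt(J * q) * z)**2) / np.sqrt(np.pi)
        if abs(q_new - q) < tol:
            return q_new
        q = q_new
    return q

print(q_RS(0.5))  # ~0 above the critical temperature (beta < 1)
print(q_RS(2.0))  # non-zero spin-glass order parameter below it
```

The bifurcation of the non-trivial solution at β = 1 mirrors the T_c = 1 transition of the SK model at the RS level.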
SK model vs. CW model. For comparison, recall the Curie-Weiss partition function:

$$Z_N(\beta,h) = \sum_{\{\sigma\}}\exp\Big[\frac{\beta}{2N}\Big(\sum_{i=1}^{N}\sigma_i\Big)^2 + \beta h\sum_{i=1}^{N}\sigma_i\Big]$$

$$Z_N(\beta,h) = \sqrt{\frac{N\beta}{2\pi}}\int_{-\infty}^{+\infty}dx\,\exp\Big\{-N\Big(\frac{\beta x^2}{2} - \log 2\cosh[\beta(h+x)]\Big)\Big\}$$

To evaluate Z we need to solve a single integral; from its log-derivative we get the mean observables.

$$\lim_{N\to\infty}Z_N(\beta,h) = \exp\big(-N\min_x[F_{\mathrm{eff}}(x)]\big), \qquad F_{\mathrm{eff}}(x) = \frac{\beta x^2}{2} - \log 2\cosh[\beta(h+x)], \qquad \bar x = \tanh[\beta(\bar x + h)]$$
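The saddle-point statement can be tested against the exact partition function, which for Curie-Weiss is a single sum over the total magnetization; a stdlib-only sketch with illustrative parameter values:

```python
from math import comb, cosh, exp, log

beta, h, N = 1.5, 0.1, 400   # illustrative parameters

# Exact log Z_N / N, summing over the total magnetization M = 2k - N
logs = [log(comb(N, k)) + beta * (2 * k - N)**2 / (2 * N) + beta * h * (2 * k - N)
        for k in range(N + 1)]
mx = max(logs)
logZ_per_N = (mx + log(sum(exp(l - mx) for l in logs))) / N  # log-sum-exp

# Saddle point: minimize F_eff(x) = beta x^2/2 - log 2 cosh(beta (h + x))
F = lambda x: beta * x * x / 2 - log(2 * cosh(beta * (h + x)))
F_min = min(F(-2 + 4 * i / 100000) for i in range(100001))

print(logZ_per_N, -F_min)   # agree up to O((log N)/N) finite-size corrections
```

Already at N = 400 the two numbers match to three decimal places, which is the content of the Laplace/saddle-point evaluation above.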
Rigorous contributions from (not an exhaustive list!):
- Guerra: interpolation techniques
- Aizenman: stochastic stability
- Pastur, Shcherbina, Tirozzi: martingales, cavity methods
- Talagrand, Panchenko: concentration of measure
- Bovier: Gaussian processes
- …

“Spin glasses are heaven for mathematicians” — M. Talagrand
Solution at high load via interpolation techniques, at the RS level, assuming one binary pattern and P − 1 Gaussian patterns. Exploit the “universality” of the slow noise.

Hopfield with Gaussian patterns (α > 0):

$$\bar q = \int dM(z)\,\tanh^2\!\left(\frac{\beta\sqrt{\alpha\bar q}\,z}{1 - \beta(1-\bar q)}\right)$$

Hopfield with digital patterns (α > 0):

$$\bar q = \int dM(z)\,\tanh^2\!\left(\beta\bar m + \frac{\beta\sqrt{\alpha\bar q}\,z}{1 - \beta(1-\bar q)}\right)$$
$$\alpha(\lambda,\beta) = \lim_{N\to\infty}\alpha_N(t=1) = \lim_{N\to\infty}\Big[\alpha_N(t=0) + \int_0^1\partial_t\,\alpha_N(t)\,dt\Big]$$

α_N(t = 0) → one-body problem; the t-derivative splits into six contributions, ∂_t α_N(t) = I + II + III + IV + V + VI.
$$\partial_t\alpha_N = \frac{\beta}{2}\,\mathbb{E}\,\omega_t\big[(m - \bar m)^2\big] - \frac{1}{2}\beta\bar m^2 - \frac{\lambda_N\beta}{2}\big\langle(q_{12} - \bar q)(p_{12} - \bar p)\big\rangle_t - \frac{\lambda_N\beta}{2}\,\bar p\,(1 - \bar q)$$
$$\frac{\partial\alpha_N}{\partial t}(t) = \mathbb{E}\,\omega_t\Big[\frac{\beta}{2}\big(m^2 - 2\bar m\,m\big)\Big] + \frac{1}{2N}\big(\beta - B^2 - C\big)\sum_{\mu=1}^{P-1}\mathbb{E}\,\omega_t(z_\mu^2) - \frac{\lambda_N\beta}{2}\langle q_{12}\,p_{12}\rangle_t - \frac{A^2}{2}\big(1 - \langle q_{12}\rangle_t\big) + \frac{\lambda_N B^2}{2}\langle p_{12}\rangle_t$$
$$\lim_{N\to\infty}\big(\langle m\rangle_t - \bar m\big) = 0, \qquad \lim_{N\to\infty}\big(\langle q_{12}\rangle_t - \bar q\big) = 0, \qquad \lim_{N\to\infty}\big(\langle p_{12}\rangle_t - \bar p\big) = 0$$
see F. Guerra “Central limit theorem for fluctuations in the high temperature region” (2002) and references therein for the validity of such limits
$$\alpha_N(t=0) = -\frac{\lambda_N\beta}{2} + \log 2 + \mathbb{E}\log\cosh\Big(\beta\bar m + \sqrt{\lambda_N\beta\bar p}\,\theta\Big) - \frac{\lambda_N}{2}\log\big[1 - \beta(1 - \bar q)\big] + \frac{\lambda_N\beta}{2}\cdot\frac{\bar q}{1 - \beta(1 - \bar q)}.$$
$$Z_N(\beta;t) \doteq Z_N(t) = e^{-\frac{\beta}{2N}\sum_i\sum_\mu(\xi_i^\mu)^2}\sum_{\{\sigma\}}\int dM(z)\,\exp\Big\{\frac{t\beta}{2}Nm^2\Big\}\cdot\exp\big\{(1-t)\,\psi Nm\big\}\;\cdot$$
$$\cdot\;\exp\Big\{\sqrt{t}\,\sqrt{\frac{\beta}{N}}\sum_{i=1}^{N}\sum_{\mu=1}^{P-1}\xi_i^\mu\sigma_i z_\mu\Big\}\cdot\exp\Big\{A\sqrt{1-t}\sum_{i=1}^{N}\eta_i\sigma_i\Big\}\cdot\exp\Big\{B\sqrt{1-t}\sum_{\mu=1}^{P-1}\theta_\mu z_\mu\Big\}\cdot\exp\Big\{(1-t)\,\frac{C}{2}\sum_{\mu=1}^{P-1}z_\mu^2\Big\}$$
Effective fields: ψ mimics the field provided by m; Aη_i mimics the field provided by ξ_i^μ z_μ; Bθ_μ mimics the field provided by ξ_i^μ σ_i; C z_μ² accounts for the Gaussianity of z.
Theorem. The replica-symmetric thermodynamic limit of the free-energy density of the Hopfield neural network with N spins σ_i ∈ {−1, +1} ∀ i = 1, …, N, one binary pattern and a high load of P − 1 real patterns is determined by the minimum value of the following function:
$$f(\lambda,\beta) = -\frac{1}{\beta}\,\alpha(\lambda,\beta)$$

where

$$\alpha(\lambda,\beta) = -\frac{\lambda\beta}{2} + \log 2 + \mathbb{E}\log\cosh\Big(\beta\bar m + \sqrt{\lambda\beta\bar p}\,\theta\Big) - \frac{1}{2}\beta\bar m^2 - \frac{\lambda}{2}\log\big[1 - \beta(1 - \bar q)\big] + \frac{\lambda\beta\bar q}{2\big[1 - \beta(1 - \bar q)\big]} - \frac{\lambda\beta}{2}\,\bar p\,(1 - \bar q).$$
Requiring stationarity of f(λ, β) with respect to the order parameters gives

$$\bar m = \int_{\mathbb{R}}dM(\theta)\,\tanh\!\big(\beta\bar m + \sqrt{\lambda\beta\bar p}\,\theta\big), \qquad \bar q = \int_{\mathbb{R}}dM(\theta)\,\tanh^2\!\big(\beta\bar m + \sqrt{\lambda\beta\bar p}\,\theta\big), \qquad \bar p = \frac{\beta\bar q}{\big[1 - \beta(1 - \bar q)\big]^2}.$$
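These three coupled equations can be solved by damped fixed-point iteration; a sketch (the function name, damping scheme, initialization, and parameter values are choices of this illustration, not of the text):

```python
import numpy as np

def solve_RS(lam, beta, iters=3000, damp=0.5):
    """Damped fixed-point iteration for the RS order parameters (m, q, p)
    of the high-load Hopfield model; dM(theta) is the standard Gaussian
    measure, handled with Gauss-Hermite quadrature."""
    x, w = np.polynomial.hermite.hermgauss(60)
    theta, W = np.sqrt(2.0) * x, w / np.sqrt(np.pi)
    m, q, p = 0.95, 0.95, 1.0            # start near a retrieval state
    for _ in range(iters):
        field = beta * m + np.sqrt(lam * beta * p) * theta
        t = np.tanh(field)
        m_new = np.sum(W * t)
        q_new = np.sum(W * t**2)
        p_new = beta * q_new / (1.0 - beta * (1.0 - q_new))**2
        m = damp * m + (1 - damp) * m_new
        q = damp * q + (1 - damp) * q_new
        p = damp * p + (1 - damp) * p_new
    return m, q, p

m, q, p = solve_RS(lam=0.01, beta=5.0)
print(m, q, p)   # low load, low temperature: retrieval solution, m close to 1
```

Scanning λ and β with such a solver reproduces the qualitative structure of the phase diagram described below the theorem.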
Left picture: RS amplitudes m of the pure states of the Hopfield model as a function of temperature. From top to bottom: α = 0.000 … 0.125 (Δα = 0.025). Right picture, solid lines: free energies of the pure states. From top to bottom: α = 0.000 … 0.125 (Δα = 0.025). Dashed lines and their solid continuations: free energies of the spin-glass state m = 0 for the same values of α, shown for comparison.
Phase diagram of the Hopfield model. P: paramagnetic phase, m = q = 0. SG: spin-glass phase, m = 0, q > 0. F: pattern-recall phase, where the pure states with m ≠ 0 and q > 0 minimize f. In region M, the pure states are local but not global minima of f. Dashed: the AT instability for the retrieval solution (T_R). Inset: close-up of the low-temperature region.