Neural Networks, Lesson 8: Blind Source Separation
Prof. Michele Scarpiniti
INFOCOM Dpt. - “Sapienza” University of Rome
http://ispac.ing.uniroma1.it/scarpiniti/index.htm
Rome, 19 November 2009
M. Scarpiniti Neural Networks Lesson 8 - Blind Source Separation 1 / 69
1 Introduction
    Basic Concepts
    The InfoMax algorithm
2 Other BSS techniques
    Nonlinear BSS
    Complex-valued BSS
Introduction
General mixing model
The general mixing model is described by

x[n] = F{s[n]} = F{s1[n], . . . , sN[n]}
where:
s[n] is the vector of the N original and independent sources;
F is a mixing operator;
x[n] is the vector of the observed signals, called mixtures.
General de-mixing model
The general de-mixing model is described by

u[n] = G{x[n]} = G ∘ F{s[n]}
where:
G is the recovering or de-mixing operator;
u[n] is the source estimate.
One can exploit only the statistical independence of the sources: ⇒ Independent Component Analysis (ICA).
Ambiguity of the solution
If F has no particular structure, separation may not be possible: in particular, there can be infinitely many solutions G.
For this reason we consider only linear and instantaneous models F, or Post Nonlinear (PNL) ones.
Recalling the trivial ambiguities of ICA algorithms, the best one can obtain is:
u [n] = PΛs [n]
where:
P is a permutation matrix;
Λ is a diagonal or scaling matrix.
In the nonlinear case an offset ambiguity is present too.
History of ICA
Classic ICA: linear mixtures
    1 ICA was born in 1986 (Hérault): neuromimetic approach
    2 Organic development by 1994 (Comon, Jutten, Cardoso, etc.)
        a Statistical approach (HOS): cumulant-based methods (Amari et al.); kurtosis maximization (Amari et al., Jutten)
        b Information-theoretic approach: FastICA (Hyvärinen et al.); InfoMax (Bell & Sejnowski); maximization of the joint entropy (ME); minimization of the mutual information (MMI)
        c Different heuristics: geometric approach (Babaie-Zadeh)
Nonlinear ICA
    1 Organic development by 1999 (Jutten & Taleb)
    2 Open problem
Different ICA approaches
(three slides of figures comparing the different ICA approaches, not reproduced in the transcript)
Linear BSS
The simplest problem formulation is in the linear and instantaneous environment: we observe N random variables x1, . . . , xN, which are modeled as linear combinations of N random variables s1, . . . , sN:
x = As (1)
where aij are the entries of the mixing matrix A.
The aim of BSS algorithms is to identify a de-mixing matrix W such that the components of the output vector
u = Wx (2)
are as statistically independent as possible.
Linear mixing model
Looking at the linear (and instantaneous) mixing model in more depth, we obtain the following component-wise model:

x1 = a11 s1 + a12 s2 + . . . + a1N sN
x2 = a21 s1 + a22 s2 + . . . + a2N sN
. . .
xN = aN1 s1 + aN2 s2 + . . . + aNN sN

A similar component-wise model holds for the linear de-mixing stage too.
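The component-wise model above is just a matrix-vector product applied per sample. A minimal numpy sketch (the mixing matrix A and the Laplacian sources are invented for illustration; in practice A is unknown):

```python
import numpy as np

rng = np.random.default_rng(0)

N = 3                                 # number of sources = number of sensors
s = rng.laplace(size=(N, 1000))       # N source signals, 1000 samples each
A = rng.uniform(-1, 1, size=(N, N))   # hypothetical mixing matrix (unknown in practice)

# Component-wise model: x_i = a_i1 s_1 + ... + a_iN s_N,
# i.e. the matrix product x = A s applied to every sample.
x = A @ s

# Check the first mixture against the explicit sum of the first row.
x1_explicit = sum(A[0, k] * s[k] for k in range(N))
assert np.allclose(x[0], x1_explicit)
```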
Linear de-mixing model
The solution to this problem can be addressed by the well-known ICA algorithms, like FastICA, JADE, etc. Another well-performing algorithm is InfoMax, proposed by Bell & Sejnowski in 1995.
InfoMax addresses the problem of maximizing the mutual information I(y; x) between the input vector x and an invertible nonlinear transform of it, y, obtained as

y = h(u) = h(Wx)

where W is the N × N de-mixing matrix and h(u) = [h1(u1), . . . , hN(uN)]^T is a set of N nonlinear functions.
Note that the network used by the Bell & Sejnowski algorithm is a single-layer neural network. In this way the set of nonlinear functions are the activation functions of the neural network. For this reason the functions hi(ui) are usually called activation functions, or AFs.
Linear BSS system
The whole system is shown in the following learning scheme:
A cost function measuring the statistical independence of the network output y is optimized; the network's free parameters (matrix weights and nonlinear function parameters) are updated during learning.
The InfoMax algorithm
Let us analyze how the InfoMax algorithm works. Since the mapping between input and output is deterministic, maximizing I(y, x) is the same as maximizing the joint entropy H(y). In fact the following relation holds:

I(y, x) = H(y) − H(y|x)

where H(y|x) is whatever entropy the output has that did not come from the input. When we do not know what is noise and what is signal in the input, the mapping between x and y is deterministic and H(y|x) takes its lowest possible value (it diverges to −∞). The above equation can be differentiated as follows with respect to a parameter w involved in the mapping from x to y:

∂/∂w I(y, x) = ∂/∂w H(y)    (3)

because H(y|x) does not depend on w. In this way InfoMax is equivalent to entropy maximization, or ME.
The InfoMax algorithm
In order to derive the learning algorithm, let px(x) and py(y) be the probability density functions (pdf) of the network input and output respectively, which have to satisfy the relation:

py(y) = px(x) / |det J|

where |·| denotes the absolute value and J the Jacobian matrix of the transformation: J = [∂yi/∂xj]ij.
Since the joint entropy of the network output is defined as H(y) = −E{ln py(y)}, we obtain:

H(y) = E{ln |det J|} + H(x).

Now we can note that ∂yi/∂xj = (∂yi/∂ui)(∂ui/∂xj) = h'i(ui) · wij, so we obtain

ln |det J| = ln |det W| + ∑_{i=1}^N ln |h'i|.

Hence, the expression of the joint entropy H(y) (ignoring the expected value operator E{·}, replaced by instantaneous values) is:

H(y) = H(x) + ln |det W| + ∑_{i=1}^N ln |h'i|.    (4)
The InfoMax algorithm
The maximization (or minimization) of a generic cost function L{Φ} with respect to a parameter Φ can be obtained by applying the stochastic gradient method at the (l + 1)-th iteration:

Φ(l + 1) = Φ(l) + ηΦ ∂L{Φ(l)}/∂Φ = Φ(l) + ηΦ ∆Φ(l)

where ηΦ is the learning rate.
Remembering that H(x) is not affected by the parameters that we are learning, it is possible to write the learning rule for the matrix W as follows:

∆W = ∂H(y)/∂W = W^{-T} + Ψx^T    (5)

where W^{-T} = (W^{-1})^T, Ψ = [Ψ1, . . . , ΨN]^T and Ψk = h''k(uk)/h'k(uk).
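Rule (5) can be sketched for a fixed tanh AF (an assumption made here for concreteness; the lesson later uses adaptive spline AFs). For h(u) = tanh(u) the term Ψk = h''k(uk)/h'k(uk) reduces to −2 tanh(uk), which the sketch also verifies numerically, since Ψ is the derivative of ln h'(u):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 2
W = np.eye(N) + 0.01 * rng.standard_normal((N, N))
x = rng.standard_normal((N, 1))      # one observed sample (column vector)

u = W @ x
# For h(u) = tanh(u): h' = 1 - tanh(u)^2 and h'' = -2 tanh(u) (1 - tanh(u)^2),
# hence Psi_k = h''/h' = -2 tanh(u_k).
Psi = -2.0 * np.tanh(u)

# Stochastic-gradient InfoMax update, eq. (5): dW = W^{-T} + Psi x^T
dW = np.linalg.inv(W).T + Psi @ x.T
eta = 0.01
W_new = W + eta * dW

# Sanity check: Psi equals the numerical derivative of ln h'(u).
eps = 1e-6
logd = lambda v: np.log(1 - np.tanh(v) ** 2)
Psi_num = (logd(u + eps) - logd(u - eps)) / (2 * eps)
assert np.allclose(Psi, Psi_num, atol=1e-5)
```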
The InfoMax algorithm: choice of nonlinearity
The nonlinear transformations hi(ui) are necessary for bounding the entropy in a finite range. Indeed, when hi(ui) is bounded, c ≤ hi(ui) ≤ d, for any random variable ui the entropy of yi = hi(ui) has an upper bound:

H(yi) ≤ ln(d − c)

Therefore the joint entropy of the transformed output vector is upper bounded:

H(y) ≤ ∑_{i=1}^N H(yi) ≤ N ln(d − c)

The above inequality holds for any bounded transform, so the global maximum of the entropy H(y) exists. H(y) may also have many local maxima, determined by the functions hi used to transform u. If we choose c = 0 and d = 1, the global maximum is reached at H(y) = 0.
The InfoMax algorithm: relation to ICA
But what is the relationship between InfoMax and ICA?
To this end we can introduce the mutual information of the linear outputs u as the Kullback-Leibler divergence between the joint output distribution and the product of its marginals:

I(u) = E{ ln( pu(u) / ∏_{i=1}^N pui(ui) ) }

where pu(u) is the joint pdf of the output vector u and pui(ui) are the marginal pdfs. Then:

H(y) = −∫ py(y) ln py(y) dy = −E{ln py(y)} =
     = −E{ ln( pu(u) / ∏_{i=1}^N |h'i| ) } = −E{ln pu(u)} + E{ ln ∏_{i=1}^N |h'i| } =
     = −E{ln pu(u)} + E{ ln ∏_{i=1}^N pui(ui) } − E{ ln ∏_{i=1}^N pui(ui) } + E{ ln ∏_{i=1}^N |h'i| } =
     = −E{ ln( pu(u) / ∏_{i=1}^N pui(ui) ) } + E{ ∑_{i=1}^N ln( |h'i| / pui(ui) ) } =
     = −I(u) + E{ ∑_{i=1}^N ln( |h'i| / pui(ui) ) }

and we obtain:

H(y) = −I(u) + E{ ∑_{i=1}^N ln( |h'i| / pui(ui) ) }
The InfoMax algorithm: ME and MMI approaches
H(y) = −I(u) + E{ ∑_{i=1}^N ln( |h'i| / pui(ui) ) }    (6)

Thus if |h'i| = pui(ui) (∀i), then maximizing the joint entropy H(y) is equivalent to minimizing the mutual information, that is, the Kullback-Leibler divergence (which is a measure of the independence of the ui signals), and so the ICA problem is solved. In this case hi(ui) should be the cumulative density function (cdf) of the i-th estimated source. The use of an adaptive AF can successfully fulfill the matching of hi(ui) to the cdf of the i-th source.
Moreover, the use of a cdf-like function for the hi allows the joint entropy H(y) to attain its global maximum at H(y) = 0 (because c = 0 and d = 1).
From (6), the InfoMax algorithm can be realized by two equivalent approaches: maximizing the joint entropy of the network output (ME approach) or minimizing the mutual information (MMI approach).
The flexible AF: spline neuron
In order to obtain an adaptive or flexible activation function one can use the spline neuron.
The spline neuron can be described by the following figure.
In the second block the terms Mi are the columns of the matrix M, which depends on the particular basis used.
The flexible AF: spline neuron
A matrix formulation of the output value yi can be given as:

yi = hi(u) = T · M · Qi

where

T = [u^3 u^2 u 1],  Qi = [Qi Qi+1 Qi+2 Qi+3]^T

and

M = (1/2) ·
[ −1  3 −3  1
   2 −5  4 −1
  −1  0  1  0
   0  2  0  0 ]

Matrix M determines the characteristic of the interpolating curve: CR-spline or B-spline.
In order to guarantee a monotonic characteristic of the overall function, the following constraint must be imposed: Q1 < Q2 < . . . < QN.
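A minimal sketch of the span evaluation yi = T · M · Qi with the CR basis above (the control points are invented). A CR-spline interpolates Qi+1 at u = 0 and Qi+2 at u = 1, and reproduces a straight line exactly for collinear control points:

```python
import numpy as np

# Catmull-Rom basis matrix (the M shown above).
M = 0.5 * np.array([[-1.,  3., -3.,  1.],
                    [ 2., -5.,  4., -1.],
                    [-1.,  0.,  1.,  0.],
                    [ 0.,  2.,  0.,  0.]])

def spline_span(u, Q):
    """Evaluate y = T M Q on one span, with local abscissa u in [0, 1]
    and Q the four control points [Q_i, Q_i+1, Q_i+2, Q_i+3]."""
    T = np.array([u**3, u**2, u, 1.0])
    return T @ M @ Q

Q = np.array([0.0, 1.0, 2.0, 3.0])   # collinear (monotonically increasing) points
print(spline_span(0.0, Q), spline_span(0.5, Q), spline_span(1.0, Q))  # 1.0 1.5 2.0
```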
Different spline basis
1 B-Spline:

M = (1/6) ·
[ −1  3 −3  1
   3 −6  3  0
  −3  0  3  0
   1  4  1  0 ]

2 Bezier Spline:

M =
[ −1  3 −3  1
   3 −6  3  0
  −3  3  0  0
   1  0  0  0 ]

3 Catmull-Rom Spline:

M = (1/2) ·
[ −1  3 −3  1
   2 −5  4 −1
  −1  0  1  0
   0  2  0  0 ]

4 Hermite Spline:

M =
[  2 −2  1  1
  −3  3 −2 −1
   0  0  1  0
   1  0  0  0 ]
A new InfoMax algorithm
A new algorithm can be derived using the InfoMax principle: maximizing the joint entropy of the network output. Now we have two sets of free parameters in the network: the entries of the W matrix and the spline control points. We have to derive two learning rules.
1 We can derive a similar learning rule for the matrix weights:

∆W = ∂H(y)/∂W = W^{-T} + Ψx^T

But now it is

Ψi = h''i(ui)/h'i(ui) = (1/∆) · (T̈ M Qi)/(Ṫ M Qi)

where ∆ = Qi+1 − Qi is the distance between two adjacent control points, Ṫ = [3u^2 2u 1 0] and T̈ = [6u 2 0 0].

2 For the learning rule of the spline control points, we have:

∆Q^m_i ∝ ∂H(y)/∂Q^m_i = ∂[ ∑_{i=1}^N ln h'i(ui) ]/∂Q^m_i = (1/h'i(ui)) · ∂h'i(ui)/∂Q^m_i =
       = (∆/(Ṫ M Qi)) · ∂(Ṫ M Qi)/(∆ ∂Q^m_i) = (Ṫ Mm)/(Ṫ M Qi),   m = 0, . . . , 3

where Mm is the m-th column of the matrix M.
The Natural Gradient
The stochastic gradient is a very simple approach, but if data lie in a Riemannian space S = {ϑ ∈ C}, convergence can be very slow or one can get trapped in local solutions.
In order to avoid these problems, given a generic functional L(ϑ) defined on S, the concept of natural gradient was introduced:

∇̃L(ϑ) = G^{-1}(ϑ) ∇L(ϑ)

where ∇L(ϑ) is the classical stochastic gradient, G^{-1}(ϑ) is the inverse of the Riemann curvature tensor and ∇̃L(ϑ) is the natural gradient.
The Riemann curvature tensor expresses the curvature of Riemannian manifolds. In this way the natural gradient evaluates a gradient taking into account the curvature of the space in which the data are defined. A classical example is the distance between two cities in the world: the "true" distance is an arc on the surface of the globe and not a straight line between the two points.
The Natural Gradient
Amari has demonstrated that the inverse of the Riemann curvature tensor is very simple for the set of non-singular n × n matrices W, denoted by GL(n, R). We have:

∇̃L(W) = ∇L(W) W^T W

The InfoMax algorithm can be reformulated in this new way:

∆W = η( W^{-T} + Ψx^T ) W^T W = η( I + Ψx^T W^T ) W = η( I + Ψu^T ) W

This algorithm is more efficient than the classic one, essentially for two reasons:
1 it avoids the local minima;
2 it avoids the evaluation of the inverse of the W matrix.
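An end-to-end sketch of the rule ∆W = η(I + Ψu^T)W, batch-averaged over the samples and with a fixed tanh AF (again an assumption made for concreteness: Ψ = −2 tanh(u), appropriate for super-gaussian sources; the slides use adaptive splines). The two Laplacian sources and the mixing matrix are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10000
s = rng.laplace(size=(2, T))           # two super-gaussian sources
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])             # hypothetical mixing matrix
x = A @ s

W = np.eye(2)
eta = 0.02
for _ in range(2000):                  # batch natural-gradient InfoMax
    u = W @ x
    Psi = -2.0 * np.tanh(u)            # score term for a tanh AF
    W = W + eta * (np.eye(2) + (Psi @ u.T) / T) @ W

u = W @ x
# Up to permutation and scaling, each output should match one source.
corr = np.abs(np.corrcoef(np.vstack([u, s]))[:2, 2:])
print(corr.max(axis=1))                # best source match per output
```

Note that the update never inverts W, in line with reason 2 above.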
The Natural Gradient
In addition, four new Riemannian metrics have been introduced by Arcangeli et al., 2004 and Squartini et al., 2005, to improve the convergence speed.
In this way we obtain 5 expressions for the natural gradient:

∇̃_R L(W) = ∇L(W)(W^T W)             (7)
∇̃_L L(W) = (W W^T)∇L(W)             (8)
∇̃_LR L(W) = (W W^T)∇L(W)(W^T W)     (9)
∇̃_RR L(W) = ∇L(W)(W^T W^T W W)      (10)
∇̃_LL L(W) = (W W W^T W^T)∇L(W)      (11)

defined as the right natural gradient (7) (that is, the standard natural gradient), left natural gradient (8), right/left natural gradient (9), right/right natural gradient (10) and left/left natural gradient (11).
The Natural Gradient
Using the previous expressions, we can obtain the new formulations of the learning rules for the de-mixing matrix W: from the equation

∆W = W^{-T} + Ψx^T

we obtain the five rules:

∆W = (I + Ψu^T)W                 (12)
∆W = W(I + W^T Ψx^T)             (13)
∆W = W W^T (I + Ψu^T)W           (14)
∆W = (I + Ψu^T)W^T W W           (15)
∆W = W W W^T (I + W^T Ψx^T)      (16)
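The equality between rule (12) and the right natural gradient of (5) is a purely algebraic identity (W^{-T} W^T W = W), so it can be checked numerically for any invertible W; the random data below are only for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 4
W = rng.standard_normal((N, N)) + 2 * np.eye(N)   # well-conditioned, invertible
x = rng.standard_normal((N, 1))
u = W @ x
Psi = -2.0 * np.tanh(u)                           # any score vector works here

grad = np.linalg.inv(W).T + Psi @ x.T             # stochastic gradient, eq. (5)
right_ng = grad @ W.T @ W                         # right natural gradient, def. (7)
rule_12 = (np.eye(N) + Psi @ u.T) @ W             # learning rule (12)
assert np.allclose(right_ng, rule_12)
print("rule (12) equals the right natural gradient of (5)")
```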
The Natural Gradient
Not all of these new gradient definitions work efficiently for the BSS problem: some of these equations do not satisfy the equivariance property (Cardoso & Laheld, 1996).
Definition
An estimator Â of A (Â = Â(X)) is equivariant if, for every invertible matrix M,

Â(MX) = M Â(X)

This implies that, defining C = WA, the natural gradient algorithm (12) depends only on C:

∆W · A = ∆C = (I + Ψu^T)WA = (I + Ψu^T)C
The Natural Gradient
The equivariance property is not satisfied by all of the algorithms (13)-(16).
If we consider that the mixing matrix A is unitary (AA^T = A^T A = I), then:

1 ∆W · A = W(I + W^T Ψx^T)A
2 ∆W · A = W W^T (I + Ψu^T)WA = WA A^T W^T (I + Ψu^T)C = C C^T (I + Ψu^T)C
3 ∆W · A = (I + Ψu^T)W^T W WA = (I + Ψu^T)W^T A^T A W C = (I + Ψu^T)C^T C C
4 ∆W · A = W W W^T (I + W^T Ψx^T)A

It is possible to note that the left (13) and left/left (16) algorithms do not satisfy the equivariance property: we expect that they do not work efficiently for BSS.
Convergence and universal nonlinearity
The InfoMax algorithm with the natural gradient converges if (Mathis et al., 2002)

E{Ψ'(u)} E{u^2} + E{Ψ(u)u^T} > 0    (17)

Mathis has demonstrated that there does not exist a "universal" nonlinear function which satisfies the previous condition for every input signal.
Using spline functions, the previous constraint is always satisfied.
The function Ψi ≜ p'i(x)/pi(x) ≡ h''i(u)/h'i(u) is defined as the score function. Because we use an adaptive function, at convergence the activation function must assume the profile of the cdf of the input signal, in order to minimize the last term of (6). In this way the functions Ψi tend to the score functions, and it can be demonstrated that condition (17) always holds.
Under-determined BSS
If the number M of sensors is less than the number N of sources, the problem (known as under-determined or over-complete BSS) is more complicated. A useful approach in this case could be the use of a geometric algorithm.
This approach performs particularly well if the sources are sparse.
Sparsity is a measure of the number of zero elements in the source signal.
The greater the sparsity, the more super-gaussian the signal.
If the signals are sparse enough, the scatter plot of the mixtures reveals some "characteristic" directions. From these directions the mixing matrix A can be estimated as follows, for 2 sensors and N sources:

A = [ cos α1  cos α2  · · ·  cos αN
      sin α1  sin α2  · · ·  sin αN ]
Under-determined BSS
In order to improve the sparsity of the mixtures, it is possible to perform some time-frequency transformation of the time signal. A good choice is

1 the Short-Time Fourier Transform (STFT): the function to be transformed is multiplied by a window function which is nonzero for only a short period of time. The Fourier transform (a one-dimensional function) of the resulting signal is taken as the window is slid along the time axis, resulting in a two-dimensional representation of the signal, written as:

STFT{x(t)} ≡ X(ω, τ) = ∫_{−∞}^{∞} x(t) w(t − τ) e^{−jωt} dt

2 some Wavelet Transform. A wavelet is a wave-like oscillation with an amplitude that starts out at zero, increases, and then decreases back to zero. A wavelet transform is used to divide a time signal into wavelets. It possesses the ability to construct a time-frequency representation of a signal that offers very good time and frequency localization. The wavelet transform at scale a > 0 and translation value b ∈ R is

X_w(a, b) = (1/√a) ∫_{−∞}^{∞} x(t) Ψ*((t − b)/a) dt

where Ψ(t) is a continuous function in both the time domain and the frequency domain, called the mother wavelet (Ψ* denotes its complex conjugate).
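The sparsifying effect can be sketched with a crude non-overlapping, Hann-windowed STFT (the implementation details are assumptions, not the slides' choice), applied to a signal made of two tones active in disjoint time intervals. Sparsity is measured here by the kurtosis of the samples, in line with the "sparser = more super-gaussian" remark above:

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs                       # 1 s of signal
# Two tones active in disjoint time intervals: dense in time, sparse in TF.
x = np.where(t < 0.5, np.sin(2*np.pi*440*t), np.sin(2*np.pi*1680*t))

def stft(x, nwin=256):
    """Crude STFT: non-overlapping Hann-windowed frames + real FFT."""
    w = np.hanning(nwin)
    frames = x[:len(x)//nwin*nwin].reshape(-1, nwin)
    return np.fft.rfft(frames * w, axis=1)

def kurt(v):
    v = v - v.mean()
    return (v**4).mean() / (v**2).mean()**2

X = np.abs(stft(x)).ravel()
# STFT magnitudes: few large coefficients per frame -> much larger kurtosis
# than the raw time samples of a sinusoid (which are sub-gaussian).
print(kurt(x), kurt(X))
```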
Under-determined BSS
The sparsity enhancement is evident from the following pdf plots of a time-domain and an STFT-domain signal.
After this preprocessing step, a clustering algorithm, like K-means, can be used to detect the angles αi for the estimation of the mixing matrix (Blind Mixing Model Recovery (BMMR) step).
Another problem is recovering the original sources by solving the corresponding under-determined (non-square) system (Blind Source Recovery (BSR) step). In the literature there exist several ways to solve such a system.
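The BMMR step can be sketched on synthetic data: sparse sources, 2 sensors, angles of the observed points folded into [0, π), then a tiny 1-D K-means. The mixing angles, activation probability and initialization are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
T, N = 30000, 3                             # 3 sources, but only 2 sensors
alphas = np.array([0.3, 1.0, 1.7])          # hypothetical mixing directions (rad)
A = np.vstack([np.cos(alphas), np.sin(alphas)])

# Sparse sources: each sample is active with small probability.
s = rng.laplace(size=(N, T)) * (rng.random((N, T)) < 0.05)
x = A @ s

# BMMR: direction of each nonzero observation, folded into [0, pi).
keep = np.linalg.norm(x, axis=0) > 1e-9
theta = np.arctan2(x[1, keep], x[0, keep]) % np.pi

# Tiny 1-D K-means, initialized at quantiles of the observed angles.
centers = np.quantile(theta, [0.16, 0.5, 0.84])
for _ in range(20):
    labels = np.argmin(np.abs(theta[:, None] - centers[None, :]), axis=1)
    centers = np.array([theta[labels == k].mean() for k in range(N)])

print(np.sort(centers))                     # should be close to alphas
```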
Blind Source Extraction (BSE)
BSS can be computationally very demanding if the number of source signals is large. Fortunately, Blind Source Extraction (BSE) overcomes this difficulty by attempting to recover only a small subset of desired independent components from a large number of sensor signals: most of the existing BSE criteria and associated algorithms recover only one independent component at a time.
A single processing unit (a neuron) is used in the first step to extract one independent source signal with the specified stochastic properties. In the next step a deflation technique is used to eliminate the already extracted signals from the mixtures.
Blind Source Extraction (BSE)
A learning rule for the extraction of the first independent component can be derived, after whitening, by minimizing the following cost function:

J1(w1) = −(1/4) |k(y1)| = −(β/4) k(y1)

where k(y1) is the kurtosis of the first output y1 and β = ±1 determines the sign of the kurtosis of the extracted signal. Applying the standard gradient descent, we have:

∆w1 = −µ1 ∂J1(w1)/∂w1 = µ1 β (m4(y1)/m2^3(y1)) [ (m2(y1)/m4(y1)) E{y1^3 x1} − E{y1 x1} ]

where µ1 > 0 is the learning rate, m2(y1) = E{|y1|^2} and m4(y1) = E{|y1|^4}. The term m4(y1)/m2^3(y1) is always positive and can be absorbed into the learning rate. Letting

ϕ(y1) = β (m4(y1)/m2^3(y1)) [ (m2(y1)/m4(y1)) y1^3 − y1 ]

we obtain the following learning rule:

∆w1 = µ1 ϕ(y1) x1    (18)
Blind Source Extraction (BSE)
After successful extraction of the first source signal y1 ≈ s1, we can apply the deflation procedure, which removes previously extracted signals from the mixtures. This procedure may be recursively applied to extract all source signals:

x_{j+1} = x_j − w_j y_j,   j = 1, 2, . . .

where the filter w_j can be optimally estimated by minimization of the cost function

J_j(w_j) = E{x_{j+1}^T x_{j+1}} = E{x_j^T x_j} − 2 w_j^T E{x_j y_j} + w_j^T w_j E{y_j^2}

with respect to w_j, which leads to the simple updating equation:

w_j = E{x_j y_j} / E{y_j^2} = E{x_j x_j^T} w_j / E{y_j^2}
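The whole BSE loop can be sketched as follows: whitening, the gradient rule (18) with β = +1 (super-gaussian sources), and one deflation step. The data and learning parameters are invented, and the unit-norm renormalization of w at each step is an added practical detail, not part of the slides:

```python
import numpy as np

rng = np.random.default_rng(4)
T = 5000
s = rng.laplace(size=(3, T))                 # super-gaussian sources (beta = +1)
A = rng.standard_normal((3, 3))
x = A @ s

# Whitening (assumed done before the BSE rule, as stated in the slides).
d, E = np.linalg.eigh(np.cov(x))
x1 = E @ np.diag(d**-0.5) @ E.T @ (x - x.mean(axis=1, keepdims=True))

beta, mu = 1.0, 0.1
w = rng.standard_normal(3)
w /= np.linalg.norm(w)
for _ in range(1000):                        # gradient rule (18), batch averaged
    y = w @ x1
    m2, m4 = (y**2).mean(), (y**4).mean()
    phi = beta * (m4 / m2**3) * ((m2 / m4) * y**3 - y)
    w = w + mu * (x1 * phi).mean(axis=1)
    w /= np.linalg.norm(w)                   # keep the extractor on the unit sphere

y1 = w @ x1
# Deflation: remove the extracted component from the mixtures.
w1 = (x1 * y1).mean(axis=1) / (y1**2).mean()
x2 = x1 - np.outer(w1, y1)

corr = np.abs([np.corrcoef(y1, s[i])[0, 1] for i in range(3)])
print(corr.max(), (x2**2).mean() < (x1**2).mean())
```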
Other BSS techniques
Nonlinear BSS
Unfortunately, linear instantaneous mixing models are too unrealistic and unsatisfactory in many applications. A more realistic mixing system inevitably introduces a nonlinear distortion in the signals. In this way the possibility of taking these distortions into account can give better results in signal separation. The problem is that in the nonlinear case the uniqueness of the solution is not guaranteed. The solution becomes easier in a particular case, called Post Nonlinear (PNL) mixture, well known in the literature in the case of the real domain. In this context the solution is unique too.
The mixing procedure is realized in two steps: first there is a linear mixing, then each mixture is obtained as a nonlinearly distorted version of the linear mixtures. The PNL mixture model is quite probable in many practical situations.
Nonlinear BSS
If the mixing-separating system is nonlinear and no other assumption is given for the mixing operator, a generic de-mixing model does not assure the existence and uniqueness of the solution. In order to better illustrate this aspect, see the following example.
Example
Consider two independent random variables: s1 with uniform distribution in [0, 2π) and s2 with Rayleigh distribution, so that its pdf is p_{s2}(s2) = (s2/σ2^2) e^{−s2^2/2}, with variance σ2^2 = 1.
Given the two nonlinear transformations y1 = s2 cos s1 and y2 = s2 sin s1, the random variables y1 and y2 are still independent but are Gaussian distributed, so they cannot be separated, as a consequence of the Darmois-Skitovich theorem. In fact the Jacobian J of this transformation is:

det(J) = det [ ∂y1/∂s1  ∂y1/∂s2 ] = det [ −s2 sin s1  cos s1 ] = −s2
             [ ∂y2/∂s1  ∂y2/∂s2 ]       [  s2 cos s1  sin s1 ]

so the joint pdf of y = [y1, y2] can be expressed as

p_{y1,y2}(y1, y2) = p_{s1,s2}(s1, s2)/|det(J)| = (1/(2π)) exp(−(y1^2 + y2^2)/2) =
                  = [ (1/√(2π)) exp(−y1^2/2) ] · [ (1/√(2π)) exp(−y2^2/2) ] ≡ p_{y1}(y1) · p_{y2}(y2)
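The example can be checked by simulation: the construction above is exactly the polar (Box-Muller-like) generation of a Gaussian pair, so the simulated y1 and y2 should show zero excess kurtosis and zero correlation, in agreement with the closed-form pdf just derived:

```python
import numpy as np

rng = np.random.default_rng(5)
T = 200000
s1 = rng.uniform(0.0, 2*np.pi, T)          # uniform phase
s2 = rng.rayleigh(scale=1.0, size=T)       # Rayleigh radius, sigma^2 = 1

y1 = s2 * np.cos(s1)
y2 = s2 * np.sin(s1)

def excess_kurtosis(v):
    v = v - v.mean()
    return (v**4).mean() / (v**2).mean()**2 - 3.0

# Both outputs should look standard normal and mutually uncorrelated.
print(excess_kurtosis(y1), excess_kurtosis(y2), np.corrcoef(y1, y2)[0, 1])
```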
Nonlinear BSS: the mirror model
In order to solve the BSS problem in a nonlinear environment, using PNL mixtures, the following de-mixing model is adopted.
This model is known as the mirror model, because the de-mixing model is the mirror image of the mixing one. In formulas:

v = As
x = F(v),  xi = fi(vi)
r = G(x),  ri = gi(xi)
u = Wr
Nonlinear BSS system
The whole system is shown in the following learning scheme:
The learning architecture must adapt the matrix weights wij, the nonlinear activation functions hi(ui) and the functions gi(xi) compensating the distorting functions fi(vi).
Nonlinear BSS: a flexible solution
In order to obtain an adaptive representation of the activation functions and of the functions compensating the distortions, we can adopt splines. In this way we have three sets of free parameters:

1 the entries wij of the de-mixing matrix;
2 the control points Q^h_i of the AFs;
3 the control points Q^G_i of the compensating functions gi(xi).

Using the InfoMax algorithm, we have to maximize the following cost function:

∂L{y, Φ}/∂Φ = ∂/∂Φ [ ln |det(W)| + ∑_{k=1}^N ln ẏ_k + ∑_{k=1}^N ln ṙ_k ]

where the free parameters are Φ = {wij, Q^h_i, Q^G_i}.
Nonlinear BSS: a flexible solution
Maximization of the previous equation by the stochastic gradient method yields three learning rules:

1 The learning rule for the network's weights is:

∆W = W^{-T} + Ψr^T

where Ψi = h''i(ui)/h'i(ui) as usual.

2 The learning rule for the spline activation functions is:

∆Q^h_{i+m} = (Ṫ Mm)/(Ṫ M Q^h_i)

where Mm is the m-th column of the M matrix.

3 The learning rule for the spline compensating functions is:

∆Q^G_{i+m} = (Ṫ Mm)/(Ṫ M Q^G_i) + Ψ^T W_k Ṫ Mm

where W_k is a vector composed of the k-th column of the matrix W.
Indeterminacy of nonlinear BSS
In PNL BSS we do not have only the usual scaling and permutation ambiguities.

Definition
Let A ∈ Gl(n) be an invertible matrix; then A = (aij), i, j = 1 . . . n, is said to be absolutely degenerate if there are two columns l ≠ m such that a^2_{il} = λ a^2_{im} for a λ ≠ 0, i.e. the normalized columns differ only by the signs of their entries.

A not so obvious indeterminacy occurs if A is absolutely degenerate: only the matrix A, but not the nonlinearities, can be recovered.
An additional indeterminacy comes into play because of translation: the distorting functions fi can only be recovered up to a constant offset.
Other BSS environment
The nature of some problems calls for a solution in a complex environment, due to the need for frequency-domain signal processing, which is quite common in telecommunication and biomedical applications.
The problem formulation and solution are similar to the real-valued case, except that all quantities involved in the model are complex-valued. For example, let us consider a vector s = [s1, . . . , sN]^T of N complex sources (s ∈ C^N). The k-th source can be expressed as sk = sRk + j sIk, where sRk and sIk are the real and imaginary parts of the k-th complex-valued source signal and j = √−1 is the imaginary unit. The goal of complex BSS is to recover the complex signal s from observations of the complex mixture x = [x1, . . . , xN]^T, where the k-th mixture can be expressed as xk = xRk + j xIk, with xRk and xIk its real and imaginary parts. In this way the models of the real-valued case are still valid, but the mixing matrix A and the de-mixing matrix W are complex matrices (aij ∈ C and wij ∈ C):

x = As
u = Wx

In this way all the real-valued algorithms, like InfoMax, can be generalized to the complex-valued environment.
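A minimal complex-valued sketch: the linear model x = As, u = Wx carries over verbatim to complex arrays. Here the 4-QAM-like sources are invented and W = A^{-1} is an oracle choice, used only to show that the algebra is unchanged:

```python
import numpy as np

rng = np.random.default_rng(6)
T = 1000
# Complex sources, e.g. 4-QAM-like symbols: s = sR + j sI with sR, sI in {-1, +1}.
s = rng.choice([-1.0, 1.0], (2, T)) + 1j * rng.choice([-1.0, 1.0], (2, T))

A = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
x = A @ s                                  # complex mixtures

# The ideal de-mixing matrix is simply the inverse of the complex mixing matrix.
W = np.linalg.inv(A)
u = W @ x
assert np.allclose(u, s)
```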
Example 1: cocktail-party problem
A first example is the well-known cocktail-party problem in a reverberant environment:

x(t) = A(t) ∗ s(t)  ⇔ (STFT)  X(f, t) = A(f) S(f, t)

A convolutive BSS problem can be addressed as a set of N instantaneous problems, one for each frequency bin.
Example 1: cocktail-party problem
An example of spectrograms of two sources and two mixtures.
Example 2: fMRI problem
A second example is Functional Magnetic Resonance Imaging, or fMRI.
Two orthogonal scanners acquire a slice of temporal voxels in order to measure the Blood-Oxygen-Level Dependent (BOLD) signal.
Example 2: fMRI problem
Example of fMRI images: we have 5 super-gaussian, 1 gaussian and 2 sub-gaussian signals:
Example 3: communication problem
A third example is band-pass transmission.
Usually the complex envelope is used:

x(t) = xR(t) + j xI(t)
Example 3: communication problem
Example of an 8-PSK, a 16-QAM, a 4-QAM and a uniform noise signal.
The complex activation function (AF)
One of the main issues in designing complex neural networks is the presence ofcomplex nonlinear activation functions h(z) (where z = zR + jzI ) involved in thelearning processing.
The main challenge is the dichotomy between boundedness and analyticity in the complex domain, as stated by Liouville's theorem: complex functions that are bounded on the whole complex plane are either constant or not analytic. Thus this kind of complex nonlinear function is not suitable as an activation function of neural networks.
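For instance, tanh(z) is analytic but unbounded on C: it has poles on the imaginary axis at z = j(π/2 + kπ), consistently with Liouville's theorem. A quick numerical illustration (ours):

```python
import numpy as np

# tanh(jy) = j tan(y), so |tanh(z)| blows up as z approaches j*pi/2
z_near_pole = 1j * (np.pi / 2 - 1e-6)
magnitude = np.abs(np.tanh(z_near_pole))   # on the order of 1e6
```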
In particular, Georgiou & Koutsougeras defined the following set of properties:

1. h(z) is nonlinear in zR and zI;
2. h(z) is bounded: |h(z)| ≤ c < ∞;
3. the partial derivatives ∂hR/∂zR, ∂hR/∂zI, ∂hI/∂zR and ∂hI/∂zI exist and are bounded;
4. h(z) is not entire;
5. (∂hR/∂zR)(∂hI/∂zI) ≠ (∂hR/∂zI)(∂hI/∂zR);

where hR(zR, zI) and hI(zR, zI) are known as the real part function and the imaginary part function of the complex function h(z), respectively.
In this context Kim and Adali proposed the use of the so-called elementary transcendental functions (ETFs). They classified the ETFs into categories of unbounded functions, depending on which kind of singularities they possess:
1 Circular functions: tan (z), sin (z) and cot (z);
2 Inverse circular functions: tan−1 (z), sin−1 (z) and cos−1 (z);
3 Hyperbolic functions: tanh (z), sinh (z) and coth (z);
4 Inverse hyperbolic functions: tanh−1 (z), sinh−1 (z) and cosh−1 (z).
As expected, the trigonometric and the corresponding hyperbolic functions behave very similarly.
The complex activation function (AF): the splitting solution
According to the properties listed above, in order to overcome the dichotomy between boundedness and analyticity, complex nonlinear splitting functions have been introduced. In this approach the real and imaginary parts are processed separately by real-valued nonlinear functions. The splitting function

h(z) = h(zR, zI) = hR(zR) + jhI(zI)

avoids the problem of unboundedness of complex nonlinearities, as stated above, but it cannot be analytic.
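A minimal sketch of a splitting AF (using tanh for both parts, our choice of nonlinearity): it stays bounded by √2 everywhere, including near the poles of the fully-complex tanh:

```python
import numpy as np

def splitting_tanh(z):
    """Splitting AF: h(z) = hR(zR) + j hI(zI), with hR = hI = tanh."""
    return np.tanh(np.real(z)) + 1j * np.tanh(np.imag(z))

z = np.array([3 + 4j, -0.5 + 0.1j, 1j * (np.pi / 2 - 1e-6), 100 - 100j])
y = splitting_tanh(z)
bound = np.max(np.abs(y))   # <= sqrt(2), since each part lies in (-1, 1)
```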
The complex activation function (AF): the generalized splitting solution
The splitting model of a nonlinear complex-valued function is not realistic, because usually the real and imaginary parts are correlated. It is therefore useful to adopt a more realistic model of the nonlinear functions. To this end, Vitagliano et al. (2003) proposed a complex neural network based on a couple of bi-dimensional functions, called the generalized splitting function:
h (z) = h (zR , zI ) = hR (zR , zI ) + jhI (zR , zI )
In this way h(z) is bounded, but it is not analytic. The Cauchy-Riemann conditions (∂hR/∂zR = ∂hI/∂zI, ∂hI/∂zR = −∂hR/∂zI) are not satisfied by the complex function itself, but can be imposed by an algorithm constraint during the learning process.
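A finite-difference check (ours, not from the slides) of the Cauchy-Riemann conditions: an analytic function such as z² satisfies them, while a splitting function generally does not, which is why the conditions must be imposed as a constraint during learning:

```python
import numpy as np

def cr_residual(h, z, eps=1e-6):
    """Finite-difference Cauchy-Riemann residual
    |hR_zR - hI_zI| + |hI_zR + hR_zI| at the point z."""
    d_zR = (h(z + eps) - h(z - eps)) / (2 * eps)            # derivative along zR
    d_zI = (h(z + 1j * eps) - h(z - 1j * eps)) / (2 * eps)  # derivative along zI
    return abs(d_zR.real - d_zI.imag) + abs(d_zR.imag + d_zI.real)

z0 = 0.3 + 0.4j
res_analytic = cr_residual(lambda z: z**2, z0)   # essentially zero
res_splitting = cr_residual(lambda z: np.tanh(z.real) + 1j * np.tanh(z.imag), z0)
```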
Spline implementation of splitting functions
The new idea is to use spline-based functions for implementing the splitting activation function. Thus, remembering the spline matrix formulation, we can first evaluate the span indexes iR and iI and the local parameters νR and νI; then we have

yk = hRk(uRk) + jhIk(uIk) = TR · M · QR,iR + j TI · M · QI,iI
The generalized spline neuron
A matrix formulation of the output yi can be written as:

yi = hi(uR, uI) = Tν,2 · M · (Tν,1 · M · Qi)^T

where

Tν,k = [ u_k³  u_k²  u_k  1 ],   k = 1, 2

Qi = [ Qj^(i1,i2)    Qj^(i1,i2+1)    Qj^(i1,i2+2)    Qj^(i1,i2+3)
       Qj^(i1+1,i2)  Qj^(i1+1,i2+1)  Qj^(i1+1,i2+2)  Qj^(i1+1,i2+3)
       Qj^(i1+2,i2)  Qj^(i1+2,i2+1)  Qj^(i1+2,i2+2)  Qj^(i1+2,i2+3)
       Qj^(i1+3,i2)  Qj^(i1+3,i2+1)  Qj^(i1+3,i2+2)  Qj^(i1+3,i2+3) ]

and

M = (1/2) · [ -1   3  -3   1
               2  -5   4  -1
              -1   0   1   0
               0   2   0   0 ]

The matrix M determines the characteristics of the interpolating curve: CR-spline or B-spline.

In order to guarantee a monotonic characteristic of the overall function, the following constraint must be imposed: Q_1^k < Q_2^k < … < Q_N^k.
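A sketch (ours; the span-index convention is an assumption) of the one-dimensional CR-spline evaluation with the matrix M above:

```python
import numpy as np

# Catmull-Rom (CR-spline) basis matrix from the slide
M = 0.5 * np.array([[-1.0, 3.0, -3.0, 1.0],
                    [2.0, -5.0, 4.0, -1.0],
                    [-1.0, 0.0, 1.0, 0.0],
                    [0.0, 2.0, 0.0, 0.0]])

def spline_eval(x, Q, delta=1.0, x0=0.0):
    """Evaluate a 1-D CR-spline with uniformly spaced control points Q.
    First compute the span index i and local parameter u, then T(u) @ M @ Q_i."""
    t = (x - x0) / delta
    i = int(np.clip(np.floor(t), 0, len(Q) - 4))  # span index
    u = t - i                                     # local parameter in [0, 1)
    T = np.array([u**3, u**2, u, 1.0])
    return T @ M @ Q[i:i + 4]

Q = np.arange(8.0)           # monotonically increasing control points
y_mid = spline_eval(2.5, Q)  # CR-splines reproduce linear data exactly
```

With the monotonic control points above the resulting curve is itself monotonic, as the slide's constraint requires.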
Spline implementation of generalized splitting functions
Unfortunately the real and imaginary parts of a complex signal are usually correlated, not split into separate channels. We therefore need a better model of the complex AF. We can render each of the two bi-dimensional real functions hR(uR, uI) and hI(uR, uI) with bi-dimensional splines: one plays the role of the real part and one of the imaginary part of the complex activation function:

yRk = hRk(uRk, uIk) = TνI(uIk) · M · (TνR(uRk) · M · Q^(iR,iI)_[2]R)^T,
yIk = hIk(uRk, uIk) = TνI(uIk) · M · (TνR(uRk) · M · Q^(iR,iI)_[2]I)^T,
yk = yRk + jyIk
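The bi-dimensional evaluation T_I · M · (T_R · M · Q)^T over one 4×4 patch of control points can be sketched as follows (patch orientation and indexing are our assumptions):

```python
import numpy as np

M = 0.5 * np.array([[-1.0, 3.0, -3.0, 1.0],
                    [2.0, -5.0, 4.0, -1.0],
                    [-1.0, 0.0, 1.0, 0.0],
                    [0.0, 2.0, 0.0, 0.0]])

def patch_eval(vR, vI, Q):
    """One bi-dimensional CR-spline patch: Q is the local 4x4 block of
    control points, vR and vI the local parameters in [0, 1)."""
    TR = np.array([vR**3, vR**2, vR, 1.0])
    TI = np.array([vI**3, vI**2, vI, 1.0])
    return TI @ M @ (TR @ M @ Q)   # scalar value of the surface

Q = np.arange(16.0).reshape(4, 4)
y00 = patch_eval(0.0, 0.0, Q)      # the CR patch interpolates an inner point
```

Two such patches, one for the real and one for the imaginary output, give yk = yRk + j yIk.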
The de-mixing algorithm
In order to derive the de-mixing algorithm, let us pose

x[n] = [ xR[n] ; xI[n] ] = [ xR1(n) … xRN(n) ; xI1(n) … xIN(n) ]

u[n] = [ uR[n] ; uI[n] ] = [ uR1(n) … uRN(n) ; uI1(n) … uIN(n) ]

y[n] = [ yR[n] ; yI[n] ] = [ h(uR[n]) ; h(uI[n]) ]
We utilize the InfoMax algorithm.
As cost function we use the joint entropy (ME approach).
Let Φ denote the free parameters; then the cost function is:

L{y[n], Φ} = H(y[n]) = H(x) − E{log(det(J))} = H(x) + log|det W| + 2 · Σ_{n=1}^{N} E[ log|h′_n(u_n)| ]

The free parameters for our architecture are:

Φ = { wij, Q^h }
Evaluating the derivatives with respect to the de-mixing matrix, we obtain:

∂ log|det W| / ∂W = W^(−H)

2 · ∂ Σ_n log|h′_n(u_n)| / ∂W = Ψ(u) x^H

Ψ(u) is a function depending on the spline parameters.

We obtain the following learning rule, which is a generalization of the Bell & Sejnowski algorithm:

ΔW = η · ∂L{y, Φ}/∂W = η · ( W^(−H) + Ψ(u) x^H )
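A hedged numpy sketch of a few batches of this learning rule (all names are ours, and a splitting-tanh score stands in for the spline-based Ψ of the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 2, 1000

# Super-Gaussian complex sources (Laplacian real/imaginary parts)
S = rng.laplace(size=(N, T)) + 1j * rng.laplace(size=(N, T))
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
X = A @ S   # instantaneous complex mixtures

W = np.eye(N, dtype=complex)
eta = 0.005
for _ in range(100):
    U = W @ X
    # Splitting-tanh score standing in for the spline-based Psi(u)
    Psi = -(np.tanh(U.real) + 1j * np.tanh(U.imag))
    # Batch estimate of W^-H + E[Psi(u) x^H]
    dW = np.linalg.inv(W).conj().T + (Psi @ X.conj().T) / T
    W = W + eta * dW
```

The natural-gradient variant ΔW = η(I + Ψ(u) u^H)W (Amari) is usually preferred in practice because it avoids the matrix inversion.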
The function Ψ(u) = ΨR(uR, uI) + jΨI(uR, uI) is formed as:

ψiR = 2 · [ (∂yiR/∂uiR)(∂²yiR/∂uiR²) + (∂yiR/∂uiI)(∂/∂uiR)(∂yiR/∂uiI) ] / [ (∂yiR/∂uiR)² + (∂yiR/∂uiI)² ]

ψiI = 2 · [ (∂yiR/∂uiI)(∂²yiR/∂uiI²) + (∂yiR/∂uiR)(∂/∂uiI)(∂yiR/∂uiR) ] / [ (∂yiR/∂uiR)² + (∂yiR/∂uiI)² ]
and in matrix notation (with Ṫ and T̈ denoting the first and second derivatives of the row vector T with respect to its argument):

ψiR = 2 · [ (TIi·M·(ṪRi·M·QRi)^T)·(TIi·M·(T̈Ri·M·QRi)^T) + (ṪIi·M·(TRi·M·QRi)^T)·(ṪIi·M·(ṪRi·M·QRi)^T) ] / [ (TIi·M·(ṪRi·M·QRi)^T)² + (ṪIi·M·(TRi·M·QRi)^T)² ]

and similarly for ψiI.
Evaluating the derivatives with respect to the spline control points, we obtain:

2 · ∂ Σ_n log|h′_n(u_n)| / ∂Q_{j,iR+mR,iI+mI} =

  0,  if i ≠ j

  2 · [ (∂yjR/∂ujR)·(∂/∂Q_{j,iR+mR,iI+mI})(∂yjR/∂ujR) + (∂yjR/∂ujI)·(∂/∂Q_{j,iR+mR,iI+mI})(∂yjR/∂ujI) ] / [ (∂yjR/∂ujR)² + (∂yjR/∂ujI)² ],  if i = j
and in matrix notation:

ΔQ_{j,iR+mR,iI+mI} = 2ηQ · [ (TIj·M·(ṪRj·M·QRj)^T)·(TIj·M_{mI}·(ṪRj·M_{mR})^T) + (ṪIj·M·(TRj·M·QRj)^T)·(ṪIj·M_{mI}·(TRj·M_{mR})^T) ] / [ (TIj·M·(ṪRj·M·QRj)^T)² + (ṪIj·M·(TRj·M·QRj)^T)² ]

where M_k is a zero matrix, except for the k-th column, which is equal to the k-th column of the M matrix.
The de-mixing algorithm for PNL case
A similar architecture can be found for the PNL mixtures case. In this new architecture a third learning rule can be derived for the adaptation of the spline control points of the compensating nonlinear function:
ΔQ_{GR,i+m} = (TR·M_m) / (TR·M·Q_{GR,i+m}) + Re{Ψ·W^H}_k · TR·M_m,

ΔQ_{GI,i+m} = (TI·M_m) / (TI·M·Q_{GI,i+m}) + Im{Ψ·W^H}_k · TI·M_m
Phase recovery
The scaling ambiguity of BSS algorithms is reflected in a phase ambiguity (rotation) in the complex domain. We can analyze how to recover this information.

Assume that x = [x1, …, xN]^T with xi = [xRi, xIi]^T, and s = [s1, …, sN]^T with si = [sRi, sIi]^T; then we obtain:
[ x1R ]   [ a11R  -a11I  a12R  -a12I  ...  a1NR  -a1NI ]   [ s1R ]
[ x1I ]   [ a11I   a11R  a12I   a12R  ...  a1NI   a1NR ]   [ s1I ]
[ x2R ]   [ a21R  -a21I  a22R  -a22I  ...  a2NR  -a2NI ]   [ s2R ]
[ x2I ] = [ a21I   a21R  a22I   a22R  ...  a2NI   a2NR ] · [ s2I ]
[ ... ]   [  ...    ...   ...    ...  ...   ...    ... ]   [ ... ]
[ xNR ]   [ aN1R  -aN1I  aN2R  -aN2I  ...  aNNR  -aNNI ]   [ sNR ]
[ xNI ]   [ aN1I   aN1R  aN2I   aN2R  ...  aNNI   aNNR ]   [ sNI ]

that is:

[ x1 ]   [ A11  A12  ...  A1N ]   [ s1 ]
[ x2 ] = [ A21  A22  ...  A2N ] · [ s2 ]     (19)
[ .. ]   [ ...  ...  ...  ... ]   [ .. ]
[ xN ]   [ AN1  AN2  ...  ANN ]   [ sN ]

where Aij = [ aijR  -aijI ; aijI  aijR ] and det(Aij) = aijR² + aijI² = |aij|².
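The equivalence between the complex mixing model and the real 2N×2N block form (19) can be verified numerically (function names are ours):

```python
import numpy as np

def realify(A):
    """Map a complex NxN matrix to the real 2Nx2N block matrix of (19):
    each entry a becomes the 2x2 block [[aR, -aI], [aI, aR]]."""
    N = A.shape[0]
    B = np.zeros((2 * N, 2 * N))
    for i in range(N):
        for j in range(N):
            B[2 * i, 2 * j] = A[i, j].real
            B[2 * i, 2 * j + 1] = -A[i, j].imag
            B[2 * i + 1, 2 * j] = A[i, j].imag
            B[2 * i + 1, 2 * j + 1] = A[i, j].real
    return B

def interleave(v):
    """[v1, ..., vN] -> [v1R, v1I, ..., vNR, vNI]."""
    return np.ravel(np.column_stack([v.real, v.imag]))

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
s = rng.standard_normal(3) + 1j * rng.standard_normal(3)

x_complex = A @ s                        # complex mixing
x_real = realify(A) @ interleave(s)      # equivalent real block mixing
```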
We desire a similar de-mixing model:

[ u1R ]   [ w11R  -w11I  w12R  -w12I  ...  w1NR  -w1NI ]   [ x1R ]
[ u1I ]   [ w11I   w11R  w12I   w12R  ...  w1NI   w1NR ]   [ x1I ]
[ u2R ]   [ w21R  -w21I  w22R  -w22I  ...  w2NR  -w2NI ]   [ x2R ]
[ u2I ] = [ w21I   w21R  w22I   w22R  ...  w2NI   w2NR ] · [ x2I ]
[ ... ]   [  ...    ...   ...    ...  ...   ...    ... ]   [ ... ]
[ uNR ]   [ wN1R  -wN1I  wN2R  -wN2I  ...  wNNR  -wNNI ]   [ xNR ]
[ uNI ]   [ wN1I   wN1R  wN2I   wN2R  ...  wNNI   wNNR ]   [ xNI ]

that is:

[ u1 ]   [ W11  W12  ...  W1N ]   [ x1 ]
[ u2 ] = [ W21  W22  ...  W2N ] · [ x2 ]     (20)
[ .. ]   [ ...  ...  ...  ... ]   [ .. ]
[ uN ]   [ WN1  WN2  ...  WNN ]   [ xN ]

where Wij = [ wijR  -wijI ; wijI  wijR ] and det(Wij) = wijR² + wijI² = |wij|².
So we can achieve phase recovery with the following constraint:

Wij = (1/2) · { Wij + det(Wij) · Wij^(-T) }     (21)

It can be demonstrated that the previous constraint is equivalent to utilizing a splitting activation function:

h(z) = hR(zR) + jhI(zI)
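A quick numerical check (ours) of constraint (21): blocks that already have the complex-multiplication structure [[wR, -wI], [wI, wR]] are fixed points, since for such blocks det(Wij) · Wij^(-T) = Wij:

```python
import numpy as np

def phase_constraint(Wij):
    """Constraint (21): Wij <- (1/2) * (Wij + det(Wij) * Wij^{-T})."""
    return 0.5 * (Wij + np.linalg.det(Wij) * np.linalg.inv(Wij).T)

wR, wI = 0.8, -0.3
W_complex_form = np.array([[wR, -wI],
                           [wI, wR]])   # block of the form in (20)
W_after = phase_constraint(W_complex_form)   # unchanged: a fixed point
```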
References
A. Hyvarinen, J. Karhunen and E. Oja. Independent Component Analysis. Wiley, 2001.

A. Cichocki and S.-I. Amari. Adaptive Blind Signal and Image Processing. Wiley, 2003.

A.J. Bell and T.J. Sejnowski. An information-maximisation approach to blind separation and blind deconvolution. Neural Computation, Vol. 7, pp. 1129-1159, 1995.

H.H. Yang and S.-I. Amari. Adaptive On-Line Learning Algorithms for Blind Separation: Maximum Entropy and Minimum Mutual Information. Neural Computation, Vol. 9, pp. 1457-1482, 1997.

S.-I. Amari. Natural gradient works efficiently in learning. Neural Computation, Vol. 10, pp. 251-276, 1998.

M. Scarpiniti, D. Vigliano, R. Parisi and A. Uncini. Generalized Splitting Functions for Blind Separation of Complex Signals. Neurocomputing, Vol. 71, N. 10-12, pp. 2245-2270, 2008.

D. Vigliano, M. Scarpiniti, R. Parisi and A. Uncini. Flexible Nonlinear Blind Signal Separation in the Complex Domain. International Journal of Neural Systems, Vol. 18, N. 2, pp. 105-122, April 2008.