IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 17, NO. 4, AUGUST 2009 763
Adaptive Fuzzy Filtering in a Deterministic Setting
Mohit Kumar, Member, IEEE, Norbert Stoll, and Regina Stoll
Abstract—Many real-world applications involve the filtering and estimation of process variables. This study considers the use of interpretable Sugeno-type fuzzy models for adaptive filtering. Our aim in this study is to provide different adaptive fuzzy filtering algorithms in a deterministic setting. The algorithms are derived and studied in a unified way without making any assumptions on the nature of signals (i.e., process variables). The study extends, in a common framework, the adaptive filtering algorithms (usually studied in the signal processing literature) and p-norm algorithms (usually studied in the machine learning literature) to semilinear fuzzy models. A mathematical framework is provided that allows the development and analysis of the adaptive fuzzy filtering algorithms. We study a class of nonlinear LMS-like algorithms for the online estimation of fuzzy model parameters. A generalization of the algorithms to the p-norm is provided using Bregman divergences (a standard tool for online machine learning algorithms).
Index Terms—Adaptive filtering algorithms, Bregman divergences, p-norm, robustness, Sugeno fuzzy models.
I. INTRODUCTION
A real-world complex process is typically characterized by a number of variables whose interrelations are uncertain and not completely known. Our concern is to apply, in an online scenario, fuzzy techniques to such processes, aiming at the filtering of uncertainties and the estimation of variables. The applications of adaptive filtering algorithms are not limited to engineering problems; they extend, e.g., to medicinal chemistry, where it is required to predict the biological activity of a chemical compound before its synthesis in the laboratory [1]. Once a compound is synthesized and tested experimentally for its activity, the experimental data can be used for an improvement of the prediction performance (i.e., online learning of the adaptive system). Adaptive filtering of uncertainties may be desired, e.g., for an intelligent interpretation of medical data that are contaminated by the uncertainties arising from individual variations due to differences in age, gender, and body conditions [2].
We focus on a process model with n inputs (represented by the vector x ∈ R^n) and a single output (represented by the scalar y). Adaptive filtering algorithms seek to identify the unknown
Manuscript received March 9, 2007; revised August 10, 2007; accepted October 30, 2007. First published April 30, 2008; current version published July 29, 2009. This work was supported by the Center for Life Science Automation, Rostock, Germany.
M. Kumar is with the Center for Life Science Automation, D-18119 Rostock, Germany (e-mail: [email protected]).
N. Stoll is with the Institute of Automation, College of Computer Science and Electrical Engineering, University of Rostock, D-18119 Rostock, Germany (e-mail: [email protected]).
R. Stoll is with the Institute of Preventive Medicine, Faculty of Medicine, University of Rostock, D-18055 Rostock, Germany (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available onlineat http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TFUZZ.2008.924331
TABLE I
EXAMPLES OF φ(e) FROM [6]–[8]
parameters of a model (characterized by a vector w) using input–output data pairs {x(j), y(j)} related via
$$ y(j) = M(x(j), w) + n_j $$
where M(x(j), w) is the model output for an input x(j), and n_j is the underlying uncertainty. If the chosen model M is nonlinear in the parameter vector w (as is the case with neural and fuzzy models), the standard gradient-descent algorithm is mostly used for an online estimation of w via performing the following recursions:
$$ w_j = w_{j-1} - \mu \left. \frac{\partial E_r(w, j)}{\partial w} \right|_{w_{j-1}}, \qquad E_r(w, j) = \frac{1}{2}\,[y(j) - M(x(j), w)]^2 \tag{1} $$
where μ is the step size (i.e., learning rate). If we relax our model to be linear in w, i.e., input–output data are related via
$$ y(j) = G_j^T w + n_j $$
where G_j is the regressor vector,
then a variety of algorithms are available in the literature for an
adaptive estimation of linear parameters [3]. The most popular
algorithm is the LMS because of its simplicity and robustness
[4], [5]. Many LMS-like algorithms have been studied for linear
models [3], [4] while addressing the robustness, convergence,
and steady-state error issues. A particular class of algorithms
takes the update form as
$$ w_j = w_{j-1} - \mu\, \phi\big( G_j^T w_{j-1} - y(j) \big)\, G_j $$
where φ is a nonlinear scalar function such that different choices of the functional form lead to the different algorithms stated in Table I. The generalization of the LMS algorithm to the p-norm (2 ≤ p < ∞) is given by the update rule [9], [10]:
$$ w_j = f^{-1}\big( f(w_{j-1}) - \mu\, [G_j^T w_{j-1} - y(j)]\, G_j \big). \tag{2} $$
Here, f (a p-indexing for f is understood), as defined in [10], is the bijective mapping f : R^K → R^K such that
$$ f = [f_1 \cdots f_K]^T, \qquad f_i(w) = \frac{\operatorname{sign}(w_i)\,|w_i|^{q-1}}{\|w\|_q^{q-2}} \tag{3} $$
where w = [w_1 ⋯ w_K]^T ∈ R^K, q is dual to p (i.e., 1/p + 1/q = 1), and ‖·‖_q denotes the q-norm defined as
$$ \|w\|_q = \Big( \sum_i |w_i|^q \Big)^{1/q}. $$
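The link maps (3) and (4) are straightforward to implement directly. The sketch below (Python with NumPy; our own illustrative code, not from the paper) checks numerically that the inverse map undoes the forward map when q is dual to p:

```python
import numpy as np

def f_link(w, q):
    """Forward map of (3): f_i(w) = sign(w_i)|w_i|^(q-1) / ||w||_q^(q-2)."""
    nq = np.linalg.norm(w, q)
    return np.sign(w) * np.abs(w) ** (q - 1) / nq ** (q - 2)

def f_link_inv(v, p):
    """Inverse map of (4): f_i^{-1}(v) = sign(v_i)|v_i|^(p-1) / ||v||_p^(p-2)."""
    npn = np.linalg.norm(v, p)
    return np.sign(v) * np.abs(v) ** (p - 1) / npn ** (p - 2)

p = 3.0
q = p / (p - 1.0)                     # duality: 1/p + 1/q = 1
w = np.array([0.5, -1.2, 2.0, 0.1])
w_back = f_link_inv(f_link(w, q), p)  # should recover w
```

The round trip also reflects the identity ‖f(w)‖_p = ‖w‖_q, which is what makes the pair (3)–(4) mutually inverse.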
1063-6706/$26.00 2009 IEEE
The inverse f^{-1} : R^K → R^K is given as
$$ f^{-1} = [f_1^{-1} \cdots f_K^{-1}]^T, \qquad f_i^{-1}(v) = \frac{\operatorname{sign}(v_i)\,|v_i|^{p-1}}{\|v\|_p^{p-2}} \tag{4} $$
where v = [v_1 ⋯ v_K]^T ∈ R^K.
Sugeno-type fuzzy models are linear in consequents and nonlinear in antecedents (i.e., membership function parameters). When it comes to the online estimation of fuzzy model parameters, the following two approaches are generally used.
1) The antecedent parameters are adapted using gradient descent and the consequent parameters by the recursive least-squares algorithm [11], [12].
2) A combination of data clustering and recursive least-
squares algorithm is applied [13], [14].
The wide use of the gradient-descent algorithm for the adaptation of nonlinear fuzzy model parameters (e.g., in [15]) is due to its simplicity and low computational cost. However, gradient-descent-based algorithms for nonlinear systems are not justified by rigorous theoretical arguments [16]. Only a few papers dealing with the mathematical analysis of adaptive fuzzy algorithms have appeared till now. The issue of algorithm stability has been addressed in [17]. The authors in [18] introduce an energy-gain bounding approach for the estimation of parameters via minimizing the maximum possible value of the energy gain from disturbances to the estimation errors, along the lines of H∞-optimal estimation. The algorithms for the adaptive estimation of fuzzy model parameters based on least-squares and H∞-optimization criteria are provided in [19].
To the knowledge of the authors, the fuzzy literature still lacks
1) the development and deterministic mathematical analysis (in terms of filtering performance) of methods that extend the Table I type algorithms (i.e., LMF, LMMN, sign error, etc.) to interpretable fuzzy models;
2) the generalization of the algorithms with error nonlinearities (i.e., Table I type algorithms) to the p-norms, which is missing even for linear-in-parameters models;
3) the development and deterministic mathematical analysis (in terms of filtering performance) of the p-norm algorithms [e.g., of type (2)] for an adaptive estimation of the parameters of an interpretable fuzzy model.
This paper is intended to provide the aforementioned studies in a unified manner. This is done via solving a constrained regularized nonlinear optimization problem in Section II. Section III provides the deterministic analysis of the algorithms with emphasis on filtering errors. Simulation studies are provided in Section IV, followed by some remarks and, finally, the conclusion.
II. ADAPTIVE FUZZY ALGORITHMS
Sugeno-type fuzzy models are characterized by two types of parameters: consequents and antecedents. If we characterize the antecedents using a vector θ and the consequents using a vector α, then the output of a zero-order Takagi–Sugeno fuzzy model can be written as
$$ F_s(x) = G^T(x, \theta)\,\alpha, \qquad A_c\theta \le h \tag{5} $$
where G(·) is a nonlinear function (defined by the shape of the membership functions), and A_cθ ≤ h is a matrix inequality that characterizes the interpretability of the model. The details of (5) can be found, e.g., in [19] as well as in the Appendix. A straightforward approach to the design of an adaptive fuzzy filtering algorithm is to update, at time j, the model parameters (θ_{j-1}, α_{j-1}) based on the current data pair (x(j), y(j)): we seek to decrease the loss term |y(j) − G^T(x(j), θ_j)α_j|²; however, we do not want to make big changes in the previous parameters (θ_{j-1}, α_{j-1}). That is,
$$ (\theta_j, \alpha_j) = \arg\min_{\theta,\alpha:\; A_c\theta\le h} \left[ \frac{1}{2}\,|y(j) - G^T(x(j),\theta)\,\alpha|^2 + \frac{1}{2\mu_j}\,\|\theta - \theta_{j-1}\|^2 + \frac{1}{2\mu_{\alpha,j}}\,\|\alpha - \alpha_{j-1}\|^2 \right] \tag{6} $$
where μ_j > 0 and μ_{α,j} > 0 are the learning rates for the antecedents and consequents, respectively, and ‖·‖ denotes the Euclidean norm (i.e., we write the 2-norm of a vector simply as ‖·‖ instead of ‖·‖_2). The terms ‖θ − θ_{j-1}‖² and ‖α − α_{j-1}‖² provide regularization to the adaptive estimation problem. To study the different adaptive algorithms in a unified framework, the following generalizations can be provided to the loss as well as the regularization term.
1) The loss term is generalized using a function L_j(θ, α). Some examples of L_j(θ, α) include
$$ L_j(\theta,\alpha) = \begin{cases} |y(j) - G^T(x(j),\theta)\,\alpha| \\[2pt] \tfrac{1}{2}\,|y(j) - G^T(x(j),\theta)\,\alpha|^2 \\[2pt] \tfrac{1}{4}\,|y(j) - G^T(x(j),\theta)\,\alpha|^4 \\[2pt] \tfrac{a}{2}\,|y(j) - G^T(x(j),\theta)\,\alpha|^2 + \tfrac{b}{4}\,|y(j) - G^T(x(j),\theta)\,\alpha|^4. \end{cases} \tag{7} $$
2) The regularization terms are generalized using Bregman divergences [9], [20]. The Bregman divergence d_F(u, w) [21], associated with a strictly convex, twice differentiable function F from a subset of R^K to R, is defined for u, w ∈ R^K as
$$ d_F(u, w) = F(u) - F(w) - (u - w)^T f(w) $$
where f = ∇F denotes the gradient of F. Note that d_F(u, w) ≥ 0, is equal to zero only for u = w, and is strictly convex in u. Some examples of Bregman divergences are as follows.
a) Bregman divergence associated with the squared q-norm: If we define F(w) = (1/2)‖w‖_q², then the corresponding Bregman divergence d_q(u, w) is defined as
$$ d_q(u, w) = \frac{1}{2}\,\|u\|_q^2 - \frac{1}{2}\,\|w\|_q^2 - (u - w)^T f(w) $$
where f is given by (3). It is easy to see that for q = 2, we have d_2(u, w) = (1/2)‖u − w‖².
b) Relative entropy: For a vector w = [w_1 ⋯ w_K]^T ∈ R^K (with w_i ≥ 0), if we define F(w) = Σ_{i=1}^{K} (w_i ln w_i − w_i), then the Bregman divergence is the unnormalized relative entropy
$$ d_{RE}(u, w) = \sum_{i=1}^{K} \left( u_i \ln\frac{u_i}{w_i} - u_i + w_i \right). $$
Bregman divergences have been widely studied in learning and information theory (see, e.g., [22]–[28]). However, our idea is to replace the regularization terms (1/(2μ_j))‖θ − θ_{j-1}‖² and (1/(2μ_{α,j}))‖α − α_{j-1}‖² in (6) by the generalized terms μ_j^{-1} d_F(θ, θ_{j-1}) and μ_{α,j}^{-1} d_F(α, α_{j-1}), respectively. This approach, in the context of linear models, was introduced for deriving predictive algorithms in [29] and filtering algorithms in [9]. It is obvious that different choices of the function F result in different filtering algorithms. Our particular concern in this text is to provide a p-norm generalization of the filtering algorithms. For the p-norm (2 ≤ p < ∞) generalization of the algorithms, we consider the Bregman divergence associated with the squared q-norm [i.e., F(w) = (1/2)‖w‖_q²], where q is dual to p (i.e., 1/p + 1/q = 1).
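As a quick numerical illustration (Python/NumPy; our own sketch, not from the paper), the divergence d_q below is nonnegative, vanishes at u = w, and reduces to half the squared Euclidean distance when q = 2:

```python
import numpy as np

def f_link(w, q):
    # gradient of F(w) = 0.5 * ||w||_q^2, cf. (3)
    nq = np.linalg.norm(w, q)
    return np.sign(w) * np.abs(w) ** (q - 1) / nq ** (q - 2)

def d_q(u, w, q):
    """Bregman divergence of F(w) = 0.5 * ||w||_q^2."""
    return (0.5 * np.linalg.norm(u, q) ** 2
            - 0.5 * np.linalg.norm(w, q) ** 2
            - (u - w) @ f_link(w, q))

u = np.array([1.0, -0.5, 0.3])
w = np.array([0.2, 0.8, -0.1])
```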
In view of these generalizations, the adaptive fuzzy algorithms take the form
$$ (\theta_j, \alpha_j) = \arg\min_{\theta,\alpha:\; A_c\theta\le h} \big[ L_j(\theta,\alpha) + \mu_j^{-1}\, d_q(\theta, \theta_{j-1}) + \mu_{\alpha,j}^{-1}\, d_q(\alpha, \alpha_{j-1}) \big]. \tag{8} $$
For the particular choice L_j(θ, α) = (1/2)|y(j) − G^T(x(j), θ)α|² and q = 2, problem (8) reduces to (6). For a given value of θ, we define
$$ \alpha(\theta) = \arg\min_{\alpha}\; E_j(\theta, \alpha), \qquad E_j(\theta, \alpha) = L_j(\theta, \alpha) + \mu_{\alpha,j}^{-1}\, d_q(\alpha, \alpha_{j-1}) $$
so that the estimation problem (8) can be formulated as
$$ \theta_j = \arg\min_{\theta:\; A_c\theta\le h} \big[ E_j(\theta, \alpha(\theta)) + \mu_j^{-1}\, d_q(\theta, \theta_{j-1}) \big] \tag{9} $$
$$ \alpha_j = \alpha(\theta_j). \tag{10} $$
Expressions (9) and (10) represent a generalized update rule for adaptive fuzzy filtering algorithms that can be particularized for a choice of L_j(θ, α) and q. For any choice of L_j(θ, α) listed in (7), E_j(θ, α) is convex in α and, thus, can be minimized w.r.t. α by setting its gradient equal to zero. This results in
$$ \mu_{\alpha,j}^{-1}\, f(\alpha) - \mu_{\alpha,j}^{-1}\, f(\alpha_{j-1}) - \phi\big( y(j) - G^T(x(j),\theta)\,\alpha \big)\, G(x(j),\theta) = 0 \tag{11} $$
where the function φ is given as
$$ \phi(e) = \begin{cases} \operatorname{sign}(e), & \text{for sign error} \\ e, & \text{for LMS} \\ e^3, & \text{for LMF} \\ ae + be^3, & \text{for LMMN.} \end{cases} \tag{12} $$
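Each φ in (12) is the derivative of the corresponding loss in (7) with respect to the error e = y(j) − G^T(x(j), θ)α. A small sketch (Python; our own illustrative code, with arbitrary positive constants a and b) verifies this for the LMMN case by finite differences:

```python
a, b = 0.7, 0.3   # LMMN mixing constants (arbitrary choice for illustration)

def loss_lmmn(e):
    # (a/2)|e|^2 + (b/4)|e|^4, cf. (7)
    return 0.5 * a * e ** 2 + 0.25 * b * e ** 4

def phi_lmmn(e):
    # a*e + b*e^3, cf. (12)
    return a * e + b * e ** 3

# central finite-difference check: phi(e) should match dL/de
e, h = 1.3, 1e-6
numeric = (loss_lmmn(e + h) - loss_lmmn(e - h)) / (2 * h)
```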
The minimizing solution must satisfy (11). Thus,
$$ \alpha = f^{-1}\Big( f(\alpha_{j-1}) + \mu_{\alpha,j}\, \phi\big( y(j) - G^T(x(j),\theta)\,\alpha \big)\, G(x(j),\theta) \Big). \tag{13} $$
For a given θ, (13) is implicit in α and could be solved numerically. It follows from (13) that, for a sufficiently small value of μ_{α,j}, it is reasonable to approximate the term G^T(x(j), θ)α on the right-hand side of (13) with the term G^T(x(j), θ)α_{j-1}, as has been done in [9] to obtain an explicit update. Thus, an approximate but closed-form solution of (13) is given as
$$ \alpha(\theta) = f^{-1}\Big( f(\alpha_{j-1}) + \mu_{\alpha,j}\, \phi\big( y(j) - G^T(x(j),\theta)\,\alpha_{j-1} \big)\, G(x(j),\theta) \Big). \tag{14} $$
Here, α(θ) has been written to indicate the dependence of the solution on θ. Since d_q(θ, θ_{j-1}) = (1/2)‖θ‖_q² − (1/2)‖θ_{j-1}‖_q² − (θ − θ_{j-1})^T f(θ_{j-1}), (9) is equivalent to
$$ \theta_j = \arg\min_{\theta:\; A_c\theta\le h} \left[ E_j(\theta, \alpha(\theta)) + \frac{1}{2\mu_j}\,\|\theta\|_q^2 - \mu_j^{-1}\,\theta^T f(\theta_{j-1}) \right] \tag{15} $$
as the remaining terms are independent of θ. There is no harm in adding a θ-independent term in (15):
$$ \theta_j = \arg\min_{\theta:\; A_c\theta\le h} \left[ E_j(\theta, \alpha(\theta)) + \frac{1}{2\mu_j}\,\|\theta\|_q^2 - \mu_j^{-1}\,\theta^T f(\theta_{j-1}) + \frac{1}{2\mu_j}\,\|f(\theta_{j-1})\|^2 \right]. \tag{16} $$
For any 2 ≤ p < ∞, we have 1 < q ≤ 2, and thus ‖θ‖_q ≥ ‖θ‖. This makes
$$ \frac{1}{2\mu_j}\,\|\theta\|_q^2 - \mu_j^{-1}\,\theta^T f(\theta_{j-1}) + \frac{1}{2\mu_j}\,\|f(\theta_{j-1})\|^2 \tag{17} $$
$$ \ge \frac{1}{2\mu_j}\,\big[ \|\theta\|^2 - 2\,\theta^T f(\theta_{j-1}) + \|f(\theta_{j-1})\|^2 \big] = \frac{1}{2\mu_j}\,\|\theta - f(\theta_{j-1})\|^2. \tag{18} $$
Solving the constrained nonlinear optimization problem (16), as we will see, becomes relatively easy by slightly modifying (decreasing) the level of regularization provided in the estimation of θ_j. Expression (17), i.e., the last three terms of the optimization problem (16), accounts for the regularization in the estimation of θ_j. For a given value of μ_j, a decrease in the level of regularization occurs via replacing expression (17) in the optimization problem (16) by expression (18):
$$ \theta_j = \arg\min_{\theta:\; A_c\theta\le h} \left[ E_j(\theta, \alpha(\theta)) + \frac{1}{2\mu_j}\,\|\theta - f(\theta_{j-1})\|^2 \right]. \tag{19} $$
From the viewpoint of an adaptive estimation of the vector θ, nothing goes against considering (19) instead of (16), since any
desired level of regularization can still be achieved via adjusting in (19) the value of the free parameter μ_j. The motivation for considering (19) is derived from the fact that it is then possible to reformulate the estimation problem as a least-squares problem. To do so, define a vector
$$ r(\theta) = \begin{bmatrix} \sqrt{E_j(\theta, \alpha(\theta))} \\[2pt] (2\mu_j)^{-1/2}\,\big( \theta - f(\theta_{j-1}) \big) \end{bmatrix} $$
where E_j(θ, α(θ)) ≥ 0 and μ_j > 0. Now, it is possible to rewrite (19) as
$$ \theta_j = \arg\min_{\theta:\; A_c\theta\le h} \|r(\theta)\|^2. \tag{20} $$
To compute θ_j recursively based on (20), we suggest the following Gauss–Newton-like algorithm:
$$ \theta_j = \theta_{j-1} + s(\theta_{j-1}) \tag{21} $$
$$ s(\theta) = \arg\min_{s:\; A_c s \le h - A_c\theta} \|r(\theta) + \nabla r(\theta)\, s\|^2 \tag{22} $$
where ∇r(θ) is the Jacobian matrix of the vector r w.r.t. θ, computed by the method of finite differences. Fortunately, ∇r(θ) is a full-rank matrix, since μ_j > 0. The constrained linear least-squares problem (22) can be solved by transforming it first to a least distance programming problem (see [30] for details). Finally, (10) in view of (14) becomes
$$ \alpha_j = f^{-1}\Big( f(\alpha_{j-1}) + \mu_{\alpha,j}\, \phi\big( y(j) - G^T(x(j),\theta_j)\,\alpha_{j-1} \big)\, G(x(j),\theta_j) \Big). \tag{23} $$
III. DETERMINISTIC ANALYSIS
We provide in this section a deterministic analysis of the adaptive fuzzy filtering algorithms (21)–(23) in terms of filtering performance. For this, consider a fuzzy model that fits the given input–output data {x(j), y(j)}_{j=0}^{k} according to
$$ y(j) = G^T(x(j), \theta_j)\,\alpha^* + v_j \tag{24} $$
where α* is some true parameter vector (that is to be estimated), θ_j is given by (21), and v_j accommodates any disturbance due to measurement noise, modeling errors, mismatch between θ_j and the global minimum of (20), and so on. We are interested in the analysis of estimating α* using (23) in the presence of the disturbance signal v_j. That is, we take α_j as an estimate of α* at the jth time instant and try to calculate an upper bound on the filtering errors. In a filtering setting, it is desired to estimate the quantity G^T(x(j), θ_j)α* using an adaptive model. The vector α_{j-1} is the a priori estimate of α* at the jth time index, and thus, the a priori filtering error can be expressed as
$$ e_{f,j} = G^T(x(j), \theta_j)\,\alpha^* - G^T(x(j), \theta_j)\,\alpha_{j-1}. $$
One would normally expect |G^T(x(j), θ_j)α* − G^T(x(j), θ_j)α_{j-1}|² to be the performance measure of an algorithm. However, the squared error as a performance measure does not seem to be suitable for a uniform analysis of all the algorithms. We introduce a generalized performance measure P_φ(ŷ, y), defined for scalars ŷ and y as
$$ P_\phi(\hat{y}, y) = \int_{\hat{y}}^{y} \big( \phi(r) - \phi(\hat{y}) \big)\, dr \tag{25} $$
Fig. 1. The value P_φ(a, b) is equal to the area of the shaded region.
where φ is a continuous strictly increasing function with φ(0) = 0. It can be easily seen that for φ(e) = e, we have the normal squared error, i.e., P_φ(ŷ, y) = (y − ŷ)²/2. A different but integral-based loss function, called the matching loss for a continuous, increasing transfer function φ, was considered in [31] and [32] for a single neuron model. The matching loss for φ was defined in [32] as
$$ M_\phi(\hat{y}, y) = \int_{\phi^{-1}(y)}^{\phi^{-1}(\hat{y})} \big( \phi(r) - y \big)\, dr. $$
If we let Φ(r) = ∫φ(r) dr, then
$$ P_\phi(\hat{y}, y) = \Phi(y) - \Phi(\hat{y}) - (y - \hat{y})\,\phi(\hat{y}). \tag{26} $$
Note that in definition (25), the continuous function φ is not an arbitrary function: its integral Φ(r) = ∫φ(r) dr must be a strictly convex function. The strictly increasing nature of φ(r) [i.e., the strict convexity of Φ(r)] enables us to assess the mismatch between ŷ and y using P_φ(ŷ, y). Fig. 1 illustrates the physical meaning of P_φ(a, b): the value P_φ(a, b) is equal to the area of the shaded region in the figure. One could infer from Fig. 1 that a mismatch between a and b can be assessed via calculating the area of the shaded region [i.e., P_φ(a, b)] provided the given function φ is strictly increasing. Usually, P_φ is not symmetric, i.e., P_φ(ŷ, y) ≠ P_φ(y, ŷ). One such performance measure [i.e., M_φ(ŷ, y)] was considered previously in [22].
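The measure (25) is easy to evaluate numerically. The sketch below (Python/NumPy; our own illustrative code) integrates (25) by the trapezoidal rule for φ(e) = e³, checks the closed form (26) with Φ(r) = r⁴/4, and recovers the squared-error special case for φ(e) = e:

```python
import numpy as np

def P_phi(yhat, y, phi, n=100001):
    """P_phi(yhat, y) = integral from yhat to y of (phi(r) - phi(yhat)) dr, cf. (25)."""
    r = np.linspace(yhat, y, n)
    vals = phi(r) - phi(yhat)
    return np.sum((vals[1:] + vals[:-1]) * np.diff(r) / 2.0)  # trapezoidal rule

phi_cube = lambda r: r ** 3
yhat, y = 0.4, 1.1

quad = P_phi(yhat, y, phi_cube)
# closed form (26) with Phi(r) = r^4 / 4
closed = (y ** 4 / 4 - yhat ** 4 / 4) - (y - yhat) * phi_cube(yhat)
sq = P_phi(yhat, y, lambda r: r)   # phi(e) = e should give (y - yhat)^2 / 2
```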
In our analysis, we assess the instantaneous filtering error [i.e., the mismatch between G^T(x(j), θ_j)α_{j-1} and G^T(x(j), θ_j)α*] by calculating P_φ(G^T(x(j), θ_j)α_{j-1}, G^T(x(j), θ_j)α*). For the case φ(e) = e, we have P_φ(G^T(x(j), θ_j)α_{j-1}, G^T(x(j), θ_j)α*) = |e_{f,j}|²/2. The filtering performance of an algorithm, which is run from j = 0 to j = k, can be evaluated by calculating the sum
$$ \sum_{j=0}^{k} P_\phi\big( G^T(x(j),\theta_j)\,\alpha_{j-1},\; G^T(x(j),\theta_j)\,\alpha^* \big). $$
Similarly, the magnitudes of the disturbances v_j = y(j) − G^T(x(j), θ_j)α*, j = 0, …, k, will be assessed by calculating
the mismatch between y(j) and G^T(x(j), θ_j)α* as follows:
$$ \sum_{j=0}^{k} P_\phi\big( y(j),\; G^T(x(j),\theta_j)\,\alpha^* \big). $$
Finally, the robustness of an algorithm (i.e., the sensitivity of the filtering errors toward disturbances) can be assessed by calculating an upper bound on a ratio such as (29). The term d_q(α*, α_{-1}) in the denominator of (29) assesses the disturbance due to a mismatch between the initial guess α_{-1} and the true vector α*.
Lemma 1: Let m be a scalar and G_j ∈ R^K such that α_j = f^{-1}(f(α_{j-1}) + mG_j); then
$$ d_q(\alpha_{j-1}, \alpha_j) \le \frac{m^2}{2}\,(p-1)\,\|G_j\|_p^2. $$
Proof: See [10, Lemma 4].
Lemma 2: If α_j and α_{j-1} are related via (23), then
$$ d_q(\alpha^*, \alpha_{j-1}) - d_q(\alpha^*, \alpha_j) + d_q(\alpha_{j-1}, \alpha_j) = \mu_{\alpha,j}\, \phi\big( y(j) - G^T(x(j),\theta_j)\,\alpha_{j-1} \big)\, (\alpha^* - \alpha_{j-1})^T G(x(j),\theta_j). $$
Proof: The proof follows simply by using the definitions of d_q(α*, α_{j-1}), d_q(α*, α_j), and d_q(α_{j-1}, α_j).
In view of Lemma 1 and (23), we have
$$ d_q(\alpha_{j-1}, \alpha_j) \le \frac{\mu_{\alpha,j}^2\, \big| \phi\big( y(j) - G^T(x(j),\theta_j)\,\alpha_{j-1} \big) \big|^2}{2}\,(p-1)\,\|G(x(j),\theta_j)\|_p^2. \tag{27} $$
Theorem 1: The estimation algorithm (21)–(23), with φ being a continuous strictly increasing function and φ(0) = 0, for any 2 ≤ p < ∞, with a learning rate
$$ 0 < \mu_{\alpha,j} \le \frac{2\, P_\phi\big( y(j),\; G^T(x(j),\theta_j)\,\alpha_{j-1} \big)}{\mathrm{den}} \tag{28} $$
where
$$ \mathrm{den} = \phi\big( y(j) - G^T(x(j),\theta_j)\,\alpha_{j-1} \big)\, (p-1)\, \big[ \phi(y(j)) - \phi\big( G^T(x(j),\theta_j)\,\alpha_{j-1} \big) \big]\, \|G(x(j),\theta_j)\|_p^2 $$
achieves an upper bound on the filtering errors such that
$$ \frac{ \displaystyle \sum_{j=0}^{k} a_j\, P_\phi\big( G^T(x(j),\theta_j)\,\alpha_{j-1},\; G^T(x(j),\theta_j)\,\alpha^* \big) }{ \displaystyle \sum_{j=0}^{k} a_j\, P_\phi\big( y(j),\; G^T(x(j),\theta_j)\,\alpha^* \big) + d_q(\alpha^*, \alpha_{-1}) } \le 1 \tag{29} $$
where q is dual to p, and a_j > 0 is given as
$$ a_j = \frac{ \mu_{\alpha,j}\, \phi\big( y(j) - G^T(x(j),\theta_j)\,\alpha_{j-1} \big) }{ \phi(y(j)) - \phi\big( G^T(x(j),\theta_j)\,\alpha_{j-1} \big) }. \tag{30} $$
Here, we assume that y(j) ≠ G^T(x(j), θ_j)α_{j-1}, since there is no update of the parameters (i.e., α_j = α_{j-1}) if y(j) = G^T(x(j), θ_j)α_{j-1}.
Proof: Define Δ_j = d_q(α*, α_{j-1}) − d_q(α*, α_j). Using Lemma 2, we have
$$ \Delta_j = \mu_{\alpha,j}\, \phi\big( y(j) - G^T(x(j),\theta_j)\,\alpha_{j-1} \big)\, (\alpha^* - \alpha_{j-1})^T G(x(j),\theta_j) - d_q(\alpha_{j-1}, \alpha_j). \tag{31} $$
Using (27), we have
$$ \Delta_j \ge \mu_{\alpha,j}\, \phi\big( y(j) - G^T(x(j),\theta_j)\,\alpha_{j-1} \big)\, (\alpha^* - \alpha_{j-1})^T G(x(j),\theta_j) - \frac{p-1}{2}\, \mu_{\alpha,j}^2\, \big| \phi\big( y(j) - G^T(x(j),\theta_j)\,\alpha_{j-1} \big) \big|^2\, \|G(x(j),\theta_j)\|_p^2. $$
That is,
$$ \Delta_j \ge a_j\, \big[ \phi(y(j)) - \phi\big( G^T(x(j),\theta_j)\,\alpha_{j-1} \big) \big]\, (\alpha^* - \alpha_{j-1})^T G(x(j),\theta_j) - \frac{p-1}{2}\, \mu_{\alpha,j}\, \phi\big( y(j) - G^T(x(j),\theta_j)\,\alpha_{j-1} \big)\, a_j\, \big[ \phi(y(j)) - \phi\big( G^T(x(j),\theta_j)\,\alpha_{j-1} \big) \big]\, \|G(x(j),\theta_j)\|_p^2. $$
Since μ_{α,j} satisfies (28), the previous inequality reduces to
$$ \Delta_j \ge a_j\, \big[ \phi(y(j)) - \phi\big( G^T(x(j),\theta_j)\,\alpha_{j-1} \big) \big]\, (\alpha^* - \alpha_{j-1})^T G(x(j),\theta_j) - a_j\, P_\phi\big( y(j),\; G^T(x(j),\theta_j)\,\alpha_{j-1} \big). \tag{32} $$
It can be verified using definition (26) that
$$ \big[ \phi(y(j)) - \phi\big( G^T(x(j),\theta_j)\,\alpha_{j-1} \big) \big]\, (\alpha^* - \alpha_{j-1})^T G(x(j),\theta_j) = P_\phi\big( G^T(x(j),\theta_j)\,\alpha_{j-1},\; G^T(x(j),\theta_j)\,\alpha^* \big) - P_\phi\big( y(j),\; G^T(x(j),\theta_j)\,\alpha^* \big) + P_\phi\big( y(j),\; G^T(x(j),\theta_j)\,\alpha_{j-1} \big) $$
and thus, inequality (32) is further reduced to
$$ \Delta_j \ge a_j\, P_\phi\big( G^T(x(j),\theta_j)\,\alpha_{j-1},\; G^T(x(j),\theta_j)\,\alpha^* \big) - a_j\, P_\phi\big( y(j),\; G^T(x(j),\theta_j)\,\alpha^* \big). $$
In addition,
$$ \sum_{j=0}^{k} \Delta_j = d_q(\alpha^*, \alpha_{-1}) - d_q(\alpha^*, \alpha_k) \le d_q(\alpha^*, \alpha_{-1}), \qquad \text{since } d_q(\alpha^*, \alpha_k) \ge 0 $$
resulting in
$$ d_q(\alpha^*, \alpha_{-1}) \ge \sum_{j=0}^{k} a_j\, P_\phi\big( G^T(x(j),\theta_j)\,\alpha_{j-1},\; G^T(x(j),\theta_j)\,\alpha^* \big) - \sum_{j=0}^{k} a_j\, P_\phi\big( y(j),\; G^T(x(j),\theta_j)\,\alpha^* \big) $$
from which the inequality (29) follows.
The following inferences can be immediately made from Theorem 1.
1) For the special case φ(e) = e and taking α_{-1} = 0, the results of Theorem 1 become
$$ \frac{ \displaystyle \sum_{j=0}^{k} \mu_{\alpha,j}\, \big| G^T(x(j),\theta_j)\,\alpha_{j-1} - G^T(x(j),\theta_j)\,\alpha^* \big|^2 }{ \displaystyle \sum_{j=0}^{k} \mu_{\alpha,j}\, \big| y(j) - G^T(x(j),\theta_j)\,\alpha^* \big|^2 + \|\alpha^*\|_q^2 } \le 1 $$
where
$$ \mu_{\alpha,j} \le \frac{1}{(p-1)\,\|G(x(j),\theta_j)\|_p^2}. $$
Choosing
$$ \mu_{\alpha,j} = \frac{1}{(p-1)\,U_p^2}, \qquad \text{where } U_p \ge \|G(x(j),\theta_j)\|_p $$
we get
$$ \sum_{j=0}^{k} \big| G^T(x(j),\theta_j)\,\alpha_{j-1} - G^T(x(j),\theta_j)\,\alpha^* \big|^2 \le \sum_{j=0}^{k} \big| y(j) - G^T(x(j),\theta_j)\,\alpha^* \big|^2 + (p-1)\,U_p^2\,\|\alpha^*\|_q^2 $$
which is formally equivalent to [9, Th. 2].
2) Inequality (29) illustrates the robustness property in the sense that if, for the given positive values {a_j}_{j=0}^{k}, the disturbances {v_j}_{j=0}^{k} are small, i.e., Σ_{j=0}^{k} P_φ(y(j), G^T(x(j), θ_j)α*) is small, then the filtering errors Σ_{j=0}^{k} P_φ(G^T(x(j), θ_j)α_{j-1}, G^T(x(j), θ_j)α*) remain small. The positive value a_j represents a weight given to the jth data pair in the summation.
3) If we define an upper bound on the disturbance signal as
$$ v_{\max} = \max_j\, P_\phi\big( y(j),\; G^T(x(j),\theta_j)\,\alpha^* \big) $$
then it follows from (29) that
$$ \sum_{j=0}^{k} a_j\, P_\phi\big( G^T(x(j),\theta_j)\,\alpha_{j-1},\; G^T(x(j),\theta_j)\,\alpha^* \big) \le (k+1)\, a_{\max}\, v_{\max} + d_q(\alpha^*, \alpha_{-1}) $$
where a_max = max_j a_j. That is,
$$ \frac{1}{k+1} \sum_{j=0}^{k} a_j\, P_\phi\big( G^T(x(j),\theta_j)\,\alpha_{j-1},\; G^T(x(j),\theta_j)\,\alpha^* \big) \le a_{\max}\, v_{\max} + \frac{d_q(\alpha^*, \alpha_{-1})}{k+1}. \tag{33} $$
As d_q(α*, α_{-1}) is finite, we have
$$ \frac{1}{k+1} \sum_{j=0}^{k} a_j\, P_\phi\big( G^T(x(j),\theta_j)\,\alpha_{j-1},\; G^T(x(j),\theta_j)\,\alpha^* \big) \le a_{\max}\, v_{\max}, \qquad \text{as } k \to \infty. $$
Inequality (33) shows the stability of the algorithm against the disturbance v_j in the sense that if the disturbance signal P_φ(y(j), G^T(x(j), θ_j)α*) is bounded (i.e., v_max is finite), then the average value of the filtering errors, assessed as
$$ \frac{1}{k+1} \sum_{j=0}^{k} a_j\, P_\phi\big( G^T(x(j),\theta_j)\,\alpha_{j-1},\; G^T(x(j),\theta_j)\,\alpha^* \big) $$
remains bounded.
4) In the ideal case of zero disturbances (i.e., v_j = 0), we have P_φ(y(j), G^T(x(j), θ_j)α*) = 0, and
$$ \sum_{j=0}^{k} a_j\, P_\phi\big( G^T(x(j),\theta_j)\,\alpha_{j-1},\; G^T(x(j),\theta_j)\,\alpha^* \big) \le d_q(\alpha^*, \alpha_{-1}). $$
Since d_q(α*, α_{-1}) is finite and a_j P_φ(G^T(x(j), θ_j)α_{j-1}, G^T(x(j), θ_j)α*) ≥ 0, there must exist a sufficiently large index T such that
$$ \sum_{j \ge T} a_j\, P_\phi\big( G^T(x(j),\theta_j)\,\alpha_{j-1},\; G^T(x(j),\theta_j)\,\alpha^* \big) = 0. $$
In other words,
$$ G^T(x(j),\theta_j)\,\alpha_{j-1} = G^T(x(j),\theta_j)\,\alpha^*, \qquad \forall\, j \ge T $$
since a_j > 0. This shows the convergence of the algorithm at time index T toward the true parameters.
5) Theorem 1 cannot be applied for the case φ(e) = sign(e), since in this case φ is not a continuous strictly increasing function. However, one could instead choose φ(e) = tanh(e), which has a shape similar to that of the sign function, and Theorem 1 remains applicable. Note that in this case, the loss term in (8) is
$$ L_j(\theta, \alpha) = \ln\big( \cosh\big( y(j) - G^T(x(j),\theta)\,\alpha \big) \big). $$
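The pairing of φ(e) = tanh(e) with the loss ln(cosh(e)) follows since d/de ln(cosh e) = tanh e, and tanh(0) = 0 as required by Theorem 1. A finite-difference sketch (Python; our own illustrative check):

```python
import math

def loss(e):
    # log-cosh loss term
    return math.log(math.cosh(e))

def phi(e):
    # its derivative: a smooth surrogate for sign(e)
    return math.tanh(e)

# central finite-difference check of dL/de at e = 0.8
e, h = 0.8, 1e-6
numeric = (loss(e + h) - loss(e - h)) / (2 * h)
```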
Theorem 1 is an important result due to the generality of the function φ, and it thus offers the possibility of studying, in a unified framework, different fuzzy adaptive algorithms corresponding to different choices of the continuous strictly increasing function φ. Table II provides a few examples of φ (plotted in Fig. 2), leading to the different p-norm algorithms listed in the table as A_{1,p}, A_{2,p}, and so on.
IV. SIMULATION STUDIES
This section provides simulation studies to illustrate the
following.
1) Adaptive fuzzy filtering algorithms A_{1,p}, A_{2,p}, …, for p ≥ 2, perform better than the commonly used gradient-descent algorithm.
2) For the estimation of linear parameters (keeping membership functions fixed), LMS is the standard algorithm known for its simplicity and robustness. This study provides a family of algorithms characterized by φ and p. It will be shown that the p-norm generalization of the algorithms corresponding to the different choices of φ may achieve better performance than the standard LMS algorithm.
3) We will show the robustness and convergence of the filtering algorithms.
4) For a given value of p, the algorithms corresponding to the different choices of φ may prove better than the standard choice φ(e) = e (i.e., the squared error loss term).
A. First Example
Consider the problem of filtering noise from a chaotic
time series. The time series is generated by simulating the
TABLE II
A FEW EXAMPLES OF ADAPTIVE FUZZY FILTERING ALGORITHMS
Fig. 2. Different examples of the function φ.
Mackey–Glass differential delay equation
$$ \frac{dx}{dt} = \frac{0.2\, x(t-17)}{1 + x^{10}(t-17)} - 0.1\, x(t) $$
$$ y(t) = x(t) + n(t) $$
where x(0) = 1.2, x(t) = 0 for t < 0, and n(t) is a random noise chosen from a uniform distribution on the interval [−0.2, 0.2]. The aim is to filter the noise n(t) from y(t) to estimate x(t) by using a set of past values, i.e., [x(t−24), x(t−18), x(t−12), x(t−6)].
Assume that there exists an ideal fuzzy model characterized by (θ*, α*) that models the relationship between the input vector [x(t−24) x(t−18) x(t−12) x(t−6)]^T and the output x(t). That is,
$$ x(t) = G^T\big( [x(t-24)\ x(t-18)\ x(t-12)\ x(t-6)]^T,\, \theta^* \big)\, \alpha^* $$
$$ y(t) = G^T\big( [x(t-24)\ x(t-18)\ x(t-12)\ x(t-6)]^T,\, \theta^* \big)\, \alpha^* + n(t). $$
The fourth-order Runge–Kutta method was used for the simulation of the time series, and 500 input–output data pairs, from t = 124 to t = 623, were extracted for an adaptive estimation of the parameters (θ*, α*) using the different algorithms. If (θ_{t−1}, α_{t−1}) denotes the a priori estimate at time t, then the filtering error of an algorithm is defined as
$$ e_{f,t} = x(t) - \hat{y}(t) $$
where
$$ \hat{y}(t) = G^T\big( [x(t-24)\ x(t-18)\ x(t-12)\ x(t-6)]^T,\, \theta_{t-1} \big)\, \alpha_{t-1}. $$
We consider a total of 30 different algorithms, i.e., A_{1,p}, …, A_{5,p} for p = 2, 2.2, 2.4, 2.6, 2.8, 3, running from t = 124 to t = 623, for the prediction of the desired value x(t). We choose, for example's sake, trapezoidal membership functions [defined by (36)] such that the number of membership functions assigned to each of the four inputs [i.e., x(t−24), x(t−18), x(t−12), x(t−6)] is equal to 3. The initial guess about the model parameters was taken as the knot vector
$$ [\,0\ \ 0.3\ \ 0.6\ \ 0.9\ \ 1.2\ \ 1.5\,]^T \ \text{for each of the four inputs}, \qquad \alpha_{-1} = [0]_{81 \times 1}. $$
The matrix A_c and the vector h are chosen in such a way that two consecutive knots must remain separated by a distance of at least 0.001 during the recursions of the algorithms.
The gradient-descent algorithm (1), if intuitively applied to the fuzzy model (5), needs the following considerations.
1) The antecedent parameters of the fuzzy model should be estimated with the learning rate μ_θ, while the consequent
parameters with the learning rate μ_α. That is,
$$ \begin{bmatrix} \theta_j \\ \alpha_j \end{bmatrix} = \begin{bmatrix} \theta_{j-1} \\ \alpha_{j-1} \end{bmatrix} - \begin{bmatrix} \mu_\theta I & 0 \\ 0 & \mu_\alpha I \end{bmatrix} \begin{bmatrix} \left. \frac{\partial E_r(\theta,\alpha,j)}{\partial \theta} \right|_{\theta_{j-1},\,\alpha_{j-1}} \\[4pt] \left. \frac{\partial E_r(\theta,\alpha,j)}{\partial \alpha} \right|_{\theta_{j-1},\,\alpha_{j-1}} \end{bmatrix}, \qquad E_r(\theta,\alpha,j) = \frac{1}{2}\,\big[ y(j) - G^T(x(j),\theta)\,\alpha \big]^2. $$
Introducing the notation w_j = [θ_j^T α_j^T]^T, the gradient-descent update takes the form
$$ w_j = w_{j-1} - \begin{bmatrix} \mu_\theta I & 0 \\ 0 & \mu_\alpha I \end{bmatrix} \left. \frac{\partial E_r(w, j)}{\partial w} \right|_{w_{j-1}}. $$
Here, μ_θ may or may not be equal to μ_α. In general, μ_θ should be smaller than μ_α to avoid oscillations of the estimated parameters.
2) During the gradient-descent estimation of the membership function parameters, in the presence of disturbances, the knots (elements of the vector θ) may attempt to come close to (or even cross) one another. That is, inequalities (38) and (39) do not hold good and, thus, result in a loss of interpretability and estimation performance. For a better performance of gradient descent, the knots must be prevented from crossing one another by modifying the estimation scheme as
$$ w_j = \begin{cases} w_{j-1} - \begin{bmatrix} \mu_\theta I & 0 \\ 0 & \mu_\alpha I \end{bmatrix} \left. \dfrac{\partial E_r(w, j)}{\partial w} \right|_{w_{j-1}}, & \text{if } A_c\theta_j \le h \\[6pt] w_{j-1}, & \text{otherwise.} \end{cases} \tag{34} $$
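The guarded update (34) can be sketched generically as follows (Python/NumPy; our own illustrative code — model_output, the constraint matrix Ac, and hvec stand in for the fuzzy model and the knot-ordering constraints of the Appendix, and the toy interpolation model below is purely hypothetical):

```python
import numpy as np

def numeric_grad(f, w, h=1e-6):
    """Central finite-difference gradient of a scalar function f at w."""
    g = np.zeros_like(w)
    for i in range(len(w)):
        d = np.zeros_like(w)
        d[i] = h
        g[i] = (f(w + d) - f(w - d)) / (2 * h)
    return g

def guarded_gd_step(w, x, y, model_output, n_theta, Ac, hvec, mu_theta, mu_alpha):
    """One step of (34): take the gradient step only if the updated
    antecedent part still satisfies Ac @ theta <= hvec."""
    Er = lambda w_: 0.5 * (y - model_output(x, w_)) ** 2
    g = numeric_grad(Er, w)
    rates = np.r_[np.full(n_theta, mu_theta), np.full(len(w) - n_theta, mu_alpha)]
    w_new = w - rates * g
    return w_new if np.all(Ac @ w_new[:n_theta] <= hvec) else w

# toy usage: a 1-D "model" with two antecedent knots that must stay ordered
model = lambda x, w: np.interp(x, np.sort(w[:2]), w[2:])   # placeholder model
Ac = np.array([[1.0, -1.0]])        # knot1 - knot2 <= -0.001 keeps knots separated
hvec = np.array([-0.001])
w = np.array([0.0, 1.0, 0.2, 0.8])  # [theta; alpha]
w = guarded_gd_step(w, 0.5, 0.7, model, 2, Ac, hvec, 0.05, 0.1)
```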
Each of the aforementioned algorithms and the gradient-descent algorithm (34) is run at μ_θ = μ_α = 0.9. The filtering performance of each algorithm is assessed via computing the energy of the filtering error signal e_{f,t}. The energy of a signal is equal to its squared L_2-norm. The energy of the filtering error signal e_{f,t}, from t = 124 to t = 623, is defined as
$$ \sum_{t=124}^{623} |e_{f,t}|^2. $$
A higher energy of filtering errors means the higher magnitudes
of filtering errors and, thus, a poor performance of the filtering
algorithm.
Fig. 3 compares the different algorithms by plotting their filtering error energies at different values of p. As seen from Fig. 3, all the algorithms A_{1,p}, …, A_{5,p} for p = 2, 2.2, 2.4, 2.6, 2.8, 3 perform better than gradient descent, since the gradient-descent method is associated with a higher energy of the filtering errors. As an illustration, Fig. 4 shows the time plot of the absolute filtering error |e_{f,t}| for gradient descent and the algorithm A_{4,p} at p = 2.4.
Fig. 3. Filtering performance of different adaptive algorithms.
Fig. 4. Time plot of the absolute filtering error |e_{f,t}| from t = 124 to t = 623 for gradient descent and the algorithm A_{4,2.4}.
B. Second Example
Now, we consider a linear fuzzy model (membership functions being fixed)
$$ y(j) = \big[\, \mu_{A_1^1}(x(j))\ \ \mu_{A_2^1}(x(j))\ \ \mu_{A_3^1}(x(j))\ \ \mu_{A_4^1}(x(j)) \,\big]\, \alpha^* + v_j $$
where α* = [0.25 0.5 1 0.3]^T, x(j) takes random values from a uniform distribution on [−1, 1], v_j is a random noise chosen from a uniform distribution on the interval [−0.2, 0.2], and the membership functions are defined by (37) taking the knot vector (−1, −0.3, 0.3, 1).
Algorithm (23) was employed to estimate α* for different choices of φ (listed in Table II) at p = 3 (as an example of p > 2). The initial guess is taken as α_{-1} = 0. For comparison, the LMS algorithm is also simulated. Note that the LMS algorithm is just a particularization of (23) for p = 2 and φ(e) = e. That is,
Fig. 5. Comparison of the algorithms (A_{1,3}, A_{2,3}, A_{3,3}, A_{4,3}, A_{5,3}) with the LMS.
A_{2,2} in Table II, in the context of linear estimation, is the LMS algorithm. The performance of an algorithm was evaluated by calculating the instantaneous a priori error in the estimation of α* as ‖α* − α_{j-1}‖². The learning rate μ_{α,j} for each algorithm, including LMS, is chosen according to (28) as follows:
$$ \mu_{\alpha,j} = \frac{ 2\, P_\phi\big( y(j),\; G_j^T \alpha_{j-1} \big) }{ \phi\big( y(j) - G_j^T \alpha_{j-1} \big)\, \big[ \phi(y(j)) - \phi\big( G_j^T \alpha_{j-1} \big) \big]\, (p-1)\, \|G_j\|_p^2 } $$
where
$$ G_j = \big[\, \mu_{A_1^1}(x(j))\ \ \mu_{A_2^1}(x(j))\ \ \mu_{A_3^1}(x(j))\ \ \mu_{A_4^1}(x(j)) \,\big]^T. $$
For an assessment of the expected error values, i.e., E[‖α* − α_{j-1}‖²], 500 independent experiments have been performed. The 500 independent time plots of the values ‖α* − α_{j-1}‖² have been averaged to obtain the time plot of E[‖α* − α_{j-1}‖²] shown in Fig. 5. The better performance of the algorithms (A_{1,3}, A_{2,3}, A_{3,3}, A_{4,3}, A_{5,3}) compared with the LMS can be seen in Fig. 5. This example indicates that the p-norm generalization of the algorithms makes sense, since the A_{2,p} algorithm for p = 3 (a value of p > 2) proved better than the p = 2 case (i.e., LMS).
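The adaptive step size above is straightforward to compute. A sketch (Python/NumPy; our own illustrative code for φ(e) = e³, where P_φ has the closed form (26) with Φ(r) = r⁴/4; the vectors G and alpha_prev are arbitrary):

```python
import numpy as np

phi = lambda r: r ** 3
Phi = lambda r: r ** 4 / 4.0

def P_phi(yhat, y):
    # closed form (26)
    return Phi(y) - Phi(yhat) - (y - yhat) * phi(yhat)

def mu_step(y, G, alpha_prev, p):
    """Maximal learning rate allowed by (28)."""
    yhat = G @ alpha_prev
    den = (phi(y - yhat) * (phi(y) - phi(yhat))
           * (p - 1) * np.linalg.norm(G, p) ** 2)
    return 2.0 * P_phi(yhat, y) / den

G = np.array([0.9, 0.4, 0.1, 0.05])      # fired membership degrees (illustrative)
alpha_prev = np.array([0.2, 0.3, 0.5, 0.1])
mu = mu_step(1.0, G, alpha_prev, p=3.0)
```

Since φ is strictly increasing with φ(0) = 0, the numerator and denominator always share the same sign, so the resulting μ is positive.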
C. Robustness and Convergence
To study the robustness properties of the algorithms, we investigate the sensitivity of the filtering errors toward disturbances. To make this more precise, we plot the curve between the disturbance energy and the filtering error energy and analyze the curve.
In the aforementioned example (i.e., the second example), the energy of the filtering errors will be defined as
$$ E_f(j) = \sum_{i=0}^{j} \big| G_i^T \alpha^* - G_i^T \alpha_{i-1} \big|^2 $$
Fig. 6. Filtering error energy as a function of the energy of disturbances. The curves are averaged over 100 independent experiments.
and the total energy of disturbances as

$$E_d(j) = \sum_{i=0}^{j} |v_i|^2 + \|\theta - \theta_{-1}\|^2$$

where the term $\|\theta - \theta_{-1}\|^2$ accounts for the disturbance due to a mismatch between the initial guess and the true parameters. Fig. 6 shows the plot between the values $\{E_d(j)\}_{j=0}^{1000}$ and $\{E_f(j)\}_{j=0}^{1000}$ for each of the algorithms ($A_{1,3}$, $A_{2,3}$, $A_{3,3}$, $A_{4,3}$, $A_{5,3}$). The curves for ($A_{2,3}$, $A_{4,3}$, $A_{5,3}$) are not distinguishable. The curves in Fig. 6 have been averaged over 100 independent experiments. The initial guess $\theta_{-1}$ is chosen in each experiment randomly from a uniform distribution on $[-1, 1]$. The curves in Fig. 6 show the robustness of the algorithms in the sense that a small energy of disturbances does not lead to a large energy of filtering errors. Thus, if the disturbances are bounded, then the filtering errors also remain bounded [i.e., bounded-input bounded-output (BIBO) stability].
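For a given run, the two energy sequences defined above can be accumulated directly from the recorded trajectory. The function and argument names below are illustrative, not from the paper.

```python
import numpy as np

def energy_curves(theta_true, theta_hist, G_seq, v_seq, theta_init):
    """Running filtering-error energy E_f(j) and disturbance energy E_d(j).

    theta_hist[j] is the parameter estimate after processing sample j;
    the disturbance energy starts from the initial-mismatch term."""
    Ef, Ed = [], []
    ef_sum = 0.0
    ed_sum = float(np.sum((theta_true - theta_init) ** 2))
    for j, (G, v) in enumerate(zip(G_seq, v_seq)):
        prev = theta_init if j == 0 else theta_hist[j - 1]
        ef_sum += float((G @ theta_true - G @ prev) ** 2)
        ed_sum += float(v ** 2)
        Ef.append(ef_sum)
        Ed.append(ed_sum)
    return np.array(Ef), np.array(Ed)
```

Plotting `Ef` against `Ed` reproduces the kind of robustness curve shown in Fig. 6: bounded disturbance energy should yield bounded filtering-error energy.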
To verify the convergence properties of the algorithms, the plots of Fig. 5 are redrawn by running the algorithms ($A_{1,3}$, $A_{2,3}$, $A_{3,3}$, $A_{4,3}$, $A_{5,3}$) in the ideal case of zero disturbances (i.e., $v_j = 0$). Fig. 7 shows, in this case, the convergence of the algorithms toward the true parameters.
D. Third Example
Our third example has been taken from [9], where at time $t$, a signal $y_t$ is transmitted over a noisy channel. The recipient is required to estimate the sent signal $y_t$ from the actually observed signal

$$r_t = \sum_{i=0}^{k-1} u_{i+1}\, y_{t-i} + v_t$$

where $v_t$ is zero-mean Gaussian noise with a signal-to-noise ratio of 10 dB. The signal is estimated using an adaptive filter

$$\hat{y}_t = G_t^T\theta_{t-1}, \quad G_t = \left[r_{t-m} \;\cdots\; r_t \;\cdots\; r_{t+m}\right]^T \in \mathbb{R}^{2m+1}$$
Fig. 7. Convergence of the algorithms in the ideal case toward the true parameters.
where $\theta_t \in \mathbb{R}^{2m+1}$ are the estimated filter parameters at time $t$. We take the values $k = 10$ and $m = 15$ as in [9]. The transmitted signal $y_t$ is chosen to be zero-mean Gaussian with unit variance; however, $y_t$ was a binary signal in [9] (which is obviously not quite the same as our framework). The vector $u \in \mathbb{R}^k$, describing the channel, was chosen in [9] in two different manners.
1) In the first case, $u$ is chosen from a Gaussian distribution with unit variance and then normalized to make $\|u\| = 1$.
2) In the second case, $u_i = s_i e^{z_i}$, where $s_i \in \{-1, 1\}$ and $z_i \in [-10, 10]$ are distributed uniformly, and $u$ is then normalized to make $\|u\| = 1$.
Algorithm (23) was used to estimate the filter parameters, taking different choices of $\psi$ at $p = 2.5$ in the first case. However, the second case (with $u$ being sparse), as discussed in [9], favors a fairly large value of $p$ [i.e., $p = 2\ln(2m+1)$]. The learning rate is chosen according to (28) as in the previous example. The instantaneous filtering error is defined as

$$e_{f,t} = y_t - G_t^T\theta_{t-1}.$$
The performance of the different filtering algorithms is assessed by calculating the root mean square of the filtering errors:

$$\mathrm{RMSFE}(t) = \left(\frac{1}{t+1}\sum_{i=0}^{t} |e_{f,i}|^2\right)^{1/2}.$$
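Case 1 of this experiment can be sketched as follows. As assumptions for the sketch: a plain normalized-LMS update (i.e., $p = 2$, $\psi(e) = e$) stands in for the general algorithm, and the step size 0.5 is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(1)
k, m, T = 10, 15, 500

# case 1 channel: Gaussian entries, normalized to unit Euclidean norm
u = rng.normal(size=k)
u /= np.linalg.norm(u)

y = rng.normal(size=T + 2 * m)                  # transmitted signal
r = np.convolve(y, u)[: T + 2 * m]              # channel output r_t
r = r + rng.normal(scale=10 ** (-10 / 20), size=r.shape)  # ~10-dB-SNR noise

theta = np.zeros(2 * m + 1)
sq_err = []
for t in range(m, T + m):
    G = r[t - m : t + m + 1]                    # window [r_{t-m} ... r_{t+m}]
    e = y[t] - G @ theta                        # instantaneous filtering error
    theta = theta + (0.5 / (G @ G + 1e-9)) * e * G   # normalized LMS step
    sq_err.append(e ** 2)

rmsfe = np.sqrt(np.cumsum(sq_err) / (np.arange(len(sq_err)) + 1.0))
```

Here `rmsfe[t]` is RMSFE($t$) for this single run; averaging such curves over independent experiments gives plots like those in Fig. 8.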
The time plots of the root mean square of filtering errors, averaged over 100 independent experiments, are shown in Fig. 8. Fig. 8(a) shows the faster convergence of algorithm $A_{1,p}$ than $A_{2,p}$, while Fig. 8(b) shows the faster convergence of $A_{3,p}$ than $A_{2,p}$. This indicates that the algorithms corresponding to different choices of $\psi$ (i.e., $A_{1,p}$, $A_{3,p}$, etc.) may prove better in some sense than the standard choice $\psi(e) = e$ (i.e., algorithm $A_{2,p}$). Hence, the proposed framework, which offers the possibility of developing filtering algorithms corresponding to different choices of $\psi$, is a useful tool.
The provided simulation studies clearly indicate the potential of our approach in adaptive filtering. The first example
shows the better filtering performance of our approach than
the most commonly used gradient-descent algorithm for esti-
mating the parameters of nonlinear neural/fuzzy models. Some
hybrid methods, e.g., clustering for membership functions and
RLS algorithm for consequents, have been suggested in the lit-
erature for an online identification of the fuzzy models. It is
a well-known fact that RLS optimizes the average (expected) performance under some statistical assumptions, while LMS optimizes the worst-case performance. Since our algorithms are generalized versions of LMS and possess LMS-like robustness properties (as indicated in Theorem 1), we do not compare the RLS algorithm with our algorithms. Moreover, for a fair comparison, RLS must be generalized as we generalized LMS in
our analysis. More will be said on the generalization of the RLS
algorithm in Section V.
V. SOME REMARKS
This study outlines an approach to adaptive fuzzy filtering in a broad sense, and thus, several related studies could be made. In particular, we would like to mention the following.
1) We studied the adaptive filtering problem using a zero-order Takagi–Sugeno fuzzy model due to its simplicity.
However, the approach can be applied to any semilinear model with linear inequality constraints, e.g., first-order Takagi–Sugeno fuzzy models, radial basis function (RBF) neural networks, B-spline models, etc. The approach is valid for any model characterized by a parameter set $(\alpha, \theta) \in \mathbb{R}^l \times \mathbb{R}^n$ such that

$$\hat{y} = G^T(x, \alpha)\theta, \quad A_c\alpha \le h.$$
2) For the $p$-norm generalization of the algorithms, we have considered the Bregman divergence associated with the squared $q$-norm. Another important example of a Bregman divergence is the relative entropy, which is defined between vectors $u = [u_1 \cdots u_K]^T$ and $w = [w_1 \cdots w_K]^T$ (assuming $u_i, w_i \ge 0$ and $\sum_{i=1}^{K} u_i = \sum_{i=1}^{K} w_i = 1$) as follows:

$$d_{RE}(u, w) = \sum_{i=1}^{K} u_i \ln\frac{u_i}{w_i}.$$
The relative entropy is the Bregman divergence $d_F(u, w)$ for $F(u) = \sum_{i=1}^{K} (u_i \ln u_i - u_i)$. It is possible to derive and analyze different exponentiated-gradient [29] type fuzzy filtering algorithms by using the relative entropy as the regularizer and following the same approach. However, some additional effort is required to handle the unity-sum constraint on the vectors.
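The stated identity, that the relative entropy is the Bregman divergence generated by $F(u) = \sum_i (u_i \ln u_i - u_i)$, can be checked numerically. This is a small sketch with illustrative names and example vectors.

```python
import numpy as np

def d_RE(u, w):
    """Relative entropy between probability vectors u and w."""
    return float(np.sum(u * np.log(u / w)))

def bregman(F, gradF, u, w):
    """Bregman divergence d_F(u, w) = F(u) - F(w) - <grad F(w), u - w>."""
    return float(F(u) - F(w) - gradF(w) @ (u - w))

F = lambda u: float(np.sum(u * np.log(u) - u))
gradF = lambda u: np.log(u)

u = np.array([0.2, 0.5, 0.3])   # both vectors sum to 1
w = np.array([0.4, 0.4, 0.2])
assert abs(d_RE(u, w) - bregman(F, gradF, u, w)) < 1e-12
```

The unity-sum assumption matters here: for general nonnegative vectors, the two quantities differ by $\sum_i w_i - \sum_i u_i$, which is exactly the constraint the remark says needs extra care.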
3) We arrived at the explicit update form (14) by making an approximation. A natural question that arises is whether any improvement in the filtering performance (assessed in Theorem 1) could be made by solving (13) numerically instead of approximating. An upper bound on the filtering errors (as in Theorem 1) could be calculated in this case too; however, we would then be considering the a posteriori filtering errors $\sum_{j=0}^{k} P\left(G^T(x(j), \alpha_j)\theta_j,\; G^T(x(j), \alpha)\theta\right)$.
Fig. 8. RMSFE($t$) as a function of time. The curves are averaged over 100 independent experiments. (a) $p = 2.5$ with $u$ being dense. (b) $p = 2\ln(2m+1)$ with $u$ being sparse.
4) Our emphasis in this study was on filtering. In the machine learning literature, one is normally interested in the prediction performance of such algorithms [10], [33], [34]. The presented algorithms could be evaluated, in a similar manner as in Theorem 1, by calculating an upper bound on
the prediction errors $\sum_{j=0}^{k} P\left(G^T(x(j), \alpha_j)\theta_{j-1},\; y(j)\right)$.
5) This study offers the possibility of developing and analyzing new fuzzy filtering algorithms by defining a continuous, strictly increasing function $\psi$. An interesting research direction is to optimize the function $\psi$ for the problem at hand. For example, in the context of linear estimation, an expression for the optimum function $\psi$ that minimizes the steady-state mean-square error is derived in [8].
6) Other than the LMS, the recursive least squares (RLS) is a well-known algorithm that optimizes the average performance under some stochastic assumptions on the signals. The deterministic interpretation of the RLS algorithm is that it solves a regularized least-squares problem

$$w_k = \arg\min_{w} \sum_{j=0}^{k} \left\|y(j) - G_j^T w\right\|^2 + \eta^{-1}\|w\|^2.$$
Our future study is concerned with the generalization of the RLS algorithm, in the context of interpretable fuzzy models, based on the solution of the following regularized least-squares problem:

$$(\alpha_k, \theta_k) = \arg\min_{(\alpha, \theta):\, A_c\alpha \le h} J_k$$

$$J_k = \sum_{j=0}^{k} L_j(\alpha, \theta) + \eta^{-1} d_q(\alpha, \alpha_{-1}) + \eta^{-1} d_q(\theta, \theta_{-1})$$

where some examples of the loss term $L_j(\alpha, \theta)$ are provided in Table II.
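The deterministic (regularized least-squares) view of RLS in this remark can be sketched and verified against the batch solution. The constant regularizer weight `reg` below is an assumption for illustration.

```python
import numpy as np

def regularized_rls(G_seq, y_seq, reg=1.0):
    """Recursively minimize sum_j (y_j - G_j^T w)^2 + reg * ||w||^2."""
    n = G_seq[0].shape[0]
    P = np.eye(n) / reg              # inverse of the regularized Gram matrix
    w = np.zeros(n)
    for G, y in zip(G_seq, y_seq):
        PG = P @ G
        gain = PG / (1.0 + G @ PG)   # Kalman-style gain vector
        w = w + gain * (y - G @ w)   # correct toward the current residual
        P = P - np.outer(gain, PG)   # rank-one downdate (matrix inversion lemma)
    return w

# sanity check against the closed-form batch solution
rng = np.random.default_rng(0)
Gs = [rng.normal(size=3) for _ in range(20)]
ys = [float(g @ np.array([1.0, -2.0, 0.5])) for g in Gs]
w_rec = regularized_rls(Gs, ys, reg=1.0)
A, b = np.array(Gs), np.array(ys)
w_batch = np.linalg.solve(A.T @ A + np.eye(3), A.T @ b)
assert np.allclose(w_rec, w_batch, atol=1e-8)
```

The fuzzy generalization sketched in the remark would replace the quadratic penalty by the Bregman divergences $d_q$ and add the constraint $A_c\alpha \le h$, which breaks this simple closed-form recursion.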
VI. CONCLUSION
Much work has been done on applying fuzzy models in func-
tion approximation and classification tasks. We feel that many
real-world applications (e.g., in chemistry [35], biomedical en-
gineering [2], etc.) require the filtering of uncertainties from
the experimental data. The nonlinear fuzzy models, by virtue of their membership functions, are more promising than the classical linear models. Therefore, it is essential to study the adaptive
fuzzy filtering algorithms. Adaptive filtering theory for linear models has been well developed in the literature; however, its extension to fuzzy models is complicated by the nonlinearity of membership functions and the interpretability constraints. The contribution of this paper (summarized in Theorem 1, its inferences, and Table II) is to provide a mathematical framework that allows the development and an analysis
of the adaptive fuzzy filtering algorithms. The power of our
approach is the flexibility of designing the algorithms based on the choice of the function $\psi$ and the parameter $p$. The derived filtering algorithms have the desired properties of robustness,
stability, and convergence. This paper is an attempt to provide
a deterministic approach to study the adaptive fuzzy filtering
algorithms, and the study opens many research directions, as
discussed in Section V. Future work involves the study of fuzzy
filtering algorithms derived using the relative entropy as a regularizer, optimizing the function $\psi$ for the problem at hand, and a generalization of the RLS algorithm to the $p$-norm in the context of interpretable fuzzy models.
APPENDIX
TAKAGISUGENO FUZZY MODEL
Let us consider an explicit mathematical formulation of a
Sugeno-type fuzzy inference system that assigns to each crisp
value (vector) in input space a crisp value in output space.
Consider a Sugeno fuzzy inference system ($F_s: X \to Y$), mapping an $n$-dimensional input space ($X = X_1 \times X_2 \times \cdots \times X_n$) to
the one-dimensional real line, consisting of $K$ different rules. The $i$th rule is of the form

If $x_1$ is $A_{i1}$ and $x_2$ is $A_{i2}$ and $\cdots$ and $x_n$ is $A_{in}$, then $y = c_i$

for all $i = 1, 2, \ldots, K$, where $A_{i1}, A_{i2}, \ldots, A_{in}$ are nonempty fuzzy subsets of $X_1, X_2, \ldots, X_n$, respectively, such that the membership functions $\mu_{A_{ij}}: X_j \to [0, 1]$ fulfill $\sum_{i=1}^{K} \prod_{j=1}^{n} \mu_{A_{ij}}(x_j) > 0$ for all $x_j \in X_j$, and the values $c_1, \ldots, c_K$ are real numbers. The different rules, using product as the conjunction operator, can be aggregated as

$$F_s(x_1, x_2, \ldots, x_n) = \frac{\sum_{i=1}^{K} c_i \prod_{j=1}^{n} \mu_{A_{ij}}(x_j)}{\sum_{i=1}^{K} \prod_{j=1}^{n} \mu_{A_{ij}}(x_j)}. \qquad (35)$$
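Equation (35) translates directly into code. This is a minimal sketch; the two membership functions and the rule consequents in the example are assumed for illustration.

```python
import numpy as np

def sugeno_output(x, memberships, c):
    """Zero-order Sugeno inference as in (35): product conjunction and
    weighted-average aggregation over K rules.

    memberships[i][j]: membership function of fuzzy set A_ij;
    c[i]: consequent of rule i."""
    w = np.array([np.prod([mu(xj) for mu, xj in zip(rule, x)])
                  for rule in memberships])
    return float(w @ np.asarray(c) / w.sum())

# two rules on one input with assumed piecewise-linear memberships
mu_low  = lambda x: max(0.0, 1.0 - x)
mu_high = lambda x: min(1.0, max(0.0, x))
out = sugeno_output([0.25], [[mu_low], [mu_high]], c=[0.0, 1.0])
print(out)  # 0.25
```

The positivity condition on $\sum_i \prod_j \mu_{A_{ij}}(x_j)$ stated above is exactly what guarantees the denominator `w.sum()` never vanishes.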
Let us define a real vector $\alpha$ such that the membership functions can be constructed from the elements of the vector $\alpha$. To illustrate the construction of membership functions based on the knot vector ($\alpha$), consider the following example.
1) Trapezoidal membership functions: Let

$$\alpha = \left[a_1, t_1^1, \ldots, t_1^{2P_1-2}, b_1, \ldots, a_n, t_n^1, \ldots, t_n^{2P_n-2}, b_n\right]^T$$

such that for the $i$th input ($x_i \in [a_i, b_i]$), there exists $\delta_i > 0$ for all $i = 1, \ldots, n$ such that, for trapezoidal membership functions,

$$t_i^1 - a_i \ge \delta_i$$
$$t_i^{j+1} - t_i^j \ge \delta_i \quad \text{for all } j = 1, 2, \ldots, (2P_i - 3)$$
$$b_i - t_i^{2P_i-2} \ge \delta_i.$$

These inequalities can be written in terms of a matrix inequality $A_c\alpha \le h$ [18], [37]–[42]. Hence, the output of a Sugeno-type fuzzy model

$$F_s(x) = G^T(x, \alpha)\theta, \quad A_c\alpha \le h$$

is linear in the consequents (i.e., $\theta$) but nonlinear in the antecedents (i.e., $\alpha$).
REFERENCES
[1] M. Kumar, K. Thurow, N. Stoll, and R. Stoll, "Robust fuzzy mappings for QSAR studies," Eur. J. Med. Chem., vol. 42, no. 5, pp. 675–685, 2007.
[2] M. Kumar, M. Weippert, R. Vilbrandt, S. Kreuzfeld, and R. Stoll, "Fuzzy evaluation of heart rate signals for mental stress assessment," IEEE Trans. Fuzzy Syst., vol. 15, no. 5, pp. 791–808, Oct. 2007.
[3] A. H. Sayed, Fundamentals of Adaptive Filtering. New York: Wiley, 2003.
[4] S. Haykin, Adaptive Filter Theory, 3rd ed. New York: Prentice–Hall, 1996.
[5] B. Hassibi, A. H. Sayed, and T. Kailath, "H∞ optimality of the LMS algorithm," IEEE Trans. Signal Process., vol. 44, no. 2, pp. 267–280, Feb. 1996.
[6] N. R. Yousef and A. H. Sayed, "A unified approach to the steady-state and tracking analyses of adaptive filters," IEEE Trans. Signal Process., vol. 49, no. 2, pp. 314–324, Feb. 2001.
[7] T. Y. Al-Naffouri and A. H. Sayed, "Transient analysis of adaptive filters with error nonlinearities," IEEE Trans. Signal Process., vol. 51, no. 3, pp. 653–663, Mar. 2003.
[8] T. Y. Al-Naffouri and A. H. Sayed, "Adaptive filters with error nonlinearities: Mean-square analysis and optimum design," EURASIP J. Appl. Signal Process., vol. 2001, no. 4, pp. 192–205, 2001.
[9] J. Kivinen, M. K. Warmuth, and B. Hassibi, "The p-norm generalization of the LMS algorithm for adaptive filtering," IEEE Trans. Signal Process., vol. 54, no. 5, pp. 1782–1793, May 2006.
[10] C. Gentile, "The robustness of the p-norm algorithms," Mach. Learning, vol. 53, no. 3, pp. 265–299, 2003.
[11] J. S. R. Jang, C. T. Sun, and E. Mizutani, Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Upper Saddle River, NJ: Prentice–Hall, 1997.
[12] J. Abonyi, L. Nagy, and F. Szeifert, "Adaptive fuzzy inference system and its application in modelling and model based control," Chem. Eng. Res. Des., vol. 77, pp. 281–290, Jun. 1999.
[13] P. P. Angelov and D. P. Filev, "An approach to online identification of Takagi–Sugeno fuzzy models," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 34, no. 1, pp. 484–498, Feb. 2004.
[14] M. J. Er, Z. Li, H. Cai, and Q. Chen, "Adaptive noise cancellation using enhanced dynamic fuzzy neural networks," IEEE Trans. Fuzzy Syst., vol. 13, no. 3, pp. 331–342, Jun. 2005.
[15] C.-F. Juang and C.-T. Lin, "Noisy speech processing by recurrently adaptive fuzzy filters," IEEE Trans. Fuzzy Syst., vol. 9, no. 1, pp. 139–152, Feb. 2001.
[16] B. Hassibi, A. H. Sayed, and T. Kailath, "H∞ optimality criteria for LMS and backpropagation," in Advances in Neural Information Processing Systems, vol. 6, J. D. Cowan, G. Tesauro, and J. Alspector, Eds. San Mateo, CA: Morgan Kaufmann, Apr. 1994, pp. 351–359.
[17] W. Yu and X. Li, "Fuzzy identification using fuzzy neural networks with stable learning algorithms," IEEE Trans. Fuzzy Syst., vol. 12, no. 3, pp. 411–420, Jun. 2004.
[18] M. Kumar, N. Stoll, and R. Stoll, "An energy gain bounding approach to robust fuzzy identification," Automatica, vol. 42, no. 5, pp. 711–721, May 2006.
[19] M. Kumar, R. Stoll, and N. Stoll, "Deterministic approach to robust adaptive learning of fuzzy models," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 36, no. 4, pp. 767–780, Aug. 2006.
[20] K. S. Azoury and M. K. Warmuth, "Relative loss bounds for on-line density estimation with the exponential family of distributions," Mach. Learning, vol. 43, no. 3, pp. 211–246, Jun. 2001.
[21] L. Bregman, "The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming," USSR Comp. Math. Phys., vol. 7, pp. 200–217, 1967.
[22] J. Kivinen and M. K. Warmuth, "Relative loss bounds for multidimensional regression problems," Mach. Learning, vol. 45, no. 3, pp. 301–329, 2001.
[23] M. Collins, R. E. Schapire, and Y. Singer, "Logistic regression, AdaBoost and Bregman distances," Mach. Learning, vol. 48, pp. 253–285, 2002.
[24] N. Murata, T. Takenouchi, T. Kanamori, and S. Eguchi, "Information geometry of u-boost and Bregman divergence," Neural Comput., vol. 16, no. 7, pp. 1437–1481, 2004.
[25] B. Taskar, S. Lacoste-Julien, and M. I. Jordan, "Structured prediction, dual extragradient and Bregman projections," J. Mach. Learning Res., vol. 7, pp. 1627–1653, 2006.
[26] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh, "Clustering with Bregman divergences," J. Mach. Learning Res., vol. 6, pp. 1705–1749, 2005.
[27] R. Nock and F. Nielsen, "On weighting clustering," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 8, pp. 1223–1235, Aug. 2006.
[28] A. Banerjee, X. Guo, and H. Wan, "On the optimality of conditional expectation as a Bregman predictor," IEEE Trans. Inf. Theory, vol. 51, no. 7, pp. 2664–2669, Jul. 2005.
[29] J. Kivinen and M. K. Warmuth, "Additive versus exponentiated gradient updates for linear prediction," Inf. Comput., vol. 132, no. 1, pp. 1–64, Jan. 1997.
[30] C. L. Lawson and R. J. Hanson, Solving Least Squares Problems. Philadelphia, PA: SIAM, 1995.
[31] D. P. Helmbold, J. Kivinen, and M. K. Warmuth, "Relative loss bounds for single neurons," IEEE Trans. Neural Netw., vol. 10, no. 6, pp. 1291–1304, Nov. 1999.
[32] P. Auer, M. Herbster, and M. K. Warmuth, "Exponentially many local minima for single neurons," in Advances in Neural Information Processing Systems, vol. 8, D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds. Cambridge, MA: MIT Press, Jun. 1996, pp. 316–322.
[33] N. Cesa-Bianchi, P. M. Long, and M. K. Warmuth, "Worst-case quadratic loss bounds for prediction using linear functions and gradient descent," IEEE Trans. Neural Netw., vol. 7, no. 3, pp. 604–619, May 1996.
[34] P. Auer, N. Cesa-Bianchi, and C. Gentile, "Adaptive and self-confident on-line learning algorithms," J. Comput. Syst. Sci., vol. 64, no. 1, pp. 48–75, Feb. 2002.
[35] S. Kumar, M. Kumar, R. Stoll, and U. Kragl, "Handling uncertainties in toxicity modelling using a fuzzy filter," SAR QSAR Environ. Res., vol. 18, no. 7/8, pp. 645–662, Dec. 2007.
[36] P. Lindskog, "Fuzzy identification from a grey box modeling point of view," in Fuzzy Model Identification: Selected Approaches, H. Hellendoorn and D. Driankov, Eds. Berlin, Germany: Springer-Verlag, 1997.
[37] M. Burger, H. Engl, J. Haslinger, and U. Bodenhofer, "Regularized data-driven construction of fuzzy controllers," J. Inverse Ill-Posed Problems, vol. 10, pp. 319–344, 2002.
[38] M. Kumar, R. Stoll, and N. Stoll, "Robust adaptive fuzzy identification of time-varying processes with uncertain data. Handling uncertainties in the physical fitness fuzzy approximation with real world medical data: An application," Fuzzy Optim. Decis. Making, vol. 2, pp. 243–259, Sep. 2003.
[39] M. Kumar, R. Stoll, and N. Stoll, "Regularized adaptation of fuzzy inference systems. Modelling the opinion of a medical expert about physical fitness: An application," Fuzzy Optim. Decis. Making, vol. 2, pp. 317–336, Dec. 2003.
[40] M. Kumar, R. Stoll, and N. Stoll, "Robust solution to fuzzy identification problem with uncertain data by regularization. Fuzzy approximation to physical fitness with real world medical data: An application," Fuzzy Optim. Decis. Making, vol. 3, no. 1, pp. 63–82, Mar. 2004.
[41] M. Kumar, R. Stoll, and N. Stoll, "Robust adaptive identification of fuzzy systems with uncertain data," Fuzzy Optim. Decis. Making, vol. 3, no. 3, pp. 195–216, Sep. 2004.
[42] M. Kumar, R. Stoll, and N. Stoll, "A robust design criterion for interpretable fuzzy models with uncertain data," IEEE Trans. Fuzzy Syst., vol. 14, no. 2, pp. 314–328, Apr. 2006.
Mohit Kumar (M'08) received the B.Tech. degree in electrical engineering from the National Institute of Technology, Hamirpur, India, in 1999, the M.Tech. degree in control engineering from the Indian Institute of Technology, Delhi, India, in 2001, and the Ph.D. degree (summa cum laude) in electrical engineering from Rostock University, Rostock, Germany, in 2004.
From 2001 to 2004, he was a Research Scientist with the Institute of Occupational and Social Medicine, Rostock. He is currently with the Center for Life Science Automation, Rostock. His current research interests include robust adaptive fuzzy identification, fuzzy logic in medicine, and robust adaptive control.
Norbert Stoll received the Dipl.-Ing. degree in automation engineering and the Ph.D. degree in measurement technology from Rostock University, Rostock, Germany, in 1979 and 1985, respectively.
He was the Head of the analytical chemistry section with the Academy of Sciences of GDR, Central Institute for Organic Chemistry, until 1991. From 1992 to 1994, he was an Associate Director of the Institute of Organic Catalysis, Rostock. Since 1994, he has been a Professor of measurement technology with the Engineering Faculty, University of Rostock. From 1994 to 2000, he was the Director of the Institution of Automation at Rostock University. Since 2003, he has been the Vice President of the Center for Life Science Automation, Rostock. His current research interests include medical process measurement, lab automation, and smart systems and devices.
Regina Stoll received the Dipl.-Med. degree, the Dr.med. degree in occupational medicine, and the Dr.med.habil. degree in occupational and sports medicine from Rostock University, Rostock, Germany, in 1980, 1984, and 2002, respectively.
She is the Head of the Institute of Preventive Medicine, Rostock. She is a faculty member with the Medicine Faculty and a Faculty Associate with the College of Computer Science and Electrical Engineering, Rostock University. She also holds an adjunct faculty member position with the Industrial Engineering Department, North Carolina State University, Raleigh. Her current research interests include occupational physiology, preventive medicine, and cardiopulmonary diagnostics.