CHAPTER 2
LITERATURE REVIEW
2.1. INTRODUCTION
This chapter provides an in-depth review of the methods and the state of the art related to
the evaluation of students' academic performance. It begins with predicting student
performance, student modeling, detecting undesirable student behaviors and grouping of
students. Selected methods used for the evaluation of students' academic performance and
soft computing techniques are then described in detail.
2.2. PREDICTING STUDENT PERFORMANCE
Prediction of student performance aims to estimate the unknown value of a variable
describing the student's performance, knowledge, score or mark. This value can be
numerical/continuous (regression task) or categorical/discrete (classification task).
Regression analysis determines the relationship between dependent and independent
variables [1]. In classification, individual items are assigned to groups on the basis of
their quantitative characteristics and a training set of previously labeled items [2].
Prediction of student performance is one of the newest and most popular applications of
soft computing techniques in education. Different techniques and models have been applied
for this purpose (fuzzy logic, neural networks, Bayesian networks, rule-based systems,
regression and correlation analysis).
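The difference between the two task types can be sketched in a few lines of Python. This is only an illustration: the marks, the labels and the helper names `fit_line` and `classify_1nn` are hypothetical, and a least-squares line and a one-nearest-neighbour rule stand in for the many regression and classification models cited in this section.

```python
def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept (a regression task)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def classify_1nn(x, labelled):
    """Label of the closest previously labelled item (a classification task)."""
    return min(labelled, key=lambda item: abs(item[0] - x))[1]


# Hypothetical data: mid-term marks and the final marks of the same students.
midterm = [35, 50, 65, 80]
final = [40, 55, 70, 85]
slope, intercept = fit_line(midterm, final)
predicted_mark = slope * 60 + intercept  # continuous output

# Hypothetical training set of (mid-term mark, outcome) pairs.
labelled = [(30, "fail"), (42, "fail"), (58, "pass"), (75, "pass")]
predicted_class = classify_1nn(60, labelled)  # discrete output
```

The regression output is a number on the mark scale, while the classifier returns one of the labels seen in the training set.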
A comparison of machine learning methods has been carried out to predict success
in a course (either passed or failed) in Intelligent Tutoring Systems. Other
comparisons of data mining algorithms have been made to classify students
(predict final marks) based on Moodle usage data [3], and to predict student
performance/final grade on the basis of features extracted from logged data [4].
Neural network models based on back-propagation and feed-forward networks have
been frequently used to predict final student grades. Feed-forward and
back-propagation networks have been used to predict the number of errors a student
will make, while back-propagation and counter-propagation techniques have generally
been used to predict performance from test scores. Radial basis function networks
have been used to predict whether students pass or fail from Moodle logs [5], and a
multilayer perceptron topology has been used to predict the performance of a
candidate being considered for admission to a university [6]. Within a tutoring
system, Bayesian networks have been used to predict student performance and to
assess knowledge [7].
Various rule-based systems have been applied to predict student performance
(mark prediction): monitoring and evaluation of student academic performance
using rule induction [8]; prediction of final grades from data logged in a
web-based education system (using a genetic algorithm to find association rules) [9];
prediction of grades in learning management systems (using grammar-guided genetic
programming); and prediction of student performance to provide timely lessons in
web-based e-learning systems (using decision trees) [10].
Regression techniques (model trees, neural networks, linear regression, locally
weighted linear regression and support vector machines) have been used to predict
students' marks in an open university. Student performance has been assessed from
logs and test scores in web-based instruction using a multivariable regression
model. Other reported uses include prediction of student academic performance
(using stepwise linear regression); identification of variables that could predict
success in college courses (using multiple regression) [11]; prediction of
university students' satisfaction (using regression and decision tree analysis)
[12]; and determination of when a student will get a question correct, together
with association rules that guide a search process to find transfer models
predicting a student's success (using logistic regression). The probability of a
student giving the correct answer to a problem in an ITS has also been predicted
(using a robust ridge regression algorithm) [13]. Correlation analysis has also
been applied to predict web-student performance in online classes, exam scores in
online tutoring, and the probability of high school students' success at
university [14].
Fuzzy mathematical modeling provides a solution in the area of performance
measurement and evaluation. An effective performance evaluation system can play a
crucial role in an organization's efforts to gain competitive advantage, for
example by motivating peak individual performance and improving the quality of a
student [15]. In a student performance evaluation system based on fuzzy inference,
the FIS tool in MATLAB has been used to build a Mamdani fuzzy inference system.
The integration of such fuzzy knowledge requires a methodology for converting fuzzy
data into crisp data for quantitative analysis [16]. The Fuzzy Analytic Hierarchy
Process (FAHP) is a frequently used multi-criteria decision-making technique for
ranking students and teachers. The quality of a teacher is fuzzy in nature, hence
the FAHP approach can deal better with this situation and decide the ranking of
teachers on the basis of multiple conflicting criteria [17]. A model using a fuzzy
logic approach has been proposed to predict the risk status of students based on a
set of predictive factors. Basic information that correlates with students'
academic achievement, together with other predictive variables, has been modeled,
and the simulated model shows a degree of risk associated with past academic
achievement [18].
Saxena and Singh have presented a simulation of a neuro-fuzzy application for
analyzing students' performance based on their CPA and GPA, as an extension of the
analysis of students' performance using fuzzy systems [19]. Fuzzy Association Rule
Mining (FARM) has shown potential for identifying the hidden relationships that
exist between students' pre-admission profile and academic performance [20]. The
advantage of neural networks is their learning capability, which lets them adapt to
new data. Fuzzy systems, on the other hand, can handle numerical data and
linguistic knowledge simultaneously [21].
2.3. STUDENT MODELING
The objective of student modeling is to develop cognitive models of human
users/students, including their skills and declarative knowledge. Data mining has
been applied to traits such as motivation, satisfaction, learning styles and
affective status, and to learning behavior, in order to automate the construction
of student models. This goal may be achieved via data mining (DM) techniques and
algorithms (mainly Bayesian networks). Several data mining algorithms (Naïve Bayes,
Bayes net, support vector machines, logistic regression and decision trees) have
been compared for detecting student mental models in intelligent tutoring systems
[22]. Unsupervised (clustering) and supervised (classification) machine learning
have been proposed to reduce development costs in building user models and to
facilitate transferability in intelligent learning environments. Bayesian networks
have been used to predict student knowledge, i.e. the probability that a student
has acquired a skill, in cognitive tutors, and students' learning status in
web-based education systems [23].
Cognitive and non-cognitive measures of students, along with background
information, have been used to design predictive models of student performance
with artificial neural networks (ANN). These predictions constitute a true
predictive classification of academic performance, anticipating the actual
academic performance one year in advance [24]. Artificial neural networks and
expert systems have been used to obtain knowledge for the learner model in the
Linear Programming Intelligent Tutoring System (LP-ITS), which determines the
academic performance level of learners in order to offer them linear programming
problems of the proper difficulty level. LP-ITS uses a feed-forward
back-propagation algorithm trained on a group of learners' data to predict their
academic performance [25]. Accurate prediction of student academic performance is
of prime importance for making admission decisions as well as for providing better
educational services. Two models, the hierarchical Adaptive Neuro-Fuzzy Inference
System (ANFIS) and the artificial neural network (ANN), have been proposed to
predict students' academic performance [26]. Student performance has also been
evaluated on the basis of selected attributes from which rules are generated by
association rule mining, with an artificial neural network used to check the
accuracy of the results [27].
Three supervised data mining algorithms (Naïve Bayes, multilayer perceptron and
the J48 decision tree) have been applied to preoperative assessment data to
predict success in a course (either passed or failed), and the learning methods
have been compared on their predictive accuracy, ease of learning and
user-friendly characteristics. The results indicate that the Naïve Bayes classifier
outperforms the decision tree and neural network methods in prediction [28]. Data
mining has also been used to predict the intra-year academic performance of
students using historic data and final grades [29-30].
Educational Data Mining is a promising discipline which has an important impact
on predicting students' academic performance. Student performance has been
evaluated using an association rule mining algorithm [31]. Students' intermediate
mental steps have also been predicted from sequences of actions stored by
problem-solving learning environments. Association rule algorithms have been
applied for personality mining based on web-based education models in order to
deduce learners' personality characteristics [32]. Meaningful characteristics have
been extracted and the model updated to reflect newly gained knowledge.
Self-organizing maps and principal component analysis have been applied for
predictive and compositional modeling of student profiles [33].
2.4. DETECTING UNDESIRABLE STUDENT BEHAVIORS
Detection of undesirable student behavior aims to find students who have some
problem or show unusual behavior such as erroneous actions, low motivation,
playing games, misuse, cheating, dropping out or academic failure. Several soft
computing techniques (predominantly classification and clustering) have been
used to find such students so that they may be appropriately helped. The
classification algorithms used for predicting, understanding and preventing
academic failure include decision trees, neural networks, naïve Bayes,
instance-based learning, logistic regression, support vector machines,
feed-forward neural networks, probabilistic ensemble simplified fuzzy ARTMAP,
Bayesian nets, simple logic classification, instance-based classification,
attribute selected classification, bagging, classification via regression,
Bayesian classifiers, logistic models, rule-based learners, random forest, the
C4.5 decision tree algorithm, the J48 decision tree algorithm, the FarthestFirst
clustering algorithm and an algorithm for the automatic identification of
students' cognitive styles [34-38].
Discriminant analysis, neural networks, random forests and decision trees have
been used for classifying university students into low-risk, medium-risk and
high-risk of failing [39]. Decision tree algorithms help in the early
identification of dropouts and students who need special attention, and allow the
teacher to provide appropriate advising/counseling [40]. In Educational Data
Mining, hidden knowledge that indicates a student's terminal performance can be
retrieved through data mining techniques [41]. Among the clustering methods used
for this purpose, the prominent ones are Kohonen nets to detect students who cheat
in online assessments, and an outlier detection method to detect learners'
irregular learning [42-43].
2.5. GROUPING STUDENTS
Creation of student groups is based on students' customized features and personal
characteristics, and the groups can be used by the instructor/developer for various
purposes such as building a personalized learning system, promoting effective group
learning and providing adaptive content. The data mining (DM) techniques utilized
for this purpose are classification (supervised learning) and clustering
(unsupervised learning). Cluster analysis, or clustering, is the assignment of a
set of observations into subsets (called clusters) based on maximum possible
similarity [44].
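As a rough illustration of clustering by similarity, the sketch below runs a one-dimensional k-means on a handful of marks. The marks, the starting centroids and the function name `kmeans_1d` are hypothetical, and k-means is only one of many clustering algorithms that could be used here.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Group numbers around the nearest centroid and re-estimate the centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for value in values:
            # Assign each observation to the most similar (closest) centroid.
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical exam marks and hypothetical starting centroids.
marks = [35, 40, 42, 78, 82, 90]
centroids, clusters = kmeans_1d(marks, [30.0, 95.0])
```

With these inputs the weaker and stronger students end up in separate clusters; other starting centroids or more clusters would give different groupings.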
Various clustering algorithms have been used to group students. The prominent
ones are: hierarchical agglomerative clustering, K-means and model-based clustering
to find groups of students with similar skill profiles [45]; a clustering algorithm
based on large generalized sequences to find groups of students with similar
learning characteristics [46]; a hierarchical clustering algorithm for user
modeling (learning styles) in intelligent e-learning systems in order to group
students according to their individual learning style preferences [47]; hybrid
clustering and Bayesian networks to group students according to their skills [48];
improved matrix-based clustering for grouping learners by characteristics in
e-learning [49]; a fuzzy clustering algorithm to find groups of learners according
to their personality and learning strategy [50]; the Expectation-Maximization
algorithm to form heterogeneous groups according to student skills; the K-means
clustering algorithm to discover interesting patterns that characterize the work of
stronger and weaker students [51]; multiple correspondence analysis with
cross-validation by correlation analysis to identify learning styles in Index of
Learning Styles (ILS) questionnaires [52]; two-step cluster analysis to classify
how students organize personal information spaces (piling, one-folder,
small-folders and big-folder filing) [53]; hierarchical cluster analysis to
establish the proportion of students who get an exercise wrong or right; and a
genetic clustering algorithm to solve the problem of allocating new [54].
2.6. EXISTING FUZZY APPROACH
Student performance evaluation requires consideration of evidence collected via
various modes of assessment such as practicals, examinations and observations. All
of these involve awarding scores as numerical values, and grades that are often
expressed in linguistic terms such as good, bad, satisfactory or excellent [55].
These linguistic terms carry imprecision that may arise from human interpretation
and from different ways of implementing the evaluation. The use of linguistic terms
in assessing performance has been the main reason for researchers applying fuzzy
techniques to student performance evaluation. It has been argued that one of the
most appropriate ways of handling multiple variables that contain imprecise data is
fuzzy logic reasoning, which reflects the way humans think. The important reasons
for using a fuzzy approach in educational grading systems include the presence of
substantial vagueness in educational systems and the ability of fuzzy theory to
accommodate subjective judgment [56]. Law reinforces the use of fuzzy techniques
for student performance evaluation by giving a list of reasons [57]:
1. Scores/marks given for student performance are not very precise.
2. Examinations consist of vague data.
3. A common method of grading students is the use of linguistic variables.
The fuzzy approach to the evaluation of student performance involves three
important tasks: fuzzification, inference and defuzzification. In general, student
scores or marks (crisp values) have to be transformed into fuzzy input values by
means of suitable membership functions before aggregation. Fuzzy values can also be
obtained directly from domain experts, which avoids the need for fuzzification. The
outputs of fuzzy inference are typically fuzzy values representing a student's
performance. These fuzzy values need to be transformed back into crisp values in
order to produce an output that is easy to understand, e.g. a percentage mark. Four
prominent approaches used for the evaluation of student performance are described
below in detail.
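The fuzzification and defuzzification tasks can be sketched as follows. This is a minimal illustration under stated assumptions: the three linguistic terms, their triangular membership functions and the names `TERMS`, `fuzzify` and `defuzzify` are hypothetical, the inference step is omitted, and a weighted average of the term peaks stands in for a full defuzzification method.

```python
def triangular(x, left, peak, right):
    """Membership of x in a triangular fuzzy set with the given corners."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical linguistic terms on a 0-100 mark scale: (left, peak, right).
TERMS = {
    "poor": (-1, 0, 50),
    "average": (0, 50, 100),
    "good": (50, 100, 101),
}


def fuzzify(mark):
    """Crisp mark -> membership value in each linguistic term."""
    return {name: triangular(mark, *corners) for name, corners in TERMS.items()}


def defuzzify(memberships):
    """Fuzzy values -> crisp mark, as a membership-weighted average of peaks."""
    total = sum(memberships.values())
    return sum(mu * TERMS[name][1] for name, mu in memberships.items()) / total


fuzzy_mark = fuzzify(75)            # 75 is partly "average" and partly "good"
crisp_mark = defuzzify(fuzzy_mark)  # back to a single percentage mark
```

In a complete system the membership values would pass through an inference step (for example a rule base) before defuzzification; here they are passed through unchanged to keep the sketch short.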
2.6.1. Biswas's Approach
Biswas's fuzzy technique for evaluating students' answerscripts [56] employs the
idea of fuzzy similarity, which is defined as follows.
For two discrete fuzzy sets Q and M, their similarity is:
$$S(Q, M) = \frac{\sum_{i} \mu_Q(x_i)\, \mu_M(x_i)}{\max\left(\sum_{i} \mu_Q(x_i)^2,\ \sum_{i} \mu_M(x_i)^2\right)} \tag{2.1}$$
where i = 1, 2, … indexes the domain elements x_i, and μ_Q and μ_M are the
membership functions of Q and M. Obviously S(Q, M) ∈ [0, 1]. Also, the
larger the value of S(Q, M), the greater the similarity between fuzzy sets Q and M.
In their work, the above measure is used to compare the similarity of a student’s
performance, expressed in fuzzy values, with Standard Fuzzy Sets (SFS), which
are predefined with membership values corresponding to different levels of
student performance. The SFS are devised by experts according to the standard
fixed by educational authority. SFS refer to levels of student performance such as:
Excellent (A), Very Good (B), Good (C), Satisfactory (D) and Unsatisfactory (F)
(Table 2.1). Initially, the evaluator awards fuzzy marks for each question
(Qi) in a fuzzy grade sheet containing rows for questions and columns for
awarding marks. A matching operation is then performed according to definition
(2.1) between each question (Qi) and each level of performance A, B, C, D and F, to
obtain the similarity values S(Qi,A), S(Qi,B), S(Qi,C), S(Qi,D) and S(Qi,F). The
grade for each question is the level of performance with the maximum similarity
value. The total score is computed from the marks allocated to each question and
the mid-grade point of the grade awarded to it (Table 2.2):
$$TS = \frac{1}{100}\left[\sum_{i} T(Q_i)\, P(g_i)\right] \tag{2.2}$$
where T(Qi) are the marks allocated to each question and P(gi) are the mid-grade
points of the grades awarded. The total score (TS) is a crisp value in [0, 100].
The final grade is then determined from crisp interval values referring to the
level of performance.
Table 2.1: Standard Fuzzy Sets to Represent Student Performance
S.No.  Linguistic Terms  Fuzzy Sets
1.     Excellent         {0/0, 0/20, 0.8/40, 0.9/60, 1/80, 1/100}
2.     Very Good         {0/0, 0/20, 0.8/40, 0.9/60, 0.9/80, 0.8/100}
3.     Good              {0/0, 0/20, 0.8/40, 0.9/60, 0.9/80, 0.8/100}
4.     Satisfactory      {0.4/0, 0.4/20, 0.9/40, 0.6/60, 0.2/80, 0/100}
5.     Unsatisfactory    {1/0, 1/20, 0.4/40, 0.2/60, 0/80, 0/100}

Table 2.2: Grade and their Corresponding Mid-Grade Points
S.No.  Linguistic Terms  Grade (range)  Mid-Grade Points
1.     Excellent         90-100         95
2.     Very Good         80-90          85
3.     Good              50-70          60
4.     Satisfactory      30-50          40
5.     Unsatisfactory    0-30           15
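The matching and scoring steps can be sketched as below, using the sets of Table 2.1 and the mid-grade points of Table 2.2 as printed. Several points are assumptions rather than part of the source: the similarity is taken as the dot product divided by the larger squared norm, the marks T(Qi) are assumed to sum to 100 so that dividing by 100 keeps TS in [0, 100], the two fuzzy marks are invented, and ties (Very Good and Good are listed with identical sets) are broken by table order.

```python
# Standard Fuzzy Sets over the domain 0, 20, 40, 60, 80, 100 (Table 2.1).
STANDARD_SETS = {
    "Excellent":      [0.0, 0.0, 0.8, 0.9, 1.0, 1.0],
    "Very Good":      [0.0, 0.0, 0.8, 0.9, 0.9, 0.8],
    "Good":           [0.0, 0.0, 0.8, 0.9, 0.9, 0.8],
    "Satisfactory":   [0.4, 0.4, 0.9, 0.6, 0.2, 0.0],
    "Unsatisfactory": [1.0, 1.0, 0.4, 0.2, 0.0, 0.0],
}
# Mid-grade points (Table 2.2).
MID_GRADE_POINTS = {"Excellent": 95, "Very Good": 85, "Good": 60,
                    "Satisfactory": 40, "Unsatisfactory": 15}


def similarity(q, m):
    """Similarity of two discrete fuzzy sets given as membership lists."""
    dot = sum(a * b for a, b in zip(q, m))
    return dot / max(sum(a * a for a in q), sum(b * b for b in m))


def grade_for(fuzzy_mark):
    """Level of performance with the maximum similarity to the fuzzy mark."""
    return max(STANDARD_SETS,
               key=lambda g: similarity(fuzzy_mark, STANDARD_SETS[g]))


def total_score(marks, fuzzy_marks):
    """Marks-weighted sum of mid-grade points, scaled to [0, 100]."""
    return sum(t * MID_GRADE_POINTS[grade_for(q)]
               for t, q in zip(marks, fuzzy_marks)) / 100


# Hypothetical grade sheet: two questions worth 40 and 60 marks.
q1 = [1.0, 0.8, 0.3, 0.0, 0.0, 0.0]
q2 = [0.0, 0.0, 0.8, 0.9, 1.0, 1.0]
ts = total_score([40, 60], [q1, q2])
```

The sketch also makes the role of the mid-grade points visible: once a question is matched to a grade, only that grade's single mid-point enters the total, whatever the original fuzzy mark was.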
Although this technique shows the usefulness of fuzzy membership values for
aggregating marks from different questions, its disadvantages are as follows:
1. The use of fuzzy grade sheet (to obtain fuzzy marks) is very confusing
because the fuzzy marks are not referred to each level of performance.
2. This method may be time consuming to compute the matching operations
between the fuzzy marks and each of the SFS.
3. Method also suffers from the use of mid-grade points in the calculation of the
total score. These values may greatly influence the total score and thus can
create unexpected results.
2.6.2. Chen and Lee's Approach
Chen and Lee's technique aims to resolve the drawbacks of the method outlined above
for the evaluation of student answerscripts [58]. In this approach, the degree of
satisfaction is defined in advance by experts with respect to the levels of
performance. In this way, the maximum degree of satisfaction per level is obtained,
as summarized in Table 2.3, which also shows the eleven levels of student
performance that have been proposed and used. The evaluator awards fuzzy marks in
the fuzzy grade sheet for each question (Qi) according to the level of performance.
From this, the degree of satisfaction for each individual question is calculated as:
$$D(Q) = \frac{\sum_{i} \mu(x_i)\, F(x_i)}{\sum_{i} \mu(x_i)} \tag{2.3}$$
where μ(xi) are the membership values awarded to each level of performance and
F(xi) is the respective maximum degree of satisfaction.
The final step of the method is to calculate the total score TS over the questions
as follows:
$$TS = \sum_{i} T(Q_i)\, D(Q_i) \tag{2.4}$$
where T(Qi) is the marks allocated to each question by the evaluator and D(Qi) is
the computed degree of satisfaction for Qi.
Table 2.3: Degrees of Satisfaction According to Performance Level
S.No.  Satisfaction Levels      Degrees of Satisfaction (%)  Maximum Degree of Satisfaction
1.     Extremely good (EG)      100                          1.00
2.     Very very good (VVG)     91-99                        0.99
3.     Very good (VG)           81-90                        0.90
4.     Good (G)                 71-80                        0.80
5.     More or less good (MG)   61-70                        0.70
6.     Fair (F)                 51-60                        0.60
7.     More or less bad (MB)    41-50                        0.50
8.     Bad (B)                  25-40                        0.40
9.     Very bad (VB)            10-24                        0.24
10.    Very very bad (VVB)      01-09                        0.09
11.    Extremely bad (EB)       0                            0.00
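A minimal sketch of this calculation, using the maximum degrees of satisfaction of Table 2.3, is given below. The degree of satisfaction is taken as the membership-weighted average of the maximum degrees, the question marks are assumed to sum to 100, and the two fuzzy marks and the function names are hypothetical.

```python
# Maximum degree of satisfaction per level (Table 2.3).
MAX_SATISFACTION = {
    "EG": 1.00, "VVG": 0.99, "VG": 0.90, "G": 0.80, "MG": 0.70, "F": 0.60,
    "MB": 0.50, "B": 0.40, "VB": 0.24, "VVB": 0.09, "EB": 0.00,
}


def degree_of_satisfaction(fuzzy_mark):
    """Membership-weighted average of the maximum degrees of satisfaction."""
    weighted = sum(mu * MAX_SATISFACTION[level]
                   for level, mu in fuzzy_mark.items())
    return weighted / sum(fuzzy_mark.values())


def total_score(marks, fuzzy_marks):
    """Sum of the allocated marks scaled by each question's satisfaction."""
    return sum(t * degree_of_satisfaction(q)
               for t, q in zip(marks, fuzzy_marks))


# Hypothetical grade sheet: levels left out of a dict have membership 0.
q1 = {"VG": 0.9, "G": 0.8}
q2 = {"EG": 1.0}
ts = total_score([40, 60], [q1, q2])
```

Here the first question earns about 85% of its 40 marks and the second all of its 60 marks.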
From TS a grade is awarded based on the predefined satisfaction levels. This
technique also has several disadvantages, as given below:
1. The use of the maximum degree of satisfaction is confusing, and the results of
the aggregation are biased towards the number of satisfaction levels created.
2. The lower the satisfaction level, the greater the difference between the
original score and the new score.
3. The use of an extended fuzzy grade sheet to award fuzzy marks may not be
practical when the problem scales up, as it involves awarding too many fuzzy
values to evaluate each question.
4. This problem becomes worse as the number of questions or modes of assessment
increases.
2.6.3. Law's Approach
Law proposed an alternative approach to student performance evaluation based on
the notion of fuzzy expected values [57]. The fuzzy expected value of a fuzzy set
A is defined as:
$$E(A) = \frac{\int x\, \mu_A(x)\, f(x)\, dx}{\int \mu_A(x)\, f(x)\, dx} \tag{2.5}$$
with μ_A(x) being the membership function of x in A and f(x) being the distribution
function of x in A. Contrary to the other methods, in Law's approach the original
student scores have to be represented as crisp values. Fuzzification is used to
transform such scores into fuzzy values. The fuzzy partitions underlying the
fuzzification are defined in advance by experts based on an expectation of the
percentage of students who will receive a certain level of performance (one of the
following five grades: A, B, C, D and F). A fuzzy assessment matrix, M, is created
using the fuzzified values as given in equation (2.6):
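$$M = \begin{bmatrix}
\mu_A(Q_1) & \mu_B(Q_1) & \mu_C(Q_1) & \mu_D(Q_1) & \mu_F(Q_1) \\
\mu_A(Q_2) & \mu_B(Q_2) & \mu_C(Q_2) & \mu_D(Q_2) & \mu_F(Q_2) \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
\mu_A(Q_n) & \mu_B(Q_n) & \mu_C(Q_n) & \mu_D(Q_n) & \mu_F(Q_n)
\end{bmatrix} \tag{2.6}$$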
The matrix is employed in conjunction with the fuzzy expected values for each
level of performance to compute an intermediate new score vector (one new score
per question):
$$NS = M * [E(A), E(B), E(C), E(D), E(F)]^{T} \tag{2.7}$$
where the expected values for each level of performance E(A), E(B), E(C), E(D),
and E(F) are calculated using equation (2.5) and the same fuzzy partitions
mentioned above. This new vector is then used to calculate the core of the total
score (CTS),
$$CTS = \sum_{j} D(Q_j) * NS_j \tag{2.8}$$
where D(Qj) are the full percentage marks allocated for each question and NS_j is
the new score of question Qj. Since CTS ∈ (0, 1), the final total score TS is set
to CTS × 100 to obtain a readily understandable mark of student performance. The
approach demonstrates the advantage of using fuzzy expected values in student
performance evaluation. Disadvantages of this technique are given below:
1. Although it may be useful to obtain evaluation results according to expert
expectation, the resulting new total score and grade may not reflect the actual
performance of the student on the subject matter. This is due to the initial
fuzzy partitions, which may not be specified with regard to the students'
performance.
2. The method works with respect to a single evaluation criterion, so it cannot
assess a student's performance based on multiple criteria. In addition, the
method involves extensive computation, which limits its practicality.
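The chain from fuzzy expected values to the total score can be sketched numerically as follows. Everything specific here is an assumption made for illustration and is not taken from Law's paper: scores are scaled to [0, 1], the five grades use hypothetical triangular partitions, the distribution f(x) is uniform (so it cancels in the ratio), the integral is replaced by a sum over a grid, each row of the assessment matrix is normalized to sum to 1, and the question weights sum to 1.

```python
# Hypothetical triangular fuzzy partitions for the five grades on [0, 1].
GRADE_PEAKS = {"A": 0.9, "B": 0.7, "C": 0.5, "D": 0.3, "F": 0.1}
HALF_WIDTH = 0.2
GRID = [i / 1000 for i in range(1001)]  # discretized score axis


def membership(grade, x):
    """Triangular membership of score x in the given grade."""
    return max(0.0, 1.0 - abs(x - GRADE_PEAKS[grade]) / HALF_WIDTH)


def expected_value(grade):
    """Discrete fuzzy expected value with a uniform f(x)."""
    numerator = sum(x * membership(grade, x) for x in GRID)
    denominator = sum(membership(grade, x) for x in GRID)
    return numerator / denominator


def assessment_row(score):
    """One row of the fuzzy assessment matrix M, normalized to sum to 1."""
    row = [membership(grade, score) for grade in GRADE_PEAKS]
    total = sum(row)
    return [value / total for value in row]


def total_score(scores, weights):
    """New score per question, weighted sum (CTS), then TS = CTS * 100."""
    expected = [expected_value(grade) for grade in GRADE_PEAKS]
    new_scores = [sum(m * e for m, e in zip(assessment_row(s), expected))
                  for s in scores]
    cts = sum(w * ns for w, ns in zip(weights, new_scores))
    return 100 * cts


# Two hypothetical questions carrying 40% and 60% of the marks.
ts = total_score([0.5, 0.5], [0.4, 0.6])
```

Because the new scores depend on the expected values of the expert-defined partitions, changing `GRADE_PEAKS` changes the total score even when the student's crisp scores stay the same, which is the first disadvantage listed above.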
2.6.4. Rasmani and Shen's Method
Rasmani and Shen proposed a special fuzzy inference technique and the use of a
data-driven fuzzy rule identification method that allows the addition of expert
knowledge [55], with the aim of obtaining user-comprehensible knowledge from
historical data so that the evaluation can be justified. The suggested inference
technique, called weighted fuzzy subsethood-based reasoning, was developed for
multiple input and single output (MISO) fuzzy systems that apply rules of the form:
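$$\begin{aligned}
&\text{IF } A_1 \text{ is } [w(E_i, A_{11}) \cdot A_{11} \text{ OR } w(E_i, A_{12}) \cdot A_{12} \text{ OR } \dots \text{ OR } w(E_i, A_{1j}) \cdot A_{1j} \text{ OR } \dots \text{ OR } w(E_i, A_{1n_1}) \cdot A_{1n_1}] \text{ AND} \\
&A_2 \text{ is } [w(E_i, A_{21}) \cdot A_{21} \text{ OR } w(E_i, A_{22}) \cdot A_{22} \text{ OR } \dots \text{ OR } w(E_i, A_{2j}) \cdot A_{2j} \text{ OR } \dots \text{ OR } w(E_i, A_{2n_2}) \cdot A_{2n_2}] \text{ AND } \dots \text{ AND} \\
&A_k \text{ is } [w(E_i, A_{k1}) \cdot A_{k1} \text{ OR } w(E_i, A_{k2}) \cdot A_{k2} \text{ OR } \dots \text{ OR } w(E_i, A_{kj}) \cdot A_{kj} \text{ OR } \dots \text{ OR } w(E_i, A_{kn_k}) \cdot A_{kn_k}] \text{ AND } \dots \text{ AND} \\
&A_m \text{ is } [w(E_i, A_{m1}) \cdot A_{m1} \text{ OR } w(E_i, A_{m2}) \cdot A_{m2} \text{ OR } \dots \text{ OR } w(E_i, A_{mj}) \cdot A_{mj} \text{ OR } \dots \text{ OR } w(E_i, A_{mn_m}) \cdot A_{mn_m}] \text{ THEN } B \text{ is } E_i
\end{aligned}$$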
where m is the number of antecedent dimensions, $A_k$, $k \in [1, m]$, are the
antecedent linguistic variables, $n_k$ is the number of linguistic terms in the kth
antecedent dimension, B is the consequent linguistic variable, $E_i$, $i \in [1, N]$,
is the ith consequent linguistic term, N is the number of consequent linguistic
terms, and $w(E_i, A_{kj})$ is the relative weight of the antecedent linguistic term
$A_{kj}$. The weight expresses the influence of the set $A_{kj}$ on the conclusion
drawn. The weight is determined by normalizing the fuzzy subsethood value of the
set:
$$w(E_i, A_{kj}) = \frac{S(E_i, A_{kj})}{\max_{j = 1, \dots, n_k} S(E_i, A_{kj})} \tag{2.9}$$
The fuzzy subsethood value S represents in this case the degree to which the
fuzzy set $A_{kj}$ is a subset of the fuzzy set $E_i$. It is calculated as:
$$S(E_i, A_{kj}) = \frac{\sum_{x \in U} \nabla\left(\mu_{E_i}(x), \mu_{A_{kj}}(x)\right)}{\sum_{x \in U} \mu_{E_i}(x)} \tag{2.10}$$
where U is the universe of discourse, μ is the membership function, and ∇ is an
arbitrary t-norm.
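The subsethood and weight calculations can be sketched on discrete fuzzy sets as follows. The choices made here are assumptions for illustration: the minimum is used as the t-norm, the sum in the denominator runs over the memberships of the consequent term, and the three-point universe, the membership values and the function names are hypothetical.

```python
def subsethood(consequent, antecedent):
    """Fuzzy subsethood value with the minimum as t-norm (one possible choice)."""
    overlap = sum(min(e, a) for e, a in zip(consequent, antecedent))
    return overlap / sum(consequent)


def weights(consequent, antecedent_terms):
    """Subsethood values of one antecedent dimension, normalized by their maximum."""
    values = {name: subsethood(consequent, term)
              for name, term in antecedent_terms.items()}
    largest = max(values.values())
    return {name: value / largest for name, value in values.items()}


# Hypothetical memberships over a three-point universe of discourse.
result_is_high = [0.0, 0.5, 1.0]   # consequent linguistic term
exam_terms = {                      # terms of one antecedent dimension
    "low": [1.0, 0.5, 0.0],
    "high": [0.0, 0.5, 1.0],
}
w = weights(result_is_high, exam_terms)
```

After normalization the most relevant term of each antecedent dimension always receives weight 1, and the remaining terms are scaled relative to it.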
The rule base contains only one rule for each consequent linguistic term. The first
step of the fuzzy inference is the calculation of the overall weight of each rule by
applying the arbitrary disjunction and conjunction operators to the antecedent side.
Next, the rule with the highest weight is selected, and its consequent gives the
final score of the student. The rule base is identified in the following steps:
1. Create the input and output partitions.
2. Divide the training dataset into subgroups based on the output linguistic terms.
3. Calculate fuzzy subsethood values for each subgroup.
4. Calculate the weight of each linguistic term.
5. Create rules of the form given above.
6. Test the rule base using a test dataset.
The main advantage of this method is that it requires a rule base with a low
number of rules, equal to the number of output linguistic terms. Besides, it allows
questions/tests to be evaluated with fuzzy numbers. This technique also suffers
from a few disadvantages:
1. It is not clear how the antecedent and consequent are determined, or what the
fuzzy subsethood values mean in the evaluation of a student's academic
performance.
2. The number of fuzzy 'IF-THEN' rules is at its maximum, which makes the
computation more complex.
3. Some rules are not used by the inference mechanism.
2.7. SOFT COMPUTING
Soft computing is a collection of methodologies that aim to exploit the tolerance
for imprecision and uncertainty to achieve tractability, robustness and low
solution cost [59]. The term soft computing was proposed by the inventor of fuzzy
logic, Lotfi A. Zadeh. Its principal constituents are fuzzy logic, neurocomputing
and probabilistic reasoning. Soft computing is likely to play an increasingly
important role in many application areas, including software engineering. The role
model for soft computing is the human mind.
Soft computing, not precisely defined, consists of distinct concepts and techniques
which aim to overcome the difficulties encountered in real world problems. These
problems result from the fact that our world seems to be imprecise, uncertain and
difficult to categorize. For example, the uncertainty in a measured quantity is due
to inherent variations in the measurement process itself. The uncertainty in a result
is due to the combined and accumulated effects of these measurement
uncertainties which were used in the calculation of that result [60].
In many cases an increase in precision and certainty can be achieved only at
considerable effort and cost. Zadeh gives as an example the travelling salesman
problem, in which the computation time is a function of accuracy and increases
exponentially [59].
Another possible definition of soft computing is to consider it as the antithesis
of the concept of the computer we now have, which can be described with
adjectives such as hard, crisp, rigid, inflexible and stupid. Along this track, one
may see soft computing as an attempt to mimic natural creatures (plants, animals,
human beings), which are soft, flexible, adaptive and clever. Thus soft computing
is the name of a family of problem-solving methods that are analogous to
biological reasoning and problem solving (sometimes referred to as cognitive
computing). The basic methods included in cognitive computing are fuzzy logic
(FL), neural networks (NN) and genetic algorithms (GA), methods which do not
derive from classical theories.
Soft computing can also be seen as a foundation for the growing field of
computational intelligence (CI). The difference between traditional artificial
intelligence (AI) and computational intelligence is that AI is based on hard
computing whereas CI is based on soft computing. Soft computing is not just a
mixture of these ingredients, but a discipline in which each constituent contributes
a distinct methodology for addressing problems in its domain, in a complementary
rather than competitive way [59].
Soft computing methods have been applied to many real-world problems.
Applications can be found in signal processing, pattern recognition, quality
assurance and industrial inspection, business forecasting, speech processing, credit
rating, adaptive process control, robotics control, natural language understanding,
etc. Possible new application areas may include programming languages,
user-friendly application interfaces, automated programming, computer networks,
database management, fault diagnostics and information security [59].
Fuzzy logic is mainly associated with imprecision, approximate reasoning and
computing with words; neurocomputing with learning and curve fitting
(classification); and probabilistic reasoning with uncertainty and belief
propagation (belief networks). These methods share several properties: they are
nonlinear and can deal with nonlinearities, follow more human-like reasoning
paths, utilize self-learning, utilize yet-to-be-proven theorems, and are robust in
the presence of noise or errors.
Fuzzy logic systems and neural networks share several similarities [61]. Both
estimate functions from sample data and are dynamic systems which can be
expressed as a graph made up of nodes and edges. Both convert numerical inputs
to numerical outputs, process inexact information inexactly, share the same state
space, and produce bounded signals; a set of n neurons defines an n-dimensional
fuzzy set. Both learn an unknown probability function p(x), act as associative
memories, and can model any system provided the number of nodes is sufficient.
The main dissimilarity between a fuzzy logic system (FLS) and a neural network is
that an FLS uses heuristic knowledge to form rules and tunes these rules by using
sample data, whereas a neural network forms “rules” based entirely on data.
In many cases, better results have been achieved by combining different soft
computing methods; such hybrid systems are growing rapidly in number. A very
interesting combination is the neuro-fuzzy architecture, in which the good
properties of both methods are brought together. Most neuro-fuzzy systems are
fuzzy rule-based systems in which neural network techniques are used for rule
induction and calibration. Fuzzy logic may also be employed to improve the
performance of optimization methods used with neural networks. For example, it
may control the vibration of the search direction vector in a quasi-Newton
method [62].
2.7.1 Fuzzy Logic
Fuzzy set theory provides a mathematical tool for dealing with the concepts used
in natural language (linguistic variables) [63]. Fuzzy logic is basically a
multivalued logic that allows intermediate values to be defined between
conventional evaluations. The roots of fuzzy logic are ancient. To devise a
concise theory of logic, and later mathematics, Aristotle proposed the so-called
“Laws of Thought”. One of these, the “Law of the Excluded Middle”, states that
every proposition must be either True (T) or False (F). Even when Parmenides
proposed the first version of this law (ca. 400 BC) there were strong and immediate
objections: for example, Heraclitus proposed that things could be simultaneously
true and not true. Plato laid the foundation for fuzzy logic by indicating that
there is a third region (beyond T and F) where these opposites tumble about. A
systematic alternative to the bi-valued logic of Aristotle was first proposed by
Łukasiewicz around 1920, when he described a three-valued logic along with the
mathematics to accompany it. The third value he proposed can be translated as
“possible”, and he assigned it a numeric value between T and F.
Later, Łukasiewicz explored four-valued and five-valued logics and concluded that
in principle there is nothing to prevent the derivation of an infinite-valued
logic. He felt that three-valued and infinite-valued logics were the most
intriguing, but ultimately settled on a four-valued logic because it seemed to be
the most easily adaptable to Aristotelian logic. Knuth also proposed a
three-valued logic similar to Łukasiewicz’s, speculating that mathematics would
become even more elegant than in traditional bi-valued logic. The notion of an
infinite-valued logic is also evident in Zadeh’s seminal work “Fuzzy Sets”, where
the mathematics of fuzzy set theory and, by extension, fuzzy logic is described.
This theory proposed making the membership function (or the values F and T)
operate over the range of real numbers [0, 1].
New operations for the calculus of logic were proposed and shown to be, in
principle at least, a generalization of classic logic. Fuzzy logic provides an
inference morphology that enables approximate human reasoning capabilities to be
applied to knowledge-based systems. The theory of fuzzy logic provides a
mathematical strength to capture the uncertainties associated with human
cognitive processes such as thinking and reasoning. The conventional approaches
to knowledge representation lack the means for representing the meaning of fuzzy
concepts. As a consequence, the approaches based on first-order logic and
classical probability theory do not provide an appropriate conceptual framework
for dealing with the
representation of commonsense knowledge. Such knowledge is by its nature both
lexically imprecise and noncategorical.
The development of fuzzy logic was motivated in large measure by the need for a
conceptual framework which can address the issues of uncertainty and lexical
imprecision. The essential characteristics of fuzzy logic are the following [59]:
1. Exact reasoning is viewed as a limiting case of approximate reasoning.
2. Everything is a matter of degree.
3. Knowledge is interpreted as a collection of elastic or, equivalently, fuzzy
constraints on a collection of variables.
4. Inference is viewed as a process of propagation of elastic constraints.
Two main characteristics of fuzzy systems give them better performance in
specific applications. Fuzzy systems are suitable for uncertain or approximate
reasoning, especially for systems whose mathematical model is difficult to
derive. Fuzzy logic also allows decision making with estimated values under
incomplete or uncertain information. In 1972 Zadeh’s colleague Kalman (the
inventor of the Kalman filter) commented on fuzzy logic as follows:
“Zadeh’s proposal could be severely, ferociously, even brutally criticized from a
technical point of view. This would be out of place here. But a blunt question
remains: Is Zadeh presenting important ideas or is he indulging in wishful
thinking?” [64]. The heaviest criticism has come from probability theoreticians,
which is why many fuzzy logic authors have included a comparison between
probability and fuzzy logic in their publications. Fuzzy researchers try to
separate fuzzy logic from probability theory, whereas some probability
theoreticians consider fuzzy logic a probability in disguise [63-64].
Claim: Probability theory is the only correct way of dealing with uncertainty, and
anything that can be done with fuzzy logic can be done equally well through the
use of probability-based methods. Therefore, fuzzy sets are unnecessary for
representing and reasoning about uncertainty and vagueness; probability theory is
all that is required. “Close examination shows that the fuzzy approaches have
exactly the same representation as the corresponding probabilistic approach and
include similar calculi” [65].
Objection: Classical probability theory is not sufficient to express the
uncertainty encountered in expert systems. The main limitation is that it is based
on two-valued logic: an event either occurs or does not occur, and there is
nothing in between. Another limitation is that in reality events are not known
with sufficient precision to be represented as real numbers. For example, consider
a case with the following information: an urn contains 20 balls of various sizes,
several of which are large. One cannot express this within the framework of
classical theory or, if it can be done, it cannot be done simply [65-66].
The term fuzzy logic has two meanings. According to the first interpretation (in
the narrow sense) it is seen as a multi-valued “imprecise” logic and as an
extension of the more traditional multi-valued logic. Bart Kosko explains this
point of view by emphasizing that in reality everything seems to occur or to be
true to a degree. Facts are always fuzzy, vague or inaccurate to some extent. Only
mathematics has black-and-white facts, and it is only a collection of artificial
rules and symbols. Science deals with gray or fuzzy facts as if they were the
black-and-white facts of mathematics. Nobody has presented a fact having to do
with the real world that is 100 per cent true or 100 per cent false. The first
meaning thus deals with a kind of model for human reasoning. The second
interpretation (in the wide sense) is that fuzzy logic = fuzzy set theory.
According to this view any field X can be fuzzified by replacing a crisp set in X
with a fuzzy set [67]. For example, set theory, arithmetic, topology, graph
theory, probability theory and logic can be fuzzified. This has already been done
in neurocomputing, pattern recognition, mathematical programming and stability
theory.
If the conventional techniques of system analysis cannot be successfully applied
to a modeling or control problem, the use of heuristic linguistic rules may be the
most reasonable solution. For example, there is no mathematical model for the
truck-and-trailer reversing problem, in which the truck must be guided from an
arbitrary initial position to a desired final position. Humans and fuzzy systems
can perform this nonlinear control task with relative ease by using practical and
at the same time imprecise rules such as “If the trailer turns slightly left, then
turn the wheel slightly left”. The most significant application of fuzzy logic has
been in the control field; a rough estimate is that 90% of applications are in
control. During the last decade many applications have also been found in the
educational domain.
2.7.1.1. Fuzzy Set
A classical set is a set with a crisp boundary. For example, a classical set A of
real numbers greater than 6 can be expressed as $A = \{x \mid x > 6\}$, where there
is a clear, unambiguous boundary 6 such that if x is greater than this number,
then x belongs to the set A; otherwise x does not belong to the set. Classical
sets do not reflect the nature of human concepts and thoughts, which tend to be
abstract and imprecise [63]. In contrast to a classical set, a fuzzy set is a set
without a crisp boundary (i.e. the transition from “belongs to a set” to “does not
belong to a set” is gradual). This smooth transition is characterized by
membership functions that give fuzzy sets flexibility in modeling commonly used
linguistic expressions, such as “the water is hot” or “the temperature is high”.
As Zadeh pointed out in 1965 in his seminal paper entitled “Fuzzy Sets”, such
imprecisely defined sets or classes play an important role in human thinking
[56]. If X is a collection of objects denoted by x, then a fuzzy set A in X is
defined as a set of ordered pairs [68]:
$$A = \{(x, \mu_A(x)) \mid x \in X\} \tag{2.11}$$
where $\mu_A(x)$ is called the membership function for the fuzzy set A. The
membership function maps each element of X to a membership grade (or membership
value) between 0 and 1. Usually X is referred to as the universe of discourse, or
simply the universe, which may consist of discrete objects or continuous space
[68].
2.7.1.2. Membership Formulation
A fuzzy set is completely characterized by its membership function (MF). This
section defines several classes of parameterized MFs of one dimension (i.e. MFs
with a single input). Generally, triangular, trapezoidal and Gaussian membership
functions are used for converting a crisp set into a fuzzy set; they are defined
as follows [68]:
Triangular Membership Function: A triangular MF is specified by three
parameters {a, b, c} as follows [68]:
$$\mathrm{Triangle}(x; a, b, c) = \max\left(\min\left(\frac{x-a}{b-a}, \frac{c-x}{c-b}\right), 0\right) \tag{2.12}$$
The parameters {a, b, c} (with a < b < c) determine the x coordinates of the three
corners of the underlying triangular MF.
Fig. 2.1: Triangular Membership Function
Trapezoidal Membership Function: A trapezoidal MF is specified by four
parameters {a, b, c, d} as follows [68]:
$$\mathrm{Trapezoidal}(x; a, b, c, d) = \max\left(\min\left(\frac{x-a}{b-a}, 1, \frac{d-x}{d-c}\right), 0\right) \tag{2.13}$$
The parameters {a, b, c, d} (with $a < b \leq c < d$) determine the x coordinates of
the four corners of the underlying trapezoidal MF.
Fig. 2.2: Trapezoidal Membership Function
Due to simple formulas and computational efficiency both triangular and
trapezoidal MFs have been used extensively, especially in real-time
implementations.
Gaussian Membership Function: A Gaussian MF is specified by two parameters
$\{c, \sigma\}$:
$$\mathrm{Gaussian}(x; c, \sigma) = e^{-\frac{1}{2}\left(\frac{x-c}{\sigma}\right)^{2}} \tag{2.14}$$
A Gaussian MF is determined completely by c and $\sigma$; c represents the MF center
and $\sigma$ determines the MF width.
Fig. 2.3: Gaussian Membership Function
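As an illustration only, the three membership functions can be sketched in Python. The sketch assumes the forms given in [68]: the triangular MF as max(min((x-a)/(b-a), (c-x)/(c-b)), 0), the trapezoidal MF with an additional plateau term 1 inside the min, and the Gaussian MF as exp(-((x-c)/σ)²/2). The function names and the 0-100 marks scale used in the usage lines are hypothetical and do not come from the source.

```python
import math


def triangle(x, a, b, c):
    """Triangular MF, Eq. (2.12); assumes a < b < c."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)


def trapezoid(x, a, b, c, d):
    """Trapezoidal MF, Eq. (2.13); assumes a < b <= c < d."""
    return max(min((x - a) / (b - a), 1.0, (d - x) / (d - c)), 0.0)


def gaussian(x, c, sigma):
    """Gaussian MF, Eq. (2.14); c is the center, sigma controls the width."""
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)


# Hypothetical fuzzy set "average marks" on a 0-100 scale (illustration only).
for mark in (30, 50, 60, 80):
    print(
        mark,
        triangle(mark, 40, 60, 80),
        trapezoid(mark, 40, 50, 70, 80),
        round(gaussian(mark, 60, 10), 4),
    )
```

The triangular and trapezoidal forms return exactly 0 outside their support, whereas the Gaussian form never reaches 0, which is one reason the first two are cheaper in real-time settings.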
2.7.1.3. Fuzzy Relation
A fuzzy relation is a fuzzy set in $X \times Y$ which maps each element in
$X \times Y$ to a membership grade between 0 and 1. Applications of fuzzy relations
include areas such as fuzzy control and decision making. Let X and Y be two
universes of discourse. Then a fuzzy relation in $X \times Y$ is
$$R = \{((x, y), \mu_R(x, y)) \mid (x, y) \in X \times Y\} \tag{2.15}$$
2.7.1.4. Max-Min Composition
Assume $R_1$ and $R_2$ are two fuzzy relations defined on $X \times Y$ and
$Y \times Z$, respectively. The max-min composition of $R_1$ and $R_2$ is a fuzzy
set defined by
$$R_1 \circ R_2 = \{[(x, z), \max_{y} \min(\mu_{R_1}(x, y), \mu_{R_2}(y, z))] \mid x \in X, y \in Y, z \in Z\} \tag{2.16}$$
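For finite universes, the max-min composition resembles a matrix product in which multiplication is replaced by min and addition by max. The following Python sketch illustrates this under the assumption that each relation is stored as a nested list of membership grades; the function name and the numeric values are made up for the example.

```python
def max_min_composition(r1, r2):
    """Max-min composition of two fuzzy relations on finite universes.

    r1 is an |X| x |Y| nested list, r2 is a |Y| x |Z| nested list;
    entry [i][j] holds a membership grade in [0, 1].
    """
    n_y = len(r2)
    n_z = len(r2[0])
    if any(len(row) != n_y for row in r1):
        raise ValueError("inner dimensions of r1 and r2 must agree")
    return [
        [max(min(row[j], r2[j][k]) for j in range(n_y)) for k in range(n_z)]
        for row in r1
    ]


# Illustrative grades only (not taken from the source).
R1 = [[0.1, 0.3, 0.5],
      [0.4, 0.2, 0.8]]
R2 = [[0.9, 0.1],
      [0.2, 0.3],
      [0.5, 0.6]]
print(max_min_composition(R1, R2))  # [[0.5, 0.5], [0.5, 0.6]]
```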
2.7.1.5. Fuzzy IF-THEN Rules
A linguistic variable is characterized by a quintuple (x, T(x), X, G, M) in which
x is the name of the variable; T(x) is the term set of x, that is, the set of its
linguistic values; X is the universe of discourse; G is a syntactic rule which
generates the terms in T(x); and M is a semantic rule which associates with each
linguistic value A its meaning M(A), where M(A) denotes a fuzzy set in X [68]. A
fuzzy if-then rule assumes the form: if x is A then y is B, where A and B are
linguistic values defined by fuzzy sets on universes of discourse X and Y,
respectively. Often “x is A” is called the antecedent or premise, while “y is B”
is called the consequence or conclusion. Examples of fuzzy if-then rules are
widespread in our daily linguistic expressions, e.g. if pressure is high then
volume is small [68].
Before fuzzy if-then rules can be employed to model and analyze a system, the
meaning of the expression “if x is A then y is B”, sometimes abbreviated as
$A \rightarrow B$, must be formalized. In essence, the expression describes a
relation between two variables x and y; this suggests that a fuzzy if-then rule
may be defined as a fuzzy relation R on the product space $X \times Y$. Generally
speaking, there are two ways to interpret the fuzzy rule $A \rightarrow B$. If we
interpret $A \rightarrow B$ as A coupled with B, then
$$R = A \rightarrow B = A \times B = \int_{X \times Y} \mu_A(x) * \mu_B(y) / (x, y) \tag{2.17}$$
where $*$ is a T-norm operator and $A \rightarrow B$ is used again to represent the
fuzzy relation R. On the other hand, if $A \rightarrow B$ is interpreted as A
entails B, then it can be written as [68]:
$$R = A \rightarrow B = \neg A \cup B \tag{2.18}$$
2.7.1.6. Fuzzy Reasoning
Fuzzy reasoning, also known as approximate reasoning, is an inference procedure
that derives conclusions from a set of fuzzy if-then rules and facts. Let $A$,
$A'$ and $B$ be fuzzy sets of X, X and Y, respectively. Assume that the fuzzy
implication $A \rightarrow B$ is expressed as a fuzzy relation R on $X \times Y$.
Then the fuzzy set $B'$ induced by “x is $A'$” and the fuzzy rule “if x is A then
y is B” is defined by [68]:
$$\mu_{B'}(y) = \max_{x} \min[\mu_{A'}(x), \mu_R(x, y)] \tag{2.19}$$
The inference procedure of fuzzy reasoning can be used to derive conclusions,
provided that the fuzzy implication $A \rightarrow B$ is defined as an appropriate
fuzzy relation [68].
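A minimal Python sketch of this inference step on finite universes follows. It assumes the "A coupled with B" interpretation with min as the T-norm, so that the relation is R(x, y) = min(A(x), B(y)), and then takes the max over x of min(A'(x), R(x, y)). The helper names and the membership grades are hypothetical.

```python
def relation_min(mu_a, mu_b):
    """Relation for 'A coupled with B' with min as the T-norm (an assumption)."""
    return [[min(a, b) for b in mu_b] for a in mu_a]


def infer(mu_a_prime, relation):
    """Induced set B': for each y, max over x of min(A'(x), R(x, y))."""
    n_y = len(relation[0])
    return [
        max(min(ap, row[j]) for ap, row in zip(mu_a_prime, relation))
        for j in range(n_y)
    ]


# Illustrative membership grades on three-point universes X and Y.
A = [0.0, 0.5, 1.0]        # antecedent fuzzy set A on X
B = [1.0, 0.6, 0.2]        # consequent fuzzy set B on Y
A_prime = [0.2, 0.8, 0.4]  # observed fact A' on X

R = relation_min(A, B)
print(infer(A_prime, R))  # [0.5, 0.5, 0.2]
```

With these choices the result equals B clipped at the degree of match between A' and A (here 0.5), which is the behaviour usually associated with Mamdani-style reasoning.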
When multiple rules with multiple antecedents are used, the interpretation of
multiple rules is usually taken as the union of the fuzzy relation corresponding to
the fuzzy rules. The multiple rules with multiple antecedents can be written as
[68]:
Premise 1 (fact): x is $A'$ and y is $B'$
Premise 2 (rule 1): if x is $A_1$ and y is $B_1$ then z is $C_1$
Premise 3 (rule 2): if x is $A_2$ and y is $B_2$ then z is $C_2$
Consequence (conclusion): z is $C'$
Fuzzy reasoning can be employed as an inference procedure to derive the resulting
output fuzzy set $C'$ (Fig. 2.4). Since the max-min composition operator $\circ$ is
distributive over the $\cup$ operator, it follows that:
$$C' = (A' \times B') \circ (R_1 \cup R_2) = [(A' \times B') \circ R_1] \cup [(A' \times B') \circ R_2] = C'_1 \cup C'_2 \tag{2.20}$$
where $C'_1$ and $C'_2$ are the inferred fuzzy sets for rule 1 and rule 2,
respectively. Fuzzy if-then rules and fuzzy reasoning are the backbone of fuzzy
inference systems, which are the most important modeling tool based on fuzzy set
theory.
Fig. 2.4: Fuzzy Reasoning for Multiple Rules with Multiple Antecedents
2.7.1.7. Defuzzification
The conversion of fuzzy output to crisp output is known as defuzzification. Six
methods of defuzzification are given below:
1. Max Membership Principle: This method, also known as the height method, is
given by the algebraic expression:
$$\mu_C(z^*) \geq \mu_C(z) \quad \text{for all } z \in Z \tag{2.21}$$
where $z^*$ is the defuzzified value, as shown in Fig. 2.5.
Fig. 2.5: Max Membership Defuzzification Method
2. Centroid Method: This procedure (also called center of area or center of
gravity) is the most prevalent and physically appealing of all the
defuzzification methods [68]. It is given by the algebraic expression:
$$z^* = \frac{\int \mu_C(z) \cdot z \, dz}{\int \mu_C(z) \, dz} \tag{2.22}$$
where $\int$ denotes an algebraic integration, as shown in Fig. 2.6.
Fig. 2.6: Centroid Defuzzification Method
3. Weighted Average Method: The weighted average method is the most
frequently used in fuzzy applications because it is one of the more
computationally efficient methods. Unfortunately, it is usually restricted to
symmetrical output membership functions. It is given by the algebraic
expression:
$$z^* = \frac{\sum \mu_C(\bar{z}) \cdot \bar{z}}{\sum \mu_C(\bar{z})} \tag{2.23}$$
where $\sum$ denotes the algebraic sum and $\bar{z}$ is the centroid of each
symmetric membership function (Fig. 2.7).
Fig. 2.7: Weighted Average Method of Defuzzification
4. Center of Sums: This is faster than many defuzzification methods and is not
restricted to symmetric membership functions. This process involves the
algebraic sum of the individual output fuzzy sets, say $C_1$ and $C_2$, instead of
their union. Two drawbacks of this method are that intersecting areas are added
twice and that it involves finding the centroids of the individual membership
functions. The defuzzified value $z^*$ is given by:
$$z^* = \frac{\sum_{k=1}^{n} \mu_{C_k}(z) \int_z \bar{z} \, dz}{\sum_{k=1}^{n} \mu_{C_k}(z) \int_z dz} \tag{2.24}$$
where the sums run over the n individual output fuzzy sets $C_k$ and $\bar{z}$ is
the distance to the centroid of each of the respective membership functions
(Fig. 2.8).
5. Mean of Max Membership: This method (also called middle-of-maxima) is
closely related to the Max Membership principle except that the locations of
maximum membership can be non-unique (i.e., the maximum membership can
be a plateau rather than a single point). This method is given by the expression
[68]:
$$z^* = \frac{a + b}{2} \tag{2.25}$$
where a and b are as defined in Fig. 2.9.
Fig. 2.8: Center of Sum Defuzzification Method (a) First Membership
Function (b) Second Membership Function (c) Defuzzification Step
Fig. 2.9: Mean Max Membership Defuzzification Method
6. Center of Largest Area: If the output fuzzy set has at least two convex
sub-regions, then the center of gravity (i.e., $z^*$ calculated using the centroid
method) of the convex fuzzy sub-region with the largest area is used to obtain
the defuzzified value $z^*$ of the output. This is shown graphically in Fig. 2.10,
and given algebraically as:
$$z^* = \frac{\int \mu_{C_m}(z) \cdot z \, dz}{\int \mu_{C_m}(z) \, dz} \tag{2.26}$$
where $C_m$ is the convex sub-region that has the largest area making up $C_k$.
Fig. 2.10: Center of Largest Area Method
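Three of the methods above can be sketched in Python for an output fuzzy set sampled on a grid. This is a discrete approximation offered for illustration, not the formulation of [68]: the centroid integral is replaced by a sum over equally spaced samples, and all names and sample values are hypothetical.

```python
def centroid(zs, mus):
    """Centroid method, Eq. (2.22), approximated on equally spaced samples."""
    total = sum(mus)
    if total == 0:
        raise ValueError("output fuzzy set is empty")
    return sum(z * m for z, m in zip(zs, mus)) / total


def mean_of_max(zs, mus):
    """Mean of max, Eq. (2.25): midpoint (a + b) / 2 of the maximal plateau."""
    peak = max(mus)
    plateau = [z for z, m in zip(zs, mus) if m == peak]
    return (plateau[0] + plateau[-1]) / 2


def weighted_average(centers, heights):
    """Weighted average, Eq. (2.23): centers weighted by their membership."""
    return sum(c * h for c, h in zip(centers, heights)) / sum(heights)


# Illustrative sampled output set: a symmetric trapezoid centered at z = 3.
zs = [0, 1, 2, 3, 4, 5, 6]
mus = [0.0, 0.5, 1.0, 1.0, 1.0, 0.5, 0.0]

print(centroid(zs, mus))                          # 3.0
print(mean_of_max(zs, mus))                       # 3.0
print(weighted_average([2, 6], [0.25, 0.75]))     # 5.0
```

For a symmetric output set the centroid and mean-of-max results coincide; they generally differ once the aggregated output is asymmetric.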
2.7.1.8. Fuzzy Inference System
The fuzzy inference system is a popular computing framework based on the concepts
of fuzzy set theory, fuzzy if-then rules, and fuzzy reasoning. It has been
successfully applied in fields such as automatic control, data classification,
decision analysis, expert systems, and computer vision [69]. Due to its
multidisciplinary nature, the fuzzy inference system is known by various names,
such as fuzzy rule-based system, fuzzy expert system, fuzzy model, fuzzy
associative memory, fuzzy logic controller, or simply fuzzy system [70-71].
Conceptually, a fuzzy inference system consists of three components: a rule base,
which contains a selection of fuzzy rules; a database or dictionary, which defines
the membership functions used in the fuzzy rules; and a reasoning mechanism,
which performs the inference procedure upon the rules and a given condition to
derive a reasonable output [70]. Functionally, a fuzzy inference system is
composed of five blocks (Fig. 2.11):
1. A rule base containing a number of fuzzy if-then rules.
2. A database which defines the membership functions of the fuzzy sets used in
the fuzzy rules.
3. A decision-making unit which performs the inference operations on the rules.
4. A fuzzification interface which transforms the crisp inputs into degrees of
match with linguistic value.
5. A defuzzification interface which transforms the fuzzy results of the inference
into a crisp output.
Fig. 2.11: Fuzzy Inference System
The rule base and the database are jointly referred to as the knowledge base. The
steps of fuzzy reasoning performed by fuzzy inference systems are [72]:
1. Compare the input variables with the membership functions on the premise
part to obtain the membership values of each linguistic label, a step known as
fuzzification.
2. Combine the membership values through a specific T-norm operator, usually
multiplication or min, on the premise part to get the firing strength (weight) of
each rule.
3. Generate the qualified consequent (either fuzzy or crisp) of each rule depending
on the firing strength.
4. Aggregate the qualified consequents to produce a crisp output. This final
step is called defuzzification.
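The four steps can be traced in a small Python sketch. Everything concrete in it is an assumption made for illustration: a two-rule, two-input system on a 0-10 scale, triangular membership functions, min as the T-norm, max for aggregation, and a sampled centroid for the final crisp value.

```python
def tri(x, a, b, c):
    """Triangular membership function with corners a < b < c."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)


def two_rule_inference(x, y):
    """Hypothetical rules: (low, low) -> low output; (high, high) -> high output.

    Inputs are assumed to lie in [0, 10].
    """
    # Step 1: fuzzification of the crisp inputs.
    x_low, x_high = tri(x, -10, 0, 10), tri(x, 0, 10, 20)
    y_low, y_high = tri(y, -10, 0, 10), tri(y, 0, 10, 20)

    # Step 2: firing strength of each rule, with min as the T-norm.
    w1 = min(x_low, y_low)
    w2 = min(x_high, y_high)

    # Step 3: qualified consequents, i.e. output sets clipped at w1 and w2,
    # combined pointwise with max on a sampled output universe 0, 0.5, ..., 10.
    zs = [i * 0.5 for i in range(21)]
    agg = [max(min(w1, tri(z, -5, 0, 5)), min(w2, tri(z, 5, 10, 15))) for z in zs]

    # Step 4: defuzzification by a sampled centroid.
    total = sum(agg)
    if total == 0:
        raise ValueError("no rule fired for these inputs")
    return sum(z * m for z, m in zip(zs, agg)) / total


print(two_rule_inference(5, 5))  # close to 5.0: both rules fire equally
print(two_rule_inference(0, 0))  # close to 1.5: only the "low" rule fires
```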
Several types of fuzzy reasoning have been proposed in the literature. Depending
on the types of fuzzy reasoning and fuzzy if-then rules employed, most fuzzy
inference systems can be classified into three types, as shown in Fig. 2.12.
Fig. 2.12: IF-THEN Rules and Fuzzy Reasoning Mechanism
Fig. 2.12 uses a two-rule, two-input fuzzy inference system to show the different
types of fuzzy rules and fuzzy reasoning mentioned above. Most of the differences
come from the specification of the consequent part, and thus the defuzzification
schemes also differ.
Type 1: Tsukamoto Fuzzy Model: The overall output is the weighted average of
each rule’s crisp output induced by the rule’s firing strength (the product or
minimum of the degrees of match with the premise part) and output membership
functions. The output membership functions used in this scheme must be
monotonic functions, as shown in Fig. 2.13 [68]. Since each rule infers a crisp
output, the Tsukamoto fuzzy model aggregates the rules’ outputs by the method of
weighted average, thus avoiding the time-consuming process of defuzzification.
However, the Tsukamoto fuzzy model is not often used since it is less transparent
than the Mamdani and Sugeno fuzzy models.
Type 2: Mamdani Fuzzy Model: The overall fuzzy output is derived by applying the
max operation to the qualified fuzzy outputs (each equal to the minimum of the
firing strength and the output membership function of each rule). A two-rule
Mamdani fuzzy inference system derives the overall output z when subjected to two
crisp inputs x and y (Fig. 2.14). When algebraic product and max are adopted as
the T-norm and T-conorm operators, respectively, and max-product composition is
used instead of the original max-min composition, the resulting fuzzy reasoning
is as shown in Fig. 2.14.
Fig. 2.13: Tsukamoto Fuzzy Model
Various defuzzification schemes (centroid of area, bisector of area, mean of
maxima, maximum criterion) have been proposed for obtaining the final crisp
output from the fuzzy output [68].
Fig. 2.14: Mamdani Fuzzy Model
Type 3: Sugeno Fuzzy Model: The Sugeno fuzzy model (also known as the TSK
fuzzy model) is an effort to develop a systematic approach to generating fuzzy
rules from a given input-output data set. A typical fuzzy rule in a Sugeno fuzzy
model has the form: if x is A and y is B then $z = f(x, y)$, where A and B are
fuzzy sets in the antecedent, while $z = f(x, y)$ is a crisp function in the
consequent. Usually $f(x, y)$ is a polynomial in the input variables x and y, but
it can be any function as long as it can appropriately describe the output of the
model within the fuzzy region specified by the antecedent of the rule. When
$f(x, y)$ is a first-order polynomial, the resulting fuzzy inference system is
called a first-order Sugeno fuzzy model (Fig. 2.15). When f is a constant, the
result is a zero-order Sugeno fuzzy model, which can be viewed as a special case
of the Mamdani fuzzy inference system. The output of each of Takagi and Sugeno’s
if-then rules is a linear combination of the input variables plus a constant
term, and the final output is the weighted average of each rule’s output [68].
Fig. 2.15: Sugeno Fuzzy Model
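A first-order Sugeno model with two rules can be sketched as follows. The membership functions, the rule consequents and the use of min for the firing strength are all assumptions chosen for illustration; only the weighted-average form of the final output follows the description above.

```python
def low(v):
    """Hypothetical 'low' label on a 0-10 scale."""
    return max(0.0, min(1.0, 1.0 - v / 10.0))


def high(v):
    """Hypothetical 'high' label on a 0-10 scale."""
    return max(0.0, min(1.0, v / 10.0))


def sugeno(x, y, rules):
    """Weighted average of crisp rule outputs z_i = f_i(x, y).

    Each rule is a tuple (mu_a, mu_b, f); the firing strength is
    min(mu_a(x), mu_b(y)), an assumed choice of T-norm.
    """
    weights = [min(mu_a(x), mu_b(y)) for mu_a, mu_b, _ in rules]
    outputs = [f(x, y) for _, _, f in rules]
    total = sum(weights)
    if total == 0:
        raise ValueError("no rule fired for these inputs")
    return sum(w * z for w, z in zip(weights, outputs)) / total


# Two illustrative first-order rules (coefficients are made up).
rules = [
    (low, low, lambda x, y: x + y + 1.0),           # if x is low and y is low
    (high, high, lambda x, y: 2.0 * x - y + 3.0),   # if x is high and y is high
]

print(sugeno(2.5, 5.0, rules))  # (0.5 * 8.5 + 0.25 * 3.0) / 0.75
```

Because every rule already yields a crisp number, no separate defuzzification of an aggregated fuzzy set is needed, which is the practical difference from the Mamdani model.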
2.7.2. Neural Networks
The study of neural networks started with the publication of McCulloch and Pitts
[73]. Single-layer networks with threshold activation functions, called
perceptrons, were introduced by Rosenblatt [74]. In the 1960s, experiments
showed that perceptrons could solve many problems, but many other problems which
did not seem to be more difficult could not be solved. These limitations of the
one-layer perceptron were discussed mathematically in detail by Minsky and
Papert in the book Perceptrons, which resulted in reduced interest in neural
networks for almost two decades. In the mid-1980s, the back-propagation
algorithm was proposed by Rumelhart, Hinton, and Williams [75]; it revived the
study of neural networks and showed that multilayer networks could be trained
with it.
A neural network attempts to simulate the human brain. The simulation is based
on present knowledge of brain function, and this knowledge is, even at its best,
primitive. It is therefore not entirely wrong to claim that artificial neural
networks probably have no close relationship to the operation of human brains.
The operation of the brain is believed to be based on simple basic elements
called neurons, which are connected to each other by transmission lines called
axons and receptive lines called dendrites (Fig. 2.16). Learning may be based on
two mechanisms: the creation of new connections and the modification of existing
connections. Each neuron has an activation level which, in contrast to Boolean
logic, ranges between some minimum and maximum value.
Fig. 2.16: Biological and Artificial Neuron
In artificial neural networks, the inputs of a neuron are combined linearly
with different weights. The result of this combination is then fed into a
non-linear activation unit (activation function), which in its simplest form can
be a threshold unit (see Fig. 2.10). Neural networks are often used to enhance
and optimize fuzzy logic based systems, e.g., by giving them a learning ability.
This learning ability is achieved by presenting a training set of different
examples to the network and using a learning algorithm that changes the weights
(or the parameters of the activation functions) in such a way that the network
reproduces the correct output for the given input values. The difficulty lies in
guaranteeing generalization and in determining when the network is sufficiently
trained.
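A single artificial neuron with a threshold unit, together with a learning
algorithm that adjusts its weights from examples, can be sketched as below. The
update used here is the classical perceptron rule, chosen only for
illustration; the training set (logical AND), the learning rate and the names
neuron and train_perceptron are assumptions of this sketch.

```python
def step(net):
    """Threshold activation unit."""
    return 1 if net >= 0 else 0


def neuron(inputs, weights, bias):
    """Linear combination of the inputs followed by the threshold unit."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_perceptron(samples, epochs=20, rate=1.0):
    """Perceptron rule: move the weights in proportion to the output error."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - neuron(inputs, weights, bias)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias


# Training set: the logical AND of two binary inputs (linearly separable).
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
predictions = [neuron(x, weights, bias) for x, _ in and_samples]
```

After training, the neuron reproduces the target for every example in the
training set. Whether it generalizes to unseen inputs is a separate question,
as noted above.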
Neural networks offer nonlinearity, input-output mapping, adaptivity and fault
tolerance. Nonlinearity is a desired property if the generator of the input
signal is inherently nonlinear [76]. The high connectivity of the network
ensures that the influence of errors in a few terms will be minor, which ideally
gives a high fault tolerance. (Note that an ordinary sequential computation may
be ruined by a single bit error.)
2.7.2.1. Adaptive Neural Networks
Adaptive networks are a unifying framework that subsumes almost all kinds of
neural network paradigms with supervised and unsupervised learning
capabilities. The fundamentals of adaptive networks are a key element
underlying various other neural network paradigms, such as multilayer
perceptrons.
An adaptive network is a network structure consisting of a number of nodes
connected through directional links. Each node represents a process unit, and
the links between nodes specify the causal relationship between the connected
nodes. All or some of the nodes are adaptive, which means that the outputs of
these nodes depend on modifiable parameters pertaining to these nodes. The
learning rule specifies how these parameters should be updated to minimize a
prescribed error measure, which is a mathematical expression that measures the
discrepancy between the network’s actual output and a desired output. In other
words, an adaptive network is used for system identification, and our task is to
find an appropriate network architecture and a set of parameters that best model
an unknown target system described by a set of input-output data pairs. The
basic learning rule of the adaptive network is the well-known steepest descent
method, in which the gradient vector is derived by successive invocations of the
chain rule. This procedure is known as the backpropagation learning rule.
Fig. 2.17: Feed forward Adaptive Neural Network
In an adaptive network, the overall input-output behavior is determined by a
collection of modifiable parameters (Fig. 2.17). Specifically, the configuration
of an adaptive network is composed of a set of nodes connected by directed
links, where each node performs a static node function on its incoming signals
to generate a single node output and each link specifies the direction of signal
flow from one node to another. Usually, a node function is a parameterized
function with modifiable parameters; a change in these parameters changes the
node function as well as the overall behaviour of the adaptive network.
Assume that each node in an adaptive network performs a static mapping from its
input(s) to its output. That is, a node’s output depends on its current input
only; there are no dynamic or internal states in any node. Moreover, to
facilitate the development of learning algorithms, it is assumed that all node
functions are differentiable except at a finite number of points. In most cases
an adaptive network is heterogeneous, and each node may have a specific node
function different from the others. Links in an adaptive network are merely used
to specify the propagation direction of node outputs; generally there are no
weights or parameters associated with links. Fig. 2.17 is a typical adaptive
network with two inputs and two outputs.
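The structure described above can be sketched in Python as a table of nodes,
each with its incoming links, a static node function and a local parameter set.
The topology, node functions and parameter values below are hypothetical and do
not reproduce Fig. 2.17; the sketch also assumes that the nodes are listed in
an order in which every node appears after its inputs.

```python
import math


def squash(ins, p):
    """Adaptive node: the output depends on the local parameters a and b."""
    return math.tanh(p["a"] * ins[0] + p["b"] * ins[1])


def product(ins, p):
    """Fixed node: empty parameter set."""
    return ins[0] * ins[1]


def blend(ins, p):
    """Adaptive node with a single local parameter c."""
    return p["c"] * ins[0] + ins[1]


def largest(ins, p):
    """Fixed node; differentiable except where the two inputs are equal."""
    return max(ins)


# node name -> (input node names, node function, local parameter set)
# Links carry no weights; they only say where each node reads its inputs.
network = {
    "n3": (("x1", "x2"), squash, {"a": 0.5, "b": -0.5}),
    "n4": (("x1", "x2"), product, {}),
    "out1": (("n3", "n4"), blend, {"c": 2.0}),
    "out2": (("n3", "n4"), largest, {}),
}


def evaluate(network, inputs):
    """Propagate signals from the inputs to the outputs, node by node."""
    values = dict(inputs)
    for name, (sources, function, params) in network.items():
        values[name] = function([values[s] for s in sources], params)
    return values


def adaptive_nodes(network):
    """Nodes whose local parameter set is not empty."""
    return [name for name, (_, _, params) in network.items() if params]


values = evaluate(network, {"x1": 1.0, "x2": 1.0})
```

Here the network's overall parameter set is the union of the local sets, that
is {a, b, c}, and changing any of these values changes the input-output
behaviour of the whole network.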
The parameters of an adaptive network are distributed among its nodes, so each
node has a local parameter set. The union of these local parameter sets is the
network’s overall parameter set. If a node’s parameter set is not empty, then
its node function depends on the parameter values; a square is used to represent
this kind of adaptive node. On the other hand, if a node has an empty parameter
set, then its function is fixed; a circle is used to denote this kind of fixed
node. Each adaptive node can be decomposed into a fixed node plus one or several
parameter nodes. Adaptive networks are generally classified into two categories
on the basis of the type of connections they have: feed forward and recurrent.
The adaptive network shown in Fig. 2.17 is feed forward, since the output of
each node always propagates from the input side (left) to the output side
(right). If there is a feedback link that forms a circular path in the network,
then the network is recurrent; Fig. 2.18 is an example [68].
Fig. 2.18: A Recurrent Adaptive Network
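The distinction between the two categories amounts to asking whether the
directed links contain a circular path. A minimal sketch of such a check, using
depth-first search, is given below; the node names and link lists are
hypothetical and do not reproduce Fig. 2.17 or Fig. 2.18.

```python
def is_feed_forward(links):
    """Return True if the directed links contain no circular path.

    links maps each node name to the list of nodes it feeds into.
    """
    unvisited, in_progress, done = 0, 1, 2
    nodes = set(links) | {n for targets in links.values() for n in targets}
    state = {n: unvisited for n in nodes}

    def has_cycle(node):
        state[node] = in_progress
        for nxt in links.get(node, []):
            if state[nxt] == in_progress:
                return True  # a link leads back into the current path
            if state[nxt] == unvisited and has_cycle(nxt):
                return True
        state[node] = done
        return False

    return not any(state[n] == unvisited and has_cycle(n) for n in nodes)


feed_forward_links = {
    "x1": ["n3", "n4"],
    "x2": ["n3", "n4"],
    "n3": ["n5"],
    "n4": ["n5"],
}
# The same network with one feedback link from n5 back to n3.
recurrent_links = dict(feed_forward_links, n5=["n3"])

ff_result = is_feed_forward(feed_forward_links)
rec_result = is_feed_forward(recurrent_links)
```

A single feedback link is enough to turn a feed forward network into a
recurrent one.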
Conceptually, a feed forward adaptive network is a static mapping between its
input and output spaces. The mapping may be either a simple linear relationship
or a highly nonlinear one, depending on the network structure (node
arrangement, connections and so on) and the functionality of each node. Here the
aim is to construct a network that achieves a desired nonlinear mapping
regulated by a data set consisting of desired input-output pairs of the target
system to be modeled. This data set is usually called the training data set, and
the procedure followed in adjusting the parameters to improve the network’s
performance is often referred to as the learning rule or adaptation algorithm.
Usually a network’s performance is measured as the discrepancy between the
desired output and the network’s output under the same input conditions. This
discrepancy is called the error measure, and it can assume different forms for
different applications. Generally speaking, a learning rule is derived by
applying a specific optimization technique to a given error measure [68].
2.7.2.2. Backpropagation for Feed Forward Networks
This section introduces a basic learning rule for adaptive networks, which is in
essence the simple steepest descent method. The central part of this learning
rule concerns how to recursively obtain a gradient vector in which each element
is defined as the derivative of an error measure with respect to a parameter
[68]. This is done by means of the chain rule, a basic formula for
differentiating composite functions. The procedure of finding a gradient vector
in a network structure is generally referred to as backpropagation because the
gradient vector is calculated in the direction opposite to the flow of the
output of each node. Once the gradient is obtained, a number of derivative-based
optimization and regression techniques are available for updating the
parameters. In particular, if the gradient vector is used in a simple steepest
descent method, the resulting learning paradigm is often referred to as the
backpropagation learning rule [68].
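The idea can be illustrated on a deliberately small network of two nodes in
series. The sketch below is an assumption-laden example rather than the
procedure of [68]: the node functions (tanh followed by a scaling), the squared
error, the learning rate and all names are chosen only for illustration.

```python
import math


def forward(x, a, b):
    """Two nodes in series: h = tanh(a * x), then out = b * h."""
    h = math.tanh(a * x)   # node 1, parameter a
    out = b * h            # node 2, parameter b
    return h, out


def error(x, d, a, b):
    """Squared error between the desired output d and the network output."""
    return (d - forward(x, a, b)[1]) ** 2


def gradient(x, d, a, b):
    """Chain rule applied from the output node back toward the input."""
    h, out = forward(x, a, b)
    e_out = -2.0 * (d - out)            # dE/d(out) at the output node
    grad_b = e_out * h                  # dE/db
    e_h = e_out * b                     # dE/dh, passed backward to node 1
    grad_a = e_h * (1.0 - h * h) * x    # dE/da, using d tanh(u)/du = 1 - tanh(u)^2
    return grad_a, grad_b


def steepest_descent(x, d, a, b, rate=0.1, steps=50):
    """Move both parameters repeatedly against the gradient."""
    for _ in range(steps):
        grad_a, grad_b = gradient(x, d, a, b)
        a, b = a - rate * grad_a, b - rate * grad_b
    return a, b


x, d = 1.0, 0.5          # one hypothetical training pair
a0, b0 = 0.3, 0.2        # hypothetical initial parameters
a1, b1 = steepest_descent(x, d, a0, b0)
```

The derivative with respect to the earlier parameter a reuses the derivative
already computed at the output node, which is what makes the backward sweep
economical in larger networks.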
Suppose that a given feed forward adaptive network in the layered representation
has $L$ layers and that layer $l$ ($l = 0, 1, \ldots, L$; $l = 0$ represents the
input layer) has $N(l)$ nodes. Then the output and function of node $i$
[$i = 1, \ldots, N(l)$] in layer $l$ can be represented as $x_{l,i}$ and
$f_{l,i}$ respectively, as shown in Fig. 2.19. Without loss of generality, we
assume that there are no jumping links. Since the output of a node depends on
the incoming signals and the parameter set of the node, we have the following
general expression for the node function $f_{l,i}$ [68]:
$$x_{l,i} = f_{l,i}\left(x_{l-1,1}, \ldots, x_{l-1,N(l-1)}, \alpha, \beta, \gamma, \ldots\right) \tag{2.27}$$
where $\alpha$, $\beta$, $\gamma$, etc. are the parameters of this node. Assuming
that the given training data set has $P$ entries, we can define an error measure
for the $p$th ($1 \le p \le P$) entry of the training data set as the sum of
squared errors [68]:
$$E_p = \sum_{k=1}^{N(L)} \left(d_k - x_{L,k}\right)^2 \tag{2.28}$$
where $d_k$ is the $k$th component of the $p$th desired output vector and
$x_{L,k}$ is the $k$th component of the actual output vector produced by
presenting the $p$th input vector to the network. Obviously, when $E_p$ is equal
to zero, the network is able to reproduce exactly the desired output vector in
the $p$th training data pair. Thus the task here is to minimize an overall error
measure, which is defined as $E = \sum_{p=1}^{P} E_p$.
Fig. 2.19: Layered Representation of Adaptive Feed Forward Network
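The sum-of-squared-errors measure for one training entry, and the overall
measure obtained by summing over all entries, can be computed as in the short
sketch below. The desired and actual output vectors are made-up numbers, and
the function names are hypothetical.

```python
def entry_error(desired, actual):
    """Error for one training entry: sum of squared component errors."""
    return sum((d - x) ** 2 for d, x in zip(desired, actual))


def overall_error(desired_vectors, actual_vectors):
    """Overall error: sum of the entry errors over the training data set."""
    return sum(entry_error(d, x)
               for d, x in zip(desired_vectors, actual_vectors))


# Hypothetical data set with three entries and two output nodes.
desired = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
actual = [[0.5, 0.0], [0.0, 1.0], [0.0, 0.5]]

per_entry = [entry_error(d, x) for d, x in zip(desired, actual)]
total = overall_error(desired, actual)
```

The second entry has zero error, meaning that the network reproduces that
desired output vector exactly.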
To use steepest descent to minimize the error measure, the gradient vector must
first be obtained. Before calculating the gradient vector, we should observe the
following relationships:
Change in parameter $\alpha$ ⇒ Change in outputs of nodes containing $\alpha$ ⇒
Change in network’s outputs ⇒ Change in error measure
where the arrows ⇒ indicate causal relationships. In other words, a small change
in a parameter $\alpha$ will affect the output of the node containing $\alpha$;
this in turn will affect the output of the final layer and thus the error
measure. Therefore, the basic concept in calculating the gradient vector is to
pass a form of derivative information starting from the output layer and going
backward layer by layer until the input layer is reached [68]. To facilitate the
discussion, we define the error signal $\epsilon_{l,i}$ as the derivative of the
error measure $E_p$ with respect to the output of node $i$ in layer $l$, taking
both direct and indirect paths into consideration. In symbols,
$$\epsilon_{l,i} = \frac{\partial^{+} E_p}{\partial x_{l,i}} \tag{2.29}$$
This expression is called the ordered derivative. The difference between the
ordered derivative and the ordinary partial derivative lies in the way we view
the function to be differentiated [68].
2.7.2.3. Adaptive Neuro-Fuzzy Inference System
The architectures and learning rules of adaptive networks have been described in
the previous section. Functionally, there is almost no constraint on the node
functions of an adaptive network except the requirement of piecewise
differentiability.
Structurally, the only limitation on the network configuration is that it should be
of the feed forward type if we do not want to use the more complex
asynchronously operated model. Because of these minimum restrictions, adaptive
networks can be employed directly in a wide variety of applications of modeling,
decision making, signal processing and control [68].
In the present research work, an attempt has been made to propose a class of
adaptive networks that are functionally equivalent to fuzzy inference systems.
The proposed architecture is referred to as ANFIS, which stands for adaptive
network-based fuzzy inference system or, semantically equivalently, adaptive
neuro-fuzzy inference system. A description is given of how to decompose the
parameter set to facilitate the hybrid learning rule for ANFIS architectures
representing both the Sugeno and Tsukamoto fuzzy models.
ANFIS Architecture: For simplicity, assume that the fuzzy inference system under
consideration has two inputs $x$ and $y$ and one output $z$. For a first-order
Sugeno fuzzy model, a common rule set with two fuzzy if-then rules is as follows:
Rule 1: If $x$ is $A_1$ and $y$ is $B_1$ then $f_1 = p_1 x + q_1 y + r_1$
Rule 2: If $x$ is $A_2$ and $y$ is $B_2$ then $f_2 = p_2 x + q_2 y + r_2$
Fig. 2.20: Two-Input First-Order Sugeno Fuzzy Model with Two Rules
Fig. 2.20 represents the reasoning mechanism for this Sugeno fuzzy model, and
the corresponding ANFIS architecture is shown in Fig. 2.21, where nodes of the
same layer have similar functions, as described next. (The output of the $i$th
node in layer $l$ is denoted as $O_{l,i}$.)
Layer 1: Every node $i$ in this layer is an adaptive node with a node function
$$O_{1,i} = \mu_{A_i}(x), \quad \text{for } i = 1, 2, \quad \text{or}$$
$$O_{1,i} = \mu_{B_{i-2}}(y), \quad \text{for } i = 3, 4 \tag{2.30}$$
where $x$ (or $y$) is the input to node $i$ and $A_i$ (or $B_{i-2}$) is a
linguistic label associated with this node. In other words, $O_{1,i}$ is the
membership grade of a fuzzy set $A$ and it specifies the degree to which the
given input $x$ (or $y$) satisfies the quantifier $A$. Here the membership
function for $A$ can be any parameterized membership function, such as the
Gaussian membership function. Parameters in this layer are referred to as
premise parameters [68].
Fig. 2.21: ANFIS Architecture
Layer 2: Every node in this layer is a fixed node labeled ∏, whose output is the
product of all the incoming signals:
$$O_{2,i} = w_i = \mu_{A_i}(x)\,\mu_{B_i}(y), \quad i = 1, 2 \tag{2.31}$$
Each node output represents the firing strength of a rule. In general, any other
T-norm operator that performs fuzzy AND can be used as the node function in this
layer.
Layer 3: Every node in this layer is a fixed node labeled N. The $i$th node
calculates the ratio of the $i$th rule’s firing strength to the sum of all
rules’ firing strengths:
$$O_{3,i} = \bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2 \tag{2.32}$$
For convenience, outputs of this layer are called normalized firing strengths.
Layer 4: Every node $i$ in this layer is an adaptive node with a node function:
$$O_{4,i} = \bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i) \tag{2.33}$$
where $\bar{w}_i$ is a normalized firing strength from layer 3 and
$\{p_i, q_i, r_i\}$ is the parameter set of this node. Parameters in this layer
are referred to as consequent parameters.
Layer 5: The single node in this layer is a fixed node labeled ∑, which computes
the overall output as the summation of all incoming signals:
$$\text{overall output} = O_{5,1} = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i} \tag{2.34}$$
Thus, an adaptive network has been constructed that is functionally equivalent
to a Sugeno fuzzy model. For the Mamdani fuzzy inference system with max-min
composition, a corresponding ANFIS can be constructed if discrete approximations
are used to replace the integrals in the centroid defuzzification method [68].
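The five layers described above can be traced in a short Python sketch of the
forward computation for the two-rule, two-input case. Gaussian membership
functions are assumed for layer 1, and every numeric value below (centres,
widths, consequent coefficients, inputs) is a hypothetical choice made for the
example.

```python
import math


def gaussian(v, center, sigma):
    """Gaussian membership function with premise parameters center, sigma."""
    return math.exp(-0.5 * ((v - center) / sigma) ** 2)


def anfis_forward(x, y, premise, consequent):
    """Forward computation through the five ANFIS layers for two rules."""
    # Layer 1: membership grades of x in A_1, A_2 and of y in B_1, B_2.
    mu_a = [gaussian(x, c, s) for c, s in premise["A"]]
    mu_b = [gaussian(y, c, s) for c, s in premise["B"]]
    # Layer 2: firing strengths w_i (product as the fuzzy AND).
    w = [a * b for a, b in zip(mu_a, mu_b)]
    # Layer 3: normalized firing strengths.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: normalized strength times the first-order consequent f_i.
    layer4 = [wb * (p * x + q * y + r)
              for wb, (p, q, r) in zip(w_bar, consequent)]
    # Layer 5: overall output as the sum of all incoming signals.
    return sum(layer4), w_bar


premise = {"A": [(0.0, 1.0), (2.0, 1.0)], "B": [(0.0, 1.0), (2.0, 1.0)]}
consequent = [(1.0, 1.0, 0.0), (2.0, 0.0, 1.0)]   # (p_i, q_i, r_i) per rule

output, w_bar = anfis_forward(1.0, 1.0, premise, consequent)
```

The input (1, 1) lies midway between the two membership centres, so both rules
receive a normalized firing strength of 0.5 and the output is the mean of
f_1 = 2 and f_2 = 3.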
2.7.2.4. Hybrid Learning Algorithm
Although backpropagation or steepest descent learning can be applied to identify
the parameters in an adaptive network, this simple optimization method usually
takes a long time to converge. It may be noted, however, that an adaptive
network’s output is linear in some of the network’s parameters; these linear
parameters can be identified by the linear least-squares method. This approach
leads to a hybrid learning rule which combines steepest descent and the
least-squares estimator for fast identification of parameters [68].
In the ANFIS architecture (Fig. 2.21), when the values of the premise parameters
are fixed, the overall output can be expressed as a linear combination of the
consequent parameters. In symbols, the output $f$ can be written as [68]
$$\begin{aligned}
f &= \frac{w_1}{w_1 + w_2} f_1 + \frac{w_2}{w_1 + w_2} f_2 \\
  &= \bar{w}_1 (p_1 x + q_1 y + r_1) + \bar{w}_2 (p_2 x + q_2 y + r_2) \\
  &= (\bar{w}_1 x) p_1 + (\bar{w}_1 y) q_1 + (\bar{w}_1) r_1 + (\bar{w}_2 x) p_2 + (\bar{w}_2 y) q_2 + (\bar{w}_2) r_2
\end{aligned} \tag{2.35}$$
This is linear in the consequent parameters $p_1$, $q_1$, $r_1$, $p_2$, $q_2$
and $r_2$.
In the forward pass of the hybrid learning algorithm, node outputs go forward
until layer 4 and the consequent parameters are identified by the least-squares
method. In the backward pass, the error signals propagate backward and the
premise parameters are updated by gradient descent. Accordingly, the hybrid
approach converges much faster, since it reduces the search space dimensions of
the original pure backpropagation method [68].
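The linearity that the forward pass relies on can be checked numerically. In
the sketch below, the normalized firing strengths are taken as given (that is,
the premise parameters are held fixed), and the output computed rule by rule is
compared with the inner product of a regressor row and the consequent parameter
vector. All numbers and the names design_row and output_from_rules are
hypothetical.

```python
def design_row(w_bar, x, y):
    """Regressor row [w1*x, w1*y, w1, w2*x, w2*y, w2] for one training pair."""
    row = []
    for wb in w_bar:
        row.extend([wb * x, wb * y, wb])
    return row


def output_from_rules(w_bar, x, y, consequent):
    """Overall output computed rule by rule, as in layers 4 and 5."""
    return sum(wb * (p * x + q * y + r)
               for wb, (p, q, r) in zip(w_bar, consequent))


w_bar = [0.25, 0.75]                      # assumed normalized firing strengths
x, y = 2.0, 4.0                           # one hypothetical input pair
consequent = [(1.0, 2.0, 3.0), (0.5, -1.0, 2.0)]
# Parameter vector in the order p1, q1, r1, p2, q2, r2.
theta = [value for rule in consequent for value in rule]

row = design_row(w_bar, x, y)
f_linear = sum(a * t for a, t in zip(row, theta))
f_rules = output_from_rules(w_bar, x, y, consequent)
```

Stacking one such row per training pair gives a linear system in the
consequent parameters, and it is this system that the least-squares step of
the forward pass solves.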
2.7.3. Probabilistic Reasoning
Probabilistic reasoning includes genetic algorithms, belief networks, chaotic
systems and parts of learning theory [77]. Although the emphasis of the present
thesis is on fuzzy logic systems and neural networks, probabilistic reasoning is
included for consistency. Like fuzzy set theory, probability theory deals with
uncertainty, but usually the type of uncertainty is different. Stochastic
uncertainty concerns the occurrence of a certain event, and this uncertainty is
quantified by a degree of probability. Probability statements can be combined
with other statements using stochastic methods. The most popular is the Bayesian
calculus of conditional probability.
2.7.3.1. Probability
Random events are used to model uncertainty, which is measured by probabilities.
A random event $E$ is defined as a crisp subset of a sample space $U$. The
probability of $E$, $P(E) \in [0, 1]$, is the proportion of occurrence of $E$.
The probability is supposed to fulfill the axioms of Kolmogorov [68]:
1. $P(E) \ge 0, \ \forall E \subset U$
2. $P(U) = 1$
3. If $E_i$ are pairwise disjoint sets, then $P\left(\bigcup_i E_i\right) = \sum_i P(E_i)$.
A classical example of probability is the throwing of a die. Let $U$ be the set
of integers {1, 2, 3, 4, 5, 6}. An event such as $E = 6$ has a probability
$P(E = 6) = \frac{1}{6}$.
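The die example and the three axioms can be verified with exact fractions in a
few lines of Python. The sketch assumes a fair die, so that every outcome in
the sample space is equally likely; the function name probability is
hypothetical.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}


def probability(event):
    """Classical probability: favourable outcomes over all outcomes."""
    event = set(event)
    if not event <= sample_space:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))


p_six = probability({6})            # the event E = 6
p_even = probability({2, 4, 6})
p_odd = probability({1, 3, 5})
p_all = probability(sample_space)   # the whole sample space U
```

The even and odd events are disjoint and together make up the sample space, so
their probabilities add up to 1, as the third axiom requires.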
The state of the art with respect to soft computing techniques and the relevant
work in this area have been classified, not only by the type of data and soft
computing techniques used, but more importantly by the type of educational task
that they resolve. Soft computing techniques have been introduced as an emerging
research area related to several well-established areas of research, including
e-learning, adaptive hypermedia, intelligent tutoring systems, web mining, data
mining, etc. Interesting future research lines have also been described. It is
necessary for researchers to develop more unified and collaborative studies
instead of the current plethora of multiple individual proposals and lines.
Thus, complete integration of soft computing techniques in the educational
environment may become a reality, with fully operative implementations (both
commercial and free).
REFERENCES
[1]. Draper, N.R., and H. Smith. Applied Regression Analysis. John Wiley & Sons, (1998).
[2]. Espejo, P., S. Ventura, and F. Herrera. “A Survey on the Application of Genetic Programming to Classification.” IEEE Transactions on Systems, Man, and Cybernetics-Part C, 40, no. 2 (2010): 121-144.
[3]. Romero, C., S. Ventura, C. Hervás, and P. Gonzales. “Data Mining Algorithms to Classify Students.” In Proceeding of International Conference on Educational Data Mining, Montreal, Canada, 2008, 8-17.
[4]. Ibrahim, Z., and D. Rusli. “Predicting Students’ Academic Performance: Comparing Artificial Neural Network, Decision Tree and Linear Regression.” Annual SAS Malaysia Forum, Kuala Lumpur, 2007, 1-6.
[5]. Delgado, M., E. Gibaja, M.C. Pegalajar, and Q. Pérez. “Predicting Students' Marks from Moodle Logs using Neural Network Models.” In Proceeding of International Conference on Current Developments in Technology-Assisted Education. Sevilla, Spain, 2006, 586-590.
[6]. Oladokun, V.O., A.T. Adebanjo, and O.E. Charles-Owaba. “Predicting Student’s Academic Performance using Artificial Neural Network: A Case Study of an Engineering Course.” Pacific Journal of Science and Technology, 9, no. 1 (2008): 72-79.
[7]. Pardos, Z., N. Heffernan, B. Anderson, and C. Heffernan. “The Effect of Model Granularity on Student Performance Prediction Using Bayesian Networks.” In Proceeding of International Conference on User Modeling. Corfu, Greece, 2007, 435-439.
[8]. Ogor, E.N. “Student Academic Performance Monitoring and Evaluation Using Data Mining Techniques.” In Proceeding of Electronics, Robotics and Automotive Mechanics Conference. Washington, DC, 2007, 354-359.
[9]. Lakshmi, T.M., A. Martin, and V.P. Venkatesan. “An Analysis of Students Performance Using Genetic Algorithm.” Journal of Computer Sciences and Applications, 1, no. 4 (2013): 75-79.
[10]. Zafra, A., and S. Ventura. “Predicting Student Grades in Learning Management Systems with Multiple Instance Programming.” In Proceeding of International Conference on Educational Data Mining. Cordoba, Spain, 2009, 307-314.
[11]. Nugent, R., E. Ayers, and N. Dean. “Conditional Subspace Clustering of Skill Mastery: Identifying Skills that Separate Students.” In Proceeding of International Conference on Educational Data Mining, Cordoba, Spain, 2009, 101-110.
[12]. Thomas, E.H., and N. Galambos. “What Satisfies Students? Mining Student-Opinion Data with Regression and Decision Tree Analysis.” Journal of Research in Higher Education, 45, no. 3 (2004): 251-269.
[13]. Cetintas, A., L. Si, Y.P. Xin, and C. Hord. “Predicting Correctness of Problem Solving from Low-Level log Data in Intelligent Tutoring Systems.” In Proceeding of International Conference on Educational Data Mining, Cordoba, Spain, 2009, 230-238.
[14]. Mcdonald, B. “Predicting Student Success.” International Journal for Mathematics Teaching and Learning, 5, (2004): 1-14.
[15]. Sapre, R.G, and S. Surve. “Fuzzy Mathematical Approach for Performance Evaluation of a Student.” International Journal of Fuzzy Mathematics and Systems, 2, no. 2 (2012): 191-198.
[16]. Torre, G.L. “Implementation of Student Performance Evaluation System Using FIS in MATLAB.” International Journal of Engineering Universe for Scientific Research and Management, 3, no. 2 (2011): 1-6.
[17]. Hota H.S., S. Pavani, and P.V.S.S. Gangadhar. “Evaluating Teachers Ranking Using Fuzzy AHP Technique.” International Journal of Soft Computing and Engineering, 2, no. 6 (2013): 485-488.
[18]. Ajiboye, A.R., R.A. Arshah, and H. Qin. “Risk Status Prediction and Modelling of Students’ Academic Achievement: A Fuzzy Logic Approach.” International Journal of Engineering and Science, 3, no. 11 (2013): 07-14.
[19]. Saxena, U.R., and S.P. Singh. “Integrating Neuro-Fuzzy Systems to Develop Intelligent Planning Systems for Predicting Students’ Performance.” International Journal of Evaluation and Research in Education, 1, no. 2 (2012): 61-66.
[20]. Oladipupo, O.O., O.J. Oyelade, and D.O. Aborisade. “Application of Fuzzy Association Rule Mining for Analyzing Students Academic Performance.” International Journal of Computer Science, no. 3 (2012): 216-223.
[21]. Iraji, M.S. “Students Classification with Adaptive Neuro Fuzzy.” International Journal of Modern Education and Computer Science, 7, (2012): 42-49.
[22]. Amershi, S., and C. Conati. “Combining Unsupervised and Supervised Classification to Build User Models for Exploratory Learning Environments.” Journal of Educational Data Mining, 1, no. 1 (2009): 18-71.
[23]. Gaudioso, E., M. Montero, L. Talavera, and F. Hernandez-Del-Olmo. “Supporting Teachers in Collaborative Student Modeling: A Framework and Implementation.” International Journal of Expert System with Applications, 36, (2009): 2260-2265.
[24]. Musso, M. F., E. Kyndt, E. C. Cascallar, and F. Dochy. “Predicting General Academic Performance and Identifying the Differential Contribution of Participating Variables Using Artificial Neural Networks.” Journal of Frontline Learning Research, 1, no. 1 (2013): 42-71.
[25]. Naser, S.S.A. “Predicting Learners Performance Using Artificial Neural Networks in Linear Programming Intelligent Tutoring System.” International Journal of Artificial Intelligence and Applications, 3, no. 2 (2012): 65-73.
[26]. Do, Q. H., and J.-F. Chen. “A Comparative Study of Hierarchical ANFIS and ANN in Predicting Student Academic Performance.” WSEAS Transactions on Information Science and Applications, 10, no. 12 (2013): 396-405.
[27]. Borkar, S., and K. Rajeswari. “Attributes Selection for Predicting Students’ Academic Performance using Education Data Mining and Artificial Neural Network.” International Journal of Computer Applications, 86, no.10 (2014):25-29.
[28]. Osmanbegović, E., and M. Suljić. “Data Mining Approach for Predicting Student Performance.” Journal of Economics and Business, 10, no. 1 (2012): 3-12.
[29]. Basha, S.K. A. H., Y.R. R. Kumar, A. Govardhan, and M. Z. Ahmed. “Predicting Student Academic Performance Using Temporal Association Mining.” International Journal of Information Science and Education, 2, no. 1 (2012): 21-41.
[30]. Ahmed, A.B.E.D., and I.S. Elaraby. “Data Mining: A prediction for Student's Performance Using Classification Method.” World Journal of Computer Application and Technology, 2, no. 2 (2014): 43-47.
[31]. Borkar, S., and K. Rajeswari. “Predicting Students Academic Performance Using Education Data Mining.” International Journal of Computer Science and Mobile Computing, 2, no. 7 (2013): 273 – 279.
[32]. Matsuda, N., W. Cohen, J. Sewall, G. Lacerda, and K. R. Koedinger. “Predicting Student’s Performance with SimStudent that Learns Cognitive Skills from Observation.” In Proceeding of International Conference on Artificial Intelligence in Education. Amsterdam, Netherlands, 2007, 467-476.
[33]. Lee, C.S. “Diagnostic, Predictive and Compositional Modeling with Data Mining in Integrated Learning Environments.” International Journal of Computer and Education, 49, (2007): 562-580.
[34]. Lykourentzou, I., I. Giannoukos, V. Nikolopoulos, G. Mpardis, and V. Loumos. “Dropout Prediction in E-Learning Courses through the Combination of Machine Learning Techniques.” International Journal of Computer and Education, 53, no. 3 (2009): 950-965.
[35]. Lee, M.W., S.Y. Chen, K. Chrysostomou, X. Liu. “Mining Student’s Behavior in Web-based Learning Programs.” International Journal of Expert System with Applications, 36, (2009): 3459-3464.
[36]. Chen, G., C. Liu, K. Ou, and B. Liu. “Discovering Decision Knowledge from Web log Portfolio for Managing Classroom Processes by Applying Decision Tree and Data Cube Technology.” Journal of Educational Computing Research, 23, no. 3 (2000): 305–332.
[37]. Yu, C.H., S. Digangi, A.K. Jannasch-Pennell, and C. Kaprolet. “Profiling Students who take Online Courses using Data Mining Methods.” Online Journal of Distance Learning Administration, 11, no. 2 (2008): 1-14.
[38]. Chang, Y.C, W.Y. Kao, C.P. Chu, and C.H. Chiu. “A Learning Style Classification Mechanism for E-Learning.” International Journal of Computer and Education, 53, no. 2, (2009): 273-285.
[39]. Superby, J.F., J.P. Vandamme, and N. Meskens. “Determination of Factors Influencing Achievement of the First-Year University Students Using Data Mining Methods.” In Proceeding of International Conference on Intelligent Tutoring Systems and Workshop on Educational Data Mining. Taiwan, 2006, 1-8.
[40]. Yadav, S.K, B. Bharadwaj, and S. Pal. “Data Mining Applications: A Comparative Study for Predicting Student’s Performance.” International Journal of Innovative Technology and Creative Engineering, 1, no. 12, (2013): 13-19.
[41]. Venkatesan, N., and N. Chandru. “Student's Performance Measuring using Assistant Algorithm.” International Journal of Soft Computing and Engineering, 3, no. 5 (2013): 216-222.
[42]. Burlak, G., J. Munoz, A. Ochoa, and J.A. Hernández. “Detecting Cheats in Online Student Assessments Using Data Mining.” In Proceeding of International Conference on Data Mining. Las Vegas, 2006, 204-210.
[43]. Uneo, M, and K. Nagaoka. “Learning Log Database and Data Mining System for E-Learning–On Line Statistical Outlier Detection of Irregular Learning Processes.” In Proceeding of International Conference on Advanced Learning Technologies. Tatarstan, Russia, 2002, 436-438.
[44]. Romesburg, H.C. Cluster Analysis for Researchers. Krieger Pub, (2004).
[45]. Ayers, E., R. Nugent, and N. Dean. “A Comparison of Student Skill Knowledge Estimates.” In Proceeding of International Conference on Educational Data Mining. Cordoba, Spain, 2009, 1-10.
[46]. Tang, T. Y., and G. McCalla. “Student Modeling for a Web-Based Learning Environment: A Data Mining Approach.” In Proceeding of Conference on Artificial Intelligence, Edmonton, Canada, 2002, 967-968.
[47]. Zakrzewska, D. “Cluster Analysis for Users’ Modeling in Intelligent E-Learning Systems.” In Proceeding of International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, Poland, 2008, 209-214.
[48]. Hamalainen, W., J. Suhonen, E. Sutinen, and H. Toivonen. “Data Mining in Personalizing Distance Education Courses.” In Proceeding of World Conference on Open Learning and Distance Education. Hong Kong, 2004, 1-11.
[49]. Zhang, K., L. Cui, H. Wang, and Q. Sui. “An Improvement of Matrix-Based Clustering Method for Grouping Learners in E-Learning.” In Proceeding of International Conference on Computer Supported Cooperative Work Design, Melbourne, Australia, 2007, 1010-1015.
[50]. Tian, F., S. Wang, C. Zheng, and Q. Zheng. “Research on E-Learning Personality Group Based on Fuzzy Clustering Analysis.” In Proceeding of International Conference on Computer Supported Cooperative Work in Design. China, 2008, 1035-1040.
[51]. Perera, D., J. Kay, I. Koprinska, K. Yacef, and O.R. Zaiane. “Clustering and Sequential Pattern Mining of Online Collaborative Learning Data.” IEEE Transactions on Knowledge and Data Engineering, 21, no. 6 (2009): 759-772.
[52]. Hwang, W.Y., C.B. Chang, and G.J. Chen. “The Relationship of Learning Traits, Motivation and Performance-Learning Response Dynamics.” International Journal of Computer and Education, 42, (2004): 267-287.
[53]. Hardof-Jaffe, S., A. Hershkovitz, H. Abu-Kishk, O. Bergman, and R. Nachmias. “How Do Students Organize Personal Information Spaces?” In Proceeding of International Conference on Educational Data Mining. Cordoba, Spain, 2009, 250-258.
[54]. Zukhri, Z., and K. Omar. “Solving New Student Allocation Problem with Genetic Algorithms: A Hard Problem for Partition Based Approach.” Journal of Zhejiang University, 2, (2007): 1-9.
[55]. Rasmani, K.A., and Q. Shen. “Data-driven Fuzzy Rule Generation and its Application for Student Academic Performance Evaluation.” Applied Intelligence, 25, (2006): 305–319.
[56]. Biswas, R. “An Application of Fuzzy Sets in Students’ Evaluation.” Fuzzy Sets and Systems, 74, no. 2 (1995): 187-194.
[57]. Law, C.K. “Using Fuzzy Numbers in Educational Grading System.” Fuzzy Sets and Systems, 83, (1996): 311–323.
[58]. Chen, S.M., and C.H. Lee. “New Methods for Students’ Evaluation Using Fuzzy Sets.” Fuzzy Sets and Systems, 104, (1999): 209–218.
[59]. Zadeh, L. A. Fuzzy Logic: Advanced Concepts and Structures. IEEE, Piscataway, New Jersey, (1992).
[60]. Kirkpatrick and Wheeler. Physics: A World View. Saunders, (1992).
[61]. Kosko, B. “Fuzzy Systems as Universal Approximators.” In Proceeding of First IEEE Conference on Fuzzy Systems. San Diego, March, 1992, 1153-1162.
[62]. Kawarada, H., and H. Suito. Fuzzy Optimization Method. Institute of Computational Fluid Dynamics, Chiba University, Japan, (1996).
[63]. Zadeh, L. A. “Fuzzy Sets.” Information and Control, 8, no. 3 (1965): 338-354.
[64]. Zadeh, L. A. “A New Approach to System Analysis.” Man and Computer, Amsterdam: North-Holland, (1972): 55-94.
[65]. Cheeseman, P. Probabilistic versus Fuzzy Reasoning. Uncertainty in Artificial Intelligence. Elsevier Science Publishers, Amsterdam, Netherlands, (1986).
[66]. Kosko, B. Fuzzy Thinking: The New Science of Fuzzy Logic. Art House, (1993).
[67]. Zadeh, L. A. “Soft Computing and Fuzzy Logic.” IEEE Software, 11, no. 6 (1994): 48-56.
[68]. Jang, J.-S.R., C.-T. Sun, and E. Mizutani. Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. United States of America: Prentice Hall, (1997).
[69]. Jang, J.-S.R., and C.-T. Sun. “Neuro-Fuzzy Modeling and Control.” Proceedings of the IEEE, 83, no. 3 (1995): 378-406.
[70]. Takagi, T., and M. Sugeno. “Fuzzy Identification of Systems and Its Application to Modeling and Control.” IEEE Transactions on Systems, Man, and Cybernetics, 15, (1985): 116-132.
[71]. Kosko, B. Neural Networks and Fuzzy Systems: A Dynamical Systems Approach. Englewood Cliffs, N.J.: Prentice Hall, (1991).
[72]. Jang, J.-S.R. “ANFIS: Adaptive-Network-Based Fuzzy Inference System.” IEEE Transactions on Systems, Man and Cybernetics, 23, no. 3 (1993): 665-685.
[73]. McCulloch, W.S., and W. Pitts. “A Logical Calculus of the Ideas Immanent in Nervous Activity.” Bulletin of Mathematical Biophysics, 5, (1943): 115-133.
[74]. Rosenblatt, F. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Washington DC: Spartan, (1962).
[75]. Rumelhart, D. E., G.E. Hinton, and R.J. Williams. “Learning Internal Representations by Error Propagation.” Parallel Distributed Processing: Explorations in the Microstructure of Cognition, 1, (1986): 318-362.
[76]. Haykin, S. Neural Networks: A Comprehensive Foundation. Macmillan College Publishing Company, New York, (1994).
[77]. Kosko, B. “Fuzziness vs. Probability.” International Journal of General Systems, 17, no. 2-3 (1990): 211-240.