CHAPTER 2

LITERATURE REVIEW

2.1. INTRODUCTION

This chapter reviews the methods and the state of the art in the evaluation of students' academic performance. It first covers the prediction of student performance, student modeling, the detection of undesirable student behaviors and the grouping of students. It then describes in detail selected methods used for evaluating students' academic performance, together with soft computing techniques.

2.2. PREDICTING STUDENT PERFORMANCE

Prediction of student performance aims to estimate the unknown value of a variable that describes the student's performance, knowledge, score or mark. This value can be numerical/continuous (a regression task) or categorical/discrete (a classification task). Regression analysis determines the relationship between dependent and independent variables [1]. In classification, individual items are assigned to groups on the basis of their quantitative characteristics and a training set of previously labeled items [2]. Prediction of student performance is one of the newest and most popular applications of soft computing techniques in education. Different techniques and models have been applied for this purpose (fuzzy logic, neural networks, Bayesian networks, rule-based systems, regression and correlation analysis).

Machine learning methods have been compared for predicting success in a course (either passed or failed) in Intelligent Tutoring Systems. Other comparisons of data mining algorithms have been made to classify students (predict final marks) based on Moodle usage data [3] and to predict student performance/final grade on the basis of features extracted from logged data [4].

Neural network models based on back-propagation and feed-forward networks have frequently been used to predict final student grades. Feed-forward and back-propagation networks have also been used to predict the number of errors a student will make, whereas back-propagation and counter-propagation techniques have generally been used to predict performance from test scores. Radial basis functions have been used to predict whether students pass or fail from Moodle logs [5], and a multilayer perceptron topology has been used to predict the performance of a candidate being considered for admission into the university [6]. Within a tutoring system, Bayesian networks have been used to predict student performance and to assess knowledge [7].

Various rule-based systems have been applied to predict student performance (mark prediction): monitoring and evaluation of student academic performance using rule induction [8]; prediction of final grades from data logged in a web-based education system (using a genetic algorithm to find association rules) [9]; prediction of grades in learning management systems (using grammar-guided genetic programming); and prediction of student performance in order to provide timely lessons in web-based e-learning systems (using decision trees) [10].

Regression techniques (model trees, neural networks, linear regression, locally weighted linear regression and support vector machines) have been used to predict students' marks in an open university. Student performance has been assessed from log data and test scores in web-based instruction using a multivariable regression model. Other reported applications include prediction of student academic performance (using stepwise linear regression); identification of variables that could predict success in college courses (using multiple regression) [11]; prediction of university students' satisfaction (using regression and decision tree analysis) [12]; and determination of the time at which a student will get a question correct, together with association rules that guide a search process to find transfer models predicting a student's success (using logistic regression). The probability of a student giving the correct answer to a problem in an ITS has also been predicted (using a robust ridge regression algorithm) [13].

Correlation analysis has also been applied to predict web-student performance in online classes, to predict exam scores in online tutoring, and to predict high school students' probability of success in university [14].

Fuzzy mathematical modeling provides a solution in the area of performance measurement and evaluation. An effective performance evaluation system can play a crucial role in an organization's efforts to gain competitive advantage, for example by motivating peak individual performance and improving the quality of students [15]. In one student performance evaluation system based on fuzzy inference, the FIS tool in MATLAB is used to build a Mamdani fuzzy inference system. The integration of such fuzzy knowledge requires a methodology for converting fuzzy data into crisp data for quantitative analysis [16]. The Fuzzy Analytic Hierarchy Process (FAHP) is a frequently used multi-criteria decision-making technique for ranking students and teachers. Because the quality of a teacher is fuzzy in nature, the FAHP approach deals well with this situation and ranks teachers on the basis of multiple conflicting criteria [17]. A model using a fuzzy logic approach has been proposed to predict the risk status of students based on some predictive factors. Basic information that correlates with students' academic achievement, along with other predictive variables, has been modeled; the simulated model shows some degree of risk associated with past academic achievement [18].

Saxena and Singh presented a simulation of a neuro-fuzzy application for analyzing students' performance based on their CPA and GPA, as an extension of the analysis of students' performance using fuzzy systems [19]. Fuzzy Association Rule Mining (FARM) has shown potential for identifying the hidden relationships between students' pre-admission profiles and their academic performance [20]. The advantage of neural networks is their ability to learn and adapt to new data. Fuzzy systems, on the other hand, can handle numerical data and linguistic knowledge simultaneously [21].

2.3. STUDENT MODELING

The objective of student modeling is to develop cognitive models of human users/students, including their skills and declarative knowledge. Data mining has been applied to automatically capture traits such as motivation, satisfaction, learning styles and affective status, as well as learning behavior, in order to automate the construction of student models. This goal may be achieved via data mining (DM) techniques and algorithms (mainly Bayesian networks). Several data mining algorithms (Naïve Bayes, Bayes net, support vector machines, logistic regression and decision trees) have been compared for detecting student mental models in intelligent tutoring systems [22]. Unsupervised (clustering) and supervised (classification) machine learning have been proposed to reduce development costs in building user models and to facilitate transferability in intelligent learning environments. Bayesian networks have been used to make predictions about student knowledge, i.e. the probability that a student has acquired a skill through cognitive tutors, and about students' learning status in web-based education systems [23].

Cognitive and non-cognitive measures of students, along with background information, have been used to design predictive models of student performance based on artificial neural networks (ANN). These predictions constitute a true predictive classification of academic performance, anticipating the actual academic performance one year in advance [24]. In the Linear Programming Intelligent Tutoring System (LP-ITS), artificial neural networks and expert systems are used to obtain knowledge for the learner model; the system determines the academic performance level of learners in order to offer linear programming problems of the proper difficulty level. LP-ITS uses a feed-forward back-propagation algorithm, trained on data from a group of learners, to predict their academic performance [25]. The accurate prediction of student academic performance is of prime importance for making admission decisions as well as for providing better educational services. Two models, the hierarchical Adaptive Neuro-Fuzzy Inference System (ANFIS) and an artificial neural network (ANN), have been proposed to predict students' academic performance [26]. Student performance has also been evaluated on the basis of selected attributes from which rules are generated by means of association rule mining, with an artificial neural network checking the accuracy of the results [27].

Three supervised data mining algorithms (Naïve Bayes, multilayer perceptron and the J48 decision tree) have been applied to preoperative assessment data to predict success in a course (either passed or failed), and the learning methods have been compared on their predictive accuracy, ease of learning and user-friendly characteristics. The results indicate that the Naïve Bayes classifier outperforms the decision tree and neural network methods in prediction [28]. Data mining has also been used to predict the intra-year academic performance of students using historic data and students' final grades [29-30].

Educational Data Mining is a promising discipline that has an important impact on predicting students' academic performance. Student performance has been evaluated using an association rule mining algorithm [31]. Students' intermediate mental steps have also been predicted from the sequences of actions stored by problem-solving learning environments. Association rule algorithms have been applied for personality mining based on web-based education models in order to deduce learners' personality characteristics [32]. Meaningful characteristics have been extracted and the model updated to reflect newly gained knowledge. Self-organizing maps and principal component analysis have been applied for predictive and compositional modeling of the student profile [33].

2.4. DETECTING UNDESIRABLE STUDENT BEHAVIORS

Detection of undesirable student behavior aims to find students who have some problem or show unusual behavior such as erroneous actions, low motivation, playing games, misuse, cheating, dropping out or academic failure. Several soft computing techniques (predominantly classification and clustering) have been used to identify such students so that they may be appropriately supported. The classification algorithms used for predicting, understanding and preventing academic failure include decision trees, neural networks, naïve Bayes, instance-based learning, logistic regression, support vector machines, feed-forward neural networks, probabilistic ensemble simplified fuzzy ARTMAP, Bayesian nets, simple logic classification, attribute selected classification, bagging, classification via regression, Bayesian classifiers, logistic models, rule-based learners, random forests, the C4.5 and J48 decision tree algorithms, the FarthestFirst clustering algorithm, and an algorithm for the automatic identification of students' cognitive styles [34-38].

Discriminant analysis, neural networks, random forests and decision trees have been used to classify university students into low-risk, medium-risk and high-risk categories of failing [39]. Decision tree algorithms help in the early identification of dropouts and of students who need special attention, and allow the teacher to provide appropriate advising/counseling [40]. In Educational Data Mining, hidden knowledge that indicates a student's terminal performance can be retrieved through data mining techniques [41]. Among the clustering techniques used for this purpose, the prominent ones are Kohonen nets to detect students who cheat in online assessments, and outlier detection methods to detect learners' irregular learning [42-43].

2.5. GROUPING STUDENTS

The creation of student groups draws on students' customized features and personal characteristics, and the groups can be used by the instructor/developer for various purposes such as building a personalized learning system, promoting effective group learning and providing adaptive content. The data mining (DM) techniques used for this purpose are classification (supervised learning) and clustering (unsupervised learning). Cluster analysis, or clustering, is the assignment of a set of observations into subsets (called clusters) based on maximum possible similarity [44].

Various clustering algorithms have been used to group students. The prominent ones are: hierarchical agglomerative clustering, K-means and model-based clustering to identify groups of students with similar skill profiles [45]; a clustering algorithm based on large generalized sequences to find groups of students with similar learning characteristics [46]; a hierarchical clustering algorithm for user modeling (learning styles) in intelligent e-learning systems in order to group students according to their individual learning style preferences [47]; hybrid clustering and Bayesian networks to group students according to their skills [48]; improved matrix-based clustering for grouping learners by characteristics in e-learning [49]; a fuzzy clustering algorithm to find groups of learners according to their personality and learning strategy [50]; the Expectation-Maximization algorithm to form heterogeneous groups according to student skills; a K-means clustering algorithm to discover interesting patterns that characterize the work of stronger and weaker students [51]; multiple correspondence analysis, with cross-validation by correlation analysis, to identify learning styles in Index of Learning Styles (ILS) questionnaires [52]; two-step cluster analysis to classify how students organize personal information spaces (piling, one-folder, small-folders and big-folder filing) [53]; hierarchical cluster analysis to establish the proportion of students who get an exercise wrong or right; and a genetic clustering algorithm to solve the problem of allocating new [54].

2.6. EXISTING FUZZY APPROACH

Student performance evaluation requires consideration of evidence collected via various modes of assessment such as practicals, examinations and observations. All of these involve awarding scores as numerical values, and grades that are often expressed in linguistic terms such as good, bad, satisfactory or excellent [55]. These linguistic terms carry imprecision that may arise from human interpretation and from the different means of implementing the evaluation. The use of linguistic terms in assessing performance has been the main reason for researchers to apply fuzzy techniques to student performance evaluation. It has been argued that one of the most appropriate ways of handling multiple variables that contain imprecise data is fuzzy logic reasoning, which reflects the way humans think. The important reasons for using a fuzzy approach in educational grading systems include the substantial vagueness present in educational systems and the ability of fuzzy theory to accommodate subjective judgment [56]. Law reinforces the use of fuzzy techniques for student performance evaluation by giving a list of reasons [57]:

1. Scores/marks given for student performance are not very precise.

2. Examinations consist of vague data.

3. A common method of grading students is the use of linguistic variables.

The fuzzy approach to evaluating student performance involves three important tasks: fuzzification, inference and defuzzification. In general, student scores or marks (crisp values) have to be transformed into fuzzy input values by means of suitable membership functions before aggregation. Fuzzy values can also be obtained directly from domain experts, in which case fuzzification is not needed. The outputs of fuzzy inference are typically fuzzy values representing a student's performance. These fuzzy values need to be transformed back into crisp values in order to produce an output that is easy to understand, e.g. a percentage mark. Four prominent approaches used for the evaluation of student performance are described in detail in the subsections that follow.
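As an illustration of the three tasks named above (fuzzification, inference and defuzzification), the following minimal Python sketch fuzzifies two crisp marks, aggregates them and converts the result back into a crisp mark. The triangular partitions, the mean-based aggregation, the representative output values and all names (`triangular`, `SETS`, `REPRESENTATIVE`, `evaluate`) are assumptions made for this example; they are not taken from any of the reviewed approaches.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical partition of the 0-100 mark scale. The outer feet lie just
# outside the scale so that marks of 0 and 100 receive full membership.
SETS = {
    "poor": (-1.0, 0.0, 50.0),
    "average": (25.0, 50.0, 75.0),
    "good": (50.0, 100.0, 101.0),
}

# Hypothetical crisp value standing for each output term.
REPRESENTATIVE = {"poor": 25.0, "average": 50.0, "good": 85.0}


def fuzzify(mark):
    """Fuzzification: crisp mark -> membership degree in every term."""
    return {term: triangular(mark, *abc) for term, abc in SETS.items()}


def infer(fuzzy_marks):
    """Inference (assumed here): average the memberships term by term."""
    n = len(fuzzy_marks)
    return {t: sum(fm[t] for fm in fuzzy_marks) / n for t in SETS}


def defuzzify(fuzzy_result):
    """Defuzzification: membership-weighted mean of representative values."""
    total = sum(fuzzy_result.values())
    if total == 0:
        return 0.0
    return sum(fuzzy_result[t] * REPRESENTATIVE[t] for t in SETS) / total


def evaluate(marks):
    """Run the three tasks in sequence on a list of crisp marks."""
    return defuzzify(infer([fuzzify(m) for m in marks]))


score = evaluate([60.0, 80.0])  # e.g. an exam mark and a practical mark
print(round(score, 2))
```

Other membership shapes, rule bases or defuzzification methods could be substituted without changing the overall flow of the sketch.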

2.6.1. Biswas's Approach

Biswas's fuzzy technique for evaluating students' answerscripts [56] employs the idea of fuzzy similarity, which is defined as follows. For two discrete fuzzy sets Q and M, their similarity is:

$$S(Q, M) = \frac{\sum_{i} \mu_Q(x_i)\,\mu_M(x_i)}{\max\left(\sum_{i} \mu_Q(x_i)^2,\ \sum_{i} \mu_M(x_i)^2\right)} \tag{2.1}$$

where $x_i$, i = 1, 2, …, are the domain elements. Obviously S(Q, M) ∈ [0, 1]. Also, the larger the value of S(Q, M), the greater the similarity between the fuzzy sets Q and M. In Biswas's work, this measure is used to compare a student's performance, expressed in fuzzy values, with Standard Fuzzy Sets (SFS), which are predefined with membership values corresponding to different levels of student performance. The SFS are devised by experts according to the standard fixed by the educational authority. The SFS refer to levels of student performance such as Excellent (A), Very Good (B), Good (C), Satisfactory (D) and Unsatisfactory (F) (Table 2.1). Initially, the evaluator awards fuzzy marks for each question (Qi) in a fuzzy grade sheet containing rows for questions and columns for awarding marks. A matching operation is then performed according to definition (2.1) between each question (Qi) and each level of performance A, B, C, D and F, to obtain the similarity values S(Qi,A), S(Qi,B), S(Qi,C), S(Qi,D) and S(Qi,F). The grade for each question is the level of performance with the maximum similarity value. The total score is computed from the marks allocated for each question and the mid-grade point of the grade awarded to it (Table 2.2):

$$TS = \frac{1}{100} \sum_{i} \left[ T(Q_i) \cdot P(g_i) \right] \tag{2.2}$$

where T(Qi) are the marks allocated for each question and P(gi) are the mid-grade points. The total score (TS) is a crisp value in [0, 100]. The new final grade is then determined from the crisp interval, defined for each level of performance, in which TS falls.

Table 2.1: Standard Fuzzy Sets to Represent Student Performance

| S.No. | Linguistic Term | Fuzzy Set |
|-------|-----------------|-----------|
| 1. | Excellent | {0/0, 0/20, 0.8/40, 0.9/60, 1/80, 1/100} |
| 2. | Very Good | {0/0, 0/20, 0.8/40, 0.9/60, 0.9/80, 0.8/100} |
| 3. | Good | {0/0, 0/20, 0.8/40, 0.9/60, 0.9/80, 0.8/100} |
| 4. | Satisfactory | {0.4/0, 0.4/20, 0.9/40, 0.6/60, 0.2/80, 0/100} |
| 5. | Unsatisfactory | {1/0, 1/20, 0.4/40, 0.2/60, 0/80, 0/100} |

Table 2.2: Grade and their Corresponding Mid-Grade Points

| S.No. | Linguistic Term | Grade (range of marks A) | Mid-Grade Point |
|-------|-----------------|--------------------------|-----------------|
| 1. | Excellent | 90 to 100 | 95 |
| 2. | Very Good | 80 to 90 | 85 |
| 3. | Good | 50 to 70 | 60 |
| 4. | Satisfactory | 30 to 50 | 40 |
| 5. | Unsatisfactory | 0 to 30 | 15 |
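The matching and scoring steps of Biswas's approach can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the similarity is taken as the dot product of the two membership vectors divided by the larger of their squared norms, the allocated marks are assumed to sum to 100 and the total is divided by 100 so that it stays in [0, 100], and the two fuzzy marks `q1` and `q2` are invented for the example. The Standard Fuzzy Sets and mid-grade points are copied as printed in Tables 2.1 and 2.2, and the grade letters follow the text.

```python
# Standard Fuzzy Sets over the domain {0, 20, 40, 60, 80, 100}, copied as
# printed in Table 2.1 (the rows for B and C are identical there).
SFS = {
    "A": [0.0, 0.0, 0.8, 0.9, 1.0, 1.0],
    "B": [0.0, 0.0, 0.8, 0.9, 0.9, 0.8],
    "C": [0.0, 0.0, 0.8, 0.9, 0.9, 0.8],
    "D": [0.4, 0.4, 0.9, 0.6, 0.2, 0.0],
    "F": [1.0, 1.0, 0.4, 0.2, 0.0, 0.0],
}

# Mid-grade points as printed in Table 2.2.
MID_GRADE = {"A": 95.0, "B": 85.0, "C": 60.0, "D": 40.0, "F": 15.0}


def similarity(q, m):
    """Assumed form of (2.1): dot product over the larger squared norm."""
    dot = sum(a * b for a, b in zip(q, m))
    return dot / max(sum(a * a for a in q), sum(b * b for b in m))


def grade(fuzzy_mark):
    """Grade whose Standard Fuzzy Set is most similar to the fuzzy mark.

    max() keeps the first of several equal candidates, so a tie between
    B and C would resolve to B.
    """
    return max(SFS, key=lambda g: similarity(fuzzy_mark, SFS[g]))


def total_score(fuzzy_marks, allocated):
    """Assumed form of (2.2): marks times mid-grade points, over 100."""
    weighted = sum(t * MID_GRADE[grade(q)]
                   for q, t in zip(fuzzy_marks, allocated))
    return weighted / 100.0


# Hypothetical fuzzy marks for two questions worth 60 and 40 marks.
q1 = [0.0, 0.0, 0.7, 0.9, 1.0, 1.0]
q2 = [1.0, 0.8, 0.3, 0.0, 0.0, 0.0]

print(grade(q1), grade(q2))
print(total_score([q1, q2], [60.0, 40.0]))
```

With these inputs the first question is matched to grade A and the second to grade F, which shows how strongly the mid-grade points drive the total.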

Although this technique shows the usefulness of fuzzy membership values for aggregating marks from different questions, it has the following disadvantages:

1. The use of the fuzzy grade sheet (to obtain fuzzy marks) is confusing because the fuzzy marks do not refer to the levels of performance.

2. Computing the matching operations between the fuzzy marks and each of the SFS may be time consuming.

3. The method also suffers from the use of mid-grade points in the calculation of the total score. These values may greatly influence the total score and can thus create unexpected results.

2.6.2. Chen and Lee's Approach

Chen and Lee's technique aims to resolve the drawbacks of the method outlined above for the evaluation of student answerscripts [58]. In this approach, the degree of satisfaction is defined in advance by experts with respect to levels of performance. In this way, the maximum degree of satisfaction per level is obtained, as summarized in Table 2.3, which also shows the eleven levels of student performance that have been proposed and used. The evaluator awards fuzzy marks in the fuzzy grade sheet for each question (Qi) according to level of performance. From this, the degree of satisfaction for each question is calculated as:

$$D(Q) = \frac{\sum_{i} \mu_Q(x_i)\, F(x_i)}{\sum_{i} \mu_Q(x_i)} \tag{2.3}$$

where $\mu_Q(x_i)$ are the membership values awarded to each level of performance and F(xi) is the respective maximum degree of satisfaction.

The final step of the method is to calculate the total score TS over the questions as follows:

$$TS = \sum_{i} T(Q_i) \cdot D(Q_i) \tag{2.4}$$

where T(Qi) is the marks allocated for each question by the evaluator and D(Qi) is the computed degree of satisfaction for Qi.

Table 2.3: Degrees of Satisfaction According to Performance Level

| S.No. | Satisfaction Level | Degrees of Satisfaction (%) | Maximum Degree of Satisfaction |
|-------|--------------------|-----------------------------|--------------------------------|
| 1. | Extremely good (EG) | 100 | 1.00 |
| 2. | Very very good (VVG) | 91-99 | 0.99 |
| 3. | Very good (VG) | 81-90 | 0.90 |
| 4. | Good (G) | 71-80 | 0.80 |
| 5. | More or less good (MG) | 61-70 | 0.70 |
| 6. | Fair (F) | 51-60 | 0.60 |
| 7. | More or less bad (MB) | 41-50 | 0.50 |
| 8. | Bad (B) | 25-40 | 0.40 |
| 9. | Very bad (VB) | 10-24 | 0.24 |
| 10. | Very very bad (VVB) | 01-09 | 0.09 |
| 11. | Extremely bad (EB) | 0 | 0.00 |
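The degree of satisfaction and the total score of Chen and Lee's approach can be sketched in Python as shown below. The sketch assumes that the degree of satisfaction of a question is the membership-weighted mean of the maximum degrees of satisfaction from Table 2.3; the two fuzzy marks and the marks allocated to the questions are hypothetical.

```python
# Maximum degree of satisfaction per level, as printed in Table 2.3.
MAX_SATISFACTION = {
    "EG": 1.00, "VVG": 0.99, "VG": 0.90, "G": 0.80, "MG": 0.70, "F": 0.60,
    "MB": 0.50, "B": 0.40, "VB": 0.24, "VVB": 0.09, "EB": 0.00,
}


def degree_of_satisfaction(fuzzy_mark):
    """Assumed form of (2.3): weighted mean of the maximum degrees.

    fuzzy_mark maps a satisfaction level to the membership value awarded
    by the evaluator; levels that are not listed count as zero.
    """
    total = sum(fuzzy_mark.values())
    if total == 0:
        return 0.0
    weighted = sum(mu * MAX_SATISFACTION[level]
                   for level, mu in fuzzy_mark.items())
    return weighted / total


def total_score(fuzzy_marks, allocated):
    """Equation (2.4): allocated marks times degree of satisfaction."""
    return sum(t * degree_of_satisfaction(q)
               for q, t in zip(fuzzy_marks, allocated))


# Hypothetical fuzzy marks for two questions worth 40 and 60 marks.
q1 = {"VG": 1.0}
q2 = {"G": 0.5, "MG": 0.5}

print(degree_of_satisfaction(q1), degree_of_satisfaction(q2))
print(total_score([q1, q2], [40.0, 60.0]))
```

Because every level contributes only its maximum degree, an answer judged "Very good" can never score above 0.90 of the allocated marks in this sketch.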

A grade is awarded from TS based on the predefined satisfaction levels. This technique also has several disadvantages:

1. The use of the maximum degree of satisfaction is confusing, and the results of the aggregation are biased by the number of satisfaction levels created.

2. The lower the satisfaction level, the greater the difference between the original score and the new score.

3. The extended fuzzy grade sheet used to award fuzzy marks may not be practical when the problem scales up, as it involves awarding too many fuzzy values to evaluate each question.

4. This problem becomes worse as the number of questions or modes of assessment increases.

2.6.3. Law's Approach

Law proposed an alternative approach to student performance evaluation based on the notion of fuzzy expected values [57]. The fuzzy expected value of a fuzzy set A is defined as:

$$E(A) = \frac{\int x\,\mu_A(x)\,f(x)\,dx}{\int \mu_A(x)\,f(x)\,dx} \tag{2.5}$$

with $\mu_A(x)$ being the membership function of x in A and f(x) being the distribution function of x in A. In contrast to the other methods, in Law's approach the original student scores have to be represented as crisp values. Fuzzification is used to transform such scores into fuzzy values. The fuzzy partitions underlying the fuzzification are defined in advance by experts based on an expectation of the percentage of students who will achieve a certain level of performance (one of the five grades A, B, C, D and F). A fuzzy assessment matrix, M, is created from the fuzzified values as given in equation (2.6):

$$M = \begin{bmatrix}
\mu_A(Q_1) & \mu_B(Q_1) & \mu_C(Q_1) & \mu_D(Q_1) & \mu_F(Q_1) \\
\mu_A(Q_2) & \mu_B(Q_2) & \mu_C(Q_2) & \mu_D(Q_2) & \mu_F(Q_2) \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
\mu_A(Q_n) & \mu_B(Q_n) & \mu_C(Q_n) & \mu_D(Q_n) & \mu_F(Q_n)
\end{bmatrix} \tag{2.6}$$

The matrix is employed in conjunction with the fuzzy expected values for each level of performance to compute an intermediate new score vector (one new score per question):

$$NS = M \cdot [E(A), E(B), E(C), E(D), E(F)]^{T} \tag{2.7}$$

where the expected values for each level of performance E(A), E(B), E(C), E(D) and E(F) are calculated using equation (2.5) and the same fuzzy partitions mentioned above. This new vector is then used to calculate the core of the total score (CTS):

$$CTS = \sum_{j} D(Q_j) \cdot NS_j \tag{2.8}$$

where D(Qj) are the full percentage marks allocated for each question. Since CTS ∈ (0, 1), the final total score TS is set to CTS × 100 to obtain a readily understandable mark for student performance. The approach demonstrates the advantage of using fuzzy expected values in student performance evaluation. The disadvantages of this technique are given below:

1. Although it may be useful to obtain evaluation results according to expert expectation, the resulting new total score and grade may not reflect the actual performance of the student on the subject matter. This is because the initial fuzzy partitions may not be specified with regard to the students' performance.

2. The method works with a single evaluation criterion, so it cannot assess a student's performance based on multiple criteria. In addition, the method involves extensive computation, which limits its applicability.
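The aggregation steps of Law's approach, i.e. the new score vector and the core of the total score, can be sketched in Python as follows. The sketch assumes that the fuzzy assessment matrix has one row per question and one column per grade, that the expected values are already available on a 0 to 1 scale, and that the marks allocated to the questions are expressed as fractions that sum to 1. All numeric values and the names `EXPECTED`, `M` and `D` are hypothetical.

```python
GRADES = ["A", "B", "C", "D", "F"]

# Hypothetical fuzzy expected values E(A), ..., E(F) on a 0-1 scale. In
# Law's approach they would come from the expert-defined fuzzy partitions.
EXPECTED = [0.90, 0.75, 0.60, 0.45, 0.20]


def new_scores(matrix, expected):
    """Matrix-vector product: one new score per question (row)."""
    return [sum(mu * e for mu, e in zip(row, expected)) for row in matrix]


def core_total_score(allocated, scores):
    """Weighted sum of the new scores by the allocated fractions."""
    return sum(d * ns for d, ns in zip(allocated, scores))


# Hypothetical fuzzy assessment matrix for two questions; each row holds
# the memberships of one question in the grades A, B, C, D and F.
M = [
    [0.5, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5, 0.0],
]
D = [0.6, 0.4]  # fraction of the full marks allocated to each question

NS = new_scores(M, EXPECTED)
CTS = core_total_score(D, NS)
TS = CTS * 100.0

print(NS, round(TS, 2))
```

The final mark depends on the student's answers only through the memberships in M; the expected values are fixed in advance, which is the source of the first disadvantage listed above.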

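2.6.4. Rasmani and Shen's Method

Rasmani and Shen proposed a special fuzzy inference technique together with a data-driven fuzzy rule identification method that allows the addition of expert knowledge [55], with the aim of obtaining user-comprehensible knowledge from historical data so that the evaluation can be justified. The suggested inference technique, called weighted fuzzy subsethood-based reasoning, was developed for multiple-input single-output (MISO) fuzzy systems that apply rules of the form:

$$\begin{aligned}
&\text{IF } A_1 \text{ is } [\,w(E_i, A_{11}) \cdot A_{11} \text{ OR } w(E_i, A_{12}) \cdot A_{12} \text{ OR } \dots \text{ OR } w(E_i, A_{1j}) \cdot A_{1j} \text{ OR } \dots \text{ OR } w(E_i, A_{1n_1}) \cdot A_{1n_1}\,] \text{ AND} \\
&A_2 \text{ is } [\,w(E_i, A_{21}) \cdot A_{21} \text{ OR } w(E_i, A_{22}) \cdot A_{22} \text{ OR } \dots \text{ OR } w(E_i, A_{2j}) \cdot A_{2j} \text{ OR } \dots \text{ OR } w(E_i, A_{2n_2}) \cdot A_{2n_2}\,] \text{ AND } \dots \text{ AND} \\
&A_k \text{ is } [\,w(E_i, A_{k1}) \cdot A_{k1} \text{ OR } w(E_i, A_{k2}) \cdot A_{k2} \text{ OR } \dots \text{ OR } w(E_i, A_{kj}) \cdot A_{kj} \text{ OR } \dots \text{ OR } w(E_i, A_{kn_k}) \cdot A_{kn_k}\,] \text{ AND } \dots \text{ AND} \\
&A_m \text{ is } [\,w(E_i, A_{m1}) \cdot A_{m1} \text{ OR } w(E_i, A_{m2}) \cdot A_{m2} \text{ OR } \dots \text{ OR } w(E_i, A_{mj}) \cdot A_{mj} \text{ OR } \dots \text{ OR } w(E_i, A_{mn_m}) \cdot A_{mn_m}\,] \\
&\text{THEN } B \text{ is } E_i
\end{aligned}$$

where m is the number of antecedent dimensions, $A_k$, $k \in [1, m]$, are the antecedent linguistic variables, $n_k$ is the number of linguistic terms in the kth antecedent dimension, B is the consequent linguistic variable, $E_i$, $i \in [1, N]$, is the ith consequent linguistic term, N is the number of consequent linguistic terms, and $w(E_i, A_{kj})$ is the relative weight of the antecedent linguistic term $A_{kj}$. The weight expresses the influence of the set $A_{kj}$ on the conclusion drawn. It is obtained by normalizing the fuzzy subsethood value of the set:

$$w(E_i, A_{kj}) = \frac{S(E_i, A_{kj})}{\max_{l = 1, \dots, n_k} S(E_i, A_{kl})} \tag{2.9}$$

The fuzzy subsethood value S represents in this case the degree to which the fuzzy set $A_{kj}$ is a subset of the fuzzy set $E_i$. It is calculated as:

$$S(E_i, A_{kj}) = \frac{\sum_{x \in U} \nabla\left(\mu_{E_i}(x),\ \mu_{A_{kj}}(x)\right)}{\sum_{x \in U} \mu_{E_i}(x)} \tag{2.10}$$

where U is the universe of discourse, μ is the membership function, and ∇ is an arbitrary t-norm.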

The rule base contains only one rule for each consequent linguistic term. The first step of the fuzzy inference is the calculation of the overall weight of each rule by applying arbitrary disjunction and conjunction operators to the antecedent side. Next, the rule with the highest weight is selected, and its consequent becomes the final score of the student. The rule base is identified in the following steps:

1. Create the input and output partitions.

2. Divide the training dataset into subgroups according to the output linguistic terms.

3. Calculate the fuzzy subsethood values for each subgroup.

4. Calculate the weight for each linguistic term.

5. Create rules of the form given above.

6. Test the rule base using a test dataset.
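Steps 3 and 4 above, the calculation of subsethood values and relative weights, can be sketched in Python for a discrete universe of discourse. The sketch assumes the reading of (2.10) in which the membership values of the first argument appear in the denominator, uses the minimum as the t-norm, and normalizes by the largest subsethood value within one antecedent dimension as in (2.9). The membership vectors and the term names `low` and `high` are hypothetical.

```python
def subsethood(e, a, t_norm=min):
    """Assumed reading of (2.10) on a discrete universe.

    e and a are membership vectors over the same universe; the t-norm is
    applied elementwise and the sum is divided by the sum of e.
    """
    denominator = sum(e)
    if denominator == 0:
        return 0.0
    return sum(t_norm(me, ma) for me, ma in zip(e, a)) / denominator


def relative_weights(e, terms, t_norm=min):
    """Equation (2.9): subsethood values normalized by their maximum.

    terms maps each linguistic term of ONE antecedent dimension to its
    membership vector.
    """
    values = {name: subsethood(e, mu, t_norm) for name, mu in terms.items()}
    top = max(values.values())
    return {name: (v / top if top > 0 else 0.0)
            for name, v in values.items()}


# Hypothetical consequent term E_i and two antecedent terms of one
# dimension, all defined on a three-element universe.
E = [0.2, 0.6, 1.0]
TERMS = {
    "low": [1.0, 0.5, 0.0],
    "high": [0.0, 0.5, 1.0],
}

weights = relative_weights(E, TERMS)
print(weights)
```

Any other t-norm that takes two arguments, for example the product `lambda x, y: x * y`, can be passed as `t_norm` in place of the minimum.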

The main advantage of this method is that it requires a rule base with a low number of rules, equal to the number of output linguistic terms. In addition, it allows questions/tests to be evaluated with fuzzy numbers. The technique also suffers from a few disadvantages:

1. It is not clear how the antecedent and consequent are determined, or what the fuzzy subsethood values mean in the context of evaluating a student's academic performance.

2. The number of fuzzy 'IF-THEN' rules is at its maximum, which makes the computation process more complex.

3. Some rules are not used by the inference mechanism.

2.7. SOFT COMPUTING

Soft computing is a collection of methodologies that aim to exploit the tolerance for imprecision and uncertainty to achieve tractability, robustness and low solution cost [59]. The term soft computing was proposed by Lotfi A. Zadeh, the inventor of fuzzy logic. Its principal constituents are fuzzy logic, neurocomputing and probabilistic reasoning. Soft computing is likely to play an increasingly important role in many application areas, including software engineering. The role model for soft computing is the human mind.

Soft computing, not precisely defined, consists of distinct concepts and techniques

which aim to overcome the difficulties encountered in real world problems. These

problems result from the fact that our world seems to be imprecise, uncertain and

difficult to categorize. For example, the uncertainty in a measured quantity is due

to inherent variations in the measurement process itself. The uncertainty in a result

is due to the combined and accumulated effects of these measurement

uncertainties which were used in the calculation of that result [60].

In many cases, an increase in precision and certainty can be achieved only with a great deal of work and cost. Zadeh gives as an example the travelling salesman problem, in which the computation time is a function of accuracy and increases exponentially [59]. Another possible definition of soft computing is to consider it as the antithesis of the computer as we now know it, which can be described with adjectives such as hard, crisp, rigid, inflexible and stupid. Along this track, one may see soft computing as an attempt to mimic natural creatures (plants, animals and human beings), which are soft, flexible, adaptive and clever. Thus soft computing is the name of a family of problem-solving methods that are analogous to biological reasoning and problem solving (sometimes referred to as cognitive computing). The basic methods included in cognitive computing are fuzzy logic (FL), neural networks (NN) and genetic algorithms (GA), methods which do not derive from classical theories.

Soft computing can also be seen as a foundation for the growing field of

computational intelligence (CI). The difference between traditional artificial

intelligence (AI) and computational intelligence is that AI is based on hard

computing whereas CI is based on soft computing. Soft computing is not just a mixture of these ingredients, but a discipline in which each constituent contributes a distinct methodology for addressing problems in its domain in a complementary rather than a competitive way [59].

Soft computing methods have been applied to many real-world problems. Applications can be found in signal processing, pattern recognition, quality assurance and industrial inspection, business forecasting, speech processing, credit rating, adaptive process control, robotics control, natural language understanding, etc. Possible new application areas include programming languages, user-friendly application interfaces, automated programming, computer networks, database management, fault diagnostics and information security [59].

Fuzzy logic is mainly associated with imprecision, approximate reasoning and computing with words; neurocomputing with learning and curve fitting (classification); and probabilistic reasoning with uncertainty and belief propagation (belief networks). These methods share several properties: they are nonlinear and able to deal with non-linearities, follow more human-like reasoning paths, utilize self-learning, utilize yet-to-be-proven theorems and are robust in the presence of noise or errors.

Fuzzy logic systems and neural networks share several similarities [61]: both estimate functions from sample data and are dynamic systems that can be expressed as a graph made up of nodes and edges; both convert numerical inputs to numerical outputs, process inexact information inexactly, have the same state space, produce bounded signals, learn an unknown probability function p(x) and act as associative memories; a set of n neurons defines n-dimensional fuzzy sets; and both can model any system provided the number of nodes is sufficient. The main dissimilarity between a fuzzy logic system (FLS) and a neural network is that an FLS uses heuristic knowledge to form rules and tunes these rules using sample data, whereas a neural network forms “rules” based entirely on data.

In many cases, better results have been achieved by combining different soft computing methods into hybrid systems, the use of which is growing rapidly. A very interesting combination is the neuro-fuzzy architecture, in which the good properties of both methods are brought together. Most neuro-fuzzy systems are fuzzy rule-based systems in which neural network techniques are used for rule induction and calibration. Fuzzy logic may also be employed to improve the performance of optimization methods used with neural networks. For example, it may control the vibration of the direction of the search vector in the quasi-Newton method [62].

2.7.1 Fuzzy Logic

Fuzzy set theory provides a mathematical tool for dealing with the concepts used in natural language (linguistic variables) [63]. Fuzzy logic is basically a multivalued logic that allows intermediate values to be defined between conventional evaluations. The ideas behind fuzzy logic are very old. To devise a concise theory of logic and later mathematics, Aristotle proposed the so-called “Laws of Thought”. One of these, the “Law of the Excluded Middle”, states that every proposition must be either True (T) or False (F). Even when Parmenides proposed the first version of this law (ca. 400 BC) there were strong and immediate objections: for example, Heraclitus proposed that things could be simultaneously true and not true. Plato laid the foundation for fuzzy logic by indicating that there is a third region (beyond T and F) where these opposites tumbled about. A systematic alternative to the bi-valued logic of Aristotle was first proposed by Łukasiewicz around 1920, who described a three-valued logic along with the mathematics to accompany it. The third value he proposed can be translated as “possible”, and he assigned it a numeric value between T and F.

Later, four-valued and five-valued logics were explored, and it was agreed that in principle there is nothing to prevent the derivation of an infinite-valued logic. Łukasiewicz felt that three-valued and infinite-valued logics were the most intriguing, but ultimately settled on a four-valued logic because it seemed to be easily adaptable to Aristotelian logic. Knuth also proposed a three-valued logic similar to Łukasiewicz’s, speculating that mathematics would become even more elegant than in traditional bi-valued logic. The notion of an infinite-valued logic is also evident in Zadeh’s seminal work “Fuzzy Sets”, where the mathematics of fuzzy set theory and, by extension, fuzzy logic is explained. This theory proposed making the membership function (or the values F and T) operate over the range of real numbers [0, 1].

New operations for the calculus of logic were proposed and shown to be, in principle, at least a generalization of classical logic. Fuzzy logic provides an inference morphology that enables approximate human reasoning capabilities to be applied to knowledge-based systems. The theory of fuzzy logic provides a mathematical strength to capture the uncertainties associated with human cognitive processes such as thinking and reasoning. The conventional approaches to knowledge representation lack the means for representing the meaning of fuzzy concepts. As a consequence, the approaches based on first-order logic and classical probability theory do not provide an appropriate conceptual framework for dealing with the representation of commonsense knowledge. Such knowledge is by its nature both lexically imprecise and noncategorical.

The development of fuzzy logic was motivated in large measure by the need for a conceptual framework that can address the issues of uncertainty and lexical imprecision. The essential characteristics of fuzzy logic are as follows [59]:

1. Exact reasoning is viewed as a limiting case of approximate reasoning.

2. Everything is a matter of degree.

3. Knowledge is interpreted as a collection of elastic or, equivalently, fuzzy constraints on a collection of variables.

4. Inference is viewed as a process of propagation of elastic constraints.

Two main characteristics of fuzzy systems give them better performance in specific applications. First, fuzzy systems are suitable for uncertain or approximate reasoning, especially for systems whose mathematical model is difficult to derive. Second, fuzzy logic allows decision making with estimated values under incomplete or uncertain information. In 1972 Zadeh’s colleague Kalman (the inventor of the Kalman filter) commented on fuzzy logic as follows: “Zadeh’s proposal could be severely, ferociously, even brutally criticized from a technical point of view. This would be out of place here. But a blunt question remains: Is Zadeh presenting important ideas or is he indulging in wishful thinking?” [64]. The heaviest criticism has come from probability theoreticians, which is why many fuzzy logic authors include a comparison between probability and fuzzy logic in their publications. Fuzzy researchers try to separate fuzzy logic from probability theory, whereas some probability theoreticians consider fuzzy logic to be probability in disguise [63-64].

Claim: Probability theory is the only correct way of dealing with uncertainty, and anything that can be done with fuzzy logic can be done equally well with probability-based methods. Therefore, fuzzy sets are unnecessary for representing and reasoning about uncertainty and vagueness; probability theory is all that is required. “Close examination shows that the fuzzy approaches have exactly the same representation as the corresponding probabilistic approach and include similar calculi” [65].

Objection: Classical probability theory is not sufficient to express the uncertainty encountered in expert systems. The main limitation is that it is based on two-valued logic: an event either occurs or does not occur, and there is nothing in between. Another limitation is that in reality events are not known with sufficient precision to be represented as real numbers. For example, consider the following information: an urn contains 20 balls of various sizes, several of which are large. One cannot express this within the framework of classical theory or, if it can be done, it cannot be done simply [65-66].

The term fuzzy logic has two meanings. According to the first interpretation (the narrow sense), it is seen as a multi-valued “imprecise” logic and as an extension of the more traditional multi-valued logic. Bart Kosko explains this point of view by emphasizing that in reality everything seems to occur or to be true to a degree. Facts are always fuzzy, vague or inaccurate to some extent. Only mathematics has black-and-white facts, and it is only a collection of artificial rules and symbols. Science deals with gray or fuzzy facts as if they were the black-and-white facts of mathematics. Nobody has presented a fact about the real world that is 100 per cent true or 100 per cent false. This first meaning deals with a kind of model for human reasoning. The second interpretation (the wide sense) is that fuzzy logic = fuzzy set theory. According to this view, any field X can be fuzzified by replacing a set in X by a fuzzy set [67]. For example, set theory, arithmetic, topology, graph theory, probability theory and logic can be fuzzified. This has already been done in neurocomputing, pattern recognition, mathematical programming and stability theory.

If the conventional techniques of system analysis cannot be successfully applied to a modeling or control problem, the use of heuristic linguistic rules may be the most reasonable solution. For example, there is no mathematical model for the truck-and-trailer reversing problem, in which the truck must be guided from an arbitrary initial position to a desired final position. Humans and fuzzy systems can perform this nonlinear control task with relative ease by using practical and at the same time imprecise rules such as “If the trailer turns slightly left, then turn the wheel slightly left”. The most significant application of fuzzy logic has been in the control field: a rough estimate is that 90% of applications are in control. During the last decade many applications have also been found in the educational domain.


2.7.1.1. Fuzzy Set

A classical set is a set with a crisp boundary. For example, a classical set A of real numbers greater than 6 can be expressed as $A = \{x \mid x > 6\}$, where there is a clear, unambiguous boundary 6 such that if x is greater than this number, then x belongs to the set A; otherwise x does not belong to the set. Classical sets do not reflect the nature of human concepts and thoughts, which tend to be abstract and imprecise [63]. In contrast to a classical set, a fuzzy set is a set without a crisp boundary; that is, the transition from “belongs to a set” to “does not belong to a set” is gradual. This smooth transition is characterized by membership functions that give fuzzy sets flexibility in modeling commonly used linguistic expressions, such as “the water is hot” or “the temperature is high”. As Zadeh pointed out in 1965 in his seminal paper entitled “Fuzzy Sets”, such imprecisely defined sets or classes play an important role in human thinking [56]. If X is a collection of objects denoted by x, then a fuzzy set A in X is defined as a set of ordered pairs [68]:

$$A = \{(x, \mu_A(x)) \mid x \in X\} \tag{2.11}$$

where $\mu_A(x)$ is called the membership function for the fuzzy set A. The membership function maps each element of X to a membership grade (or membership value) between 0 and 1. Usually X is referred to as the universe of discourse, or simply the universe, and it may consist of discrete objects or a continuous space [68].

2.7.1.2. Membership Formulation

A fuzzy set is completely characterized by its membership function (MF). To begin with, several classes of parameterized MFs of one dimension (i.e. MFs with a single input) are defined. Generally, triangular, trapezoidal and Gaussian membership functions are used for converting a crisp set into a fuzzy set; they are as follows [68]:

Triangular Membership Function: A triangular MF is specified by three parameters {a, b, c} as follows [68]:

$$\mathrm{triangle}(x; a, b, c) = \max\left(\min\left(\frac{x-a}{b-a}, \frac{c-x}{c-b}\right), 0\right) \tag{2.12}$$

The parameters {a, b, c} (with a < b < c) determine the x coordinates of the three corners of the underlying triangular MF.


Fig. 2.1: Triangular Membership Function

Trapezoidal Membership Function: A trapezoidal MF is specified by four parameters {a, b, c, d} as follows [68]:

$$\mathrm{trapezoid}(x; a, b, c, d) = \max\left(\min\left(\frac{x-a}{b-a}, 1, \frac{d-x}{d-c}\right), 0\right) \tag{2.13}$$

The parameters {a, b, c, d} (with $a < b \le c < d$) determine the x coordinates of the four corners of the underlying trapezoidal MF.

Fig. 2.2: Trapezoidal Membership Function

Due to simple formulas and computational efficiency both triangular and

trapezoidal MFs have been used extensively, especially in real-time

implementations.

Gaussian Membership Function: A Gaussian MF is specified by two parameters $\{c, \sigma\}$:

$$\mathrm{gaussian}(x; c, \sigma) = e^{-\frac{1}{2}\left(\frac{x-c}{\sigma}\right)^{2}} \tag{2.14}$$

A Gaussian MF is determined completely by c and $\sigma$; c represents the MF center and $\sigma$ determines the MF width.

Fig. 2.3: Gaussian Membership Function
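As a minimal sketch, the triangular, trapezoidal and Gaussian membership functions of Eqs. (2.12)-(2.14) can be written directly in Python. The example universe (marks on a 0-100 scale) and all parameter values below are illustrative assumptions, not values taken from the cited sources.

```python
import math


def triangle(x, a, b, c):
    """Triangular MF of Eq. (2.12); assumes a < b < c."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)


def trapezoid(x, a, b, c, d):
    """Trapezoidal MF of Eq. (2.13); assumes a < b <= c < d."""
    return max(min((x - a) / (b - a), 1.0, (d - x) / (d - c)), 0.0)


def gaussian(x, c, sigma):
    """Gaussian MF of Eq. (2.14) with center c and width sigma > 0."""
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)


# Hypothetical "average marks" label on a 0-100 scale.
for marks in (40, 47.5, 55, 70):
    print(
        marks,
        triangle(marks, 40, 55, 70),
        trapezoid(marks, 30, 50, 60, 80),
        gaussian(marks, 55, 10),
    )
```

All three functions return a grade in [0, 1]; the triangular and trapezoidal forms need only comparisons and divisions, which is consistent with their popularity in real-time implementations.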

2.7.1.3. Fuzzy Relation

A fuzzy relation is a fuzzy set in $X \times Y$ which maps each element in $X \times Y$ to a membership grade between 0 and 1. Applications of fuzzy relations include areas such as fuzzy control and decision making. Let X and Y be two universes of discourse. Then a fuzzy relation in $X \times Y$ is

$$R = \{((x, y), \mu_R(x, y)) \mid (x, y) \in X \times Y\} \tag{2.15}$$

2.7.1.4. Max-Min Composition

Let $R_1$ and $R_2$ be two fuzzy relations defined on $X \times Y$ and $Y \times Z$, respectively. The max-min composition of $R_1$ and $R_2$ is a fuzzy set defined by

$$R_1 \circ R_2 = \{[(x, z), \max_{y} \min(\mu_{R_1}(x, y), \mu_{R_2}(y, z))] \mid x \in X, y \in Y, z \in Z\} \tag{2.16}$$
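For finite universes, the max-min composition of Eq. (2.16) resembles a matrix product in which multiplication is replaced by min and summation by max. The sketch below assumes that each relation is stored as a nested list of membership grades; the grades themselves are made up for illustration.

```python
def max_min_composition(r1, r2):
    """Compose R1 (on X x Y) with R2 (on Y x Z) following Eq. (2.16).

    r1[i][j] holds mu_R1(x_i, y_j) and r2[j][k] holds mu_R2(y_j, z_k).
    """
    n_y = len(r2)
    if any(len(row) != n_y for row in r1):
        raise ValueError("both relations must share the same universe Y")
    n_z = len(r2[0])
    return [
        [max(min(row[j], r2[j][k]) for j in range(n_y)) for k in range(n_z)]
        for row in r1
    ]


# Illustrative grades only (2 elements in X, 3 in Y, 2 in Z).
R1 = [[0.1, 0.3, 0.5],
      [0.4, 0.2, 0.8]]
R2 = [[0.9, 0.1],
      [0.2, 0.3],
      [0.5, 0.6]]

print(max_min_composition(R1, R2))
```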

2.7.1.5. Fuzzy IF-THEN Rules

A linguistic variable is characterized by a quintuple (x, T(x), X, G, M) in which x is the name of the variable; T(x) is the term set of x, that is, the set of its linguistic values; X is the universe of discourse; G is a syntactic rule which generates the terms in T(x); and M is a semantic rule which associates with each linguistic value A its meaning M(A), where M(A) denotes a fuzzy set in X [68]. A fuzzy if-then rule assumes the form: if x is A then y is B, where A and B are linguistic values defined by fuzzy sets on universes of discourse X and Y, respectively. Often “x is A” is called the antecedent or premise, while “y is B” is called the consequence or conclusion. Examples of fuzzy if-then rules are widespread in our daily linguistic expressions, e.g. if pressure is high then volume is small [68].

Before fuzzy if-then rules can be employed to model and analyze a system, we must formalize what is meant by the expression “if x is A then y is B”, which is sometimes abbreviated as $A \to B$. In essence, the expression describes a relation between two variables x and y; this suggests that a fuzzy if-then rule may be defined as a fuzzy relation R on the product space $X \times Y$. Generally speaking, there are two ways to interpret the fuzzy rule $A \to B$. If we interpret $A \to B$ as A coupled with B, then

$$R = A \to B = A \times B = \int_{X \times Y} \mu_A(x) * \mu_B(y) / (x, y) \tag{2.17}$$

where $*$ is a T-norm operator and $A \to B$ is used again to represent the fuzzy relation R. On the other hand, if $A \to B$ is interpreted as A entails B, then it can be written as [68]:

$$R = A \to B = \neg A \cup B \tag{2.18}$$

2.7.1.6. Fuzzy Reasoning

Fuzzy reasoning, also known as approximate reasoning, is an inference procedure that derives conclusions from a set of fuzzy if-then rules and known facts. Let $A$, $A'$ and $B$ be fuzzy sets of X, X and Y, respectively, and assume that the fuzzy implication $A \to B$ is expressed as a fuzzy relation R on $X \times Y$. Then the fuzzy set $B'$ induced by “x is $A'$” and the fuzzy rule “if x is A then y is B” is defined by [68]:

$$\mu_{B'}(y) = \max_{x} \min[\mu_{A'}(x), \mu_R(x, y)] \tag{2.19}$$

The inference procedure of fuzzy reasoning can thus be used to derive conclusions, provided that the fuzzy implication $A \to B$ is defined as an appropriate fuzzy relation [68].
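A small numerical sketch may help to make Eq. (2.19) concrete. It assumes discrete universes and interprets the rule as “A coupled with B” with min as the T-norm of Eq. (2.17); both choices, and all membership grades, are assumptions made for illustration.

```python
def rule_relation(mu_a, mu_b):
    """Relation R for "if x is A then y is B", using min as the T-norm."""
    return [[min(a, b) for b in mu_b] for a in mu_a]


def infer(mu_a_prime, relation):
    """Membership grades of B' obtained with the max-min rule of Eq. (2.19)."""
    n_y = len(relation[0])
    return [
        max(min(a, row[j]) for a, row in zip(mu_a_prime, relation))
        for j in range(n_y)
    ]


mu_a = [0.0, 0.5, 1.0, 0.5]        # A on a 4-point universe X
mu_b = [0.2, 1.0, 0.4]             # B on a 3-point universe Y
mu_a_prime = [0.0, 1.0, 0.5, 0.0]  # observed fact A'

R = rule_relation(mu_a, mu_b)
print(infer(mu_a_prime, R))        # B' induced by the fact A'
print(infer(mu_a, R))              # presenting A itself as the fact
```

With this particular relation, the result equals B clipped at the degree of match between A' and A, which is the firing-strength view used by the inference systems described later in this section.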

When multiple rules with multiple antecedents are used, the interpretation of the multiple rules is usually taken as the union of the fuzzy relations corresponding to the fuzzy rules. The multiple rules with multiple antecedents can be written as [68]:

Premise 1 (fact): x is $A'$ and y is $B'$
Premise 2 (rule 1): if x is $A_1$ and y is $B_1$ then z is $C_1$
Premise 3 (rule 2): if x is $A_2$ and y is $B_2$ then z is $C_2$
Consequence (conclusion): z is $C'$

Fuzzy reasoning can be employed as an inference procedure to derive the resulting output fuzzy set $C'$ (Fig. 2.4). Since the max-min composition operator $\circ$ is distributive over the $\cup$ operator, it follows that:

$$C' = (A' \times B') \circ (R_1 \cup R_2) = [(A' \times B') \circ R_1] \cup [(A' \times B') \circ R_2] = C'_1 \cup C'_2 \tag{2.20}$$

where $C'_1$ and $C'_2$ are the inferred fuzzy sets for rule 1 and rule 2, respectively. Fuzzy if-then rules and fuzzy reasoning are the backbone of fuzzy inference systems, which are the most important modeling tool based on fuzzy set theory.

Fig. 2.4: Fuzzy Reasoning for Multiple Rules with Multiple Antecedents

2.7.1.7. Defuzzification

The conversion of a fuzzy output to a crisp output is known as defuzzification. Six methods of defuzzification are given below:


1. Max Membership Principle: Also known as the height method, this method is given by the algebraic expression:

$$\mu_C(z^*) \ge \mu_C(z) \quad \text{for all } z \in Z \tag{2.21}$$

where $z^*$ is the defuzzified value, as shown in Fig. 2.5.

Fig. 2.5: Max Membership Defuzzification Method

2. Centroid Method: This procedure (also called center of area or center of gravity) is the most prevalent and physically appealing of all the defuzzification methods [68]. It is given by the algebraic expression:

$$z^* = \frac{\int \mu_C(z) \cdot z \, dz}{\int \mu_C(z) \, dz} \tag{2.22}$$

where $\int$ denotes an algebraic integration, as shown in Fig. 2.6.

Fig. 2.6: Centroid Defuzzification Method

3. Weighted Average Method: The weighted average method is the most frequently used in fuzzy applications because it is one of the more computationally efficient methods. Unfortunately, it is usually restricted to symmetrical output membership functions. It is given by the algebraic expression:

$$z^* = \frac{\sum \mu_C(\bar{z}) \cdot \bar{z}}{\sum \mu_C(\bar{z})} \tag{2.23}$$

where $\sum$ denotes the algebraic sum and $\bar{z}$ is the centroid of each symmetric membership function (Fig. 2.7).

Fig. 2.7: Weighted Average Method of Defuzzification

4. Center of Sums: This is faster than many defuzzification methods and is not restricted to symmetric membership functions. This process involves the algebraic sum of the individual output fuzzy sets, say $C_1$ and $C_2$, instead of their union. Two drawbacks of this method are that intersecting areas are added twice and that it involves finding the centroids of the individual membership functions. The defuzzified value $z^*$ is given by:

$$z^* = \frac{\sum_{k=1}^{n} \mu_{C_k}(z) \int_z \bar{z} \, dz}{\sum_{k=1}^{n} \mu_{C_k}(z) \int_z dz} \tag{2.24}$$

where $\bar{z}$ is the distance to the centroid of each of the respective membership functions (Fig. 2.8).

5. Mean of Max Membership: This method (also called middle-of-maxima) is closely related to the max membership principle, except that the locations of the maximum membership can be non-unique (i.e., the maximum membership can be a plateau rather than a single point). This method is given by the expression [68]:

$$z^* = \frac{a + b}{2} \tag{2.25}$$

where a and b are as defined in Fig. 2.9.


Fig. 2.8: Center of Sum Defuzzification Method (a) First Membership

Function (b) Second Membership Function (c) Defuzzification Step

Fig. 2.9: Mean Max Membership Defuzzification Method

6. Center of Largest Area: If the output fuzzy set has at least two convex sub-regions, then the center of gravity (i.e., $z^*$ calculated using the centroid method) of the convex fuzzy sub-region with the largest area is used to obtain the defuzzified value $z^*$ of the output. This is shown graphically in Fig. 2.10, and given algebraically as:

$$z^* = \frac{\int \mu_{C_m}(z) \cdot z \, dz}{\int \mu_{C_m}(z) \, dz} \tag{2.26}$$

where $C_m$ is the convex sub-region that has the largest area making up $C$.

Fig. 2.10: Center of Largest Area Method
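Three of the methods above can be sketched on a sampled output fuzzy set. The sketch assumes a uniformly spaced, sorted grid of z values, so that the integrals of Eq. (2.22) reduce to sums, and a single contiguous plateau for the mean of max method; the sample values are illustrative only.

```python
def centroid(zs, mus):
    """Discrete approximation of the centroid method, Eq. (2.22)."""
    total = sum(mus)
    if total == 0:
        raise ValueError("the output fuzzy set is empty")
    return sum(z * m for z, m in zip(zs, mus)) / total


def mean_of_max(zs, mus):
    """Mean of max membership, Eq. (2.25): midpoint of the maximal plateau."""
    peak = max(mus)
    plateau = [z for z, m in zip(zs, mus) if m == peak]
    return (plateau[0] + plateau[-1]) / 2


def weighted_average(centers, heights):
    """Weighted average method, Eq. (2.23), for symmetric output MFs.

    centers are the centroids z-bar and heights their membership grades.
    """
    return sum(h * c for c, h in zip(centers, heights)) / sum(heights)


# Sampled trapezoid-shaped output set (illustrative values).
zs = [0, 1, 2, 3, 4, 5, 6]
mus = [0.0, 0.5, 1.0, 1.0, 1.0, 0.5, 0.0]

print(centroid(zs, mus))
print(mean_of_max(zs, mus))
print(weighted_average([2, 6], [0.5, 1.0]))
```

For a symmetric output set such as this one, the centroid and the mean of max coincide; they generally differ when the set is skewed.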

2.7.1.8. Fuzzy Inference System

The fuzzy inference system is a popular computing framework based on the concepts of fuzzy set theory, fuzzy if-then rules and fuzzy reasoning. It has been successfully applied in fields such as automatic control, data classification, decision analysis, expert systems and computer vision [69]. Due to its multidisciplinary nature, the fuzzy inference system is known by various names, such as fuzzy rule-based system, fuzzy expert system, fuzzy model, fuzzy associative memory, fuzzy logic controller and simply fuzzy system [70-71]. Basically, a fuzzy inference system consists of three conceptual components: a rule base, which contains a selection of fuzzy rules; a database or dictionary, which defines the membership functions used in the fuzzy rules; and a reasoning mechanism, which performs the inference procedure upon the rules and a given condition to derive a reasonable output [70]. In terms of functional blocks, a fuzzy inference system is composed of five parts (Fig. 2.11):

1. A rule base containing a number of fuzzy if-then rules.

2. A database which defines the membership functions of the fuzzy sets used in the fuzzy rules.

3. A decision-making unit which performs the inference operations on the rules.

4. A fuzzification interface which transforms the crisp inputs into degrees of match with linguistic values.

5. A defuzzification interface which transforms the fuzzy results of the inference into a crisp output.

Fig. 2.11: Fuzzy Inference System

The rule base and the database are jointly referred to as the knowledge base. The steps of fuzzy reasoning performed by fuzzy inference systems are [72]:

1. Compare the input variables with the membership functions on the premise

part to obtain the membership values of each linguistic label, a step known as

fuzzification.

2. Combine the membership values through a specific T-norm operator, usually

multiplication or min on the premise part to get firing strength (weight) of

each rule.

3. Generate the qualified consequent either fuzzy or crisp of each rule depending

on the firing strength.

4. Aggregate the qualified consequents to produce a crisp output. This is final

step and is called defuzzification.
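The four steps above can be sketched for a two-rule, two-input system with clipped fuzzy consequents. The linguistic labels, the shared 0-100 universe, the min T-norm, the max aggregation and the discrete centroid are all assumptions chosen for illustration, not a definitive implementation.

```python
def triangle(x, a, b, c):
    """Triangular membership function with corners a < b < c."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)


# Hypothetical labels on a 0-100 universe (shared by x, y and z).
LOW = (-100, 0, 100)   # grade 1 at 0, falling to 0 at 100
HIGH = (0, 100, 200)   # grade 0 at 0, rising to 1 at 100

# Rule base: if x is <first> and y is <second> then z is <third>.
RULES = [(LOW, LOW, LOW), (HIGH, HIGH, HIGH)]


def fuzzy_inference(x, y, rules=RULES, z_grid=range(0, 101)):
    aggregated = [0.0] * len(z_grid)
    for mf_x, mf_y, mf_z in rules:
        # Step 1: fuzzification of the crisp inputs.
        grade_x, grade_y = triangle(x, *mf_x), triangle(y, *mf_y)
        # Step 2: firing strength of the rule via the min T-norm.
        w = min(grade_x, grade_y)
        # Step 3: qualified (clipped) consequent, merged with max.
        for i, z in enumerate(z_grid):
            aggregated[i] = max(aggregated[i], min(w, triangle(z, *mf_z)))
    # Step 4: defuzzification by a discrete centroid.
    total = sum(aggregated)
    if total == 0:
        raise ValueError("no rule fired for these inputs")
    return sum(z * m for z, m in zip(z_grid, aggregated)) / total


print(fuzzy_inference(50, 50))
print(fuzzy_inference(20, 20))
```

Low inputs pull the crisp output below the midpoint of the universe and high inputs push it above, as the hypothetical rule base intends.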

Several types of fuzzy reasoning have been proposed in the literature. Depending on the types of fuzzy reasoning and fuzzy if-then rules employed, most fuzzy inference systems can be classified into three types, as shown in Fig. 2.12.

Fig. 2.12: IF-THEN Rules and Fuzzy Reasoning Mechanism

Fig. 2.12 uses a two-rule, two-input fuzzy inference system to show the different types of fuzzy rules and fuzzy reasoning mentioned above. Most of the differences come from the specification of the consequent part, and thus the defuzzification schemes are also different.

Type 1: Tsukamoto Fuzzy Model: The overall output is the weighted average of each rule’s crisp output, induced by the rule’s firing strength (the product or minimum of the degrees of match with the premise part) and the output membership functions. The output membership functions used in this scheme must be monotonic functions, as shown in Fig. 2.13 [68]. Since each rule infers a crisp output, the Tsukamoto fuzzy model aggregates the rules’ outputs by the method of weighted average and thus avoids the time-consuming process of defuzzification. However, the Tsukamoto fuzzy model is not often used, since it is less transparent than the Mamdani and Sugeno fuzzy models.

Type 2: Mamdani Fuzzy Model: The overall fuzzy output is derived by applying the max operation to the qualified fuzzy outputs (each equal to the minimum of the firing strength and the output membership function of each rule). A two-rule Mamdani fuzzy inference system derives the overall output z when subjected to two crisp inputs x and y (Fig. 2.14). If the algebraic product and max are adopted as the T-norm and T-conorm operators, respectively, and max-product composition is used instead of the original max-min composition, then the resulting fuzzy reasoning is as shown in Fig. 2.14.

Fig. 2.13: Tsukamoto Fuzzy Model

Various schemes for extracting the final crisp output from the fuzzy output have been proposed (centroid of area, bisector of area, mean of maxima, maximum criterion) [68].

Fig. 2.14: Mamdani Fuzzy Model


Type 3: Sugeno Fuzzy Model: The Sugeno fuzzy model (also known as the TSK fuzzy model) is an effort to develop a systematic approach to generating fuzzy rules from a given input-output data set. A typical fuzzy rule in a Sugeno fuzzy model has the form “if x is A and y is B then $z = f(x, y)$”, where A and B are fuzzy sets in the antecedent, while $z = f(x, y)$ is a crisp function in the consequent. Usually $f(x, y)$ is a polynomial in the input variables x and y, but it can be any function as long as it can appropriately describe the output of the model within the fuzzy region specified by the antecedent of the rule. When $f(x, y)$ is a first-order polynomial, the resulting fuzzy inference system is called a first-order Sugeno fuzzy model (Fig. 2.15). When f is a constant, the result is a zero-order Sugeno fuzzy model, which can be viewed as a special case of the Mamdani fuzzy inference system. The output of each of Takagi and Sugeno’s if-then rules is a linear combination of the input variables plus a constant term, and the final output is the weighted average of each rule’s output [68].

Fig. 2.15: Sugeno Fuzzy Model
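A first-order Sugeno model can be sketched in a few lines, because each rule already produces a crisp value and the final output is a firing-strength-weighted average. The membership functions, the min T-norm and the consequent coefficients (p, q, r) below are hypothetical and chosen only to illustrate the computation.

```python
def low(v):
    """Hypothetical 'low' label on a 0-100 universe."""
    return min(1.0, max(0.0, 1.0 - v / 100.0))


def high(v):
    """Hypothetical 'high' label on a 0-100 universe."""
    return min(1.0, max(0.0, v / 100.0))


# Each rule: (MF for x, MF for y, (p, q, r)) with consequent z = p*x + q*y + r.
RULES = [
    (low, low, (0.1, 0.1, 0.0)),
    (high, high, (0.5, 0.5, 10.0)),
]


def sugeno(x, y, rules=RULES):
    """Weighted average of the rule outputs of a first-order Sugeno model."""
    weighted_sum = 0.0
    strength_sum = 0.0
    for mf_x, mf_y, (p, q, r) in rules:
        w = min(mf_x(x), mf_y(y))               # firing strength (min T-norm)
        weighted_sum += w * (p * x + q * y + r)  # crisp rule output, weighted
        strength_sum += w
    if strength_sum == 0:
        raise ValueError("no rule fired for these inputs")
    return weighted_sum / strength_sum


print(sugeno(50, 50))
```

Setting p = q = 0 in every rule turns this sketch into a zero-order model in which each rule contributes only its constant r.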

2.7.2. Neural Networks

The study of neural networks started with the publication of McCulloch and Pitts [73]. Single-layer networks with threshold activation functions, called perceptrons, were introduced by Rosenblatt [74]. In the 1960s, experiments showed that perceptrons could solve many problems, but many problems which did not seem to be more difficult could not be solved. These limitations of the one-layer perceptron were discussed mathematically in detail by Minsky and Papert in the book Perceptrons, which resulted in a loss of interest in neural networks for almost two decades. In the mid-1980s, the back-propagation algorithm proposed by Rumelhart, Hinton and Williams [75] revived the study of neural networks by showing that multilayer networks could be trained with it.

A neural network is an attempt to simulate the human brain. The simulation is based on the present knowledge of brain function, and this knowledge is, even at its best, primitive. So it is not entirely wrong to claim that artificial neural networks probably have no close relationship to the operation of human brains. The operation of the brain is believed to be based on simple basic elements called neurons, which are connected to each other by transmission lines called axons and receptive lines called dendrites (Fig. 2.16). Learning may be based on two mechanisms: the creation of new connections and the modification of existing connections. Each neuron has an activation level which, in contrast to Boolean logic, ranges between some minimum and maximum value.

Fig. 2.16: Biological and Artificial Neuron

In artificial neural networks the inputs of the neuron are combined in a linear way with different weights. The result of this combination is then fed into a non-linear activation unit (activation function), which in its simplest form can be a threshold unit (see Fig. 2.16). Neural networks are often used to enhance and optimize fuzzy logic based systems, e.g., by giving them a learning ability. This learning ability is achieved by presenting a training set of different examples to the network and using a learning algorithm which changes the weights (or the parameters of the activation functions) in such a way that the network will reproduce the correct output for the given input values. The difficulty is how to guarantee generalization and how to determine when the network is sufficiently trained.
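The artificial neuron described above, a weighted linear combination followed by a threshold unit, can be sketched as follows. The bias term and the particular weights are assumptions added for illustration; they are not specified in the text.

```python
def threshold(s):
    """Simplest activation: a hard threshold at zero."""
    return 1.0 if s >= 0 else 0.0


def neuron(inputs, weights, bias=0.0, activation=threshold):
    """Linear combination of the inputs followed by a nonlinear activation."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(s)


# Hypothetical weights and bias that make the neuron act as a logical AND.
for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pair, neuron(pair, weights=(1.0, 1.0), bias=-1.5))
```

A learning algorithm would adjust `weights` and `bias` from training examples rather than fixing them by hand as done here.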

Neural networks offer nonlinearity, input-output mapping, adaptivity and fault

tolerance. Nonlinearity is a desired property if the generator of input signal is

inherently nonlinear [76]. The high connectivity of the network ensures that the

influence of errors in a few terms will be minor, which ideally gives a high fault

tolerance. (Note that an ordinary sequential computation may be ruined by a

single bit error).

2.7.2.1. Adaptive Neural Networks

Adaptive networks are a unifying framework that subsumes almost all kinds of neural network paradigms with supervised and unsupervised learning capabilities. The fundamentals of adaptive networks are a key element in understanding various other neural network paradigms, such as multilayer perceptrons.

An adaptive network is a network structure consisting of a number of nodes connected through directional links. Each node represents a process unit, and the links between nodes specify the causal relationship between the connected nodes. All or some of the nodes are adaptive, which means that the outputs of these nodes depend on modifiable parameters pertaining to these nodes. The learning rule specifies how these parameters should be updated to minimize a prescribed error measure, which is a mathematical expression that measures the discrepancy between the network’s actual output and a desired output. In other words, an adaptive network is used for system identification, and our task is to find an appropriate network architecture and a set of parameters which can best model an unknown target system that is described by a set of input-output data pairs. The basic learning rule of the adaptive network is the well-known steepest descent method, in which the gradient vector is derived by successive invocations of the chain rule. This procedure is also known as the backpropagation learning rule.


Fig. 2.17: Feed forward Adaptive Neural Network

In an adaptive network, the overall input-output behavior is determined by a collection of modifiable parameters (Fig. 2.17). Specifically, the configuration of an adaptive network is composed of a set of nodes connected by directed links, where each node performs a static node function on its incoming signals to generate a single node output and each link specifies the direction of signal flow from one node to another. Usually, a node function is a parameterized function with modifiable parameters; changing these parameters changes the node function as well as the overall behaviour of the adaptive network.

Assume that each node in an adaptive network performs a static mapping from its input(s) to its output. That is, a node’s output depends on its current inputs only; there are no dynamic or internal states in any node. Moreover, to facilitate the development of learning algorithms, we assume that all node functions are differentiable except at a finite number of points. In most cases an adaptive network is heterogeneous, and each node may have a specific node function different from the others. Links in an adaptive network are merely used to specify the propagation direction of node outputs; generally there are no weights or parameters associated with links. Fig. 2.17 shows a typical adaptive network with two inputs and two outputs.

The parameters of an adaptive network are distributed among its nodes, so each node has a local parameter set. The union of these local parameter sets is the network’s overall parameter set. If a node’s parameter set is not empty, then its node function depends on the parameter values; a square is used to represent this kind of adaptive node. On the other hand, if a node has an empty parameter set, then its function is fixed; a circle is used to denote this kind of fixed node. Each adaptive node can be decomposed into a fixed node plus one or several parameter nodes. Adaptive networks are generally classified into two categories on the basis of the type of connections they have: feed forward and recurrent. The adaptive network shown in Fig. 2.17 is feed forward, since the output of each node propagates in one direction only, from the input side (left) to the output side (right). If there is a feedback link that forms a circular path in a network, then the network is recurrent; Fig. 2.18 is an example [68].

Fig. 2.18: A Recurrent Adaptive Network

Conceptually, a feed forward adaptive network is a static mapping between its input and output spaces. The mapping may be either a simple linear relationship or a highly nonlinear one, depending on the network structure (node arrangement, connections and so on) and the functionality of each node. Here our aim is to construct a network that achieves a desired nonlinear mapping, which is regulated by a data set consisting of desired input-output pairs of the target system to be modeled. This data set is usually called the training data set, and the procedures followed in adjusting the parameters to improve the network’s performance are often referred to as learning rules or adaptation algorithms. Usually a network’s performance is measured as the discrepancy between the desired output and the network’s output under the same input conditions. This discrepancy is called the error measure, and it can assume different forms for different applications. Generally speaking, a learning rule is derived by applying a specific optimization technique to a given error measure [68].


2.7.2.2. Backpropagation for Feed Forward Networks

This section introduces a basic learning rule for adaptive networks, which is in essence the simple steepest descent method. The central part of this learning rule concerns how to recursively obtain a gradient vector in which each element is defined as the derivative of an error measure with respect to a parameter [68]. This is done by means of the chain rule, a basic formula for differentiating composite functions. The procedure of finding a gradient vector in a network structure is generally referred to as backpropagation, because the gradient vector is calculated in the direction opposite to the flow of the output of each node. Once the gradient is obtained, a number of derivative-based optimization and regression techniques are available for updating the parameters. In particular, if the gradient vector is used in a simple steepest descent method, the resulting learning paradigm is often referred to as the backpropagation learning rule [68].

Suppose that a given feed forward adaptive network in the layered representation
has $L$ layers and that layer $l$ ($l = 0, 1, \ldots, L$; $l = 0$ represents the input layer) has $N(l)$
nodes. Then the output and function of node $i$ [$i = 1, \ldots, N(l)$] in layer $l$ can be
represented as $x_{l,i}$ and $f_{l,i}$ respectively, as shown in Fig. 2.19. Without loss of
generality, it is assumed that there are no jumping links. Since the output of a node
depends on the incoming signals and the parameter set of the node, the
following general expression holds for the node function $f_{l,i}$ [68]:

$$x_{l,i} = f_{l,i}\left(x_{l-1,1}, \ldots, x_{l-1,N(l-1)}, \alpha, \beta, \gamma, \ldots\right) \qquad (2.27)$$

where $\alpha$, $\beta$, $\gamma$, etc. are the parameters of this node. Assuming that the given
training data set has $P$ entries, an error measure for the $p$th ($1 \le p \le P$)
entry of the training data set can be defined as the sum of squared errors [68]:

$$E_p = \sum_{k=1}^{N(L)} \left(d_k - x_{L,k}\right)^2 \qquad (2.28)$$

where $d_k$ is the $k$th component of the $p$th desired output vector and $x_{L,k}$ is the $k$th
component of the actual output vector produced by presenting the $p$th input vector
to the network. Obviously, when $E_p$ is equal to zero, the network is able to
reproduce exactly the desired output vector in the $p$th training data pair. Thus the
task here is to minimize an overall error measure, which is defined as
$E = \sum_{p=1}^{P} E_p$.
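The two error measures can be written out in a few lines of Python. This is only a sketch; the desired and actual output vectors below are hypothetical values, not results from any experiment.

```python
import numpy as np

def entry_error(desired, actual):
    """E_p of Eq. (2.28): sum of squared errors over the output nodes for one entry."""
    desired, actual = np.asarray(desired, float), np.asarray(actual, float)
    return float(np.sum((desired - actual) ** 2))

def overall_error(desired_set, actual_set):
    """E: sum of E_p over all P training entries."""
    return sum(entry_error(d, x) for d, x in zip(desired_set, actual_set))

# Hypothetical data: P = 3 entries, two output nodes each.
desired_set = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
actual_set = [[0.5, 0.0], [0.0, 1.0], [1.0, 0.0]]

per_entry = [entry_error(d, x) for d, x in zip(desired_set, actual_set)]
total = overall_error(desired_set, actual_set)
print(per_entry, total)   # [0.25, 0.0, 1.0] 1.25
```

The second entry has zero error, which corresponds to the case in which the network reproduces the desired output vector exactly.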


Fig. 2.19: Layered Representation of Adaptive Feed Forward Network

To use steepest descent to minimize the error measure, the gradient vector must
first be obtained. Before calculating the gradient vector, the following
relationships should be observed:

Change in parameter $\alpha$ ⇒ change in outputs of nodes containing $\alpha$ ⇒ change in network’s outputs ⇒ change in error measure

where the arrows ⇒ indicate causal relationships. In other words, a small change
in a parameter $\alpha$ will affect the output of the node containing $\alpha$; this in turn will
affect the output of the final layer and thus the error measure. Therefore, the basic
concept in calculating the gradient vector is to pass a form of derivative
information starting from the output layer and going backward layer by layer until the
input layer is reached [68]. To facilitate the discussion, the error signal
$\epsilon_{l,i}$ is defined as the derivative of the error measure $E_p$ with respect to the output of node $i$
in layer $l$, taking both direct and indirect paths into consideration. In symbols,

$$\epsilon_{l,i} = \frac{\partial^{+} E_p}{\partial x_{l,i}} \qquad (2.29)$$

This expression is called the ordered derivative. The difference between the
ordered derivative and the ordinary partial derivative lies in the way the
function to be differentiated is viewed [68].
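A small worked example may make the distinction concrete. The functions are hypothetical and chosen only for illustration: let $y = x^2$ and $z = g(x, y) = xy$, so that $x$ influences $z$ both directly and indirectly through $y$. The last two lines state the recursion that results when the same chain rule argument is applied to the error signal, using Eq. (2.28) at the output layer.

```latex
\begin{aligned}
\frac{\partial z}{\partial x} &= y
  && \text{(ordinary partial derivative, $y$ held constant)} \\
\frac{\partial^{+} z}{\partial x}
  &= \frac{\partial g}{\partial x} + \frac{\partial g}{\partial y}\,\frac{dy}{dx}
   = y + x \cdot 2x = 3x^{2}
  && \text{(ordered derivative, indirect path through $y$ included)} \\
\epsilon_{L,i} &= -2\left(d_i - x_{L,i}\right)
  && \text{(output layer)} \\
\epsilon_{l,i} &= \sum_{m=1}^{N(l+1)} \epsilon_{l+1,m}\,
   \frac{\partial f_{l+1,m}}{\partial x_{l,i}}
  && \text{(internal layers, } 0 \le l \le L-1\text{)}
\end{aligned}
```

Substituting $y = x^2$ gives $z = x^3$, whose derivative $3x^2$ agrees with the ordered derivative, whereas the ordinary partial derivative only accounts for the direct path.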

2.7.2.3. Adaptive Neuro-Fuzzy Inference System

The architectures and learning rules of adaptive networks have been described in the
previous section. Functionally, there is almost no constraint on the node functions
of an adaptive network except the requirement of piecewise differentiability.
Structurally, the only limitation on the network configuration is that it should be
of the feed forward type if the more complex asynchronously operated model
is to be avoided. Because of these minimal restrictions, adaptive
networks can be employed directly in a wide variety of applications of modeling,
decision making, signal processing and control [68].

In the present research work, an attempt has been made to propose a class of
adaptive networks that are functionally equivalent to fuzzy inference systems. The
proposed architecture, referred to as ANFIS, stands for adaptive network-based
fuzzy inference system or, equivalently, adaptive neuro-fuzzy
inference system. A description is also given of how to decompose the
parameter set to facilitate the hybrid learning rule for ANFIS architectures
representing both the Sugeno and Tsukamoto fuzzy models.

ANFIS Architecture: For simplicity, assume that the fuzzy inference system under
consideration has two inputs $x$ and $y$ and one output $z$. For a first-order Sugeno
fuzzy model, a common rule set with two fuzzy if-then rules is as follows:

Rule 1: If $x$ is $A_1$ and $y$ is $B_1$ then $f_1 = p_1 x + q_1 y + r_1$

Rule 2: If $x$ is $A_2$ and $y$ is $B_2$ then $f_2 = p_2 x + q_2 y + r_2$

Fig. 2.20: Two-Input First-Order Sugeno Fuzzy Model with Two Rules

Fig. 2.20 represents the reasoning mechanism for the Sugeno fuzzy model, and the
corresponding ANFIS architecture is shown in Fig. 2.21, where nodes of the
same layer have similar functions, as described next. (The output of the $i$th node in
layer $l$ is denoted as $O_{l,i}$.)

Layer 1: Every node $i$ in this layer is an adaptive node with a node function

$$O_{1,i} = \mu_{A_i}(x), \quad \text{for } i = 1, 2, \text{ or}$$
$$O_{1,i} = \mu_{B_{i-2}}(y), \quad \text{for } i = 3, 4 \qquad (2.30)$$

where $x$ (or $y$) is the input to node $i$ and $A_i$ (or $B_{i-2}$) is a linguistic label
associated with this node. In other words, $O_{1,i}$ is the membership grade of a fuzzy
set $A$ ($= A_1$, $A_2$, $B_1$ or $B_2$) and it specifies the degree to which the given input $x$ (or $y$) satisfies the
quantifier $A$. Here the membership function for $A$ can be any parameterized
membership function, such as the Gaussian membership function. Parameters in this
layer are referred to as premise parameters [68].

Fig. 2.21: ANFIS Architecture

Layer 2: Every node in this layer is a fixed node labeled ∏, whose output is the
product of all the incoming signals:

$$O_{2,i} = w_i = \mu_{A_i}(x)\,\mu_{B_i}(y), \quad i = 1, 2 \qquad (2.31)$$

Each node output represents the firing strength of a rule. In general, any other
T-norm operator that performs fuzzy AND can be used as the node function in this
layer.

Layer 3: Every node in this layer is a fixed node labeled N. The $i$th node calculates
the ratio of the $i$th rule’s firing strength to the sum of all rules’ firing strengths:

$$O_{3,i} = \bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2 \qquad (2.32)$$

For convenience, outputs of this layer are called normalized firing strengths.


Layer 4: Every node $i$ in this layer is an adaptive node with a node function:

$$O_{4,i} = \bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i) \qquad (2.33)$$

where $\bar{w}_i$ is a normalized firing strength from layer 3 and $\{p_i, q_i, r_i\}$ is the
parameter set of this node. Parameters in this layer are referred to as consequent
parameters.

Layer 5: The single node in this layer is a fixed node labeled ∑, which computes
the overall output as the summation of all incoming signals:

$$\text{overall output} = O_{5,1} = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i} \qquad (2.34)$$

Thus, an adaptive network has been constructed that is functionally equivalent to
a Sugeno fuzzy model. For the Mamdani fuzzy inference system with max-min
composition, a corresponding ANFIS can be constructed if discrete
approximations are used to replace the integrals in the centroid defuzzification
method [68].
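The five layers can be traced in a short forward-pass sketch for the two-input, two-rule first-order Sugeno model. It assumes Gaussian membership functions and the product T-norm; the function names and all parameter values are hypothetical and serve only as an illustration of Eqs. (2.30) to (2.34).

```python
import numpy as np

def gaussian_mf(v, c, sigma):
    """Gaussian membership function with centre c and width sigma."""
    return np.exp(-0.5 * ((v - c) / sigma) ** 2)

def anfis_forward(x, y, premise, consequent):
    """Forward pass of a two-input, two-rule first-order Sugeno ANFIS.

    premise:    {"A": [(c, sigma), (c, sigma)], "B": [(c, sigma), (c, sigma)]}
    consequent: array of shape (2, 3) with rows (p_i, q_i, r_i)
    """
    # Layer 1: membership grades of x in A_1, A_2 and of y in B_1, B_2
    mu_A = np.array([gaussian_mf(x, c, s) for c, s in premise["A"]])
    mu_B = np.array([gaussian_mf(y, c, s) for c, s in premise["B"]])
    # Layer 2: firing strengths w_i (product T-norm)
    w = mu_A * mu_B
    # Layer 3: normalized firing strengths
    w_bar = w / w.sum()
    # Layer 4: weighted rule outputs w_bar_i * (p_i x + q_i y + r_i)
    f = consequent @ np.array([x, y, 1.0])
    layer4 = w_bar * f
    # Layer 5: overall output as the sum of all incoming signals
    return layer4.sum(), w_bar

# Hypothetical premise and consequent parameters, chosen only for illustration.
premise = {"A": [(0.0, 1.0), (2.0, 1.0)], "B": [(0.0, 1.0), (2.0, 1.0)]}
consequent = np.array([[1.0, 1.0, 0.0],     # f_1 = x + y
                       [2.0, -1.0, 1.0]])   # f_2 = 2x - y + 1
output, w_bar = anfis_forward(1.0, 1.0, premise, consequent)
print(output, w_bar)
```

At the input $(1, 1)$ both rules fire with equal strength, so the normalized firing strengths are both 0.5, and since both rule outputs equal 2 at this point the overall output is 2.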

2.7.2.4. Hybrid Learning Algorithm

Although backpropagation or steepest descent can be applied to identify the
parameters in an adaptive network, this simple optimization method
usually takes a long time to converge. It may be noted, however, that an adaptive
network’s output is linear in some of the network’s parameters; these linear
parameters can therefore be identified by the linear least-squares method. This approach leads to
a hybrid learning rule which combines steepest descent and the least-squares
estimator for fast identification of parameters [68].

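In the ANFIS architecture (Fig. 2.21), when the values of the premise parameters are
fixed, the overall output can be expressed as a linear combination of the
consequent parameters. In symbols, the output $f$ can be written as [68]

$$\begin{aligned}
f &= \frac{w_1}{w_1 + w_2} f_1 + \frac{w_2}{w_1 + w_2} f_2 \\
  &= \bar{w}_1 (p_1 x + q_1 y + r_1) + \bar{w}_2 (p_2 x + q_2 y + r_2) \\
  &= (\bar{w}_1 x) p_1 + (\bar{w}_1 y) q_1 + (\bar{w}_1) r_1 + (\bar{w}_2 x) p_2 + (\bar{w}_2 y) q_2 + (\bar{w}_2) r_2
\end{aligned} \qquad (2.35)$$

This is linear in the consequent parameters $p_1$, $q_1$, $r_1$, $p_2$, $q_2$ and $r_2$.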


In the forward pass of the hybrid learning algorithm, node outputs go forward
until layer 4 and the consequent parameters are identified by the least-squares
method. In the backward pass, the error signals propagate backward and the premise
parameters are updated by gradient descent. Accordingly, the hybrid approach
converges much faster, since it reduces the dimension of the search space explored by
the original pure backpropagation method [68].
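The least-squares step of the forward pass can be sketched as follows. With the premise parameters held fixed, each training entry contributes one row of the form given in Eq. (2.35), and the consequent parameters are the least-squares solution of the resulting linear system. The Gaussian premise parameters, the synthetic data and the helper names are assumptions made for illustration; `numpy.linalg.lstsq` stands in for whichever least-squares estimator is actually used.

```python
import numpy as np

def normalized_firing_strengths(x, y, centers=(0.0, 2.0), sigma=1.0):
    """Layers 1-3 with fixed (hypothetical) Gaussian premise parameters."""
    c = np.asarray(centers)
    mu_x = np.exp(-0.5 * ((x[:, None] - c) / sigma) ** 2)   # grades of x in A_1, A_2
    mu_y = np.exp(-0.5 * ((y[:, None] - c) / sigma) ** 2)   # grades of y in B_1, B_2
    w = mu_x * mu_y                                          # firing strengths
    return w / w.sum(axis=1, keepdims=True)                  # shape (P, 2)

def design_matrix(x, y, w_bar):
    """One row per training entry: [w1*x, w1*y, w1, w2*x, w2*y, w2]."""
    base = np.column_stack([x, y, np.ones_like(x)])
    return np.hstack([w_bar[:, [i]] * base for i in range(2)])   # shape (P, 6)

# Synthetic training inputs (P = 50), for illustration only.
rng = np.random.default_rng(1)
x, y = rng.uniform(-1, 3, 50), rng.uniform(-1, 3, 50)
w_bar = normalized_firing_strengths(x, y)
A = design_matrix(x, y, w_bar)

# Targets generated from known consequent parameters so the estimate can be checked.
theta_true = np.array([1.0, 1.0, 0.0, 2.0, -1.0, 1.0])   # p1, q1, r1, p2, q2, r2
targets = A @ theta_true

# Least-squares estimate of the consequent parameters.
theta_hat, *_ = np.linalg.lstsq(A, targets, rcond=None)
print(np.round(theta_hat, 6))
```

Because the targets here are noise-free, the estimate recovers the generating parameters up to rounding error in a single step; in the full hybrid rule this step alternates with a gradient descent update of the premise parameters.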

2.7.3. Probabilistic Reasoning

Probabilistic reasoning includes genetic algorithms, belief networks, chaotic
systems and parts of learning theory [77]. Although the emphasis of the present thesis is
on fuzzy logic systems and neural networks, probabilistic reasoning is included
for consistency. Like fuzzy set theory, probability theory deals with
uncertainty, but the type of uncertainty is usually different. Stochastic uncertainty
concerns the occurrence of a certain event, and this
uncertainty is quantified by a degree of probability. Probability statements can
be combined with other statements using stochastic methods, the most popular of which is the
Bayesian calculus of conditional probability.

2.7.3.1. Probability

Random events are used to model uncertainty, which is measured by probabilities.
A random event $E$ is defined as a crisp subset of a sample space $U$. The
probability of $E$, $P(E) \in [0, 1]$, is the proportion of occurrences of $E$. The
probability is supposed to fulfill the axioms of Kolmogorov [68]:

1. $P(E) \ge 0, \; \forall E \subseteq U$

2. $P(U) = 1$

3. If the $E_i$ are pairwise disjoint sets, then $P\left(\bigcup_i E_i\right) = \sum_i P(E_i)$.

A classical example of probability is the throwing of a die. Let $U$ be the set of
integers {1, 2, 3, 4, 5, 6}. For a fair die, an event such as $E = 6$ has probability $P(E = 6) = \frac{1}{6}$.
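The die example can be checked by direct enumeration. The sketch below assumes a fair die, so that every outcome in $U$ is equally likely; the function name `prob` is introduced here only for illustration.

```python
from fractions import Fraction

U = frozenset({1, 2, 3, 4, 5, 6})   # sample space of one throw of a die

def prob(event):
    """Classical probability: favourable outcomes over all outcomes in U."""
    event = frozenset(event)
    if not event <= U:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(U))

six = {6}
even, odd = {2, 4, 6}, {1, 3, 5}    # two disjoint events

print(prob(six))                                    # 1/6
print(prob(U))                                      # 1
print(prob(even | odd) == prob(even) + prob(odd))   # True
```

The three printed lines correspond to the example above and to the second and third axioms, respectively.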

The state of the art with respect to soft computing techniques and the relevant
work in this area have been classified not only by the type of data and soft
computing techniques used but, more importantly, by the type of educational task
that they resolve. Soft computing techniques have been introduced as an emerging
research area related to several well-established areas of research, including
e-learning, adaptive hypermedia, intelligent tutoring systems, web mining and data
mining. Interesting lines of future research have also been described. It is necessary for
researchers to develop more unified and collaborative studies instead of the current
plethora of individual proposals and lines. In this way, complete integration of
soft computing techniques in the educational environment, with fully operative
implementations (both commercial and free), may become a reality.

REFERENCES

[1]. Draper, N.R., and H. Smith. Applied Regression Analysis. John Wiley & Sons, (1998).

[2]. Espejo, P., S. Ventura, and F. Herrera. “A Survey on the Application of Genetic Programming to Classification.” IEEE Transactions on Systems, Man, and Cybernetics-Part C, 40, no. 2 (2010): 121-144.

[3]. Romero, C., S. Ventura, C. Hervás, and P. Gonzales. “Data Mining Algorithms to Classify Students.” In Proceeding of International Conference on Educational Data Mining. Montreal, Canada, 2008, 8-17.

[4]. Ibrahim, Z., and D. Rusli. “Predicting Students’ Academic Performance: Comparing Artificial Neural Network, Decision Tree and Linear Regression.” Annual SAS Malaysia Forum, Kuala Lumpur, 2007, 1-6.

[5]. Delgado, M., E. Gibaja, M.C. Pegalajar, and Q. Pérez. “Predicting Students' Marks from Moodle Logs using Neural Network Models.” In Proceeding of International Conference on Current Developments in Technology-Assisted Education. Sevilla, Spain, 2006, 586-590.

[6]. Oladokun, V.O., A.T. Adebanjo, and O.E. Charles-Owaba. “Predicting Student’s Academic Performance using Artificial Neural Network: A Case Study of an Engineering Course.” Pacific Journal of Science and Technology, 9, no. 1 (2008): 72-79.

[7]. Pardos, Z., N. Heffernan, B. Anderson, and C. Heffernan. “The Effect of Model Granularity on Student Performance Prediction Using Bayesian Networks.” In Proceeding of International Conference on User Modeling. Corfu, Greece, 2007, 435-439.

[8]. Ogor, E.N. “Student Academic Performance Monitoring and Evaluation Using Data Mining Techniques.” In Proceeding of Electronics, Robotics and Automotive Mechanics Conference. Washington, DC, 2007, 354-359.

[9]. Lakshmi, T.M., A. Martin, and V.P. Venkatesan. “An Analysis of Students Performance Using Genetic Algorithm.” Journal of Computer Sciences and Applications, 1, no. 4 (2013): 75-79.

[10]. Zafra, A., and S. Ventura. “Predicting Student Grades in Learning Management Systems with Multiple Instance Programming.” In Proceeding of International Conference on Educational Data Mining. Cordoba, Spain, 2009, 307-314.

[11]. Nugent, R., E. Ayers, and N. Dean. “Conditional Subspace Clustering of Skill Mastery: Identifying Skills that Separate Students.” In Proceeding of International Conference on Educational Data Mining. Cordoba, Spain, 2009, 101-110.

[12]. Thomas, E.H., and N. Galambos. “What Satisfies Students? Mining Student-Opinion Data with Regression and Decision Tree Analysis.” Journal of Research in Higher Education, 45, no. 3 (2004): 251-269.

[13]. Cetintas, A., L. Si, Y.P. Xin, and C. Hord. “Predicting Correctness of Problem Solving from Low-Level log Data in Intelligent Tutoring Systems.” In Proceeding of International Conference on Educational Data Mining, Cordoba, Spain, 2009, 230-238.

[14]. Mcdonald, B. “Predicting Student Success.” International Journal for Mathematics Teaching and Learning, 5, (2004): 1-14.

[15]. Sapre, R.G, and S. Surve. “Fuzzy Mathematical Approach for Performance Evaluation of a Student.” International Journal of Fuzzy Mathematics and Systems, 2, no. 2 (2012): 191-198.

[16]. Torre, G.L. “Implementation of Student Performance Evaluation System Using FIS in MATLAB.” International Journal of Engineering Universe for Scientific Research and Management, 3, no. 2 (2011): 1-6.

[17]. Hota, H.S., S. Pavani, and P.V.S.S. Gangadhar. “Evaluating Teachers Ranking Using Fuzzy AHP Technique.” International Journal of Soft Computing and Engineering, 2, no. 6 (2013): 485-488.

[18]. Ajiboye, A.R., R.A. Arshah, and H. Qin. “Risk Status Prediction and Modelling of Students’ Academic Achievement: A Fuzzy Logic Approach.” International Journal of Engineering and Science, 3, no. 11 (2013): 07-14.

[19]. Saxena, U.R., and S.P. Singh. “Integrating Neuro-Fuzzy Systems to Develop Intelligent Planning Systems for Predicting Students’ Performance.” International Journal of Evaluation and Research in Education, 1, no. 2 (2012): 61-66.

[20]. Oladipupo, O.O., O.J. Oyelade, and D.O. Aborisade. “Application of Fuzzy Association Rule Mining for Analyzing Students Academic Performance.” International Journal of Computer Science, no. 3 (2012): 216-223.

[21]. Iraji, M.S. “Students Classification with Adaptive Neuro Fuzzy.” International Journal of Modern Education and Computer Science, 7, (2012): 42-49.

[22]. Amershi, S., and C. Conati. “Combining Unsupervised and Supervised Classification to Build User Models for Exploratory Learning Environments.” Journal of Educational Data Mining, 1, no. 1 (2009): 18-71.

[23]. Gaudioso, E., M. Montero, L. Talavera, and F. Hernandez-Del-Olmo. “Supporting Teachers in Collaborative Student Modeling: A Framework and Implementation.” International Journal of Expert System with Applications, 36, (2009): 2260-2265.

[24]. Musso, M.F., E. Kyndt, E.C. Cascallar, and F. Dochy. “Predicting General Academic Performance and Identifying the Differential Contribution of Participating Variables Using Artificial Neural Networks.” Journal of Frontline Learning Research, 1, no. 1 (2013): 42-71.

[25]. Naser, S.S.A. “Predicting Learners Performance Using Artificial Neural Networks in Linear Programming Intelligent Tutoring System.” International Journal of Artificial Intelligence and Applications, 3, no. 2 (2012): 65-73.

[26]. Do, Q. H., and J.-F. Chen. “A Comparative Study of Hierarchical ANFIS and ANN in Predicting Student Academic Performance.” WSEAS Transactions on Information Science and Applications, 10, no. 12 (2013): 396-405.

[27]. Borkar, S., and K. Rajeswari. “Attributes Selection for Predicting Students’ Academic Performance using Education Data Mining and Artificial Neural Network.” International Journal of Computer Applications, 86, no. 10 (2014): 25-29.

[28]. Osmanbegović, E., and M. Suljić. “Data Mining Approach for Predicting Student Performance.” Journal of Economics and Business, 10, no. 1 (2012): 3-12.

[29]. Basha, S.K. A. H., Y.R. R. Kumar, A. Govardhan, and M. Z. Ahmed. “Predicting Student Academic Performance Using Temporal Association Mining.” International Journal of Information Science and Education, 2, no. 1 (2012): 21-41.

[30]. Ahmed, A.B.E.D., and I.S. Elaraby. “Data Mining: A prediction for Student's Performance Using Classification Method.” World Journal of Computer Application and Technology, 2, no. 2 (2014): 43-47.

[31]. Borkar, S., and K. Rajeswari. “Predicting Students Academic Performance Using Education Data Mining.” International Journal of Computer Science and Mobile Computing, 2, no. 7 (2013): 273-279.

[32]. Matsuda, N., W. Cohen, J. Sewall, G. Lacerda, and K. R. Koedinger. “Predicting Student’s Performance with SimStudent that Learns Cognitive Skills from Observation.” In Proceeding of International Conference on Artificial Intelligence in Education. Amsterdam, Netherlands, 2007, 467-476.


[33]. Lee, C.S. “Diagnostic, Predictive and Compositional Modeling with Data Mining in Integrated Learning Environments.” International Journal of Computer and Education, 49, (2007): 562-580.

[34]. Lykourentzou, I., I. Giannoukos, V. Nikolopoulos, G. Mpardis, and V. Loumos. “Dropout Prediction in E-Learning Courses through the Combination of Machine Learning Techniques.” International Journal of Computer and Education, 53, no. 3 (2009): 950-965.

[35]. Lee, M.W., S.Y. Chen, K. Chrysostomou, and X. Liu. “Mining Student’s Behavior in Web-based Learning Programs.” International Journal of Expert System with Applications, 36, (2009): 3459-3464.

[36]. Chen, G., C. Liu, K. Ou, and B. Liu. “Discovering Decision Knowledge from Web log Portfolio for Managing Classroom Processes by Applying Decision Tree and Data Cube Technology.” Journal of Educational Computing Research, 23, no. 3 (2000): 305-332.

[37]. Yu, C.H., S. Digangi, A.K. Jannasch-Pennell, and C. Kaprolet. “Profiling Students who take Online Courses using Data Mining Methods.” Online Journal of Distance Learning Administration, 11, no. 2 (2008): 1-14.

[38]. Chang, Y.C., W.Y. Kao, C.P. Chu, and C.H. Chiu. “A Learning Style Classification Mechanism for E-Learning.” International Journal of Computer and Education, 53, no. 2 (2009): 273-285.

[39]. Superby, J.F., J.P. Vandamme, and N. Meskens. “Determination of Factors Influencing Achievement of the First-Year University Students Using Data Mining Methods.” In Proceeding of International Conference on Intelligent Tutoring Systems and Workshop on Educational Data Mining. Taiwan, 2006, 1-8.

[40]. Yadav, S.K., B. Bharadwaj, and S. Pal. “Data Mining Applications: A Comparative Study for Predicting Student’s Performance.” International Journal of Innovative Technology and Creative Engineering, 1, no. 12 (2013): 13-19.

[41]. Venkatesan, N., and N. Chandru. “Student's Performance Measuring using Assistant Algorithm.” International Journal of Soft Computing and Engineering, 3, no. 5 (2013): 216-222.

[42]. Burlak, G., J. Munoz, A. Ochoa, and J.A. Hernández. “Detecting Cheats in Online Student Assessments Using Data Mining.” In Proceeding of International Conference on Data Mining. Las Vegas, 2006, 204-210.

[43]. Ueno, M., and K. Nagaoka. “Learning Log Database and Data Mining System for E-Learning: On-Line Statistical Outlier Detection of Irregular Learning Processes.” In Proceeding of International Conference on Advanced Learning Technologies. Tatarstan, Russia, 2002, 436-438.

[44]. Romesburg, H.C. Cluster Analysis for Researchers. Krieger Pub, (2004).


[45]. Ayers, E., R. Nugent, and N. Dean. “A Comparison of Student Skill Knowledge Estimates.” In Proceeding of International Conference on Educational Data Mining. Cordoba, Spain, 2009, 1-10.

[46]. Tang, T. Y., and G. Mccalla. “Student Modeling for a Web-Based Learning Environment: A Data Mining Approach.” In Proceeding of Conference on Artificial Intelligence, Edmonton, Canada, 2002, 967-968.

[47]. Zakrzewska, D. “Cluster Analysis for User’s modeling in Intelligent E-Learning Systems.” In Proceeding of International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, Poland, 2008, 209-214.

[48]. Hamalainen, W., J. Suhonen, E. Sutinen, and H. Toivonen. “Data Mining in Personalizing Distance Education Courses.” In Proceeding of World Conference on Open Learning and Distance Education. Hong Kong, 2004, 1-11.

[49]. Zhang, K., L. Cui, H. Wang, and Q. Sui. “An Improvement of Matrix-Based Clustering Method for Grouping Learners in E-Learning.” In Proceeding of International Conference on Computer Supported Cooperative Work Design. Melbourne, Australia, 2007, 1010-1015.

[50]. Tian, F., S. Wang, C. Zheng, and Q. Zheng. “Research on E-Learning Personality Group Based on Fuzzy Clustering Analysis.” In Proceeding of International Conference on Computer Supported Cooperative Work in Design. China, 2008, 1035-1040.

[51]. Perera, D., J. Kay, I. Koprinska, K. Yacef, and O.R. Zaiane. “Clustering and Sequential Pattern Mining of Online Collaborative Learning Data.” IEEE Transaction on Knowledge and Data Engineering, 21, no. 6 (2009): 759-772.

[52]. Hwang, W.Y., C.B. Chang, and G.J. Chen. “The Relationship of Learning Traits, Motivation and Performance-Learning Response Dynamics.” International Journal of Computer and Education, 42, (2004): 267-287.

[53]. Hardof-Jaffe, S., A. Hershkovitz, H. Abu-Kishk, O. Bergman, and R. Nachmias. “How Do Students Organize Personal Information Spaces?” In Proceeding of International Conference on Educational Data Mining. Cordoba, Spain, 2009, 250-258.

[54]. Zukhri, Z., and K. Omar. “Solving New Student Allocation Problem with Genetic Algorithms: A Hard Problem for Partition Based Approach.” Journal of Zhejiang University, 2, (2007): 1-9.

[55]. Rasmani, K.A., and Q. Shen. “Data-driven Fuzzy Rule Generation and its Application for Student Academic Performance Evaluation.” Applied Intelligence, 25, (2006): 305-319.

[56]. Biswas, R. “An Application of Fuzzy Sets in Students’ Evaluation.” Fuzzy Sets and Systems, 74, no. 2 (1995): 187-194.

[57]. Law, C.K. “Using Fuzzy Numbers in Educational Grading System.” Fuzzy Sets and Systems, 83, (1996): 311-323.

[58]. Chen, S.M., and C.H. Lee. “New Methods for Students’ Evaluation Using Fuzzy Sets.” Fuzzy Sets and Systems, 104, (1999): 209-218.

[59]. Zadeh, L. A. Fuzzy Logic: Advanced Concepts and Structures. IEEE, Piscataway, New Jersey, (1992).

[60]. Kirkpatrick and Wheeler. Physics: A World View. Saunders, (1992).

[61]. Kosko, B. “Fuzzy Systems as Universal Approximators.” In Proceeding of First IEEE Conference on Fuzzy Systems. San Diego, March, 1992, 1153-1162.

[62]. Kawarada, H., and H. Suito. Fuzzy Optimization Method. Institute of Computational Fluid Dynamics, Chiba University, Japan, (1996).

[63]. Zadeh, L. A. “Fuzzy Sets.” Information and Control, 8, no. 3 (1965): 338-353.

[64]. Zadeh, L. A. “A New Approach to System Analysis.” Man and Computer, Amsterdam: North-Holland, (1972): 55-94.

[65]. Cheeseman, P. Probabilistic versus Fuzzy Reasoning. Uncertainty in Artificial Intelligence. Elsevier Science Publishers, Amsterdam, Netherlands, (1986).

[66]. Kosko, B. Fuzzy Thinking: The New Science of Fuzzy Logic. Art House, (1993).

[67]. Zadeh, L. A. “Soft Computing and Fuzzy Logic.” IEEE Software, 11, no. 6 (1994): 48-56.

[68]. Jang, J.-S.R., C.-T. Sun, and E. Mizutani. Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Prentice Hall, United States of America, (1997).

[69]. Jang, J.-S.R., and C.-T. Sun. “Neuro-Fuzzy Modeling and Control.” Proceedings of the IEEE, 83, no. 3 (1995): 378-406.

[70]. Takagi, T., and M. Sugeno. “Fuzzy Identification of Systems and Its Application to Modeling and Control.” IEEE Transactions on Systems, Man, and Cybernetics, 15, (1985): 116-132.

[71]. Kosko, B. Neural Networks and Fuzzy Systems: A Dynamical Systems Approach. Englewood Cliffs, N.J.: Prentice Hall, (1991).

[72]. Jang, J.-S.R. “ANFIS: Adaptive-Network-Based Fuzzy Inference System.” IEEE Transactions on Systems, Man and Cybernetics, 23, no. 3 (1993): 665-685.

[73]. McCulloch, W.S., and W. Pitts. “A Logical Calculus of the Ideas Immanent in Nervous Activity.” Bulletin of Mathematical Biophysics, 5, (1943): 115-133.

[74]. Rosenblatt, F. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Washington DC: Spartan, (1962).


[75]. Rumelhart, D. E., G.E. Hinton, and R.J. Williams. “Learning Internal Representations by Error Propagation.” Parallel Distributed Processing: Explorations in the Microstructure of Cognition, 1, (1986): 318-362.

[76]. Haykin, S. Neural Networks-A Comprehensive Foundation. Macmillan College Publishing Company, New York, (1994).

[77]. Kosko, B. “Fuzziness vs. Probability.” International Journal of General Systems, 17, no. 2-3 (1990): 211-240.
