student performance prediction using svm · 2017-11-28 · srinivasa ramanujan center, kumbakonam,...




http://www.iaeme.com/IJMET/index.asp 649 [email protected]

International Journal of Mechanical Engineering and Technology (IJMET)

Volume 8, Issue 11, November 2017, pp. 649–662, Article ID: IJMET_08_11_066

Available online at http://www.iaeme.com/IJMET/issues.asp?JType=IJMET&VType=8&IType=11

ISSN Print: 0976-6340 and ISSN Online: 0976-6359

© IAEME Publication Scopus Indexed

STUDENT PERFORMANCE PREDICTION

USING SVM

K. B. Eashwar, R. Venkatesan

Assistant Professor, CSE, SASTRA University,

Srinivasa Ramanujan Center, Kumbakonam, India

D. Ganesh

Final year student M.Sc Computer Science,

ABSTRACT

Educational data mining is an essential process in the broad field of education. Initially, data mining methods were applied in this area with a limited number of parameters, because student records were poorly maintained by the respective institutions. With the rapid advance of technology in many aspects of human life, the scope of this field has changed: institutions are now equipped with highly efficient technical components for maintaining data, and the amount of data stored in Educational Databases (EDB) is increasing rapidly. Institutions constantly face problems such as poor student performance, students leaving a programme midway because of the complexity of the curriculum, financial problems, psychological problems, and lack of support from parents. In this work, the focus is on Post-Graduation (PG) students, because research status at the PG level is currently low compared with other parts of the world. The aim is to raise this standard and to identify each selected student's state: whether he/she needs additional care or should be helped to sharpen his/her research abilities. According to a report of the MHRD of India, India's contribution among the Asian countries is 25%, which is low compared with other parts of the world. Also, in India, only 0.3% of students move from a Master's degree to the Ph.D. level. The prediction process concentrates on students' performance across various parameters, and the students are categorized as high, medium, and low. For this process, we combined two techniques: (i) Support Vector Machine (SVM) for classification, and (ii) the K-means algorithm for clustering.

Keywords: EDB, SVM, k-Means, Classification, Clustering, MMH, Hyperplane

Cite this Article: K. B. Eashwar, R. Venkatesan and D. Ganesh, Student Performance

Prediction Using Svm, International Journal of Mechanical Engineering and

Technology 8(11), 2017, pp. 649–662.

http://www.iaeme.com/IJMET/issues.asp?JType=IJMET&VType=8&IType=11


1. INTRODUCTION

Data mining plays a vital role in the business world, and it is also used in academia to predict and make decisions related to students' proficiency. Nowadays, the amount of data stored in Educational Databases (EDB) is increasing rapidly. A student's academic performance is influenced by many variables/factors. It is therefore essential to build a predictive data mining model of students' performance that distinguishes high learners from moderate-level learners within a group of students. The input data are gathered from students so that they can be organized and analysed in the right direction, and the student details are classified based on the results. The K-means clustering algorithm is applied first (based on academic records); Support Vector Machine (SVM) classification is then applied to the resulting n student records, and the prediction model is constructed.

2. BACKGROUND STUDY

Overlade et al. describe the K-means clustering algorithm as a simple and efficient technique for monitoring the progression of students' performance in higher education. Students are grouped into clusters according to their scores (using k-means, fuzzy c-means, etc.), where each cluster denotes a different performance level (High, Moderate, Low). Knowing the number of students in each cluster gives the average performance of the class as a whole [1]. According to Carlos Villagrá-Arnedo et al., the system used the Random Forests technique to model data from previous years of the Standard Admission Test (SAT). Although the results were good, the method was designed neither to be maintained over time nor to make progressive predictions based on incremental information [2].

Dr. Saurabh Pal et al. suggested that the Naïve Bayes classifier is particularly suited to inputs of high dimensionality. Despite its simplicity, Naïve Bayes can often outperform more sophisticated classification methods. The Naïve Bayes model identifies the characteristics of dropout students and shows the probability of each input attribute for the predictable state. A Naïve Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem (from Bayesian statistics) with strong (naive) independence assumptions [3]. However, continuous-valued attributes are typically modelled with a Gaussian distribution, which increases the complexity of the task, and such attributes may not satisfy the model's assumptions, leading to inaccuracy.

M. Ramaswami and R. Bhaskaran stated that the main objective of feature selection is to choose a subset of input variables by eliminating features that are irrelevant or carry no predictive information [4]. Guodong Zhao et al. examined various feature selection methods, which are roughly divided into three types: (i) embedded, (ii) wrapper, and (iii) filter methods. Embedded and wrapper methods are classifier-dependent: they evaluate the features using a learning algorithm [5].

Mrinal Pandey et al. described filter methods, which are generally subdivided into two classes: ranking and subset selection. Their methodology starts with data collection, followed by initial pre-processing, attribute selection, and class balancing [6]. Zhiyun Ren et al. focused on two methods for predicting student performance, namely regression-based methods and matrix-factorization-based methods, both of which use student academic records (especially course grades) [7].

Chih-Wei Hsu et al. describe SVM (Support Vector Machines) as a very useful technique for data classification. Although SVM is considered easier to use than Neural Networks, users who are not familiar with it often get unsatisfactory results at first [8]. J.K. Jothi Kalpana et al.


argued that the main objective of higher education institutes should be to provide quality education to their students and to improve the quality of students' managerial decision-making skills. One way to achieve the highest level of quality in a higher education system is to discover knowledge from educational data in order to study the main attributes that may affect students' performance. Using these methods, many types of knowledge can be discovered, for example with K-means and Gaussian methods. Mikko Vinni et al. suggest that the problem involves so many uncertain or unknown factors that students cannot be classified deterministically into two mutually exclusive classes. Instead, additional class values should be used (e.g., the mastery level is good, average, or poor), or the class probabilities should be estimated. The student profile contains too many attributes for building accurate classifiers, so only the most influential factors should be selected for prediction [10].

Pooja Thakar et al. generalize data mining methods: preprocessing, feature selection, clustering, and classification can be combined, so that the predicted results can finally be presented easily and consistently, for example via graphs [12]. P. Usha et al. deal with Support Vector Machines, decision trees, feature extraction and selection, and genetic algorithms, which are the major methods used for predicting student performance with multiple classifiers. The main objective of their work is to achieve a higher level of accuracy in predicting students' performance [13].

Hector M. Romero Ugalde et al. note that neural networks are suitable for modeling complex nonlinear systems when the plant is considered a black box. If low-order reference models are used, the number of parameters to be computed is rather small; much work remains to derive models that balance accuracy, complexity, and computational cost [15].

2.1. Scope of related work

This paper concentrates on the performance level of Post-Graduation students. In order to expose the real situation in higher studies, data were collected only from that group of students.

3. METHOD OF PREDICTION

The proposed method for predicting student performance has two phases, namely (i) a training phase and (ii) a testing phase. At the initial stage of the work, certain preliminary tasks were performed. The dataset was collected from both junior and senior students of Post-Graduation programmes. Questionnaires were prepared covering the details that form part of each student's profile, with selected variables such as the consolidated internal mark in each subject, CGPA (Cumulative Grade Point Average), father's income (or mother's income, in the absence of the father), whether the student opted for this programme out of interest or under stress, sports activity, living environment, family support for his/her studies, whether he/she works part-time, the college/university environment, resources for the programme, circle of friends, interest in academic activities, etc.

The collected information about the students was then uploaded. There is always a possibility that some samples do not satisfy most of the criteria for the values of their respective variables, or contain irrelevant values that do not suit the present conditions of the work. Such data samples are considered irrelevant and are always omitted when predicting the performance of a whole class, so the data samples were pre-processed.


3.1. Preprocessing

Data pre-processing is an important step in the data mining process. If the data samples contain much irrelevant and redundant information, or noisy and unreliable data, knowledge discovery during the training phase produces unexpected results. This step removes noise words, such as stop words. Pre-processing the student data with suitable data mining methods cleanses the dataset so that the expected knowledge can be discovered. The discovered knowledge is used to provide constructive recommendations for overcoming the problem of low grades among graduate students and for improving students' academic performance as early as possible. Pre-processing tasks were applied to prepare all the information described above so that the clustering task could be completed effectively. During this procedure, students without 100% complete data were excluded from the process. Some changes were also made to the values of a few attributes.

For example: Name, Register No, Contact, Address, Email, Marks1–Marks4, personal details, etc.
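As a minimal sketch of the cleaning rule described above, the code below assumes each questionnaire response is a Python dictionary. The field names (`RegisterNo`, `CGPA`, `Attendance`, etc.) and the helpers `is_complete` and `preprocess` are hypothetical. Dropping identifying fields follows the treatment of DOB and e-mail described in Section 4; it is an assumption here, not a documented step of this section.

```python
# Hypothetical identifying fields: they take part in no clustering or classification equation.
IDENTIFYING_FIELDS = {"Name", "RegisterNo", "Contact", "Address", "Email", "DOB"}


def is_complete(record, required_fields):
    """A record is complete only if every required field has a non-empty value."""
    return all(record.get(field) not in (None, "") for field in required_fields)


def preprocess(records, required_fields):
    """Drop incomplete records, then strip identifying fields from the rest."""
    cleaned = []
    for record in records:
        if not is_complete(record, required_fields):
            continue  # students without 100% complete data are excluded
        cleaned.append({key: value for key, value in record.items()
                        if key not in IDENTIFYING_FIELDS})
    return cleaned


students = [
    {"Name": "A", "RegisterNo": "101", "CGPA": 8.2, "Attendance": 91},
    {"Name": "B", "RegisterNo": "102", "CGPA": None, "Attendance": 78},
]
print(preprocess(students, required_fields=["CGPA", "Attendance"]))
# [{'CGPA': 8.2, 'Attendance': 91}]
```

Treating an empty string the same as a missing value is a design choice. A real questionnaire export may encode missing answers differently.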

3.2. Feature Selection

Feature extraction starts from an initial set of measured data and builds derived values

(features) intended to be informative and non-redundant, facilitating the subsequent learning

and generalization steps, and in some cases leading to better human interpretations. Feature

selection algorithms can be used to identify which feature has the greatest effect on the output

variable (academic status). The objective is to resolve the problem of high dimensional data

by reducing the number of attributes without losing reliability in classification. Feature

selection techniques have been used to choose a subset of variables and eliminate others that

could be irrelevant or of no predictive information and therefore could prevent the classifiers

from reaching a good accuracy. A feature selection process can also be used to remove terms

in the training documents that are statistically uncorrelated with the class labels. This will

reduce the set of terms to be used in classification, thus improving both efficiency and

accuracy.

For example: interest in social activities (Yes or No), involvement in projects (Yes or No), and non-involvement in institutional functions.
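One simple filter method of the "ranking" kind is to score each feature by the absolute Pearson correlation between its values and a numerically encoded class label, then keep the features above a threshold. The sketch below assumes that the labels are encoded as numbers (here 2 = HIGH, 0 = LOW) and that the threshold is 0.3; both are illustrative choices. The feature names are hypothetical, and the paper does not state which filter statistic it used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column carries no information about the label
    return cov / (sd_x * sd_y)


def select_features(columns, labels, threshold=0.3):
    """Keep features whose |correlation| with the label reaches the threshold, strongest first."""
    scores = {name: abs(pearson(values, labels)) for name, values in columns.items()}
    kept = [name for name, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda name: scores[name], reverse=True)


columns = {
    "CGPA": [9.0, 8.0, 6.0, 5.0],
    "Sports": [1, 0, 1, 0],   # Yes/No answers encoded as 1/0
    "Hostel": [1, 1, 1, 1],   # identical for everyone, so it is dropped
}
labels = [2, 2, 0, 0]
print(select_features(columns, labels))  # ['CGPA']
```

Pearson correlation only captures linear association. Chi-square or information-gain filters are common alternatives for Yes/No attributes.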

3.3. Clustering

Clustering is the process of organizing data (or objects) into meaningful sub-classes, called clusters. It helps users understand the natural grouping or structure in a dataset, and a clustering algorithm (k-Means) can be applied to describe its characteristics. The k-Means algorithm partitions the data into k groups (student groups), where k is predefined, and selects k points at random as the initial cluster centers (based on the total examination marks obtained by each student). Each object is assigned to its closest cluster center according to the Euclidean distance function. Finally, the algorithm calculates the centroid, or mean, of all objects (student samples) in each cluster.

For example: TOT_INT1, TOT_INT2, TOT_INT3, TOT_INT4, TOT_INT (internal marks for various subjects); ADS_LAB, TOT_INT for JAVA_LAB (sample courses).

3.4. Classification

In this step, the SVM classification algorithm is applied to the data, and the status of each student is produced. Training and test records from the previous step were used for SVM classification. The data were retrieved from the database, and some of the training records were also used for testing. The SVM classifier was applied to the datasets and the classification


details were obtained. These details were analysed further to produce the results, and the accuracy of the prediction was computed for the overall system. Finally, the students were categorized based on the given conditions.

For example: internal marks for individual theory courses, internal lab marks, involvement in sports and department activities, and any psychological pressure from the college or family environment.

The above mentioned steps are shown in the following architecture diagram:

Figure 1 Phases of Prediction System

3.5. Algorithms

As mentioned in the previous part, the entire work was accomplished using two techniques, namely clustering and classification, implemented with two algorithms: K-Means for clustering and the Support Vector Machine (SVM) for classification. The two algorithms are explained as follows:

3.5.1. Algorithm 1: K-Means (Clustering):

A formal definition of clustering is the process of grouping a set of physical or abstract objects into classes of similar objects. A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters. A cluster of data objects can be treated collectively as one group and so may be considered a form of data compression.

The n objects are organized into k clusters, where k ≤ n. Cluster similarity is measured with respect to the mean value of the objects in a cluster, which can be viewed as the cluster's centroid or center of gravity.

The k-means algorithm proceeds as follows. First, it randomly selects k of the objects, each of which initially represents a cluster mean or center. Each of the remaining objects is assigned to the cluster to which it is most similar, based on the distance between the object and the cluster mean. The algorithm then computes the new mean for each cluster. This process iterates until the criterion function converges. For each object in each cluster, the distance from the object to its cluster center is squared, and the distances are summed. This criterion tries to make the resulting k clusters as compact and as separate as possible. The square-error criterion is used, defined as


$$E = \sum_{i=1}^{k} \sum_{p \in C_i} |p - m_i|^2 \qquad (1)$$

Where,

E – sum of the square error for all objects in the data set

p – point in space representing a given object

$m_i$ – mean of cluster $C_i$

$C_i$ – clusters

3.5.1.1. How the K-Means Algorithm Works:

In our problem, student performance is predicted in three categories, so k = 3 and the students are partitioned into three clusters. Initially, we arbitrarily choose three students' performance states (for high, medium, and low) as the three initial cluster centers. Figure 2 illustrates how the K-Means algorithm partitions the group of students. In this diagram, the three cluster centers are denoted by '+'. Each object is assigned to the cluster whose center is nearest. This distribution forms clusters encircled by dotted curves, as shown in Figure 2(a).

Next, the cluster centers (performance states) are updated: the mean value (performance state) of each cluster is recalculated from the objects currently in the cluster. Using the new cluster centers, the objects are redistributed to the clusters whose centers are nearest. This redistribution forms new student clusters, encircled by dashed curves in Figure 2(b).

The process iterates, leading to Figure 2(c). The process of iteratively reassigning objects to clusters to improve the partitioning is referred to as iterative relocation. Eventually, no object is redistributed to another cluster, and the process terminates. The resulting student clusters are returned by the clustering process [Jiawei Han, Micheline Kamber].

Figure 2 The k-Means partitioning algorithm.
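The following is a minimal pure-Python sketch of the procedure described above. It covers random or caller-supplied initial centers, nearest-center assignment by Euclidean distance, recomputation of the means, and stopping once no center moves. It also reports the square error E of Eq. (1). The function names, the `init` and `seed` parameters, and the toy two-dimensional points are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Iterative-relocation k-means; returns (centers, clusters, E) with E as in Eq. (1)."""
    if init is not None:
        centers = [tuple(c) for c in init]
    else:
        centers = [tuple(c) for c in random.Random(seed).sample(points, k)]
    for _ in range(max_iter):
        # Assignment step: each object joins the cluster with the nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: recompute each mean (an empty cluster keeps its old center).
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # no center moved, so no object would be redistributed
        centers = new_centers
    # Eq. (1): sum of squared distances from each object to the mean m_i of its cluster.
    error = sum(euclidean(p, mean(c)) ** 2 for c in clusters if c for p in c)
    return centers, clusters, error


points = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10), (20, 1), (21, 1), (20, 2)]
centers, clusters, error = kmeans(points, k=3, init=[(1, 1), (10, 10), (20, 1)])
print([len(c) for c in clusters], round(error, 6))  # [3, 3, 3] 4.0
```

With purely random initial centers, k-means can converge to a poorer local optimum, so results may vary with `seed`. Supplying one representative student per performance level as `init` mirrors the arbitrary choice of initial centers described above.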

The results obtained in the clustering phase are then forwarded to the classification phase, which is discussed next.
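K-means returns unnamed cluster indices, so before its output can serve as class labels for a classifier, each cluster has to be named HIGH, MEDIUM, or LOW. The sketch below assumes that this is done by ranking the cluster centers on a single feature (here index 0, taken to be a hypothetical CGPA column). The helpers `label_clusters` and `training_labels` are illustrative and do not come from the paper.

```python
def label_clusters(centers, feature_index=0):
    """Name three clusters LOW/MEDIUM/HIGH by ranking their centers on one feature."""
    if len(centers) != 3:
        raise ValueError("this sketch assumes exactly three clusters (k = 3)")
    order = sorted(range(3), key=lambda i: centers[i][feature_index])
    return {cluster: name for cluster, name in zip(order, ["LOW", "MEDIUM", "HIGH"])}


def training_labels(clusters, names):
    """Turn clustered students into (features, class label) pairs for the classifier."""
    return [(student, names[i]) for i, members in enumerate(clusters) for student in members]


# Hypothetical centers as (CGPA, consolidated internal marks %).
centers = [(9.0, 88.0), (5.5, 52.0), (7.1, 70.0)]
names = label_clusters(centers)
print(names)  # {1: 'LOW', 2: 'MEDIUM', 0: 'HIGH'}
print(training_labels([[(9.2, 90.0)], [(5.0, 50.0)], [(7.0, 72.0)]], names))
```

Ranking on one feature is the simplest choice. Ranking on a weighted sum of the clustering features would be an equally valid assumption.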

3.5.2. Algorithm 2: Support Vector Machine (SVM) (Classification):

SVM is a promising method for classifying both linear and non-linear data. It uses a non-linear mapping to transform the original training data into a higher dimension. Since each student is described by multiple variables, each student is a multi-dimensional object. Within this new dimension, SVM searches for the linear optimal separating hyperplane (that is, a "decision boundary" separating the students of one class from another). With an appropriate non-linear mapping to a sufficiently high dimension, data from two classes can always be separated by a hyperplane (H1 and H2). The SVM finds this hyperplane using support vectors ("critical" training samples) and margins (large margin and small margin), which are defined by the support vectors.


Advantages:

Many data analysts note that SVM training can be very slow, but its accuracy is very high: an SVM with a small number of support vectors can generalize well even when the dimensionality of the objects is high. Thus, SVM is a highly suitable method for classifying a small number of training samples with numerous parameters. Because every selected parameter of one student is compared with those of other students to predict the performance category into which each student falls, a non-linear approach can be used. SVM is also much less prone to over-fitting than other methods.

Since we had already clustered the students under three different performance measures, it was easy to classify the groups further with n constraints.

3.5.2.1. Linearly Separable SVM:

Existing approaches have concentrated on classifying the sample dataset into two classes, either ("yes" and "no") or ("1" and "0"), because they deal with linearly separable samples. There are infinitely many lines (hyperplanes, in higher dimensions) that separate the objects, and among them the best separator, the one with the minimum classification error on unseen objects, has to be found. SVM therefore searches for the maximum marginal hyperplane (MMH), the separating hyperplane with the largest margin between the classes.

The training samples that fall on the margin hyperplanes H1 and H2 are known as support vectors; they are the samples closest to the MMH. SVM is best suited to classifying small datasets, with fewer than 2000 training samples. Standard software packages can be used to find the support vectors and the MMH. For these reasons, SVM was selected for this work, whose sample consists of one hundred students.

The MMH is defined using a Lagrangian formulation, described in [16]; the resulting decision boundary is

$$d(X^T) = \sum_{i=1}^{l} y_i \alpha_i X_i X^T + b_0 \qquad (2)$$

Where,

1. $y_i$ is the class label of support vector $X_i$;

2. $X^T$ is a test tuple;

3. $\alpha_i$ and $b_0$ are numeric parameters determined automatically by the optimization (SVM) algorithm; and

4. $l$ is the number of support vectors.
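To make Eq. (2) concrete, the sketch below evaluates $d(X^T)$ for given support vectors, labels, multipliers, and $b_0$, and classifies a test tuple by its sign. The parameters are hand-derived for a two-point toy problem; they are not values from the authors' data. The function names `decision` and `classify` are hypothetical.

```python
def dot(u, v):
    """Inner product X_i X^T of two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def decision(x_test, support_vectors, labels, alphas, b0):
    """Eq. (2): d(X^T) = sum_i y_i * alpha_i * (X_i . X^T) + b0."""
    return sum(y * a * dot(sv, x_test)
               for sv, y, a in zip(support_vectors, labels, alphas)) + b0


def classify(x_test, support_vectors, labels, alphas, b0):
    """The sign of d(X^T) gives the predicted class, +1 or -1."""
    return 1 if decision(x_test, support_vectors, labels, alphas, b0) >= 0 else -1


# Hand-derived parameters: support vector (1, 1) in class +1 and (-1, -1) in class -1.
SV = [(1.0, 1.0), (-1.0, -1.0)]
Y = [1, -1]
ALPHA = [0.25, 0.25]
B0 = 0.0

print(decision((2.0, 0.0), SV, Y, ALPHA, B0))   # 1.0
print(classify((-3.0, 1.0), SV, Y, ALPHA, B0))  # -1
```

These values solve the hard-margin problem for this two-point set: the weight vector is $w = \sum_i \alpha_i y_i X_i = (0.5, 0.5)$, both support vectors give $d = \pm 1$, and the margin $2/\lVert w \rVert = 2\sqrt{2}$ equals the distance between them.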

As mentioned above, the complexity of the learned SVM classifier is characterized by the number of support vectors rather than by the dimensionality of the student data. Consequently, an SVM with a small number of support vectors can produce a well-generalized result even when the dimensionality of the data samples is high.

3.5.2.2. Linearly Inseparable SVM:

We cannot always expect data with only "yes" or "no" classes; sometimes the samples must be classified into more than two classes. In our work, students were classified into three classes (high, medium, and low performance states), and this case is treated as non-linear. Non-linear classification is accomplished in two steps: (i) the original input data are transformed into a higher-dimensional space, and (ii) in that space, a linear separating hyperplane is searched for. However, this process is costly, and it is difficult to find a suitable non-linear mapping to a higher-dimensional space explicitly, so a kernel function is applied to the original input data instead. A user-specified upper bound is also used, like the maximum value for


the internal marks and CGPA (the two essential parameters to predict the performance of the

students).

Advantages of SVM in a non-linear setting:

SVM training finds a global solution, whereas other approaches, such as neural networks, can produce local minima. Another important advantage of SVM is that it is well suited to the multi-class case, where the result must be one of more than two classes. SVM was therefore chosen for the student performance prediction process [Jiawei Han, Micheline Kamber].
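The kernel trick replaces the inner product $X_i X^T$ in Eq. (2) with a kernel $K(X_i, X^T)$, so the mapping to the higher-dimensional space never has to be computed explicitly. The sketch below assumes a Gaussian (RBF) kernel and handles the three performance classes with one-vs-rest: one "this class vs. the rest" decision function per class, with the largest value winning. The paper does not say which multi-class strategy it used; scikit-learn's `SVC`, for instance, uses one-vs-one internally. The support vectors and multipliers below are hand-picked for illustration over a single hypothetical CGPA feature and are not trained. They satisfy $\sum_i \alpha_i y_i = 0$, and all $\alpha_i \le 1$. In the standard soft-margin SVM, the user-specified upper bound is a constant C that limits every multiplier, $0 \le \alpha_i \le C$.

```python
import math


def rbf_kernel(u, v, gamma=1.0):
    """Gaussian (RBF) kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def kernel_decision(x, model, gamma=1.0):
    """Eq. (2) with the inner product X_i X^T replaced by K(X_i, X^T)."""
    support_vectors, labels, alphas, b0 = model
    return sum(y * a * rbf_kernel(sv, x, gamma)
               for sv, y, a in zip(support_vectors, labels, alphas)) + b0


def predict_one_vs_rest(x, models, gamma=1.0):
    """Pick the class whose 'this class vs. the rest' decision value is largest."""
    return max(models, key=lambda name: kernel_decision(x, models[name], gamma))


# Hand-picked (not trained) one-dimensional models over a hypothetical CGPA feature.
MODELS = {
    "HIGH": ([(9.0,), (5.0,)], [1, -1], [1.0, 1.0], 0.0),
    "LOW": ([(5.0,), (9.0,)], [1, -1], [1.0, 1.0], 0.0),
    "MEDIUM": ([(7.0,), (5.0,), (9.0,)], [1, -1, -1], [1.0, 0.5, 0.5], 0.0),
}

for cgpa in (9.0, 7.0, 5.2):
    print(cgpa, predict_one_vs_rest((cgpa,), MODELS))
# 9.0 HIGH
# 7.0 MEDIUM
# 5.2 LOW
```

On a single feature, no linear boundary can separate the middle (MEDIUM) class from both others at once. The RBF decision function can, which illustrates why a kernel is useful here.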

4. EXPERIMENTS AND RESULTS

The aim of this work is to identify students who are at the edge of a performance level in a Post-Graduate programme. For that purpose, we collected input data from both first-year and second (final) year students. Since the number of PG students is always smaller than at the Under-Graduate level, the available sample was limited to one hundred students. The students' personal information was collected from their database. Their attendance status was also collected from the same database, because this parameter plays a vital role in determining a student's performance. Every institution/university requires each of its students to attend at least a minimum proportion of the tutorial hours in every subject. Suppose a student does not attend classes regularly because of heavy involvement in other academic activities; this low attendance is very likely to affect his/her academic performance, so the student should be warned by his/her tutor to keep attendance in the "safe" zone. These details were collected from the students through questionnaires, whose format is given below. These data are the input to the system.

1) Model of a Questionnaire:

Your Name:

Father's Name:

Father's Occupation:

Mother's Name:

Mother's Occupation:

ADDRESS:

DOB:

Email id:

Gender:

Age:

Marital status:

PG Course:

Year of Passing (PG):

College (PG):

UG Course:

Year of Passing (UG):

College (UG):

SGPA:


CGPA:

Arrear? Yes No

In the pre-processing step, only a selected set of samples was retained from the input data. Variables that are not relevant to the prediction are not considered, for example DOB and e-mail ID, since they are not part of any equation used for partitioning or classification. Sometimes the collected values are unsuitable or were not provided; all such samples are excluded. Many parameters, such as whether the parents are educated, may improve a student's performance indirectly, but they play only a very small role, so they were also omitted.

Figure 3 List of phases of the tool

The processes should be executed in the order given in Figure 3. As mentioned already, the pre-processing step removed the noisy and irrelevant data samples that were incomplete, as shown in Figure 4. The filtered samples are then ready for clustering. Before clustering begins, the features of the data samples have to be selected, because we determined that certain parameters directly affect the performance of the student, and selecting them ensures the expected accuracy level of the system. The feature selection process is shown in Figure 5.

Figure 4 Pre-processing stage.


Figure 5 Feature Selection

Once feature selection was complete, the data samples were sent to the clustering phase. Here, the initial center of each group was selected at random, using the CGPA, the consolidated internal marks in each subject, and the attendance percentage of a student; these three values are important in deciding the attitude of a student. Euclidean distance was used to measure how far each student lies from a cluster center. For an n-dimensional space, the distance is defined as

$$d(p,q) = \sqrt{(p_1-q_1)^2 + (p_2-q_2)^2 + \cdots + (p_i-q_i)^2 + \cdots + (p_n-q_n)^2} \qquad (3)$$

where p and q are the two points whose similarity or dissimilarity is being measured, and $p_i$ and $q_i$ are their i-th coordinates.
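Equation (3) translates directly into code. The sketch below is illustrative only; the function name and the sample vectors are hypothetical. Python 3.8 and later also provide the same computation as `math.dist`.

```python
import math


def euclidean_distance(p, q):
    """Equation (3): square root of the summed squared coordinate differences."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of dimensions")
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


# Two hypothetical students described by (CGPA, internal marks, attendance %).
print(euclidean_distance([8.0, 70.0, 90.0], [6.0, 70.0, 90.0]))  # → 2.0
```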

Figure 6 Clustering the samples
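The text states that group centers were chosen randomly and that samples were compared by Euclidean distance, but it does not name the exact clustering procedure. The sketch below assumes a standard k-means loop with k = 3 (one cluster each for the HIGH, NORMAL, and LOW groups) over scaled feature vectors; the function names, the seeded initialisation, the iteration cap, and the rule that ranks clusters by the sum of their center coordinates are all illustrative assumptions.

```python
import math
import random


def kmeans(points, k=3, max_iter=100, seed=0):
    """Plain k-means: random initial centers, then alternate assign/update steps."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(points, k)]  # k random samples as centers
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest center (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each center to the mean of the samples assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def name_clusters(centers):
    """Hypothetical labelling: rank centers by coordinate sum as LOW < NORMAL < HIGH."""
    order = sorted(range(len(centers)), key=lambda j: sum(centers[j]))
    return dict(zip(order, ("LOW", "NORMAL", "HIGH")))


# Hypothetical scaled (CGPA, internal marks, attendance) vectors.
points = [[0.1, 0.2, 0.1], [0.2, 0.1, 0.2], [0.5, 0.5, 0.4],
          [0.4, 0.5, 0.5], [0.9, 0.8, 0.9], [0.8, 0.9, 0.9]]
labels, centers = kmeans(points)
names = name_clusters(centers)
print([names[lab] for lab in labels])
```

Because the initial centers are random, a single run can settle in a poor local optimum; running several seeds and keeping the tightest clustering is a common remedy.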

The clustered data samples were then passed to the SVM classifier, where additional information was added to the features of each sample. We considered the parameters that could affect a student's present position in the partitioning: a student who had been placed in the NORMAL group could be moved to HIGH because of his or her overall performance in all areas of the academic programme. Parameters such as participation in sports, social activities, and department activities were used to check how well students performed in examinations even though they took part in these kinds of activities. For example, a student who did well in sports, NCC, and cultural activities and also obtained a CGPA of 7.0 or above would be moved to the HIGH class, even though other students obtained a higher CGPA. The SVM classification process is shown in Figure 7.
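The example rule in the paragraph above can be written as a small adjustment applied to a student's group. This sketches only the stated rule, not the SVM itself; the function and parameter names are hypothetical, the CGPA is assumed to be on a 10-point scale, and the single boolean flag stands in for "did well in sports, NCC, and cultural activities", whose measurement the paper does not describe.

```python
def adjust_group(group, cgpa, excels_in_activities):
    """Move a student to HIGH if he/she excels in activities and has CGPA >= 7.0.

    group: label from the clustering phase ("HIGH", "NORMAL" or "LOW").
    cgpa: grade point average, assumed to be on a 10-point scale.
    excels_in_activities: hypothetical flag for doing well in sports, NCC and
        cultural activities.
    """
    if excels_in_activities and cgpa >= 7.0:
        return "HIGH"
    return group


print(adjust_group("NORMAL", 7.2, True))   # → HIGH
print(adjust_group("NORMAL", 8.5, False))  # → NORMAL
```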


Figure 7 SVM Classification
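The paper does not give its SVM formulation, kernel, or library. As a rough, self-contained illustration, the sketch below trains a binary linear SVM (for example HIGH versus not-HIGH) by Pegasos-style stochastic subgradient descent on the regularised hinge loss; a three-class HIGH/NORMAL/LOW classifier could be assembled from several such binary models, e.g. one-vs-rest. The features, labels, and hyperparameters (`lam`, `epochs`) are illustrative assumptions; a real system would more likely rely on an established SVM library.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=500, seed=0):
    """Pegasos-style stochastic subgradient training of a binary linear SVM.

    X: list of feature vectors; y: labels in {-1, +1}.
    Minimises lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)). A constant 1.0 is
    appended to every vector, so the bias is learned (and regularised) as an
    ordinary weight, a common simplification.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: regularisation term
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 according to the side of the learned hyperplane."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical scaled features (CGPA, activity score); +1 = HIGH, -1 = not HIGH.
X = [[0.8, 0.9], [0.9, 0.8], [0.9, 0.9], [0.1, 0.2], [0.2, 0.1], [0.1, 0.1]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print(predict(w, [1.0, 1.0]), predict(w, [0.0, 0.0]))
```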

5. ANALYSIS

When a prediction is made for a single student or for the performance of a whole class, it should not deviate from the actual results. The tool designed for this work predicted the results with 96.7% accuracy, which matches the overall assessment made by the tutors of both classes. We attribute this to the change in the counts of the HIGH, NORMAL, and LOW groups once the samples were classified with SVM. For example, Figures 8 and 9 show that the juniors had a higher count than the seniors in the HIGH class, because more junior students participated in academic activities and also obtained good results in their examinations.
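Figures 8 and 9 summarise how many students fall into each group, separately for seniors and juniors. A small sketch of that tally using `collections.Counter`, assuming each classified student is represented as a hypothetical (cohort, group) pair; the sample data is invented for illustration and does not reproduce the paper's figures.

```python
from collections import Counter

GROUPS = ("HIGH", "NORMAL", "LOW")


def group_counts(classified):
    """Return {cohort: {group: count}} from (cohort, group) pairs."""
    per_cohort = {}
    for cohort, group in classified:
        per_cohort.setdefault(cohort, Counter())[group] += 1
    # Report every group, including groups with zero students.
    return {c: {g: counts[g] for g in GROUPS} for c, counts in per_cohort.items()}


sample = [("senior", "HIGH"), ("senior", "LOW"), ("junior", "HIGH"), ("junior", "HIGH")]
print(group_counts(sample))
# → {'senior': {'HIGH': 1, 'NORMAL': 0, 'LOW': 1}, 'junior': {'HIGH': 2, 'NORMAL': 0, 'LOW': 0}}
```

Computing these counts before and after the SVM step makes the shift between groups directly visible.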

Figure 8 Results of Classifier on Seniors

Figure 9 Results of Classifier on Juniors


As mentioned in the previous section, if a student performs well in sports, then his or her performance in other activities and academics would automatically be compromised. Figures 10, 11, and 12 show the results of individual students' achievement.

Figure 10 Student under LOW classification

Figure 11 Student under Normal classification

Once the training phase was over, we tested the tool with sample data; the results are shown in Figure 12.

Figure 12 Result of Testing sample


Both individual and class-wise data samples were tested against the tool, and the results were compared with the original data set.
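Comparing predictions with the original data, both per student and per class, reduces to an accuracy computation. The sketch below assumes the predictions and the original (tutor-assessed) labels are available as parallel lists together with each student's class; all names and values are hypothetical, and the example is not meant to reproduce the paper's 96.7% figure.

```python
from collections import defaultdict


def accuracy(predicted, actual):
    """Fraction of students whose predicted group matches the original label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def classwise_accuracy(classes, predicted, actual):
    """Accuracy computed separately for each class (e.g. seniors, juniors)."""
    buckets = defaultdict(lambda: ([], []))
    for c, p, a in zip(classes, predicted, actual):
        buckets[c][0].append(p)
        buckets[c][1].append(a)
    return {c: accuracy(p, a) for c, (p, a) in buckets.items()}


classes = ["senior", "senior", "junior", "junior"]
predicted = ["HIGH", "LOW", "HIGH", "NORMAL"]
actual = ["HIGH", "LOW", "HIGH", "HIGH"]
print(accuracy(predicted, actual))                     # → 0.75
print(classwise_accuracy(classes, predicted, actual))  # → {'senior': 1.0, 'junior': 0.5}
```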

6. CONCLUSIONS

The data samples were collected directly from the tutors and students and fed into the system for partitioning and classification. We implemented the rules defined in the SVM algorithm to predict the students' final grades. The results were obtained and checked against the original data set. Some samples were omitted because of irrelevant or unfilled answers; this may even have reduced the accuracy, since the number of samples was reduced. As a future enhancement, we plan to add more parameters at the clustering level itself. We are still unable to identify the reasons behind the loss of performance of a student or a class. Psychological parameters, social behaviour parameters, family circumstances, and parental care related parameters could be included. These parameters would increase the accuracy and help explain the lack of performance of students, so that we can give them a helping hand and guide them on the “right” path, since it is our duty as teachers.

ACKNOWLEDGEMENT

We would like to thank all the tutors, students, and staff of the Department of CSE, SRC, SASTRA, who gave us support and encouragement to accomplish this work.

REFERENCES

[1] Oyelade, O. J., Oladipupo, Obagbuwa, I. C., Application of k-Means Clustering algorithm for prediction of Students' Academic Performance, International Journal of Computer Science and Information Security (IJCSIS), Vol. 7, No. 1, 2010.

[2] Carlos Villagrá-Arnedo, Predicting Academic performance from Behavioural and Learning data, Int. J. of Design & Nature and Ecodynamics, Vol. 11, No. 3 (2016), pp. 239-249.

[3] Dr. Saurabh Pal, Mining Educational Data Using Classification to Decrease Dropout Rate of Students, International Journal of Multidisciplinary Sciences and Engineering, Vol. 3, No. 5, May 2012.

[4] M. Ramaswami and R. Bhaskaran, A Study on Feature Selection Techniques in Educational Data Mining, Journal of Computing, Volume 1, Issue 1, December 2009, ISSN: 2151-9617.

[5] Guodong Zhao and Jing Bai, Effective feature selection using feature vector graph for classification, Journal of Computing, Science Direct (2015).

[6] Mrinal Pandey, S. Taruna, Towards the integration of multiple classifiers pertaining to the Student's performance prediction, Journal of Computing, Science Direct (2016).

[7] Zhiyun Ren, George Karypis, Predicting Student Performance Using Personalized Analytics, IEEE (April 2016).

[8] Chih-Wei Hsu, Chih-Chung Chang, A Practical Guide to Support Vector Classification.

[9] J. K. Jothi Kalpana, Intellectual Performance Analysis of Students by Using Data Mining Techniques, International Journal of Innovative Research in Science, Engineering and Technology, Volume 3, Special Issue 3, March 2014.

[10] Wilhelmiina Hamalainen and Mikko Vinni, Comparison of Machine Learning Methods for Intelligent Tutoring Systems, Springer-Verlag Berlin Heidelberg, 2006.

[11] Andreas Jahn and Andreas Zell, Interpreting linear support vector machine models with heat map molecule coloring, Journal of Cheminformatics, 2011.


[12] Pooja Thakar, Performance Analysis and Prediction in Educational Data Mining: A Research Travelogue, International Journal of Computer Applications (0975 – 8887), Volume 110, No. 15, January 2015.

[13] P. Usha, Predicting Student Performance Using Genetic and SVM Classifier, International Journal of Computer Engineering, July-Dec 2011, Volume 3, Number 2, pp. 97-102.

[14] Gabriel Barata and Sandra Gama, Early Prediction of Student Profiles Based on Performance and Gaming Preferences, IEEE Transactions on Learning Technologies, Vol. 9, No. 3, July-Sep 2016.

[15] Hector M. Romero Ugalde, Computational cost improvement of neural network models in black box nonlinear system identification, Journal, Science Direct (Neuro Computing), 2015.

[16] Jiawei Han, Micheline Kamber, Data Mining: Concepts and Techniques, Second Edition, Morgan Kaufmann Publishers, 2006.

[17] Dr. T. Arumuga Maria Devi and S. Mariammal, SVM Based Performance of IRIS Detection, Segmentation, Normalization, Classification and Authentication Using Histogram Morphological Techniques, International Journal of Computer Engineering and Technology, 7(4), 2016, pp. 1–11.

[18] Dr. M. Renuka Devi and S. Sridevi, Short-Term Wind Power Forecasting and Predominant Wind Direction Using SVM Kernel Function, International Journal of Civil Engineering and Technology (IJCIET), Volume 8, Issue 7, July 2017, pp. 256–263.

[19] Sandip S. Patil and Asha P. Chaudhari, Classification of Emotions from Text Using SVM Based Opinion Mining, International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 1, January-June (2012), pp. 330-338.

[20] V. Anandhi and Dr. R. Manicka Chezian, Comparison of the Forecasting Techniques – ARIMA, ANN and SVM - A Review, International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 3, May-June (2013), pp. 370-376.