Pedro Strecht, João Mendes Moreira, Carlos Soares. EUNIS Business Intelligence Conference, March 6th-7th 2014, Paris

Page 1

Pedro Strecht

João Mendes Moreira

Carlos Soares

EUNIS BUSINESS INTELLIGENCE CONFERENCE

MARCH 6TH-7TH 2014

PARIS

Page 2

Summary

Data Analysis in Education

… at the University of Porto

An illustrative example of an EDM task

Conclusions and Future work

Page 3

Data Analysis in Education

For a few decades, higher education institutions have managed their data using University Information Systems (UIS)

The growing adoption of UIS has allowed research to move towards automatic knowledge discovery from academic databases

Over the past 10 years there has been an increase in research using data mining techniques to discover phenomena in the data

An example application of data mining is:

Predicting the success or failure of a student enrolled in a course

Learning the reasons behind it

Page 4

University of Porto

Founded in 1911

14 faculties, 1 business school

~700 study programs

~32 000 students, ~2 000 teachers and researchers, ~1 800 administrative staff

University Information Systems have been developed in-house and in operation since 1992

The SIGARRA system underwent a major upgrade in 2012, which prompted the University to improve its processes using Business Intelligence (BI) and Data Mining (DM)

Page 5

Data Analysis in Education: Educational big data

[Diagram: sources of educational data and the types of data they provide]

Academic information: Administrative data

Teaching and learning environments: Pedagogical data

Other sources: Other types of data

Page 6

Data Analysis in Education: Overview of research

[Diagram relating educational data, research areas, their focus and their main goals]

Research areas: Educational Data Mining, Learning Analytics

Related fields: Information science, Psychology, Sociology, Statistics, Data mining, Machine learning

Educational data: Pedagogical data, Administrative data, Other types of data

Focus: Automated discovery, Interpretation of data

Main goals:

• Inform education practice

• Discover phenomena in data

• Influence education practice

• Adapt teaching to students

Page 7

Educational DM & Learning Analytics at U.Porto: general perspective

[Diagram: from educational data to process improvement]

Educational data: Pedagogical data, Administrative data, Other types of data

Data Warehouse

Feature engineering

Predictive analysis (aka EDM): Classification, Regression, Anomaly detection, Pattern mining, Text mining

Descriptive analysis (aka LA): Statistical data analysis, Cluster analysis, Exploratory data analysis, Business Intelligence, Social Network Analysis

Improve Processes: Thwart attrition, Adapt student tutoring, Identify research collaboration opportunities, Adapt management workflow

Page 8

Learning Analytics at U.Porto: current work

[Diagram: Descriptive analysis (aka LA)]

Educational data: Pedagogical data, Administrative data

Data Warehouse

OLAP cube Education, OLAP cube Finance, ...

Front-ends such as:

• Pivot tables

• Report tools

[Example chart: axis from 0 to 20; labels Abel Silva, Carlos Vaz, Isabel Luís; series Analysis, Algebra]

Page 9

Educational DM at U.Porto: current work

[Diagram: Predictive analysis (aka EDM) and Descriptive analysis (aka LA)]

Educational data: Pedagogical data, Administrative data, Other types of data

Data Warehouse

Feature engineering

Ph.D. theses: Pedro Strecht, Luís Cruz

Topics: Curriculum mining, Social Network Mining, Student attrition models, Living Analytics

Improve Processes

• Discovery of students’ attrition factors

• Comparative study on differences in factors of attrition

• Development of strategies to thwart attrition

Page 10

Summary

Data Analysis in Education

… at the University of Porto

An illustrative example of an EDM task

Conclusions and Future work

Page 11

An illustrative example of an EDM task

System to predict if a student will pass or fail a course

Using administrative data from UIS

Three different processes:

① Data extraction: from the academic database to the courses data set

② Classifier training and model analysis: yields the variables’ importance measures

③ Classifier evaluation: yields the models’ performance measures

Page 12

Data extraction

14 variables were extracted for each student

8 courses were selected

Group | Variables
Socio-demographic information | Age, Sex, Marital status, Nationality, Displaced, Scholarship, Special needs
Admission information | Type of admission
Enrollment information | Type of student, Status of student, Years of enrollment, Delayed courses, Type of dedication
Financial information | Debt situation

Page 13

Data extraction

Data set sample for course Mathematics II

Age | Sex | Marital status | Nationality | Displaced | Scholarship | Special needs | Type of admission | Type of student | Status of student | Years of enrollment | Delayed courses | Type of dedication | Debt situation | Approval
18 | m | s | pt | y | n | n | r | r | o | 0 | 0 | f | n | n
32 | m | m | pt | n | n | n | tcs | r | o | 8 | 12 | p | n | n
18 | f | s | pt | y | n | n | r | r | o | 0 | 0 | f | n | y
18 | m | s | pt | n | n | n | r | r | o | 0 | 0 | f | n | y
22 | m | s | br | n | n | n | to | r | o | 1 | 0 | f | n | y
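The sample rows above can be loaded into typed records before any modelling. The following is a minimal sketch, assuming whitespace-separated fields in the column order shown on the slide. The class name `StudentRecord`, its field names and the helper `parse_row` are hypothetical, and the category codes (for example `pt` or `tcs`) are kept as opaque strings because the slides do not define them.

```python
from dataclasses import dataclass, fields


@dataclass
class StudentRecord:
    """One row of a course data set (hypothetical field names)."""
    age: int
    sex: str
    marital_status: str
    nationality: str
    displaced: str
    scholarship: str
    special_needs: str
    type_of_admission: str
    type_of_student: str
    status_of_student: str
    years_of_enrollment: int
    delayed_courses: int
    type_of_dedication: str
    debt_situation: str
    approval: str  # class label: "y" or "n"


def parse_row(line: str) -> StudentRecord:
    """Convert one whitespace-separated row into a StudentRecord."""
    values = line.split()
    specs = fields(StudentRecord)
    if len(values) != len(specs):
        raise ValueError(f"expected {len(specs)} fields, got {len(values)}")
    # Integer columns are converted; every other column stays a string code.
    converted = [int(v) if spec.type is int else v for spec, v in zip(specs, values)]
    return StudentRecord(*converted)


# The five sample rows shown on the slide for course Mathematics II.
SAMPLE_ROWS = [
    "18 m s pt y n n r r o 0 0 f n n",
    "32 m m pt n n n tcs r o 8 12 p n n",
    "18 f s pt y n n r r o 0 0 f n y",
    "18 m s pt n n n r r o 0 0 f n y",
    "22 m s br n n n to r o 1 0 f n y",
]

records = [parse_row(row) for row in SAMPLE_ROWS]
```

Keeping the class label (`approval`) in the same record as the 14 predictor variables mirrors the slide layout; a real pipeline would likely separate the two before training.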

Page 14

Classifiers training and model analysis: Experimental setup

[Diagram, for each course: Course data set → Classifier training → Decision tree model → Model analysis → Variables’ importance measures]

Page 15

Classifiers training and model analysis: Classifier training

Classifiers predict categorical class labels

Students are classified as having either passed or failed

Example of decision tree for course Mathematics II:

[Decision tree figure; the tree structure was lost in extraction]

Split variables: delayed courses, type of admission, years of enrollment, age

Branch labels: ≤4, >4; o23,cog,mce,tcs; palop,re,s,t; r; ≤2, >2; ≤21, >21

Leaves: n: 60, n: 34, y: 3, n: 14, y: 350, y: 15
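To illustrate how a decision tree learner can choose a split such as the ones in the tree above, here is a minimal sketch that scores a candidate threshold by the drop in Gini impurity. The slides do not state which tree-induction algorithm or split criterion was used, so Gini impurity is an assumption here, and the function names `gini` and `split_impurity` and the threshold value are illustrative only. The data are the "Delayed courses" and "Approval" columns of the five Mathematics II sample rows.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def split_impurity(values, labels, threshold):
    """Weighted Gini impurity of the two children of a 'value <= threshold' split."""
    left = [lab for val, lab in zip(values, labels) if val <= threshold]
    right = [lab for val, lab in zip(values, labels) if val > threshold]
    total = len(labels)
    return (len(left) / total) * gini(left) + (len(right) / total) * gini(right)


# "Delayed courses" and "Approval" for the five sample rows of Mathematics II.
delayed_courses = [0, 12, 0, 0, 0]
approval = ["n", "n", "y", "y", "y"]

parent = gini(approval)
# The threshold is illustrative; a learner would try many candidates.
after = split_impurity(delayed_courses, approval, threshold=4)
print(f"impurity before split: {parent:.3f}, after split: {after:.3f}")
```

A learner would evaluate every variable and candidate threshold this way and keep the split with the largest impurity reduction. Five rows are far too few to draw any conclusion; they only make the arithmetic easy to follow.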

Page 16

Classifiers training and model analysis: Preliminary results

Variables’ importance measure for each course

Variable columns on the slide: Age, Sex, Marital status, Nationality, Displaced, Scholarship, Special needs, Type of admission, Type of student, Status of student, Years of enrollment, Delayed courses, Type of dedication, Debt situation

Course | #P | Ip (%) values, in extracted order (their variable columns were lost in extraction)
Economic History | 1 | 100.0
Organic Chemistry II | 3 | 11.5, 72.1, 100.0
Neuroanatomy | 2 | 11.8, 100.0
Marketing | 1 | 100.0
Anatomy I | 1 | 100.0
Anatomy II | 4 | 36.7, 20.9, 18.4, 100.0
Mathematics II | 4 | 76.4, 87.4, 79.6, 100.0
Introduction to Linear Signals and Systems | 3 | 100.0, 93.1, 83.9

Page 17

Classifiers evaluation: Experimental setup

[Diagram, for each course: stratified 10-fold cross validation. The course data set goes through 10-fold partitioning. Each of evaluations 1, 2, ..., 10 runs classifier training, which yields a decision tree model, followed by classifier evaluation, which yields performance measures 1, 2, ..., 10. Averaging these gives the model performance measure.]
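The stratified 10-fold cross validation setup described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `stratified_folds` and `cross_validate`, the round-robin assignment and the `train`/`evaluate` callables are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign example indices to k folds, keeping class proportions similar."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the examples of each class round-robin across the folds.
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


def cross_validate(examples, labels, train, evaluate, k=10):
    """Train on k-1 folds, evaluate on the held-out fold, and average the scores."""
    folds = stratified_folds(labels, k)
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = train([examples[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        scores.append(evaluate(model,
                               [examples[i] for i in test_idx],
                               [labels[i] for i in test_idx]))
    return sum(scores) / len(scores)
```

Here `train(examples, labels)` would return a fitted model such as a decision tree, and `evaluate(model, examples, labels)` would return one performance measure for the held-out fold.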

Page 18

Classifiers evaluation: Performance results

Model performance for each course (10 experiments)

Course | Number of examples | Category distribution y (%) | Category distribution n (%) | F1 (avg ± std. dev.)
Economic History | 656 | 72 | 28 | 0.83 ± 0.003
Organic Chemistry II | 562 | 21 | 79 | 0.10 ± 0.030
Neuroanatomy | 542 | 94 | 6 | 0.96 ± 0.001
Marketing | 519 | 90 | 10 | 0.95 ± 0.002
Anatomy I | 518 | 73 | 27 | 0.85 ± 0.003
Anatomy II | 477 | 73 | 27 | 0.84 ± 0.004
Mathematics II | 476 | 61 | 39 | 0.78 ± 0.005
Introduction to Linear Signals and Systems | 475 | 55 | 45 | 0.71 ± 0.099
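The F1 figures above combine precision and recall into one number and are then summarized over the 10 experiments. The following is a minimal sketch of both steps. It assumes that "y" (passed) is the positive class and that the standard deviation is the sample standard deviation; the slides state neither, and the names `f1_score` and `summarize` are illustrative.

```python
from statistics import mean, stdev


def f1_score(true_labels, predicted_labels, positive="y"):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def summarize(scores):
    """Return (average, sample standard deviation) of a list of scores."""
    return mean(scores), stdev(scores)


# Toy example with hypothetical predictions, not data from the slides.
truth = ["y", "y", "y", "n", "n"]
predicted = ["y", "y", "n", "y", "n"]
example_f1 = f1_score(truth, predicted)
```

Because F1 ignores true negatives, its value depends strongly on which class is treated as positive, which matters for courses with a skewed category distribution.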

Page 19

Conclusions

There is a global effort at the University of Porto to improve its processes using BI and DM

This work presents the preliminary experiments on Educational Data Mining

Using administrative data

Collecting 14 variables from students enrolled in 8 courses

Interpreting results from decision tree models

Results indicate that

Decision trees are quite different from one another

Delayed courses is the most important variable

Will this pattern hold if more courses are used?

Model performance is quite acceptable overall

Page 20

Future work

Study the reasons for the variability of the important variables across courses

Alternatives for combining decision trees into

• a single consensus tree

• a small set of trees

that represent the general knowledge about the success/failure behavior across the whole University

Although the focus is on EDM, such an approach will be of interest in other areas of application

Page 21

Questions