Introduction Preparing the database Exploratory Data Analysis Logistic Regression
Scoring with R
Summer School on Mathematical Methods in Finance and Economy
Hanoi
Thibault LAURENT
Toulouse School of Economics
June 2010 (Slides modified in August 2010)
Thibault LAURENT Toulouse School of Economics
Scoring with R
Introduction
Preparing the database
Exploratory Data Analysis
Logistic Regression
Background study
Dominique Desbois (2008), “Introduction to Scoring Methods: Financial Problems of Farm Holdings”, CS-BIGS, 2(1): 56-76.
Objectives: analyse the causes of farm bankruptcy and find a model that can identify farms with financial difficulties in order to prevent failures.
Analysis plan:
1. Preparing the database
2. Exploratory data analysis
3. Logistic regression
Description of the data set
I 1260 farms specialized in field crops
I response variable Y takes the value “failing” (Y = 1) if the farm failed and “healthy” (Y = 0) otherwise
I explanatory variables X contain information about the structure (legal status, type of farming index, agricultural area used, etc.) and 22 ratios grouped into the following topics: Capitalization, Weight of the Debt, Liquidity, Debt servicing, Capital profitability, Earnings and Productive activity.
See p. 4 of Desbois (2008) for more details
Packages used in this course
You may install (function install.packages) or update (function update.packages) the following packages at the beginning of your R session:
> install.packages(c("foreign", "xtable", "lattice"))
> install.packages(c("car", "classInt", "ROCR",
+ "BMA"))
Introduction
Preparing the database
Exploratory Data Analysis
Logistic Regression
Importing the data set
I Download the “desbois.zip” file from http://www.bentley.edu/csbigs/csbigs-v2-n1.cfm
I Unzip the file.
I Import the “desbois.sav” SPSS file in R after loading the foreign package (functions for reading and writing data stored by statistical packages such as Minitab, SAS, Stata, etc.):
> library(foreign)
> farms <- read.spss("desbois.sav", to.data.frame = TRUE)
Recoding ?
The main objective of recoding is to obtain a first working version of the data set:
1. choose the right format for the variables,
2. verify whether there are missing values,
3. choose short and intuitive names for variables and attribute levels.
Recoding with R
1. checking the structure of our data set:
> str(farms)
2. re-order the levels of the variable of interest:
> farms$DIFF <- relevel(farms$DIFF, ref = "failing")
3. create a binary variable for the logistic regression:
> farms$Y <- factor(ifelse(farms$DIFF == "failing",
+ 1, 0))
4. simplify the levels of some attributes:
> levels(farms$STATUS) <- c("company", "proprietorship")
> levels(farms$ToF) <- c("cereals", "gen.cropping",
+ "dairy.farm", "mix.livestock", "var.crops-livestock",
+ "soilless.breed")
Missing values ?
Is there any missing value in the data set ?
> any(is.na(farms))
[1] FALSE
No missing values here. If the answer were yes, the missing values could be replaced using imputation techniques (see for example http://en.wikipedia.org/wiki/Imputation_(statistics))
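For illustration only, a minimal sketch of mean imputation on a toy vector (not the farms data, since this data set has no missing values):

```r
# Toy example: replace missing values by the mean of the observed values
x <- c(1.2, NA, 0.8, 1.0, NA)                        # hypothetical data with NAs
x.imp <- replace(x, is.na(x), mean(x, na.rm = TRUE))
any(is.na(x.imp))                                    # FALSE: no NA left
```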
Introduction
Preparing the database
Exploratory Data Analysis
Logistic Regression
Exploratory Data Analysis ?
Objectives:
1. obtain some elements of answer to the problem: what are the causes of bankruptcy of the farms?
2. detect outliers in observations or collinearity between variables.
3. create new pertinent variables (transforming with log, exp, etc., or crossing some variables).
Analysis of the data.frame object
farms belongs to a class with common methods (print, plot, summary); the data live in a data.frame, the workhorse data container for analysis in R.
> class(farms)
> summary(farms)
> plot(farms)
Useful function to visualize the data set:
> edit(farms)
Basic statistics with R
For numeric variable:
> n <- nrow(farms)
> min(farms$r1)
> max(farms$r1)
> mean(farms$r1)
> median(farms$r1)
> quantile(farms$r1)
> sd(farms$r1) == sqrt(var(farms$r1))
> stem(farms$r1)
For attribute variable:
> dis.Y <- table(farms$DIFF)
> margin.table(dis.Y)
> all(prop.table(dis.Y) ==
+ dis.Y/margin.table(dis.Y))
> addmargins(dis.Y)
I Skewness and kurtosis statistics can be calculated by loading the e1071 package
I the package r2lh provides functionalities to export some R analyses in LaTeX format
Graphics
Main advantages of using graphics:
I a good summary of the data
I easy to understand and comment
Be careful: graphics may suggest some intuitions, but comments must be confirmed by statistical tests! Here are some links about R graphics:
I http://addictedtor.free.fr/graphiques/
I http://csg.sph.umich.edu/docs/R/graphics-1.pdf
Attribute variable analysis: Bar plot
> col.y = colors()[c(641, 615)]
> barplot(dis.Y, main = "Y", col = col.y, space = 0.5)
[Bar plot "Y": counts of failing and healthy farms]
In this study, the number of failing farms is close to the number of healthy farms. colors() returns a vector of the names of the available colors in R.
Attribute variable analysis: Pie Chart
> label.ToF = paste(round(prop.table(table(farms$ToF)),
+ 3) * 100, "%")
> with(farms, pie(table(ToF), main = "Type of Farms",
+ labels = label.ToF, col = heat.colors(6),
+ cex = 0.8))
> legend("bottomleft", legend = levels(farms$ToF),
+ fill = heat.colors(6), cex = 0.7)
[Pie chart "Type of Farms": cereals 26.9 %, gen.cropping 24.3 %, dairy.farm 37.1 %, mix.livestock 4 %, var.crops-livestock 6.2 %, soilless.breed 1.4 %]
Numerical variable analysis: boxplot
> boxplot(farms$r2, main = "variable r2", col = "lightgrey")
[Boxplot of variable r2, values between 0.0 and 1.0]
This variable does not seem to contain any outlier...
Numerical variable analysis: histogram
> plot(density(farms$r3), col = "red", type = "n", main = "")
> hist(farms$r3, breaks = 15, freq = FALSE, col = "royalblue", add = T)
> rug(farms$r3)
> lines(density(farms$r3), col = "red")
[Histogram of r3 with kernel density curve; N = 1260, bandwidth = 0.04652]
Remark: r3 contains outliers (negative values)
What can be done after a univariate analysis
I deleting/modifying observations with abnormal values: high/low values for a numeric variable, or levels with too few frequencies for an attribute
> low.index <- which(farms$r3 < 0)
> farms$r3 <- with(farms, replace(r3, low.index, mean(r3)))
> farms$r4 <- with(farms, replace(r4, low.index, mean(r4)))
> farms$r8 <- with(farms, replace(r8, low.index, mean(r8)))
> farms$r14 <- with(farms, replace(r14, low.index, mean(r14)))
I transforming a variable (x ↦ log(a + x)) to obtain a more normal distribution
I more generally, the Box-Cox transformation (function BoxCox of the forecast package)
Bivariate analysis: 2 numerical variables
> with(farms, cov(r1, r2))
> with(farms, cov(r1, r2)/(sd(r1) * sd(r2)) == cor(r1,
+ r2))
> tab.cor <- cor(farms[, c("r1", "r2", "r3", "r4", "r5")])
Reproducible research with LaTeX:
> library(xtable)
> matable <- xtable(tab.cor, digits = 3, caption = "Correlation table")
> print(matable, file = "corr.tex", size = "tiny")
     r1      r2      r3      r4      r5
r1   1.000  -0.908   0.121   0.759   0.818
r2  -0.908   1.000   0.026  -0.643  -0.790
r3   0.121   0.026   1.000   0.642  -0.370
r4   0.759  -0.643   0.642   1.000   0.283
r5   0.818  -0.790  -0.370   0.283   1.000
Table: Correlation table of Capitalization variables
We notice a strong correlation between Capitalization variables.
Scatter plot (with lattice package)
> library(lattice)
> xyplot(r2 ~ r1, data = farms, groups = DIFF, auto.key = list(columns = 2,
+ title = "Scatter plot"), par.settings = simpleTheme(col = col.y))
[Scatter plot of r2 against r1, grouped by DIFF (failing / healthy)]
(low values of r2 + high values of r1) → high probability of failing
Scatterplot Matrices (with car package)
> library(car)
> scatterplotMatrix(~r6 + r7 + r8 | DIFF, data = farms,
+ col = col.y, main = "Weight of the debt variables")
[Scatterplot matrix of r6, r7 and r8 by DIFF, titled "Weight of the debt variables"]
Bivariate analysis: 2 attributes
> op <- par(mfrow = c(1, 2), cex.axis = 0.6, cex.lab = 0.6)
> mosaicplot(table(farms[, c(3, 2)]), color = TRUE, main = "")
> barplot(table(farms[c(2, 5)]), beside = TRUE, legend.text = c("failing",
+ "healthy"), horiz = TRUE, cex.names = 0.5, col = col.y,
+ args.legend = list(cex = 0.5), las = 2)
> par(op)
[Left: mosaic plot of DIFF by STATUS (company / proprietorship); right: horizontal bar plot of type of farming by DIFF (failing / healthy)]
Pearson’s Chi-squared Test
Pearson’s Chi-squared Test:
> t1 <- with(farms, chisq.test(table(DIFF, CNTY)))
> print(t1)
Pearson's Chi-squared test
data: table(DIFF, CNTY)
X-squared = 5.9929, df = 3, p-value = 0.1120
We notice that the value of χ² is not large enough to be “abnormal” compared to a χ² distribution. The link between the two variables is not significant...
Cramer’s V statistic
We first have to calculate Pearson’s Chi-squared statistic:
> t2 <- with(farms, chisq.test(table(DIFF, STATUS)))
We obtain the Cramer’s V statistic like that:
> V.2 <- sqrt(t2$statistic/n/min(nlevels(farms$DIFF), nlevels(farms$STATUS)))
> names(V.2) <- "Cramer's V statistic"
> print(V.2)
Cramer's V statistic
0.05408548
If 0 < V ≤ 0.25 the link is low, if 0.25 < V ≤ 0.6 the link is medium, if V > 0.6 the link is strong. In this case, the link is low. We will see on the next slide that the links between the attributes and Y are not strong.
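The computation above can be wrapped in a small helper function (a sketch mirroring the slide’s formula; note that the usual textbook definition of Cramér’s V divides by min(r, c) − 1 rather than min(r, c)):

```r
# Cramer's V as computed on the slide, for any two-way contingency table
cramer.v <- function(tab) {
  chi2 <- suppressWarnings(chisq.test(tab)$statistic)
  unname(sqrt(chi2 / sum(tab) / min(nrow(tab), ncol(tab))))
}
# Toy 2x2 table with hypothetical counts
tab <- matrix(c(89, 135, 518, 518), nrow = 2)
cramer.v(tab)
```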
Cramer’s V statistic summary
> res.cramer <- NULL
> for (i in c(1, 3, 5, 6, 8)) {
+ t.k <- with(farms, chisq.test(table(DIFF, farms[,
+ i])))
+ res.cramer <- c(res.cramer, sqrt(t.k$statistic/n/min(nlevels(farms$DIFF),
+ nlevels(farms[, i]))))
+ }
> names(res.cramer) <- names(farms)[c(1, 3, 5, 6, 8)]
> res <- data.frame(t(res.cramer))
> row.names(res) <- "values"
> matable <- xtable(res, digits = 3, caption = "Cramer's V statistic")
> print(matable, file = "vstat.tex", size = "tiny")
        CNTY   STATUS  ToF    OWNLAND  HARVEST
values  0.049  0.054   0.087  0.040    0.044
Table: Cramer’s V statistic
Empirical Odds, Odds Ratio and Relative Risk (1)
Consider the variable STATUS with two levels and the following contingency table:
> tab1 <- with(farms, addmargins(table(DIFF, STATUS)))
> matable <- xtable(tab1, digits = 3, align = "l|cc|r",
+ caption = "Contingency Table")
> print(matable, hline.after = c(0, 2), file = "V.tex",
+ size = "tiny")
         company  proprietorship    Sum
failing       89             518    607
healthy      135             518    653
Sum          224            1036   1260
Table: Contingency Table
Empirical Odds, OR and RR (2)
Prevalences:
I π(company) = #(Y=1 | X=company) / #(X=company)
I π(prop) = #(Y=1 | X=prop) / #(X=prop)
I p1 = #(Y=1) / n:
> res.preval <- tab1[1, ]/tab1[3, ]
> names(res.preval) <- c("pi.comp", "pi.prop", "p.1")
> print(res.preval)
pi.comp pi.prop p.1
0.3973214 0.5000000 0.4817460
Empirical Odds, OR and RR (3)
I Odds: among the company farms, the odds of failing are 0.66 (= #(Y=1 | X=company) / #(Y=0 | X=company) = 89/135). Note the odds are equal to 1 for proprietorship (518/518).
I OR = [π(prop) / (1 − π(prop))] / [π(comp) / (1 − π(comp))] = (518/518) / (89/135) ≈ 1.5.
I RR = π(prop) / π(company) = 0.5/0.4 ≈ 1.25
→ the chances of failing are higher in the proprietorship group
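These quantities can be checked numerically (a sketch using the counts from the contingency table above; the slide’s 1.25 comes from rounding π(company) to 0.4):

```r
# Counts: failing/healthy within company and proprietorship farms
n.fail.comp <- 89;  n.heal.comp <- 135
n.fail.prop <- 518; n.heal.prop <- 518
odds.comp <- n.fail.comp / n.heal.comp                # ~0.66
odds.prop <- n.fail.prop / n.heal.prop                # 1
OR <- odds.prop / odds.comp                           # ~1.52
RR <- (n.fail.prop / (n.fail.prop + n.heal.prop)) /
      (n.fail.comp / (n.fail.comp + n.heal.comp))     # ~1.26
```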
Bivariate analysis: one attribute and one numerical variable (1)
> par(mfrow = c(1, 3))
> boxplot(r11 ~ DIFF, data = farms, xlab = "r11", col = col.y)
> boxplot(r12 ~ DIFF, data = farms, xlab = "r12", col = col.y)
> boxplot(r14 ~ DIFF, data = farms, xlab = "r14", col = col.y)
> par(op)
> title("Liquidity variables")
[Boxplots of r11, r12 and r14 by DIFF (failing / healthy), titled "Liquidity variables"]
Bivariate analysis: one attribute and one numerical variable (2)
> library(lattice)
> histogram(~r17 | DIFF,
+ layout = c(1, 2),
+ nint = 20, data = farms,
+ panel = function(x,
+ ...) {
+ panel.histogram(x,
+ ..., col = col.y[panel.number()])
+ })
[Histograms of r17 by DIFF (failing / healthy panels)]
Correlation ratio
η² = ( Σ_{l=1..r} n_l (X̄_l − X̄)² ) / (n σ²_X)

where n_l and X̄_l are the size and mean of class l, X̄ the overall mean and σ²_X the overall variance of X.
> n <- nrow(farms)
> deno <- (n - 1) * var(farms$r1)
> eta.r1 <- with(farms, sum(table(DIFF) * (by(r1,
+ DIFF, mean) - mean(r1))^2)/deno)
> print(eta.r1)
[1] 0.419557
Correlation ratio (2)
Objective: calculate the correlation ratio of each numerical variable with Y and draw a dot chart depending on the topic of the variables (“capitalization”, “liquidity”, etc.)
> res <- NULL
> for (k in c(4, 7, 9:30)) {
+ deno <- (n - 1) * var(farms[, k])
+ res <- c(res, with(farms, sum(table(DIFF) *
+ (by(farms[, k], DIFF, mean) - mean(farms[,
+ k]))^2)/deno))
+ }
> names(res) <- names(farms[c(4, 7, 9:30)])
> topics <- factor(c("structure", "structure", rep("capitalization",
+ 5), rep("Weight of the debt", 3), rep("Liquidity",
+ 3), rep("Debt servicing", 5), "Capital profitability",
+ rep("Earnings", 3), rep("Productive activity",
+ 2)))
Correlation ratio (3)
> dotchart(res, groups = topics, main = "Correlation ratio by topics")
> abline(v = 0.25, col = "red", lty = 2)
[Dot chart "Correlation ratio by topics": one point per variable, grouped by topic, with a red reference line at 0.25]
Student’s t-Test
> with(farms, t.test(r36[DIFF == "failing"], r36[DIFF ==
+ "healthy"]))
Welch Two Sample t-test
data: r36[DIFF == "failing"] and r36[DIFF == "healthy"]
t = 3.4574, df = 1087.007, p-value = 0.0005666
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.04941916 0.17912308
sample estimates:
mean of x mean of y
1.241827 1.127556
See the formula of Welch’s test at http://en.wikipedia.org/wiki/Welch%27s_t-test
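The Welch statistic printed above can be reproduced by hand (a sketch on toy vectors, not the farms data):

```r
# Welch's t: difference of means over the unpooled standard error
x <- c(1.2, 0.9, 1.4, 1.1)
y <- c(0.8, 1.0, 0.7)
t.hand <- (mean(x) - mean(y)) / sqrt(var(x) / length(x) + var(y) / length(y))
t.test(x, y)$statistic    # t.test uses Welch's test by default: same value
```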
How to discretize a numerical variable ?
1. start by using traditional methods such as “quantile”, “Fisher-Jenks”, etc. included in package classInt, with a high number of classes.
2. try to aggregate classes using the weights of evidence criteria
WoE = log(Odds) = log( #(y=1 | X) / #(y=0 | X) )
3. in the end, 5-6 classes seem to be enough
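As a toy illustration of the WoE criterion (hypothetical counts, not from the study):

```r
# WoE of a class: log of (number failing / number healthy) within the class
woe <- function(n.fail, n.healthy) log(n.fail / n.healthy)
woe(30, 10)   # log(3): failing farms over-represented in this class
woe(10, 10)   # 0: balanced class
```

Adjacent classes with similar WoE values are natural candidates for aggregation.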
Discretization of r1
> library(classInt)
> interval <- classIntervals(farms$r1, n = 12,
+ style = "quantile")$brks
> nb.int <- findInterval(farms$r1, interval,
+ all.inside = TRUE)
> woe <- by(farms$DIFF, as.factor(nb.int),
+ function(x) log(length(which(x ==
+ "failing"))/length(which(x ==
+ "healthy"))))
> plot((interval[1:12] + interval[2:13])/2,
+ woe, main = "Weight Of Evidence",
+ xlab = "variable r1")
> abline(v = interval, lty = 2, col = "grey")
> abline(v = interval[c(4, 8, 11)], col = "red")
Choice of classes
4 classes seem to be enough to discretize r1: “low”, “medium”, “high” and “very high”.
[Plot "Weight Of Evidence": WoE against the class midpoints of r1, with the chosen cut points in red]
Multivariate analysis
I Principal Component Analysis (PCA) to complete the analysis of the covariance/variance of the explanatory variables
I Hierarchical cluster analysis (if n < 1000) with the hclust function, or k-means clustering with the kmeans function
> res.pca <- princomp(farms[,
+ 9:30])
> biplot(res.pca, col = c("grey",
+ "blue"))
Other R Tools
I package Rcmdr: a Tk menu with several graphics and tests, with a minimum of programming
I package rattle: a package which depends on a lot of packages, dedicated to scoring methods
I package iplots: interactive selection on basic graphics such as histograms, barplots, etc., useful for the detection of multivariate outliers
I package ggplot2: another “generation” of graphics
Conclusion of this part
Do you already have an idea of the characteristics of the farms which failed? If not, you may continue to explore the data...
Introduction
Preparing the database
Exploratory Data Analysis
Logistic Regression
Sampling
I working sample (70%) farms.work: used for model selection
I test sample (30%) farms.test: used to test the selected model
> set.seed(121181)
> ind <- sample(1:n, round(0.7 * n))
> farms.work <- farms[ind, ]
> farms.test <- farms[-ind, ]
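A quick sanity check of the split sizes (a sketch assuming n = 1260, as in this study):

```r
# With n = 1260, the working sample has round(0.7 * n) = 882 farms
n <- 1260
ind <- sample(1:n, round(0.7 * n))
length(ind)        # 882
n - length(ind)    # 378 farms left for the test sample
```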
Ordinary Least Squares (OLS) model (1)
How to explain a numerical variable by other explanatory variables (both numerical and attribute)?
I use of the function lm: r1 is the variable to explain, DIFF and HECTARE are the explanatory variables:
> res.lm <- lm(r1 ~ DIFF + HECTARE, data = farms.work)
I What results are included in res.lm?
> names(res.lm)
I function anova.lm calculates the analysis of variance table:
> anova(res.lm)
I function summary.lm computes a list of summary statistics (F statistic, adjusted R², etc.) of the OLS model:
> summary(res.lm)
Ordinary Least Squares (OLS) model (2)
I function plot.lm returns diagnostic plots:
> dev.new()
> par(mfrow = c(2, 2))
> plot(res.lm)
> par(op)
I function influence.measures returns statistics such as Cook’s distance to detect influential observations:
> influence.measures(res.lm)
Generalized Linear Model (GLM)
How to explain a normal, binomial, Poisson or gamma variable by explanatory variables (both numerical and attribute)?
I use of the function glm and option family to give the name of the distribution and the link used:
> res.glm <- glm(Y ~ ., family = binomial(link = "logit"),
+ data = farms.work[, -2])
I the functions used for a glm object are the same as for lm:
> names(res.glm)
> anova(res.glm)
> summary(res.glm)
> dev.new()
> op <- par(mfrow = c(2, 2))
> plot(res.glm)
> par(op)
> influence.measures(res.glm)
Choice of variables
I You can choose the function stepAIC (package MASS), which performs stepwise model selection by AIC, applied to the res.glm object constructed previously (here with k = log(n), i.e. a BIC-type penalty):
> res.step <- stepAIC(res.glm, direction = "backward",
+ k = log(nrow(farms.work)))
I You can also choose the function bic.glm of package BMA:
> library(BMA)
> choix.bic.glm <- bic.glm(farms.work[, -c(2, 31)], farms.work$Y,
+ strict = FALSE, OR = 20, glm.family = "binomial",
+ factor.type = TRUE)
> summary(choix.bic.glm, conditional = T, digits = 2)
> imageplot.bma(choix.bic.glm)
Comparing two models
I with stepAIC, we keep the variables: CNTY, STATUS, HECTARE, r1, r5, r12, r14, r21 and r36
I with the first model of bic.glm, we keep the variables: CNTY, STATUS, HECTARE, r1, r3, r17, r24 and r36
To compare the two methods, we can use the AIC criterion:
> res.bic.glm <- glm(Y ~ STATUS + CNTY + HECTARE + r1 +
+ r3 + r17 + r24 + r36, family = binomial(link = "logit"),
+ data = farms.work)
> AIC(res.step)
[1] 423.0119
> AIC(res.bic.glm)
[1] 422.6086
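As a reminder of what AIC trades off, here is a minimal numeric sketch in Python with purely illustrative log-likelihoods (these are not the farm models): AIC = 2k − 2 log L, and the model with the smaller value is preferred.

```python
import math

def aic(log_lik, k):
    # Akaike information criterion: 2 * (number of parameters) - 2 * log-likelihood
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    # Schwarz/Bayesian criterion: replaces the constant 2 by log(n)
    return math.log(n) * k - 2 * log_lik

# illustrative values only
m1 = aic(-200.5, k=10)  # larger model, better fit
m2 = aic(-201.3, k=8)   # smaller model, slightly worse fit
best = "m2" if m2 < m1 else "m1"  # the penalty makes the smaller model win here
```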
We keep the second model ...
Coefficients of the model
> library(xtable)
> matable <- xtable(res.bic.glm, digits = 3, caption = "Coefficients of the selected model")
> print(matable, file = "coeff.tex", size = "tiny")
                       Estimate   Std. Error   z value   Pr(>|z|)
(Intercept)            -6.118     1.089        -5.616    0.000
STATUSproprietorship   -1.543     0.400        -3.860    0.000
CNTYNord               -2.257     0.413        -5.465    0.000
CNTYOrne               -1.472     0.393        -3.748    0.000
CNTYSeine-Maritime     -0.186     0.388        -0.478    0.633
HECTARE                -0.035     0.004        -7.836    0.000
r1                     11.642     0.892        13.051    0.000
r3                      5.915     0.785         7.531    0.000
r17                    31.362     6.214         5.047    0.000
r24                    -7.437     2.008        -3.705    0.000
r36                     1.532     0.332         4.618    0.000

Table: Coefficients of the selected model
Be careful before interpreting the coefficients β: notice for example the sign associated with STATUS, contrary to what we observed in the EDA, most likely due to a problem of multicollinearity ...
Estimated adjusted Odds ratio
We may calculate the odds ratios and confidence intervals by using the functions summary and coef.
> lreg.coeffs <- coef(summary(res.bic.glm))
> lreg.coeffs[c("r1", "r3", "r17", "r24", "r36"), 1] <- lreg.coeffs[c("r1",
+ "r3", "r17", "r24", "r36"), 1] * 0.01
> odds <- data.frame(signif(cbind(exp(lreg.coeffs[, 1]),
+ exp(lreg.coeffs[, 1] - 1.96 * lreg.coeffs[, 2]),
+ exp(lreg.coeffs[, 1] + 1.96 * lreg.coeffs[, 2])),
+ 3))
> names(odds) <- c("odds", "l.95", "u.95")
In order to interpret the odds ratios associated with the ratios (r1, etc.), we have multiplied these coefficients by 0.01.
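The computation above is just exp(β) with a Wald interval exp(β ± 1.96 · se). A self-contained numeric check in Python (the numbers are the r1 row of the coefficient table; rescaling both the estimate and its standard error by 0.01 is our simplifying assumption for the illustration):

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio exp(beta) with its Wald 95% confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# r1 row of the coefficient table, rescaled by 0.01 as on this slide
or_r1, lo, hi = odds_ratio_ci(11.642 * 0.01, 0.892 * 0.01)
# an odds ratio above 1: a 0.01 increase of r1 raises the odds of failing
```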
Other issues in modelling
I transforming all numeric variables into attributes, as we have seen in the previous section.
I transforming all the attributes into numeric variables (Multiple Correspondence Analysis with function dudi.acm of package ade4).
I choosing an econometric approach: for example, try a model taking into account an economic “a priori” on the variables. The dot chart of the weight of evidence may also bring a good intuition.
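The slides do not show the weight-of-evidence computation itself; under one common convention, the WoE of a category is the log of the share of healthy farms in that category over the share of failing farms. A Python sketch with hypothetical counts:

```python
import math

def weight_of_evidence(healthy_counts, failing_counts):
    """WoE per category: log((share of healthy in category) /
    (share of failing in category)) -- one common convention."""
    h_tot, f_tot = sum(healthy_counts), sum(failing_counts)
    return [math.log((h / h_tot) / (f / f_tot))
            for h, f in zip(healthy_counts, failing_counts)]

# hypothetical counts for a three-level attribute
woe = weight_of_evidence([50, 30, 20], [10, 30, 60])
# positive WoE: the category is over-represented among healthy farms
```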
Prediction on the test sample
Calculate the following term by using the function predict:
Y* = Xβ
> eta <- predict(res.step, newdata = farms.test)
Calculate then the score:
µ = exp(Y*)/(1 + exp(Y*))
> mu <- exp(eta)/(1 + exp(eta))
If we choose an arbitrary cut-off equal to 0.5, we calculate Y such as:
> Y.pred = relevel(factor(ifelse(mu > 0.5, 1, 0)), ref = "1")
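The two formulas above (linear predictor, then logistic transform, then cut-off) are easy to check numerically; a small Python sketch (function names are ours):

```python
import math

def score(eta):
    """mu = exp(eta) / (1 + exp(eta)), the logistic transform of the
    linear predictor eta = x' beta."""
    return math.exp(eta) / (1.0 + math.exp(eta))

def predict_class(eta, cutoff=0.5):
    # predict "failing" (1) when the score exceeds the cut-off
    return 1 if score(eta) > cutoff else 0

print(score(0.0))  # a zero linear predictor gives a score of exactly 0.5
print(predict_class(2.0), predict_class(-2.0))
```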
Confusion Matrix and vocabulary
                        actual value Y = 1   actual value Y = 0   Total
predicted value Y = 1   TP                   FP
predicted value Y = 0   FN                   TN
Total                   P                    N                    P + N
I True Positive Rate TPR: TP/P
I False Positive Rate FPR: FP/N
I Accuracy: (TP + TN)/(P + N)
I Positive predictive value PPV: TP/(TP + FP)
I Sensitivity: TP/(TP + FN)
I Specificity: TN/(FP + TN)
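All six quantities follow directly from the four cells of the matrix. A numeric sketch in Python; the counts below are our reconstruction, chosen to reproduce the rates printed on the next slide (193 failing and 185 healthy farms in the test sample):

```python
def confusion_metrics(tp, fp, fn, tn):
    """Rates from the confusion matrix: P = TP + FN actual positives,
    N = FP + TN actual negatives."""
    p, n = tp + fn, fp + tn
    return {
        "TPR": tp / p,                 # true positive rate
        "FPR": fp / n,                 # false positive rate
        "accuracy": (tp + tn) / (p + n),
        "PPV": tp / (tp + fp),         # positive predictive value
        "sensitivity": tp / (tp + fn), # equals the TPR
        "specificity": tn / (fp + tn),
    }

m = confusion_metrics(tp=173, fp=13, fn=20, tn=172)
# m["TPR"] is about 0.896 and m["accuracy"] about 0.913, as on the next slide
```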
Example with a Cutoff equal to 0.5
Construction of the confusion matrix:
> ma.conf <- addmargins(table(Y.pred, farms.test$DIFF))
TPR and FPR:
> ma.conf[1, 1:2]/ma.conf[3, 1:2]
failing healthy
0.89637306 0.07027027
Accuracy:
> (ma.conf[1, 1] + ma.conf[2, 2])/sum(ma.conf[3, 1:2])
[1] 0.9126984
Sensitivity and Specificity:
> ma.conf[1, 1]/ma.conf[3, 1]
[1] 0.896373
> ma.conf[2, 2]/ma.conf[3, 2]
[1] 0.9297297
The ROC curve (1)
How to choose the cut-off ?
I Choose two criteria seen previously, for example the TPR and FPR criteria.
I We would like to choose a cut-off such that the TPR is large while the FPR is small.
I The ROC curve draws these two criteria simultaneously while varying the cut-off: when the cut-off equals 1, no farm has been predicted as failing, so TPR = FPR = 0, etc.
I Use of the ROCR package (see http://rocr.bioinf.mpi-sb.mpg.de/).
> library(ROCR)
> pred <- prediction(mu, farms.test$Y)
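The mechanics behind the ROC curve can be sketched without ROCR: for each cut-off, predict "failing" when the score exceeds it and record the (FPR, TPR) pair. A toy Python illustration with made-up scores:

```python
def roc_points(scores, labels, cutoffs):
    """One (FPR, TPR) point per cut-off, predicting Y = 1 when score > cut-off."""
    p = sum(labels)           # number of actual positives
    n = len(labels) - p       # number of actual negatives
    pts = []
    for c in cutoffs:
        tp = sum(1 for s, y in zip(scores, labels) if s > c and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > c and y == 0)
        pts.append((fp / n, tp / p))
    return pts

# two failing (1) and two healthy (0) farms with made-up scores
pts = roc_points([0.9, 0.7, 0.3, 0.1], [1, 1, 0, 0], [1.0, 0.5, 0.0])
# cut-off 1 predicts no farm as failing: (0, 0); cut-off 0 predicts all: (1, 1)
```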
The ROC curve (2)
In this case, a cut-off equal to 0.4 seems to be a good compromise to obtain both a good TPR and a good FPR.
> perf <- performance(pred, measure = "tpr", x.measure = "fpr")
> plot(perf, colorize = T, print.cutoffs.at = seq(0, 1,
+ by = 0.1), text.adj = c(1.2, 1.2), lwd = 3)
[Figure: ROC curve, true positive rate vs. false positive rate, with the cut-off values from 0 to 1 printed along the curve]
Cross-validation for GLM
This method is an alternative to the AIC criterion for the choice of the model and may be recommended when the size of the sample is not large enough:
1. Divide the data (of size n) into K groups
2. For each group, fit a GLM omitting that group, and calculate the percentage of misclassified observations, with the function cost, in the group that was omitted from the fit
> require(boot)
> res.glm <- glm(Y ~ STATUS + CNTY + HECTARE + r1 + r3 +
+ r17 + r24 + r36, family = binomial(link = "logit"),
+ data = farms)
> cost <- function(r, pi = 0) mean(abs(r - pi) > 0.6)
> res.cv <- cv.glm(farms, res.glm, cost, K = 10)
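The K-fold logic that cv.glm implements can be sketched generically (this is our illustration, not the boot internals): shuffle the indices, cut them into K folds, fit on K − 1 folds and average the cost on the held-out fold.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_error(xs, ys, fit, predict, cost, k=10, seed=0):
    """K-fold estimate: fit on the other folds, average cost on the held-out one."""
    errs = []
    for fold in kfold_indices(len(xs), k, seed):
        held = set(fold)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errs.append(sum(cost(ys[i], predict(model, xs[i])) for i in fold) / len(fold))
    return sum(errs) / k

# toy check: a fixed 0.5 threshold separates this toy data perfectly
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
err = cv_error(xs, ys, fit=lambda X, Y: 0.5,
               predict=lambda m, x: 1 if x > m else 0,
               cost=lambda y, p: int(y != p), k=4)
```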
Conclusion
I The EDA gave us some answers to the problem and helped us understand the data.
I We found one possible model which has good properties by the usual criteria, even if the interpretation of this model is not easy because of the inhomogeneity of, and correlation between, the variables: other models could be found.
I However, we may use this model to prevent some farms from failing: in the case where we had all the explanatory variables except Y, farms with a score µ > 0.4 would be warned...
Other methods for scoring with R
See http://cran.r-project.org/doc/contrib/Sharma-CreditScoring.pdf, which deals with the following methods:
I Bagging: package adabag
I Random Forest: package randomForest
I Support Vector Machines: package e1071
I Generalized Additive Model: package gam