MOCK PRESENTATION SLIDES REPORT (TRANSCRIPT)
PLEASE NOTE: this document is for purely illustrative purposes. Minerva Statistical Consulting does not guarantee the accuracy or correctness of the content, nor does it accept any responsibility for any damages or losses incurred through its use.
Introduction

Research Objective: can we select, among several algorithms, the best performing machine learning algorithm for predicting house prices?

Dataset used: the Boston Housing dataset. This dataset contains information collected by the US Census Service concerning housing in the area of Boston (Massachusetts, USA).

Data Source: the original dataset is available (use QR code) and has been used extensively throughout the literature to benchmark and compare the accuracy of different algorithms. In this project, we use an already cleaned dataset, with no missing values.

Algorithms compared:
▪ Kernel Ridge Regressor
▪ XGBoost Regressor
▪ LightGBM Regressor
▪ Gradient Boosting Regressor

Results: the LightGBM Regressor is the best performing model for this dataset.
Methodology: Programming Language and Libraries Used

The programming language used throughout is Python 3.9.1.

Libraries and packages used:
▪ NumPy
▪ Pandas
▪ Matplotlib
▪ SciPy
▪ XGBoost
Methodology: Data Set Overview

Hereafter, we refer to the variable MEDV as our target variable, while the rest (13 variables) are referred to as features.

CRIM: per capita crime rate by town.
ZN: proportion of residential land zoned for lots over 25,000 sq. ft.
INDUS: proportion of non-retail business acres per town.
CHAS: Charles River dummy variable (= 1 if tract bounds river; = 0 otherwise).
NOX: nitric oxides concentration (parts per 10 million).
RM: average number of rooms per dwelling.
AGE: proportion of owner-occupied units built prior to 1940.
DIS: weighted distances to five Boston employment centers.
RAD: index of accessibility to radial highways.
TAX: full-value property-tax rate per $10,000.
PTRATIO: pupil-teacher ratio by town.
B: 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town.
LSTAT: % lower status of the population.
MEDV: median value of owner-occupied homes in $1000s.

Data set information: number of missing values per variable

| CRIM | ZN | INDUS | CHAS | NOX | RM | AGE |
|------|----|-------|------|-----|----|-----|
| 0    | 0  | 0     | 0    | 0   | 0  | 0   |

| DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|-----|-----|-----|---------|---|-------|------|
| 0   | 0   | 0   | 0       | 0 | 0     | 0    |

▪ As mentioned above, the data has already been cleaned; as such, it does not contain any missing values.
▪ In Python, we can double-check this using the isnull() method (summed per column), which returns the number of missing values.
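As a minimal sketch of this check (using a hypothetical miniature frame in place of the real data, since the original CSV is not reproduced here):

```python
import pandas as pd

# Hypothetical miniature frame standing in for the Boston Housing data
df = pd.DataFrame({
    "CRIM": [0.006, 0.027, 0.027],
    "RM":   [6.575, 6.421, 7.185],
    "MEDV": [24.0, 21.6, 34.7],
})

# isnull() flags each missing cell; sum() counts the flags per column
missing_per_column = df.isnull().sum()
print(missing_per_column)

total_missing = int(missing_per_column.sum())
print(total_missing)  # 0 for a fully clean frame
```

On the cleaned dataset, every count should be zero, matching the table above.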
Variable Plots (Distribution Plots)

Next, we proceed to visually explore the features by plotting the distribution of each variable.

[Distribution plots, one panel per variable: CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, B, LSTAT, MEDV]
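One way such a panel of histograms could be produced (sketched here with three synthetic columns, since the real data is not embedded in this transcript):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical columns with shapes loosely resembling the real features
df = pd.DataFrame({
    "CRIM": rng.exponential(3.6, 506),     # right-skewed
    "RM":   rng.normal(6.3, 0.7, 506),     # roughly symmetric
    "MEDV": rng.lognormal(3.0, 0.4, 506),  # right-skewed target
})

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, col in zip(axes, df.columns):
    ax.hist(df[col], bins=30)
    ax.set_title(f"Variable = {col}")
fig.tight_layout()
fig.savefig("distributions.png")
```

With the real data, the loop would simply run over all 14 columns.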
PRE-PROCESSING I: CORRECTING FOR SKEWNESS

After plotting, there was evidence of skewness in the data. Correcting for skewness prior to model fitting is important.

▪ For instance, if the response variable is skewed to the right, the model will be trained on a much larger number of relatively inexpensive homes, and thus it will be less likely to successfully predict the range of the most expensive houses.
▪ Furthermore, if the values of a certain independent variable (feature) are skewed then, depending on the model, the skewness may go against the model assumptions or may make the interpretation of feature importance difficult.

After trying various transformations (log, Box-Cox), we decided to opt for the log transformation.

NOTE: we should keep track of the transformations performed on the features, because we will need to reverse these transformations once we make predictions.
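A sketch of the log correction and its inverse, on a hypothetical right-skewed series (the real pipeline would apply this per skewed column):

```python
import numpy as np
import pandas as pd
from scipy.stats import skew

rng = np.random.default_rng(42)
# Hypothetical right-skewed variable standing in for MEDV
prices = pd.Series(rng.lognormal(mean=3.0, sigma=0.5, size=506), name="MEDV")

skew_before = skew(prices)
# log1p handles zeros safely; expm1 reverses it exactly
log_prices = np.log1p(prices)
skew_after = skew(log_prices)
print(f"skewness before: {skew_before:.2f}, after: {skew_after:.2f}")

# Keep the inverse on hand: predictions must be mapped back with expm1
recovered = np.expm1(log_prices)
```

This is why the note above matters: forgetting the expm1 step would leave predictions on the log scale.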
Variable Plots (Distribution) After Transformation

After correcting for skewness, some variables' distributions display more symmetrical shapes (see, for instance, NOX, LSTAT and DIS). Hence, the transformations help.

[Post-transformation distribution plots, one panel per variable: CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, B, LSTAT, MEDV]
Bimodal Variables

Some features displayed a bimodal density function; see, for instance, the features INDUS, TAX and RAD. Therefore, to take this into account, we modelled these variables as a mixture of Gaussians. This option is available in the Python scikit-learn package.
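A minimal sketch of fitting such a mixture with scikit-learn's GaussianMixture, on a hypothetical bimodal feature loosely resembling TAX (the actual component counts and fitted parameters are not given in the slides):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
# Hypothetical bimodal feature: two clusters of tax rates
tax = np.concatenate([
    rng.normal(300, 30, 300),
    rng.normal(670, 20, 130),
]).reshape(-1, 1)

# Fit a two-component Gaussian mixture to capture both modes
gmm = GaussianMixture(n_components=2, random_state=0).fit(tax)
means = sorted(gmm.means_.ravel())
print(means)  # one mean per mode

labels = gmm.predict(tax)  # component membership per observation
```

The fitted means sit near the two modes, and predict() assigns each observation to a component.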
CORRELATION HEAT MAP (Features vs Target)

From the correlation matrix, the features seem to be significantly correlated with the target variable (MEDV); see the last row, or the last column on the right. This is important, as we would want the features to be related to the target variable.
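A sketch of how such a heat map can be built from the correlation matrix, using a small hypothetical frame with one positively and one negatively related feature:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
rm = rng.normal(6.3, 0.7, 506)
lstat = rng.uniform(2, 35, 506)
# Hypothetical linear link: MEDV rises with RM, falls with LSTAT
medv = 5 * rm - 0.5 * lstat + rng.normal(0, 2, 506)
df = pd.DataFrame({"RM": rm, "LSTAT": lstat, "MEDV": medv})

corr = df.corr()
fig, ax = plt.subplots()
im = ax.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr)))
ax.set_xticklabels(corr.columns)
ax.set_yticks(range(len(corr)))
ax.set_yticklabels(corr.columns)
fig.colorbar(im)
fig.savefig("corr_heatmap.png")

print(corr["MEDV"])  # last column: each feature's correlation with the target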
PRE-PROCESSING II: STANDARDISATION

▪ Next, we proceed to standardise the features.
▪ Although some algorithms are not sensitive to standardisation, it is customary to standardise the data prior to model building.
▪ We employ the built-in RobustScaler, imported from the sklearn.preprocessing module.
▪ This scaler uses statistics that are robust to outliers.
▪ More precisely, it removes the median and scales the data according to the quantile range (which defaults to the IQR: interquartile range).
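A small sketch of RobustScaler on a hypothetical column with an outlier, showing exactly the median/IQR behaviour described above:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Hypothetical feature with an extreme outlier (100.0); the median and IQR
# are barely affected by it, unlike the mean and standard deviation
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

scaler = RobustScaler()  # defaults: centre on median, scale by (Q3 - Q1)
X_scaled = scaler.fit_transform(X)

print(scaler.center_)  # median of the column
print(scaler.scale_)   # interquartile range Q3 - Q1
print(X_scaled.ravel())
```

Here the median is 3.0 and the IQR is 2.0, so the middle value maps to exactly 0 regardless of the outlier.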
PRE-PROCESSING III: REMOVAL OF REDUNDANT FEATURES

▪ Further, we can use the built-in RFECV function (feature ranking with recursive feature elimination and cross-validated selection of the best number of features) to help us eliminate redundant features.
▪ The main idea is that if some of the variables do not add new information, they are redundant and hence should be excluded from the model.
▪ RFECV eliminates only the ZN variable (it is deemed to be determined by the other variables, hence redundant).
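A sketch of RFECV on synthetic data (the estimator and CV settings here are illustrative assumptions, not the ones used in the project):

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression

# Hypothetical data: 8 features but only 4 carry signal, so RFECV
# has genuinely redundant columns to eliminate
X, y = make_regression(n_samples=200, n_features=8, n_informative=4,
                       noise=1.0, random_state=0)

# Recursively drop the weakest feature, picking the count that
# maximises cross-validated score
selector = RFECV(LinearRegression(), cv=5).fit(X, y)

print(selector.n_features_)  # number of features kept
print(selector.support_)     # boolean mask of selected features
```

On the real data the mask would flag every column except ZN as kept.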
MODELLING AND PREDICTION

▪ We start by splitting the data into train and test sets (total # obs = 506).
▪ The size of the test set is 20% of the overall size of the data (0.20 × 506 ≈ 101 obs).
▪ The objective is to compare the performance of the following machine learning algorithms: Kernel Ridge Regressor, XGBoost Regressor, LightGBM Regressor and Gradient Boosting Regressor, and to select the one that performs best.
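The split above can be sketched as follows (with random placeholder arrays of the same shape as the real data; note that scikit-learn rounds the fractional test size up):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(506, 13))  # hypothetical stand-in, Boston-sized
y = rng.normal(size=506)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)

# 0.20 * 506 = 101.2; train_test_split takes the ceiling, giving 102
print(len(X_train), len(X_test))
```

Fixing random_state makes the split reproducible across runs, which matters when comparing models on the same held-out set.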
HOW DO WE TUNE/PARAMETRIZE EACH MODEL?

▪ Before comparing the models' performance, we need to decide how to parametrize each model. To select the best parametrization, we perform a grid search across several combinations of parameters and select the best parametrization for each model based on the lowest Root Mean Square Forecast Error (RMSFE).
▪ We then select, amongst all the best-parametrized models, the best overall performing model: again, the one with the lowest RMSFE (this time across models).
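A sketch of the grid search for one of the candidate models, using scikit-learn's Gradient Boosting Regressor; the parameter grid here is a hypothetical example, as the actual grids are not given in the slides:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

# Hypothetical grid of candidate parametrizations for this model
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}

search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",  # lowest RMSE wins
    cv=3,
)
search.fit(X, y)

print(search.best_params_)
best_rmse = -search.best_score_  # flip sign back to a plain RMSE
print(best_rmse)
```

Repeating this per model and then comparing each model's best RMSE gives the cross-model selection described above.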
BEST PERFORMANCE, RESULTS ACROSS MODELS - RMSFE

| Model                       | Training set | Validation set |
|-----------------------------|--------------|----------------|
| Kernel Ridge Regression     | 0.121        | 0.210          |
| XGBoost Regressor           | 0.075        | 0.171          |
| LightGBM Regressor          | 0.104        | 0.149          |
| Gradient Boosting Regressor | 0.011        | 0.170          |

The LightGBM Regressor is the best performing model across ALL models, since it has the lowest validation-set RMSFE.
L-GBM MODEL PREDICTION vs ACTUAL VALUES - SCATTER PLOT

From the scatter plot we can see that the predicted and actual values are highly correlated, which suggests that the model performs well when compared to actual data.
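A sketch of producing this comparison, including the inverse log transform flagged in the pre-processing note (the log-scale values here are synthetic placeholders for the model's actual outputs):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical log-scale targets and near-accurate predictions
y_log_actual = rng.normal(3.0, 0.4, 101)
y_log_pred = y_log_actual + rng.normal(0, 0.1, 101)

# Reverse the log transform before comparing in dollar terms
y_actual = np.expm1(y_log_actual)
y_pred = np.expm1(y_log_pred)

corr = np.corrcoef(y_actual, y_pred)[0, 1]
fig, ax = plt.subplots()
ax.scatter(y_actual, y_pred)
ax.set_xlabel("Actual MEDV ($1000s)")
ax.set_ylabel("Predicted MEDV ($1000s)")
fig.savefig("pred_vs_actual.png")
print(corr)
```

A tight diagonal cloud (correlation near 1) is the visual signature of a well-performing regressor on the held-out set.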