Peter Fox
Data Analytics – ITWS-4963/ITWS-6965
Week 7b, March 13, 2015
Interpreting weighted kNN, decision trees, cross-validation,
dimension reduction and scaling
Contents
Numeric v. non-numeric
In R – data frames and types
• Almost always at input, R sees categorical data as "strings" or "character"
• You can test for membership (as a type) using is.<x> (x = numeric, factor, etc.)
• You can "coerce" it (i.e. change the type) using as.<x> (same x)
• To tell R you have categorical types (also called enumerated types), R calls them "factor"…. Thus – as.factor()
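A minimal sketch of testing and coercing types (the vector x is illustrative):

```r
x <- c("low", "high", "medium", "low")  # read in as character ("strings")
is.character(x)                         # TRUE  – membership test for the type
is.factor(x)                            # FALSE – not yet categorical
f <- as.factor(x)                       # coerce to a factor (enumerated type)
is.factor(f)                            # TRUE
levels(f)                               # "high" "low" "medium" (alphabetical by default)
```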
In R
factor(x = character(), levels, labels = levels, exclude = NA, ordered = is.ordered(x), nmax = NA)
ordered(x, ...), is.factor(x), is.ordered(x), as.factor(x), as.ordered(x), addNA(x, ifany = FALSE)
levels – the values that x might have taken.
labels – either an optional vector of labels for the levels (in the same order as levels after removing those in exclude), or a character string of length 1.
exclude – a vector of values to be excluded when forming the set of levels.
ordered – logical flag to determine if the levels should be regarded as ordered (in the order given).
nmax – an upper bound on the number of levels.
ifany – (in addNA) only add an NA level if it is used, i.e. if any(is.na(x)).
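An illustrative call exercising several of these arguments (the data values are made up):

```r
sizes <- c("S", "M", "L", "M", "S")
f <- factor(sizes,
            levels  = c("S", "M", "L"),              # values x might have taken
            labels  = c("small", "medium", "large"), # display labels, same order as levels
            ordered = TRUE)                          # small < medium < large
f          # Levels: small < medium < large
nlevels(f) # 3
```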
Relate to the datasets…
• Abalone – Sex = {F, I, M}
• Eye color?
• EPI – GeoRegions?
• Sample v. population – levels and names IN the dataset versus all possible levels/names
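The sample-versus-population point can be seen directly in R; here is a hypothetical abalone-style vector in which the infant class happens not to appear:

```r
sex.sample <- c("F", "M", "F")                 # "I" is absent from this sample
factor(sex.sample)                             # Levels: F M  – only values IN the dataset
factor(sex.sample, levels = c("F", "I", "M"))  # Levels: F I M – all possible values
```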
Weighted kNN
require(kknn)
data(iris)
m <- dim(iris)[1]
val <- sample(1:m, size = round(m/3), replace = FALSE, prob = rep(1/m, m))
iris.learn <- iris[-val,] # train
iris.valid <- iris[val,] # test
iris.kknn <- kknn(Species~., iris.learn, iris.valid, distance = 1, kernel = "triangular")
# Possible kernel choices are "rectangular" (which is standard unweighted knn), "triangular", "epanechnikov" (or beta(2,2)), "biweight" (or beta(3,3)), "triweight" (or beta(4,4)), "cos", "inv", "gaussian", "rank" and "optimal".
names(iris.kknn)
• fitted.values – Vector of predictions.
• CL – Matrix of classes of the k nearest neighbors.
• W – Matrix of weights of the k nearest neighbors.
• D – Matrix of distances of the k nearest neighbors.
• C – Matrix of indices of the k nearest neighbors.
• prob – Matrix of predicted class probabilities.
• response – Type of response variable, one of continuous, nominal or ordinal.
• distance – Parameter of Minkowski distance.
• call – The matched call.
• terms – The 'terms' object used.
Look at the output
> head(iris.kknn$W)
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 0.4493696 0.2306555 0.1261857 0.1230131 0.07914805 0.07610159 0.014184110
[2,] 0.7567298 0.7385966 0.5663245 0.3593925 0.35652546 0.24159191 0.004312408
[3,] 0.5958406 0.2700476 0.2594478 0.2558161 0.09317996 0.09317996 0.042096849
[4,] 0.6022069 0.5193145 0.4229427 0.1607861 0.10804205 0.09637177 0.055297983
[5,] 0.7011985 0.6224216 0.5183945 0.2937705 0.16230921 0.13964231 0.053888244
[6,] 0.5898731 0.5270226 0.3273701 0.1791715 0.15297478 0.08446215 0.010180454
Look at the output
> head(iris.kknn$D)
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 0.7259100 1.0142464 1.1519716 1.1561541 1.2139825 1.2179988 1.2996261
[2,] 0.2508639 0.2695631 0.4472127 0.6606040 0.6635606 0.7820818 1.0267680
[3,] 0.6498131 1.1736274 1.1906700 1.1965092 1.4579977 1.4579977 1.5401298
[4,] 0.2695631 0.3257349 0.3910409 0.5686904 0.6044323 0.6123406 0.6401741
[5,] 0.7338183 0.9272845 1.1827617 1.7344095 2.0572618 2.1129288 2.3235298
[6,] 0.5674645 0.6544263 0.9306719 1.1357241 1.1719707 1.2667669 1.3695454
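Comparing the two outputs suggests how the triangular kernel turns distances into weights: each row of D is scaled by the (k+1)-th nearest distance, and the weight is one minus the scaled distance. A quick check on row 1 (the scaling constant 1.3183 is inferred, not shown in the output):

```r
# Row 1 of iris.kknn$D above; d8 is the inferred 8th-nearest distance
d  <- c(0.7259100, 1.0142464, 1.1519716, 1.1561541, 1.2139825, 1.2179988, 1.2996261)
d8 <- 1.3183
round(1 - d/d8, 3)  # ~ 0.449 0.231 0.126 0.123 0.079 0.076 0.014
# which matches row 1 of iris.kknn$W to rounding
```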
Look at the output
> head(iris.kknn$C)
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 86 38 43 73 92 85 60
[2,] 31 20 16 21 24 15 7
[3,] 48 80 44 36 50 63 98
[4,] 4 21 25 6 20 26 1
[5,] 68 79 70 65 87 84 75
[6,] 91 97 100 96 83 93 81
> head(iris.kknn$prob)
setosa versicolor virginica
[1,] 0 0.3377079 0.6622921
[2,] 1 0.0000000 0.0000000
[3,] 0 0.8060743 0.1939257
[4,] 1 0.0000000 0.0000000
[5,] 0 0.0000000 1.0000000
[6,] 0 0.0000000 1.0000000
Look at the output
> head(iris.kknn$fitted.values)
[1] virginica setosa versicolor setosa virginica virginica
Levels: setosa versicolor virginica
Contingency tables
fitiris <- fitted(iris.kknn)
table(iris.valid$Species, fitiris)
fitiris
setosa versicolor virginica
setosa 17 0 0
versicolor 0 18 2
virginica 0 1 12
# rectangular kernel – no weighting; fitiris2 is the fitted values from an
# otherwise-identical kknn() call with kernel = "rectangular", e.g.:
# fitiris2 <- fitted(kknn(Species~., iris.learn, iris.valid, distance = 1, kernel = "rectangular"))
fitiris2
setosa versicolor virginica
setosa 17 0 0
versicolor 0 18 2
virginica 0 2 11
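The overall accuracy falls straight out of such a table; for the weighted fit above, for example:

```r
tab <- table(iris.valid$Species, fitiris)
sum(diag(tab)) / sum(tab)  # fraction classified correctly
# for the table shown: (17 + 18 + 12) / 50 = 0.94
```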
(Weighted) kNN
• Advantages
 – Robust to noisy training data (especially if we use the inverse square of weighted distance as the "distance")
 – Effective if the training data is large
• Disadvantages
 – Need to determine the value of parameter k (the number of nearest neighbors)
 – With distance-based learning it is not clear which type of distance to use, and which attributes to use, to produce the best results. Shall we use all attributes or only certain attributes?
Additional factors
• Dimensionality – with too many dimensions the closest neighbors are too far away to be considered close
• Overfitting – does closeness mean the right classification? (e.g. noise or incorrect data, like a wrong street address -> wrong lat/lon) – beware of k=1!
• Correlated features – double weighting
• Relative importance – including/excluding features
More factors
• Sparseness – the standard distance measure (Jaccard) loses meaning when there is no overlap
• Errors – unintentional and intentional
• Computational complexity
• Sensitivity to distance metrics – especially due to different scales (recall ages versus impressions versus clicks, and especially binary values: gender, logged in/not)
• Does not account for changes over time
• Model updating as new data comes in
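One common remedy for scale sensitivity is to standardize the features before any distance computation — a minimal sketch on iris (the kknn() function also offers a scale argument for this):

```r
# Standardize numeric columns to mean 0, sd 1 so no feature dominates the distance
iris.scaled <- iris
iris.scaled[, 1:4] <- scale(iris[, 1:4])
summary(iris.scaled$Sepal.Length)  # now centered at 0
```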
Glass
library(e1071)
library(rpart)
data(Glass, package="mlbench")
index <- 1:nrow(Glass)
testindex <- sample(index, trunc(length(index)/3))
testset <- Glass[testindex,]
trainset <- Glass[-testindex,]
Now what?
> rpart.model <- rpart(Type ~ ., data = trainset)
> rpart.pred <- predict(rpart.model, testset[,-10], type = "class")
# column 10 of Glass is Type, the class label being predicted
General idea behind trees
• Although the basic philosophy of all classifiers based on decision trees is identical, there are many possible ways to construct the tree.
• Among the key points in selecting an algorithm to build decision trees, some should be highlighted for their importance:
 – The criterion for choosing the feature to be tested at each node
 – How to compute the partition of the set of examples at each node
 – When to decide that a node is a leaf
 – The criterion for selecting the class to assign to each leaf
• Some important advantages of decision trees can be pointed out, including:
 – They can be applied to any type of data
 – The final structure of the classifier is quite simple and can be stored and handled gracefully
 – They handle conditional information very efficiently, subdividing the space into sub-spaces that are handled individually
 – They are normally robust and insensitive to misclassification in the training set
 – The resulting trees are usually quite understandable and can easily be used to obtain a better understanding of the phenomenon in question. This is perhaps the most important of all the advantages listed
Stopping – leaves on the tree
• A number of stopping conditions can be used to stop the recursive process. The algorithm stops when any one of the conditions is true:
 – All the samples belong to the same class, i.e. have the same label; the sample is already "pure"
 – Most of the points are already of the same class (a generalization of the first condition, with some error threshold)
 – There are no remaining attributes on which the samples may be further partitioned
 – There are no samples for the branch's test attribute
Recursive partitioning
• Recursive partitioning is a fundamental tool in data mining. It helps us explore the structure of a set of data, while developing easy-to-visualize decision rules for predicting a categorical (classification tree) or continuous (regression tree) outcome.
• The rpart programs build classification or regression models of a very general structure using a two-stage procedure; the resulting models can be represented as binary trees.
Recursive partitioning
• The tree is built by the following process:
 – First, the single variable is found which best splits the data into two groups ('best' will be defined later). The data is separated, and this process is then applied separately to each sub-group, and so on recursively until the subgroups either reach a minimum size or no improvement can be made.
 – The second stage of the procedure consists of using cross-validation to trim back the full tree.
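The first stage — finding the single variable and cut point that best split the data — can be sketched by brute force. A toy illustration using Gini impurity (rpart's actual splitting criteria, and its use of midpoints as cut values, are more refined):

```r
# Toy search for the best binary split of iris, scored by size-weighted Gini impurity
gini <- function(y) { p <- table(y) / length(y); 1 - sum(p^2) }

best <- list(score = Inf)
for (v in names(iris)[1:4]) {              # candidate variables
  for (cut in sort(unique(iris[[v]]))) {   # candidate cut points (observed values)
    left  <- iris$Species[iris[[v]] <  cut]
    right <- iris$Species[iris[[v]] >= cut]
    if (length(left) == 0 || length(right) == 0) next
    score <- (length(left) * gini(left) + length(right) * gini(right)) / nrow(iris)
    if (score < best$score) best <- list(var = v, cut = cut, score = score)
  }
}
best  # picks Petal.Length, consistent with rpart's first split on iris
```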
Why are we careful doing this?
• Because we will USE these trees, i.e. apply them to make decisions about what things are and what to do with them!
> printcp(rpart.model)
Classification tree:
rpart(formula = Type ~ ., data = trainset)
Variables actually used in tree construction:
[1] Al Ba Mg RI
Root node error: 92/143 = 0.64336
n= 143
CP nsplit rel error xerror xstd
1 0.206522 0 1.00000 1.00000 0.062262
2 0.195652 1 0.79348 0.92391 0.063822
3 0.050725 2 0.59783 0.63043 0.063822
4 0.043478 5 0.44565 0.64130 0.063990
5 0.032609 6 0.40217 0.57609 0.062777
6 0.010000 7 0.36957 0.51087 0.061056
plotcp(rpart.model)
> rsq.rpart(rpart.model)
Classification tree:
rpart(formula = Type ~ ., data = trainset)
Variables actually used in tree construction:
[1] Al Ba Mg RI
Root node error: 92/143 = 0.64336
n= 143
CP nsplit rel error xerror xstd
1 0.206522 0 1.00000 1.00000 0.062262
2 0.195652 1 0.79348 0.92391 0.063822
3 0.050725 2 0.59783 0.63043 0.063822
4 0.043478 5 0.44565 0.64130 0.063990
5 0.032609 6 0.40217 0.57609 0.062777
6 0.010000 7 0.36957 0.51087 0.061056
Warning message:
In rsq.rpart(rpart.model) : may not be applicable for this method
rsq.rpart
> print(rpart.model)
n= 143
node), split, n, loss, yval, (yprob)
* denotes terminal node
1) root 143 92 2 (0.3 0.36 0.091 0.056 0.049 0.15)
2) Ba< 0.335 120 70 2 (0.35 0.42 0.11 0.058 0.058 0.0083)
4) Al< 1.42 71 33 1 (0.54 0.28 0.15 0.014 0.014 0)
8) RI>=1.517075 58 22 1 (0.62 0.28 0.086 0.017 0 0)
16) RI< 1.518015 21 1 1 (0.95 0 0.048 0 0 0) *
17) RI>=1.518015 37 21 1 (0.43 0.43 0.11 0.027 0 0)
34) RI>=1.51895 25 10 1 (0.6 0.2 0.16 0.04 0 0)
68) Mg>=3.415 18 4 1 (0.78 0 0.22 0 0 0) *
69) Mg< 3.415 7 2 2 (0.14 0.71 0 0.14 0 0) *
35) RI< 1.51895 12 1 2 (0.083 0.92 0 0 0 0) *
9) RI< 1.517075 13 7 3 (0.15 0.31 0.46 0 0.077 0) *
5) Al>=1.42 49 19 2 (0.082 0.61 0.041 0.12 0.12 0.02)
10) Mg>=2.62 33 6 2 (0.12 0.82 0.061 0 0 0) *
11) Mg< 2.62 16 10 5 (0 0.19 0 0.37 0.37 0.062) *
3) Ba>=0.335 23 3 7 (0.043 0.043 0 0.043 0 0.87) *
Tree plot
plot(object, uniform=FALSE, branch=1, compress=FALSE, nspace, margin=0, minbranch=.3, ...)
> plot(rpart.model, compress=TRUE)
> text(rpart.model, use.n=TRUE)
And if you are brave
summary(rpart.model)
… pages….
Remember to LOOK at the data
> names(Glass)
[1] "RI" "Na" "Mg" "Al" "Si" "K" "Ca" "Ba" "Fe" "Type"
> head(Glass)
RI Na Mg Al Si K Ca Ba Fe Type
1 1.52101 13.64 4.49 1.10 71.78 0.06 8.75 0 0.00 1
2 1.51761 13.89 3.60 1.36 72.73 0.48 7.83 0 0.00 1
3 1.51618 13.53 3.55 1.54 72.99 0.39 7.78 0 0.00 1
4 1.51766 13.21 3.69 1.29 72.61 0.57 8.22 0 0.00 1
5 1.51742 13.27 3.62 1.24 73.08 0.55 8.07 0 0.00 1
6 1.51596 12.79 3.61 1.62 72.97 0.64 8.07 0 0.26 1
rpart.pred
> rpart.pred
91 163 138 135 172 20 1 148 169 206 126 157 107 39 150 203 151 110 73 104 85 93 144 160 145 89 204 7 92 51
1 1 2 1 5 2 1 1 5 7 2 1 7 1 1 7 2 2 2 1 2 2 2 2 1 2 7 1 2 1
186 14 190 56 184 82 125 12 168 175 159 36 117 114 154 62 139 5 18 98 27 183 42 66 155 180 71 83 123 11
7 1 7 2 2 2 1 1 5 5 2 1 1 1 1 7 2 1 1 1 1 5 1 1 1 5 2 1 2 2
195 101 136 45 130 6 72 87 173 121 3
7 2 1 1 5 2 1 2 5 1 2
Levels: 1 2 3 5 6 7
plot(rpart.pred)
Cross-validation
• Cross-validation is a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set.
• It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice.
• I.e. predictive and prescriptive analytics…
Cross-validation
In a prediction problem, a model is usually given a dataset of known data on which training is run (the training dataset), and a dataset of unknown data (or first-seen data) against which the model is tested (the testing dataset).
Sound familiar?
Cross-validation
The goal of cross-validation is to define a dataset to "test" the model in the training phase (i.e., the validation dataset), in order to limit problems like overfitting,
and to give insight into how the model will generalize to an independent data set (i.e., an unknown dataset, for instance from a real problem).
Common types of x-validation
• K-fold
• 2-fold (do you know this one?)
• Rep-random-subsample
• Leave-out subsample
• Lab in a few weeks … to try these out
K-fold
• The original sample is randomly partitioned into k equal-size subsamples.
• Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data.
• The cross-validation process is repeated k times (the folds), with each of the k subsamples used exactly once as the validation data.
 – The k results from the folds can then be averaged (usually) to produce a single estimate.
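The procedure above can be sketched in a few lines of R, reusing the kknn classifier from earlier (5 folds and accuracy as the per-fold result are illustrative choices):

```r
library(kknn)
k <- 5
folds <- sample(rep(1:k, length.out = nrow(iris)))  # random fold assignment
acc <- numeric(k)
for (i in 1:k) {
  train <- iris[folds != i, ]   # k-1 folds for training
  valid <- iris[folds == i, ]   # held-out fold for validation
  fit   <- kknn(Species ~ ., train, valid, distance = 1, kernel = "triangular")
  acc[i] <- mean(fitted(fit) == valid$Species)
}
mean(acc)  # the k fold results averaged into a single estimate
```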
Leave-out subsample
• As the name suggests, leave-one-out cross-validation (LOOCV) involves using a single observation from the original sample as the validation data, and the remaining observations as the training data.
• i.e. k = n-fold cross-validation
• Leaving out > 1 relates to bootstrapping and jackknifing
boot(strapping)
• Generate replicates of a statistic applied to data (parametric and nonparametric).
 – For the nonparametric bootstrap, possible methods are: the ordinary bootstrap, the balanced bootstrap, antithetic resampling, and permutation.
• For nonparametric multi-sample problems stratified resampling is used:
 – this is specified by including a vector of strata in the call to boot
 – importance resampling weights may be specified.
Jackknifing
• Systematically recompute the statistic estimate, leaving out one or more observations at a time from the sample set
• From this new set of replicates of the statistic, an estimate of the bias and an estimate of the variance of the statistic can be calculated
• Often use log(variance) [instead of variance], especially for non-normal distributions
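A sketch of the jackknife for a simple statistic — leave-one-out replicates of a sample mean, with the standard jackknife bias and variance estimates:

```r
set.seed(1)
x <- rnorm(30)
n <- length(x)
theta.hat  <- mean(x)
theta.jack <- sapply(1:n, function(i) mean(x[-i]))  # leave-one-out replicates
bias.jack <- (n - 1) * (mean(theta.jack) - theta.hat)
var.jack  <- ((n - 1) / n) * sum((theta.jack - mean(theta.jack))^2)
c(bias = bias.jack, variance = var.jack)
```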
Repeat-random-subsample
• Randomly split the dataset into training and validation data.
 – For each such split, the model is fit to the training data, and predictive accuracy is assessed using the validation data.
• Results are then averaged over the splits.
• Note: for this method the results will vary if the analysis is repeated with different random splits.
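A minimal sketch of repeated random subsampling, again reusing the iris/kknn setup (20 repetitions and a 1/3 validation split are illustrative):

```r
library(kknn)
acc <- replicate(20, {
  val <- sample(nrow(iris), size = round(nrow(iris)/3))  # fresh random split each time
  fit <- kknn(Species ~ ., iris[-val, ], iris[val, ],
              distance = 1, kernel = "triangular")
  mean(fitted(fit) == iris[val, ]$Species)
})
mean(acc)  # averaged over the splits; varies run to run with the random splits
```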
Advantage?
• The advantage of K-fold over repeated random sub-sampling is that all observations are used for both training and validation, and each observation is used for validation exactly once.
 – 10-fold cross-validation is commonly used
• The advantage of rep-random over k-fold cross-validation is that the proportion of the training/validation split is not dependent on the number of iterations (folds).
Disadvantage
• The disadvantage of rep-random is that some observations may never be selected in the validation subsample, whereas others may be selected more than once.
 – i.e., validation subsets may overlap.
New dataset to work with trees
fitK <- rpart(Kyphosis ~ Age + Number + Start, method="class", data=kyphosis)
printcp(fitK) # display the results
plotcp(fitK) # visualize cross-validation results
summary(fitK) # detailed summary of splits
# plot tree
plot(fitK, uniform=TRUE, main="Classification Tree for Kyphosis")
text(fitK, use.n=TRUE, all=TRUE, cex=.8)
# create attractive postscript plot of tree
post(fitK, file = "kyphosistree.ps", title = "Classification Tree for Kyphosis") # might need to convert to PDF (distill)
> pfitK <- prune(fitK, cp = fitK$cptable[which.min(fitK$cptable[,"xerror"]),"CP"])
> plot(pfitK, uniform=TRUE, main="Pruned Classification Tree for Kyphosis")
> text(pfitK, use.n=TRUE, all=TRUE, cex=.8)
> post(pfitK, file = "ptree.ps", title = "Pruned Classification Tree for Kyphosis")
> library(party) # ctree() comes from the party package
> fitK <- ctree(Kyphosis ~ Age + Number + Start, data=kyphosis)
> plot(fitK, main="Conditional Inference Tree for Kyphosis")
> plot(fitK, main="Conditional Inference Tree for Kyphosis",type="simple")
Trees for the Titanic
data(Titanic)
rpart, ctree, hclust, etc. for Survived ~ .
randomForest> require(randomForest)
> fitKF <- randomForest(Kyphosis ~ Age + Number + Start, data=kyphosis)
> print(fitKF) # view results
Call:
randomForest(formula = Kyphosis ~ Age + Number + Start, data = kyphosis)
Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 1
OOB estimate of error rate: 20.99%
Confusion matrix:
absent present class.error
absent 59 5 0.0781250
present 12 5 0.7058824
> importance(fitKF) # importance of each predictor
MeanDecreaseGini
Age 8.654112
Number 5.584019
Start 10.168591
Random forests improve predictive accuracy by generating a large number of bootstrapped trees (based on random samples of variables), classifying a case using each tree in this new "forest", and deciding a final predicted outcome by combining the results across all of the trees (an average in regression, a majority vote in classification).
More on another dataset.
# Regression Tree Example
library(rpart)
# build the tree
fitM <- rpart(Mileage~Price + Country + Reliability + Type, method="anova", data=cu.summary)
printcp(fitM) # display the results….
Root node error: 1354.6/60 = 22.576
n=60 (57 observations deleted due to missingness)
CP nsplit rel error xerror xstd
1 0.622885 0 1.00000 1.03165 0.176920
2 0.132061 1 0.37711 0.51693 0.102454
3 0.025441 2 0.24505 0.36063 0.079819
4 0.011604 3 0.21961 0.34878 0.080273
5 0.010000 4 0.20801 0.36392 0.075650
53
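A common rule is to prune at the cp value whose cross-validated error (xerror) is smallest; that value can be read programmatically from the cptable rather than copied by hand from the printcp() listing. A sketch under that assumption:

```r
# Sketch: pick the pruning cp with minimum cross-validated error
library(rpart)
fitM <- rpart(Mileage ~ Price + Country + Reliability + Type,
              method = "anova", data = cu.summary)

cpt <- fitM$cptable                       # same columns printcp() displays
best.cp <- cpt[which.min(cpt[, "xerror"]), "CP"]
pfitM <- prune(fitM, cp = best.cp)
```

Because cross-validation is randomized, the row with minimum xerror can vary from run to run, which is why the slides quote a specific cp from one particular cptable.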
![Page 54: 1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 7b, March 13, 2015 Interpreting weighted kNN, decision trees, cross-validation, dimension reduction](https://reader037.vdocuments.net/reader037/viewer/2022110402/56649e575503460f94b4f7be/html5/thumbnails/54.jpg)
Mileage…
plotcp(fitM) # visualize cross-validation results
summary(fitM) # detailed summary of splits
<we will look more at this in a future lab>
54
![Page 55: 1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 7b, March 13, 2015 Interpreting weighted kNN, decision trees, cross-validation, dimension reduction](https://reader037.vdocuments.net/reader037/viewer/2022110402/56649e575503460f94b4f7be/html5/thumbnails/55.jpg)
55
par(mfrow=c(1,2))
rsq.rpart(fitM) # visualize cross-validation results
![Page 56: 1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 7b, March 13, 2015 Interpreting weighted kNN, decision trees, cross-validation, dimension reduction](https://reader037.vdocuments.net/reader037/viewer/2022110402/56649e575503460f94b4f7be/html5/thumbnails/56.jpg)
# plot tree
plot(fitM, uniform=TRUE, main="Regression Tree for Mileage ")
text(fitM, use.n=TRUE, all=TRUE, cex=.8)
# prune the tree
pfitM<- prune(fitM, cp=0.01160389) # from cptable
# plot the pruned tree
plot(pfitM, uniform=TRUE, main="Pruned Regression Tree for Mileage")
text(pfitM, use.n=TRUE, all=TRUE, cex=.8)
post(pfitM, file = "ptree2.ps", title = "Pruned Regression Tree for Mileage")
56
![Page 57: 1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 7b, March 13, 2015 Interpreting weighted kNN, decision trees, cross-validation, dimension reduction](https://reader037.vdocuments.net/reader037/viewer/2022110402/56649e575503460f94b4f7be/html5/thumbnails/57.jpg)
57
![Page 58: 1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 7b, March 13, 2015 Interpreting weighted kNN, decision trees, cross-validation, dimension reduction](https://reader037.vdocuments.net/reader037/viewer/2022110402/56649e575503460f94b4f7be/html5/thumbnails/58.jpg)
# Conditional Inference Tree for Mileage
fit2M <- ctree(Mileage~Price + Country + Reliability + Type, data=na.omit(cu.summary))
58
![Page 59: 1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 7b, March 13, 2015 Interpreting weighted kNN, decision trees, cross-validation, dimension reduction](https://reader037.vdocuments.net/reader037/viewer/2022110402/56649e575503460f94b4f7be/html5/thumbnails/59.jpg)
Enough of trees!
59
![Page 60: 1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 7b, March 13, 2015 Interpreting weighted kNN, decision trees, cross-validation, dimension reduction](https://reader037.vdocuments.net/reader037/viewer/2022110402/56649e575503460f94b4f7be/html5/thumbnails/60.jpg)
Dimension reduction..
• Principal component analysis (PCA) and metaPCA (in R)
• Singular value decomposition
• Feature selection, reduction
• Built into a lot of clustering
• Why?
– Curse of dimensionality – or – some subset of the data should not be used as it adds noise
• What is it?
– Various methods to reach an optimal subset
60
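As a concrete illustration of PCA (using the built-in iris data, not a dataset from these slides):

```r
# Sketch: PCA on the four numeric iris measurements.
# Scaling puts variables measured in different units on an equal footing.
pc <- prcomp(iris[, 1:4], scale. = TRUE)
summary(pc)          # proportion of variance explained per component
head(pc$x[, 1:2])    # the data projected onto the first two components
```

The first two components typically explain most of the variance here, so the four dimensions can often be reduced to two with little information loss.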
![Page 61: 1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 7b, March 13, 2015 Interpreting weighted kNN, decision trees, cross-validation, dimension reduction](https://reader037.vdocuments.net/reader037/viewer/2022110402/56649e575503460f94b4f7be/html5/thumbnails/61.jpg)
Simple example
61
![Page 62: 1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 7b, March 13, 2015 Interpreting weighted kNN, decision trees, cross-validation, dimension reduction](https://reader037.vdocuments.net/reader037/viewer/2022110402/56649e575503460f94b4f7be/html5/thumbnails/62.jpg)
More dimensions
62
![Page 63: 1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 7b, March 13, 2015 Interpreting weighted kNN, decision trees, cross-validation, dimension reduction](https://reader037.vdocuments.net/reader037/viewer/2022110402/56649e575503460f94b4f7be/html5/thumbnails/63.jpg)
Feature selection
• The goodness of a feature/feature subset is dependent on measures
• Various measures
– Information measures
– Distance measures
– Dependence measures
– Consistency measures
– Accuracy measures
63
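As a small illustration of a dependence measure, one simple filter ranks features by the absolute value of their correlation with the target (using the built-in mtcars data, not a dataset from these slides):

```r
# Sketch: rank candidate predictors of mpg by absolute correlation with the target
target   <- mtcars$mpg
features <- mtcars[, setdiff(names(mtcars), "mpg")]
scores   <- sapply(features, function(x) abs(cor(x, target)))
sort(scores, decreasing = TRUE)   # higher = stronger linear dependence
```

Information, distance, consistency and accuracy measures each substitute a different scoring function into the same rank-and-select pattern.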
![Page 64: 1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 7b, March 13, 2015 Interpreting weighted kNN, decision trees, cross-validation, dimension reduction](https://reader037.vdocuments.net/reader037/viewer/2022110402/56649e575503460f94b4f7be/html5/thumbnails/64.jpg)
On your own…library(EDR) # effective dimension reduction
library(dr)
library(clustrd)
install.packages("edrGraphicalTools")
library(edrGraphicalTools)
demo(edr_ex1)
demo(edr_ex2)
demo(edr_ex3)
demo(edr_ex4)64
![Page 65: 1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 7b, March 13, 2015 Interpreting weighted kNN, decision trees, cross-validation, dimension reduction](https://reader037.vdocuments.net/reader037/viewer/2022110402/56649e575503460f94b4f7be/html5/thumbnails/65.jpg)
Some examples
• Lab8b_dr1_2015.R
• Lab8b_dr2_2015.R
• Lab8b_dr3_2015.R
• Lab8b_dr4_2015.R
65
![Page 66: 1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 7b, March 13, 2015 Interpreting weighted kNN, decision trees, cross-validation, dimension reduction](https://reader037.vdocuments.net/reader037/viewer/2022110402/56649e575503460f94b4f7be/html5/thumbnails/66.jpg)
Multidimensional Scaling
• Visual representation ~ 2-D plot - patterns of proximity in a lower dimensional space
• "Similar" to PCA/DR but uses dissimilarity as input -> dissimilarity matrix
– An MDS algorithm aims to place each object in N-dimensional space such that the between-object distances are preserved as well as possible.
– Each object is then assigned coordinates in each of the N dimensions.
– The number of dimensions of an MDS plot N can exceed 2 and is specified a priori.
– Choosing N=2 optimizes the object locations for a two-dimensional scatterplot.
66
![Page 67: 1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 7b, March 13, 2015 Interpreting weighted kNN, decision trees, cross-validation, dimension reduction](https://reader037.vdocuments.net/reader037/viewer/2022110402/56649e575503460f94b4f7be/html5/thumbnails/67.jpg)
Four types of MDS
• Classical multidimensional scaling
– Also known as Principal Coordinates Analysis, Torgerson Scaling or Torgerson–Gower scaling. Takes an input matrix giving dissimilarities between pairs of items and outputs a coordinate matrix whose configuration minimizes a loss function called strain.
• Metric multidimensional scaling
– A superset of classical MDS that generalizes the optimization procedure to a variety of loss functions and input matrices of known distances with weights and so on. A useful loss function in this context is called stress, which is often minimized using a procedure called stress majorization.
67
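smacofSym() in the smacof package (listed on a later slide) implements this stress-majorization approach; a minimal sketch, assuming smacof is installed, using the built-in eurodist road distances:

```r
# Sketch: metric MDS by stress majorization (SMACOF)
library(smacof)
fit.sm <- smacofSym(eurodist, ndim = 2)  # minimizes stress via majorization
fit.sm$stress                            # final stress of the configuration
plot(fit.sm)                             # 2-D map of the 21 cities
```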
![Page 68: 1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 7b, March 13, 2015 Interpreting weighted kNN, decision trees, cross-validation, dimension reduction](https://reader037.vdocuments.net/reader037/viewer/2022110402/56649e575503460f94b4f7be/html5/thumbnails/68.jpg)
Four types of MDS ctd
• Non-metric multidimensional scaling
– In contrast to metric MDS, non-metric MDS finds both a non-parametric monotonic relationship between the dissimilarities in the item-item matrix and the Euclidean distances between items, and the location of each item in the low-dimensional space. The relationship is typically found using isotonic regression.
• Generalized multidimensional scaling
– An extension of metric multidimensional scaling, in which the target space is an arbitrary smooth non-Euclidean space. In cases where the dissimilarities are distances on a surface and the target space is another surface, GMDS allows finding the minimum-distortion embedding of one surface into another.
68
![Page 69: 1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 7b, March 13, 2015 Interpreting weighted kNN, decision trees, cross-validation, dimension reduction](https://reader037.vdocuments.net/reader037/viewer/2022110402/56649e575503460f94b4f7be/html5/thumbnails/69.jpg)
In R
function (library)
• cmdscale() (stats)
• smacofSym() (smacof)
• wcmdscale() (vegan)
• pco() (ecodist)
• pco() (labdsv)
• pcoa() (ape)
• Only stats is loaded by default, and the rest are not installed by default
69
![Page 70: 1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 7b, March 13, 2015 Interpreting weighted kNN, decision trees, cross-validation, dimension reduction](https://reader037.vdocuments.net/reader037/viewer/2022110402/56649e575503460f94b4f7be/html5/thumbnails/70.jpg)
cmdscale()
cmdscale(d, k = 2, eig = FALSE, add = FALSE, x.ret = FALSE)
d - a distance structure such as that returned by dist or a full symmetric matrix containing the dissimilarities.
k - the maximum dimension of the space which the data are to be represented in; must be in {1, 2, …, n-1}.
eig - indicates whether eigenvalues should be returned.
add - logical indicating if an additive constant c* should be computed, and added to the non-diagonal dissimilarities such that the modified dissimilarities are Euclidean.
x.ret - indicates whether the doubly centred symmetric distance matrix should be returned.
70
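The effect of eig = TRUE can be seen on the built-in eurodist data: the returned eigenvalues show how much of the distance structure each dimension captures, and GOF summarizes the fit of the k-dimensional configuration.

```r
# Sketch: classical MDS with eigenvalues returned
fit <- cmdscale(eurodist, k = 2, eig = TRUE)
fit$GOF       # goodness-of-fit of the 2-D configuration
head(fit$eig) # leading eigenvalues; a sharp drop after the 2nd supports k = 2
```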
![Page 71: 1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 7b, March 13, 2015 Interpreting weighted kNN, decision trees, cross-validation, dimension reduction](https://reader037.vdocuments.net/reader037/viewer/2022110402/56649e575503460f94b4f7be/html5/thumbnails/71.jpg)
Distances between Australian cities
row.names(dist.au) <- dist.au[, 1]
dist.au <- dist.au[, -1]
dist.au
## A AS B D H M P S
## A 0 1328 1600 2616 1161 653 2130 1161
## AS 1328 0 1962 1289 2463 1889 1991 2026
## B 1600 1962 0 2846 1788 1374 3604 732
## D 2616 1289 2846 0 3734 3146 2652 3146
## H 1161 2463 1788 3734 0 598 3008 1057
## M 653 1889 1374 3146 598 0 2720 713
## P 2130 1991 3604 2652 3008 2720 0 3288
## S 1161 2026 732 3146 1057 713 3288 0
71
![Page 72: 1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 7b, March 13, 2015 Interpreting weighted kNN, decision trees, cross-validation, dimension reduction](https://reader037.vdocuments.net/reader037/viewer/2022110402/56649e575503460f94b4f7be/html5/thumbnails/72.jpg)
Distances between Australian cities
fit <- cmdscale(dist.au, eig = TRUE, k = 2)
x <- fit$points[, 1]
y <- fit$points[, 2]
plot(x, y, pch = 19, xlim = range(x) + c(0, 600))
city.names <- c("Adelaide", "Alice Springs", "Brisbane", "Darwin", "Hobart",
"Melbourne", "Perth", "Sydney")
text(x, y, pos = 4, labels = city.names)
72
![Page 73: 1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 7b, March 13, 2015 Interpreting weighted kNN, decision trees, cross-validation, dimension reduction](https://reader037.vdocuments.net/reader037/viewer/2022110402/56649e575503460f94b4f7be/html5/thumbnails/73.jpg)
73
![Page 74: 1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 7b, March 13, 2015 Interpreting weighted kNN, decision trees, cross-validation, dimension reduction](https://reader037.vdocuments.net/reader037/viewer/2022110402/56649e575503460f94b4f7be/html5/thumbnails/74.jpg)
R – many ways (of course)
library(igraph)
g <- graph.full(nrow(dist.au))
V(g)$label <- city.names
layout <- layout.mds(g, dist = as.matrix(dist.au))
plot(g, layout = layout, vertex.size = 3)
74
![Page 75: 1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 7b, March 13, 2015 Interpreting weighted kNN, decision trees, cross-validation, dimension reduction](https://reader037.vdocuments.net/reader037/viewer/2022110402/56649e575503460f94b4f7be/html5/thumbnails/75.jpg)
75
![Page 76: 1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 7b, March 13, 2015 Interpreting weighted kNN, decision trees, cross-validation, dimension reduction](https://reader037.vdocuments.net/reader037/viewer/2022110402/56649e575503460f94b4f7be/html5/thumbnails/76.jpg)
On your own
• Lab8b_mds1_2015.R
• Lab8b_mds2_2015.R
• Lab8b_mds3_2015.R
• http://www.statmethods.net/advstats/mds.html
• http://gastonsanchez.com/blog/how-to/2013/01/23/MDS-in-R.html
76
![Page 77: 1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 7b, March 13, 2015 Interpreting weighted kNN, decision trees, cross-validation, dimension reduction](https://reader037.vdocuments.net/reader037/viewer/2022110402/56649e575503460f94b4f7be/html5/thumbnails/77.jpg)
Next week…
• A5 – oral presentations on Tues (maybe Fri)
• Rest of Friday – lab
• Next week – Spring break
• March 31 – SVM
• April 3 – SVM and lab
77