Consistency of Random Forests
TRANSCRIPT
![Page 1: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/1.jpg)
Consistency of Random Forests
Hoang N.V.
[email protected]
Faculty of Computer Science
FITA – Viet Nam Institute of Agriculture
Seminar IT R&D, HANU
Ha Noi, December 2015
![Page 2: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/2.jpg)
Machine Learning, what is it?
Parametric
Non-parametric
Supervised problems: not too difficult
Unsupervised problems: very difficult
Find a parameter which minimizes the loss function
![Page 3: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/3.jpg)
Supervised Learning
Given a learning set ℒ𝑛
L is a loss function
Classification: zero-one loss function
Regression: 𝕃1 or 𝕃2 loss
![Page 4: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/4.jpg)
Bias-variance tradeoff
If the model is too simple, the solution is biased and does not fit the data.
If the model is too complex, it is very sensitive to small changes in the data.
![Page 5: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/5.jpg)
[Hastie et al., 2005]
![Page 6: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/6.jpg)
Ensemble Methods
![Page 7: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/7.jpg)
Bagging[Random Forest]
![Page 8: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/8.jpg)
Tree Predictor
Grow a tree on the learning set ℒ𝑛 by repeating:
Pick an internal node A to split
Pick the best split in A
Split A into two child nodes (A𝐿 and A𝑅)
Replace A by A𝐿 and A𝑅 and repeat
A splitting scheme induces a partition Λ of the feature space into non-overlapping rectangles 𝑃1, … , 𝑃ℓ.
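As an illustration (not part of the slides), the splitting scheme can be sketched as a minimal recursive partitioner; the squared-error split criterion, the depth limit, and the toy data are illustrative assumptions:

```python
import numpy as np

def sse(y):
    # Sum of squared errors of y around its own mean.
    return float(np.sum((y - y.mean()) ** 2)) if len(y) else 0.0

def best_split(X, y, j):
    # Best threshold on feature j: minimize SSE(left) + SSE(right).
    best_score, best_thr = np.inf, None
    for thr in np.unique(X[:, j])[:-1]:
        left = X[:, j] <= thr
        score = sse(y[left]) + sse(y[~left])
        if score < best_score:
            best_score, best_thr = score, float(thr)
    return best_score, best_thr

def grow(X, y, depth=0, max_depth=2):
    # Returns the induced partition as a list of leaf response arrays.
    if depth == max_depth or len(y) < 2:
        return [y]
    candidates = [(best_split(X, y, j), j) for j in range(X.shape[1])]
    candidates = [(s, j, t) for (s, t), j in candidates if t is not None]
    if not candidates:
        return [y]
    _, j, thr = min(candidates)
    left = X[:, j] <= thr
    return (grow(X[left], y[left], depth + 1, max_depth)
            + grow(X[~left], y[~left], depth + 1, max_depth))
```

Each returned leaf corresponds to one of the non-overlapping rectangles 𝑃1, … , 𝑃ℓ of the induced partition Λ.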
![Page 9: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/9.jpg)
Tree Predictor (as on the previous slide)
Predicting Rule
Given a point 𝒙, find the cell of the partition Λ containing 𝒙, and predict from the training points of ℒ𝑛 falling in that cell: the majority class in classification, the average response in regression.
![Page 10: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/10.jpg)
Tree Predictor (as on the previous slides)
Training Methods
ID3 (Iterative Dichotomiser 3)
C4.5
CART (Classification and Regression Trees)
CHAID (Chi-squared Automatic Interaction Detection)
MARS (Multivariate Adaptive Regression Splines)
Conditional Inference Trees
Predicting Rule (as on the previous slide)
![Page 11: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/11.jpg)
Forest = Aggregation of Trees
Aggregating Rule
Grow m trees on ℒ𝑛 and combine their predictions: majority vote over the trees in classification, average of the tree predictions in regression.
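A minimal sketch of the two aggregating rules (the function names are illustrative, not from the slides):

```python
import numpy as np
from collections import Counter

def aggregate_regression(tree_predictions):
    # Regression forest: average the m tree predictions.
    return float(np.mean(tree_predictions))

def aggregate_classification(tree_predictions):
    # Classification forest: majority vote over the m tree predictions.
    return Counter(tree_predictions).most_common(1)[0][0]
```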
![Page 12: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/12.jpg)
![Page 13: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/13.jpg)
Grow different trees from same learning set ℒ𝑛
Sampling with replacement [Breiman, 1994]
Random subspace sampling [Ho, 1995 & 1998]
Random output sampling [Breiman, 1998]
Randomized C4.5 [Dietterich, 1998]
Purely random forest [Breiman, 2000]
Extremely randomized trees [Geurts, 2006]
![Page 14: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/14.jpg)
Grow different trees from same learning set ℒ𝑛
Sampling with replacement - random subspace [Breiman, 2001]
Sampling with replacement - weighted subspace [Amaratunga, 2008; Xu, 2008; Wu, 2012]
Sampling with replacement - random subspace and regularized [Deng, 2012]
Sampling with replacement - random subspace and guided-regularized [Deng, 2013]
Sampling with replacement - random subspace and random split position selection [Saïp Ciss, 2014]
![Page 15: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/15.jpg)
Some RF extensions
Quantile estimation [Meinshausen, 2006]
Survival analysis [Ishwaran et al., 2008]
Ranking [Clemencon et al., 2013]
Online learning [Denil et al., 2013; Lakshminarayanan et al., 2014]
Genome-wide association (GWA) problems [Yang et al., 2013; Botta et al., 2014]
![Page 16: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/16.jpg)
What is a good learner? [What is friendly with my data?]
![Page 17: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/17.jpg)
What is good in high-dimensional settings?
Breiman, 2001
Wu et al., 2012
Deng, 2012
Deng, 2013
Saïp Ciss, 2014
![Page 18: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/18.jpg)
Simulation Experiment
![Page 19: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/19.jpg)
[Plot: simulated responses y1–y4 over x ∈ [0, 1]; regions labeled A and B.]
![Page 20: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/20.jpg)
Random Forest [Breiman, 2001]
![Page 21: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/21.jpg)
WSRF [Wu, 2012]
![Page 22: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/22.jpg)
Random Uniform Forest [Saïp Ciss, 2014]
![Page 23: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/23.jpg)
RRF [Deng, 2012]
![Page 24: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/24.jpg)
GRRF [Deng, 2013]
![Page 25: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/25.jpg)
![Page 26: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/26.jpg)
Simulation Experiment
![Page 27: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/27.jpg)
Random Forest [Breiman, 2001]
![Page 28: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/28.jpg)
WSRF [Wu, 2012]
![Page 29: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/29.jpg)
Random Uniform Forest [Saïp Ciss, 2014]
![Page 30: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/30.jpg)
RRF [Deng, 2012]
![Page 31: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/31.jpg)
GRRF [Deng, 2013]
![Page 32: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/32.jpg)
GRRF with AUC [Deng, 2013]
![Page 33: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/33.jpg)
GRRF with ER [Deng, 2013]
![Page 34: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/34.jpg)
Simulation Experiment
Multiple Class Tree
[Diagram: a multiple-class tree partitioning the space into regions labeled A–E.]
![Page 35: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/35.jpg)
Random Forest [Breiman, 2001]
![Page 36: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/36.jpg)
WSRF [Wu, 2012]
![Page 37: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/37.jpg)
Random Uniform Forest [Saïp Ciss, 2014]
![Page 38: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/38.jpg)
RRF [Deng, 2012]
![Page 39: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/39.jpg)
GRRF [Deng, 2013]
![Page 40: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/40.jpg)
GRRF with ER [Deng, 2013]
![Page 41: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/41.jpg)
What is a good learner? [Nothing you do will convince me]
[I need rigorous theoretical guarantees]
![Page 42: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/42.jpg)
Asymptotic statistics and learning theory [going beyond experimental results]
![Page 43: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/43.jpg)
Machine Learning, what is it?
Parametric
Non-parametric
Supervised problems: not too difficult
Unsupervised problems: very difficult
Find a parameter which minimizes the loss function
![Page 44: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/44.jpg)
“right”
Is the pattern learned from ℒ𝑛 “true”? How much do I believe it?
Is this procedure friendly with my data?
What is the best possible procedure for my problem?
What if our assumptions are wrong?
“efficient”
How many observations do I need in order to achieve a “believed” pattern?
How many computations do I need?
Assumption: There are some patterns
![Page 45: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/45.jpg)
Learning Theory [Vapnik, 1999]
asymptotic theory
necessary and sufficient conditions
the best possible procedure
![Page 46: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/46.jpg)
Supervised Learning
Given a learning set ℒ𝑛
L is a loss function
Classification: zero-one loss function
Regression: 𝕃1 or 𝕃2 loss
![Page 47: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/47.jpg)
Supervised Learning
[Diagram: a Generator produces inputs 𝒙, a Supervisor returns outputs 𝑦, and the Learning Machine outputs its own predictions 𝑦′.]
Two different goals
imitate (prediction accuracy)
identify (interpretability)
![Page 48: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/48.jpg)
What is the best predictor?
![Page 49: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/49.jpg)
What is the best predictor?
Bayes model 𝜑𝐵: the model attaining the residual (Bayes) error.
For any model 𝜑 built from any learning set ℒ, Err(𝜑𝐵) ≤ Err(𝜑).
𝜑𝐵 exists in theory, when 𝑃(𝑋, 𝑌) is known.
![Page 50: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/50.jpg)
What is the best predictor?
If L is the zero-one loss function, the Bayes model is
𝜑𝐵(𝒙) = arg max𝑦 𝑃(𝑌 = 𝑦 | 𝑋 = 𝒙)
In classification, the best possible classifier consists in systematically predicting the most likely class 𝑦 ∈ {𝑐1, … , 𝑐𝐽} given 𝑋 = 𝒙.
![Page 51: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/51.jpg)
What is the best predictor?
If L is the squared error loss function, the Bayes model is
𝜑𝐵(𝒙) = 𝔼[𝑌 | 𝑋 = 𝒙]
In regression, the best possible regressor consists in systematically predicting the average value of 𝑌 given 𝑋 = 𝒙.
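Both Bayes models can be computed exactly when the joint distribution is known; a toy discrete example (the distribution below is made up purely for illustration):

```python
import numpy as np

# Toy joint distribution: p[x, y] = P(X = x, Y = y) for X in {0, 1}, Y in {0, 1, 2}.
p = np.array([[0.10, 0.20, 0.10],
              [0.05, 0.05, 0.50]])

def bayes_classifier(x):
    # Zero-one loss: predict the most likely class given X = x.
    return int(np.argmax(p[x]))

def bayes_regressor(x):
    # Squared-error loss: predict E[Y | X = x].
    cond = p[x] / p[x].sum()
    return float(cond @ np.arange(p.shape[1]))
```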
![Page 52: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/52.jpg)
Given a learning algorithm 𝒜 and a loss function L:
𝜑𝑛 = 𝒜(ℒ𝑛) is the model learned from ℒ𝑛.
Learning algorithm 𝒜 is consistent in L if and only if
𝐸𝑟𝑟(𝜑𝑛) ⟶𝑃 𝐸𝑟𝑟(𝜑𝐵) as 𝑛 → ∞
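To make the definition concrete, a small simulation sketch (the toy distribution η(x) = x, the histogram rule, and the sample sizes are all illustrative assumptions, not from the slides): here the Bayes error is ∫₀¹ min(x, 1 − x) dx = 1/4, and a simple histogram classifier's test error lands near it for moderately large n.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    # Toy distribution: X ~ U(0, 1), P(Y = 1 | X = x) = x; Bayes error is 1/4.
    x = rng.random(n)
    y = (rng.random(n) < x).astype(int)
    return x, y

def fit_histogram(x, y, bins=10):
    # Plug-in rule: predict the majority label of the bin containing the input.
    idx = np.minimum((x * bins).astype(int), bins - 1)
    ones = np.bincount(idx, weights=y, minlength=bins)
    counts = np.bincount(idx, minlength=bins)
    majority = (ones > counts / 2).astype(int)
    def predict(xq):
        return majority[np.minimum((xq * bins).astype(int), bins - 1)]
    return predict

phi = fit_histogram(*sample(5000))
x_test, y_test = sample(20000)
test_error = float(np.mean(phi(x_test) != y_test))
```

With n = 5000 training points the test error is close to the Bayes error of 0.25; shrinking n pushes it upward, which is exactly what the consistency definition quantifies in the limit.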
![Page 53: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/53.jpg)
Random Forests are consistent, aren’t they?
![Page 54: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/54.jpg)
A randomized learning algorithm 𝒜(Θ, ℒ𝑛) takes an extra random parameter Θ.
Θ is used to sample the training set or to select the candidate directions or positions for splitting.
Θ is independent of the dataset and thus unrelated to the particular problem.
In some new variants of RF, Θ depends on the dataset.
Generalized Random Forest
![Page 55: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/55.jpg)
Bagging Procedure
Draw i.i.d. copies Θ1, … , Θ𝑚 of Θ.
For each Θ𝑖, grow a tree 𝑇(·; Θ𝑖, ℒ𝑛) = 𝒜(Θ𝑖, ℒ𝑛).
Aggregate the trees: 𝐻𝑚(𝒙; ℒ𝑛) = (1/𝑚) Σ𝑖 𝑇(𝒙; Θ𝑖, ℒ𝑛).
Generalized Random Forest
![Page 56: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/56.jpg)
Consistency of Random Forests
Challenge: does lim𝑛→∞ Err(𝐻𝑚(·; ℒ𝑛)) = Err(𝜑𝐵)?
- m is finite ⇒ the predictor depends on the particular trees that form the forest
- the structure of each tree depends on Θ𝑖 and on the learning set ⟹ a finite forest is actually a subtle combination of randomness and data-dependent structures ⟹ finite-forest predictions can be difficult to interpret (random prediction or not)
- non-asymptotic rate of convergence
![Page 57: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/57.jpg)
Consistency of Random Forests
𝐻∞(𝒙; ℒ𝑛) = 𝔼Θ[𝑇(𝒙; Θ, ℒ𝑛)]
Challenge: lim𝑚→∞ 𝐻𝑚(𝒙; ℒ𝑛) = 𝐻∞(𝒙; ℒ𝑛)
- is the infinite forest better than a finite forest?
- what is a good m? (rate of convergence)
![Page 58: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/58.jpg)
Review some recent results
![Page 59: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/59.jpg)
Strength, Correlation and Err [Breiman, 2001]
Theorem 2.3 An upper bound for the generalization error is given by
𝑃𝐸∗ ≤ 𝜌(1 − 𝑠²)/𝑠²
where 𝜌 is the mean value of the correlation and s is the strength of the set of classifiers.
![Page 60: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/60.jpg)
RF and Adaptive Nearest Neighbors [Lin et al., 2006]
Each tree 𝑇(·; Θ𝑖, ℒ𝑛) = 𝒜(Θ𝑖, ℒ𝑛) predicts by averaging the responses in the leaf 𝐿(Θ𝑖, 𝒙) containing 𝒙:
𝑇(𝒙; Θ𝑖, ℒ𝑛) = (1 / |{𝑗: 𝒙𝑗 ∈ 𝐿(Θ𝑖, 𝒙)}|) Σ𝑗: 𝒙𝑗∈𝐿(Θ𝑖,𝒙) 𝑦𝑗 = Σ𝑗 𝑤𝑗(Λ𝑖) 𝑦𝑗
so the forest is a weighted average of the responses: 𝐻∞(𝒙; ℒ𝑛) = Σ𝑗 𝑤𝑗 𝑦𝑗 with 𝑤𝑗 = 𝔼Θ[𝑤𝑗(Λ)].
Non-adaptive if the weights 𝑤𝑗 do not depend on the 𝑦𝑖's of the learning set.
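The weighted-neighbor view can be checked numerically: given each tree's leaf assignments (the toy values below are made up), the forest weights average the per-tree neighbor weights and sum to one.

```python
import numpy as np

def forest_weights(train_leaves, query_leaves):
    # train_leaves[i, j]: leaf id of training point j in tree i.
    # query_leaves[i]: leaf id of the query point x in tree i
    # (assumed non-empty: every query leaf contains training points).
    m, n = train_leaves.shape
    w = np.zeros(n)
    for i in range(m):
        same = (train_leaves[i] == query_leaves[i]).astype(float)
        w += same / same.sum() / m
    return w

train_leaves = np.array([[0, 0, 1, 1],
                         [0, 1, 1, 1]])
query_leaves = np.array([0, 1])  # x falls in leaf 0 of tree 1, leaf 1 of tree 2
w = forest_weights(train_leaves, query_leaves)
```

The forest prediction is then the weighted average `w @ y_train`.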
![Page 61: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/61.jpg)
RF and Adaptive Nearest Neighbors [Lin et al., 2006] (as on the previous slide)
The terminal node size k should be made to increase with the sample size 𝑛. Therefore, growing large trees (k being a small constant) does not always give the best performance.
![Page 62: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/62.jpg)
Biau et al., 2008
Given a learning set ℒ𝑛 = {(𝒙1, 𝑦1), … , (𝒙𝑛, 𝑦𝑛)} of ℝ𝑑 × {0, 1}
Binary classifier 𝜑𝑛: ℝ𝑑 ⟶ {0, 1} trained from ℒ𝑛
𝐸𝑟𝑟(𝜑𝑛) = ℙ(𝜑𝑛(𝑋) ≠ 𝑌)
Bayes classifier 𝜑𝐵(𝒙) = 𝕀{ℙ(𝑌 = 1 | 𝑋 = 𝒙) > 1/2}, with error 𝐸𝑟𝑟(𝜑𝐵)
A sequence {𝜑𝑛} of classifiers is consistent for a certain distribution of (𝑋, 𝑌) if 𝐸𝑟𝑟(𝜑𝑛) ⟶ 𝐸𝑟𝑟(𝜑𝐵) in probability.
Assume that the sequence {𝑇𝑛} of randomized classifiers is consistent for a certain distribution of (𝑋, 𝑌). Then the voting classifier 𝐻𝑚 (for any value of m) and the averaged classifier 𝐻∞ are also consistent.
![Page 63: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/63.jpg)
Biau et al., 2008
Growing Trees
A node 𝐴 is randomly selected
The split feature j is selected uniformly at random from {1, … , 𝑝}
Finally, the selected node is split along the randomly chosen feature at a random location
* Recursive node splits do not depend on the labels 𝑦1, … , 𝑦𝑛
Theorem 2 Assume that the distribution of 𝑋 is supported on [0, 1]𝑑. Then the purely random forest classifier 𝐻∞ is consistent whenever 𝑘 ⟶ ∞ and 𝑘/𝑛 ⟶ 0 as 𝑛 ⟶ ∞.
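A sketch of this purely random splitting scheme, in which the labels are never consulted (the cell representation and the seed are illustrative assumptions):

```python
import random

def purely_random_partition(n_splits, p=2, seed=0):
    # Each cell is a list of (low, high) intervals, one per feature.
    rng = random.Random(seed)
    cells = [[(0.0, 1.0)] * p]
    for _ in range(n_splits):
        cell = cells.pop(rng.randrange(len(cells)))  # node selected at random
        j = rng.randrange(p)                         # feature selected uniformly
        low, high = cell[j]
        t = rng.uniform(low, high)                   # split location at random
        left, right = list(cell), list(cell)
        left[j], right[j] = (low, t), (t, high)
        cells += [left, right]
    return cells

leaves = purely_random_partition(n_splits=3)
```

Each split replaces one cell by two, so n_splits splits always yield n_splits + 1 leaves whose volumes sum to 1.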
![Page 64: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/64.jpg)
Biau et al., 2008
Growing Trees: bag a consistent base classifier, each tree built on a random subsample of ℒ𝑛 governed by the parameter 𝑞𝑛.
Theorem 6 Let {𝑇Λ} be a sequence of classifiers that is consistent for the distribution of (𝑋, 𝑌). Consider the bagging classifiers 𝐻𝑚 and 𝐻∞, using parameter 𝑞𝑛. If 𝑛𝑞𝑛 ⟶ ∞ as 𝑛 ⟶ ∞ then both classifiers are consistent.
![Page 65: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/65.jpg)
Biau, 2012
Growing Trees
At each node, coordinate j is selected with probability 𝑝𝑛𝑗 ∈ (0, 1)
The split is at the midpoint of the chosen side
Theorem 1 Assume that the distribution of 𝑋 has support on [0, 1]𝑑. Then the random forests estimate 𝐻∞(𝒙; ℒ𝑛) is consistent whenever 𝑝𝑛𝑗 log 𝑘𝑛 ⟶ ∞ for all j = 1, …, p and 𝑘𝑛/𝑛 ⟶ 0 as 𝑛 ⟶ ∞.
![Page 66: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/66.jpg)
Biau, 2012
Assume that X is uniformly distributed on [0,1]𝑝
𝒑𝒏𝒋 = (𝟏/𝑺)(𝟏 + 𝝃𝒏𝒋) for 𝒋 ∈ 𝓢
In sparse settings
Estimation Error (variance)
𝔼{[𝐻∞(𝒙; ℒ𝑛) − 𝐻̃∞(𝒙; ℒ𝑛)]²} ≤ 𝐶𝜎² (𝑆²/(𝑆 − 1))^(𝑆/(2𝑝)) · (1 + 𝜉𝑛) 𝑘𝑛 / (𝑛 (log 𝑘𝑛)^(𝑆/(2𝑝)))
If 𝑎 < 𝑝𝑛𝑗 < 𝑏 for some constants 𝑎, 𝑏 ∈ (0,1) then
1 + 𝜉𝑛 ≤ ((𝑆 − 1)/(𝑆²𝑎(1 − 𝑏)))^(𝑆/(2𝑝))
![Page 67: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/67.jpg)
Biau, 2012
Assume that X is uniformly distributed on [0,1]𝑝 and 𝜑𝐵(𝒙𝒮) is 𝐿-Lipschitz on [0,1]𝑆
𝒑𝒏𝒋 = (𝟏/𝑺)(𝟏 + 𝝃𝒏𝒋) for 𝒋 ∈ 𝓢
In sparse settings
Approximation Error (bias²)
𝔼[𝐻∞(𝒙; ℒ𝑛) − 𝜑𝐵(𝒙)]² ≤ 2𝑆𝐿² 𝑘𝑛^(−0.75/(𝑆 log 2)(1+𝛾𝑛)) + [sup𝑥∈[0,1]𝑝 𝜑𝐵²(𝒙)] 𝑒^(−𝑛/(2𝑘𝑛))
where 𝛾𝑛 = min𝑗∈𝒮 𝜉𝑛𝑗 tends to 0 as n tends to infinity.
![Page 68: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/68.jpg)
Finite and infinite RFs [Scornet, 2014]
Finite forest: 𝐻𝑚(𝒙; ℒ𝑛), built from m trees 𝑇(·; Θ𝑖, ℒ𝑛)
Infinite forest: 𝐻∞(𝒙; ℒ𝑛) = 𝔼Θ[𝑇(𝒙; Θ, ℒ𝑛)]
Theorem 3.1 Conditionally on ℒ𝑛, almost surely, for all 𝒙 ∈ [0, 1]𝑝, we have 𝐻𝑚(𝒙; ℒ𝑛) ⟶𝑚→∞ 𝐻∞(𝒙; ℒ𝑛).
![Page 69: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/69.jpg)
Finite and infinite RFs [Scornet, 2014]
Assumption H: 𝑌 = 𝑚(𝑋) + 𝜀, where 𝜀 is a centered Gaussian noise with finite variance 𝜎², independent of 𝑋, and ‖𝑚‖∞ = sup𝑥∈[0,1]𝑝 |𝑚(𝑥)| < ∞.
Theorem 3.3 Assume H is satisfied. Then, for all m, 𝑛 ∈ ℕ∗,
𝐸𝑟𝑟(𝐻𝑚(𝒙; ℒ𝑛)) = 𝐸𝑟𝑟(𝐻∞(𝒙; ℒ𝑛)) + (1/𝑚) 𝔼𝑋,ℒ𝑛[𝕍Θ[𝑇Λ(𝒙; Θ, ℒ𝑛)]]
In particular, 0 ≤ 𝐸𝑟𝑟(𝐻𝑚) − 𝐸𝑟𝑟(𝐻∞) ≤ (8/𝑚)(‖𝑚‖∞² + 𝜎²(1 + 4 log 𝑛)),
so if 𝑚 ≥ 8(‖𝑚‖∞² + 𝜎²)/𝜀 + 32𝜎² log 𝑛/𝜀 then 𝐸𝑟𝑟(𝐻𝑚) − 𝐸𝑟𝑟(𝐻∞) ≤ 𝜀.
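Reading the bound Err(H𝑚) − Err(H∞) ≤ (8/m)(‖m‖∞² + σ²(1 + 4 log n)) the other way around gives how many trees suffice for a target gap ε; a direct transcription (the input values in the test are arbitrary illustrations):

```python
import math

def trees_needed(m_sup, sigma2, n, eps):
    # Smallest m with (8 / m) * (m_sup**2 + sigma2 * (1 + 4 * log n)) <= eps,
    # where m_sup plays the role of the sup-norm of the regression function.
    return math.ceil(8 * (m_sup ** 2 + sigma2 * (1 + 4 * math.log(n))) / eps)
```

Note the logarithmic growth in n: the number of trees required only grows like log n for a fixed gap ε.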
![Page 70: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/70.jpg)
RF and Additive regression model [Scornet et al., 2015]
Growing Trees (points sampled without replacement)
Assume that 𝐴 is the selected node and 𝐴 contains more than one data point
Select uniformly, without replacement, a subset ℳ𝑡𝑟𝑦 ⊂ {1, … , 𝑝} with |ℳ𝑡𝑟𝑦| = 𝑚𝑡𝑟𝑦
Select the best split in A by optimizing the CART-split criterion along the coordinates in ℳ𝑡𝑟𝑦
Cut the cell 𝐴 according to the best split; call 𝐴𝐿 and 𝐴𝑅 the two resulting cells
Replace 𝐴 by 𝐴𝐿 and 𝐴𝑅 in the list of cells and repeat
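The mtry step above can be sketched as follows (squared-error CART criterion; the toy data and seed are illustrative assumptions):

```python
import numpy as np

def cart_best_split(X, y, mtry, rng):
    # Draw mtry coordinates without replacement, then optimize the
    # CART squared-error criterion along those coordinates only.
    features = rng.choice(X.shape[1], size=mtry, replace=False)
    best = (np.inf, None, None)  # (criterion, feature, threshold)
    for j in features:
        for thr in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= thr
            crit = (np.sum((y[left] - y[left].mean()) ** 2)
                    + np.sum((y[~left] - y[~left].mean()) ** 2))
            if crit < best[0]:
                best = (float(crit), int(j), float(thr))
    return best

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])  # depends on feature 0 only
crit, j, thr = cart_best_split(X, y, mtry=2, rng=np.random.default_rng(0))
```

With mtry = 2 both coordinates are candidates, and the criterion correctly picks the informative feature 0.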
![Page 71: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/71.jpg)
RF and Additive regression model [Scornet et al., 2015]
Additive model: 𝑌 = Σ𝑗=1..𝑝 𝑚𝑗(𝑋(𝑗)) + 𝜀
Assumption H1
Theorem 3.1 Assume that (H1) is satisfied. Then, provided 𝑛 ⟶ ∞ and 𝑡𝑛(log 𝑎𝑛)⁹/𝑎𝑛 ⟶ 0, the random forest estimate is consistent.
Theorem 3.2 Assume that (H1) and (H2) are satisfied and let 𝑡𝑛 = 𝑎𝑛. Then, provided 𝑎𝑛 ⟶ ∞, 𝑡𝑛 ⟶ ∞ and 𝑎𝑛 log 𝑛/𝑛 ⟶ 0, the random forest estimate is consistent.
![Page 72: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/72.jpg)
RF and Additive regression model [Scornet et al., 2015]
Denote by 𝑗1,𝑛(𝑋), … , 𝑗𝑘,𝑛(𝑋) the first cut directions used to construct the cell containing 𝑋, with 𝑗𝑞,𝑛(𝑋) = ∞ if the cell has been cut strictly less than q times.
Theorem 3.2 Assume that (H1) is satisfied. Let k ∈ ℕ∗ and 𝜉 > 0. Assume that there is no interval [𝑎, 𝑏] and no 𝑗 ∈ {1, … , 𝑆} such that 𝑚𝑗 is constant on [𝑎, 𝑏]. Then, with probability 1 − 𝜉, for all 𝑛 large enough, we have, for all 1 ≤ 𝑞 ≤ 𝑘, 𝑗𝑞,𝑛(𝑋) ∈ {1, … , 𝑆}.
![Page 73: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/73.jpg)
[Wager, 2015]
A partition Λ is (𝛼, 𝑘)-valid if it can be generated by a recursive partitioning scheme in which each child node contains at least a fraction 𝛼 of the data points in its parent node, for some 0 < 𝛼 < 0.5, and each terminal node contains at least 𝑘 training examples, for some k ∈ ℕ.
Given a dataset 𝑋, let 𝒱𝛼,𝑘(𝑋) denote the set of (𝛼, 𝑘)-valid partitions.
𝑇Λ: [0,1]𝑝 → ℝ, 𝑇Λ(𝒙) = (1/|{𝑖: 𝒙𝑖 ∈ 𝐿(𝒙)}|) Σ𝑖: 𝒙𝑖∈𝐿(𝒙) 𝑦𝑖 (called a valid tree)
𝑇Λ∗: [0,1]𝑝 → ℝ, 𝑇Λ∗(𝒙) = 𝔼[𝑌 | 𝑋 ∈ 𝐿(𝒙)] (called the partition-optimal tree)
Question: can we treat 𝑇Λ as a good approximation to the partition-optimal tree 𝑇Λ∗ supported on the same partition Λ?
![Page 74: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/74.jpg)
Given a learning set ℒ𝑛 of [0,1]𝑝 × [−𝑀/2, 𝑀/2] with 𝑋 ~ 𝑈([0,1]𝑝)
[Wager, 2015]
Theorem 1 Given parameters n, p, k such that
lim𝑛→∞ (log 𝑛 · log 𝑝)/𝑘 = 0 and 𝑝 = Ω(𝑛),
then
lim𝑛,𝑝,𝑘→∞ ℙ[ sup𝑥∈[0,1]𝑝, Λ∈𝒱𝛼,𝑘 |𝑇Λ − 𝑇Λ∗| ≤ 6𝑀 √(log 𝑛 · log 𝑝 / (𝑘 log((1 − 𝛼)⁻¹))) ] = 1
![Page 75: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/75.jpg)
[Wager, 2015]
Growing Trees (guess-and-check)
Select a currently un-split node 𝐴 containing at least 2k training examples
Pick a candidate splitting variable 𝑗 ∈ {1, … , 𝑝} uniformly at random
Pick the splitting point 𝜃̂ minimizing the squared error ℓ(𝜃̂)
If either there has already been a successful split along variable j for some other node, or
ℓ(𝜃̂) ≥ 36𝑀² log 𝑛 · log 𝑝 / (𝑘 log((1 − 𝛼)⁻¹)),
the split succeeds and we cut the node 𝐴 at 𝜃̂ along the j-th variable; if not, we do not split the node 𝐴 this time.
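The acceptance threshold in the last step is a direct formula; a transcription of it (the parameter values in the test are arbitrary illustrations):

```python
import math

def split_threshold(M, n, p, k, alpha):
    # Threshold 36 M^2 log(n) log(p) / (k log(1 / (1 - alpha))) from the slides.
    return 36 * M ** 2 * math.log(n) * math.log(p) / (k * math.log(1 / (1 - alpha)))
```

Larger minimum leaf sizes k lower the threshold, so splits are accepted more readily when every leaf is guaranteed plenty of data.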
![Page 76: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/76.jpg)
[Wager, 2015] In sparse settings
Assumption H1: there is a set 𝒮 ⊂ {1, … , 𝑝} of signal features and a set of sign variables 𝜎𝑗 ∈ {±1} such that, for all 𝑗 ∈ 𝒮 and all 𝑥 ∈ [0,1]𝑝,
𝔼[𝑌 | 𝑋(−𝑗) = 𝑥(−𝑗), 𝑋(𝑗) > 1/2] − 𝔼[𝑌 | 𝑋(−𝑗) = 𝑥(−𝑗), 𝑋(𝑗) ≤ 1/2] ≥ 𝛽𝜎𝑗
Assumption H2: the conditional mean 𝔼[𝑌 | 𝑋 = 𝒙] is Lipschitz-continuous.
![Page 77: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/77.jpg)
[Wager, 2015] In sparse settings (assumptions H1 and H2 as on the previous slide)
Theorem 2 Under the conditions of Theorem 1, suppose that the assumptions of the sparse setting hold; then the guess-and-check forest is consistent.
![Page 78: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/78.jpg)
[Wager, 2015]
𝐻: [0,1]𝑝 → ℝ, 𝐻(𝒙) = (1/𝐵) Σ𝑏=1..𝐵 𝑇Λ𝑏(𝒙) over valid partitions Λ1, … , Λ𝐵 (called a valid forest)
𝐻∗: [0,1]𝑝 → ℝ, 𝐻∗(𝒙) = (1/𝐵) Σ𝑏=1..𝐵 𝑇Λ𝑏∗(𝒙) (called the partition-optimal forest)
Theorem 4
lim𝑛,𝑝,𝑘→∞ ℙ[ sup𝐻∈ℋ𝛼,𝑘 |(1/𝑛) Σ𝑖 (𝑦𝑖 − 𝐻(𝑥𝑖))² − 𝔼[(𝑌 − 𝐻(𝑋))²]| ≤ 11𝑀² √(log 𝑛 · log 𝑝 / (𝑘 log((1 − 𝛼)⁻¹))) ] = 1
![Page 79: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/79.jpg)
References
B. Efron. Estimation and accuracy after model selection. Journal of the American Statistical Association, 2013.
B. Lakshminarayanan et al. Mondrian forests: Efficient online random forests. arXiv:1406.2673, 2014.
B. Xu et al. Classifying very high-dimensional data with random forests built from small subspaces. International Journal of Data Warehousing and Mining (IJDWM) 8(2), 37:44–63, 2012.
D. Amaratunga et al. Enriched random forests. Bioinformatics, 24(18):2010–2014, 2008. doi:10.1093/bioinformatics/btn356
E. Scornet. On the asymptotics of random forests. arXiv:1409.2090, 2014
E. Scornet et al. Consistency of random forests. The Annals of Statistics, 43(4):1716–1741, 2015. doi:10.1214/15-AOS1321. http://projecteuclid.org/euclid.aos/1434546220.
G. Biau et al. Consistency of random forests and other averaging classifiers. Journal of Machine Learning Research, 9:2015–2033, 2008.
G. Biau. Analysis of a random forests model. Journal of Machine Learning Research, 13:1063–1095, 2012.
H. Deng et al. Feature Selection via Regularized Trees, The 2012 International Joint Conference on Neural Networks (IJCNN), IEEE, 2012.
H. Deng et al. Gene Selection with Guided Regularized Random Forest , Pattern Recognition, 46.12 (2013): 3483-3489
H. Ishwaran et al. Random survival forest. The Annals of Applied Statistics, 2:841–860, 2008.
L. Breiman. Bagging predictors. Technical Report No. 421, Statistics Department, UC Berkeley, 1994.
![Page 80: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/80.jpg)
References
L. Breiman. Randomizing outputs to increase prediction accuracy. Technical Report 518, Statistics Department, UC Berkeley, 1998.
L. Breiman. Some infinite theory for predictor ensembles. Technical Report 577, Statistics Department, UC Berkeley, 2000.
L. Breiman. Random forests. Machine Learning, 45:5–32, 2001.
M. Denil et al. Consistency of online random forests. International Conference on Machine Learning (ICML) 2013.
N. Meinshausen. Quantile regression forests. Journal of Machine Learning Research, 7:983–999, 2006.
P. Geurts et al. Extremely randomized trees. Machine Learning, 63(1):3-42, 2006.
Q. Wu et al. Snp selection and classification of genome-wide snp data using stratified sampling random forests. NanoBioscience, IEEE Transactions on 11(3): 216–227, 2012.
T. Dietterich. An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting and Randomization, Machine Learning 1-22, 1998
T.K. Ho. Random decision forests, Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on , 1:278-282, 14-16 Aug 1995, doi: 10.1109/ICDAR.1995.598994
T.K. Ho. The random subspace method for constructing decision forests, IEEE Trans. on Pattern Analysis and Machine Intelligence, 20(8):832-844, 1998
Saïp Ciss. Random Uniform Forests. 2015. <hal-01104340v2>
S. Clémençon et al. Ranking forests. Journal of Machine Learning Research, 14:39–73, 2013.
![Page 81: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/81.jpg)
References
S. Wager. Asymptotic theory for random forests. arXiv:1405.0352, 2014.
S. Wager. Uniform convergence of random forests via adaptive concentration. arXiv:1503.06388, 2015
Y. Lin and Y. Jeon. Random forests and adaptive nearest neighbors. Journal of the American Statistical Association, 101:578–590, 2006.
V. N. Vapnik. An overview of Statistical Learning Theory. IEEE Trans. on Neural Networks, 10(5):988-999, 1999.
![Page 82: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/82.jpg)
RF and Additive regression model [Scornet et al., 2015]
Define the indicator that 𝑋′ falls in the same cell as 𝑋 in the random tree designed with ℒn and the random parameter Θ, where 𝑋′ is an independent copy of 𝑋, conditionally on 𝑋′, 𝑋1, … , 𝑋𝑛.
![Page 83: Conistency of random forests](https://reader031.vdocuments.net/reader031/viewer/2022030214/588b21e51a28abed688b4f4d/html5/thumbnails/83.jpg)
RF and Additive regression model [Scornet et al., 2015]