![Page 1: Quantile Regression for Extraordinarily Large Datafaculty.missouri.edu/~chaosh/doc/pubs/BDQR_Chao.pdf · 2018-08-16 · Quantile regressionTwo-step algorithmOracle rulesSimulationReferences](https://reader036.vdocuments.net/reader036/viewer/2022081517/5f029c727e708231d405204b/html5/thumbnails/1.jpg)
Quantile regression Two-step algorithm Oracle rules Simulation References
Quantile Regression for Extraordinarily Large Data
Shih-Kang Chao
Department of Statistics
Purdue University
November 2016
Joint work with Stanislav Volgushev and Guang Cheng
With the advance of technology, it is increasingly common that a data set is so large that it cannot be stored on a single machine
Social media (views, likes, comments, images...)
Meteorological and environmental surveillance
Transactions in e-commerce
Others...
Figure: A server room in Council Bluffs, Iowa. Photo: Google/Connie Zhou.
Divide and Conquer (D&C)
To take advantage of the opportunities in massive data, we need to deal with storage (disk) and computational (memory, processor) bottlenecks.
Divide and conquer paradigm: randomly divide the N samples into S groups of size n = N/S
Problem(N)
subproblem(n)
subproblem(n)
subproblem(n)
subproblem(n)
Aggregate solutions from subproblems to get a solution for the original problem
Can be implemented on computational platforms such as Hadoop (White, 2012)
Data that are stored in distributed locations can be analyzed in a similar manner
Problem(N)
subproblem(n)
subproblem(n)
subproblem(n)
subproblem(n)
Is D&C suitable for statistical analysis?
Sometimes it does, but sometimes it doesn’t...
In the following, we give two simple examples.
Example 1: sample mean
Problem(N)
subproblem(n)
subproblem(n)
subproblem(n)
subproblem(n)
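Subsample means: $\bar X_{n,1}$, $\bar X_{n,2}$, $\bar X_{n,3}$, $\bar X_{n,4}$
$$\frac{1}{4}\sum_{s=1}^{4} \bar X_{n,s} = \frac{1}{4n}\sum_{s=1}^{4}\sum_{i=1}^{n} X_{is} = \frac{1}{N}\sum_{i=1}^{N} X_i = \bar X_N.$$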
It fits!
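The identity for the sample mean can be checked numerically. The sketch below is only an illustration: the sample size, the number of groups, the seed and the N(0,1) data are arbitrary choices, not values from the talk.

```python
import random

def mean(values):
    return sum(values) / len(values)

rng = random.Random(0)           # fixed seed so the run is reproducible
S, n = 4, 250                    # S groups of equal size n, so N = S * n
data = [rng.gauss(0.0, 1.0) for _ in range(S * n)]

# Split into S consecutive blocks; the draws are i.i.d., so this acts as a random split.
groups = [data[s * n:(s + 1) * n] for s in range(S)]

dc_mean = mean([mean(g) for g in groups])   # average of the S subsample means
full_mean = mean(data)                      # mean of all N observations

print(abs(dc_mean - full_mean))  # zero up to floating-point rounding
```

The equality relies on all groups having the same size n; with unequal group sizes the subsample means would need to be weighted.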
Example 2: sample median
Problem(N)
subproblem(n)
subproblem(n)
subproblem(n)
subproblem(n)
Subsample medians: $X^1_{(0.5)}$, $X^2_{(0.5)}$, $X^3_{(0.5)}$, $X^4_{(0.5)}$
$X^s_{(0.5)}$ = the middle value of the ordered n samples in group s;
$X_{(0.5)}$ = overall median
$$\frac{1}{4}\sum_{s=1}^{4} X^s_{(0.5)} \overset{??}{=} X_{(0.5)}$$
Example 2: sample median
Simulation 1: $X_i \sim N(0,1)$; Simulation 2: $X_i \sim \text{Exponential}(1)$. $N = 2^{15}$. True median vs. simulated $S^{-1}\sum_{s=1}^{S} X^s_{(0.5)}$
Figure: averaged subsample median against $\log_N(S)$, two panels: τ = 0.5, N(0,1) and τ = 0.5, EXP(1).
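A rough way to see why averaging medians can fail for a skewed distribution is to make the subsamples very small. The sketch below uses groups of size n = 3 from Exponential(1). This is a deliberately extreme setting chosen for illustration and is not the design of the simulation in the talk. For n = 3 the expected sample median of Exponential(1) is 1/2 + 1/3 = 5/6, while the true median is log 2.

```python
import math
import random
import statistics

rng = random.Random(0)
n, S = 3, 10_000                 # tiny subsamples make the bias easy to see
data = [rng.expovariate(1.0) for _ in range(n * S)]   # Exponential(1) draws

groups = [data[s * n:(s + 1) * n] for s in range(S)]
avg_of_medians = statistics.fmean([statistics.median(g) for g in groups])
overall_median = statistics.median(data)
true_median = math.log(2.0)      # median of Exponential(1)

print(f"true median        {true_median:.3f}")
print(f"overall median     {overall_median:.3f}")
print(f"average of medians {avg_of_medians:.3f}")
```

The overall median tracks log 2, whereas the average of subsample medians settles near the biased value: averaging reduces variance but cannot remove the bias of each subsample median.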
Major issues
When does the D&C algorithm work?
Especially for skewed and heavy-tailed distributions
Statistical inference
Asymptotic distribution
Inference for the "whole" distribution?
Take advantage of the massive size to discover subtle patterns hidden in the "whole" distribution
Outline
1 Quantile regression
2 Two-step algorithm: D&C and quantile projection
3 Oracle rules: linear model and nonparametric model
4 Simulation
Quantile
Response $Y$, predictors $X$. For $\tau \in (0,1)$, the conditional quantile curve $Q(\cdot;\tau)$ of $Y \in \mathbb{R}$ given $X$ is defined through
$$P(Y \le Q(X;\tau) \mid X = x) = \tau \quad \forall x.$$
$Q(x;\tau)$ at $\tau = 0.1, 0.25, 0.5, 0.75, 0.9$ under different models
Figure: scatter plots of y against x on [0, 1] with the quantile curves, three panels: Y = 0.5*X + N(0,1); Y = X + (.5+X)*N(0,1); Y|X=x ~ Beta(1.5−x, .5+x).
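For the location-scale panel, Y = X + (.5+X)*N(0,1), the quantile curves have a closed form, because the scale factor 0.5 + x is positive on [0, 1]: Q(x; τ) = x + (0.5 + x) Φ⁻¹(τ). A minimal sketch evaluating these curves follows; the function name and the evaluation points are illustrative choices.

```python
from statistics import NormalDist

def q_location_scale(x, tau):
    """Conditional tau-quantile of Y = X + (0.5 + X) * N(0, 1) at X = x.

    Valid for x >= -0.5, where the scale factor 0.5 + x is non-negative.
    """
    return x + (0.5 + x) * NormalDist().inv_cdf(tau)

taus = [0.1, 0.25, 0.5, 0.75, 0.9]
for x in (0.0, 0.5, 1.0):
    print(x, [round(q_location_scale(x, t), 3) for t in taus])
```

The curves fan out as x grows, which is the heteroscedastic pattern a single mean regression line would not reveal.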
Quantile regression vs. mean regression
Mean regression: $Y_i = m(X_i) + \varepsilon_i$, $E[\varepsilon \mid X = x] = 0$
$m$: regression function, the object of interest.
$\varepsilon_i$: "errors".
Quantile regression: $P(Y \le Q(x;\tau) \mid X = x) = \tau$
No strict distinction between "signal" and "noise".
Object of interest: properties of the conditional distribution of $Y \mid X = x$.
Contains much richer information than just the conditional mean.
Figure: Bone Mass Density (BMD) against Age (ages 10 to 25), two panels.
Quantile curves vs. conditional distribution
$$Q(x_0;\tau) = F^{-1}_{Y|X}(\tau \mid x_0),$$
where $F_{Y|X}(y \mid x)$ is the conditional distribution function of $Y$ given $X$
Figure, two panels. Left: BMD quantile curves $\tau \mapsto Q(x_0;\tau)$ for Age $x_0 = 13, 18, 23$. Right: the conditional CDFs $F(y \mid x_0)$ against BMD for Age $x_0 = 13, 18, 23$.
Quantile Regression as Optimization
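$\{(X_i, Y_i)\}_{i=1}^{N}$: independent and identically distributed samples in $\mathbb{R}^{d+1}$
Koenker and Bassett (1978): if $Q(x;\tau) = \beta(\tau)^\top x$, estimate by
$$\hat\beta_{or}(\tau) := \arg\min_{b} \sum_{i=1}^{N} \rho_\tau(Y_i - b^\top X_i) \qquad (1.1)$$
where $\rho_\tau(u) := \tau u^+ + (1-\tau)u^-$ is the "check function", with $u^+ = \max(u, 0)$ and $u^- = \max(-u, 0)$.
Optimization problem (1.1) is convex (but non-smooth) and can be solved by linear programming
$\hat Q(x_0;\tau) := x_0^\top \hat\beta_{or}(\tau)$ for any $x_0$
More generally, we can consider a series approximation model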
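To make the check function concrete, the sketch below treats the intercept-only special case of (1.1), where every $X_i$ equals 1 and b is a scalar. It does not use linear programming: because the objective is convex and piecewise linear with kinks at the observations, scanning the observations is enough in this one-dimensional case. The function names and toy data are illustrative.

```python
def check_loss(u, tau):
    """rho_tau(u) = tau * u^+ + (1 - tau) * u^-."""
    return tau * max(u, 0.0) + (1.0 - tau) * max(-u, 0.0)

def total_loss(b, ys, tau):
    return sum(check_loss(y - b, tau) for y in ys)

def constant_fit(ys, tau):
    """Minimise the summed check loss over constants b.

    The objective is convex and piecewise linear with kinks at the data,
    so some observation is always a minimiser; scanning them suffices.
    """
    return min(ys, key=lambda b: total_loss(b, ys, tau))

ys = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(constant_fit(ys, 0.5), constant_fit(ys, 0.25))
```

The minimiser is a sample τ-quantile, which is why (1.1) with covariates is read as an estimator of the conditional quantile.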
A unified framework:
$$Q(x;\tau) \approx Z(x)^\top \beta(\tau)$$
$m := \dim(Z)$ (it is possible that $m \to \infty$). Solve
$$\hat\beta_{or}(\tau) := \arg\min_{b} \sum_{i=1}^{N} \rho_\tau\{Y_i - b^\top Z(X_i)\} \qquad (1.2)$$
Examples of $Z(x)$: linear model with fixed or increasing dimension, B-splines, polynomials, trigonometric polynomials
$\hat Q(x;\tau) := Z(x)^\top \hat\beta_{or}(\tau)$
Need to control the "bias" $Q(x;\tau) - Z(x)^\top\beta(\tau)$
Summary for quantile regression
$\hat\beta_{or}(\tau)$ is computationally infeasible when $N$ is so large that the data cannot be handled by a single machine
To infer the "whole" conditional distribution at a fixed $x_0$, use
$$F_{Y|X}(y \mid x_0) = \int_0^1 \mathbf{1}\{Q(x_0;\tau) < y\}\, d\tau,$$
where $\mathbf{1}(A) = 1$ if $A$ is true and $0$ otherwise. To approximate the integral, we need to compute $Q(x_0;\tau)$ for many $\tau$
D&C algorithm at fixed τ
Problem(N)
subproblem(n)
subproblem(n)
subproblem(n)
subproblem(n)
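Subsample estimates: $\hat\beta^1(\tau_1)$, $\hat\beta^2(\tau_1)$, $\hat\beta^3(\tau_1)$, $\hat\beta^4(\tau_1)$
$$\hat\beta^s(\tau) := \arg\min_{b \in \mathbb{R}^m} \sum_{i=1}^{n} \rho_\tau\{Y_{is} - b^\top Z(X_{is})\} \qquad (2.1)$$
$$\bar\beta(\tau) := \frac{1}{S}\sum_{s=1}^{S} \hat\beta^s(\tau). \qquad (2.2)$$
However, this is only for a fixed $\tau$!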
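The D&C step (2.1)-(2.2) can be sketched in a few lines for the simplest possible model, an intercept only (Z(x) ≡ 1, m = 1), where the subsample minimiser of the check loss is an order statistic. This is an assumption made to keep the example dependency-free. With covariates, the `fit` argument would be replaced by a quantile regression solver. All names, the data and the seed are illustrative.

```python
import math
import random

def sample_quantile(ys, tau):
    """A minimiser of sum_i rho_tau(y_i - b): the ceil(tau * n)-th order statistic."""
    ordered = sorted(ys)
    k = max(1, math.ceil(tau * len(ordered)))
    return ordered[k - 1]

def dc_estimate(data, S, tau, fit=sample_quantile):
    """Average the S subsample estimates, mirroring (2.1)-(2.2)."""
    n = len(data) // S                      # subsample size; assumes S divides N
    estimates = [fit(data[s * n:(s + 1) * n], tau) for s in range(S)]
    return sum(estimates) / S

rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
for tau in (0.1, 0.5, 0.9):
    print(tau, round(dc_estimate(data, S=20, tau=tau), 3))
```

Each call handles a single quantile level, so covering many levels this way means repeating the whole D&C pass for every τ.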
Quantile projection algorithm
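To avoid applying D&C repeatedly, take $B := (B_1, ..., B_q)^\top$, a B-spline basis defined on equidistant knots $\{t_1, ..., t_G\} \subset \mathcal{T}$ with degree $r_\tau \in \mathbb{N}$, and set
$$\hat\beta(\tau) := \hat\Xi^\top B(\tau). \qquad (2.3)$$
Computation of $\hat\Xi$:
1 Define a grid of quantile levels $\{\tau_1, ..., \tau_K\}$ on $[\tau_L, \tau_U]$, $K > q$. For each $\tau_k$, compute the D&C estimate $\bar\beta(\tau_k)$
2 Compute for $j = 1, ..., m$
$$\hat\alpha_j := \arg\min_{\alpha \in \mathbb{R}^q} \sum_{k=1}^{K} \big(\bar\beta_j(\tau_k) - \alpha^\top B(\tau_k)\big)^2. \qquad (2.4)$$
3 Set the matrix $\hat\Xi := [\hat\alpha_1\ \hat\alpha_2\ ...\ \hat\alpha_m]$.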
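The three steps above can be sketched for a single coordinate j. The sketch makes several simplifying assumptions that are not from the talk: it uses degree-1 B-splines (hat functions) with q equal to the number of knots, it replaces the D&C estimates at the grid points by a toy curve that is linear in τ, and it solves the least-squares problem (2.4) through the normal equations with a small hand-written solver.

```python
def hat_basis(tau, knots):
    """Degree-1 B-splines (hat functions) on equidistant knots."""
    h = knots[1] - knots[0]
    return [max(0.0, 1.0 - abs(tau - t) / h) for t in knots]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def project(taus, beta_j, knots):
    """Least-squares coefficients for one coordinate, as in (2.4)."""
    B = [hat_basis(t, knots) for t in taus]            # K x q design matrix
    q = len(knots)
    BtB = [[sum(row[a] * row[c] for row in B) for c in range(q)] for a in range(q)]
    Bty = [sum(row[a] * y for row, y in zip(B, beta_j)) for a in range(q)]
    return solve(BtB, Bty)

tau_L, tau_U, q, K = 0.1, 0.9, 5, 21                   # K > q, as required
knots = [tau_L + i * (tau_U - tau_L) / (q - 1) for i in range(q)]
taus = [tau_L + k * (tau_U - tau_L) / (K - 1) for k in range(K)]
beta_j = [1.0 + 2.0 * t for t in taus]     # toy stand-in for the D&C estimates
alpha_j = project(taus, beta_j, knots)

def beta_hat_j(tau):
    """Projected coefficient curve: alpha_j' B(tau), one row of (2.3)."""
    return sum(a * b for a, b in zip(alpha_j, hat_basis(tau, knots)))

print(beta_hat_j(0.37))
```

Once `alpha_j` is stored, the coefficient curve can be evaluated at any τ in [τ_L, τ_U] without touching the raw data again. Because the toy curve is linear in τ, the hat-function basis reproduces it exactly. A higher-degree basis would be needed to match the smoothness conditions of the theory.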
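Computation of $\hat F_{Y|X}(y \mid x)$
Define $\hat Q(x;\tau) := Z(x)^\top\hat\beta(\tau) = Z(x)^\top\hat\Xi^\top B(\tau)$.
Compute
$$\hat F_{Y|X}(y \mid x_0) := \tau_L + \int_{\tau_L}^{\tau_U} \mathbf{1}\{\hat Q(x_0;\tau) < y\}\, d\tau, \qquad (2.5)$$
where $0 < \tau_L < \tau_U < 1$.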
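The integral in (2.5) is one-dimensional and can be approximated on a grid of quantile levels. The sketch below uses a midpoint rule, which is one possible choice and not necessarily the one used by the authors. The function name, the grid size and the default τ_L, τ_U are illustrative.

```python
def cdf_from_quantiles(y, quantile_fn, tau_L=0.1, tau_U=0.9, grid_size=1000):
    """Approximate tau_L + integral over [tau_L, tau_U] of 1{Q(tau) < y} by a midpoint rule."""
    h = (tau_U - tau_L) / grid_size
    hits = sum(1 for i in range(grid_size)
               if quantile_fn(tau_L + (i + 0.5) * h) < y)
    return tau_L + hits * h

# Toy check: for Y ~ Uniform(0, 1) the quantile function is Q(tau) = tau,
# so the formula should return F(y) = y for y between tau_L and tau_U.
print(cdf_from_quantiles(0.3, lambda tau: tau))
```

In practice `quantile_fn` would be the fitted curve τ ↦ Z(x₀)ᵀβ̂(τ). The estimate is only meaningful for y between the fitted quantiles at τ_L and τ_U.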
Remarks
Both $S$ and $K$ can grow with $N$, and they determine the computational limit as well as the statistical performance
Find $S$ and $K$ such that the D&C estimator $\bar\beta(\tau)$ and the projection estimator $\hat\beta(\tau)$ are "close" to $\hat\beta_{or}(\tau)$ in some statistical sense
This results in a "sharp" upper bound on $S$ and lower bound on $K$, which impose the computational limit
The two-step procedure requires only one pass through the entire data
Quantile projection requires only $\{\bar\beta(\tau_1), ..., \bar\beta(\tau_K)\}$ of size $m \times K$, without the need to access the raw data set
Oracle rules
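For $\mathcal{T} = [\tau_L, \tau_U] \subset (0,1)$, under regularity conditions (Chao, Volgushev and Cheng, 2016):
$$a_N u^\top\big(\hat\beta_{or}(\tau) - \beta(\tau)\big) \rightsquigarrow \mathcal{N} \quad \text{at any fixed } \tau \in \mathcal{T} \qquad (3.1)$$
$$a_N u^\top\big(\hat\beta_{or}(\tau) - \beta(\tau)\big) \rightsquigarrow \mathbb{G}(\tau) \quad \text{as a process in } \tau \in \mathcal{T} \qquad (3.2)$$
where $\mathcal{N}$ is a centered normal distribution and $\mathbb{G}$ is a centered Gaussian process
The oracle rule holds for $\bar\beta(\tau)$ and $\hat\beta(\tau)$ if $\bar\beta(\tau)$ satisfies (3.1) and $\hat\beta(\tau)$ satisfies (3.2)
Inference follows from the oracle rule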
Two leading models
Linear model: $m = \dim(Z(x))$ is fixed, and $Q(x;\tau) = Z(x)^\top\beta(\tau)$;
Univariate spline nonparametric model: $m = \dim(Z(x)) \to \infty$ with $N$, and $c_N(\gamma_N) := |Q(x;\tau) - Z(x)^\top\gamma_N(\tau)| \neq 0$, where
$$\gamma_N(\tau) := \arg\min_{\gamma \in \mathbb{R}^m} E\big[(Z^\top\gamma - Q(X;\tau))^2 f_{Y|X}(Q(X;\tau) \mid X)\big]. \qquad (3.3)$$
$\gamma_N$ can be defined in many other ways. We do not go into detail here
Conditions imposed throughout the talk: Assumption (A)
$J_m(\tau) := E[ZZ^\top f_{Y|X}(Q(X;\tau) \mid X)]$
Linear model with fixed dimension
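$\mathcal{P}_1(\xi_m, M, \bar f, \bar{f'}, f_{min})$: all distributions satisfying (A1)-(A3) with some constants $0 < \xi_m, M, \bar f, \bar{f'} < \infty$ and $f_{min} > 0$
Theorem 3.1 (Oracle rule for $\bar\beta(\tau)$)
Suppose that $S = o(N^{1/2}(\log N)^{-2})$. Then for all $\tau \in \mathcal{T}$ and $u \in \mathbb{R}^m$,
$$\frac{\sqrt{N}\, u^\top(\bar\beta(\tau) - \beta(\tau))}{\big(u^\top J_m(\tau)^{-1} E[Z(X)Z(X)^\top] J_m(\tau)^{-1} u\big)^{1/2}} \rightsquigarrow \mathcal{N}\big(0, \tau(1-\tau)\big).$$
If $S \gtrsim N^{1/2}$, then the weak convergence result above fails for some $P \in \mathcal{P}_1(1, d, \bar f, \bar{f'}, f_{min})$.
$S = o(N^{1/2}(\log N)^{-2})$ is sharp: it misses only by a factor of $(\log N)^{-2}$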
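To get a feel for how close the sufficient and necessary rates are, the sketch below evaluates $N^{1/2}(\log N)^{-2}$ and $N^{1/2}$ on the $\log_N(S)$ scale. The results are asymptotic statements about growth rates, so the numbers are rate-scale illustrations with all constants ignored, not admissible values of S. The natural logarithm and the two sample sizes are assumptions of this sketch.

```python
import math

def s_rates(N):
    """Rate-scale values of the two thresholds for S in Theorem 3.1 (constants ignored)."""
    sufficient = math.sqrt(N) / math.log(N) ** 2   # N^{1/2} (log N)^{-2}
    necessary = math.sqrt(N)                       # N^{1/2}
    return sufficient, necessary

for exponent in (14, 22):
    N = 2 ** exponent
    suff, nec = s_rates(N)
    # Express both thresholds as powers of N, i.e. on the log_N(S) scale.
    print(f"N = 2^{exponent}: log_N of thresholds = "
          f"{math.log(suff, N):.3f}, {math.log(nec, N):.3f}")
```

On this scale the necessary rate always sits at 0.5, and the logarithmic factor pulls the sufficient rate below it by an amount that shrinks slowly as N grows.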
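$\Lambda_c^{\eta_\tau}(\mathcal{T})$: Hölder class with smoothness $\eta_\tau > 0$.
$\ell^\infty(\mathcal{T})$: set of all uniformly bounded real functions on $\mathcal{T}$
Theorem 3.2 (Oracle rule for $\hat\beta(\cdot)$)
Suppose that $\tau \mapsto Q(x_0;\tau) \in \Lambda_c^{\eta_\tau}(\mathcal{T})$ for an $x_0 \in \mathcal{X}$.
If $S = o(N^{1/2}(\log N)^{-2})$ and $K \gg G \gg N^{1/(2\eta_\tau)}$ with $r_\tau \ge \eta_\tau$, then the projection estimator $\hat\beta(\tau)$ defined in (2.3) satisfies
$$\sqrt{N}\big(Z(x_0)^\top\hat\beta(\cdot) - Q(x_0;\cdot)\big) \rightsquigarrow \mathbb{G}(\cdot) \ \text{in } \ell^\infty(\mathcal{T}), \qquad (3.4)$$
where $\mathbb{G}$ is a centered Gaussian process.
If $G \lesssim N^{1/(2\eta_\tau)}$, the weak convergence in (3.4) fails for some $P \in \mathcal{P}_1(1, d, \bar f, \bar{f'}, f_{min})$ with $\tau \mapsto Q(x_0;\tau) \in \Lambda_c^{\eta_\tau}(\mathcal{T})$.
$K \gg N^{1/(2\eta_\tau)}$ is sharp!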
Splines nonparametric model
We require an additional condition:
(L) For each $x \in \mathcal{X}$, the vector $Z(x)$ has zeroes in all but at most $r$ consecutive entries, where $r$ is fixed. Furthermore, $\sup_{x \in \mathcal{X}} E(Z(x), \gamma_N) = O(1)$.
This guarantees that the matrix $J_m(\tau)$ is a block matrix.
$$\mathcal{P}_L(Z, M, \bar f, \bar{f'}, f_{min}, R) := \big\{\text{all sequences of distributions of } (X,Y) \text{ on } \mathbb{R}^{d+1} \text{ satisfying (A1)-(A3) with } M, \bar f, \bar{f'} < \infty,\ f_{min} > 0, \text{ and (L) for some } r < R;\ m^2(\log N)^6 = o(N),\ c_N(\gamma_N)^2 = o(N^{-1/2})\big\}. \qquad (3.5)$$
Splines nonparametric model
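Theorem 3.3 (Oracle rule for $\bar\beta(\tau)$)
Let $\{(X_i, Y_i)\}_{i=1}^{N}$ be distributed according to $P_N \in \mathcal{P}_L(Z, M, \bar f, \bar{f'}, f_{min}, R)$, with $m \asymp N^{\zeta}$ for some $\zeta > 0$.
If $S = o\big((N m^{-1}(\log N)^{-4})^{1/2}\big)$, then for all $\tau \in \mathcal{T}$, $x_0 \in \mathcal{X}$,
$$\frac{\sqrt{N}\, Z(x_0)^\top(\bar\beta(\tau) - \gamma_N(\tau))}{\big(Z(x_0)^\top J_m(\tau)^{-1} E[ZZ^\top] J_m(\tau)^{-1} Z(x_0)\big)^{1/2}} \rightsquigarrow \mathcal{N}\big(0, \tau(1-\tau)\big).$$
If $S \gtrsim (N/m)^{1/2}$, the weak convergence above fails for some $P_N \in \mathcal{P}_L(Z, M, \bar f, \bar{f'}, f_{min}, R)$
Sharpness of $S = o((N/m)^{1/2}(\log N)^{-2})$: the nonparametric rate is slower, and it again misses by a factor of $(\log N)^{-2}$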
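Theorem 3.4 (Oracle rule for $\hat\beta(\tau)$)
Let the assumptions of Theorem 3.3 hold and assume that $\tau \mapsto Q(x_0;\tau) \in \Lambda_c^{\eta_\tau}(\mathcal{T})$, $r_\tau \ge \eta_\tau$.
Suppose that $c_N(\gamma_N) = o(N^{-1/2}\|Z(x_0)\|)$ and $K \gg G \gg N^{1/(2\eta_\tau)}\|Z(x_0)\|^{-1/\eta_\tau}$, and that the limit $H(\tau_1, \tau_2) := \lim_{N\to\infty} K_N(\tau_1, \tau_2) > 0$ for all $\tau_1, \tau_2 \in \mathcal{T}$. Then
$$\frac{\sqrt{N}}{\|Z(x_0)\|}\big(Z(x_0)^\top\hat\beta(\cdot) - Q(x_0;\cdot)\big) \rightsquigarrow \mathbb{G}(\cdot) \ \text{in } \ell^\infty(\mathcal{T}),$$
where $\mathbb{G}$ is a centered Gaussian process with covariance structure $E[\mathbb{G}(\tau)\mathbb{G}(\tau')] = H(\tau, \tau')$.
If $G \lesssim N^{1/(2\eta_\tau)}\|Z(x_0)\|^{-1/\eta_\tau}$, the weak convergence fails for some $\tau \mapsto Q(x_0;\tau) \in \Lambda_c^{\eta_\tau}(\mathcal{T})$, for all $S$.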
Distribution function
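$$\hat F^{or}_{Y|X}(y \mid x_0) := \tau_L + \int_{\tau_L}^{\tau_U} \mathbf{1}\{Z(x_0)^\top\hat\beta_{or}(\tau) < y\}\, d\tau$$
The oracle rule holds for both models.
Corollary 3.5
Under the same conditions as Theorem 3.2 (linear model) or Theorem 3.4 (nonparametric spline model), we have for any $x_0 \in \mathcal{X}$,
$$\sqrt{N}\big(\hat F_{Y|X}(\cdot \mid x_0) - F_{Y|X}(\cdot \mid x_0)\big) \rightsquigarrow -f_{Y|X}(\cdot \mid x_0)\, \mathbb{G}\big(F_{Y|X}(\cdot \mid x_0)\big)$$
in $\ell^\infty\big((Q(x_0;\tau_L), Q(x_0;\tau_U))\big)$, where $\mathbb{G}$ is the centered Gaussian process defined in the respective theorem. The same holds for $\hat F^{or}_{Y|X}(\cdot \mid x_0)$.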
Phase transitions
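Figure: Regions (S, K) for the oracle rule of the linear model and the spline nonparametric model. The "?" region is the discrepancy between the sufficient and necessary conditions. Axis thresholds, linear model: $S$ at $N^{1/2}/\log^2 N$ and $N^{1/2}$, $K$ at $N^{1/(2\eta_\tau)}$. Spline model: $S$ at $N^{1/2}/(m^{1/2}\log^2 N)$ and $(N/m)^{1/2}$, $K$ at $(N/m)^{1/(2\eta_\tau)}$.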
Setting: linear model
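Model: $Y_i = 0.21 + \beta_{m-1}^\top X_i + \varepsilon_i$, $m = 4, 16, 32$
$X_i$ follows a multivariate uniform distribution $U([0,1]^{m-1})$ with covariance matrix $\Sigma_X := E[X_i X_i^\top]$, where $\Sigma_{jk} = 0.1^2 \cdot 0.7^{|j-k|}$ for $j, k = 1, ..., m-1$
$\varepsilon \sim \mathcal{N}$ or $\varepsilon \sim \text{EXP}$ (skewed)
$x_0 = (1, (m-1)^{-1/2}\, l_{m-1}^\top)^\top$
From Theorem 3.1, the coverage probability of the $1-\alpha = 95\%$ confidence interval is
$$P\Big\{x_0^\top\beta(\tau) \in \Big[x_0^\top\bar\beta(\tau) \pm N^{-1/2} f_{\varepsilon,\tau}^{-1}\sqrt{\tau(1-\tau)\, x_0^\top\Sigma_X^{-1}x_0}\;\Phi^{-1}(1-\alpha/2)\Big]\Big\}$$
$S^*$: the point of $S$ where the coverage starts to drop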
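The half-width of the interval above is simple to evaluate once its ingredients are known. The sketch below reads $f_{\varepsilon,\tau}$ as the density of the error at its own τ-quantile, which is an interpretation and not something stated on the slide. It takes the quadratic form $x_0^\top\Sigma_X^{-1}x_0$ as a number supplied by the caller. The value 1.0 used here is a placeholder, not the value from the simulation design.

```python
from statistics import NormalDist

def ci_half_width(N, tau, quad_form, eps_dist, alpha=0.05):
    """N^{-1/2} * f^{-1} * sqrt(tau (1 - tau) * quad_form) * z_{1 - alpha/2}.

    quad_form stands for x0' Sigma_X^{-1} x0 and is supplied by the caller;
    f is the error density evaluated at the tau-quantile of the error.
    """
    f_eps_tau = eps_dist.pdf(eps_dist.inv_cdf(tau))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return (tau * (1.0 - tau) * quad_form) ** 0.5 * z / (f_eps_tau * N ** 0.5)

eps = NormalDist(mu=0.0, sigma=0.1)       # the N(0, 0.1^2) error case
for tau in (0.1, 0.5, 0.9):
    print(tau, ci_half_width(N=2 ** 14, tau=tau, quad_form=1.0, eps_dist=eps))
```

For normal errors the half-width is the same at τ and 1 − τ, and it is wider in the tails than at the median because the error density is smaller there.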
$\bar\beta(\tau)$, $\varepsilon \sim \mathcal{N}(0, 0.1^2)$
Figure: coverage probability against $\log_N(S)$, three panels: Normal(0, 0.1²) at τ = 0.1, τ = 0.5 and τ = 0.9. Curves: $N = 2^{14}$ and $N = 2^{22}$, each with m = 4, 16, 32.
$\bar\beta(\tau)$, $\varepsilon \sim \text{Exp}(0.8)$
Figure: coverage probability against $\log_N(S)$, three panels: Exp(0.8) at τ = 0.1, τ = 0.5 and τ = 0.9. Curves: $N = 2^{14}$ and $N = 2^{22}$, each with m = 4, 16, 32.
Summary for simulation of $\bar\beta(\tau)$
As $m$ increases, $S^*$ decreases
As $N$ increases, $S^*$ gets close to $N^{1/2}$
For $\varepsilon \sim \mathcal{N}$, coverage is symmetric in $\tau$; for $\varepsilon \sim \text{Exp}$, coverage is asymmetric in $\tau$
The t distribution behaves similarly to the normal distribution
Quantile projection setting
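$B$: cubic B-spline with $q = \dim(B)$ defined on $G = 4 + q$ equidistant knots on $[\tau_L, \tau_U]$. We require $K > q$ so that $\hat\beta(\tau)$ is computable (see (2.4))
$N = 2^{14}$
$y_0 = Q(x_0;\tau)$ so that $F_{Y|X}(y_0 \mid x_0) = \tau$
From Theorem 3.2, the coverage probability at nominal level $1-\alpha = 0.95$ is
$$P\Big\{\tau \in \Big[\hat F_{Y|X}(Q(x_0;\tau) \mid x_0) \pm N^{-1/2}\sqrt{\tau(1-\tau)\, x_0^\top\Sigma_X^{-1}x_0}\;\Phi^{-1}(1-\alpha/2)\Big]\Big\}$$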
FY |X(y|x), ε ∼ N (0, 0.12), m = 4 for β(τ)
● ●● ●
●●
●
●
●
●
● ●
0.0 0.2 0.4 0.6 0.8
0.0
0.2
0.4
0.6
0.8
1.0
Normal(0, 0.12) , m= 4 , y0 = Q( x0 ; τ = 0.1 )
logN(S)
Cov
erag
e P
roba
bilit
y
q = 7 , K = 20q = 7 , K = 30q = 10 , K = 20q = 10 , K = 30q = 15 , K = 20q = 15 , K = 30
● ●
●
●●
● ● ●●
●
●
●
0.0 0.2 0.4 0.6 0.8
0.0
0.2
0.4
0.6
0.8
1.0
Normal(0, 0.12) , m= 4 , y0 = Q( x0 ; τ = 0.5 )
logN(S)
Cov
erag
e P
roba
bilit
y
q = 7 , K = 20q = 7 , K = 30q = 10 , K = 20q = 10 , K = 30q = 15 , K = 20q = 15 , K = 30
● ● ● ●●
●●
●
●
● ● ●
0.0 0.2 0.4 0.6 0.8
0.0
0.2
0.4
0.6
0.8
1.0
Normal(0, 0.12) , m= 4 , y0 = Q( x0 ; τ = 0.9 )
logN(S)
Cov
erag
e P
roba
bilit
y
q = 7 , K = 20q = 7 , K = 30q = 10 , K = 20q = 10 , K = 30q = 15 , K = 20q = 15 , K = 30
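The horizontal axis log_N(S) can be translated back into a number of groups S and a sub-sample size n = N/S. The sketch below assumes N = 2^14 from the simulation setting; the helper names `log_n_of_s` and `groups_from_axis` are hypothetical.

```python
from math import log

N = 2**14  # total sample size, assumed from the simulation setting


def log_n_of_s(s, n_total=N):
    """x-axis value log_N(S) for S groups."""
    return log(s) / log(n_total)


def groups_from_axis(x, n_total=N):
    """Invert the axis: S = N**x, rounded to the nearest integer."""
    return round(n_total ** x)


for x in (0.2, 0.5, 0.8):
    s = groups_from_axis(x)
    print(f"log_N(S) = {x}: S = {s}, n = N/S is about {N / s:.1f}")
```

On this scale, the value 0.5 corresponds to S = N^{1/2} groups.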
![Page 37: Quantile Regression for Extraordinarily Large Datafaculty.missouri.edu/~chaosh/doc/pubs/BDQR_Chao.pdf · 2018-08-16 · Quantile regressionTwo-step algorithmOracle rulesSimulationReferences](https://reader036.vdocuments.net/reader036/viewer/2022081517/5f029c727e708231d405204b/html5/thumbnails/37.jpg)
F_{Y|X}(y|x), ε ∼ N(0, 0.1²), m = 32 for β(τ)
[Figure: three panels, Normal(0, 0.1²), m = 32, y0 = Q(x0; τ) for τ = 0.1, 0.5, 0.9. x-axis: log_N(S), 0.0 to 0.8; y-axis: coverage probability, 0.0 to 1.0. Legend: q = 7, 10, 15, each with K = 20 and K = 30.]
![Page 38: Quantile Regression for Extraordinarily Large Datafaculty.missouri.edu/~chaosh/doc/pubs/BDQR_Chao.pdf · 2018-08-16 · Quantile regressionTwo-step algorithmOracle rulesSimulationReferences](https://reader036.vdocuments.net/reader036/viewer/2022081517/5f029c727e708231d405204b/html5/thumbnails/38.jpg)
F_{Y|X}(y|x), ε ∼ Exp(0.8), m = 4 for β(τ)
[Figure: three panels, Exp(0.8), m = 4, y0 = Q(x0; τ) for τ = 0.1, 0.5, 0.9. x-axis: log_N(S), 0.0 to 0.8; y-axis: coverage probability, 0.0 to 1.0. Legend: q = 7, 10, 15, each with K = 20 and K = 30.]
![Page 39: Quantile Regression for Extraordinarily Large Datafaculty.missouri.edu/~chaosh/doc/pubs/BDQR_Chao.pdf · 2018-08-16 · Quantile regressionTwo-step algorithmOracle rulesSimulationReferences](https://reader036.vdocuments.net/reader036/viewer/2022081517/5f029c727e708231d405204b/html5/thumbnails/39.jpg)
F_{Y|X}(y|x), ε ∼ Exp(0.8), m = 32 for β(τ)
[Figure: three panels, Exp(0.8), m = 32, y0 = Q(x0; τ) for τ = 0.1, 0.5, 0.9. x-axis: log_N(S), 0.0 to 0.8; y-axis: coverage probability, 0.0 to 1.0. Legend: q = 7, 10, 15, each with K = 20 and K = 30.]
![Page 40: Quantile Regression for Extraordinarily Large Datafaculty.missouri.edu/~chaosh/doc/pubs/BDQR_Chao.pdf · 2018-08-16 · Quantile regressionTwo-step algorithmOracle rulesSimulationReferences](https://reader036.vdocuments.net/reader036/viewer/2022081517/5f029c727e708231d405204b/html5/thumbnails/40.jpg)
Summary for simulation of F_{Y|X}(y|x)
An increase in m lowers both S∗ and q∗ (the critical point in q for the oracle rule)
Either S > S∗ or q < q∗ (q = G − 4) leads to the collapse of the oracle rule
An increase in q and K improves the coverage probability
For ε ∼ N, coverage is no longer symmetric in τ
![Page 41: Quantile Regression for Extraordinarily Large Datafaculty.missouri.edu/~chaosh/doc/pubs/BDQR_Chao.pdf · 2018-08-16 · Quantile regressionTwo-step algorithmOracle rulesSimulationReferences](https://reader036.vdocuments.net/reader036/viewer/2022081517/5f029c727e708231d405204b/html5/thumbnails/41.jpg)
Thank you for your attention
![Page 42: Quantile Regression for Extraordinarily Large Datafaculty.missouri.edu/~chaosh/doc/pubs/BDQR_Chao.pdf · 2018-08-16 · Quantile regressionTwo-step algorithmOracle rulesSimulationReferences](https://reader036.vdocuments.net/reader036/viewer/2022081517/5f029c727e708231d405204b/html5/thumbnails/42.jpg)
References I
Chao, S., Volgushev, S., and Cheng, G. (2016). Quantile process for semi and nonparametric regression models. arXiv preprint arXiv:1604.02130.
White, T. (2012). Hadoop: The Definitive Guide. O'Reilly Media / Yahoo Press.
![Page 43: Quantile Regression for Extraordinarily Large Datafaculty.missouri.edu/~chaosh/doc/pubs/BDQR_Chao.pdf · 2018-08-16 · Quantile regressionTwo-step algorithmOracle rulesSimulationReferences](https://reader036.vdocuments.net/reader036/viewer/2022081517/5f029c727e708231d405204b/html5/thumbnails/43.jpg)
Linear models
Assumption (A): the data $(X_i, Y_i)_{i=1,\dots,N}$ form a triangular array and are row-wise i.i.d. with
(A1) Assume that $\|Z_i\| \le \xi_m < \infty$, where $\xi_m = O(N^b)$ is allowed to increase to infinity, and that
$$1/M \le \lambda_{\min}(E[ZZ^\top]) \le \lambda_{\max}(E[ZZ^\top]) \le M$$
holds uniformly in n for some fixed constant M.
(A2) The conditional distribution $F_{Y|X}(y|x)$ is twice differentiable w.r.t. y. Denote the corresponding derivatives by $f_{Y|X}(y|x)$ and $f'_{Y|X}(y|x)$. Assume that
$$\bar f := \sup_{y,x} |f_{Y|X}(y|x)| < \infty, \qquad \overline{f'} := \sup_{y,x} |f'_{Y|X}(y|x)| < \infty$$
uniformly in n.
(A3) Assume that
$$0 < f_{\min} \le \inf_{\tau \in \mathcal{T}} \inf_{x} f_{Y|X}(Q(x;\tau)\,|\,x)$$
uniformly in n.
![Page 44: Quantile Regression for Extraordinarily Large Datafaculty.missouri.edu/~chaosh/doc/pubs/BDQR_Chao.pdf · 2018-08-16 · Quantile regressionTwo-step algorithmOracle rulesSimulationReferences](https://reader036.vdocuments.net/reader036/viewer/2022081517/5f029c727e708231d405204b/html5/thumbnails/44.jpg)
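$$E[G(\tau)G(\tau')] = Z(x_0)^\top J_m(\tau)^{-1} E[Z(X)Z(X)^\top] J_m(\tau')^{-1} Z(x_0)\,(\tau \wedge \tau' - \tau\tau'). \tag{5.1}$$
Process: linear model
$$K(\tau_1,\tau_2) := \|Z(x_0)\|^{-2} Z(x_0)^\top J_m^{-1}(\tau_1) E[ZZ^\top] J_m^{-1}(\tau_2) Z(x_0)\,(\tau_1 \wedge \tau_2 - \tau_1\tau_2)$$
Process: local polynomial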
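As a worked special case of (5.1), offered as an illustration rather than a statement from the slides: suppose $J_m(\tau) = f_{\varepsilon,\tau}\, E[ZZ^\top]$, which one would expect in a homoscedastic location model where the conditional density at the τ-quantile does not depend on x. Under that assumption the sandwich collapses, because $(f_{\varepsilon,\tau} E[ZZ^\top])^{-1} E[ZZ^\top] (f_{\varepsilon,\tau'} E[ZZ^\top])^{-1} = E[ZZ^\top]^{-1} / (f_{\varepsilon,\tau} f_{\varepsilon,\tau'})$:

```latex
E[G(\tau)G(\tau')]
  = \frac{Z(x_0)^\top E[ZZ^\top]^{-1} Z(x_0)}{f_{\varepsilon,\tau}\, f_{\varepsilon,\tau'}}
    \,(\tau \wedge \tau' - \tau\tau'),
\qquad
E[G(\tau)^2]
  = \frac{\tau(1-\tau)}{f_{\varepsilon,\tau}^{2}}\; Z(x_0)^\top E[ZZ^\top]^{-1} Z(x_0).
```

The factor $\tau \wedge \tau' - \tau\tau'$ is the covariance of a Brownian bridge, and with $\tau = \tau'$ it reduces to the familiar $\tau(1-\tau)$.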
![Page 45: Quantile Regression for Extraordinarily Large Datafaculty.missouri.edu/~chaosh/doc/pubs/BDQR_Chao.pdf · 2018-08-16 · Quantile regressionTwo-step algorithmOracle rulesSimulationReferences](https://reader036.vdocuments.net/reader036/viewer/2022081517/5f029c727e708231d405204b/html5/thumbnails/45.jpg)
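$$\begin{aligned}
\beta_3 &= (0.21, -0.89, 0.38)^\top; \\
\beta_{15} &= (\beta_3^\top, 0.63, 0.11, 1.01, -1.79, -1.39, 0.52, -1.62, 1.26, -0.72, 0.43, -0.41, -0.02)^\top; \\
\beta_{31} &= (\beta_{15}^\top, 0.21, \beta_{15}^\top)^\top.
\end{aligned} \tag{5.2}$$
$$\beta(\tau) = (0.21 + 0.1 \times \Phi^{-1}_{\sigma=0.1}(\tau),\ \beta_{m-1}^\top)^\top$$
$f_{\varepsilon,\tau} > 0$ is the height of the density of $\varepsilon_i$ evaluated at the τ-quantile of $\varepsilon_i$.
Simulation setting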
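The nested coefficient vectors in (5.2) and the quantile coefficient β(τ) can be assembled as follows. This is a sketch under stated assumptions: Φ⁻¹ with subscript σ = 0.1 is read as the quantile function of N(0, 0.1²), the formula is implemented literally as written on the slide, and the names `BETAS` and `beta_tau` are hypothetical.

```python
from statistics import NormalDist

beta_3 = [0.21, -0.89, 0.38]
beta_15 = beta_3 + [0.63, 0.11, 1.01, -1.79, -1.39, 0.52,
                    -1.62, 1.26, -0.72, 0.43, -0.41, -0.02]
beta_31 = beta_15 + [0.21] + beta_15
BETAS = {3: beta_3, 15: beta_15, 31: beta_31}


def beta_tau(tau, m):
    """beta(tau) = (0.21 + 0.1 * Phi^{-1}_{sigma=0.1}(tau), beta_{m-1}')'."""
    # Assumed reading: quantile function of a N(0, 0.1^2) variable.
    quantile = NormalDist(mu=0.0, sigma=0.1).inv_cdf(tau)
    return [0.21 + 0.1 * quantile] + BETAS[m - 1]


# Dimensions used in the simulations: m = 4 and m = 32.
coef_m4 = beta_tau(0.5, m=4)
coef_m32 = beta_tau(0.9, m=32)
```

The lengths work out as the subscripts suggest: 3 + 12 = 15 entries in β_15, and 15 + 1 + 15 = 31 in β_31, so β(τ) has m entries.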