DUALITY BEHAVIORS OF THE QUANTILE REGRESSION MODEL
ESTIMATION PROBLEM
DISSERTATION
Paul D. Robinson II, Major, USAF
AFIT-ENS-DS-17-S-043
DEPARTMENT OF THE AIR FORCE
AIR UNIVERSITY
AIR FORCE INSTITUTE OF TECHNOLOGY
Wright-Patterson Air Force Base, Ohio
DISTRIBUTION STATEMENT A: APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED
The views expressed in this dissertation are those of the author and do not reflect the official policy or position of the United States Air Force, the Department of Defense, or the United States Government. This material is declared a work of the U.S. Government and is not subject to copyright protection in the United States.
AFIT-ENS-DS-17-S-043
DUALITY BEHAVIORS OF THE QUANTILE REGRESSION MODEL
ESTIMATION PROBLEM
DISSERTATION
Presented to the Faculty
Department of Operational Sciences
Graduate School of Engineering and Management
Air Force Institute of Technology
Air University
Air Education and Training Command
in Partial Fulfillment of the Requirements for the
Degree of Doctor of Philosophy in Operations Research
Paul D. Robinson II, B.A., M.S.
Major, USAF
September 2017
DISTRIBUTION STATEMENT A: APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED
AFIT-ENS-DS-17-S-043
DUALITY BEHAVIORS OF THE QUANTILE REGRESSION MODEL
ESTIMATION PROBLEM
Paul D. Robinson II, B.A., M.S.
Major, USAF
Committee Membership:
James W. Chrissis, PhD, Chair
Richard F. Deckro, PhD, Member
Christine M. Schubert Kabban, PhD, Member
James F. Morris, PhD, Member
Adedeji B. Badiru, PhD, Dean, Graduate School of Engineering and Management
Table of Contents
Page
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
I. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1
1.1 Problem Setting . . . . . . . . . . . . . . . . . . . . . 1-2
1.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . 1-8
1.3 Solution Method Characteristics . . . . . . . . . . . . . 1-9
1.3.1 Problem Statement . . . . . . . . . . . . . . . 1-12
1.3.2 Research Objectives . . . . . . . . . . . . . . 1-12
1.4 Overview . . . . . . . . . . . . . . . . . . . . . . . . . 1-13
II. Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . 2-1
2.1 QRMEP Properties . . . . . . . . . . . . . . . . . . . . 2-1
2.1.1 Exact-Fit or p-Subset Property . . . . . . . . 2-2
2.1.2 Cardinality Range Property . . . . . . . . . . 2-3
2.1.3 Partitioning . . . . . . . . . . . . . . . . . . . 2-4
2.2 Pivoting Methods . . . . . . . . . . . . . . . . . . . . . 2-8
2.2.1 Barrodale-Roberts Algorithm . . . . . . . . . 2-8
2.2.2 Koenker-d’Orey Algorithm . . . . . . . . . . . 2-13
2.2.3 Interval-Linear Programming . . . . . . . . . 2-16
2.2.4 Dual Simplex Method for Bounded Variables . 2-22
2.3 Interior-Point Methods . . . . . . . . . . . . . . . . . . 2-25
2.3.1 Affine Scaling . . . . . . . . . . . . . . . . . . 2-26
Page
2.3.2 Log-Barrier Methods . . . . . . . . . . . . . . 2-33
2.4 Finite Smoothing Algorithm . . . . . . . . . . . . . . . 2-40
2.5 Integer Programming Formulations . . . . . . . . . . . 2-42
2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 2-46
III. Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-1
3.1 Simplex Method for Bounded Variables . . . . . . . . . 3-1
3.2 Generalized Interval-Linear Programming . . . . . . . 3-9
3.3 Long-Step Dual Simplex (LSDS) Method . . . . . . . . 3-15
3.4 The QRMEP as an Integer Program . . . . . . . . . . 3-21
3.4.1 The Bounded Interval Generalized Assignment Problem . . . . . . 3-23
3.4.2 The Bounded Interval Knapsack Problem . . . 3-26
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 3-27
IV. Implementation, Testing, and Numerical Results . . . . . . . . 4-1
4.1 Implementation . . . . . . . . . . . . . . . . . . . . . . 4-1
4.2 Testing . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
4.3 Numerical Results . . . . . . . . . . . . . . . . . . . . 4-2
4.4 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . 4-16
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 4-18
V. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-1
5.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . 5-3
5.2 Future Research . . . . . . . . . . . . . . . . . . . . . 5-4
5.2.1 Preprocessing . . . . . . . . . . . . . . . . . . 5-4
5.2.2 Integer Programming Alternatives . . . . . . . 5-4
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . BIB-1
List of Figures

Figure Page
4.1 (p, q) = (3, 0.05). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-3
4.2 (p, q) = (3, 0.25). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-4
4.3 (p, q) = (3, 0.5). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-4
4.4 (p, q) = (3, 0.75). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-5
4.5 (p, q) = (3, 0.95). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-5
4.6 (p, q) = (8, 0.05). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-7
4.7 (p, q) = (8, 0.25). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-7
4.8 (p, q) = (8, 0.5). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-8
4.9 (p, q) = (8, 0.75). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-8
4.10 (p, q) = (8, 0.95). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-9
4.11 (p, q) = (15, 0.05). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-9
4.12 (p, q) = (15, 0.25). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-10
4.13 (p, q) = (15, 0.5). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-10
4.14 (p, q) = (15, 0.75). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-11
4.15 (p, q) = (15, 0.95). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-11
4.16 GILP for p = 3. 5th Quantile (magenta), 25th Quantile (blue), 50th Quantile (black), 75th Quantile (green), 95th Quantile (red). . . . . . 4-13
4.17 GILP for p = 8. 5th Quantile (magenta), 25th Quantile (blue), 50th Quantile (black), 75th Quantile (green), 95th Quantile (red). . . . . . 4-13
4.18 GILP for p = 15. 5th Quantile (magenta), 25th Quantile (blue), 50th Quantile (black), 75th Quantile (green), 95th Quantile (red). . . . . . 4-14
List of Tables

Table Page
4.1 Crossover Point Data for p = 3 . . . . . . . . . . . . . . . . . 4-6
4.2 Crossover Point Data for p = 8 . . . . . . . . . . . . . . . . . 4-12
4.3 Crossover Point Data for p = 15 . . . . . . . . . . . . . . . . 4-12
4.4 Average Number of Iterations . . . . . . . . . . . . . . . . . . 4-15
AFIT-ENS-DS-17-S-043
Abstract
A vector of quantile regression model coefficients, also known as regression
quantiles, is shown to be the solution to a parametric minimization problem. It
can also be shown that the same model parameters are obtainable by solving a
nonparametric dual linear program, and it is this feature of the quantile regression
model estimation problem (QRMEP) that is of particular interest.
Both the primal and dual linear programs of the QRMEP are shown to possess
special structures. Provided certain model assumptions are met, the QRMEP also
exhibits two unique properties. These properties, along with the duality behaviors
of the problem, are exploited in order to extend two pivoting algorithms to the class
of QRMEPs: a generalization of interval-linear programming (I-LP) and a long-step
variant of the dual simplex method. For problems and/or models up to a certain
size, these extensions are shown to perform well, computationally, against the classic
dual simplex algorithm and interior-point methods.
DUALITY BEHAVIORS OF THE QUANTILE REGRESSION
MODEL ESTIMATION PROBLEM
I. Introduction
Insights into the distribution of a single variable can be obtained through
measures of central tendency and measures of spread. The mean enjoys a dominant
position among centrality measures; such dominance is even more apparent when
models for conditional distributions are required, namely linear regression models.
The ordinary least squares (OLS) normal equations can be solved easily, even for
large problems, and the closed-form solution offered by the normal equations partially
explains the appeal of OLS models. A typical assumption made for linear regression
models is that the errors (residuals) are independently, identically, and normally
distributed. If this assumption holds, then OLS models sufficiently describe the
distributive behavior of a response about its center. One of the limitations of OLS
models, however, is sensitivity to outliers and skewed distributions, and researchers
seek to mitigate the effects of such extreme values. One method of dealing with
outliers in OLS models is simply to discard them, but eliminating extreme values
could yield misleading conclusions.
Many studies, on the other hand, focus specifically on these extreme values.
Analysis techniques that examine conditional distribution locations other than the
mean are therefore needed in order to provide a more detailed description of a con-
ditional response distribution [38], and quantile regression satisfies this requirement.
The main advantage quantile regression models have over OLS is their robustness.
That is, quantile regression models are insensitive to outliers and skewed distributions, so they have applications to studies in the social sciences where the tails of a
distribution are a concern [28].
1.1 Problem Setting
The research in this dissertation is focused exclusively on the problem of estimating the quantile regression model coefficients; that is, the quantile regression
model estimation problem (QRMEP). Like OLS, the parameter estimates for quan-
tile regression models come from solutions to minimization problems. On the other hand, a closed-form solution is not available for the QRMEP. Instead, conditional
quantile functions are estimated as the solutions to parametric linear programming
(LP) problems.
The idea of generating a hypothetical model of a response distribution via
some manner of least absolute deviation (LAD) spans more than two hundred years
of research, beginning with Boscovich and Laplace in the late 1700s, followed by
Edgeworth nearly a century later [11]. Boscovich essentially introduced median re-
gression, a special case of the QRMEP, while modeling the ellipticity of the earth.
He proposed a linear model for ellipticity, one where the sum of absolute errors is
minimized, and its errors sum to zero [38]. The next major advancement, one that
may be considered the source of modern quantile regression and this research, comes
from Koenker and Bassett [32]. Instead of repeating the traditional and laborious
process of sorting sample observations to obtain quantiles, they proposed formulat-
ing the minimized sum of absolute deviations as a parametric LP, the solution to
which is a vector of quantile regression model parameters. They called this class
of linear models regression quantiles, where the vector of quantile regression model
coefficients (solution to the QRMEP) is also the vector of regression quantiles. Conditional mean models are still more popular because of the computational advantage
OLS has over that of LAD methods, but recent advances in LP theory have helped
regression models of the l1-norm variety overcome this deficiency and stimulated
renewed interest in such problems.
Properly defining a quantile is necessary before formulating the LP to estimate
the quantile regression model. At least three definitions of a quantile are encountered
in the literature, but one seems to convey the idea of a quantile better than the
others. Koenker [38], as well as Batur and Choobineh [3], define the quantile y(q)
of a random variable Y as y(q) = inf{y : F(y) ≥ q}, where F(y) = P(Y ≤ y).
Hao and Naiman [28] estimate the quantile in terms of proportions of values of the
random variable Y . For instance, if y(q) is the (100q)th quantile, then the proportion
of values of Y which are less than or equal to y(q) is q. Koenker and Hallock also
use this definition [37]. Chang [16] offers a definition in the context of a financial
management metric known as downside risk, which is the probability of observing
values of a random variable that are less than some critical value. It should also be
noted that although these three definitions are different, each essentially generates
the same (100q)th quantile for a continuous random variable, but downside risk most
adequately describes how to interpret a conditional quantile model.
Suppose a sample obtained from a random variable Y is partitioned into equal
parts, and assume that a value Y = y can be drawn from any of the partitions with
equal probability. The values that define the boundaries of each partition are said
to be the quantiles. In other words, a value y(q) is the (100q)th quantile of Y if the
probability of a random draw from Y being less than or equal to y(q) is exactly q, or
P (Y ≤ y(q)) = q. (1.1)
Certain quantiles are uniquely identified, such as the quartiles (0.25, 0.75), median
(0.50), deciles (0.10, 0.20, etc.), and percentiles (0.01, 0.02, etc.). For simplicity,
however, the general term quantile is used for any y(q) with probability q ∈ (0, 1).
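For a finite sample, the infimum definition can be applied directly to the empirical distribution function. A minimal sketch (the function name and sample values are illustrative, not from this research):

```python
import numpy as np

def quantile_inf(sample, q):
    """Empirical (100q)th quantile: inf {y : F(y) >= q}, where
    F(y) = (1/n) * #{y_j <= y} is the empirical CDF."""
    ys = np.sort(np.asarray(sample, dtype=float))
    n = len(ys)
    # Smallest index k such that F(ys[k]) = (k + 1)/n >= q.
    k = max(int(np.ceil(q * n)) - 1, 0)
    return ys[k]

sample = [3.1, 0.4, 2.2, 5.9, 1.7, 4.8, 2.9, 3.5]
print(quantile_inf(sample, 0.5))  # 2.9, the smallest y with F(y) >= 0.5
```

Note that the infimum definition always returns an order statistic of the sample; no interpolation between observations is involved.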
While OLS examines residual behavior about the conditional mean, quantile
regression examines residual behavior about the (100q)th conditional quantile. The
associated probability of interest for the quantile supplies the asymmetric weights to
the objective function, namely q and its complement (1− q). Let p be the number
of parameters included in a quantile regression model of the form
y = Xb + r,
where y ∈ Rn is a vector of observations from some dependent random variable
Y (response), X ∈ Rn×p is a design matrix whose rows denote observations from
(p− 1) independent random variables (regressors) X1, X2, . . . , Xp−1; b ∈ Rp is the
vector of quantile regression model parameters (coeffi cients), and r ∈ Rn is a vector
of estimation errors (residuals).
Suppose the conditional median (50th quantile) is to be estimated. Rather
than minimizing the sum of squares of the residuals (i.e., OLS), the symmetric abso-
lute value function is minimized in order to solve for the vector of model parameters
b:
$$\min_{b \in \mathbb{R}^p} \sum_{j=1}^{n} |y_j - x_j b| = \min_{b \in \mathbb{R}^p} \sum_{j=1}^{n} |r_j|,$$

where $x_j$ is the $j$th row of $X$, of the form $x_j = (1, x_{j1}, x_{j2}, \ldots, x_{j(p-1)})$. For the
median, the sum of positive and negative residuals must be zero. For any q ≠ 0.5,
this sum is nonzero, so some modification to the absolute value function is required
such that it covers any probability $q \in (0, 1)$. Let $\rho_q(r_j) = (q - I_{r_j < 0})\, r_j$ denote the tilted absolute value function [37] for the $j$th residual, where $I_{r_j < 0}$ is an indicator function such that $I_{r_j < 0} = 1$ if $r_j < 0$ and zero otherwise. This residual loss (check) function is defined for any probability $q \in (0, 1)$ as

$$\rho_q(r_j) = \left(q - I_{r_j < 0}\right) r_j.$$
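The check function is a one-liner in vectorized form; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def rho(r, q):
    """Check (tilted absolute value) loss: rho_q(r) = (q - I[r < 0]) * r."""
    r = np.asarray(r, dtype=float)
    return (q - (r < 0)) * r

# A positive residual is weighted by q, a negative one by (1 - q):
print(rho([2.0, -2.0], 0.25))  # [0.5 1.5]
```

At q = 0.5 this reduces to |r|/2, consistent with the symmetric median case above.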
If a sample of size n is partitioned into equal parts, and a random draw coming from
any partition is equiprobable, then it follows that the number of sample values which
are less than or equal to the quantile is at most qn. Now consider the asymmetric
sum of residuals

$$\sum_{j=1}^{n} \rho_q(r_j) = \sum_{j=1}^{n} \rho_q(y_j - x_j b).$$
The objective is to find the vector of model coefficients (regression quantiles [32]) $b \in \mathbb{R}^p$ that minimizes the convex combination of residuals

$$\min_{b \in \mathbb{R}^p} \left\{ \sum_{j=1}^{n} \rho_q(r_j) \right\} = \min_{b \in \mathbb{R}^p} \left[\, (q - 1) \sum_{r_j < 0} (y_j - x_j b) + q \sum_{r_j > 0} (y_j - x_j b) \right].$$
Since $q \neq |q - 1|$, except in the median case, it is necessary to distinguish between the set of residuals weighted by $q$ and the set of residuals weighted by $(q - 1)$. Rewrite the residual vector as the difference between two nonnegative vectors; that is, let $r = u - v$, where $u \in \mathbb{R}^n_+$ is a vector of positive residuals and $v \in \mathbb{R}^n_+$ is a vector of absolute values of negative residuals. In other words,

$$u_j = \begin{cases} r_j, & r_j > 0 \\ 0, & \text{otherwise} \end{cases} \qquad v_j = \begin{cases} |r_j|, & r_j < 0 \\ 0, & \text{otherwise.} \end{cases} \tag{1.2}$$
It follows that if the jth residual is zero, then uj = vj = 0. The vector of positive
residuals u is weighted by the probability q, and the vector of absolute negative
residuals v is weighted by its complement (1− q). Clearly, it is unnecessary to
place any kind of weight on the zero residuals, so the QRMEP takes the form of the
minimization problem
$$\min_{b \in \mathbb{R}^p,\, u \geq 0_n,\, v \geq 0_n} \; q \sum_{j=1}^{n} u_j + (1 - q) \sum_{j=1}^{n} v_j \tag{1.3}$$

subject to

$$\begin{aligned} x_1 b + u_1 - v_1 &= y_1 \\ &\;\;\vdots \\ x_n b + u_n - v_n &= y_n, \end{aligned}$$

where the raw residuals are the constraints. The primal QRMEP can be expressed in matrix notation as

$$\min_{b \in \mathbb{R}^p,\, u \geq 0_n,\, v \geq 0_n} \; q u^T 1_n + (1 - q) v^T 1_n \tag{1.4}$$

subject to

$$Xb + u - v = y,$$

where $1_n$ denotes an $(n \times 1)$ vector of ones.
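Formulation (1.3)-(1.4) can be passed directly to a general-purpose LP solver. A minimal sketch using scipy.optimize.linprog with synthetic data (the variable ordering [b, u, v] and all names are illustrative, not from this research):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, p, q = 40, 2, 0.75
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + 1 regressor
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

# Decision variables, in order: b (p, free), u (n, >= 0), v (n, >= 0).
c = np.concatenate([np.zeros(p), q * np.ones(n), (1 - q) * np.ones(n)])
A_eq = np.hstack([X, np.eye(n), -np.eye(n)])           # Xb + u - v = y
bounds = [(None, None)] * p + [(0, None)] * (2 * n)

res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds)
b_hat = res.x[:p]                                      # regression quantiles
```

At an optimum, at most qn observations lie strictly below the fitted hyperplane and at most (1 - q)n lie strictly above it, which gives a quick sanity check on any solution.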
The Karush-Kuhn-Tucker (KKT) conditions [4] further demonstrate the spe-
cial structure of the QRMEP. Obviously, the constraints b ∈ Rp, Xb + u − v = y,
and u,v ≥ 0 from (1.4) constitute the primal feasibility conditions. The method
of Lagrange multipliers is used to derive the remaining KKT conditions. Since the
first n constraints Xb + u − v = y are equality constraints, the associated n-vector of
multipliers w is unrestricted in sign. The n-vectors of multipliers t and s, associated
respectively with the residual vectors u and v, are nonnegative, since the next 2n
constraints u ≥ 0 and v ≥ 0 are inequality constraints. The resulting Lagrangian
function therefore takes the form
L (w, t, s) = quT1n + (1− q)vT1n +wT (y −Xb− u+ v)− tTu− sTv
= uT (q1n −w − t) + vT ((1− q)1n +w − s) + yTw − bTXTw.
Taking partial derivatives with respect to the model parameters $b$ and the decision variables (residuals) $u, v$ produces the dual feasibility conditions [4]

$$\frac{\partial L}{\partial b} = -X^T w = 0_p, \qquad \frac{\partial L}{\partial u} = q 1_n - w - t = 0, \qquad \frac{\partial L}{\partial v} = (1 - q) 1_n + w - s = 0.$$

Letting $t, s \geq 0$ be $n$-vectors of surplus variables, the dual feasibility conditions become

$$X^T w = 0_p \tag{1.5}$$
$$w \leq q 1_n \tag{1.6}$$
$$w \geq (q - 1) 1_n, \tag{1.7}$$
where the two inequality constraints provide bounds on $w$. Thus, the dual LP of the QRMEP is

$$\max_{w \in [q-1,\, q]^n} \; y^T w \tag{1.8}$$

subject to

$$X^T w = 0_p.$$

The complementary slackness condition is given by

$$w^T (y - Xb - u + v) = 0. \tag{1.9}$$
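By LP duality, the optimal values of (1.4) and (1.8) coincide, which is easy to confirm numerically. A sketch with synthetic data (linprog minimizes, so the dual objective is negated; all names are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, p, q = 30, 2, 0.25
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 0.5 + 1.5 * X[:, 1] + rng.normal(size=n)

# Primal (1.4): min q*1'u + (1-q)*1'v  s.t.  Xb + u - v = y, u, v >= 0, b free.
c = np.concatenate([np.zeros(p), q * np.ones(n), (1 - q) * np.ones(n)])
A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
primal = linprog(c, A_eq=A_eq, b_eq=y,
                 bounds=[(None, None)] * p + [(0, None)] * (2 * n))

# Dual (1.8): max y'w  s.t.  X'w = 0_p, w in [q-1, q]^n.
dual = linprog(-y, A_eq=X.T, b_eq=np.zeros(p), bounds=[(q - 1, q)] * n)

print(primal.fun, -dual.fun)  # the two optimal objective values agree
```

The dual has only the box constraints on w plus p equality constraints, which is the structural feature the pivoting extensions in Chapter 3 exploit.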
Let S be the indexing set of all observations, so the cardinality of S is n. To
guarantee that yj −xjb−uj + vj = 0 holds for all j ∈ S, conditions on the values uj
and vj must be satisfied [38]. It follows from (1.2) that no single observation can have
both a positive and negative residual simultaneously, so the situation where uj > 0
and vj > 0 is impossible. As a result, Koenker [37] rewrites the complementary
slackness condition as min {uj, vj} = 0, which must hold for all j ∈ S. In other
words, yj − xjb = uj implies vj = 0 for positive residuals, xjb − yj = vj implies
uj = 0 for negative residuals, and yj = xjb implies uj = vj = 0 for zero residuals.
The observations associated with the vector of zero residuals define the regression
hyperplane, so these observations are said to be basic. Observations associated with
the positive or negative residuals are said to be nonbasic.
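These conditions can be checked mechanically for any candidate fit; in the sketch below (all names illustrative), the residuals are split per (1.2) and the zero-residual (basic) observations are flagged:

```python
import numpy as np

def split_residuals(y, X, b, tol=1e-9):
    """Split r = y - Xb into u (positive part) and v (|negative| part),
    per (1.2); observations with u_j = v_j = 0 are the basic ones."""
    r = y - X @ b
    u = np.where(r > tol, r, 0.0)
    v = np.where(r < -tol, -r, 0.0)
    basic = np.flatnonzero((u == 0.0) & (v == 0.0))
    return u, v, basic

# Toy data: the fit y = x passes through the first two of three points.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 1.0, 3.0])
u, v, basic = split_residuals(y, X, np.array([0.0, 1.0]))
print(basic)  # [0 1] -- these observations define the hyperplane
```

By construction, min{u_j, v_j} = 0 holds for every observation, which is exactly Koenker's restatement of complementary slackness.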
1.2 Applications
Social scientists, econometricians, and ecologists are often more concerned with
researching the extremes of certain phenomena rather than their respective central
behaviors. For example, an economist studying income inequality pays particular
interest to the rich and poor, which requires close examination of the upper and lower
quantiles, or tails, of the income distribution [28]. Since linear regression only de-
scribes the conditional behavior of the response about its center, quantile regression
is a robust alternative for analyzing the extremes of a conditional distribution.
Over the past decade, public health and medical studies have been two areas in which quantile regression has seen increasing acceptance, both as a primary method and a
comparative tool. Stifel and Averett [59] show how traditional analyses on obesity
are being challenged by conditional quantile models. Anomaly detection in cyber
operations is another example of a new field to which quantile regression can be extended, where anomalous, rather than normal (i.e., average), web traffic is a primary
concern.
Quantile regression models have been applied extensively to various wage gap
studies. Garcia, Hernandez, and Lopez-Nicolas [23] examined the male-female
wage gap in the Spanish labor market. For each gender, Garcia, et al. constructed conditional quantile models on wages and analyzed wage differentials for
$q = \left\{ \tfrac{1}{10}, \tfrac{1}{4}, \tfrac{1}{2}, \tfrac{3}{4}, \tfrac{9}{10} \right\}$. Buchinsky [10] focused on female wage distribution in the
US. Machado and Mata [44], [45] determined the effects of skills preferences, foreign
competition, and education on the Portuguese labor force. Martins and Pereira [46]
investigated the effects of education on wage inequality across 16 nations. Wage
differentials between the public and private sectors have also been described using
quantile regression models. See Melly [52] for details on the public-private sector
wage gap in Germany. Mueller [53] conducted a similar study for Canada. Quantile
regression has been applied to various other economic topics, such as firm start-up
size [47] and hedge fund strategies [51].
Quantile regression is being increasingly employed in ecological studies. Cade
and Noon [13] provide a brief and simple mathematical description of quantile regres-
sion in their introductory article, but the article’s purpose was rather to demonstrate
the current successes of quantile regression in ecology and encourage its increased
use. Heteroscedasticity in species modeling is analyzed using quantile regression in
Cade, et al. [12], and a subsequent study [14] uses regression quantiles to uncover
hidden biases in habitat models. Using quantile regression to model the distributions of various species is discussed in Vaz, et al. [62]. None of these articles, however, contains a sufficient mathematical description of quantile regression, a deficiency that may slow the broader acceptance of quantile regression as a useful tool in ecology.
1.3 Solution Method Characteristics
Developing an alternative method to solve the QRMEP presents a unique challenge, considering that several computationally efficient algorithms already exist.
Because (1.4) and (1.8) are LPs, it is reasonable to investigate pivoting methods for
potential solution techniques. The special structure of the QRMEP, particularly
(1.8), prevents the classic simplex method from being applied directly, specifically
with regard to the stopping criteria of the algorithm. The details on this feature
of the problem are discussed in Chapter 3. Barrodale and Roberts [1] developed an
efficient simplex algorithm for median regression, and Koenker and d'Orey [33] extended this method to compute regression quantiles for any probability q ∈ (0, 1). In
most software packages containing a dedicated quantile regression solver, this modified simplex algorithm is the default [19]. It is most effective and efficient on small
to moderately sized problems, which Chen and Wei [17] defined to be n ≤ 100, 000
observations. What characterizes a large problem also depends on the number of
independent variables (regressors). Chen and Wei found that the threshold for a large problem drops to 5,000 observations when the number of regressors approaches 50.
Interior-point algorithms have a computational efficiency advantage over simplex methods. Consequently, interior-point methods are generally preferred for
estimating large-scale LPs. Practical experimentation has shown the advantage in
computational efficiency to be dependent on sample size. Portnoy and Koenker [55]
showed that interior-point methods are actually inferior to simplex methods when
sample sizes are small. Interior-point methods achieve computational dominance
once a sample grows to a certain size, but this crossover point is subject to the
number of regressors in the model. The experiments of Portnoy and Koenker [55]
on the conditional median showed this crossover point to be around n = 20, 000
when the number of regressors was low, say (p− 1) = 4, but n decreased substan-
tially (n < 500) for small increases in the number of regressors (up to (p− 1) = 16).
Specifics of the interior-point method developed by Koenker and Park [35], a variant
of the primal-dual path following method, are presented in Chapter 2.
A third method, which Chen [18] calls the finite smoothing algorithm, is compu-
tationally competitive with both simplex and interior-point methods, and its details
are also discussed in Chapter 2. The primal objective function in (1.4) is a weighted
LAD function and not differentiable, so the finite smoothing algorithm approximates
the objective function via a Huber function, thus making it differentiable. The finite
smoothing algorithm outperforms the Barrodale-Roberts algorithm when n > 3,000
or (p − 1) > 50. It is significantly faster than the interior-point method when the
number of model regressors is large.
Other solution techniques, such as interval-linear programming (I-LP), have
been proposed [56], but the dual simplex, interior-point, and finite smoothing meth-
ods have become standard options in many software packages [19]. Each of these
current methods has its own advantages and disadvantages, but the following algo-
rithm features are considered to be essential to extending an alternative method to
the class of QRMEPs:
1. Exact. Being simplex variants, the Barrodale-Roberts and Koenker-d’Orey
[33] algorithms are considered exact solution methods in that each converges
to the optimal basis in a finite number of iterations [1]. The interior-point and
finite smoothing algorithms, on the other hand, are iterative search methods
which approximate an improving model parameter vector solution at each iteration.
This research focused on extending an alternative pivoting algorithm to the class of
QRMEPs, one that proceeds from (1.8) and converges to the optimal basis with the
same accuracy as simplex methods.
2. General. The Barrodale-Roberts and I-LP methods were developed to solve
a special case of the QRMEP, specifically the conditional median. Just as the
Koenker-d’Orey algorithm extended Barrodale-Roberts to any quantile, the
extensions resulting from this research are also applicable for any probability
q ∈ (0, 1).
3. Efficient. The computational effort required to estimate quantile regression
models is generally greater than that of OLS. As the sample size and/or model
size increases, the computation time for the algorithm also increases, so the
computational efficiency of a QRMEP solution method is bounded above by
sample/model size. Two measures of performance serve as proxy metrics for
the required computational effort [29]: processing time (run time) and the
number of iterations.
Much of the existing research on the QRMEP has paid particular attention
to algorithmic speed (run time). However, each of the existing quantile regression
model estimation methods is deficient in at least one of the aforementioned features.
The discussion in Chapter 2 details the features on which each method performs
poorly. Taking into account the above characteristics, the following statement
captures the direction of this research.
1.3.1 Problem Statement. There exist simplex, interior-point, and finite
smoothing algorithms for solving the QRMEP. However, an alternative methodology
is needed which proceeds from (1.8), exploits both the unique properties of
the QRMEP and the special structures of (1.4) and (1.8), and yields an algorithm
that is exact, general, and computationally efficient. This research responds to
these requirements by extending two alternative pivoting algorithms to the class of
QRMEPs: generalized interval-linear programming and a long-step variant of dual
simplex.
1.3.2 Research Objectives. This research first attempted to extend a spe-
cial simplex implementation, the simplex method for bounded variables (bounded
simplex), to the class of QRMEPs because the form of (1.8) is equivalent to the form
required for the bounded simplex method. Additionally, two pivoting algorithms
were successfully extended to this class of problems: interval-linear programming (I-
LP) and a long-step variant of the dual simplex method. This research progressed
under the following objectives:
1. Exploit the unique duality properties of the QRMEP and special structures of
(1.4) and (1.8) by extending a generalized form of I-LP and a long-step variant
of the dual simplex method to the class of QRMEPs.
2. Implement the extended algorithms in a commercially available software pro-
gram, specifically MATLAB. Furthermore, compare the extended algorithms
against two methods available as standard options in the MATLAB environ-
ment: dual simplex and Mehrotra’s predictor-corrector variant of primal-dual
path following (interior-point) [50].
3. Test the algorithms using an open-source dataset for a finite set of quantiles.
The dataset is sampled uniformly to obtain various combinations of problem
and model sizes (i.e., varying levels of n and p). Obtain run times and the
numbers of required iterations for all algorithms under comparison. Generate
and analyze plots of the run times, and tabulate the iteration results in order
to evaluate the performance of the extensions.
1.4 Overview
This dissertation is organized in the following manner. Chapter 2 reviews the
relevant literature on the QRMEP, beginning with detailing its two essential prop-
erties. The review continues with a discussion on partitioning the design matrix
and a translated form of (1.8), followed by comprehensive reviews of current pivoting
algorithms (Barrodale-Roberts, Koenker-d'Orey, I-LP), current interior-point
methods (affine scaling, primal path following, primal-dual path following), and the
finite smoothing algorithm. Chapter 2 concludes by noting similarities the QRMEP
has with some familiar integer programs. Chapter 3 details the two alternative
pivoting algorithms which this research extends to the class of QRMEPs. Chapter
4 presents graphical and tabular results from the MATLAB testing of four solution
methods. Chapter 5 summarizes this research, discusses how it contributes to the
field of Operations Research, and presents topics for future work.
II. Literature Review
In addition to the basic duality structure and KKT optimality conditions presented
in Chapter 1, the QRMEP exhibits some unique properties. These properties im-
pose additional optimality conditions on the QRMEP. They also establish some
necessary assumptions on the model, and these assumptions must hold to guarantee
an optimal solution to the QRMEP. Section 2.1 begins by presenting two properties
of quantile regression found in the current literature, and concludes with alternative
formulations of the primal and dual LPs resulting from these properties.
Section 2.2 identifies the contemporary fields to which quantile regression is
most commonly applied. To date, the social science fields have demonstrated
a greater preference for quantile regression as an alternative to conditional mean
models than other disciplines have, though other fields are beginning to apply quantile
regression in their respective analyses.
Chapter 2 continues with descriptions of current solution methods for the
QRMEP. Section 2.3 describes the pivoting algorithms: Barrodale-Roberts, Koenker-d'Orey,
I-LP, and dual simplex. Section 2.4 covers the interior-point methods: affine
scaling, primal path following, and primal-dual path following. The literature
review concludes by identifying structural similarities between the QRMEP and some
familiar integer programming problems.
2.1 QRMEP Properties
In 1978, Koenker and Bassett [32] introduced two properties unique to the
QRMEP. These properties are significant because they establish three model as-
sumptions which must hold in order to guarantee convergence to the optimal basis.
The following subsections restate the theorems that define each property and present
these assumptions. Combining the properties with a specific partitioning scheme
results in some interesting reformulations of the primal (1.4) and dual (1.8) LPs.
2.1.1 Exact-Fit or p-Subset Property. The observations used to fit the
quantile regression model can be distinguished from those having nonzero residuals
by defining H as the set of all index subsets of size p. That is, each p-subset
h ∈ H consists of the indices of the observations used for model fitting, and these
are referred to as the basic observations [38]. Let X_h denote a square submatrix of
size p whose rows are the observations identified by each index j ∈ h. Each row in
X_h has a zero residual, so

    X_h b = y_h    (2.1)

where y_h is the p-vector of the response defined by h, and b is its solution vector
of model parameters. This leads to the following theorem, which is a restatement
of theorems given originally by Koenker and Bassett [32] and later by Koenker [38].
The result is what is known as the exact-fit property.

Theorem 1 If the design matrix X has rank p, then there exists at least one p-element
subset h ∈ H such that

    b* = X_h^{−1} y_h.

Furthermore, b* is a solution to the QRMEP if and only if

    (q − 1) 1_p ≤ w_h ≤ q 1_p,

where w_h is the corresponding p-subset of the dual solution vector w.

If rank(X) = p, then X_h ∈ R^{p×p} is nonsingular (invertible), so the exact-fit
property is also called the p-subset property [2]. The set of all p-subsets H is
therefore equivalent to the set of all dual extreme points generated by the polytope
from (1.8).
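For concreteness, Theorem 1 can be exercised numerically: enumerate every p-subset h, fit b = X_h^{−1} y_h, and keep the subsets whose implied dual subvector w_h lies within [q − 1, q]. The sketch below is illustrative only (it reuses the small sample of Example 3 in Section 2.2.1 and plain NumPy); it recovers w_h from the dual balance condition X^T w = 0_p, assigning weight q − 1 to negative nonbasic residuals and q to positive ones, and is not the dissertation's implementation.

```python
import numpy as np
from itertools import combinations

# Illustration of the exact-fit (p-subset) property on the 5-observation
# sample of Example 3.  Every nonsingular p-subset h yields a candidate
# b = X_h^{-1} y_h; Theorem 1 keeps those whose implied dual subvector w_h
# lies within [q - 1, q].
q = 0.5
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]])
y = np.array([1.0, 1.0, 2.0, 3.0, 2.0])
n, p = X.shape

optimal = []                         # p-subsets satisfying dual feasibility
for h in combinations(range(n), p):
    B = X[list(h)]
    if abs(np.linalg.det(B)) < 1e-12:
        continue                     # singular X_h cannot define a fit
    b = np.linalg.solve(B, y[list(h)])
    nb = [j for j in range(n) if j not in h]
    r = y[nb] - X[nb] @ b
    # Dual weights: q - 1 below the fitted hyperplane, q above it
    # (a zero nonbasic residual, a degenerate case, is treated as positive)
    wN = np.where(r < 0, q - 1.0, q)
    # Balance condition X^T w = 0_p determines the basic subvector w_h
    wh = -np.linalg.solve(B.T, X[nb].T @ wN)
    if np.all(wh >= q - 1.0 - 1e-9) and np.all(wh <= q + 1e-9):
        optimal.append(h)

b_star = np.linalg.solve(X[list(optimal[0])], y[list(optimal[0])])
```

For this sample more than one p-subset can pass the check at q = 1/2 (with w_h on the boundary), which is exactly the kind of degeneracy Koenker [38] notes below.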
If all data in the model are continuous, then the exact-fit property can be used
to compute a parameter vector b for any p-subset h ∈ H. However, not every
p-subset satisfies dual feasibility for a specific q. It is possible that more than one
p-subset solves the QRMEP, though Koenker [38] indicates that such degeneracies
are rare and typically occur when discrete data are present. The exact-fit property
therefore states two necessary assumptions on the quantile regression model. If the
data are all continuous and a nondegenerate solution can be assumed to exist, then
b* is a unique solution to the QRMEP if and only if

    (q − 1) 1_p < w_h < q 1_p    (2.2)

for exactly one h ∈ H. In other words, the unique optimal solution to the QRMEP
for the (100q)th conditional quantile is identified by the p-subset for which dual
feasibility is strictly satisfied.
2.1.2 Cardinality Range Property. A residual cannot be simultaneously
positive and negative. Similarly, it cannot be both zero and positive, or zero and
negative. Each residual, and the observation to which it corresponds, can there-
fore be classified into exactly one of three mutually exclusive sets: zero residuals
(basic observations which define the regression hyperplane), negative residuals (non-
basic observations falling below the regression hyperplane), and positive residuals
(nonbasic observations falling above the regression hyperplane). The respective
cardinalities of these sets can be bounded by the following theorem from
Koenker and Bassett [32].
Theorem 2 Let P, N, and Z denote the numbers of positive, negative, and zero
elements, respectively, in the residual vector r = y − Xb. If the quantile regression
model contains an intercept, then

    N ≤ qn ≤ n − P = N + Z
    P ≤ (1 − q)n ≤ P + Z

for all b ∈ S, where S is the set of all b that satisfy the above inequality. If the
cardinality of S is one (|S| = 1), then b is unique and

    N < qn < N + Z
    P < (1 − q)n < P + Z.

If a solution is nondegenerate, then Z = p with bounds [38] on N

    qn − p < N < qn    (2.3)

and bounds on P

    (1 − q)n − p < P < (1 − q)n.    (2.4)
Koenker and Bassett did not provide a name for this property, so it is referred to in
this research as the cardinality range property. This property also establishes the
third assumption that the model must contain an intercept.
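A direct check of these bounds is straightforward. The sketch below is illustrative only; the residual vector is that of the median fit b = (1/2, 1/2)^T from Example 3 in Section 2.2.1, while the check itself applies to any q, n, and p.

```python
import numpy as np

# Sketch of the cardinality range property: count positive, negative, and
# zero residuals and test the nondegenerate bounds (2.3) and (2.4).
def cardinality_check(r, q, p, tol=1e-9):
    n = r.size
    N = int(np.sum(r < -tol))        # negative residuals
    P = int(np.sum(r > tol))         # positive residuals
    Z = n - N - P                    # zero residuals (basic observations)
    nondegenerate = (Z == p
                     and q * n - p < N < q * n                 # bound (2.3)
                     and (1 - q) * n - p < P < (1 - q) * n)    # bound (2.4)
    return N, P, Z, nondegenerate

# Residuals of the median fit b = (1/2, 1/2)^T from Example 3
r = np.array([0.0, -0.5, 0.0, 0.5, -1.0])
N, P, Z, ok = cardinality_check(r, q=0.5, p=2)
```

Here N = 2, P = 1, and Z = p = 2, so the bounds hold with n = 5 and q = 1/2.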
2.1.3 Partitioning. Koenker and Bassett [32] distinguish only between
basic (h) and nonbasic (h̄) observations, but the cardinality range property reveals
another way of partitioning the sample. The nomenclature used here is similar to
that given by Bazaraa, Jarvis, and Sherali [4] in their presentation of the simplex
method for bounded variables (bounded simplex). The design matrix can be
partitioned as X = (B; N_v; N_u), where B is the (p × p) basis matrix, N_v consists of
the observations (rows of X) falling below the regression hyperplane, and N_u
consists of the rows of X falling above the hyperplane. If the three model assumptions
established by the exact-fit and cardinality range properties are met; that is, if a
nondegenerate solution exists, all independent variables are continuous, the response is
continuous, and the quantile regression model contains an intercept (i.e., X_{·1} = 1_n),
then Z = p, the number of rows in N_v must satisfy (2.3), and the number of rows
in N_u must satisfy (2.4). The dual vector can be similarly partitioned such that
w = (w_b, w_v, w_u), where w_v = (q − 1) 1_v, w_u = q 1_u, and w_b ∈ (q − 1, q)^p. The dual
constraints can now be rewritten as

    X^T w = B^T w_b + N_v^T w_v + N_u^T w_u = 0_p,    (2.5)

and w_b can be computed directly:

    w_b = −(B^T)^{−1} N_v^T w_v − (B^T)^{−1} N_u^T w_u.    (2.6)
The objective coefficient vector y can also be partitioned into y = (y_b, y_v, y_u)^T, and
(2.6) can be substituted into the objective function, yielding

    y^T w = y_b^T w_b + y_v^T w_v + y_u^T w_u
          = −y_b^T (B^T)^{−1} N_v^T w_v − y_b^T (B^T)^{−1} N_u^T w_u + y_v^T w_v + y_u^T w_u
          = (y_v^T − y_b^T (B^T)^{−1} N_v^T) w_v + (y_u^T − y_b^T (B^T)^{−1} N_u^T) w_u.    (2.7)
These partitions reveal a unique attribute of the dual LP, namely that for any n
and p, w_v is a vector whose indices correspond to negative residuals, w_u is a vector
whose indices correspond to positive residuals, and w_b is a vector whose indices
correspond to the observations used to fit the regression hyperplane (zero residuals).
This implies that the positive and negative residuals can approach ∞ and
−∞, respectively, without changing the solution. That is, the dual LP is concerned
not with the magnitudes of the residuals, but rather with the side of the regression
hyperplane on which each observation lies [11]. The dual LP can therefore be
expressed in terms of the nonbasic observations,

    max_{w_b ∈ (q−1,q)^p}  (q − 1)(y_v^T − y_b^T (B^T)^{−1} N_v^T) 1_v + q (y_u^T − y_b^T (B^T)^{−1} N_u^T) 1_u    (2.8)

subject to

    w_b = (1 − q)(B^T)^{−1} N_v^T 1_v − q (B^T)^{−1} N_u^T 1_u,
where dual feasibility is satisfied, and optimality achieved, only when the basis vector
lies strictly within the bounds. Additional characteristics of the QRMEP can be
obtained by partitioning (1.4). Since u_j = v_j = 0 for all basic observations, (1.4)
can be rewritten as

    min_{b ∈ R^p, u_u ≥ 0_u, v_v ≥ 0_v}  (1 − q) v_v^T 1_v + q u_u^T 1_u    (2.9)

subject to

    B b = y_b
    N_v b − v_v = y_v
    N_u b + u_u = y_u.
The basis matrix B is nonsingular, so the exact-fit property can be used to determine
the model parameters: b = B^{−1} y_b. With the model coefficients computed, the
components of the residual vector, u_u and v_v, can also be obtained:

    v_v = N_v b − y_v = N_v B^{−1} y_b − y_v
    u_u = y_u − N_u b = y_u − N_u B^{−1} y_b.
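The partitioned quantities above can be computed mechanically once a basis is chosen. The following sketch is illustrative only (hypothetical basis, data of Example 3 in Section 2.2.1): it forms b = B^{−1} y_b, the residual components v_v and u_u, and the dual basic subvector w_b from (2.6), then confirms that the primal objective of (2.9) and the dual objective y^T w coincide.

```python
import numpy as np

# Sketch of the partition X = (B; Nv; Nu) for a hypothetical basis
# (the Example 3 data, basic observations 1 and 3).
q = 0.5
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]])
y = np.array([1.0, 1.0, 2.0, 3.0, 2.0])
basic = [0, 2]

B, yb = X[basic], y[basic]
b = np.linalg.solve(B, yb)                 # exact fit: b = B^{-1} y_b

r = y - X @ b
below = [j for j in range(len(y)) if j not in basic and r[j] < 0]
above = [j for j in range(len(y)) if j not in basic and r[j] > 0]
Nv, Nu = X[below], X[above]

vv = Nv @ b - y[below]                     # magnitudes of negative residuals
uu = y[above] - Nu @ b                     # magnitudes of positive residuals

wv = (q - 1.0) * np.ones(len(below))       # w_v = (q - 1) 1_v
wu = q * np.ones(len(above))               # w_u = q 1_u
wb = -np.linalg.solve(B.T, Nv.T @ wv + Nu.T @ wu)   # eq. (2.6)

primal = (1 - q) * vv.sum() + q * uu.sum()            # objective of (2.9)
dual = y[basic] @ wb + y[below] @ wv + y[above] @ wu  # y^T w
```

The equal primal and dual objective values anticipate the zero-duality-gap statement below.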
The duality gap [4] is zero at optimality, so

    (1 − q) v_v^T 1_v + q u_u^T 1_u = (y_v^T − y_b^T (B^T)^{−1} N_v^T) w_v + (y_u^T − y_b^T (B^T)^{−1} N_u^T) w_u.
Even these rewrites of the primal and dual LPs still do not take into account the
cardinality range property for a specific q, so a solution can satisfy primal feasibility
without satisfying (2.3) and (2.4).
Koenker and Bassett [32] present regression quantiles for linear models as natu-
ral extensions of the order statistics of a single sample. That is, the residuals are the
order statistics in the quantile regression model. Nonparametric statistics in loca-
tion models can also be computed based on the rankings of the sample observations,
and the concept of ranking observations was extended to the quantile regression class
of models in 1992, when Gutenbrunner et al. defined regression rank scores [25].
Consider the translation a_j = w_j + 1 − q, which shifts the boundaries on the dual
decision variables such that the translated dual LP [38] is

    max_{a ∈ [0,1]^n}  y^T a    (2.10)

subject to

    X^T a = (1 − q) X^T 1_n,
where the solution a is a vector of regression rank scores [25]. This equivalent form of
the dual was first presented by Koenker and Bassett [32]. One distinct advantage of
using (2.10) is that, unlike the standard form of (1.8), any translated dual solution
has identical bounds, a ∈ [0,1]^n, regardless of the given quantile. The feasible
region of (2.10), however, is dependent on the quantile. Conversely, the feasible
region of (1.8) holds for any quantile, while the bounds on w vary by quantile. This
translated form can also be expressed in terms of the nonbasic variables, using the
same partitioning of the design matrix and response vector as was done in (2.8). Let
a = (a_b, a_v, a_u)^T, where a_v = 0_v, a_u = 1_u, and a_b ∈ (0,1)^p such that

    X^T a = (1 − q) X^T 1_n    (2.11)
    B^T a_b + N_v^T a_v + N_u^T a_u = (1 − q) B^T 1_p + (1 − q) N_v^T 1_v + (1 − q) N_u^T 1_u
    B^T a_b + (q − 1) B^T 1_p = (1 − q) N_v^T 1_v − q N_u^T 1_u
    B^T a_b = (1 − q) B^T 1_p + (1 − q) N_v^T 1_v − q N_u^T 1_u.
Since a_v is a zero vector, the objective function simplifies to

    y^T a = y_b^T a_b + y_u^T 1_u    (2.12)
          = (1 − q) y_b^T 1_p + (1 − q) y_b^T (B^{−1})^T N_v^T 1_v − q y_b^T (B^{−1})^T N_u^T 1_u + y_u^T 1_u
          = (1 − q) y_b^T 1_p + (1 − q) y_b^T (B^{−1})^T N_v^T 1_v + (y_u^T − q y_b^T (B^{−1})^T N_u^T) 1_u,

which leads to

    max_{a_b ∈ (0,1)^p}  (1 − q) y_b^T 1_p + (1 − q) y_b^T (B^{−1})^T N_v^T 1_v + (y_u^T − q y_b^T (B^{−1})^T N_u^T) 1_u    (2.13)

subject to

    a_b = (1 − q) 1_p + (1 − q) (B^{−1})^T N_v^T 1_v − q (B^{−1})^T N_u^T 1_u.
These alternative formulations, especially (2.8), exhibit features unique to the
QRMEP and are useful for solving LPs with bounded variables. Chapter 3 shows
how these features are exploited.
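The effect of the translation is easy to verify numerically: any dual-feasible w for (1.8) maps to rank scores a = w + (1 − q) 1_n that lie in [0, 1]^n and satisfy the constraint of (2.10). The sketch below uses hypothetical values built on the Example 3 data of Section 2.2.1 (basic observations 1 and 3) and is illustrative only.

```python
import numpy as np

# Sketch of the rank-score translation a_j = w_j + 1 - q at q = 1/2.
q = 0.5
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]])
n = X.shape[0]

# A dual-feasible w for (1.8): q - 1 below the hyperplane, q above it,
# and values within [q - 1, q] on the two basic observations
w = np.array([0.0, q - 1.0, 0.5, q, q - 1.0])
assert np.allclose(X.T @ w, 0.0)           # w satisfies X^T w = 0_p

a = w + (1.0 - q)                          # translated dual (rank scores)
in_unit_box = bool(np.all((a >= 0.0) & (a <= 1.0)))
constraint_ok = bool(np.allclose(X.T @ a, (1.0 - q) * X.T @ np.ones(n)))
```

Because X^T w = 0_p, adding (1 − q) to every component shifts the right-hand side to (1 − q) X^T 1_n exactly, independent of the data.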
2.2 Pivoting Methods
2.2.1 Barrodale-Roberts Algorithm. Although quantile regression model
coefficients are obtained via the solution to an LP, the unique properties of regression
quantiles eliminate the classic simplex algorithm as a viable solution method. In
1973, Barrodale and Roberts [1] introduced a method, which this research calls the
Barrodale-Roberts algorithm, for the l1-approximation problem; it modifies the
simplex method to take advantage of the special structure of the conditional
median LP. Since q = (1 − q) for median regression, the Barrodale-Roberts
formulation of the l1-approximation problem as an LP differs from (1.4) in two ways:
the objective function weights (1 − q) and q are removed, and the vector of model
coefficients is rewritten as the difference between two nonnegative vectors. Since
median regression is a special case of the QRMEP, (1.4) becomes

    min_{b^+ ∈ R^p, b^− ∈ R^p, u ≥ 0_n, v ≥ 0_n}  u^T 1_n + v^T 1_n

subject to

    X b^+ − X b^− + u − v = y.
Each iteration involves estimating one of the model parameters, so one should expect
to perform at least p pivots in the tableau. Not all columns need to be displayed in
the tableau, and an initial basic feasible solution is readily available for any problem,
namely by letting all observations be basic. That is, for each observation,
    y_j − 0 = u_j
    x_j (b^+ − b^−) = 0
    b_0^+ + x_j b_1^+ − b_0^− − x_j b_1^− = 0

and the initial basis consists of all positive residuals. The initial tableau [1] takes
the form

    Costs →                     0         0         1_n^T    −1_n^T
    Basis ↓        RHS          (b^+)^T   (b^−)^T   u^T      v^T
    1_n    u       y            X         −X        I_n      −I_n
    Marginal Costs →  y^T 1_n   1_n^T X   −1_n^T X  0        1_n^T
The algorithm consists of two stages. Stage 1 is a maximal selection from
among the columns of X or −X [18]. Specifically, Stage 1 chooses the nonbasic
variable with the largest nonnegative marginal cost to enter the basis. Let c = 1_n^T X
be a (1 × p) row vector, and let

    k = arg max_j { c_j, −c_j }

denote the index of the model coefficient selected to enter the basis, where
0 ≤ k ≤ (p − 1). Once k is identified, a basic u_j must be chosen to leave the basis. This is
(p− 1). Once k is identified, a basic uj must be chosen to leave the basis. This is
accomplished by a sequence of three steps. First, candidate slopes must be computed
for each observation. This means that if the jth observation defines the regression
hyperplane, then there exists a unique multiplier b(j)k = b
(j)+k − b(j)−
k such that
yj − xj(b
(j)+k − b(j)−
k
)= 0
b(j)+k − b(j)−
k =yjxj,
where b(j)k denotes the jth candidate slope. The residual vector and objective func-
tion are computed for each candidate slope, and the b(j)k that minimizes uT1n+vT1n
is identified. The row in the tableau for which yj − xjb(j)k = 0 is the pivot row, and
uj leaves the basis. This sequence continues until p observations have been selected
to define the regression hyperplane.
Stage 2 consists of exchanging nonbasic u_j, v_j with basic u_j, v_j. That is,
columns of I_n or −I_n are interchanged to complete the optimal basis [18]. As in
Stage 1, the nonbasic variable having maximum marginal cost is selected to enter
the basis, and the basic variable which minimizes the objective function is selected
to leave the basis. If a residual becomes negative, then v_j replaces u_j in the basis.
The following is a reproduction of an example given in [1].
Example 3 Estimate the median regression model for the following set of 5 observations [1]:

    y = (1, 1, 2, 3, 2)^T

    X = [ 1  1
          1  2
          1  3
          1  4
          1  5 ]
The initial tableau takes the form

    Cost   Basis   r    b_0^+   b_1^+
    1      u_1     1    1       1
    1      u_2     1    1       2
    1      u_3     2    1       3
    1      u_4     3    1       4
    1      u_5     2    1       5

Clearly, b_1^+ has the maximum marginal cost at Σ_{j=1}^{5} x_j = 15. Compute the candidate
slopes for b_1^+, where b_1^{(j)+} = y_j / x_j. Compute the residual vector for each b_1^{(j)+}.
Evaluate the objective function for each candidate slope, and choose the minimizing
value.
    b_1^{(j)+}                       1      1/2     2/3     3/4     2/5
    r^{(j)} = y − x b_1^{(j)+}       0      1/2     1/3     1/4     3/5
                                    −1      0      −1/3    −1/2    1/5
                                    −1      1/2     0      −1/4    4/5
                                    −1      1       1/3     0      7/5
                                    −3     −1/2    −4/3    −7/4    0
    u^T 1_n + v^T 1_n               9/2     7/8     17/12   31/16   3/4
The minimizer is b_1^{(5)+} = 2/5, but the cardinality range property is satisfied only by
b_1^{(3)+} = 2/3. Pivot on the third row of the tableau such that b_1^+ replaces u_3 in the
basis.

    Cost   Basis    r^{(j)}   b_0^+   u_3
    1      u_1      1/3       2/3     −1/3
    −1     v_2      1/3       −1/3    2/3
    0      b_1^+    2/3       1/3     1/3
    1      u_4      1/3       −1/3    −4/3
    −1     v_5      4/3       2/3     5/3
Compute candidate slopes, residuals, and objective function values for b_0^+.

    b_0^{(j)+}               1/2     −1      2      −1      2
    y − x b_0^{(j)+}          0       1      −1      1      −1
                            −1/2      0      −1      0      −1
                             1/2      1       0      1       0
                             1/2      0       1      0       1
                            −1       −2       0     −2       0
    u^T 1_n + v^T 1_n        5/2      4       3      4       3

The minimizer is b_0^{(1)+} = 1/2, and b_0^+ replaces u_1 in the basis.

    Cost   Basis    RHS    u_1    u_3
    0      b_0^+    1/2    3/2    −1/2
    −1     v_2      1/2    1/2    1/2
    0      b_1^+    1/2    −1/2   1/2
    1      u_4      1/2    1/2    −3/2
    −1     v_5      1      −1     2
The two observations used to fit the regression line are (x_1, y_1) and (x_3, y_3), and the
parameter vector (optimal solution) is b = (1/2, 1/2)^T.
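The tableau result can be cross-checked without pivoting: by the exact-fit property, the optimal median line passes through p = 2 of the observations, so enumerating all 2-subsets and scoring u^T 1_n + v^T 1_n = Σ_j |y_j − x_j b| must recover the optimum. The sketch below (illustrative, plain NumPy) confirms that b = (1/2, 1/2)^T attains the minimum objective of 2; LAD fits need not be unique, so other subsets may tie.

```python
import numpy as np
from itertools import combinations

# Tableau-free cross-check of Example 3: score every 2-subset fit by its
# l1 objective; the exact-fit property guarantees the optimum is among them.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]])
y = np.array([1.0, 1.0, 2.0, 3.0, 2.0])

# All x_j are distinct, so every 2-subset submatrix X_h is nonsingular
fits = [np.linalg.solve(X[list(h)], y[list(h)])
        for h in combinations(range(5), 2)]
objs = [np.abs(y - X @ b).sum() for b in fits]       # u^T 1 + v^T 1
best = min(objs)
best_fits = [b for b, o in zip(fits, objs) if abs(o - best) < 1e-9]
```

Any tying subset illustrates the degeneracy noted in Section 2.1.1: more than one p-subset can solve the QRMEP at a given quantile.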
Although the modified simplex method in [1] was developed exclusively for
median regression, the next section presents a generalized algorithm applicable to
any quantile [33].
2.2.2 Koenker-d'Orey Algorithm. Koenker and d'Orey [33] extended the
method in [1] to any probability q ∈ (0, 1), so it is referred to in this research as the
Koenker-d'Orey algorithm. The structure of (2.8) is unique in that the basis vector
is defined exclusively by the nonbasic rows of X. If the partitioning of the design
matrix is relaxed such that N denotes an (n − p) × p matrix containing all nonbasic
rows of X, then (2.6) becomes

    w_b = −(B^T)^{−1} N^T w_N,

where N^T w_N = N_v^T w_v + N_u^T w_u and

    (q − 1) 1_p ≤ −(B^T)^{−1} N^T w_N ≤ q 1_p,
which can be rewritten in primal space according to the subgradient condition [32]

    (q − 1) 1_p ≤ Σ_{j ∈ h̄} (B^T)^{−1} x_j^T (1/2 − (1/2) sgn(y_j − x_j b) − q) ≤ q 1_p,    (2.14)
where sgn(r_j) = 1 if r_j > 0, and sgn(r_j) = −1 if r_j < 0. The p-vector b of model
coefficients is optimal at q if and only if it satisfies the subgradient condition [33].
Bassett showed in [2] that (2.14) can be rewritten in matrix notation as

    (q − 1) 1_p ≤ (B^T)^{−1} N^T ((1/2 − q) I_{n−p} − (1/2) I_{n−p}^{(r)}) 1_{n−p} ≤ q 1_p,    (2.15)
where I_{n−p} is an identity matrix of size (n − p), and I_{n−p}^{(r)} is a diagonal matrix of
size (n − p) whose jth diagonal element is sgn(r_j) = sgn(y_j − x_j b). Applying the
transformation from (2.10) yields

    0_p ≤ (1 − q)(B^T)^{−1} X^T 1_n − (B^T)^{−1} N^T ((1/2) I_{n−p} + (1/2) I_{n−p}^{(r)}) 1_{n−p} ≤ 1_p.    (2.16)
As long as (2.16) is satisfied, b is optimal for a specific range of q ∈ (0, 1). Suppose
that at some iteration t ≥ 1, b^{(t)} solves (1.4) uniquely for some fixed q_t ∈ (0, 1) and
specified basis (p-subset) h ∈ H. Let q denote the quantile of interest and assume
that q_t < q. Iteration t of the Koenker-d'Orey algorithm consists of determining the
least upper bound q_{t+1} > q_t, also called the breakpoint [38], at which b^{(t)} ceases to
be optimal. The algorithm computes these breakpoints by executing line searches of
the form b^{(t)} + δ d_k, where δ denotes the step size, k is the index of the basic variable
selected to leave the basis, and the search direction d_k is the kth column of B^{−1}. The
dual counterpart of the line search is obtained from (2.10). Specifically, the equation in
(2.13) for the translated dual basic vector a_b can be rewritten such that the optimal
a_b^{(t)} at iteration t satisfies the double inequality

    0_p ≤ (1 − q)(B^T)^{−1} X^T 1_n − (B^T)^{−1} N^T a_N^{(t)} ≤ 1_p,    (2.17)
which can be further decomposed into two p-vectors

    f = (B^T)^{−1} (X^T 1_n − N^T a_N^{(t)})
    g = (B^T)^{−1} X^T 1_n

such that

    0_p ≤ f − q g ≤ 1_p.    (2.18)

If the current basis does not satisfy (2.17), then the next breakpoint must be
computed because at least one a_j^{(t)} ∈ a_b^{(t)} is dual infeasible. Either a_j^{(t)} < 0 or a_j^{(t)} > 1
for at least one j ∈ h, so the index of the leaving basic variable corresponds to the
most negative element of the set

    k = arg min_{j ∈ h} { −a_j^{(t)}, a_j^{(t)} − 1 }.

The leaving variable a_k^{(t)} ∈ a_b^{(t)} becomes nonbasic such that either f_k − q_{t+1} g_k = 0
or f_k − q_{t+1} g_k = 1, and it is desirable to find the largest breakpoint which does not
exceed the target quantile. That is,

    q_{t+1} = max { f_k / g_k, (f_k − 1) / g_k : q_t < q_{t+1} ≤ q }.
The boundary to which a_k^{(t)} is driven determines the direction of movement along
b^{(t)} + δ d_k in order to bring a_k^{(t)} into dual feasibility. Let σ denote the direction of
movement such that

    σ = 1,  if q_{t+1} = f_k / g_k
    σ = −1, if q_{t+1} = (f_k − 1) / g_k.

The next task is finding a nonbasic variable to enter the basis, which occurs when
a nonbasic residual is driven to zero. The new residual vector is given by r^{(t+1)} =
y − X(b^{(t)} + δ d_k) = y − X b^{(t)} − δ X d_k = r^{(t)} − δ X d_k, so the index m of the
entering (blocking) variable is determined by the smallest positive step size such
that a nonbasic residual becomes zero,

    m = arg min_{j ∈ h̄} { δ_j = r_j^{(t)} / (σ x_j d_k) > 0 }.

The mth row of the design matrix, x_m, replaces x_k in the basis matrix B, the new
vector of model coefficients is computed as b^{(t+1)} = b^{(t)} + δ_m d_k, t is incremented,
and the next iteration begins.
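The optimality interval that the breakpoints q_{t+1} delimit can be read directly off (2.18): each component j with g_j ≠ 0 restricts q to lie between (f_j − 1)/g_j and f_j/g_j. The sketch below is illustrative only, reusing the hypothetical basis of observations 1 and 3 on the Example 3 data; it is not the Koenker-d'Orey implementation.

```python
import numpy as np

# Sketch of the optimality interval behind the breakpoints: a basis stays
# optimal for every q with 0 <= f - q g <= 1 componentwise (eq. 2.18).
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]])
y = np.array([1.0, 1.0, 2.0, 3.0, 2.0])
n, p = X.shape

basic = [0, 2]                             # hypothetical basis {obs 1, obs 3}
B = X[basic]
nb = [j for j in range(n) if j not in basic]
r = y[nb] - X[nb] @ np.linalg.solve(B, y[basic])
aN = np.where(r > 0, 1.0, 0.0)             # translated duals: 1 above, 0 below

f = np.linalg.solve(B.T, X.T @ np.ones(n) - X[nb].T @ aN)
g = np.linalg.solve(B.T, X.T @ np.ones(n))

# Intersect the componentwise restrictions on q implied by 0 <= f - q g <= 1
lo, hi = 0.0, 1.0
for fj, gj in zip(f, g):
    if gj > 1e-12:
        lo, hi = max(lo, (fj - 1.0) / gj), min(hi, fj / gj)
    elif gj < -1e-12:
        lo, hi = max(lo, fj / gj), min(hi, (fj - 1.0) / gj)
    # g_j == 0: the component constrains f_j alone, not q
```

For this basis and sample the interval evaluates to q ∈ [1/2, 7/10]: the basis that fits the median remains optimal until q exceeds 0.7, at which point a breakpoint pivot of the kind described above is required.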
2.2.3 Interval-Linear Programming. In 1969, Robers and Ben-Israel [56]
presented a method for solving the median regression problem called interval-linear
programming (I-LP). The algorithm proceeds from (1.8) on the interval w ∈ [−1, 1]^n,
and each iteration consists of solving a decomposition of (1.8). The I-LP method
solves problems of the form

    max_{w ∈ [−1,1]^n}  y^T w    (2.19)

subject to

    d^− ≤ A w ≤ d^+,

where y, A, d^−, and d^+ are known. It follows that (1.8) can be rewritten as
(2.19) by replacing the equality X^T w = 0_p with the equivalent pair of inequalities
X^T w ≥ 0_p and X^T w ≤ 0_p. By
augmenting matrices and vectors, (1.8) becomes

    max_{w ∈ [−1,1]^n}  y^T w    (2.20)

subject to

    (0_p; −1_n) ≤ (X^T; I_n) w ≤ (0_p; 1_n),

where d^− = (0_p; −1_n), d^+ = (0_p; 1_n), and A = (X^T; I_n).
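Assembling the I-LP data of (2.20) is mechanical, and the first relaxation already shows the method's character: with A^{(1)} = I_n the maximizer of y^T w over the box is simply w = sgn(y), which then violates the ignored rows 0_p ≤ X^T w ≤ 0_p. A sketch on illustrative data (the Example 3 sample of Section 2.2.1):

```python
import numpy as np

# Sketch of the I-LP data of (2.20): stack X^T over I_n so the equality
# X^T w = 0_p and the box w in [-1, 1]^n both appear as interval constraints
# d^- <= A w <= d^+.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]])
y = np.array([1.0, 1.0, 2.0, 3.0, 2.0])
n, p = X.shape

A = np.vstack([X.T, np.eye(n)])
d_minus = np.concatenate([np.zeros(p), -np.ones(n)])
d_plus = np.concatenate([np.zeros(p), np.ones(n)])

# First relaxation (t = 1): only -1 <= w <= 1 is enforced, so the maximizer
# of y^T w is w = sgn(y).  The ignored rows 0 <= X^T w <= 0 are then
# violated, which is what drives the subsequent I-LP iterations.
w0 = np.sign(y)
violation = X.T @ w0        # would be 0_p at optimality; it is not yet
```

The nonzero `violation` vector is precisely the infeasibility that the quantity Δ in (2.22) measures one constraint at a time.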
Simplex methods pivot from one basic feasible solution to another, so they
can be classified as special cases of pivoting algorithms. I-LP, when applied to the
QRMEP, operates exclusively in the dual space generated by (1.8), where only one
solution is dual feasible. Therefore, I-LP can be classified only as a pivoting
algorithm, because it pivots among dual infeasible solutions. The decomposed problem
takes the form

    max_{w ∈ [−1,1]^n}  y^T w    (2.21)
subject to

d^(t)− ≤ A^(t)w ≤ d^(t)+
d_s^(t)− ≤ a_s^(t)w ≤ d_s^(t)+,

where A^(t) is a coefficient matrix from a set of n constraints chosen from A such that A^(t) is nonsingular, a_s^(t) is a coefficient vector from a single constraint chosen from the remaining p constraints, and t ≥ 1 denotes the current iteration. For t ≥ 1, let w^(t−1) be the maximizer of y^T w, subject only to d^(t)− ≤ A^(t)w ≤ d^(t)+, and let w^(t) be the optimal solution to (2.21). For t = 1, notice that (2.21) is easily formulated by letting d^(1)− = −1_n, d^(1)+ = 1_n, and A^(1) = I_n. The remaining single constraint is therefore chosen from 0_p ≤ X^T w ≤ 0_p, say 0 ≤ 1_n^T w ≤ 0. Once the solution w^(t−1) to max{y^T w : d^(t)− ≤ A^(t)w ≤ d^(t)+} is obtained, it must be substituted into the single constraint. If w^(t−1) satisfies d_s^(t)− ≤ a_s^(t)w^(t−1) ≤ d_s^(t)+, then w^(t) = w^(t−1). The constraints not included in (2.21) are then checked for feasibility. If w^(t−1) satisfies all constraints in (2.19), then the solution is optimal. Otherwise, the quantity a_s^(t)w^(t−1) is either less than its lower bound or greater than its upper bound. Let the amount by which w^(t−1) fails to satisfy d_s^(t)− ≤ a_s^(t)w^(t−1) ≤ d_s^(t)+ be denoted by

∆ = { a_s^(t)w^(t−1) − d_s^(t)+, if a_s^(t)w^(t−1) > d_s^(t)+,
    { a_s^(t)w^(t−1) − d_s^(t)−, if a_s^(t)w^(t−1) < d_s^(t)−.   (2.22)

It follows that moving a_s^(t)w^(t−1) into feasibility requires changing one or more elements of A^(t)w^(t−1), but such changes cannot affect the feasibility of A^(t)w^(t−1). Let γ_j represent the marginal cost of changing the jth element of A^(t)w^(t−1), where

γ_j = [ (y^T (A^(t))^(−1))_j / (a_s^(t) (A^(t))^(−1))_j ] sgn ∆   (2.23)
for all (a_s^(t) (A^(t))^(−1))_j ≠ 0. Let m ≤ n be the number of nonnegative marginal costs, and sort all γ_j ≥ 0 from smallest to largest. Define Q as the set of indices corresponding to the sorted marginal costs,

Q = { j_k : 1 ≤ k ≤ m, (a_s^(t) (A^(t))^(−1))_{j_k} ≠ 0, γ_{j_k} ≥ 0 }   (2.24)

where 1 ≤ j ≤ n. For each j_k ∈ Q, the distance from each element (A^(t)w^(t−1))_{j_k} to its closer boundary is determined by

δ_{j_k} = { (d^(t)− − A^(t)w^(t−1))_{j_k}, if sgn ∆ = sgn (a_s^(t) (A^(t))^(−1))_{j_k},
          { (d^(t)+ − A^(t)w^(t−1))_{j_k}, if sgn ∆ = − sgn (a_s^(t) (A^(t))^(−1))_{j_k}.   (2.25)
The index of the element of w^(t−1) that will become basic, along with the associated relative cost, is obtained by

j_r = min{ j_k : 1 ≤ k ≤ m, | Σ_{j_k∈Q} δ_{j_k} (a_s^(t) (A^(t))^(−1))_{j_k} | ≥ |∆| }   (2.26)

and

θ = [ −∆ − Σ_{j_k=j_1}^{j_(r−1)} δ_{j_k} (a_s^(t) (A^(t))^(−1))_{j_k} ] / (a_s^(t) (A^(t))^(−1))_{j_r}.   (2.27)

Now w^(t) is computed as

w^(t) = w^(t−1) + (A^(t))^(−1) ( Σ_{j_k=j_1}^{j_(r−1)} δ_{j_k} e_{j_k} + θ e_{j_r} ),

where e_{j_k} denotes an n-vector of zeros with a one in the j_k-th position. If w^(t) also satisfies the constraint(s) excluded from (2.21), then w^(t) is the optimal solution to (2.19). Otherwise, the j_r-th constraint is removed from d^(t)− ≤ A^(t)w ≤ d^(t)+ and replaced by d_s^(t)− ≤ a_s^(t)w ≤ d_s^(t)+, and the constraint set becomes d^(t+1)− ≤ A^(t+1)w ≤ d^(t+1)+. The new single constraint d_s^(t+1)− ≤ a_s^(t+1)w ≤ d_s^(t+1)+ is taken as any constraint from (2.20) not satisfied by w^(t), and the next iteration begins.
I-LP can be sufficiently demonstrated by working an example from the Cars93 data set [19]. This data set consists of information on vehicle sales in the United States for the 1993 model year.

Example 4 A sample extracted from Cars93 uses the mean retail price (response) and horsepower (regressor) variables for all vehicle models sold by Ford Motor Company in 1993,

y = (7.4, 10.1, 11.3, 15.9, 19.9, 14, 20.2, 20.9)^T

X = [ 1   1   1   1   1   1   1   1
      63  127 96  105 145 115 140 190 ]^T

where n = 8 and p = 2. For t = 1, the relaxed problem is

max_{w∈[−1,1]^n} y^T w

subject to

−1_n ≤ I_n w ≤ 1_n
0 ≤ 1_n^T w ≤ 0,

where A^(1) = I_n, a_s^(1) = 1_n^T, d^(1)− = −1_n, d^(1)+ = 1_n, and d_s^(1)− = d_s^(1)+ = 0. The optimal solution to max{y^T w : −1_n ≤ I_n w ≤ 1_n} is obviously w^(0) = 1_n, but a_s^(1)w^(0) = 1_n^T w^(0) = n is positive and does not satisfy 0 ≤ 1_n^T w ≤ 0. Let ∆ = n = 8 and γ = y be the vector of marginal costs. Sorting the elements of γ yields Q = {1, 2, 3, 6, 4, 5, 7, 8}. All elements of a_s^(1)(A^(1))^(−1) = 1_n^T I_n are of the same sign as ∆, so δ_{j_k} = (d^(1)− − A^(1)w^(0))_{j_k} = (−1_n − I_n w^(0))_{j_k} = −2 for all j_k ∈ Q. Notice
that |δ_1(1_n^T I_n)_1 + δ_2(1_n^T I_n)_2 + δ_3(1_n^T I_n)_3 + δ_6(1_n^T I_n)_6| = |−2 − 2 − 2 − 2| = 8, so

j_r = min{ j_k : 1 ≤ k ≤ 8, | Σ_{j_k∈Q} δ_{j_k}(1_n^T I_n)_{j_k} | ≥ 8 } = j_4 = 6

and

θ = [ −8 − Σ_{j_k=j_1}^{j_3} δ_{j_k}(1_n^T I_n)_{j_k} ] / (1_n^T I_n)_{j_4} = (−8 − (−2 − 2 − 2)) / 1 = −2,

which leads to

w^(1) = w^(0) + I_n ( Σ_{j_k=j_1}^{j_3} δ_{j_k} e_{j_k} − 2e_{j_r} ) = 1_n + (−2e_1 − 2e_2 − 2e_3 − 2e_6)
      = (−1, −1, −1, 1, 1, −1, 1, 1)^T.
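This t = 1 pivot can be sketched in pure Python, exploiting the fact that A^(1) = I_n (so the inverse is trivial) and a_s^(1) = 1_n^T; variable names are illustrative:

```python
# First I-LP pivot on the Ford subset of Cars93, with A = I_n and a_s = 1_n^T.
y = [7.4, 10.1, 11.3, 15.9, 19.9, 14, 20.2, 20.9]
n = len(y)
w = [1.0] * n                        # maximizer of y^T w over the box [-1, 1]^n
delta_viol = sum(w)                  # Delta: violation of 0 <= 1^T w <= 0

# Marginal costs (2.23): gamma_j = y_j * sgn(Delta), since a_s (A)^{-1} = 1^T.
gamma = [yj * (1 if delta_viol > 0 else -1) for yj in y]
Q = sorted(range(n), key=lambda j: gamma[j])   # indices sorted by cost

# Walk the sorted indices, moving each w_j to its closer bound (step -2),
# until the accumulated change covers |Delta| (2.26); the last index is j_r,
# which takes the partial step theta from (2.27).
covered, steps = 0.0, {}
for j in Q:
    need = abs(delta_viol) - abs(covered)
    step = -2.0                      # distance to the lower bound -1
    if abs(step) >= need:
        steps[j] = -need if delta_viol > 0 else need
        break
    steps[j] = step
    covered += step

for j, s in steps.items():
    w[j] += s

# w is now (-1, -1, -1, 1, 1, -1, 1, 1)^T, matching the example.
```

The four cheapest indices (the cheapest responses to change) absorb the entire violation, reproducing w^(1) above.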
Since X_1^T w^(1) = 179 ≠ 0, the 6th constraint −1 ≤ w_6 ≤ 1 is replaced by 0 ≤ 1_n^T w ≤ 0. For t = 2,

A^(2) = [ 1 0 0 0 0 0 0 0
          0 1 0 0 0 0 0 0
          0 0 1 0 0 0 0 0
          0 0 0 1 0 0 0 0
          0 0 0 0 1 0 0 0
          1 1 1 1 1 1 1 1
          0 0 0 0 0 0 1 0
          0 0 0 0 0 0 0 1 ],

a_s^(2) = X_1^T,
d^(2)− = −1_n + e_6,
d^(2)+ = 1_n − e_6,
d_s^(2)− = d_s^(2)+ = 0,
∆ = 179,
γ = (0.1269, −0.3250, 0.1421, −0.1900, 0.1967, 0.1217, 0.2480, 0.0920)^T,
Q = {8, 6, 1, 3, 5, 7},
δ = (δ_8, δ_6, δ_1, δ_3, δ_5, δ_7)^T = (−2, 0, 2, 2, −2, −2)^T,

j_r = min{ j_k : 1 ≤ k ≤ 6, | Σ_{j_k∈Q} δ_{j_k} (X_1^T (A^(2))^(−1))_{j_k} | ≥ 179 } = 1,

θ = [ −179 − Σ_{j_k=j_1}^{j_2} δ_{j_k} (X_1^T (A^(2))^(−1))_{j_k} ] / (X_1^T (A^(2))^(−1))_{j_3} = 0.2788,

w^(2) = w^(1) + (A^(2))^(−1) ( Σ_{j_k=j_1}^{j_2} δ_{j_k} e_{j_k} + θ e_{j_3} )
      = (−0.7212, −1, −1, 1, 1, 0.7212, 1, −1)^T.
Since X^T w^(2) = 0 and w^(2) ∈ [−1, 1]^8, the vector is optimal to (2.20) with z = y^T w^(2) = 18.4596.

Robers and Ben-Israel [56] claim a computational efficiency advantage over the simplex method when applied to the l1-approximation problem. The algorithm is shown to extend easily to any interval [d−, d+] in [57], and numerical results to support the efficiency claim in [56] are also given. Since I-LP was developed to estimate only the conditional median, a major component of this research involves extending I-LP to the entire class of QRMEPs. This extension is presented in Chapter 3.
2.2.4 Dual Simplex Method for Bounded Variables. The duality properties of the QRMEP prevent the direct implementation of the simplex method to (1.8). To be more specific, a primal simplex algorithm cannot be used because it requires pivoting from one primal basic feasible solution to another until dual feasibility is satisfied. In the case of the QRMEP, (1.8) would have to be treated as the primal LP, and the goal would be to satisfy primal feasibility. It has been previously established, however, that any basis of size p satisfies primal feasibility for the QRMEP. Therefore, implementing an algorithm which proceeds from (1.8) requires a dual approach, namely the dual simplex method.

The dual simplex algorithm pivots from one primal feasible solution to another, as does the Koenker-d'Orey algorithm, while using dual space properties to update the solution at each iteration, so the initial basis need not be dual feasible [4]. Unlike the Koenker-d'Orey algorithm, however, the dual simplex method pivots to an adjacent vertex. The previous discussion on bounded simplex also established that any nonbasic variable in (1.8) is fixed at one of its bounds, which is additional confirmation that the dual simplex method is an appropriate implementation [39].
The standard dual simplex method for bounded variables is designed to solve problems whose general form is max_x {c^T x : Ax = b, x ∈ [l, u]^n}, where c and x are n-vectors, A is (m × n) with full row rank, b is an m-vector, and l, u ∈ R are the lower and upper bounds, respectively. Clearly, (1.8) adheres to this general form, as m = p, c = y, x = w, A = X^T, b = 0_p, l = (q − 1), and u = q. In 2002, Kostina [39] presented the steps of what can be called the short-step dual simplex method, the details of which are reproduced here in the context of the QRMEP. The algorithm is initialized by selecting from the design matrix a starting basis, denoted by the (p × p) matrix B, which is not necessarily dual feasible [4]. In fact, the duality properties of the QRMEP all but guarantee that the initial basis will be dual infeasible, unless the optimal basis is chosen by happenstance. Once a starting basis is selected, the exact-fit property is used to estimate the model parameters. That is, solve the equation Bb = y_b for b. Let N be the ((n − p) × p) matrix of nonbasic observations. The following steps constitute a single iteration of the short-step dual simplex method.
1. Compute the n-vector of reduced costs (i.e., the residual vector) as r = y − Xb. Let the triplet λ = (b, u, v) denote a feasible solution to (1.4), and partition the residual vector as r = (u; v) such that u_j = r_j, v_j = −r_j, and r = u − v. Ensure the numbers of nonzero elements in u and v satisfy the cardinality range property. Define search directions ∆r, ∆b and a step length σ ≥ 0 such that the residual vector for the next iteration takes the form r(σ) = r + σ∆r = y − Xb(σ), where b(σ) = b + σ∆b. Therefore,

r + σ∆r = y − X(b + σ∆b)
y − Xb + σ∆r = y − Xb − σX∆b
∆r = −X∆b.

2. Partition the dual solution vector as w = (w_b; w_r), where w_b denotes the vector of basic variables, and w_r denotes the vector of nonbasic variables corresponding to nonzero residuals. Fix the elements of w_r such that

w_r^(j) = { q − 1, if v_j > 0,
          { q,     if u_j > 0.

Compute the basic variables as w_b = −(B^T)^(−1) N^T w_r.

3. If w_b ∈ [q − 1, q]^p, then the current basis is optimal, and the algorithm terminates. Otherwise, one of the following inequalities holds for at least one i_k ∈ w_b: w_b^(i_k) < (q − 1) or w_b^(i_k) > q. Select the i_k-th variable to leave the basis, where k is the index of the most infeasible element of w_b. That is, for 1 ≤ k ≤ p,

k = max_{i_k} { q − 1 − w_b^(i_k), w_b^(i_k) − q }.

4. Solve Bt = e_k for t, where e_k denotes a p-vector of zeros with the kth element at unity. Compute ∆b = ∆r_{i_k} t, where

∆r_{i_k} = { −1, if w_b^(i_k) > q,
           { 1,  if w_b^(i_k) < (q − 1),

and compute ∆r = −X∆b = −∆r_{i_k} Xt.

5. Find the blocking variable by computing step lengths for each j ∈ N, where ∆r_j ≠ 0. Let σ_j be the step length for the jth nonbasic variable such that

σ_j = { −r_j/∆r_j, if r_j∆r_j < 0,
      { ∞, otherwise.

Select the minimum step length according to σ_h = min{σ_j}, where h is the index of the nonbasic variable chosen to enter the basis. If σ_h = ∞, then the problem is infeasible.

6. Update the basis and model parameters. That is,

b = b + σ_h∆b,
B ← (B \ x_k) ∪ x_h,
N ← (N \ x_h) ∪ x_k,

and return to Step 1 to begin the next iteration.
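The ratio test of Step 5 can be sketched in Python; a minimal sketch, assuming the nonbasic residuals r_j and direction components ∆r_j are given as plain lists (names are illustrative):

```python
import math

def blocking_variable(r, dr):
    """Dual simplex ratio test (Step 5): sigma_j = -r_j / dr_j where
    r_j * dr_j < 0, else infinity; returns (h, sigma_h)."""
    best_h, best_sigma = None, math.inf
    for j, (rj, drj) in enumerate(zip(r, dr)):
        if drj == 0 or rj * drj >= 0:
            continue  # residual moves away from zero: no blocking here
        sigma = -rj / drj
        if sigma < best_sigma:
            best_h, best_sigma = j, sigma
    return best_h, best_sigma  # sigma_h == inf signals infeasibility

h, sigma = blocking_variable([1.5, -0.5, 2.0], [-1.0, -2.0, 0.5])
```

Only residuals moving toward zero generate finite step lengths, so an all-infinite result flags the infeasible case described in Step 5.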
In 2002, Kostina [39] developed a variant of the dual simplex algorithm for a
general maximization problem with bounded variables. This variant modifies Step 5
of the dual simplex method to allow for taking longer dual steps. It can therefore be
called the long-step dual simplex (LSDS) method, and this research extends LSDS
to the class of QRMEPs. This extension is also presented in Chapter 3.
2.3 Interior-Point Methods

Interior-point algorithms are the most commonly used methods for estimating quantile regression models in practice, mainly because of the efficiency advantage they offer over simplex-based methods for moderate and large problems. The general procedure of any interior-point method involves starting from an initial feasible solution, which must lie strictly in the interior of the feasible region, and moving in some direction which improves the objective function value [61]. This action of taking an improving step is repeated until some stopping criterion is satisfied. Two types of interior-point methods, both of which have been successfully adapted to the QRMEP, are reviewed here: affine scaling and log-barrier methods. The affine scaling method is presented first under a general maximization problem with equality constraints, followed by the Koenker-Park [35] adaptation of this method to the QRMEP. This section concludes by discussing two types of log-barrier methods: the primal and primal-dual path following algorithms.
2.3.1 Affine Scaling. Vanderbei [61] presents the affine scaling algorithm as a method of solving the maximization problem,

max_{g≥0_n} c^T g   (2.28)

subject to

Ag = d,

where c and g are each (n × 1) vectors, d is an (m × 1) vector, and A is an (m × n) matrix. Orthogonal gradient projection is employed as an appropriate ascent direction, but g must be scaled such that a step in the steepest ascent direction does not cross the bounding hyperplanes of the feasible region and thereby violate feasibility. The n-vector g can be represented equivalently as G1_n, where G = diag(g), so the transformation

g̃ = G^(−1)g = 1_n,   g = Gg̃   (2.29)

is applied by Vanderbei [61] as well as Koenker and Park [35]. After substituting (2.29) into (2.28), the result is an equivalent LP

max_{g̃>0_n} c̃^T g̃   (2.30)

subject to

Ãg̃ = d,

where Ã = AG and c̃ = Gc.

Let g̃^(0) denote the initial feasible solution to (2.30). Because of the variable transformation from (2.29), Ãg̃^(0) = d and 0 < g̃^(0) = 1_n, so g̃^(0) is an interior solution that lies strictly within the bounds of the polytope defined by Ãg̃ = d and g̃ ≥ 0. This initial solution must now be moved towards optimality by stepping in some direction ∆g̃ such that g̃^(1) = g̃^(0) + α∆g̃ is also a feasible interior solution to (2.30), and c̃^T g̃^(1) > c̃^T g̃^(0). If g̃^(1) is feasible, then

Ãg̃^(1) = d
Ã(g̃^(0) + α∆g̃) = d

must hold. Since Ãg̃^(0) = d, then g̃^(1) cannot satisfy primal feasibility unless αÃ∆g̃ = 0_m, implying that the search direction must lie in the null space of Ã for any improving step to remain feasible. Finding the search direction is the first priority, so the step size α can be ignored for now. The steepest ascent direction is of course nonzero, so an additional constraint must be imposed on (2.30) such that both the current and improved solutions strictly satisfy primal feasibility. This is easily achieved by imposing the unit length requirement on the steepest ascent direction,

max_{∆g̃∈R^n} c̃^T (g̃^(0) + ∆g̃)   (2.31)

subject to

Ã(g̃^(0) + ∆g̃) = d
‖∆g̃‖ = 1,
where ‖∆g̃‖ represents the Euclidean norm of ∆g̃. Introducing h and δ as Lagrange multipliers leads to the Lagrangian and first-order conditions

L(∆g̃, h, δ) = c̃^T (g̃^(0) + ∆g̃) + h^T (d − Ã(g̃^(0) + ∆g̃)) + δ(1 − ∆g̃^T ∆g̃)
            = (c̃^T − h^T Ã − δ∆g̃^T)∆g̃ + (c̃^T − h^T Ã)g̃^(0) + h^T d + δ,

∂L/∂∆g̃ = c̃ − Ã^T h − 2δ∆g̃ = 0_n,   (2.32)
∂L/∂h = d − Ã(g̃^(0) + ∆g̃) = 0_m,   (2.33)
∂L/∂δ = 1 − ∆g̃^T ∆g̃ = 0.   (2.34)

The equations (2.33) and (2.34) are clearly the primal feasibility conditions, while (2.32) is the dual feasibility condition. Dual feasibility implies primal optimality [4], so if it is assumed that δ = 1/2, then the dual feasibility condition reduces to ∆g̃ = c̃ − Ã^T h. Applying this substitution to ∂L/∂h, while letting r = c − A^T h and recalling that Ãg̃^(0) = d, leads to

d − Ã(g̃^(0) + c̃ − Ã^T h) = 0_m   (2.35)
Ãg̃^(0) + Ãc̃ − ÃÃ^T h = d
Ãc̃ − ÃÃ^T h = 0_m
h = (ÃÃ^T)^(−1) Ãc̃
  = (AG²A^T)^(−1) AG²c
and

∆g̃ = c̃ − Ã^T h   (2.36)
    = G(c − A^T h)
    = Gr
    = (I_n − GA^T (AG²A^T)^(−1) AG)Gc
    = PGc
    = c̃_P.

Let P = I_n − GA^T (AG²A^T)^(−1) AG be the matrix which projects c̃ = Gc onto the null space of Ã = AG. In other words, if the null space of AG is defined by N(AG) = {∆g̃ ∈ R^n : AG∆g̃ = 0_m}, then ∆g̃ is the orthogonal projection of Gc onto N(AG) [61]. Because of this attribute, the affine scaling method can also be called a gradient projection method [7]. The step size can now be obtained, and when coupled with the ascent direction from (2.36), the two can be used to compute the new solution vector g^(1).

Gradient projection moves the new solution towards the feasible region boundary. The optimal solution vector to an LP of the same form as (2.28), assuming nondegeneracy, consists of a set of m basic variables which are strictly positive and n − m nonbasic variables which are exactly zero [4]. If a new solution moves in the projected gradient direction until it reaches a bounding hyperplane, then g_j + α∆g_j = 0 for each nonbasic g_j, and the step length is chosen as the smallest multiplier which satisfies this property, or

α = (max_j {−∆g_j / g_j})^(−1)   (2.37)
  = (max_j {−e_j^T G²r / e_j^T g})^(−1).
Since the new solution must be strictly feasible, the step length is further reduced by some θ ∈ (0, 1) such that g^(1) does not reach a bounding hyperplane, so

α = θ (max_j {−e_j^T G²r / e_j^T g})^(−1).   (2.38)

The choice of θ is also known to have an effect on the convergence of the affine scaling algorithm [61]. Specifically, Hall and Vanderbei [27] confirm that convergence of affine scaling is assured as long as the step size is scaled by no more than θ = 2/3. This is a significant result because it is proven to hold even when nondegeneracy cannot be assumed. However, if the true optimal solution is nondegenerate, then affine scaling is guaranteed to converge to optimality for any 0 < θ < 1.

The weak duality property supplies a natural stopping criterion. The solution to the primal maximization problem is bounded above by the dual solution [4],

c^T g ≤ d^T h,

and the two are equal at optimality. It follows that as the estimate in each iteration approaches the true optimal solution, the duality gap (i.e., the absolute difference between the dual and primal solutions) decreases, approaching zero in the limit [60]. That is,

lim_{k→∞} (d^T h^(k) − c^T g^(k)) = 0   (2.39)

where g^(k) and h^(k) are the respective kth-iteration estimates of the primal and dual solutions. Therefore, for some small positive tolerance ξ, the algorithm terminates when (d^T h^(k) − c^T g^(k)) ≤ ξ [35].
Affine scaling can be stated simply by the following steps:

1. Choose a suitable θ ∈ (0, 1), a tolerance ξ > 0, and apply the variable transformation from (2.29).

2. Compute the projected gradient according to (2.36), which also contains formulas for the dual and shadow price vectors.

3. Use (2.38) to obtain the step size and compute the new solution, g^(k+1) = g^(k) + αGr^(k).

4. If (d^T h^(k+1) − c^T g^(k+1)) ≤ ξ, then STOP. Otherwise, return to Step 2.
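These steps can be sketched on a toy instance in pure Python. The sketch below solves max c^T g subject to 1^T g = 1, g ≥ 0 (m = 1, so AG²A^T is a scalar and no matrix factorization is needed). It takes the ascent step in the original variables as g + αG²r, the orthogonal-projection direction implied by (2.36) and (2.37); that reading of Steps 2 and 3 is an assumption, and all names are illustrative:

```python
def affine_scaling(c, theta=2/3, iters=200):
    """Affine scaling for max c^T g s.t. sum(g) = 1, g >= 0 (m = 1)."""
    n = len(c)
    g = [1.0 / n] * n                                  # strictly interior start
    for _ in range(iters):
        denom = sum(gj * gj for gj in g)               # A G^2 A^T (scalar here)
        h = sum(gj * gj * cj for gj, cj in zip(g, c)) / denom   # dual estimate
        r = [cj - h for cj in c]                       # reduced costs c - A^T h
        slope = max(-gj * rj for gj, rj in zip(g, r))  # max_j(-e_j G^2 r / e_j g)
        if slope <= 0:
            break                                      # already optimal
        alpha = theta / slope                          # step length (2.38)
        g = [gj + alpha * gj * gj * rj for gj, rj in zip(g, r)]
    return g

g = affine_scaling([1.0, 2.0, 4.0])
# the iterates stay feasible (sum(g) == 1) and mass concentrates on the
# best coordinate, approaching the vertex (0, 0, 1)
```

Because the dual estimate h makes A(G²r) = 0 by construction, every iterate satisfies the equality constraint exactly, and θ = 2/3 matches the convergence guarantee cited from Hall and Vanderbei [27].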
Vanderbei, Meketon, and Freedman [60] identify affine scaling as an efficient alternative to Karmarkar's algorithm. Koenker and Park [35] adapt affine scaling to quantile regression, since (1.8) is nearly equivalent in form to (2.28). As before, start by applying the transformation

w̃ = W^(−1)w,   w = Ww̃.   (2.40)

Unlike (2.28), the dual vector from (1.8) is bounded above and below, so W ≠ diag(w). Rather, w is centered [35] relative to the bounds on w. That is, each diagonal element of W is determined by the boundary to which it is closest, or

W = diag( min{ 1 − q + w_j^(k), q − w_j^(k) } )   (2.41)

where w^(k) is the dual vector estimate for the kth iteration. This transformation, like (2.30), yields X̃ = WX and ỹ = Wy. Notice that the dual feasible region of (1.8) defines N(X^T) (the null space of X^T), and exactly p residuals are zero (assuming nondegeneracy), so every transposed design matrix has rank p (full row rank). Therefore, under transformation, (WX)^T = X^T W also has full row rank, and the orthogonal projection of Wy onto N(X^T W) (i.e., the steepest ascent direction)
is [60],

∆w̃ = (I_n − X̃(X̃^T X̃)^(−1) X̃^T) ỹ   (2.42)
    = (I_n − WX(X^T W²X)^(−1) X^T W)Wy
    = W(y − X(X^T W²X)^(−1) X^T W²y)
    = W(y − Xb)
    = Wr^(k)

where b = (X^T W²X)^(−1) X^T W²y is the p-vector of estimated model coefficients, and r^(k) is the residual vector estimate. Because the equation for b looks nearly identical to the solution to the OLS normal equations, Koenker and Park describe this application of affine scaling as an IRLS method [35]. To determine the step size, and not to be confused with the α from (2.38), let

α = max_{1≤j≤n} { max{ −e_j^T W²r^(k) / (1 − q + w_j^(k)), e_j^T W²r^(k) / (q − w_j^(k)) } }   (2.43)

and choose some η ∈ (0, 1) such that the dual vector estimate for iteration (k + 1) is

w^(k+1) = w^(k) + (η/α)W²r^(k).   (2.44)

Once the duality gap ( (1 − q) Σ_{j: r_j^(k)<0} |r_j^(k)| + q Σ_{j: r_j^(k)>0} |r_j^(k)| ) − y^T w^(k) is sufficiently small, or less than some defined tolerance ξ, then the current dual vector is optimal.
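One iteration's ingredients can be sketched in pure Python for an illustrative p = 1 model with no intercept; this is a deliberately minimal setting (so the weighted least-squares solve is a scalar division), not the general algorithm, and all names are illustrative:

```python
def centered_weights(w, q):
    """Diagonal of W from (2.41): distance of each w_j to its nearer bound."""
    return [min(1 - q + wj, q - wj) for wj in w]

def irls_step(x, y, w, q, eta=0.95):
    """One affine-scaling (IRLS) update of the dual vector for p = 1."""
    W = centered_weights(w, q)
    # weighted least-squares coefficient b = (x^T W^2 x)^{-1} x^T W^2 y
    b = sum(Wj**2 * xj * yj for Wj, xj, yj in zip(W, x, y)) \
        / sum(Wj**2 * xj * xj for Wj, xj in zip(W, x))
    r = [yj - xj * b for xj, yj in zip(x, y)]          # residual estimate
    # step size (2.43): keeps the update strictly inside (q - 1, q)
    alpha = max(max(-Wj**2 * rj / (1 - q + wj), Wj**2 * rj / (q - wj))
                for Wj, rj, wj in zip(W, r, w))
    return [wj + (eta / alpha) * Wj**2 * rj
            for wj, Wj, rj in zip(w, W, r)]            # update (2.44)

w_new = irls_step([1.0, 2.0, 3.0], [1.1, 1.9, 3.2], [0.0, 0.0, 0.0], q=0.5)
```

The normalization by α means the coordinate closest to a bound moves exactly the fraction η of the way there, so the new dual vector remains strictly feasible.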
Koenker and Park [35] focus only on the median case, so this algorithm was tested for q = 1/4, 3/4, 1/20, 19/20 and compared to the exact results from the bounded LP. Each test was performed on the same subset of Cars93 as that used to test the I-LP method. Slightly more iterations were required to solve for the quartiles than the quintiles, yet convergence to a unique optimal solution was achieved for each test. This problem has a nondegenerate optimal solution, so η was adjusted in order to reduce the required number of iterations and speed up convergence. Beginning at η = 1/20, and continuing in increments of 1/20, the number of iterations was recorded for each run of the affine scaling algorithm. Just as expected from [27], η = 19/20 resulted in the fewest iterations. η was then incremented by 1/100, but this produced no further reduction in iterations.
2.3.2 Log-Barrier Methods. Barrier function methods can be developed for many different LP structures, but they are constructed here to solve (2.28). Barrier function methods are so named because they, like penalty function methods, help transform constrained optimization problems into sequences of unconstrained optimization problems by adding a weighted function to the objective such that only strictly feasible solutions are generated. The barrier function is chosen such that as a solution approaches the boundary of the feasible region, the barrier function approaches infinity, resulting in no objective function improvement as the solution approaches the polytope boundary [5]. Several types of functions satisfy this property, but the most commonly used is the natural logarithm. Interior-point methods using the natural logarithm as a barrier function are therefore called logarithmic barrier (log-barrier) methods [24].

The polytope boundaries in (2.28) are defined by the nonnegativity constraints g ≥ 0, so applying the log-barrier function leads to the parametric form of (2.28)

max_{g≥0_n, µ>0} B(g, µ) = c^T g + µ Σ_{j=1}^n ln g_j   (2.45)

subject to

Ag = d,

where µ > 0. Because lim_{g_j→0+} ln g_j = −∞ for all j, the log-barrier function clearly penalizes the objective function as the solution estimate approaches the polytope boundaries. It follows then that any solution vector g to (2.45) lies strictly in the
interior S of the feasible region, where S = {g : Ag = d, g > 0}. Assume that an optimal solution to (2.28) exists and denote it by g*. Assume also that an optimal solution to (2.45) exists and denote it by g(µ). It follows that when µ is sufficiently small, the objective function approaches its optimal value [8]. In other words,

lim_{µ→0} ( c^T g(µ) + µ Σ_{j=1}^n ln g_j(µ) ) = c^T g*.
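The penalty behavior is easy to see numerically. The small sketch below evaluates the barrier objective B(g, µ) = c^T g + µ Σ ln g_j as one coordinate slides toward the boundary; the vectors and µ are toy values, illustrative only:

```python
import math

def barrier_objective(c, g, mu):
    """Log-barrier objective (2.45): c^T g + mu * sum(ln g_j)."""
    return (sum(cj * gj for cj, gj in zip(c, g))
            + mu * sum(math.log(gj) for gj in g))

c, mu = [1.0, 2.0], 0.1
interior = barrier_objective(c, [0.5, 0.5], mu)
near_edge = barrier_objective(c, [1e-9, 0.999999999], mu)
# near_edge < interior: the log term drives the objective down without
# bound as g_1 -> 0+, even though the linear term c^T g has increased
```

This is exactly the mechanism that keeps every iterate strictly inside the polytope.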
The KKT conditions for (2.45) are obtained by starting with the Lagrangian

L(g, h, µ) = c^T g + µ Σ_{j=1}^n ln g_j + h^T (d − Ag)
           = (c^T − h^T A)g + µ Σ_{j=1}^n ln g_j + h^T d.

Taking partial derivatives results in the primal feasibility and dual feasibility conditions [61], respectively

∂L/∂h = d − Ag = 0_m
∂L/∂g = c − A^T h + µG^(−1)1_n = 0_n.

The complementarity condition [8], which corresponds to the complementary slackness condition when µ = 0, is defined by the substitution s = µG^(−1)1_n, leading to the KKT conditions

Ag = d   (2.46)
A^T h − s = c
GS1_n = µ1_n,

where S = diag(s).
2.3.2.1 Primal Path Following Algorithm. Since B(g, µ) is neither linear nor quadratic, the ascent direction is obtained via Newton's method. The objective function in the Newton problem is the second-order Taylor series expansion (quadratic approximation) of B(g, µ), whose gradient and Hessian take the forms

∇B(g, µ) = c + µG^(−1)1_n
∇²B(g, µ) = −µG^(−2),

respectively. The Newton problem therefore takes the form

max_{φ∈R^n, µ>0} c^T φ + µ1_n^T G^(−1)φ − (µ/2)φ^T G^(−2)φ   (2.47)

subject to

Aφ = 0_m,

where φ denotes the ascent direction. The first-order conditions are obtained by applying again the Lagrange multiplier method, which yields

L(φ, h) = c^T φ + µ1_n^T G^(−1)φ − (µ/2)φ^T G^(−2)φ − h^T Aφ

c + µG^(−1)1_n − µG^(−2)φ = A^T h
Aφ = 0_m.

Although A^T is not invertible, A is assumed to have full row rank, so multiplying the dual feasibility condition through by AG² produces an invertible AG²A^T term on the right-hand side of the first condition, making it possible to solve for the dual solution vector h directly [55]:

h = (AG²A^T)^(−1) AG(Gc + µ1_n).   (2.48)
The result for h is substituted back into the dual feasibility condition such that the ascent direction φ can be obtained,

φ = (I_n − G²A^T (AG²A^T)^(−1) A)(G1_n + (1/µ)G²c)   (2.49)
  = GP1_n + (1/µ)GPGc
  = G(P1_n + (1/µ)c̃_P)

and the new solution estimate for the next iteration is computed as g^(k+1) = g^(k) + φ^(k), where g^(k) and φ^(k) denote the solution estimate and ascent direction, respectively, for the kth iteration.
As µ varies, the solutions to (2.45) form the central path through the polytope, so this type of log-barrier method can be called a path following algorithm. Bertsimas and Tsitsiklis [8] present (2.28) as a primal minimization problem, so the resulting log-barrier method is called a primal path following algorithm. For the QRMEP, however, the translated dual (2.10) is used when applying the path following algorithm, making it instead a dual path following algorithm. It is therefore necessary to present the dual LP of (2.28),

min_{h∈R^m, s≥0_n} d^T h   (2.50)

subject to

A^T h − s = c,

which is of the same form as the QRMEP primal. The following steps summarize the dual path following algorithm:

1. Let k = 0, select an initial solution which strictly satisfies primal feasibility and dual feasibility (i.e., g^(0) > 0_n and s^(0) > 0_n), choose some α ∈ (0, 1), set the tolerance ξ > 0, and set the barrier parameter µ > 0.
2. If (s^(k))^T g^(k) < ξ, then STOP. Otherwise, proceed to Step 3.

3. Use (2.48) and (2.49) to compute the primal solution vector h and ascent direction φ, respectively.

4. Update the dual solution and primal slack vectors, respectively, as follows:

g^(k+1) = g^(k) + φ^(k)
s^(k+1) = A^T h − c.

5. Let µ^(k+1) = αµ^(k), and return to Step 2.
Portnoy and Koenker [55] apply the dual path following algorithm to the QRMEP by first eliminating the upper bound on the translated dual vector a in (2.10) via the substitution a + s = 1_n,

max_{a,s≥0_n} y^T a   (2.51)

subject to

X^T a = (1 − q)X^T 1_n
a + s = 1_n,

which puts (2.10) in the same form as that of (2.28). The log-barrier function becomes B(a, s, µ) = y^T a + µ Σ_{j=1}^n ln(a_j s_j), so its gradient and Hessian with respect to a, writing A = diag(a) and S = diag(s), are respectively

∇B(a, s, µ) = y + µ(A^(−1) − S^(−1))1_n
∇²B(a, s, µ) = −µ(A^(−2) + S^(−2)).
The Newton step φ maximizes the quadratic approximation of B(a, s, µ),

    max_{φ ∈ R^n}  y^T φ + µ φ^T (A^{-1} − S^{-1}) 1_n − (µ/2) φ^T (A^{-2} + S^{-2}) φ    (2.52)

    subject to  X^T φ = 0_p.
If h = b, G^{-1} = (A^{-1} − S^{-1}), and G^{-2} = (A^{-2} + S^{-2}), then the Newton direction
φ satisfies

    y + µ (A^{-1} − S^{-1}) 1_n − µ (A^{-2} + S^{-2}) φ = X b
    X^T φ = 0_p

    b = (X^T G^2 X)^{-1} X^T (G^2 y + µ G 1_n)    (2.53)
      = (X^T (A^{-2} + S^{-2})^{-1} X)^{-1} X^T ((A^{-2} + S^{-2})^{-1} y + µ (A^{-1} − S^{-1})^{-1} 1_n)

    φ = (I_n − G^2 X (X^T G^2 X)^{-1} X^T G^2) ((A^{-1} − S^{-1}) 1_n + (1/µ) y)    (2.54)

where b ∈ R^p is the vector of model parameters.
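As a numerical sanity check on (2.53), b and φ can be computed directly with NumPy. The sketch below is an illustration only, not Portnoy and Koenker's implementation: the data, problem sizes, and the interior iterate a are arbitrary assumptions. It recovers φ from the stationarity condition displayed above and verifies that φ lies in the null space of X^T, as the constraint in (2.52) requires.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, mu = 12, 3, 0.5

X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # design with intercept
y = rng.normal(size=n)
a = rng.uniform(0.1, 0.9, size=n)       # assumed interior iterate: 0 < a < 1
s = 1.0 - a                             # slack from a + s = 1_n

# Diagonals of the matrices used above: G^{-1} = A^{-1} - S^{-1}, G^{-2} = A^{-2} + S^{-2}
Ginv1 = 1.0 / a - 1.0 / s               # the vector (A^{-1} - S^{-1}) 1_n
G2 = 1.0 / (1.0 / a**2 + 1.0 / s**2)    # diagonal of (A^{-2} + S^{-2})^{-1} = G^2

# (2.53): b = (X^T G^2 X)^{-1} X^T (G^2 y + mu G 1_n), using G 1_n = G^2 (A^{-1} - S^{-1}) 1_n
b = np.linalg.solve(X.T @ (G2[:, None] * X), X.T @ (G2 * y + mu * G2 * Ginv1))

# Newton direction from y + mu (A^{-1} - S^{-1}) 1_n - mu (A^{-2} + S^{-2}) phi = X b
phi = G2 * (y + mu * Ginv1 - X @ b) / mu

print(np.abs(X.T @ phi).max())          # ~0: phi satisfies X^T phi = 0_p
```

Substituting b back into the stationarity condition makes X^T φ vanish identically, so the printed residual is zero up to floating-point rounding.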
2.3.2.2 Primal-Dual Path Following Algorithm. The primal-dual
variant also approximates the central path through the polytope. It differs from the
primal(dual) path following algorithm by applying Newton’s method to the KKT
system of (2.46) rather than to the second-order Taylor series expansion described
in (2.47). Because GS1n = µ1n is nonlinear, the algorithm obtains search direc-
tions in both the primal and dual spaces by applying Newton’s method for solving a
nonlinear system of equations. Despite being computationally more complex than
affine scaling, the primal-dual path following IPM performs very well on large prob-
lems. According to Bertsimas and Tsitsiklis [8], it is the preferred algorithm for
commercial solvers implementing interior-point methods.
If the (2n + m) × 1 vector t = (g, h, s) represents the solution to the KKT
conditions in (2.46), and

    F(t) = ( A g − d,  A^T h − s − c,  G S 1_n − µ 1_n )

represents the KKT system (as a stacked vector), then the objective is to find t such that F(t) = 0_{2n+m}.
Start by constructing an approximation of F(t). The first-order Taylor series expansion
is F(t + φ) ≈ F(t) + J(t) φ, where φ is the Newton direction and J(t) is
the (2n + m) × (2n + m) Jacobian matrix

    J(t) = [ A    0     0   ]
           [ 0    A^T  −I_n ]
           [ S    0     G   ].
Letting φ = (φ_g, φ_h, φ_s), the Newton direction is obtained by solving J(t) φ = −F(t),
or

    A φ_g = d − A g    (2.55)
    A^T φ_h − φ_s = c − A^T h + s
                  = c − A^T h + µ G^{-1} 1_n
    S φ_g + G φ_s = µ 1_n − G S 1_n.
Solving (2.55) for the Newton directions yields

    φ_g = G S^{-1} (µ G^{-1} 1_n + c − A^T φ_h − A^T h)
    φ_h = (A G S^{-1} A^T)^{-1} (A G S^{-1} (c + µ G^{-1} 1_n − A^T h) + A g − d)
    φ_s = A^T φ_h + A^T h − µ G^{-1} 1_n − c.
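A minimal sketch of one primal-dual Newton solve, under assumed small random data for A, c, d and strictly positive iterates g and s: it assembles J(t) and F(t) exactly as displayed above and solves J(t)φ = −F(t) with a dense solver (a real implementation would instead exploit the block structure, as the closed-form directions above do).

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, mu = 3, 7, 0.1

A = rng.normal(size=(m, n))              # constraint matrix of the primal LP (2.28)
c = rng.normal(size=n)
d = rng.normal(size=m)
g = rng.uniform(0.5, 1.5, size=n)        # strictly positive primal iterate
h = rng.normal(size=m)                   # dual iterate
s = rng.uniform(0.5, 1.5, size=n)        # strictly positive dual slack

G, S, I = np.diag(g), np.diag(s), np.eye(n)
J = np.block([
    [A,                np.zeros((m, m)), np.zeros((m, n))],
    [np.zeros((n, n)), A.T,              -I              ],
    [S,                np.zeros((n, m)), G               ],
])
F = np.concatenate([A @ g - d, A.T @ h - s - c, g * s - mu])

phi = np.linalg.solve(J, -F)             # Newton direction (phi_g, phi_h, phi_s)
phi_g, phi_h, phi_s = phi[:n], phi[n:n + m], phi[n + m:]
print(np.allclose(J @ phi, -F))          # True
```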
After obtaining the appropriate search directions, proper step lengths must be
computed. The primal-dual path following algorithm requires two step lengths: one
each for the primal and dual directions, respectively. Let θ_P^(k) denote the primal step
length for the kth iteration and θ_D^(k) denote the dual step length for the kth iteration.
The step lengths are computed using a ratio test similar to that of (2.38),

    θ_P^(k) = σ min_j { −(e_j^T g^(k)) / (e_j^T φ_g^(k)) : e_j^T φ_g^(k) < 0 }
    θ_D^(k) = σ min_j { −(e_j^T s^(k)) / (e_j^T φ_s^(k)) : e_j^T φ_s^(k) < 0 },

where σ ∈ (0, 1), e_j^T φ_g^(k) is the jth element of φ_g^(k), and e_j^T φ_s^(k) is the jth element of
φ_s^(k). The scaling factor σ is usually set very close to 1 in practice so that as large
a step as possible can be taken without reaching the polytope boundary. Lustig,
Marsten, and Shanno [42] use σ = 0.99995, as do Portnoy and Koenker [55].
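The ratio test can be sketched as a generic damped step-length routine. This is an illustration, not Lustig, Marsten, and Shanno's code; the cap of 1.0 when no component blocks the step is a common convention assumed here, not something the text specifies.

```python
import numpy as np

def step_length(x, direction, sigma=0.99995):
    """Damped ratio test: sigma times the largest theta keeping x + theta*direction > 0.
    Returns 1.0 when no component of the direction is negative (assumed convention)."""
    blocking = direction < 0
    if not blocking.any():
        return 1.0
    return sigma * float(np.min(-x[blocking] / direction[blocking]))

g = np.array([1.0, 2.0, 3.0])
phi_g = np.array([-0.5, 1.0, -3.0])
theta_P = step_length(g, phi_g)     # blocking ratios are 2.0 and 1.0 -> sigma * 1.0
print(theta_P)                      # 0.99995
```

With σ = 0.99995 the new iterate g + θ_P φ_g stays strictly positive, stopping just short of the polytope boundary.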
2.4 Finite Smoothing Algorithm
Chen [18] proposed an alternative algorithm to the interior-point method which
applies smoothing to the objective function ρq. Testing and comparisons against
the Barrodale-Roberts and interior-point methods revealed finite smoothing to be
computationally superior to the former. Its significance, however, lies in its performance
relative to the interior-point method. For large-sample
problems where the number of regressors is small, the finite smoothing and interior-
point methods perform similarly. Finite smoothing performs much faster than the
interior-point method when the design matrix contains a large number of regressors.
It has another advantage in that it provides the same accuracy, or exact solution, as
the Barrodale-Roberts algorithm.
For the (100q)th conditional quantile, (1.4) can be approximated by the smooth
Huber function [18]

    ∑_{j=1}^n H_{γ,q}(r_j^(q))    (2.56)

where

    H_{γ,q}(r_j^(q)) =  (q − 1) r_j^(q) − (1/2)(q − 1)² γ    if r_j^(q) ≤ (q − 1) γ
                        (1/(2γ)) (r_j^(q))²                  if (q − 1) γ ≤ r_j^(q) ≤ qγ
                        q r_j^(q) − (1/2) q² γ               if r_j^(q) ≥ qγ    (2.57)
and γ ∈ R+ is a threshold value. Notice that the inequalities in H_{γ,q} define three
subregions whose boundaries are the parallel hyperplanes r^(q) = (q − 1) γ 1_n and
r^(q) = qγ 1_n. Each negative residual satisfies r_j^(q) ≤ (q − 1) γ, and each positive
residual satisfies r_j^(q) ≥ qγ. The basic residuals lie strictly between the parallel
hyperplanes, which is further demonstrated by defining a sign vector ξ such that

    ξ_j =  −1    if r_j^(q) ≤ (q − 1) γ
            0    if (q − 1) γ < r_j^(q) < qγ
            1    if r_j^(q) ≥ qγ    (2.58)
for the jth observation. Define also ω_j = 1 − ξ_j² such that the smoothing function
can be rewritten as

    H_{γ,q}(r_j^(q)) = (1/(2γ)) ω_j (r_j^(q))²
        + ξ_j ( (1/2) r_j^(q) + (1/4)(1 − 2q) γ
        + ξ_j ( r_j^(q) (q − 1/2) − (1/4)(2q² − 2q + 1) γ ) ).
The smoothed objective function is now continuously differentiable, so both its
gradient and Hessian exist. The finite smoothing algorithm is therefore a modified
line search method, where ∑_{j=1}^n H_{γ,q}(r_j^(q)) is minimized for a series of decreasing
γ [18]. As γ approaches zero, the minimizer of (2.56) approaches the true minimizer
of (1.4).
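Both forms of the smoothing function are easy to check numerically. The sketch below implements the piecewise definition (2.57) and the sign-vector rewrite, confirms that they agree, and illustrates that H_{γ,q} approaches the check function ρ_q of (1.4) as γ shrinks; the residual grid and the choice of q are arbitrary assumptions.

```python
import numpy as np

def rho(r, q):
    """Check function rho_q from (1.4): q*r for r >= 0, (q - 1)*r for r < 0."""
    return r * (q - (r < 0))

def huber(r, q, gamma):
    """Piecewise definition (2.57) of the smooth Huber function H_{gamma,q}."""
    return np.where(r <= (q - 1) * gamma, (q - 1) * r - 0.5 * (q - 1) ** 2 * gamma,
           np.where(r >= q * gamma,       q * r - 0.5 * q ** 2 * gamma,
                    r ** 2 / (2 * gamma)))

def huber_sign_form(r, q, gamma):
    """Rewrite of H_{gamma,q} using the sign vector xi and omega = 1 - xi^2."""
    xi = np.where(r <= (q - 1) * gamma, -1, np.where(r >= q * gamma, 1, 0))
    omega = 1 - xi ** 2
    return (omega * r ** 2 / (2 * gamma)
            + xi * (0.5 * r + 0.25 * (1 - 2 * q) * gamma
                    + xi * (r * (q - 0.5) - 0.25 * (2 * q ** 2 - 2 * q + 1) * gamma)))

q, r = 0.25, np.linspace(-3.0, 3.0, 41)
print(np.allclose(huber(r, q, 0.5), huber_sign_form(r, q, 0.5)))      # True
print(np.abs(huber(r, q, 1e-8) - rho(r, q)).max() <= 0.5 * 1e-8)      # True: gap is O(gamma)
```

Outside the band between the two hyperplanes, H differs from ρ_q by exactly (1/2)(q − 1)²γ or (1/2)q²γ, so the approximation error is at most γ/2.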
Chen [18] cautions not to view the finite smoothing algorithm as a complete
replacement for other quantile regression algorithms. The algorithm performs best
with a large number of regressors, which occurs commonly in certain types of studies,
such as those involving survey data. Chen [18] suggests possibly using a different
method when this is not the case, as characteristics of the data could cause the finite
smoothing method, and others, to fail. A data set with a significant number of
outliers may lead to both the interior-point and finite smoothing algorithms failing,
but the stability of the Barrodale-Roberts algorithm guarantees a solution, despite
its sluggish performance on large problems.
2.5 Integer Programming Formulations
It can be shown that the special structures of (1.4), (1.8), and (2.10) are similar
to certain well-solved problems in other areas of linear optimization, specifically
integer programming. Through simple scalar multiplication, the primal and dual
LPs can be put into the forms necessary for implementing the out-of-kilter method
for solving the minimum cost network flow problem (MCNFP) [22]. The properties
of the QRMEP can also be used to reconceptualize the problem and generate new
formulations based on two well-known integer programs: the generalized assignment
problem and the knapsack problem. These alternative formulations are developed
in detail in Chapter 3.
The structures of (1.4) and (2.10) are quite similar to the dual and primal
structures, respectively, of the MCNFP as given in [4]. Fulkerson [22] presents the
MCNFP more generally as
    min_{z ∈ [l, u]^n}  c z    (2.59)

    subject to  A z = d,
where A is m × n. The dual LP of the MCNFP therefore takes the form

    max_{π ∈ R^m, λ ≥ 0_n, µ ≥ 0_n}  π d + λ l − µ u    (2.60)

    subject to  π A + λ − µ = c.
Both (1.8) and (2.10) can be rewritten, respectively, as equivalents to the MCNFP,

    min_{w ∈ [q−1, q]^n}  −y^T w    (2.61)

    subject to  X^T w = 0_p

and

    min_{a ∈ [0, 1]^n}  −y^T a    (2.62)

    subject to  X^T a = (1 − q) X^T 1_n.

Similarities can also be found between (1.4) and (2.60). By simply negating the
objective function, (1.4) assumes the form of (2.60),

    max_{b ∈ R^p, u ≥ 0_n, v ≥ 0_n}  (q − 1) v^T 1_n − q u^T 1_n    (2.63)

    subject to  X b + u − v = y.
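Because (2.62) is an ordinary bounded-variable LP, it can be handed to any LP solver. A sketch using SciPy's linprog on synthetic data (recovering b from the solver's dual multipliers is omitted here): note that a = (1 − q) 1_n is always feasible, and at a nondegenerate vertex optimum at most p of the scores lie strictly between the bounds.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
n, p, q = 40, 2, 0.3

X = np.column_stack([np.ones(n), rng.normal(size=n)])   # design with an intercept
y = X[:, 1] + rng.normal(size=n)                        # synthetic responses

# (2.62): min -y^T a  subject to  X^T a = (1 - q) X^T 1_n,  0 <= a <= 1
res = linprog(c=-y,
              A_eq=X.T, b_eq=(1 - q) * X.T @ np.ones(n),
              bounds=[(0.0, 1.0)] * n, method="highs-ds")  # dual simplex -> vertex solution
a = res.x

interior = int(((a > 1e-8) & (a < 1 - 1e-8)).sum())
print(res.status, interior)   # status 0; at most p scores strictly between the bounds
```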
The optimality conditions Fulkerson presents in [22] for the out-of-kilter algo-
rithm are equivalent to a property of the QRMEP. For a feasible z in (2.59), there
exists a pricing vector π such that

    c_j + a_i π_i > 0  ⟹  z_j = l_j    (2.64)
    c_j + a_i π_i < 0  ⟹  z_j = u_j    (2.65)

for each j, where a_i denotes the ith column of A. With y = −c, it follows that
the dual feasibility conditions for nonbasic variables in the QRMEP are equivalent
to the necessary conditions, (2.64) and (2.65), for the MCNFP, specifically

    y_j − x_j b < 0  ⟹  w_j = q − 1    (2.66)
    y_j − x_j b > 0  ⟹  w_j = q,    (2.67)

where x_j denotes the jth row of the design matrix. By complementary slackness,
each component of these necessary conditions can attain one of three possible levels:
c_j + a_i π_i can be positive, negative, or zero. Similarly, z_j can be greater than, less
than, or equal to one of the bounds.
classifications [22] into which each element of z must fall. This is also true for any
feasible basis in the quantile regression dual, but recall from Section 2.1 that not
every basis which satisfies primal feasibility is necessarily a feasible basis for a given
q. For example, a conditional median estimate in which all residuals are positive
may satisfy primal feasibility, but it is an estimate of a lower conditional quantile
rather than the median. The following table lists the nine classes, each with its
corresponding cases, for both the MCNFP and the equivalent quantile regression
problem.
    Class    MCNFP                                  QR
    α        c_j + a_i π_i > 0,  z_j = l_j          y_j − x_j b < 0,  w_j = q − 1
    β        c_j + a_i π_i = 0,  l_j < z_j < u_j    y_j − x_j b = 0,  q − 1 < w_j < q
    γ        c_j + a_i π_i < 0,  z_j = u_j          y_j − x_j b > 0,  w_j = q
    α1       c_j + a_i π_i > 0,  z_j < l_j          y_j − x_j b < 0,  w_j < q − 1
    β1       c_j + a_i π_i = 0,  z_j < l_j          y_j − x_j b = 0,  w_j < q − 1
    γ1       c_j + a_i π_i < 0,  z_j < u_j          y_j − x_j b > 0,  w_j < q
    α2       c_j + a_i π_i > 0,  z_j > l_j          y_j − x_j b < 0,  w_j > q − 1
    β2       c_j + a_i π_i = 0,  z_j > u_j          y_j − x_j b = 0,  w_j > q
    γ2       c_j + a_i π_i < 0,  z_j > u_j          y_j − x_j b > 0,  w_j > q
If all elements of z fall into at least one of the classes α, β, or γ (the in-kilter classes),
then the current solution is optimal. This is also true for quantile regression, and
the exact number of class β elements is known to be p (at optimality), while the
number of class α elements lies within the closed interval from ⌈qn − p⌉ to ⌊qn⌋.
The optimal basis for quantile regression is unique (assuming nondegeneracy), so for
any other basis in the set of feasible bases, at least one of the p elements in the basis
falls into one of two out-of-kilter classes: β1 or β2. The purpose of the algorithm
is to retain the in-kilter elements and gradually bring the out-of-kilter (infeasible)
elements into kilter [4].
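The nine-way classification in the table can be expressed directly in code. The helper below is a hypothetical sketch, not Fulkerson's specification: the numerical tolerance and the assignment of exact-boundary cases with a zero residual to β1/β2 are assumptions.

```python
def kilter_class(residual, w, q, tol=1e-9):
    """Classify one observation into the nine case classes, using the
    quantile-regression column of the table above. Tolerance handling and
    boundary-case assignment are assumptions of this sketch."""
    if residual < -tol:                                  # y_j - x_j b < 0
        if abs(w - (q - 1)) <= tol: return "alpha"
        return "alpha1" if w < q - 1 else "alpha2"
    if residual > tol:                                   # y_j - x_j b > 0
        if abs(w - q) <= tol: return "gamma"
        return "gamma1" if w < q else "gamma2"
    if q - 1 + tol < w < q - tol: return "beta"          # y_j - x_j b = 0
    return "beta1" if w <= q - 1 + tol else "beta2"

q = 1/3
print([kilter_class(r, w, q) for r, w in [(-2.0, q - 1), (0.0, 0.0), (3.7, q)]])
# ['alpha', 'beta', 'gamma'] -> every element in kilter, so such a basis would be optimal
```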
There are several issues to address when considering how to extend the out-
of-kilter method to the class of QRMEPs, chief among them being the fact that the
properties discussed in Section 2.1 hold only for continuous data. Fulkerson [22]
developed the out-of-kilter algorithm to work only with integer, or rational, data.
Any extension to quantile regression models may require some initial transformation
of the data to make it integer or rational. Another option is to modify the steps of
the algorithm such that it converges even with continuous data. Another issue is
the initial solution. This is theoretically unimportant, since even b = 0p yields a
primal feasible solution for any q, but beginning the algorithm at a primal feasible
solution that also satisfies the cardinality range property can decrease computation
time. A feasible solution that also satisfies (2.3) and (2.4) guarantees that no more
than p elements are out-of-kilter in dual space. However, any processing advantage
gained by starting at such a solution should be weighed against the computational
effort required to obtain it.
2.6 Summary
A review of the literature on quantile regression reveals many significant ad-
vancements that have been achieved in the field, particularly in the area of model
estimation. By introducing the concept of regression quantiles in 1978, Koenker
and Bassett extended the idea of order statistics in single-variable samples (loca-
tion models) to the broader class of linear models [32], and they introduced two
properties that follow from the special structure of the QRMEP. The exact-fit and
cardinality range properties, along with the KKT conditions, constitute the set of
necessary optimality conditions which are specific to the QRMEP. These proper-
ties also established three assumptions on the quantile regression model which must
hold to guarantee a unique vector b ∈ R^p of model coefficients: all data are continuous,
a nondegenerate solution exists, and the quantile regression model contains an
intercept.
A simple method of partitioning the design matrix X, one which distinguishes
only between basic (X_h ∈ R^{p×p}) and nonbasic (N ∈ R^{(n−p)×p}) observations, is
sufficient for employing the exact-fit property to estimate the model parameters.
Exploiting the cardinality range property, however, requires a more detailed partitioning
scheme that further decomposes the matrix of nonbasic observations into two matrices:
a matrix of nonbasic observations whose residuals are negative (N_v) and a
matrix of nonbasic observations whose residuals are positive (Nu). The number of
rows in Nv must satisfy (2.3), while the number of rows in Nu must satisfy (2.4).
Gutenbrunner, et al. extended the idea of ranking sample observations to the
class of conditional quantile models by applying a transformation to (1.8). The
result is the translated dual LP (2.10), whose solution a ∈ [0, 1]^n is a vector of
regression rank scores. Two types of interior-point methods, primal path following
and primal-dual path following, proceed from (2.10) to solve the QRMEP. Affine
scaling, by contrast, uses the standard dual LP (1.8). Chen's finite smoothing
algorithm [18] offers an alternative to interior-point methods for certain model
and problem sizes.
Three types of pivoting algorithms have been developed for solving the QRMEP:
a primal method (Barrodale-Roberts), a primal-dual method (Koenker-d’Orey), and
a dual method (I-LP). The dual simplex method, contrary to its name, may be
considered a primal method in the context of the QRMEP because it solves (1.8)
by conducting line searches in primal space. Kostina [39] developed a long-step
variant of the dual simplex method by modifying the step size selection process.
The Barrodale-Roberts algorithm and I-LP were developed to solve a special case
of the QRMEP: the l1-approximation (conditional median). While Koenker and
d’Orey [33] extended the Barrodale-Roberts algorithm to all conditional quantiles,
I-LP has not yet been extended to the entire class of QRMEPs. The next chap-
ter discusses extending both I-LP and the long-step dual simplex algorithm to the
quantile regression model class of problems. Additionally, continuing the concept
presented in Section 2.5, two suboptimal integer programming formulations of the
QRMEP are developed.
III. Methodology
This research explores extending three pivoting algorithms to the class of QRMEPs:
the simplex method for bounded variables, the LSDS method, and I-LP. The simplex
method for bounded variables could not be successfully extended, and Section 3.1
discusses why it could not be implemented. Section 3.2 details how I-LP can be
generalized to solve (1.8) for any quantile. A successful extension of the LSDS
method to the class of QRMEPs is provided in Section 3.3. Chapter 3 concludes by
developing two suboptimal formulations from (1.8) which resemble a familiar integer
programming problem.
3.1 Simplex Method for Bounded Variables
The Barrodale-Roberts algorithm was successfully extended by Koenker and
d’Orey [34] to be a primal pivoting algorithm capable of computing regression quan-
tiles for any q ∈ (0, 1). Proceeding from (1.4), each iteration of the Barrodale-
Roberts algorithm consists of estimating an element of the parameter vector b and
then pivoting to a new set of residual vectors u and v. It seems natural to explore
the efficacy of a simplex-based procedure which proceeds instead from (1.8); that is, a
dual pivoting algorithm. Given the boundary constraints on w in (1.8), the simplex
method for bounded variables (bounded simplex) [4] is a logical starting point. It
turns out that bounded simplex, as described in [4], does not solve (1.8) for a simple
reason. The idea behind any simplex method, and bounded simplex is no exception,
is to pivot from one basic feasible solution to another until optimality is achieved
(dual feasibility is satisfied). If the three model assumptions established in Section
2.1 hold, then only one basic feasible solution exists in dual space for the QRMEP:
the optimal solution. Primal feasibility, on the other hand, is satisfied by any solu-
tion. Pivoting between basic feasible solutions in (1.8) is therefore impossible. It
can be shown that the standard bounded simplex algorithm does not converge to
the optimal basis when applied to (1.8). The theory behind the bounded simplex
algorithm, in the context of the QRMEP, is presented first. A single iteration of the
bounded simplex method is then conducted, using a small sample extracted from
the Cars93 data set, to demonstrate how the algorithm fails to solve (1.8). One
way to demonstrate this result is by rewriting the objective function,

    y^T w = (y_v^T − y_b^T (B^T)^{-1} N_v^T) w_v + (y_u^T − y_b^T (B^T)^{-1} N_u^T) w_u    (3.1)
          = (q − 1) ∑_{j ∈ R_v} (c_j − z_j) + q ∑_{j ∈ R_u} (c_j − z_j),

where R_v denotes the set of indices for the nonbasic variables at the lower bound
(q − 1) and R_u denotes the set of indices for the nonbasic variables at the upper
bound q. Each c_j − z_j corresponds to the raw residual for the jth observation.
In a bounded variable problem that does not possess the special structure of (1.8),
a feasible solution may exist where all nonbasic variables are fixed at one of the
two bounds, but the fact that w must lie in the null space of X^T prevents such a
solution in the QRMEP. Furthermore, each c_j − z_j is a raw residual, so it follows
that c_j − z_j < 0 for all j ∈ R_v and c_j − z_j > 0 for all j ∈ R_u. For a maximization
problem, the stopping criteria for the bounded simplex algorithm are that c_j − z_j < 0
holds for all j ∈ R_v and c_j − z_j > 0 holds for all j ∈ R_u [4]. These conditions are
satisfied by any primal feasible solution to (1.4), so the bounded simplex algorithm
cannot be used to solve (1.8). If a pivoting algorithm is to be designed for solving
(1.8), then it must start with a dual infeasible solution and pivot towards optimality.
An iteration of the bounded simplex method begins by selecting a variable to
enter the basis. Let k be the index of the nonbasic variable selected to enter the
basis,
    k = argmin_j { min_{j ∈ R_v} {z_j − c_j},  min_{j ∈ R_u} {c_j − z_j} }.    (3.2)
For the QRMEP, k is the index of the smallest absolute nonzero residual. If k ∈ R_v,
then w_k enters the basis by increasing from its current value of (q − 1). If k ∈ R_u,
then w_k enters the basis by decreasing from its current value of q.
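Selecting the entering index per (3.2) amounts to finding the smallest nonzero absolute residual. A small sketch, checked against the residual vector that appears in Example 5 below:

```python
import numpy as np

def entering_index(residuals, tol=1e-9):
    """Index of the smallest absolute nonzero residual, per (3.2).
    Basic observations (zero residual) are excluded from the comparison."""
    r = np.abs(np.asarray(residuals, dtype=float))
    r[r <= tol] = np.inf               # mask the basic (zero) residuals
    return int(np.argmin(r))

# Residual vector from Example 5 below: observation 6 (residual 0.4545)
# has the smallest absolute nonzero residual, so k = 6.
r = [0.0, -4.8636, 0.0, 3.5364, 2.8091, 0.4545, 3.7, -1.5091]
print(entering_index(r) + 1)           # 6 (as a 1-based observation index)
```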
Suppose k ∈ R_v. Let ∆_k be the amount by which w_k is increased from (q − 1)
such that w_k = q − 1 + ∆_k. Substituting this into (2.6) and (2.7) yields

    w_b = −(B^T)^{-1} N_v^T w_v − (B^T)^{-1} N_u^T w_u − (B^T)^{-1} x_k^T w_k
        = −(q − 1)(B^T)^{-1} N_v^T 1_v − q (B^T)^{-1} N_u^T 1_u − (q − 1 + ∆_k)(B^T)^{-1} x_k^T
        = (1 − q)(B^T)^{-1} N_v^T 1_v − q (B^T)^{-1} N_u^T 1_u − ∆_k s_k    (3.3)
and

    y^T w = (y_v^T − y_b^T (B^T)^{-1} N_v^T) w_v + (y_u^T − y_b^T (B^T)^{-1} N_u^T) w_u + (y_k − y_b^T s_k) w_k
          = (1 − q) ∑_{j ∈ R_v} (z_j − c_j) + q ∑_{j ∈ R_u} (c_j − z_j) + ∆_k (c_k − z_k)
          = z + ∆_k (c_k − z_k),    (3.4)

where x_k is the kth row of the design matrix (kth observation) and s_k = (B^T)^{-1} x_k^T.
The increase ∆_k can be blocked when one of the basic variables either drops to
(q − 1) or increases to q. Let γ_1 = ∆_k denote the value at which a basic variable
decreases to (q − 1). This increase is bounded above by

    (q − 1) 1_p < w_b    (3.5)
    (q − 1) 1_p < −(B^T)^{-1} N_v^T w_v − (B^T)^{-1} N_u^T w_u − ∆_k s_k
    (q − 1) 1_p < d − ∆_k s_k
    ∆_k s_k < d + (1 − q) 1_p.

If s_k ≤ 0_p, then ∆_k can assume any nonnegative value without violating the
inequality, so compute the following minimum ratio test only for positive elements of
s_k,

    γ_1 = min_{1≤j≤p} { (d_j + 1 − q) / s_jk : s_jk > 0 }    (3.6)
        = (d_r + 1 − q) / s_rk.
The index r identifies the candidate variable w_r ∈ w_b to become nonbasic at the
lower bound q − 1, and w_k takes the place of w_r in the basis.
Let γ_2 = ∆_k denote the value at which a basic variable increases to q; this
increase is bounded below by

    q 1_p > w_b    (3.7)
    q 1_p > d − ∆_k s_k
    ∆_k s_k > d − q 1_p.

If s_k ≥ 0_p, then ∆_k can assume any nonnegative value without violating the
inequality, so compute the following minimum ratio test only for negative elements of
s_k,

    γ_2 = min_{1≤j≤p} { (d_j − q) / s_jk : s_jk < 0 }    (3.8)
        = (d_r − q) / s_rk.
In this case, the index r identifies the candidate variable w_r ∈ w_b to become nonbasic
at the upper bound q, and w_k becomes basic in the place of w_r.
The value of ∆_k is determined by the minimum amount w_k can increase before
being blocked,

    ∆_k = min {γ_1, γ_2}.    (3.9)
Once ∆_k is obtained, a new solution can be computed. The nonbasic variable w_k is
updated by w_k = q − 1 + ∆_k, and this result is substituted into (3.3) to update the
working basis vector.
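The two ratio tests (3.6) and (3.8), together with the step choice (3.9), can be sketched as a single helper. The data below are synthetic, not drawn from the dissertation's example.

```python
import numpy as np

def blocking_step(d, s_k, q):
    """Ratio tests (3.6) and (3.8) for a nonbasic w_k increasing from q - 1.
    d is the current working basis vector and s_k = (B^T)^{-1} x_k^T.
    Returns (delta_k, gamma1, gamma2) with delta_k chosen per (3.9)."""
    pos, neg = s_k > 0, s_k < 0
    gamma1 = np.min((d[pos] + 1 - q) / s_k[pos]) if pos.any() else np.inf  # drops to q - 1
    gamma2 = np.min((d[neg] - q) / s_k[neg]) if neg.any() else np.inf      # rises to q
    return min(gamma1, gamma2), gamma1, gamma2

d = np.array([0.1, -0.5])       # synthetic working basis vector
s_k = np.array([0.8, -0.6])
delta, g1, g2 = blocking_step(d, s_k, q=1/3)
print(delta == g1, g1 < g2)     # True True: the first basic variable blocks first
```

Because (3.3) updates the basis as w_b = d − ∆_k s_k, components with positive s_jk move toward the lower bound (test (3.6)) and components with negative s_jk move toward the upper bound (test (3.8)).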
Now suppose w_k must decrease from the upper bound q (i.e., k ∈ R_u). Let ∆_k
be the amount by which w_k is decreased from q: w_k = q − ∆_k. Substituting this into
(2.6) and (2.7) yields

    w_b = −(B^T)^{-1} N_v^T w_v − (B^T)^{-1} N_u^T w_u − (B^T)^{-1} x_k^T w_k
        = −(q − 1)(B^T)^{-1} N_v^T 1_v − q (B^T)^{-1} N_u^T 1_u − (q − ∆_k)(B^T)^{-1} x_k^T
        = (1 − q)(B^T)^{-1} N_v^T 1_v − q (B^T)^{-1} N_u^T 1_u + ∆_k s_k    (3.10)
and

    y^T w = (y_v^T − y_b^T (B^T)^{-1} N_v^T) w_v + (y_u^T − y_b^T (B^T)^{-1} N_u^T) w_u + (y_k − y_b^T s_k) w_k
          = (1 − q) ∑_{j ∈ R_v} (z_j − c_j) + q ∑_{j ∈ R_u} (c_j − z_j) − ∆_k (c_k − z_k)
          = z − ∆_k (c_k − z_k).    (3.11)
Under the dual feasibility condition, the basic variables are strictly bounded below
by

    (q − 1) 1_p < w_b    (3.12)
    (q − 1) 1_p < −(B^T)^{-1} N_v^T w_v − (B^T)^{-1} N_u^T w_u + ∆_k s_k
    (q − 1) 1_p < d + ∆_k s_k
    (q − 1) 1_p − d < ∆_k s_k
and strictly bounded above by

    q 1_p > w_b    (3.13)
    q 1_p > d + ∆_k s_k
    q 1_p − d > ∆_k s_k.
The equations for γ_1 and γ_2 also take different forms:

    γ_1 = min_{1≤j≤p} { (q − 1 − d_j) / s_jk : s_jk < 0 }    (3.14)
        = (q − 1 − d_r) / s_rk

and

    γ_2 = min_{1≤j≤p} { (q − d_j) / s_jk : s_jk > 0 }    (3.15)
        = (q − d_r) / s_rk.
The value of ∆_k is determined by (3.9), which is the minimum amount w_k can
decrease before being blocked. The nonbasic variable w_k is updated by w_k = q − ∆_k,
and (3.10) updates the working basis vector.
Recall the Cars93 sample from Chapter 2. The following example executes
one iteration of the bounded simplex method [4] on a QRMEP, where q = 1/3 and
p = 2.

Example 5. Let x_1 and x_3 form the initial basis,

    B = [ 1  63 ]
        [ 1  96 ].
Use the exact-fit property to obtain b and generate r:

    b = (−0.0455, 0.1182)^T

and

    r = (0, −4.8636, 0, 3.5364, 2.8091, 0.4545, 3.7, −1.5091)^T.

There are two negative residuals and four positive residuals, which satisfies (2.3)
and (2.4). This leads to the initial basic vector w_b = (−1.303, 1.303)^T, and both
elements w_1 and w_3 are clearly infeasible. The smallest absolute residual corresponds
to k = 6; k ∈ R_u, so the currently nonbasic variable w_6 must be decreased by ∆_6
such that w_6 = 0.3333 − ∆_6, and

    s_6 = (B^T)^{-1} x_6^T = (−0.5758, 1.5758)^T.
Since w_1 is closer to the lower bound (q − 1) than w_3 is to the upper bound q, it
may be falsely concluded that w_1 should become nonbasic at (q − 1). Because γ_1
is defined to be the value at which a basic variable drops to the lower bound, it can
only be computed for a basic variable whose current value is greater than (q − 1).
Similarly, γ_2 is defined to be the value at which a basic variable increases to the
upper bound, so it can only be computed for a basic variable whose current value is
less than q. Therefore, by (3.14), (3.15), and (3.9):

    γ_1 = (0.3333 − 1 − 1.303) / (−0.5758) = 3.4211
    γ_2 = (0.3333 − (−1.303)) / 1.5758 = 1.0385
    ∆_6 = min {γ_1, γ_2} = 1.0385.
It follows that w_1 is indeed the blocking variable, so it becomes nonbasic at q. The
entering variable w_6 is updated to be w_6 = 0.3333 − 1.0385 = −0.7051, and (3.10)
updates the basic vector to w_b = (−1.709, 2.4141)^T. The new basis and residual
vector, respectively, are

    B = [ x_3 ] = [ 1   96 ]
        [ x_6 ]   [ 1  115 ]

    r = (0.7895, −5.6053, 0, 3.3211, 1.6368, 0, 2.6474, −3.7579)^T.

As expected, the residual vector confirms that w_1 is nonbasic at q. The new basic
vector, if obtained using the exact-fit property and (2.6), is

    w_b = (−3.1754, 3.1754)^T ≠ (−1.709, 2.4141)^T.

Clearly, since (−3.1754, 3.1754)^T ≠ (−1.709, 2.4141)^T, the algorithm cannot
continue.
The example reveals another reason why the bounded simplex algorithm fails to
solve (1.8). If the current solution is not optimal, and the three model assumptions
hold, then dual feasibility is not yet satisfied and one of the following inequalities
is true for at least one element of wb: wj < (q − 1) or wj > q. For this reason,
the blocking variable tests (3.6), (3.8), (3.14), and (3.15) fail to update the working
basis such that it also satisfies the exact-fit property.
The bounded simplex method assumes that a finite number of basic feasible
solutions exist for the LP from which it proceeds. Under the three model assump-
tions established for the QRMEP in Chapter 2, only the optimal solution is basic
feasible in dual space. Thus, a pivoting method operating in the dual space of the
QRMEP must converge to the optimal basis by pivoting among infeasible solutions.
3.2 Generalized Interval-Linear Programming
There exist four methods for solving the QRMEP, three of which are found in
most commercial quantile regression solvers [18]. The I-LP method, however, is not
among them because it was designed to solve only a special case of the QRMEP. This
section discusses extending the algorithm proposed by Robers and Ben-Israel [56]
such that regression quantiles for any q ∈ (0, 1) can be computed. The extension be-
gins by changing the bounds on the dual vector from w ∈ [−1, 1]^n to w ∈ [q − 1, q]^n.
Therefore, QRMEPs of the form

    max_{w ∈ [q−1, q]^n}  y^T w    (3.16)

    subject to

    [ 0_p       ]    [ X^T ]       [ 0_p   ]
    [ (q−1) 1_n ] ≤  [ I_n ] w  ≤  [ q 1_n ]
can be solved by a method that can be called generalized interval-linear programming
(GILP). The GILP method solves a finite sequence of decompositions of (3.16), and
each decomposed problem is of the form

    max_{w ∈ [q−1, q]^n}  y^T w    (3.17)

    subject to

    d^− ≤ F w ≤ d^+    (3.18)
    g^− ≤ h^T w ≤ g^+,    (3.19)

where F ∈ R^{n×n}, h ∈ R^{n×1}, and F is nonsingular. As with I-LP, (3.18) is a set of n
constraints selected from the (n + p) constraints in (3.16) such that F is invertible,
and (3.19) is a single constraint selected from the remaining p constraints. Since F
is invertible, apply the transformation s = Fw so that (3.17) becomes

max_{s ∈ R^n}  yTF−1s    (3.20)

subject to

d− ≤ s ≤ d+    (3.21)
g− ≤ hTF−1s ≤ g+.    (3.22)
Letting w∗ denote the optimal solution to (3.17), it follows that w∗ = F−1s∗, where
s∗ is the optimal solution to (3.20). Therefore, solving (3.20) is equivalent to solving
(3.17).
As in [56], the GILP method begins by first solving the subproblem

max_{s ∈ R^n}  yTF−1s    (3.23)

subject to

d− ≤ s ≤ d+.
Let s(t) be the maximizer of (3.23), where t ≥ 1 denotes the current iteration. If, in
addition, s(t) satisfies (3.22), then s(t) is also the maximizer of (3.20). To check for
feasibility in (3.16), the reverse transformation w(t) = F−1s(t) is applied, and w(t) is
substituted into each of the (p− 1) constraints removed from (3.16). If w(t) satisfies
all constraints in (3.16), then optimality has been achieved and the algorithm stops.
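Because (3.23) is separable, its maximizer can be read off coordinate by coordinate. The following Python sketch illustrates this step; the function name and the sample data are illustrative, not part of the dissertation's implementation.

```python
def solve_box_subproblem(c, d_lo, d_hi):
    """Maximize the linear objective sum(c_j * s_j) over the box
    d_lo <= s <= d_hi, i.e., subproblem (3.23) with c playing the
    role of yTF^-1.

    The problem is separable: each s_j independently moves to its
    upper bound when c_j > 0 and to its lower bound otherwise.
    """
    return [hi if cj > 0 else lo
            for cj, lo, hi in zip(c, d_lo, d_hi)]

# With F = I_n the coefficient vector is y itself; for a positive y and
# q = 1/5, every coordinate therefore goes to its upper bound q.
y = [7.4, 10.1, 11.3, 15.9, 19.9, 14.0, 20.2, 20.9]
q = 0.2
s = solve_box_subproblem(y, [q - 1.0] * len(y), [q] * len(y))
```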
Suppose s(t) does not satisfy (3.22). Then, either hTF−1s(t) < g− or g+ <
hTF−1s(t). If the former holds, then there exists a solution to (3.20) such that
hTF−1s = g−, and if the latter holds, then there exists a solution to (3.20) such that
hTF−1s = g+. Let ∆ denote the amount by which (3.22) is violated,

∆ = { hTF−1s(t) − g−, if hTF−1s(t) < g−
      hTF−1s(t) − g+, if hTF−1s(t) > g+ }.    (3.24)

It follows that if hTF−1s(t) < g−, then ∆ < 0. Conversely, if hTF−1s(t) > g+, then
∆ > 0. Let (hTF−1)_j denote the jth element of the vector hTF−1, (yTF−1)_j denote
the jth element of the vector yTF−1, and γ_j denote the marginal cost of changing
the jth element of s(t) [57]. Let Q be the set of indices identifying which elements
of s(t) are candidates to be changed in order to satisfy (3.22) while maintaining
feasibility in (3.21). Let m denote the cardinality of Q such that |Q| = m ≤ n, and

Q = { j : 1 ≤ j ≤ n, (hTF−1)_j ≠ 0, γ_j = (yTF−1)_j / ((hTF−1)_j sgn ∆) ≥ 0 }.
Reorder the indices in Q such that

Q = { j_k : γ_{j1} ≤ γ_{j2} ≤ · · · ≤ γ_{jm} }.
One or more elements from the resulting set { s(t)_j : j = j_k ∈ Q } are altered
until all constraints in (3.20) are satisfied. For each j_k ∈ Q, compute the
distance δ_{jk} from s(t)_{jk} to its opposite boundary. In other words, each
s(t)_{jk} moves to its opposite boundary, one at a time, until s(t) satisfies
(3.22). These distances are determined by

δ_{jk} = { d−_{jk} − s(t)_{jk}, if sgn ∆ = sgn (hTF−1)_{jk}
           d+_{jk} − s(t)_{jk}, if sgn ∆ = − sgn (hTF−1)_{jk} }.

The step length δ_{jk} is equivalent to the direction of movement σ from the
Koenker-d'Orey algorithm. That is, the magnitude of δ_{jk} is equal to the length
of the closed interval [q − 1, q] (i.e., q − (q − 1) = 1), and its sign indicates
in which direction s(t)_{jk} moves. The final altered element, however, need not
move the entire distance in order to satisfy (3.22).
Let s(t)_{jr} denote this "entering" variable, whose index is determined by

j_r = min { j_k ∈ Q : | Σ_{k=1}^{r} δ_{jk} (hTF−1)_{jk} | ≥ |∆| }.
The elements { s(t)_{j1}, s(t)_{j2}, . . . , s(t)_{j_{r−1}} } move to their
respective opposing boundaries, and the step length for s(t)_{jr} is computed as

θ = ( −∆ − Σ_{k=1}^{r−1} δ_{jk} (hTF−1)_{jk} ) / (hTF−1)_{jr}.

Therefore, the optimal solution to (3.20), and thus (3.17), is given by

w(t+1) = F−1 ( s(t) + Σ_{k=1}^{r−1} δ_{jk} e_{jk} + θ e_{jr} ),
where ejk denotes an n-vector of zeros with a one in the jkth position. If w(t+1) also
satisfies dual feasibility (i.e., XTw(t+1) = 0p), then w(t+1) is the optimal solution to
(3.16). Otherwise, the jrth constraint in (3.18) is replaced by (3.19), and a new
g− ≤ hTw ≤ g+ is selected from among the (p− 1) constraints removed from (3.16)
that is not satisfied by w(t+1). Robers and Ben-Israel [57] recommend choosing the
constraint which w(t+1) violates by the greatest amount. Let t = t + 1, and begin
the next iteration.
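The correction step just described can be sketched compactly. The helper below follows the sign convention of (3.24); the function name, the flat-list representation of hTF−1 and yTF−1, and the omission of the basis-exchange bookkeeping are illustrative assumptions rather than the dissertation's code.

```python
def gilp_correction(hF, yF, s, d_lo, d_hi, g_lo, g_hi):
    """Restore g_lo <= hF.s <= g_hi for a box maximizer s of (3.23).

    hF and yF stand for the row vectors hTF^-1 and yTF^-1. Elements of
    s are driven to their opposite boundaries in ascending order of
    marginal cost gamma_j; the entering element j_r takes only the
    partial step theta.
    """
    val = sum(hj * sj for hj, sj in zip(hF, s))
    if g_lo <= val <= g_hi:
        return list(s)                               # (3.22) already holds
    delta = val - g_lo if val < g_lo else val - g_hi     # violation, as in (3.24)
    sgn = 1.0 if delta > 0 else -1.0
    # Candidate set Q: nonzero hF entry and nonnegative marginal cost.
    Q = [j for j in range(len(s)) if hF[j] != 0 and yF[j] / (hF[j] * sgn) >= 0]
    Q.sort(key=lambda j: yF[j] / (hF[j] * sgn))      # cheapest moves first
    s, moved = list(s), 0.0
    for j in Q:
        # Distance delta_jk from s_j to its opposite boundary.
        d = (d_lo[j] - s[j]) if sgn * hF[j] > 0 else (d_hi[j] - s[j])
        if abs(moved + d * hF[j]) >= abs(delta):     # j is the entering index j_r
            s[j] += (-delta - moved) / hF[j]         # partial step theta
            return s
        s[j] += d                                    # full step to the boundary
        moved += d * hF[j]
    raise ValueError("the single constraint cannot be met from this box")
```

Applied to the data of Example 6 with F = I_8, h = 1_8, and g− = g+ = 0, this sketch reproduces the first pivot w(2) = (−0.8, −0.4, 0.2, . . . , 0.2)T.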
The following example uses the Cars93 sample to demonstrate GILP for q = 1/5.
Example 6 Recall the sample extracted from the Cars93 data set [19],

y = (7.4, 10.1, 11.3, 15.9, 19.9, 14, 20.2, 20.9)T,

X = [ 1   1    1   1    1    1    1    1
      63  127  96  105  145  115  140  190 ]T,
where y is the vector of mean retail prices, and X2 is the vector of horsepower
ratings for all vehicle models sold by Ford Motor Company in 1993. Use GILP to
solve (1.8) for the first conditional quintile (q = 1/5). For t = 1, let

d− = (−4/5)1_8,   d+ = (1/5)1_8,   F = I_8,   g− = g+ = 0,   h = X1 = 1_8.
Clearly, F−1 = I_8, s(1) = Fw = w, and the optimal solution to (3.23) is
s(1) = (1/5)1_8, but hTF−1s(1) = 8/5 ≠ 0. Therefore,

∆ = hTF−1s(1) − g+ = 8/5,
hTF−1 = 1T_8,
yTF−1 = yT,
γ_j = y_j / ((1T_8)_j sgn ∆) = y_j.
Since y > 0, it follows that γ_j ≥ 0 for all 1 ≤ j ≤ 8, so

Q = {1, 2, 3, 6, 4, 5, 7, 8},

δ_{jk} = { d−_{jk} − s(1)_{jk} : j_k ∈ Q } = {−1, −1, −1, −1, −1, −1, −1, −1}.
Since | Σ_{k=1}^{2} δ_{jk} (hTF−1)_{jk} | = 2 ≥ 8/5 = |∆|, the index of the
element that does not take the full −1 step is j_r = j_2 = 2. Thus,

θ = ( −∆ − δ_{j1} (hTF−1)_{j1} ) / (hTF−1)_{j2} = −3/5
and

w(2) = F−1 ( s(1) + δ_{j1} e_{j1} + θ e_{j2} ) = (−0.8, −0.4, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2)T.
Because XTw(2) = (0, 57)T, dual feasibility is not satisfied. The vector 1T_8
replaces the row (0, 1, 0, 0, 0, 0, 0, 0) in F, d−_2 = d+_2 = 0, and h = X2. For t = 2,
∆ = hTF−1s(2) − g+ = 57,
hTF−1 = (−64, 127, −31, −22, 18, −12, 13, 63),
yTF−1 = (−2.7, 10.1, 1.2, 5.8, 9.8, 3.9, 10.1, 10.8),
γ_j = {0.0422, 0.0795, −0.0387, −0.2636, 0.5444, −0.325, 0.7769, 0.1714},
Q = {1, 2, 8, 5, 7},
δ_{jk} = {1, 0, −1, −1, −1},
| δ_{j1} (hTF−1)_{j1} | = 64 ≥ 57 = |∆|,
θ = −∆ / (hTF−1)_{j1} = 57/64,
w(3) = F−1 ( s(2) + θ e_{j1} )
     = (0.0906, −1.2906, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2)T,
and w(3)_2 = −1.2906 violates its bounds, so w(3) is not dual feasible. Let XT_2
replace (1, 0, 0, 0, 0, 0, 0, 0) in F, d−_1 = d+_1 = 0, g− = −4/5, g+ = 1/5, and
h = (0, 1, 0, 0, 0, 0, 0, 0)T. The algorithm continues until t = 6, where

w(7) = (−0.1528, −0.8, 0.2, 0.2, 0.2, 0.2, 0.2, −0.0472)T.
Since XTw(7) = 0_p and w(7) ∈ [−0.8, 0.2]^8, it is a dual feasible solution.
Furthermore, since both w(7)_1 and w(7)_8 lie strictly inside the open interval
(−0.8, 0.2), then by (2.2), w(7) is the optimal solution to (3.16).
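The terminal solution is easy to verify numerically; the small residual in the second component of XTw(7) only reflects the four-decimal rounding of w(7) above.

```python
# Check Example 6's terminal solution: dual feasibility X'w = 0 (up to
# the rounding of w(7)) and the bounds [q-1, q] = [-0.8, 0.2].
X = [(1, 63), (1, 127), (1, 96), (1, 105), (1, 145), (1, 115), (1, 140), (1, 190)]
w7 = [-0.1528, -0.8, 0.2, 0.2, 0.2, 0.2, 0.2, -0.0472]
Xtw = [sum(row[k] * wj for row, wj in zip(X, w7)) for k in range(2)]
in_bounds = all(-0.8 <= wj <= 0.2 for wj in w7)
```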
GILP is distinguished from the dual simplex algorithm, as well as its long-
step variant, because it operates exclusively in the dual space of the QRMEP. It
pivots among infeasible dual solutions until arriving at the optimal basis, rather than
switching to primal space and pivoting among primal feasible solutions.
3.3 Long-Step Dual Simplex (LSDS) Method
Kostina [39] proposed a long-step variant of the dual simplex algorithm to
solve general maximization problems with bounded variables. This research extends
the long-step dual simplex (LSDS) method to a specific class of bounded variable
problems: the QRMEP. The following is a detailed description of how the step size
selection procedure is modified in the context of the QRMEP.
Modifying the step size selection to take longer steps is analogous to how the
Barrodale-Roberts algorithm operates in the primal space. As with the short-step
dual simplex method, let the triplet λ = (b,u,v) denote a feasible solution to
(1.4), where r = y − Xb = u − v. Define search directions and a nonnegative
step size such that an improved feasible solution is identified by the triplet λ (σ) =
(b (σ) ,u (σ) ,v (σ)), where b (σ) = b+σ∆b, u (σ) = u+σ∆u, and v (σ) = v+σ∆v.
The equation φ(λ) = 0T_p b + q1T_u u + (1 − q)1T_v v represents the objective
function value generated by the solution λ. Since (XTw)T = 0T_p, the improved
objective function can be written as

φ(λ(σ)) = φ(λ + σ∆λ)    (3.25)
        = φ(λ) + σ ( 0T_p ∆b + q1T_u ∆u + (1 − q)1T_v ∆v )
        = φ(λ) + σ ( wTX∆b + q1T_u ∆u + (1 − q)1T_v ∆v )
        = φ(λ) + σ ( q1T_u ∆u + (1 − q)1T_v ∆v − ∆rTw ).
For either r or r (σ), its jth element is in exactly one of two possible states. That is,
either rj ≥ 0 or rj ≤ 0, and either rj (σ) ≥ 0 or rj (σ) ≤ 0. Therefore, the following
four cases are possible.
1. Let rj ≥ 0 and rj(σ) ≥ 0. Then,

uj = rj,  uj(σ) = uj + σ∆uj = rj(σ),
vj = 0,   vj(σ) = vj + σ∆vj = 0.

If wj ∈ wb in the current iteration and remains basic in the next iteration,
then rj = ∆rj = 0 and the jth element does not decrease the objective function
value. Applying these substitutions to the jth element of the improved
objective function yields

σ (q∆uj + (1 − q)∆vj − ∆rjwj) = σ (q∆rj − ∆rjwj)
                              = σ∆rj (q − wj),

φ(λ(σ)) = qrj + σ∆rj (q − wj)
        = (q − wj) rj(σ) + rjwj
        = qrj + (q − wj) rj(σ) − (q − wj) rj.

If rj = 0 and rj(σ) > 0, then j = i_k, w_b^(ik) > q, and w_{ik} leaves the basis
to become nonbasic at the upper bound. Since σ ≥ 0 and ∆rj > 0, namely
∆rj = 1, it follows that (q − wj) < 0 and the objective function value decreases.
If rj > 0 and rj(σ) = 0, then wj enters the basis from being nonbasic at the
upper bound. That is, the jth residual is driven to zero. Since wj = q, it
follows that φ(λ(σ)) = qrj, so ∆rj can be either positive or negative. The
same result occurs if rj > 0 and rj(σ) > 0.
2. Let rj ≥ 0 and rj(σ) ≤ 0. Then,

uj = rj,  uj(σ) = uj + σ∆uj = 0,
vj = 0,   vj(σ) = vj + σ∆vj = −rj(σ).

Applying these substitutions to the jth element of the improved objective
function yields

σ (q∆uj + (1 − q)∆vj − ∆rjwj) = −qrj + (q − 1)(rj + σ∆rj) − σ∆rjwj
                              = σ∆rj (q − 1 − wj) − rj,

φ(λ(σ)) = (q − 1) rj + σ∆rj (q − 1 − wj)
        = (q − 1 − wj) rj(σ) + rjwj
        = qrj + (q − 1 − wj) rj(σ) − (q − wj) rj.

If rj = 0 and rj(σ) < 0, then j = i_k, w_b^(ik) < (q − 1), and w_b^(ik) leaves
the basis to become nonbasic at the lower bound. It follows that
(q − 1 − w_b^(ik)) > 0, and the objective function value decreases only if
∆r_{ik} < 0, namely ∆r_{ik} = −1. If rj > 0 and rj(σ) = 0, or if rj > 0 and
rj(σ) < 0, then wj = q and φ(λ(σ)) = (q − 1) rj − σ∆rj.
3. Let rj ≤ 0 and rj(σ) ≤ 0. Then,

uj = 0,    uj(σ) = uj + σ∆uj = 0,
vj = −rj,  vj(σ) = vj + σ∆vj = −rj(σ).

Applying these substitutions to the jth element of the improved objective
function yields

σ (q∆uj + (1 − q)∆vj − ∆rjwj) = σ ((q − 1)∆rj − ∆rjwj)
                              = σ∆rj (q − 1 − wj),

φ(λ(σ)) = (q − 1) rj + σ∆rj (q − 1 − wj)
        = (q − 1 − wj) rj(σ) + rjwj
        = (q − 1) rj + (q − 1 − wj) rj(σ) − (q − 1 − wj) rj.

If rj = 0 and rj(σ) < 0, then j = i_k, w_b^(ik) < (q − 1), and w_b^(ik) leaves
the basis to become nonbasic at the lower bound. It follows that
(q − 1 − w_b^(ik)) > 0, and the objective function value decreases only if
∆r_{ik} < 0, namely ∆r_{ik} = −1. If rj < 0 and rj(σ) = 0, or if rj < 0 and
rj(σ) < 0, then wj = (q − 1), φ(λ(σ)) = (q − 1) rj = (q − 1)(rj(σ) − σ∆rj), and
either ∆rj > 0 or ∆rj < 0.
4. Let rj ≤ 0 and rj(σ) ≥ 0. Then,

uj = 0,    uj(σ) = uj + σ∆uj = rj(σ),
vj = −rj,  vj(σ) = vj + σ∆vj = 0.

Applying these substitutions to the jth element of the improved objective
function yields

σ (q∆uj + (1 − q)∆vj − ∆rjwj) = q (rj + σ∆rj) + (1 − q) rj − σ∆rjwj
                              = rj + σ∆rj (q − wj),

φ(λ(σ)) = qrj + σ∆rj (q − wj)
        = (q − wj) rj(σ) + rjwj
        = (q − 1) rj + (q − wj) rj(σ) − (q − 1 − wj) rj.

If rj = 0 and rj(σ) > 0, then j = i_k, w_b^(ik) > q, and w_b^(ik) leaves the
basis to become nonbasic at the upper bound. Since σ ≥ 0 and ∆rj > 0, namely
∆rj = 1, it follows that (q − w_b^(ik)) < 0 and the objective function value
decreases. If rj < 0 and rj(σ) = 0, or if rj < 0 and rj(σ) > 0, then
wj = (q − 1) and φ(λ(σ)) = qrj + σ∆rj = (q − 1) rj + rj(σ).
By summing the results from the four cases, (3.25) can be rewritten as

φ(λ(σ)) = Σ_{rj(σ)≥0} ( qrj + σ∆rj (q − wj) ) + Σ_{rj(σ)≤0} ( (q − 1) rj + σ∆rj (q − 1 − wj) )

        = φ(λ) + σ [ Σ_{rj≥0, rj(σ)≥0} ∆rj (q − wj) + Σ_{rj≤0, rj(σ)≤0} ∆rj (q − 1 − wj) ]

        + Σ_{rj≤0, rj(σ)≥0} ( (q − wj) rj(σ) − (q − 1 − wj) rj )

        + Σ_{rj≥0, rj(σ)≤0} ( (q − 1 − wj) rj(σ) − (q − wj) rj ).    (3.26)
The goal is to choose a σ ≥ 0 such that the amount by which φ (λ (σ)) decreases
is maximized. Whenever a basic variable leaves the basis, the value of φ (λ (σ)) is
guaranteed to decrease according to the rate [39]

dφ(λ(σ)) / dσ = Σ_{rj=0, rj(σ)>0} ∆rj (q − wj) + Σ_{rj=0, rj(σ)<0} ∆rj (q − 1 − wj),
which is equal to the magnitude of infeasibility of the jth dual variable. Since the
value of φ (λ (σ)) does not decrease whenever a nonbasic variable enters the basis,
additional reductions occur when rj and ∆rj have different signs, which can only be
guaranteed when rj and rj (σ) change signs. If wj = (q − 1), then let ∆wj denote
the amount by which wj changes for the next iteration such that wj + ∆wj = q, so
∆wj = 1. Conversely, when wj = q, then the value of φ (λ (σ)) is guaranteed to
decrease if wj + ∆wj = (q − 1), so ∆wj = −1. This result can be generalized in
primal space by the amount

dφ(λ(σ)) / dσ = − Σ_{∆rj≠0} |∆rj| .
The equation (3.26) and the analyses of the four cases lead to the following
modification of Step 5 of the dual simplex algorithm:

σj = { −rj/∆rj, if rj∆rj < 0
       0,       if rj = 0, ∆rj < 0, wj > q
       0,       if rj = 0, ∆rj > 0, wj < (q − 1)
       ∞,       otherwise }.    (3.27)
Unlike Step 5 of the standard algorithm, (3.27) computes finite step lengths for
all residuals with a nonzero search direction, and ∆rj = 0 only when
rj = rj(σ) = 0. The set of all σj is then sorted in ascending order such that
σ_j^(i) ≤ σ_h^(i+1), where j ≠ h.
Rather than selecting the maximum σ_j^(i), the maximum reduction in φ(λ(σ)) is
achieved by computing the net distance remaining after w_j^(i) has traveled to
its opposite boundary [39],

κ_i = κ_{i−1} + |∆r_j^(i)|,

where κ_0 = (w_b^(ik) − q + 1) if w_b^(ik) < (q − 1), and κ_0 = (q − w_b^(ik))
if w_b^(ik) > q.
Notice that κ_0 < 0 in either case. As with GILP, the elements w_j^(i) move to
their opposite boundary, one at a time, until the cumulative sum of distances
Σ |∆r_j^(i)| meets or exceeds the amount |κ_0| by which w_b^(ik) violates dual
feasibility. Thus, the σ_j^(i) corresponding to the first nonnegative κ_i is
selected to be the step length. That is,

σ_j^(s) = { σ_j^(i) : κ_i ≥ 0, κ_{i−1} < 0 },
and w_j^(s) enters the basis. The vector of model coefficients and the design
matrix partitions B and N are updated as before in the short-step dual simplex
algorithm, and the next iteration begins.
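The long-step selection rule can be isolated as a small routine. In this sketch the candidate steps are passed as (σ_j, |∆r_j|) pairs, which is an assumed representation for illustration, not the dissertation's implementation.

```python
def long_step_length(candidates, kappa0):
    """Long-step ratio test in the spirit of Section 3.3.

    candidates holds (sigma_j, abs_dr_j) pairs for the finite step
    lengths produced by (3.27); kappa0 < 0 measures the leaving
    variable's bound violation. Candidates are consumed in ascending
    order of sigma_j until the cumulative residual change first covers
    |kappa0|, and that sigma_j is returned. The short-step rule would
    instead stop at the smallest sigma_j.
    """
    kappa = kappa0
    for sigma, abs_dr in sorted(candidates):
        kappa += abs_dr              # kappa_i = kappa_{i-1} + |dr_j^(i)|
        if kappa >= 0:               # first nonnegative kappa_i
            return sigma
    return float("inf")              # no candidate covers the violation

# A bound violation of 1.5 is covered only after the first two candidates,
# so the long step is 1.2 rather than the short-step choice 0.5.
sigma = long_step_length([(0.5, 1.0), (1.2, 1.0), (2.0, 1.0)], -1.5)
```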
3.4 The QRMEP as an Integer Program
Simplex algorithms are designed to solve LPs possessing nonnegative
variables [4], but the dual variables in (1.8) are unrestricted in sign. A
nonnegativity transformation can be applied, where w ∈ Rn is rewritten as the
difference between two nonnegative variables, w = w+ − w−. The bounds on w can
also be rewritten as

w− ≤ (1 − q)1n + w+    (3.28)

and

w+ ≤ q1n + w−.    (3.29)
Since w+ and w− are both nonnegative by definition, w− must satisfy (3.28), even
when w+ is at its minimum value. Similarly, w+ must satisfy (3.29), even if w−
is at its minimum. This reasoning leads to upper bounds on the n-vectors w+ and
w−, respectively:
w+ ≤ q1n
w− ≤ (1− q)1n.
Thus, substituting w = w+ − w− into (1.8) transforms the dual LP into

max_{w+, w− ≥ 0_n}  yTw+ − yTw−    (3.30)
subject to
XTw+ = XTw−
w+ ≤ q1n
w− ≤ (1− q)1n.
At any extreme point in (3.30), a nonbasic w_j^+ is fixed at exactly one of its
bounds (0 or q). Likewise, any nonbasic w_j^− is fixed at either 0 or (1 − q).
Another transformation must be applied in order to express the QRMEP as
an integer program. Let zu and zv be n-vectors such that

z_u^(j) = { 0 ≤ z_u^(j) ≤ 1, if w_j^+ ≠ 0
            0,               otherwise }

and

z_v^(j) = { 0 ≤ z_v^(j) ≤ 1, if w_j^− ≠ 0
            0,               otherwise },

which yields

w+ = qzu  and  w− = (1 − q)zv.

In other words, w+ and w− can be expressed as convex combinations of zu and zv.
Additionally, the sum of all component vectors must equal the unit vector, so let
w+ = Tu zu and w− = Tv zv be the affine transformations [5] of the (n × 1) column
vectors zu and zv, respectively, where Tu = qIn, Tv = (1 − q)In, and zu + zv = 1n.
Applying these substitutions into (3.30) yields

max_{zu, zv ∈ [0,1]^n}  qyTzu − (1 − q)yTzv    (3.31)

subject to

qXTzu = (1 − q)XTzv
zu + zv = 1n.
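The algebra behind (3.31) can be verified numerically. With zu + zv = 1n, the substitutions give zu = w + (1 − q)1n and zv = q1n − w, so the equality constraint qXTzu = (1 − q)XTzv holds exactly when XTw = 0p. The vectors in the sketch below are illustrative, not the dissertation's data.

```python
# Verify the substitution chain w = w+ - w-, w+ = q*zu, w- = (1-q)*zv,
# zu + zv = 1, on a dual vector satisfying 1'w = 0 for an intercept column.
q = 0.2
w = [-0.8, 0.2, 0.2, 0.2, 0.2]          # entries at the bounds q-1 and q
zu = [wj + (1.0 - q) for wj in w]       # zu = w + (1-q)1
zv = [q - wj for wj in w]               # zv = q1 - w

pairs_sum_to_one = all(abs(a + b - 1.0) < 1e-12 for a, b in zip(zu, zv))
recovers_w = all(abs(q * a - (1.0 - q) * b - wj) < 1e-12
                 for a, b, wj in zip(zu, zv, w))
# q*1'zu equals (1-q)*1'zv precisely because 1'w = 0.
constraint_holds = abs(q * sum(zu) - (1.0 - q) * sum(zv)) < 1e-12
```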
It can be shown that redefining the component vectors zv and zu, rewriting the
boundary condition on (2.6), and applying the cardinality range property produces
a suboptimal formulation of the QRMEP; specifically, a variant of the generalized
assignment problem.
3.4.1 The Bounded Interval Generalized Assignment Problem. Cattrysse
and Van Wassenhove [15] described the generalized assignment problem (GAP) as a
cost minimizing assignment of n jobs to m workers. It can be equivalently described
as a value maximizing assignment of n jobs to m workers, as in [21]. Each job
must be assigned to exactly one worker, so let zij be a binary variable where zij = 1
indicates the jth job being assigned to the ith worker and zij = 0 otherwise. Let the
value of having the ith worker do the jth job be denoted by cij. Each worker has
capacity restrictions such that a single worker can only take on a limited number of
jobs, so let di denote the work capacity for the ith worker. The amount of resources
consumed when the jth job is performed by the ith worker is identified by aij. The
GAP [21] therefore assumes the form
max_{zij ∈ {0,1}}  Σ_{i=1}^{m} Σ_{j=1}^{n} cij zij    (3.32)

s.t.  Σ_{j=1}^{n} aij zij ≤ di, for 1 ≤ i ≤ m,

      Σ_{i=1}^{m} zij = 1, for 1 ≤ j ≤ n.
The dual LP (1.8) can be used to approximate (3.32). Just as the optimal basis
is a unique p-subset, so are the optimal sets of positive and negative residuals. That
is, for any optimal solution, the resulting combination of nonbasic dual variables
constitutes a unique assignment. Consider each observation to be a job, and the
location of each observation relative to the regression hyperplane (above/below) to
be a worker. As with jobs in (3.32), each nonbasic observation in the QRMEP is
assigned to exactly one of two locations: above or below the regression hyperplane.
The inequalities (2.3) and (2.4) are equivalent to the work capacity restrictions in
(3.32). Let the response vector y be the vector of value coefficients. The residual
locations are asymmetrically weighted, so the weights q and (q − 1) are also applied
to the objective function. The amount of resource consumed by assigning the jth
observation to the ith location is unity, or aij = 1, for all i and j. Redefine zv
from (3.31) to be a binary n-vector, where z_v^(j) = 1 when the jth observation
is assigned below the hyperplane, and zero otherwise. Redefine zu to be a binary
n-vector, where z_u^(j) = 1 when the jth observation is assigned above the
hyperplane, and zero otherwise. A simple GAP formulation of the QRMEP therefore
takes the form
max_{zv, zu ∈ {0,1}^n}  (q − 1)yTzv + qyTzu    (3.33)

subject to

zv + zu = 1n
qn − p < zTv 1n < qn
(1 − q)n − p < zTu 1n < (1 − q)n.
Notice that each location also possesses a lower bound, which is not necessarily
zero, on the number of nonbasic observations assigned to it. These result from the
assumption that the quantile regression model contains an intercept [32]. Thus,
(3.33) is in the form of a bounded interval generalized assignment problem (BIGAP)
[58].
There are at least two issues with (3.33): the absence of the design matrix X in
the constraints and the requirement that all observations be assigned (zv + zu = 1n).
Without accounting for the independent variables, the solution to (3.33) is simply the
unconditional quantile (sample quantile) of the response vector y. The structure
of (1.8) must be examined such that additional constraints containing the design
matrix X may be added to (3.33). The basic observations constitute a p-subset
which defines the regression hyperplane, so z_v^(j) = z_u^(j) = 0 must hold for
any basic
observation. Therefore, zv+zu = 1n must be removed from (3.33) and replaced with
a constraint which guarantees that all nonbasic observations are assigned. Under
the assumption that a nondegenerate solution to the QRMEP exists, exactly (n− p)
nonbasic observations must be assigned, so (3.33) becomes
max_{zv, zu ∈ {0,1}^n}  (q − 1)yTzv + qyTzu    (3.34)

subject to

qn − p < zTv 1n < qn
(1 − q)n − p < zTu 1n < (1 − q)n
zTv 1n + zTu 1n = n − p.
The optimal assignment from (3.34) does not correspond to the optimal solu-
tion to (1.8) because the optimal assignment assumes all basic variables are zero,
which is why the BIGAP is called a suboptimal formulation of the QRMEP. The so-
lution to the BIGAP can be improved by rewriting the boundary condition on (2.6).
Consider the QRMEP optimality condition (q − 1)1p < wb < q1p and express wb in
terms of the nonbasic observations. Multiplying through by the (p× p) basis matrix
B leads to

(q − 1)BT1p < (1 − q)NTv 1v − qNTu 1u < qBT1p.    (3.35)
Using the assignment vectors zv, zu and the design matrix, (3.35) can be
represented equivalently by two inequalities:

(q − 1)XT(1n − zv − zu) < (1 − q)XTzv − qXTzu,
i.e., XTzu < (1 − q)XT1n,    (3.36)

and

(1 − q)XTzv − qXTzu < qXT(1n − zv − zu),
i.e., XTzv < qXT1n.    (3.37)
Adding (3.36) and (3.37) to the constraint set in (3.34) further reduces the number
of feasible bases, and the BIGAP form of the QRMEP is

max_{zv, zu ∈ {0,1}^n}  (q − 1)yTzv + qyTzu    (3.38)

subject to

qn − p < zTv 1n < qn
(1 − q)n − p < zTu 1n < (1 − q)n
XTzv < qXT1n
XTzu < (1 − q)XT1n
zTv 1n + zTu 1n = n − p.
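For very small samples, (3.38) can be solved by enumeration, which makes its structure easy to inspect. The sketch below labels each observation basic, below, or above, and keeps the feasible assignment maximizing (3.38)'s objective. It is exponential in n and purely illustrative; the names and interfaces are assumptions, not the dissertation's code.

```python
from itertools import product

def bigap_bruteforce(y, X, q, p):
    """Enumerate feasible assignments for (3.38) on a tiny sample.

    X is a list of n rows of length p. Labels: 0 = basic, 1 = below
    (z_v), 2 = above (z_u). Returns (z_v, z_u, objective) for the best
    feasible assignment, or None if none exists.
    """
    n, tot = len(y), [sum(row[k] for row in X) for k in range(p)]
    best = None
    for lbl in product((0, 1, 2), repeat=n):
        zv = [int(c == 1) for c in lbl]
        zu = [int(c == 2) for c in lbl]
        if sum(zv) + sum(zu) != n - p:
            continue                    # exactly n - p nonbasic observations
        if not (q * n - p < sum(zv) < q * n):
            continue                    # cardinality interval for z_v
        if not ((1 - q) * n - p < sum(zu) < (1 - q) * n):
            continue                    # cardinality interval for z_u
        if any(sum(X[j][k] * zv[j] for j in range(n)) >= q * tot[k]
               or sum(X[j][k] * zu[j] for j in range(n)) >= (1 - q) * tot[k]
               for k in range(p)):
            continue                    # design-matrix constraints (3.36)-(3.37)
        obj = sum(((q - 1) * zv[j] + q * zu[j]) * y[j] for j in range(n))
        if best is None or obj > best[2]:
            best = (zv, zu, obj)
    return best
```

On the eight-observation Ford sample with p = 2 and q = 1/5, the cardinality intervals force either zero or one observation below the hyperplane and exactly six nonbasic observations in total.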
3.4.2 The Bounded Interval Knapsack Problem. If (3.38) is relaxed by
removing the requirement that the sum of assignments equal (n− p), then the result
is the bounded interval knapsack problem (BIKP),

max_{zv, zu ∈ {0,1}^n}  (q − 1)yTzv + qyTzu    (3.39)

subject to

qn − p < zTv 1n < qn
(1 − q)n − p < zTu 1n < (1 − q)n
XTzv < qXT1n
XTzu < (1 − q)XT1n.
For the same reason as (3.38), (3.39) is also considered a suboptimal formulation of
the QRMEP. The BIGAP and BIKP solutions may, however, be useful for providing
starting solutions to other exact algorithms, such as GILP or the LSDS method.
Each of the extensions presented in Sections 3.2 and 3.3 uses the vector w = q1n
as its initial solution, which is relatively close to optimality for small q and n. As
the problem size and/or the target quantile increases, reduced run times can be
achieved with pivoting algorithms through starting solutions which are much closer
to optimality. Since an initial basis can be derived from the solution to the BIKP,
the optimal assignment from (3.39) can be applied in conjunction with the exact-
fit property to generate an initial solution that is much closer to optimality than
w = q1n. The optimal assignment from (3.38), on the other hand, produces an
even closer solution because of the added requirement that exactly (n− p) nonbasic
observations be assigned.
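One hypothetical way to map such an assignment to a starting dual vector is to place observations assigned below the hyperplane at the lower bound q − 1, observations assigned above at the upper bound q, and unassigned (candidate basic) observations at zero. The helper below is a heuristic sketch of that idea, not a procedure taken from the dissertation or from [39].

```python
def warm_start(zv, zu, q):
    """Build an initial dual vector from a BIGAP/BIKP assignment.

    zv marks observations below the hyperplane (start at q - 1), zu
    marks observations above (start at q); unassigned observations are
    treated as candidate basic variables and start at zero.
    """
    return [q - 1.0 if v else (float(q) if u else 0.0)
            for v, u in zip(zv, zu)]

# For the assignment matching Example 6 (observation 2 below, 3-7 above),
# the warm start already agrees with w(7) in six of eight coordinates.
w0 = warm_start([0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 1, 1, 1, 1, 1, 0], 0.2)
```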
3.5 Summary
This chapter presented detailed extensions of two methods, GILP and LSDS, to
the class of QRMEPs. An application of the bounded simplex algorithm to the
QRMEP was also attempted, along with a demonstration of its failure to solve (1.8).
Finally, by reconceptualizing the QRMEP as an integer program, the suboptimal
formulations BIGAP and BIKP were developed. The next chapter presents practical
implementations of GILP and the LSDS method, followed by comparative analyses
of their computational performance against two baseline algorithms, an interior-
point method and the dual simplex method, both of which are implemented in a
commercially available programming environment.
IV. Implementation, Testing, and Numerical Results
This chapter compares the practical performances of GILP and the LSDS algorithm
against an interior-point method and the dual simplex method. An interior-point
method and the dual simplex algorithm were chosen as baselines against which the
extensions of GILP and the LSDS method were measured. Interior-point methods
represent the computational state-of-the-art for solving LPs, particularly for large-
scale problems, while simplex algorithms have been shown to perform best for small
to moderately sized problems [55]. Neither GILP nor the LSDS method, however,
has been evaluated against other algorithms, simplex or interior-point. Therefore,
in this research, it was deemed necessary to include both a simplex and an interior-
point method as part of the evaluation of the extensions of GILP and the LSDS
method. While typically not converging as quickly as interior-point methods, dual
simplex is also a preferred method for solving bounded LPs [39], and its most valuable
feature is that it, like all simplex methods, guarantees exact solutions [18] when the
nondegeneracy assumption holds. The LSDS method, being a long-step variant of
the dual simplex method, also possesses this characteristic [39]. GILP is a non-
simplex pivoting algorithm, yet it is shown in [57] to yield exact solutions as well.
4.1 Implementation
All experimentation with these four methods was conducted in the MATLAB
environment. MATLAB was chosen mainly for its programming simplicity, since any
user-generated MATLAB code implementing an LP algorithm often looks quite similar
to the theoretical linear algebra from which it was derived. Consequently, the time
required to develop such code is significantly shorter than that of other programming
environments [63]. MATLAB has a built-in LP solver, called linprog. The default
algorithm that linprog employs to solve an LP is an interior-point method,
specifically the predictor-corrector variant of the primal-dual path-following
method, a variant
developed by Mehrotra [50] and later extended to the QRMEP class of problems by
Portnoy and Koenker [55]. The dual simplex algorithm is also available as an option
within the linprog function, thus allowing for experimentation with the two methods
best suited for solving large-scale linear programs [39].
4.2 Testing
All test data was derived from the Cars93 data set referenced in [19]. This
data set was compiled by Lock [41] and contains information on 93 vehicles for sale
in the US for the 1993 model year. A subset of the 26 variables in the data set
was retained for experimentation. This subset consisted of 16 continuous variables
in Cars93, such as pricing, fuel efficiency, horsepower ratings, and engine size.
The remainder were discarded because they contained either discrete or categorical
data, or because they were correlated with one or more variables included in the
subset. The mean sale price was selected as the response variable for all
experiments. Chen's experiments in [18] provided guidance for the testing in this
research. The algorithms were tested across a broad range of quantiles,
q ∈ {0.05, 0.25, 0.5, 0.75, 0.95}, and for three different model sizes,
p ∈ {3, 8, 15}. To obtain
the various sample sizes for each test, uniform random samples were extracted from
the data set. Sample sizes ranged from n = 50 to n = 850, in increments of 10
observations. For each triplet (n, p, q), the experiment was replicated 100 times.
The mean run time was computed for each triplet to assess the efficiency of each
algorithm. The average number of iterations required was computed across all n for
each pair (p, q).
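This replication protocol can be sketched as a simple timing harness; the solver below is a stand-in, and the names and sizes are illustrative assumptions rather than the actual test code.

```python
# Hedged sketch of the protocol: for each triplet (n, p, q), draw a
# uniform random subsample, time the solver, and average the run times
# over the replications.  `solve_qrmep` stands in for GILP, LSDS, or
# either linprog method.
import random
import statistics
import time

def solve_qrmep(sample, p, q):
    # placeholder: a real run would estimate the quantile regression model
    return None

def mean_run_time(data, n, p, q, reps=100):
    times = []
    for _ in range(reps):
        sample = random.sample(data, n)      # uniform random subsample
        t0 = time.perf_counter()
        solve_qrmep(sample, p, q)
        times.append(time.perf_counter() - t0)
    return statistics.mean(times)

data = list(range(850))                      # stand-in for the Cars93 rows
mrt = mean_run_time(data, n=50, p=3, q=0.5, reps=5)
```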
4.3 Numerical Results
The mean run times for each algorithm were plotted for each pair (p, q), re-
sulting in 15 total graphs, where each graph contains a performance curve for each
of three methods: the LSDS extension, the dual simplex algorithm, and the interior-
Figure 4.1 (p, q) = (3, 0.05). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
point method. GILP is evaluated separately because, once n > 50, it exhibited much
longer run times than the other three algorithms. Instead, the independent perfor-
mance of GILP was measured across the quantiles by plotting its mean run times
for each value of p. Therefore, each GILP graph contains one performance curve for
each quantile, resulting in three total plots. The average number of iterations for
each pair (p, q) were tabulated for comparison.
Figures 4.1-4.5 show the graphical results for the small model (p = 3).
Clearly, the location of the crossover point is dependent on the value of q;
that is, as the target quantile increases, the problem size n for which a baseline
method gains computational dominance decreases. Details on the crossover points
corresponding to Figures 4.1 - 4.5 are given in Table 4.1. For this research, a
crossover point (CP) was identified as the smallest problem size n at which the
mean run time (MRT) for a baseline method (dual simplex or interior-point) is less
than or equal to the MRT for the LSDS method. The run time range is given in
the form of a closed interval, where the lower bound is the minimum run time of 100
replications and the upper bound is the maximum run time of 100 replications. The
Figure 4.2 (p, q) = (3, 0.25). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
Figure 4.3 (p, q) = (3, 0.5). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
Figure 4.4 (p, q) = (3, 0.75). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
Figure 4.5 (p, q) = (3, 0.95). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
Table 4.1 Crossover Point Data for p = 3

LSDS vs. Interior-Point
  q      CP    LSDS MRT   LSDS Range            IP MRT    IP Range
  0.05   650   0.02039    [0.01816, 0.02477]    0.01933   [0.01853, 0.02413]
  0.25   510   0.01791    [0.01525, 0.02197]    0.01703   [0.01607, 0.01929]
  0.5    470   0.01506    [0.01279, 0.02100]    0.01483   [0.01432, 0.01432]
  0.75   440   0.01595    [0.01379, 0.01937]    0.01496   [0.01463, 0.01785]
  0.95   560   0.02295    [0.02007, 0.03188]    0.02293   [0.02169, 0.02880]

LSDS vs. Dual Simplex
  q      CP    LSDS MRT   LSDS Range            DS MRT    DS Range
  0.05   780   0.02715    [0.02516, 0.03410]    0.02666   [0.02590, 0.03771]
  0.25   650   0.02619    [0.02228, 0.03214]    0.02617   [0.02539, 0.02888]
  0.5    590   0.02814    [0.02499, 0.03567]    0.02594   [0.02518, 0.02927]
  0.75   570   0.02637    [0.02280, 0.03182]    0.02548   [0.02488, 0.02934]
  0.95   600   0.02782    [0.02298, 0.04019]    0.02699   [0.02553, 0.03801]
data in Table 4.1 also shows that the crossover point for the dual simplex method
was consistently higher than for the interior-point method, which was expected. The
crossover points are highest when q is lowest, and this is a consistent trend for all
model sizes. When the model size is increased to p = 8, a commensurate decrease
in the crossover point is observed in Figures 4.6 - 4.10. This trend continued for
p = 15, as evidenced by Figures 4.11 - 4.15. The crossover point data for p = 8
and p = 15 are summarized in Tables 4.2 and 4.3, respectively. The variances for
each set of 100 replications were also computed. Variances were very small for all
algorithms tested, with GILP exhibiting the largest variances (≤ 0.00281). Among
the other three methods, however, the variances were found to be less than 10⁻⁵,
confirming that the run times were consistent for each algorithm.
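The crossover-point rule defined above can be sketched as a simple scan over the tested sample sizes; the run-time values below are illustrative, not taken from the tables.

```python
# Hedged sketch: the crossover point (CP) is the smallest problem size n
# at which a baseline method's mean run time (MRT) is less than or equal
# to the LSDS MRT.
def crossover_point(sizes, mrt_lsds, mrt_baseline):
    for n, t_lsds, t_base in zip(sizes, mrt_lsds, mrt_baseline):
        if t_base <= t_lsds:
            return n
    return None  # baseline never catches up over the tested range

sizes    = [50, 60, 70, 80]            # illustrative values only
mrt_lsds = [0.010, 0.012, 0.015, 0.019]
mrt_ipm  = [0.014, 0.014, 0.014, 0.015]
cp = crossover_point(sizes, mrt_lsds, mrt_ipm)
```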
Figures 4.16 - 4.18 show the performance of GILP for each value of p. In each
of these figures, the run times for q = 0.05 are noticeably lower than those for the
other quantiles tested. This was expected as a result of the all positive slack initial
solution (i.e., w(0) = q1n) used in both GILP and the LSDS algorithm. Depending
on the size of the problem, w(0) = q1n is close to optimality for very low quantiles,
and the cardinality range property can be used to compute an upper bound on the
Figure 4.6 (p, q) = (8, 0.05). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
Figure 4.7 (p, q) = (8, 0.25). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
Figure 4.8 (p, q) = (8, 0.5). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
Figure 4.9 (p, q) = (8, 0.75). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
Figure 4.10 (p, q) = (8, 0.95). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
Figure 4.11 (p, q) = (15, 0.05). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
Figure 4.12 (p, q) = (15, 0.25). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
Figure 4.13 (p, q) = (15, 0.5). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
Figure 4.14 (p, q) = (15, 0.75). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
Figure 4.15 (p, q) = (15, 0.95). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
Table 4.2 Crossover Point Data for p = 8

LSDS vs. Interior-Point
  q      CP    LSDS MRT   LSDS Range            IP MRT    IP Range
  0.05   400   0.01860    [0.00523, 0.00754]    0.01791   [0.01413, 0.01858]
  0.25   260   0.01665    [0.01404, 0.02085]    0.01522   [0.01490, 0.01797]
  0.5    190   0.01344    [0.01064, 0.01733]    0.01307   [0.01241, 0.01575]
  0.75   170   0.01458    [0.01155, 0.01818]    0.01335   [0.01251, 0.01514]
  0.95   210   0.01413    [0.01273, 0.01956]    0.01412   [0.01390, 0.01665]

LSDS vs. Dual Simplex
  q      CP    LSDS MRT   LSDS Range            DS MRT    DS Range
  0.05   500   0.03322    [0.02963, 0.04012]    0.02617   [0.02528, 0.04123]
  0.25   370   0.02565    [0.02310, 0.03251]    0.02492   [0.02436, 0.02648]
  0.5    370   0.02927    [0.02594, 0.04593]    0.02573   [0.02502, 0.02896]
  0.75   330   0.02687    [0.02247, 0.03309]    0.02580   [0.02500, 0.02775]
  0.95   350   0.02900    [0.02326, 0.03442]    0.02479   [0.02423, 0.02642]
Table 4.3 Crossover Point Data for p = 15

LSDS vs. Interior-Point
  q      CP    LSDS MRT   LSDS Range            IP MRT    IP Range
  0.05   390   0.02221    [0.02018, 0.02846]    0.02002   [0.01961, 0.02415]
  0.25   170   0.01636    [0.01437, 0.02218]    0.01434   [0.01401, 0.01797]
  0.5    110   0.01352    [0.01314, 0.01387]    0.01229   [0.01207, 0.01374]
  0.75    80   0.01109    [0.01101, 0.01168]    0.01091   [0.01067, 0.01346]
  0.95   100   0.01143    [0.01022, 0.01180]    0.01127   [0.01093, 0.01449]

LSDS vs. Dual Simplex
  q      CP    LSDS MRT   LSDS Range            DS MRT    DS Range
  0.05   410   0.02912    [0.02587, 0.03622]    0.02752   [0.02572, 0.03885]
  0.25   260   0.02710    [0.02137, 0.04234]    0.02560   [0.02457, 0.02749]
  0.5    160   0.02506    [0.02131, 0.03187]    0.02434   [0.02370, 0.02601]
  0.75   180   0.02754    [0.02348, 0.03619]    0.02443   [0.02397, 0.02589]
  0.95   170   0.02525    [0.02191, 0.03272]    0.02428   [0.02378, 0.02598]
Figure 4.16 GILP for p = 3. 5th Quantile (magenta), 25th Quantile (blue), 50th Quantile (black), 75th Quantile (green), 95th Quantile (red).
Figure 4.17 GILP for p = 8. 5th Quantile (magenta), 25th Quantile (blue), 50th Quantile (black), 75th Quantile (green), 95th Quantile (red).
Figure 4.18 GILP for p = 15. 5th Quantile (magenta), 25th Quantile (blue), 50th Quantile (black), 75th Quantile (green), 95th Quantile (red).
quantiles that can be considered “very low” (q < 1/(2n)). It follows, when q = 0.05,
that GILP converges in less time than at the higher quantiles because fewer pivots
are required to reach the optimal dual basis.
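A minimal illustration of the asymmetry behind this behavior, using the check loss ρ_q(r) = r(q − I(r < 0)); the function name and sample values are assumptions for illustration.

```python
# Hedged sketch of the quantile check loss rho_q(r) = r * (q - I(r < 0)).
# At a very low quantile, positive residuals are penalized only by q,
# while negative residuals are penalized by (1 - q), which is why the
# all-positive-slack start w(0) = q*1 is already close to optimal.
def rho(r, q):
    return r * (q - (1 if r < 0 else 0))

q = 0.05
pos = rho(10.0, q)    # penalty on a positive residual:  10 * 0.05
neg = rho(-10.0, q)   # penalty on a negative residual: -10 * (0.05 - 1)
```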
Now consider the average number of iterations each method requires to solve
the QRMEP, which are given in Table 4.4. Although the interior-point method
clearly performed best in terms of converging in the fewest number of iterations, the
performance differences among the pivoting algorithms are also interesting. GILP
and the LSDS method are competitive with both the dual simplex algorithm and the
interior-point method when the model size is small. As p increases, however, the dual
simplex algorithm appears to be superior, among the pivoting algorithms, in terms of
solving the QRMEP in the fewest iterations possible. The LSDS method converges,
on average, in fewer iterations than the dual simplex algorithm when q = 0.05
because fewer long steps in primal space are necessary. These advantages disappear,
however, as the model size increases because GILP and the LSDS algorithm must
estimate the entire dual basis through pivoting operations, whereas the dual simplex
Table 4.4 Average Number of Iterations
  p    q      GILP   LSDS   Dual Simplex   Interior-Point
  3    0.05     9      8         8              11
  3    0.25    10     11        10              11
  3    0.5     10     12        11              10
  3    0.75    11     12        12              11
  3    0.95    12     12         9              15
  8    0.05    22     17        21              12
  8    0.25    25     17        23              12
  8    0.5     27     27        25              10
  8    0.75    25     30        22              11
  8    0.95    31     31        27              11
  15   0.05    42     41        31              11
  15   0.25    46     55        37              11
  15   0.5     49     59        41              11
  15   0.75    49     61        39              11
  15   0.95    51     61        42              11
algorithm employs preprocessing to obtain a better starting solution before pivoting
operations begin.
Each decomposed problem in GILP generates an (n + 1)-dimensional polytope,
which is of lower dimension than the polytope in (1.8). It follows that the
total number of vertices for each decomposed problem in GILP is also less than that
of (1.8), implying that GILP should converge to optimality in fewer iterations than
dual simplex. As with the Barrodale-Roberts algorithm, the LSDS method does
not pivot among adjacent vertices as classic simplex methods do. The implication is
that because LSDS can “skip” adjacent vertices, it should also converge to optimality
in fewer iterations than dual simplex.
The differences between the theoretical implications of the pivoting methods
and their respective practical performances can be traced to implementation. Ad-
ditional procedures, such as preprocessing and steps for overcoming degeneracy, are
programmed into the interior-point method and dual simplex methods implemented
in MATLAB to reduce run times further. The GILP and LSDS implementations in
this research, on the other hand, are simple in the sense that they are coded exactly
according to their theoretical descriptions in Chapter 3. That is, nondegeneracy is
assumed and no preprocessing is conducted to reduce the size of the polytope and
gain improvements in run times.
4.4 Preprocessing
The goal of preprocessing in LP is to reduce the dimensions of the problem, thus
allowing the chosen algorithm to converge faster. A standard preprocessing strategy
is presented in [9], but Portnoy and Koenker [55] insist that the special structure
of the QRMEP is not conducive to such a standard approach. In response, they
propose an alternative strategy designed exclusively for the QRMEP and specifically
in conjunction with the predictor-corrector variant of the primal-dual path following
algorithm. The following is a brief description of this alternative strategy. Com-
prehensive discussions on this method can be found in [55] and [38].
Consider the optimality condition from Theorem 1, which may be expressed as
the directional derivative of the objective function [38]. Letting R(b) = Σ_{j=1}^{n} ρ_q(y_j − x_j b),
the derivative in direction d_k is computed by

    ∇R(b, d_k) = −Σ_{j=1}^{n} ψ_q(y_j − x_j b, −x_j d_k) x_j d_k,

where d_k is the kth column of B⁻¹, 1 ≤ k ≤ p, and

    ψ_q(y_j − x_j b, −x_j d_k) = q − I(r_j < 0),        if r_j ≠ 0,
                                 q − I(−x_j d_k < 0),   if r_j = 0.
Since x_j d_k = 0 for j ≠ k and x_j d_k = 1 for j = k, ∇R(b, d_k) can be written in
dual space as

    ∇R(b, d_k) = w_j + 1 − q,   if w_j ∈ [q − 1, 0),
                 q − w_j,       if w_j ∈ (0, q].

Therefore, a solution is optimal if ∇R(b, d_k) ≥ 0 for all directions k = 1, . . . , p.
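This piecewise test transcribes directly into code; the function names and the treatment of the boundary case w_j = 0 below are assumptions for illustration, not the author's implementation.

```python
# Hedged, literal transcription of the dual-space directional derivative.
# Each dual variable w_j is assumed feasible: q - 1 <= w_j <= q.
def directional_derivative(w_j, q):
    if w_j < 0:
        return w_j + 1 - q      # case w_j in [q - 1, 0)
    if w_j > 0:
        return q - w_j          # case w_j in (0, q]
    return 0.0                  # boundary convention assumed here

def is_optimal(w_basic, q):
    # optimality: nonnegative derivative in every basic direction
    return all(directional_derivative(w, q) >= 0 for w in w_basic)

q = 0.5
d_lo = directional_derivative(-0.2, q)   # -0.2 + 1 - 0.5
d_hi = directional_derivative(0.2, q)    #  0.5 - 0.2
```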
This alternative statement of optimality provides a means by which the number
of constraints in (1.8) may be reduced. Preprocessing seeks to identify a subset
of observations which are guaranteed to lie above/below the optimal regression hy-
perplane. Once identified, these observations are globbed [55], thus reducing the
dimensionality of (1.8). Let JL, JH be indexing subsets of observations whose dual
variables are anticipated to be nonbasic at (q − 1) and q, respectively. The objective
function can therefore be rewritten such that
    min_{b ∈ R^p}  Σ_{j ∈ S\(J_L ∪ J_H)} ρ_q(y_j − x_j b) + (1 − q)(y_L − x_L b) + q(y_H − x_H b),

where S is the indexing set of all observations, x_L = Σ_{j ∈ J_L} x_j, and x_H = Σ_{j ∈ J_H} x_j.
Note that yL must be made small enough and yH made large enough to guaran-
tee that the residuals in these globs are negative and positive, respectively. The
challenge is determining which observations are included in JL and JH . Let M
be a subsample of m observations. Obtain an initial estimate of b by solving the
QRMEP for the subsample M only, and compute a confidence interval around the
solution. Compute a confidence band of the form [XbL,XbU ], where bL is the lower
confidence estimate of b and bU is the upper confidence estimate of b. Provided the
value of m is appropriately chosen, the set M contains the indices of the observations
falling inside the confidence band. Therefore, JL and JH should contain the indices
of the observations falling outside the confidence band, and two globbed observa-
tions, (yL, xL) and (yH , xH), are constructed. A new estimate of b is then obtained
by solving the globbed LP, which now consists of (m+ 2) observations. If the signs
of the residuals of the observations in the globs match their assignment to JL or JH ,
then the procedure terminates and returns the optimal solution. Otherwise, the
procedure is repeated after adjusting the composition of the globs and updating M .
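The partitioning and globbing step of this strategy can be sketched as follows; the band values, glob construction, and names are illustrative assumptions rather than Portnoy and Koenker's code.

```python
# Hedged sketch of the globbing step: observations predicted to fall
# below (above) the confidence band are collapsed into a single low
# (high) pseudo-observation, shrinking the LP to (m + 2) observations.
def glob(X, y, lo, hi):
    """Partition by the band [lo_j, hi_j] and build the two globs."""
    n = len(y)
    JL = [j for j in range(n) if y[j] < lo[j]]   # below the band
    JH = [j for j in range(n) if y[j] > hi[j]]   # above the band
    M  = [j for j in range(n) if j not in JL and j not in JH]
    p = len(X[0])
    xL = [sum(X[j][k] for j in JL) for k in range(p)]
    xH = [sum(X[j][k] for j in JH) for k in range(p)]
    # yL / yH need only be extreme enough to fix the globbed residual signs
    yL = min(y) - 1.0
    yH = max(y) + 1.0
    return M, (yL, xL), (yH, xH)

X  = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y  = [0.0, 5.0, 2.0, -4.0]
lo = [-1.0, -1.0, -1.0, -1.0]                    # illustrative band
hi = [ 3.0,  3.0,  3.0,  3.0]
M, glob_lo, glob_hi = glob(X, y, lo, hi)
```

A real implementation would then re-solve the globbed LP and check the residual signs of the globbed observations, repeating the partition if any sign disagrees with its assignment.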
Portnoy and Koenker [55] show significant run time improvements over the
Barrodale-Roberts algorithm, at least 10 times better, when preprocessing is applied
to interior-point methods. The improvement is even greater for large problems, since
the run time curves for the BR algorithm in [55] increase quadratically, much like
the run time curves for GILP and LSDS in this research. Chen’s results in [18] also
demonstrated marked run time improvement with preprocessing. Time limitations
unfortunately prevented the application of preprocessing to either GILP or LSDS in
this research.
4.5 Summary
The experimental results in this chapter demonstrate the computational ad-
vantage that the LSDS method has over interior-point methods and the dual simplex
method, for models and problems up to a certain size. The location of the crossover
point, the point at which either an interior-point algorithm or the dual simplex
method gains computational dominance, was also shown to be dependent on the
target quantile. GILP, on the other hand, only exhibited faster average run times
than the baseline algorithms for very small problems. However, GILP was shown
to converge in fewer iterations than the LSDS algorithm in most cases, particularly
as the size of the model increased. With a view to increase the crossover point
locations by improving the respective computational performances of GILP and the
LSDS method, the details of a preprocessing strategy designed specifically for the
QRMEP [55] were also presented. The next chapter summarizes the efforts in this
research, states how this research contributes to the field of Operations Research,
and suggests topics for future research into the QRMEP.
V. Conclusion
Quantile regression is becoming increasingly popular as an alternative to least squares
for describing the conditional distribution of a response variable. Rather than using
a single conditional mean model to make inferences about a distribution, researchers
can use one or more conditional quantile models and provide a more complete picture
of the same distribution. The first two hundred years of research into LAD models
saw limited progress [11], but the advent of LP in the late 1940s paved the way
for Koenker and Bassett [32] to define the regression quantile. Since 1978, quantile
regression has been employed extensively in econometrics, and it is quickly becoming
a popular application in the finance, medical, and environmental industries [17].
Simplex algorithms and interior-point methods are supported by advanced
computing power and have become the standard methods by which the QRMEP
is solved, but the speed of these algorithms degrades when large-scale problems
are encountered, leading many researchers to abandon quantile regression in favor
of OLS. Because of the special structure of the QRMEP,
the dual simplex method is the standard simplex algorithm best suited for solving it.
More efficient pivoting algorithms which leverage the unique properties and exploit
the special structure of the QRMEP have been developed, namely the Barrodale-
Roberts and Koenker-d’Orey methods. Interior-point methods, on the other hand,
stand as the most computationally efficient QRMEP solution techniques by far,
particularly in terms of run time and iterations required.
Affine scaling, primal path-following, and primal-dual path-following algorithms
have been developed specifically for the QRMEP, but it is the predictor-corrector
variant of the primal-dual path-following algorithm that is most popular
with commercial solvers. Experimentation has shown that pivoting methods are
preferred when the problem is small to moderately sized [18], with the added bonus
that they guarantee exact solutions.
This research was interested in finding alternative means of solving the QRMEP
such that large problems can be solved with run times either comparable or supe-
rior to those of dual simplex and interior-point methods. Two alternative pivoting
algorithms were explored in detail: GILP and the LSDS method. I-LP was first
developed by Robers and Ben-Israel [56] to solve the l1-approximation (conditional
median) problem. Extending the algorithm to any q ∈ (0, 1) required few modifi-
cations to the overall method, but it exhibited the slowest run times compared to
the dual simplex, interior-point, and LSDS methods. It is, however, distinct from
other pivoting methods because it operates exclusively in dual space by pivoting
between dual infeasible solutions. The LSDS method was developed for general
bounded-variable maximization LPs with equality constraints [39], so extending it
to the quantile regression model class of problems was straightforward. Unlike
GILP, the LSDS algorithm operates in the primal space of the QRMEP and skips
over adjacent vertices by taking longer steps. Since search directions and step
lengths are computed at each iteration, LSDS can also be considered a line search
method. For small sample sizes, the LSDS algorithm was the fastest performing
method tested. On average, its run times were around 1 to 2 times faster than the
interior-point method, 2 to 4 times faster than the dual simplex algorithm, and 8
to 12 times faster than GILP. The dual simplex and interior-point methods, how-
ever, quickly outpaced the LSDS algorithm in run time as sample sizes increased.
The dual simplex and interior-point methods are two algorithms available in the
MATLAB environment, while the code for GILP and the LSDS algorithm must be
user-generated. The respective implementations of GILP and the LSDS method in
this research did not include a preprocessing strategy to reduce the dimensionality
of the problem. Both coding optimization and preprocessing are therefore necessary
to gain further run time improvements.
5.1 Contributions
This research contributes the following to the Operations Research field.
1. Extensions of Two Alternative Pivoting Algorithms to the Class of Quantile
Regression Model Estimation Problems. The primary contributions of this re-
search are the extensions of two pivoting algorithms to the class of QRMEPs.
Interior-point methods are generally more efficient computationally, but they
do not exhibit the same solution accuracy as that of simplex methods. GILP
and the LSDS algorithm possess two of the essential features identified in
Section 1.3: each produces exact solutions and solves the QRMEP for any
q ∈ (0, 1).
2. Development of Suboptimal Integer Programming Formulations of the Quantile
Regression Model Estimation Problem. Another theoretical contribution of
this research is expressing the QRMEP as an integer program. The QRMEP
is reconceptualized as a generalized assignment problem, and nonnegativity
and affine transformations are applied to (1.8). Applying the cardinality range
property to the affine form (3.31), followed by rewriting the boundary condition
(3.35), produces the BIGAP. The BIKP is easily formed by eliminating the
requirement from the BIGAP that exactly (n− p) nonbasic observations must
be assigned. Because neither formulation accounts for the values of the basic
variables, the BIGAP and BIKP were shown to be suboptimal for solving the
QRMEP. However, they may be employed to obtain starting solutions for
other algorithms in order to decrease processing times.
3. Algorithmic Implementation and Testing. The secondary contributions of this
research are the implementations of the GILP and LSDS algorithms, which
were tested on the Cars93 data set obtained from the literature, and the sub-
sequent experimental results. Uniform random samples were extracted from
Cars93 to produce QRMEPs with sample sizes ranging from n = 50 to n = 850
and model sizes from p = 3 to p = 15. The LSDS algorithm was shown to
converge to optimality faster, in terms of run time, than other simplex meth-
ods for problems and/or quantile regression models up to a certain size. The
LSDS method even performed well, for small problems and models, against
interior-point methods. However, the locations of the crossover points for the
LSDS method were lower than expected. GILP exhibited comparatively slow
run times, but it was shown in most cases to converge in fewer iterations than
the LSDS algorithm. Optimized coding coupled with preprocessing can reduce
the run times for GILP and the LSDS method, increase the crossover points,
and possibly produce run times to rival those of interior-point methods.
5.2 Future Research
The following sections suggest topics for future research into the QRMEP.
5.2.1 Preprocessing. The results from this research indicate the need for
preprocessing when implementing either GILP or LSDS. The procedure in [55] was
developed explicitly for the QRMEP, and it exploits the primal space properties of
(1.4). Since GILP operates exclusively in dual space, a preprocessing strategy that
takes advantage of the features in (1.8) may achieve reductions in the run times
of the algorithm such that it becomes competitive with simplex methods. The
LSDS algorithm, on the other hand, proceeds from (1.8) but conducts line searches
in primal space, so the preprocessing strategy in [55] follows naturally. However,
determining the effects of preprocessing on the computational efficiency of the LSDS
method is also recommended.
5.2.2 Integer Programming Alternatives. This research has shown that the
QRMEP can be reconceptualized in the form of a generalized assignment problem.
Future research on the BIGAP formulation should focus primarily on resolving the
issues identified with (3.38). If the QRMEP is to be expressed as an integer pro-
gram successfully, whether as a BIGAP or a BIKP, then a means of accounting for
the values of the basic variables at optimality must be developed. On the other
hand, because the BIGAP and BIKP are suboptimal formulations of the QRMEP,
future research may also involve studying the effects of using either (3.38) or (3.39)
to generate starting solutions for GILP or the LSDS algorithm. It is recommended
that (3.38) or (3.39), together with preprocessing, be applied to the GILP and LSDS
extensions in order to determine if further improvements on run times can be ob-
tained. Developing a variant of the out-of-kilter method exclusively for the QRMEP
is also suggested.
Bibliography
1. I. Barrodale and F. Roberts, “An Improved Algorithm for Discrete l1 Linear Approximation”, SIAM Journal on Numerical Analysis, Vol. 10, No. 5, pp. 839-848, 1973.

2. G. Bassett, “A p-subset Property of L1 and Regression Quantile Estimates”, Computational Statistics & Data Analysis, Vol. 6, No. 3, pp. 297-304, 1988.

3. D. Batur and F. Choobineh, “A Quantile-Based Approach to System Selection”, European Journal of Operational Research, Vol. 202, No. 3, pp. 764-772, 2009.

4. M. Bazaraa, J. Jarvis and H. Sherali, Linear Programming and Network Flows, John Wiley & Sons, Inc., Hoboken, NJ, 2005.

5. M. Bazaraa, H. Sherali and C. Shetty, Nonlinear Programming: Theory and Algorithms, John Wiley & Sons, Inc., Hoboken, NJ, 2006.

6. A. Ben-Israel and P. Robers, “A Decomposition Method for Interval Linear Programming”, Management Science, Vol. 16, No. 5, pp. 374-387, 1970.

7. D. Bertsekas, Nonlinear Programming, Athena Scientific, 1999.

8. D. Bertsimas and J. Tsitsiklis, Introduction to Linear Optimization, Athena Scientific, Belmont, MA, 1997.

9. R. Bixby, J. Gregory, I. Lustig, R. Marsten and D. Shanno, “Very Large-Scale Linear Programming: A Case Study in Combining Interior Point and Simplex Methods”, Operations Research, Vol. 40, No. 5, pp. 885-897, 1992.

10. M. Buchinsky, “The Dynamics of Changes in the Female Wage Distribution in the USA: A Quantile Regression Approach”, Journal of Applied Econometrics, Vol. 13, No. 1, pp. 1-30, 1998.

11. M. Buchinsky, “Recent Advances in Quantile Regression Models: A Practical Guideline for Empirical Research”, Journal of Human Resources, pp. 88-126, 1998.

12. B. Cade, J. Terrell and R. Schroeder, “Estimating Effects of Limiting Factors with Regression Quantiles”, Ecology, Vol. 80, No. 1, pp. 311-323, 1999.

13. B. Cade and B. Noon, “A Gentle Introduction to Quantile Regression for Ecologists”, Frontiers in Ecology and the Environment, Vol. 1, No. 8, pp. 412-420, 2003.

14. B. Cade, B. Noon and C. Flather, “Quantile Regression Reveals Hidden Bias and Uncertainty in Habitat Models”, Ecology, Vol. 86, No. 3, pp. 786-800, 2005.
15. D. Cattrysse and L. Van Wassenhove, “A Survey of Algorithms for the Generalized Assignment Problem”, European Journal of Operational Research, Vol. 60, No. 3, pp. 260-272, 1992.
16. K-H. Chang, “A Direct Search Method for Unconstrained Quantile-based Simulation Optimization”, European Journal of Operational Research, Vol. 246, No. 2, pp. 487-495, 2015.
17. C. Chen and Y. Wei, “Computational Issues for Quantile Regression”, The Indian Journal of Statistics, Vol. 67, No. 2, pp. 399-417, 2005.
18. C. Chen, “A Finite Smoothing Algorithm for Quantile Regression”, Journal of Computational and Graphical Statistics, Vol. 16, No. 1, pp. 136-164, 2007.
19. C. Davino, M. Furno and D. Vistocco, Quantile Regression: Theory and Applications, John Wiley & Sons, Inc., New York, NY, 2014.
20. E. Eide and M. Showalter, “The Effect of School Quality on Student Performance: A Quantile Regression Approach”, Economics Letters, Vol. 58, No. 3, pp. 345-350, 1998.
21. M. Fisher, R. Jaikumar and L. Van Wassenhove, “A Multiplier Adjustment Method for the Generalized Assignment Problem”, Management Science, Vol. 32, No. 9, pp. 1095-1103, 1986.
22. D. Fulkerson, “An Out-of-Kilter Method for Minimal-Cost Flow Problems”, Journal of the Society for Industrial and Applied Mathematics, Vol. 9, No. 1, pp. 18-27, 1961.
23. J. Garcia, P. Hernandez and A. Lopez-Nicolas, “How Wide is the Gap? An Investigation of Gender Wage Differences Using Quantile Regression”, Empirical Economics, Vol. 26, No. 1, pp. 149-167, 2001.
24. P. Gill, W. Murray, M. Saunders, J. Tomlin and M. Wright, “On Projected Newton Barrier Methods for Linear Programming and an Equivalence to Karmarkar’s Projective Method”, Mathematical Programming, Vol. 36, No. 2, pp. 183-209, 1986.
25. C. Gutenbrunner and J. Jureckova, “Regression Rank Scores and Regression Quantiles”, The Annals of Statistics, pp. 305-330, 1992.
26. C. Gutenbrunner, J. Jureckova, R. Koenker and S. Portnoy, “Tests of Linear Hypotheses Based on Regression Rank Scores”, Journal of Nonparametric Statistics, Vol. 2, No. 4, pp. 307-331, 1993.
27. L. Hall and R. Vanderbei, “Two-thirds is Sharp for Affine Scaling”, Operations Research Letters, Vol. 13, No. 4, pp. 197-201, 1993.
28. L. Hao and D. Naiman, Quantile Regression, No. 149, Sage Publications, Inc., Thousand Oaks, CA, 2007.
29. R. Jackson, P. Boggs, S. Nash and S. Powell, “Guidelines for Reporting Results of Computational Experiments. Report of the Ad Hoc Committee”, Mathematical Programming, Vol. 49, No. 1, pp. 413-425, 1990.
30. L. Jaeckel, “Estimating Regression Coefficients by Minimizing the Dispersion of the Residuals”, The Annals of Mathematical Statistics, Vol. 43, No. 5, pp. 1449-1458, 1972.
31. A. Koberstein, “Progress in the Dual Simplex Algorithm for Solving Large Scale LP Problems: Techniques for a Fast and Stable Implementation”, Computational Optimization and Applications, Vol. 41, No. 2, pp. 185-204, 2008.
32. R. Koenker and G. Bassett, “Regression Quantiles”, Econometrica: Journal of the Econometric Society, Vol. 46, No. 1, pp. 33-50, 1978.
33. R. Koenker and V. d’Orey, “Algorithm AS 229: Computing Regression Quantiles”, Journal of the Royal Statistical Society, Vol. 36, No. 3, pp. 383-393, 1987.
34. R. Koenker and V. d’Orey, “A Remark on Algorithm AS 229: Computing Dual Regression Quantiles and Regression Rank Scores”, Journal of the Royal Statistical Society, Vol. 43, No. 2, pp. 410-414, 1994.
35. R. Koenker and B. Park, “An Interior Point Algorithm for Nonlinear Quantile Regression”, Journal of Econometrics, Vol. 71, No. 1, pp. 265-283, 1996.
36. R. Koenker and O. Geling, “Reappraising Medfly Longevity: A Quantile Regression Survival Analysis”, Journal of the American Statistical Association, Vol. 96, No. 454, pp. 458-468, 2001.
37. R. Koenker and K. Hallock, “Quantile Regression”, Journal of Economic Perspectives, Vol. 15, No. 4, pp. 143-156, 2001.
38. R. Koenker, Quantile Regression, No. 38, Cambridge University Press, Cambridge, UK, 2005.
39. E. Kostina, “The Long Step Rule in the Bounded-Variable Dual Simplex Method: Numerical Experiments”, Mathematical Methods of Operations Research, Vol. 55, No. 3, pp. 413-429, 2002.
40. Y. Li and J. Zhu, “L1-Norm Quantile Regression”, Journal of Computational and Graphical Statistics, 2012.
41. R. Lock, “1993 New Car Data”, Journal of Statistics Education, Vol. 1, No. 1, 1993.
42. I. Lustig, R. Marsten and D. Shanno, “On Implementing Mehrotra’s Predictor-Corrector Interior-Point Method for Linear Programming”, SIAM Journal on Optimization, Vol. 2, No. 3, pp. 435-449, 1992.
43. I. Lustig, R. Marsten and D. Shanno, “Interior Point Methods for Linear Programming: Computational State of the Art”, ORSA Journal on Computing, Vol. 4, No. 1, pp. 1-14, 1994.
44. J. Machado and J. Mata, “Earning Functions in Portugal 1982–1994: Evidence From Quantile Regressions”, Empirical Economics, Vol. 26, No. 1, pp. 115-134, 2001.
45. J. Machado and J. Mata, “Counterfactual Decomposition of Changes in Wage Distributions Using Quantile Regression”, Journal of Applied Econometrics, Vol. 20, No. 4, pp. 445-465, 2005.
46. P. Martins and P. Pereira, “Does Education Reduce Wage Inequality? Quantile Regression Evidence From 16 Countries”, Labour Economics, Vol. 11, No. 3, pp. 355-371, 2004.
47. J. Mata and J. Machado, “Firm Start-Up Size: A Conditional Quantile Approach”, European Economic Review, Vol. 40, No. 6, pp. 1305-1323, 1996.
48. K. McShane, C. Monma and D. Shanno, “An Implementation of a Primal-Dual Interior Point Method for Linear Programming”, ORSA Journal on Computing, Vol. 1, No. 2, pp. 70-83, 1989.
49. S. Mehrotra, “On Finding a Vertex Solution Using Interior Point Methods”, Linear Algebra and Its Applications, Vol. 152, pp. 233-253, 1991.
50. S. Mehrotra, “On the Implementation of a Primal-Dual Interior Point Method”, SIAM Journal on Optimization, Vol. 2, No. 4, pp. 575-601, 1992.
51. L. Meligkotsidou, I. Vrontos and S. Vrontos, “Quantile Regression Analysis of Hedge Fund Strategies”, Journal of Empirical Finance, Vol. 16, No. 2, pp. 264-279, 2009.
52. B. Melly, “Public–Private Sector Wage Differentials in Germany: Evidence From Quantile Regression”, Empirical Economics, Vol. 30, No. 2, pp. 505-520, 2005.
53. R. Mueller, “Public–Private Sector Wage Differentials in Canada: Evidence From Quantile Regressions”, Economics Letters, Vol. 60, No. 2, pp. 229-235, 1998.
54. J. Nocedal and S. Wright, Numerical Optimization, Springer-Verlag New York, Inc., New York, NY, 1999.
55. S. Portnoy and R. Koenker, “The Gaussian Hare and the Laplacian Tortoise: Computability of Squared-Error Versus Absolute-Error Estimators”, Statistical Science, Vol. 12, No. 4, pp. 279-300, 1997.
56. P. Robers and A. Ben-Israel, “An Interval Programming Algorithm for Discrete Linear L1 Approximation Problems”, Journal of Approximation Theory, Vol. 2, No. 4, pp. 323-336, 1969.
57. P. Robers and A. Ben-Israel, “A Suboptimization Method for Interval Linear Programming: A New Method for Linear Programming”, Linear Algebra and Its Applications, Vol. 3, No. 3, pp. 383-405, 1970.
58. G. Ross, R. Soland and A. Zoltners, “A Note on the Bounded Interval Generalized Assignment Problem”, Research Report CCS 253, DTIC, 1976.
59. D. Stifel and S. Averett, “Childhood Overweight in the United States: A Quantile Regression Approach”, Economics & Human Biology, Vol. 7, No. 3, pp. 387-397, 2009.
60. R. Vanderbei, M. Meketon and B. Freedman, “A Modification of Karmarkar’s Linear Programming Algorithm”, Algorithmica, Vol. 1, No. 1-4, pp. 395-407, 1986.
61. R. Vanderbei, Linear Programming, Springer-Verlag New York, Inc., New York, NY, 2015.
62. S. Vaz, C. Martin, P. Eastwood, B. Ernande, A. Carpentier, G. Meaden and F. Coppin, “Modelling Species Distributions Using Regression Quantiles”, Journal of Applied Ecology, Vol. 45, No. 1, pp. 204-217, 2008.
63. Y. Zhang, “Solving Large-Scale Linear Programs by Interior-Point Methods Under the Matlab* Environment”, Optimization Methods and Software, Vol. 10, No. 1, pp. 1-31, 1998.
REPORT DOCUMENTATION PAGE (Form Approved, OMB No. 0704-0188)
1. REPORT DATE (DD-MM-YYYY): 14-09-2017
2. REPORT TYPE: Doctoral Dissertation
3. DATES COVERED (From - To): Oct 2014 - Sep 2017
4. TITLE AND SUBTITLE: Duality Behaviors of the Quantile Regression Model Estimation Problem
6. AUTHOR(S): Robinson II, Paul D., Major, USAF
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): Air Force Institute of Technology, Graduate School of Engineering and Management (AFIT/EN), 2950 Hobson Way, WPAFB, OH 45433-7765
8. PERFORMING ORGANIZATION REPORT NUMBER: AFIT-ENS-DS-17-S-043
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES): United States Army Cyber Command (ARCYBER), ATTN: Cade Saie, LTC, USA, 8825 Beulah St, Fort Belvoir, VA 22060
10. SPONSOR/MONITOR'S ACRONYM(S): ARCYBER
12. DISTRIBUTION/AVAILABILITY STATEMENT: Distribution Statement A: Approved for Public Release; Distribution Unlimited
13. SUPPLEMENTARY NOTES: This work is declared a work of the U.S. Government and is not subject to copyright protection in the United States.
15. SUBJECT TERMS: quantile regression, linear programming, optimization
16. SECURITY CLASSIFICATION OF: a. REPORT: U; b. ABSTRACT: U; c. THIS PAGE: U
17. LIMITATION OF ABSTRACT: UU
18. NUMBER OF PAGES: 127
19a. NAME OF RESPONSIBLE PERSON: Dr. James W. Chrissis, AFIT/ENS
19b. TELEPHONE NUMBER (include area code): (937) 367-6760; [email protected]

Standard Form 298 (Rev. 8-98), Prescribed by ANSI Std. Z39.18