DUALITY BEHAVIORS OF THE QUANTILE REGRESSION MODEL
ESTIMATION PROBLEM
DISSERTATION
Paul D. Robinson II, Major, USAF
AFIT-ENS-DS-17-S-043
DEPARTMENT OF THE AIR FORCE
AIR UNIVERSITY
AIR FORCE INSTITUTE OF TECHNOLOGY
Wright-Patterson Air Force Base, Ohio
DISTRIBUTION STATEMENT A: APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED
The views expressed in this dissertation are those of the author and do not reflect the official policy or position of the United States Air Force, the Department of Defense, or the United States Government. This material is declared a work of the U.S. Government and is not subject to copyright protection in the United States.
AFIT-ENS-DS-17-S-043
DUALITY BEHAVIORS OF THE QUANTILE REGRESSION MODEL
ESTIMATION PROBLEM
DISSERTATION
Presented to the Faculty
Department of Operational Sciences
Graduate School of Engineering and Management
Air Force Institute of Technology
Air University
Air Education and Training Command
in Partial Fulfillment of the Requirements for the
Degree of Doctor of Philosophy in Operations Research
Paul D. Robinson II, B.A., M.S.
Major, USAF
September 2017
DISTRIBUTION STATEMENT A: APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED
AFIT-ENS-DS-17-S-043
DUALITY BEHAVIORS OF THE QUANTILE REGRESSION MODEL
ESTIMATION PROBLEM
Paul D. Robinson II, B.A., M.S.
Major, USAF
Committee Membership:
James W. Chrissis, PhD, Chair
Richard F. Deckro, PhD, Member
Christine M. Schubert Kabban, PhD, Member
James F. Morris, PhD, Member
Adedeji B. Badiru, PhD, Dean, Graduate School of Engineering and Management
Table of Contents
Page
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
I. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1
1.1 Problem Setting . . . . . . . . . . . . . . . . . . . . . 1-2
1.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . 1-8
1.3 Solution Method Characteristics . . . . . . . . . . . . . 1-9
1.3.1 Problem Statement . . . . . . . . . . . . . . . 1-12
1.3.2 Research Objectives . . . . . . . . . . . . . . 1-12
1.4 Overview . . . . . . . . . . . . . . . . . . . . . . . . . 1-13
II. Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . 2-1
2.1 QRMEP Properties . . . . . . . . . . . . . . . . . . . . 2-1
2.1.1 Exact-Fit or p-Subset Property . . . . . . . . 2-2
2.1.2 Cardinality Range Property . . . . . . . . . . 2-3
2.1.3 Partitioning . . . . . . . . . . . . . . . . . . . 2-4
2.2 Pivoting Methods . . . . . . . . . . . . . . . . . . . . . 2-8
2.2.1 Barrodale-Roberts Algorithm . . . . . . . . . 2-8
2.2.2 Koenker-d’Orey Algorithm . . . . . . . . . . . 2-13
2.2.3 Interval-Linear Programming . . . . . . . . . 2-16
2.2.4 Dual Simplex Method for Bounded Variables . 2-22
2.3 Interior-Point Methods . . . . . . . . . . . . . . . . . . 2-25
2.3.1 Affine Scaling . . . . . . . . . . . . . . . . . . 2-26
Page
2.3.2 Log-Barrier Methods . . . . . . . . . . . . . . 2-33
2.4 Finite Smoothing Algorithm . . . . . . . . . . . . . . . 2-40
2.5 Integer Programming Formulations . . . . . . . . . . . 2-42
2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 2-46
III. Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-1
3.1 Simplex Method for Bounded Variables . . . . . . . . . 3-1
3.2 Generalized Interval-Linear Programming . . . . . . . 3-9
3.3 Long-Step Dual Simplex (LSDS) Method . . . . . . . . 3-15
3.4 The QRMEP as an Integer Program . . . . . . . . . . 3-21
3.4.1 The Bounded Interval Generalized Assignment Problem . . . . . . 3-23
3.4.2 The Bounded Interval Knapsack Problem . . . 3-26
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 3-27
IV. Implementation, Testing, and Numerical Results . . . . . . . . 4-1
4.1 Implementation . . . . . . . . . . . . . . . . . . . . . . 4-1
4.2 Testing . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
4.3 Numerical Results . . . . . . . . . . . . . . . . . . . . 4-2
4.4 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . 4-16
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 4-18
V. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-1
5.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . 5-3
5.2 Future Research . . . . . . . . . . . . . . . . . . . . . 5-4
5.2.1 Preprocessing . . . . . . . . . . . . . . . . . . 5-4
5.2.2 Integer Programming Alternatives . . . . . . . 5-4
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . BIB-1
List of Figures

Figure Page
4.1 (p, q) = (3, 0.05). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-3
4.2 (p, q) = (3, 0.25). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-4
4.3 (p, q) = (3, 0.5). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-4
4.4 (p, q) = (3, 0.75). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-5
4.5 (p, q) = (3, 0.95). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-5
4.6 (p, q) = (8, 0.05). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-7
4.7 (p, q) = (8, 0.25). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-7
4.8 (p, q) = (8, 0.5). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-8
4.9 (p, q) = (8, 0.75). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-8
4.10 (p, q) = (8, 0.95). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-9
4.11 (p, q) = (15, 0.05). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-9
4.12 (p, q) = (15, 0.25). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-10
4.13 (p, q) = (15, 0.5). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-10
4.14 (p, q) = (15, 0.75). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-11
4.15 (p, q) = (15, 0.95). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted). . . . . . 4-11
4.16 GILP for p = 3. 5th Quantile (magenta), 25th Quantile (blue), 50th Quantile (black), 75th Quantile (green), 95th Quantile (red). . . . . . 4-13
4.17 GILP for p = 8. 5th Quantile (magenta), 25th Quantile (blue), 50th Quantile (black), 75th Quantile (green), 95th Quantile (red). . . . . . 4-13
4.18 GILP for p = 15. 5th Quantile (magenta), 25th Quantile (blue), 50th Quantile (black), 75th Quantile (green), 95th Quantile (red). . . . . . 4-14
List of Tables

Table Page
4.1 Crossover Point Data for p = 3 . . . . . . . . . . . . . . . . . 4-6
4.2 Crossover Point Data for p = 8 . . . . . . . . . . . . . . . . . 4-12
4.3 Crossover Point Data for p = 15 . . . . . . . . . . . . . . . . 4-12
4.4 Average Number of Iterations . . . . . . . . . . . . . . . . . . 4-15
AFIT-ENS-DS-17-S-043
Abstract
A vector of quantile regression model coefficients, also known as regression
quantiles, is shown to be the solution to a parametric minimization problem. It
can also be shown that the same model parameters are obtainable by solving a
nonparametric dual linear program, and it is this feature of the quantile regression
model estimation problem (QRMEP) that is of particular interest.
Both the primal and dual linear programs of the QRMEP are shown to possess
special structures. Provided certain model assumptions are met, the QRMEP also
exhibits two unique properties. These properties, along with the duality behaviors
of the problem, are exploited in order to extend two pivoting algorithms to the class
of QRMEPs: a generalization of interval-linear programming (I-LP) and a long-step
variant of the dual simplex method. For problems and/or models up to a certain
size, these extensions are shown to perform well, computationally, against the classic
dual simplex algorithm and interior-point methods.
DUALITY BEHAVIORS OF THE QUANTILE REGRESSION
MODEL ESTIMATION PROBLEM
I. Introduction
Insights into the distribution of a single variable can be obtained through
measures of central tendency and measures of spread. The mean enjoys a dominant
position among centrality measures; such dominance is even more apparent when
models for conditional distributions are required, namely linear regression models.
The ordinary least squares (OLS) normal equations can be solved easily, even for
large problems, and the closed-form solution offered by the normal equations partially
explains the appeal of OLS models. A typical assumption made for linear regression
models is that the errors (residuals) are independently, identically, and normally
distributed. If this assumption holds, then OLS models sufficiently describe the
distributive behavior of a response about its center. One of the limitations of OLS
models, however, is sensitivity to outliers and skewed distributions, and researchers
seek to mitigate the effects of such extreme values. One method of dealing with
outliers in OLS models is simply to discard them, but eliminating extreme values
could yield misleading conclusions.
Many studies, on the other hand, focus specifically on these extreme values.
Analysis techniques that examine conditional distribution locations other than the
mean are therefore needed in order to provide a more detailed description of a con-
ditional response distribution [38], and quantile regression satisfies this requirement.
The main advantage quantile regression models have over OLS is their robustness.
That is, quantile regression models are insensitive to outliers and skewed distributions, so they have applications to studies in the social sciences where the tails of a
distribution are a concern [28].
1.1 Problem Setting
The research in this dissertation is focused exclusively on the problem of estimating the quantile regression model coefficients; that is, the quantile regression
model estimation problem (QRMEP). Like OLS, the parameter estimates for quan-
tile regression models come from solutions to minimization problems. On the other hand, a closed-form solution is not available for the QRMEP. Instead, conditional
quantile functions are estimated as the solutions to parametric linear programming
(LP) problems.
The idea of generating a hypothetical model of a response distribution via
some manner of least absolute deviation (LAD) spans more than two hundred years
of research, beginning with Boscovich and Laplace in the late 1700s, followed by
Edgeworth nearly a century later [11]. Boscovich essentially introduced median re-
gression, a special case of the QRMEP, while modeling the ellipticity of the earth.
He proposed a linear model for ellipticity, one where the sum of absolute errors is
minimized, and its errors sum to zero [38]. The next major advancement, one that
may be considered the source of modern quantile regression and this research, comes
from Koenker and Bassett [32]. Instead of repeating the traditional and laborious
process of sorting sample observations to obtain quantiles, they proposed formulat-
ing the minimized sum of absolute deviations as a parametric LP, the solution to
which is a vector of quantile regression model parameters. They called this class
of linear models regression quantiles, where the vector of quantile regression model
coefficients (solution to the QRMEP) is also the vector of regression quantiles. Conditional mean models are still more popular because of the computational advantage
OLS has over that of LAD methods, but recent advances in LP theory have helped
regression models of the l1-norm variety overcome this deficiency and stimulated
renewed interest in such problems.
Properly defining a quantile is necessary before formulating the LP to estimate
the quantile regression model. At least three definitions of a quantile are encountered
in the literature, but one seems to convey the idea of a quantile better than the
others. Koenker [38], as well as Batur and Choobineh [3], define the quantile y(q)
of a random variable Y as y(q) = inf{y : F(y) ≥ q}, where F(y) = P(Y ≤ y).
Hao and Naiman [28] estimate the quantile in terms of proportions of values of the
random variable Y . For instance, if y(q) is the (100q)th quantile, then the proportion
of values of Y which are less than or equal to y(q) is q. Koenker and Hallock also
use this definition [37]. Chang [16] offers a definition in the context of a financial
management metric known as downside risk, which is the probability of observing
values of a random variable that are less than some critical value. It should also be
noted that although these three definitions are different, each essentially generates
the same (100q)th quantile for a continuous random variable, but downside risk most
adequately describes how to interpret a conditional quantile model.
Suppose a sample obtained from a random variable Y is partitioned into equal
parts, and assume that a value Y = y can be drawn from any of the partitions with
equal probability. The values that define the boundaries of each partition are said
to be the quantiles. In other words, a value y(q) is the (100q)th quantile of Y if the
probability of a random draw from Y being less than or equal to y(q) is exactly q, or
P (Y ≤ y(q)) = q. (1.1)
Certain quantiles are uniquely identified, such as the quartiles (0.25, 0.75), median
(0.50), deciles (0.10, 0.20, etc.), and percentiles (0.01, 0.02, etc.). For simplicity,
however, the general term quantile is used for any y(q) with probability q ∈ (0, 1).
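For a finite sample, the infimum definition can be applied directly to the empirical distribution function. A minimal sketch (the function name and sample values are illustrative, not from this research):

```python
import numpy as np

def quantile_inf(sample, q):
    """Empirical (100q)th quantile: inf {y : F(y) >= q}, where
    F(y) = (1/n) * #{y_j <= y} is the empirical CDF."""
    ys = np.sort(np.asarray(sample, dtype=float))
    n = len(ys)
    # Smallest index k such that F(ys[k]) = (k + 1)/n >= q.
    k = max(int(np.ceil(q * n)) - 1, 0)
    return ys[k]

sample = [3.1, 0.4, 2.2, 5.9, 1.7, 4.8, 2.9, 3.5]
print(quantile_inf(sample, 0.5))  # 2.9, the smallest y with F(y) >= 0.5
```

Note that the infimum definition always returns an order statistic of the sample; no interpolation between observations is involved.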
While OLS examines residual behavior about the conditional mean, quantile
regression examines residual behavior about the (100q)th conditional quantile. The
associated probability of interest for the quantile supplies the asymmetric weights to
the objective function, namely q and its complement (1− q). Let p be the number
of parameters included in a quantile regression model of the form
y = Xb + r,
where y ∈ Rn is a vector of observations from some dependent random variable
Y (response), X ∈ Rn×p is a design matrix whose rows denote observations from
(p− 1) independent random variables (regressors) X1, X2, . . . , Xp−1; b ∈ Rp is the
vector of quantile regression model parameters (coeffi cients), and r ∈ Rn is a vector
of estimation errors (residuals).
Suppose the conditional median (50th quantile) is to be estimated. Rather
than minimizing the sum of squares of the residuals (i.e., OLS), the symmetric abso-
lute value function is minimized in order to solve for the vector of model parameters
b:
$$\min_{b \in \mathbb{R}^p} \sum_{j=1}^{n} |y_j - x_j b| = \min_{b \in \mathbb{R}^p} \sum_{j=1}^{n} |r_j|,$$

where $x_j$ is the $j$th row of $X$, of the form $x_j = (1, x_{j1}, x_{j2}, \ldots, x_{j(p-1)})$. For the
median, the sum of positive and negative residuals must be zero. For any q ≠ 0.5,
this sum is nonzero, so some modification to the absolute value function is required
such that it covers any probability $q \in (0, 1)$. Let $\rho_q(r_j) = (q - I_{r_j < 0})\, r_j$ denote the tilted absolute value function [37] for the $j$th residual, where $I_{r_j < 0}$ is an indicator function such that $I_{r_j < 0} = 1$ if $r_j < 0$ and zero otherwise. This residual loss (check) function is defined for any probability $q \in (0, 1)$ as

$$\rho_q(r_j) = \left(q - I_{r_j < 0}\right) r_j.$$
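The check function is a one-liner in vectorized form; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def rho(r, q):
    """Check (tilted absolute value) loss: rho_q(r) = (q - I[r < 0]) * r."""
    r = np.asarray(r, dtype=float)
    return (q - (r < 0)) * r

# A positive residual is weighted by q, a negative one by (1 - q):
print(rho([2.0, -2.0], 0.25))  # [0.5 1.5]
```

At q = 0.5 this reduces to |r|/2, consistent with the symmetric median case above.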
If a sample of size n is partitioned into equal parts, and a random draw coming from
any partition is equiprobable, then it follows that the number of sample values which
are less than or equal to the quantile is at most qn. Now consider the asymmetric
sum of residuals

$$\sum_{j=1}^{n} \rho_q(r_j) = \sum_{j=1}^{n} \rho_q(y_j - x_j b).$$
The objective is to find the vector of model coefficients (regression quantiles [32]) $b \in \mathbb{R}^p$ that minimizes the convex combination of residuals

$$\min_{b \in \mathbb{R}^p} \left\{ \sum_{j=1}^{n} \rho_q(r_j) \right\} = \min_{b \in \mathbb{R}^p} \left[\, (q - 1) \sum_{r_j < 0} (y_j - x_j b) + q \sum_{r_j > 0} (y_j - x_j b) \right].$$
Since $q \neq |q - 1|$, except in the median case, it is necessary to distinguish between the set of residuals weighted by $q$ and the set of residuals weighted by $(q - 1)$. Rewrite the residual vector as the difference between two nonnegative vectors; that is, let $r = u - v$, where $u \in \mathbb{R}^n_+$ is a vector of positive residuals and $v \in \mathbb{R}^n_+$ is a vector of absolute values of negative residuals. In other words,

$$u_j = \begin{cases} r_j, & r_j > 0 \\ 0, & \text{otherwise} \end{cases} \qquad v_j = \begin{cases} |r_j|, & r_j < 0 \\ 0, & \text{otherwise.} \end{cases} \tag{1.2}$$
It follows that if the jth residual is zero, then uj = vj = 0. The vector of positive
residuals u is weighted by the probability q, and the vector of absolute negative
residuals v is weighted by its complement (1− q). Clearly, it is unnecessary to
place any kind of weight on the zero residuals, so the QRMEP takes the form of the
minimization problem
$$\min_{b \in \mathbb{R}^p,\, u \geq 0_n,\, v \geq 0_n} \; q \sum_{j=1}^{n} u_j + (1 - q) \sum_{j=1}^{n} v_j \tag{1.3}$$

subject to

$$\begin{aligned} x_1 b + u_1 - v_1 &= y_1 \\ &\;\;\vdots \\ x_n b + u_n - v_n &= y_n, \end{aligned}$$

where the raw residuals are the constraints. The primal QRMEP can be expressed in matrix notation as

$$\min_{b \in \mathbb{R}^p,\, u \geq 0_n,\, v \geq 0_n} \; q u^T 1_n + (1 - q) v^T 1_n \tag{1.4}$$

subject to

$$Xb + u - v = y,$$

where $1_n$ denotes an $(n \times 1)$ vector of ones.
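Formulation (1.3)-(1.4) can be passed directly to a general-purpose LP solver. A minimal sketch using scipy.optimize.linprog with synthetic data (the variable ordering [b, u, v] and all names are illustrative, not from this research):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, p, q = 40, 2, 0.75
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + 1 regressor
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

# Decision variables, in order: b (p, free), u (n, >= 0), v (n, >= 0).
c = np.concatenate([np.zeros(p), q * np.ones(n), (1 - q) * np.ones(n)])
A_eq = np.hstack([X, np.eye(n), -np.eye(n)])           # Xb + u - v = y
bounds = [(None, None)] * p + [(0, None)] * (2 * n)

res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds)
b_hat = res.x[:p]                                      # regression quantiles
```

At an optimum, at most qn observations lie strictly below the fitted hyperplane and at most (1 - q)n lie strictly above it, which gives a quick sanity check on any solution.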
The Karush-Kuhn-Tucker (KKT) conditions [4] further demonstrate the spe-
cial structure of the QRMEP. Obviously, the constraints b ∈ Rp, Xb + u − v = y,
and u,v ≥ 0 from (1.4) constitute the primal feasibility conditions. The method
of Lagrange multipliers is used to derive the remaining KKT conditions. Since the
first n constraints Xb + u − v = y are equality constraints, the associated n-vector of
multipliers w is unrestricted in sign. The n-vectors of multipliers t and s, associated
respectively with the residual vectors u and v, are nonnegative, since the next 2n
constraints u ≥ 0 and v ≥ 0 are inequality constraints. The resulting Lagrangian
function therefore takes the form
L (w, t, s) = quT1n + (1− q)vT1n +wT (y −Xb− u+ v)− tTu− sTv
= uT (q1n −w − t) + vT ((1− q)1n +w − s) + yTw − bTXTw.
Taking partial derivatives with respect to the model parameters $b$ and the decision variables (residuals) $u, v$ produces the dual feasibility conditions [4]

$$\frac{\partial L}{\partial b} = -X^T w = 0_p, \qquad \frac{\partial L}{\partial u} = q 1_n - w - t = 0, \qquad \frac{\partial L}{\partial v} = (1 - q) 1_n + w - s = 0.$$

Letting $t, s \geq 0$ be $n$-vectors of surplus variables, the dual feasibility conditions become

$$X^T w = 0_p \tag{1.5}$$
$$w \leq q 1_n \tag{1.6}$$
$$w \geq (q - 1) 1_n, \tag{1.7}$$
where the two inequality constraints provide bounds on $w$. Thus, the dual LP of the QRMEP is

$$\max_{w \in [q-1,\, q]^n} \; y^T w \tag{1.8}$$

subject to

$$X^T w = 0_p.$$

The complementary slackness condition is given by

$$w^T (y - Xb - u + v) = 0. \tag{1.9}$$
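By LP duality, the optimal values of (1.4) and (1.8) coincide, which is easy to confirm numerically. A sketch with synthetic data (linprog minimizes, so the dual objective is negated; all names are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, p, q = 30, 2, 0.25
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 0.5 + 1.5 * X[:, 1] + rng.normal(size=n)

# Primal (1.4): min q*1'u + (1-q)*1'v  s.t.  Xb + u - v = y, u, v >= 0, b free.
c = np.concatenate([np.zeros(p), q * np.ones(n), (1 - q) * np.ones(n)])
A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
primal = linprog(c, A_eq=A_eq, b_eq=y,
                 bounds=[(None, None)] * p + [(0, None)] * (2 * n))

# Dual (1.8): max y'w  s.t.  X'w = 0_p, w in [q-1, q]^n.
dual = linprog(-y, A_eq=X.T, b_eq=np.zeros(p), bounds=[(q - 1, q)] * n)

print(primal.fun, -dual.fun)  # the two optimal objective values agree
```

The dual has only the box constraints on w plus p equality constraints, which is the structural feature the pivoting extensions in Chapter 3 exploit.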
Let S be the indexing set of all observations, so the cardinality of S is n. To
guarantee that yj −xjb−uj + vj = 0 holds for all j ∈ S, conditions on the values uj
and vj must be satisfied [38]. It follows from (1.2) that no single observation can have
both a positive and negative residual simultaneously, so the situation where uj > 0
and vj > 0 is impossible. As a result, Koenker [37] rewrites the complementary
slackness condition as min {uj, vj} = 0, which must hold for all j ∈ S. In other
words, yj − xjb = uj implies vj = 0 for positive residuals, xjb − yj = vj implies
uj = 0 for negative residuals, and yj = xjb implies uj = vj = 0 for zero residuals.
The observations associated with the vector of zero residuals define the regression
hyperplane, so these observations are said to be basic. Observations associated with
the positive or negative residuals are said to be nonbasic.
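These conditions can be checked mechanically for any candidate fit; in the sketch below (all names illustrative), the residuals are split per (1.2) and the zero-residual (basic) observations are flagged:

```python
import numpy as np

def split_residuals(y, X, b, tol=1e-9):
    """Split r = y - Xb into u (positive part) and v (|negative| part),
    per (1.2); observations with u_j = v_j = 0 are the basic ones."""
    r = y - X @ b
    u = np.where(r > tol, r, 0.0)
    v = np.where(r < -tol, -r, 0.0)
    basic = np.flatnonzero((u == 0.0) & (v == 0.0))
    return u, v, basic

# Toy data: the fit y = x passes through the first two of three points.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 1.0, 3.0])
u, v, basic = split_residuals(y, X, np.array([0.0, 1.0]))
print(basic)  # [0 1] -- these observations define the hyperplane
```

By construction, min{u_j, v_j} = 0 holds for every observation, which is exactly Koenker's restatement of complementary slackness.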
1.2 Applications
Social scientists, econometricians, and ecologists are often more concerned with
researching the extremes of certain phenomena rather than their respective central
behaviors. For example, an economist studying income inequality pays particular
interest to the rich and poor, which requires close examination of the upper and lower
quantiles, or tails, of the income distribution [28]. Since linear regression only de-
scribes the conditional behavior of the response about its center, quantile regression
is a robust alternative for analyzing the extremes of a conditional distribution.
Over the past decade, public health and medical studies have been two areas in which quantile regression has seen increasing acceptance, both as a primary method and a
comparative tool. Stifel and Averett [59] show how traditional analyses on obesity
are being challenged by conditional quantile models. Anomaly detection in cyber
operations is another example of a new field to which quantile regression can be extended, where anomalous, rather than normal (i.e., average), web traffic is a primary
concern.
Quantile regression models have been applied extensively to various wage gap
studies. Garcia, Hernandez, and Lopez-Nicolas [23] examined the male-female
wage gap in the Spanish labor market. For each gender, Garcia, et al. constructed conditional quantile models on wages and analyzed wage differentials for
$q = \left\{ \tfrac{1}{10}, \tfrac{1}{4}, \tfrac{1}{2}, \tfrac{3}{4}, \tfrac{9}{10} \right\}$. Buchinsky [10] focused on female wage distribution in the
US. Machado and Mata [44], [45] determined the effects of skills preferences, foreign
competition, and education on the Portuguese labor force. Martins and Pereira [46]
investigated the effects of education on wage inequality across 16 nations. Wage
differentials between the public and private sectors have also been described using
quantile regression models. See Melly [52] for details on the public-private sector
wage gap in Germany. Mueller [53] conducted a similar study for Canada. Quantile
regression has been applied to various other economic topics, such as firm start-up
size [47] and hedge fund strategies [51].
Quantile regression is being increasingly employed in ecological studies. Cade
and Noon [13] provide a brief and simple mathematical description of quantile regres-
sion in their introductory article, but the article’s purpose was rather to demonstrate
the current successes of quantile regression in ecology and encourage its increased
use. Heteroscedasticity in species modeling is analyzed using quantile regression in
Cade, et al. [12], and a subsequent study [14] uses regression quantiles to uncover
hidden biases in habitat models. Using quantile regression to model the distributions of various species is discussed in Vaz, et al. [62]. None of these articles, however, contains a sufficient mathematical description of quantile regression, a deficiency that may slow the broader acceptance of quantile regression as a useful tool in ecology.
1.3 Solution Method Characteristics
Developing an alternative method to solve the QRMEP presents a unique challenge, considering that several computationally efficient algorithms already exist.
Because (1.4) and (1.8) are LPs, it is reasonable to investigate pivoting methods for
potential solution techniques. The special structure of the QRMEP, particularly
(1.8), prevents the classic simplex method from being applied directly, specifically
with regard to the stopping criteria of the algorithm. The details on this feature
of the problem are discussed in Chapter 3. Barrodale and Roberts [1] developed an
efficient simplex algorithm for median regression, and Koenker and d'Orey [33] extended this method to compute regression quantiles for any probability q ∈ (0, 1). In
most software packages containing a dedicated quantile regression solver, this modified simplex algorithm is the default [19]. It is most effective and efficient on small
to moderately sized problems, which Chen and Wei [17] defined to be n ≤ 100, 000
observations. What characterizes a large problem also depends on the number of
independent variables (regressors). Chen and Wei found that the threshold for a large problem drops to 5,000 observations when the number of regressors approaches 50.
Interior-point algorithms have a computational efficiency advantage over simplex methods. Consequently, interior-point methods are generally preferred for
estimating large-scale LPs. Practical experimentation has shown the advantage in
computational efficiency to be dependent on sample size. Portnoy and Koenker [55]
showed that interior-point methods are actually inferior to simplex methods when
sample sizes are small. Interior-point methods achieve computational dominance
once a sample grows to a certain size, but this crossover point is subject to the
number of regressors in the model. The experiments of Portnoy and Koenker [55]
on the conditional median showed this crossover point to be around n = 20, 000
when the number of regressors was low, say (p− 1) = 4, but n decreased substan-
tially (n < 500) for small increases in the number of regressors (up to (p− 1) = 16).
Specifics of the interior-point method developed by Koenker and Park [35], a variant
of the primal-dual path following method, are presented in Chapter 2.
A third method, which Chen [18] calls the finite smoothing algorithm, is compu-
tationally competitive with both simplex and interior-point methods, and its details
are also discussed in Chapter 2. The primal objective function in (1.4) is a weighted
LAD function and not differentiable, so the finite smoothing algorithm approximates
the objective function via a Huber function, thus making it differentiable. The finite
smoothing algorithm outperforms the Barrodale-Roberts algorithm when n > 3,000
or (p − 1) > 50. It is significantly faster than the interior-point method when the
number of model regressors is large.
Other solution techniques, such as interval-linear programming (I-LP), have
been proposed [56], but the dual simplex, interior-point, and finite smoothing meth-
ods have become standard options in many software packages [19]. Each of these
current methods has its own advantages and disadvantages, but the following algo-
rithm features are considered to be essential to extending an alternative method to
the class of QRMEPs:
1. Exact. Being simplex variants, the Barrodale-Roberts and Koenker-d’Orey
[33] algorithms are considered exact solution methods in that each converges
to the optimal basis in a finite number of iterations [1]. The interior-point and
finite smoothing algorithms, on the other hand, are iterative search methods
which approximate an improving model parameter vector solution at each iteration.
This research focused on extending an alternative pivoting algorithm to the class of
QRMEPs, one that proceeds from (1.8) and converges to the optimal basis with the
same accuracy as simplex methods.
2. General. The Barrodale-Roberts and I-LP methods were developed to solve
a special case of the QRMEP, specifically the conditional median. Just as the
Koenker-d’Orey algorithm extended Barrodale-Roberts to any quantile, the
extensions resulting from this research are also applicable for any probability
q ∈ (0, 1).
3. Efficient. The computational effort required to estimate quantile regression
models is generally greater than that of OLS. As the sample size and/or model
size increases, the computation time for the algorithm also increases, so the
computational efficiency of a QRMEP solution method is bounded above by
sample/model size. Two measures of performance serve as proxy metrics for
the required computational effort [29]: processing time (run time) and the
number of iterations.
Much of the existing research on the QRMEP has paid particular attention
to algorithmic speed (run time). However, each of the existing quantile regression
model estimation methods is deficient in at least one of the aforementioned features.
The discussion in Chapter 2 details the features on which each method performs
poorly. Taking into account the above characteristics, the following statement
captures the direction of this research.
1.3.1 Problem Statement. There exist simplex, interior-point, and finite
smoothing algorithms for solving the QRMEP. However, an alternative methodology
is needed which proceeds from (1.8), exploits both the unique properties of
the QRMEP and the special structures of (1.4) and (1.8), and yields an algorithm
that is exact, general, and computationally efficient. This research responds to
these requirements by extending two alternative pivoting algorithms to the class of
QRMEPs: generalized interval-linear programming and a long-step variant of dual
simplex.
1.3.2 Research Objectives. This research first attempted to extend a spe-
cial simplex implementation, the simplex method for bounded variables (bounded
simplex), to the class of QRMEPs because the form of (1.8) is equivalent to the form
required for the bounded simplex method. Additionally, two pivoting algorithms
were successfully extended to this class of problems: interval-linear programming (I-
LP) and a long-step variant of the dual simplex method. This research progressed
under the following objectives:
1. Exploit the unique duality properties of the QRMEP and special structures of
(1.4) and (1.8) by extending a generalized form of I-LP and a long-step variant
of the dual simplex method to the class of QRMEPs.
2. Implement the extended algorithms in a commercially available software pro-
gram, specifically MATLAB. Furthermore, compare the extended algorithms
against two methods available as standard options in the MATLAB environ-
ment: dual simplex and Mehrotra’s predictor-corrector variant of primal-dual
path following (interior-point) [50].
3. Test the algorithms using an open-source dataset for a finite set of quantiles.
The dataset is sampled uniformly to obtain various combinations of problem
and model sizes (i.e., varying levels of n and p). Obtain run times and the
numbers of required iterations for all algorithms under comparison. Generate
and analyze plots of the run times, and tabulate the iteration results in order
to evaluate the performance of the extensions.
1.4 Overview
This dissertation is organized in the following manner. Chapter 2 reviews the
relevant literature on the QRMEP, beginning with detailing its two essential prop-
erties. The review continues with a discussion on partitioning the design matrix
and a translated form of (1.8), followed by comprehensive reviews of current pivoting
algorithms (Barrodale-Roberts, Koenker-d'Orey, I-LP), current interior-point
methods (affine scaling, primal path following, primal-dual path following), and the
finite smoothing algorithm. Chapter 2 concludes by noting similarities the QRMEP
has with some familiar integer programs. Chapter 3 details the two alternative
pivoting algorithms which this research extends to the class of QRMEPs. Chapter
4 presents graphical and tabular results from the MATLAB testing of four solution
methods. Chapter 5 summarizes this research, discusses how it contributes to the
field of Operations Research, and presents topics for future work.
II. Literature Review
In addition to the basic duality structure and KKT optimality conditions presented
in Chapter 1, the QRMEP exhibits some unique properties. These properties im-
pose additional optimality conditions on the QRMEP. They also establish some
necessary assumptions on the model, and these assumptions must hold to guarantee
an optimal solution to the QRMEP. Section 2.1 begins by presenting two properties
of quantile regression found in the current literature, and concludes with alternative
formulations of the primal and dual LPs resulting from these properties.
Section 2.2 identifies the contemporary fields to which quantile regression is
most commonly applied. To date, the social science fields have demonstrated
a greater preference for quantile regression as an alternative to conditional mean
models than other disciplines have, though other fields are beginning to apply quantile
regression in their respective analyses.
Chapter 2 continues with descriptions of current solution methods for the
QRMEP. Section 2.3 describes the pivoting algorithms: Barrodale-Roberts, Koenker-d'Orey,
I-LP, and dual simplex. Section 2.4 covers the interior-point methods: affine
scaling, primal path following, and primal-dual path following. The literature
review concludes by identifying structural similarities between the QRMEP and some
familiar integer programming problems.
2.1 QRMEP Properties
In 1978, Koenker and Bassett [32] introduced two properties unique to the
QRMEP. These properties are significant because they establish three model as-
sumptions which must hold in order to guarantee convergence to the optimal basis.
The following subsections restate the theorems that define each property and present
these assumptions. Combining the properties with a specific partitioning scheme
results in some interesting reformulations of the primal (1.4) and dual (1.8) LPs.
2.1.1 Exact-Fit or p-Subset Property. The observations used to fit the
quantile regression model can be distinguished from those having nonzero residuals
by defining H as the set of all index subsets of size p. That is, each p-subset
h ∈ H consists of the indices of the observations used for model fitting, and these
are referred to as the basic observations [38]. Let X_h denote a square submatrix of
size p whose rows are the observations identified by each index j ∈ h. Each row in
X_h has a zero residual, so

    X_h b = y_h    (2.1)

where y_h is the p-vector of the response defined by h, and b is its solution vector
of model parameters. This leads to the following theorem, which is a restatement
of theorems given originally by Koenker and Bassett [32] and later by Koenker [38].
The result is what is known as the exact-fit property.

Theorem 1 If the design matrix X has rank p, then there exists at least one p-element
subset h ∈ H such that

    b* = X_h^{−1} y_h.

Furthermore, b* is a solution to the QRMEP if and only if

    (q − 1) 1_p ≤ w_h ≤ q 1_p,

where w_h is the corresponding p-subset of the dual solution vector w.

If rank(X) = p, then X_h ∈ R^{p×p} is nonsingular (invertible), so the exact-fit
property is also called the p-subset property [2]. The set of all p-subsets H is
therefore equivalent to the set of all dual extreme points generated by the polytope
from (1.8).
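For concreteness, Theorem 1 can be exercised numerically: enumerate every p-subset h, fit b = X_h^{−1} y_h, and keep the subsets whose implied dual subvector w_h lies within [q − 1, q]. The sketch below is illustrative only (it reuses the small sample of Example 3 in Section 2.2.1 and plain NumPy); it recovers w_h from the dual balance condition X^T w = 0_p, assigning weight q − 1 to negative nonbasic residuals and q to positive ones, and is not the dissertation's implementation.

```python
import numpy as np
from itertools import combinations

# Illustration of the exact-fit (p-subset) property on the 5-observation
# sample of Example 3.  Every nonsingular p-subset h yields a candidate
# b = X_h^{-1} y_h; Theorem 1 keeps those whose implied dual subvector w_h
# lies within [q - 1, q].
q = 0.5
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]])
y = np.array([1.0, 1.0, 2.0, 3.0, 2.0])
n, p = X.shape

optimal = []                         # p-subsets satisfying dual feasibility
for h in combinations(range(n), p):
    B = X[list(h)]
    if abs(np.linalg.det(B)) < 1e-12:
        continue                     # singular X_h cannot define a fit
    b = np.linalg.solve(B, y[list(h)])
    nb = [j for j in range(n) if j not in h]
    r = y[nb] - X[nb] @ b
    # Dual weights: q - 1 below the fitted hyperplane, q above it
    # (a zero nonbasic residual, a degenerate case, is treated as positive)
    wN = np.where(r < 0, q - 1.0, q)
    # Balance condition X^T w = 0_p determines the basic subvector w_h
    wh = -np.linalg.solve(B.T, X[nb].T @ wN)
    if np.all(wh >= q - 1.0 - 1e-9) and np.all(wh <= q + 1e-9):
        optimal.append(h)

b_star = np.linalg.solve(X[list(optimal[0])], y[list(optimal[0])])
```

For this sample more than one p-subset can pass the check at q = 1/2 (with w_h on the boundary), which is exactly the kind of degeneracy Koenker [38] notes below.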
If all data in the model are continuous, then the exact-fit property can be used
to compute a parameter vector b for any p-subset h ∈ H. However, not every
p-subset satisfies dual feasibility for a specific q. It is possible that more than one
p-subset solves the QRMEP, though Koenker [38] indicates that such degeneracies
are rare and typically occur when discrete data are present. The exact-fit property
therefore states two necessary assumptions on the quantile regression model. If the
data are all continuous and a nondegenerate solution can be assumed to exist, then
b* is a unique solution to the QRMEP if and only if

    (q − 1) 1_p < w_h < q 1_p    (2.2)

for exactly one h ∈ H. In other words, the unique optimal solution to the QRMEP
for the (100q)th conditional quantile is identified by the p-subset for which dual
feasibility is strictly satisfied.
2.1.2 Cardinality Range Property. A residual cannot be simultaneously
positive and negative. Similarly, it cannot be both zero and positive, or zero and
negative. Each residual, and the observation to which it corresponds, can there-
fore be classified into exactly one of three mutually exclusive sets: zero residuals
(basic observations which define the regression hyperplane), negative residuals (non-
basic observations falling below the regression hyperplane), and positive residuals
(nonbasic observations falling above the regression hyperplane). The respective
cardinalities of these sets can be bounded by the following theorem from
Koenker and Bassett [32].
Theorem 2 Let P, N, and Z denote the numbers of positive, negative, and zero
elements, respectively, in the residual vector r = y − Xb. If the quantile regression
model contains an intercept, then

    N ≤ qn ≤ n − P = N + Z
    P ≤ (1 − q)n ≤ P + Z

for all b ∈ S, where S is the set of all b that satisfy the above inequality. If the
cardinality of S is one (|S| = 1), then b is unique and

    N < qn < N + Z
    P < (1 − q)n < P + Z.

If a solution is nondegenerate, then Z = p with bounds [38] on N

    qn − p < N < qn    (2.3)

and bounds on P

    (1 − q)n − p < P < (1 − q)n.    (2.4)
Koenker and Bassett did not provide a name for this property, so it is referred to in
this research as the cardinality range property. This property also establishes the
third assumption that the model must contain an intercept.
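A direct check of these bounds is straightforward. The sketch below is illustrative only; the residual vector is that of the median fit b = (1/2, 1/2)^T from Example 3 in Section 2.2.1, while the check itself applies to any q, n, and p.

```python
import numpy as np

# Sketch of the cardinality range property: count positive, negative, and
# zero residuals and test the nondegenerate bounds (2.3) and (2.4).
def cardinality_check(r, q, p, tol=1e-9):
    n = r.size
    N = int(np.sum(r < -tol))        # negative residuals
    P = int(np.sum(r > tol))         # positive residuals
    Z = n - N - P                    # zero residuals (basic observations)
    nondegenerate = (Z == p
                     and q * n - p < N < q * n                 # bound (2.3)
                     and (1 - q) * n - p < P < (1 - q) * n)    # bound (2.4)
    return N, P, Z, nondegenerate

# Residuals of the median fit b = (1/2, 1/2)^T from Example 3
r = np.array([0.0, -0.5, 0.0, 0.5, -1.0])
N, P, Z, ok = cardinality_check(r, q=0.5, p=2)
```

Here N = 2, P = 1, and Z = p = 2, so the bounds hold with n = 5 and q = 1/2.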
2.1.3 Partitioning. Koenker and Bassett [32] distinguish only between
basic (h) and nonbasic (h̄) observations, but the cardinality range property reveals
another way of partitioning the sample. The nomenclature used here is similar to
that given by Bazaraa, Jarvis, and Sherali [4] in their presentation of the simplex
method for bounded variables (bounded simplex). The design matrix can be
partitioned as X = (B; N_v; N_u), where B is the (p × p) basis matrix, N_v consists of
the observations (rows of X) falling below the regression hyperplane, and N_u
consists of the rows of X falling above the hyperplane. If the three model assumptions
established by the exact-fit and cardinality range properties are met; that is, if a
nondegenerate solution exists, all independent variables are continuous, the response is
continuous, and the quantile regression model contains an intercept (i.e., X_{·1} = 1_n),
then Z = p, the number of rows in N_v must satisfy (2.3), and the number of rows
in N_u must satisfy (2.4). The dual vector can be similarly partitioned such that
w = (w_b, w_v, w_u), where w_v = (q − 1) 1_v, w_u = q 1_u, and w_b ∈ (q − 1, q)^p. The dual
constraints can now be rewritten as

    X^T w = B^T w_b + N_v^T w_v + N_u^T w_u = 0_p,    (2.5)

and w_b can be computed directly:

    w_b = −(B^T)^{−1} N_v^T w_v − (B^T)^{−1} N_u^T w_u.    (2.6)
The objective coefficient vector y can also be partitioned into y = (y_b, y_v, y_u)^T, and
(2.6) can be substituted into the objective function, yielding

    y^T w = y_b^T w_b + y_v^T w_v + y_u^T w_u
          = −y_b^T (B^T)^{−1} N_v^T w_v − y_b^T (B^T)^{−1} N_u^T w_u + y_v^T w_v + y_u^T w_u
          = (y_v^T − y_b^T (B^T)^{−1} N_v^T) w_v + (y_u^T − y_b^T (B^T)^{−1} N_u^T) w_u.    (2.7)
These partitions reveal a unique attribute of the dual LP, namely that for any n
and p, w_v is a vector whose indices correspond to negative residuals, w_u is a vector
whose indices correspond to positive residuals, and w_b is a vector whose indices
correspond to the observations used to fit the regression hyperplane (zero residuals).
This implies that the positive and negative residuals can approach ∞ and
−∞, respectively, without changing the solution. That is, the dual LP is concerned
not with the magnitudes of the residuals, but rather with the side of the regression
hyperplane on which each observation lies [11]. The dual LP can therefore be
expressed in terms of the nonbasic observations,

    max_{w_b ∈ (q−1,q)^p}  (q − 1)(y_v^T − y_b^T (B^T)^{−1} N_v^T) 1_v + q (y_u^T − y_b^T (B^T)^{−1} N_u^T) 1_u    (2.8)

subject to

    w_b = (1 − q)(B^T)^{−1} N_v^T 1_v − q (B^T)^{−1} N_u^T 1_u,
where dual feasibility is satisfied, and optimality achieved, only when the basis vector
lies strictly within the bounds. Additional characteristics of the QRMEP can be
obtained by partitioning (1.4). Since u_j = v_j = 0 for all basic observations, (1.4)
can be rewritten as

    min_{b ∈ R^p, u_u ≥ 0_u, v_v ≥ 0_v}  (1 − q) v_v^T 1_v + q u_u^T 1_u    (2.9)

subject to

    B b = y_b
    N_v b − v_v = y_v
    N_u b + u_u = y_u.
The basis matrix B is nonsingular, so the exact-fit property can be used to determine
the model parameters: b = B^{−1} y_b. With the model coefficients computed, the
components of the residual vector, u_u and v_v, can also be obtained:

    v_v = N_v b − y_v = N_v B^{−1} y_b − y_v
    u_u = y_u − N_u b = y_u − N_u B^{−1} y_b.
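The partitioned quantities above can be computed mechanically once a basis is chosen. The following sketch is illustrative only (hypothetical basis, data of Example 3 in Section 2.2.1): it forms b = B^{−1} y_b, the residual components v_v and u_u, and the dual basic subvector w_b from (2.6), then confirms that the primal objective of (2.9) and the dual objective y^T w coincide.

```python
import numpy as np

# Sketch of the partition X = (B; Nv; Nu) for a hypothetical basis
# (the Example 3 data, basic observations 1 and 3).
q = 0.5
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]])
y = np.array([1.0, 1.0, 2.0, 3.0, 2.0])
basic = [0, 2]

B, yb = X[basic], y[basic]
b = np.linalg.solve(B, yb)                 # exact fit: b = B^{-1} y_b

r = y - X @ b
below = [j for j in range(len(y)) if j not in basic and r[j] < 0]
above = [j for j in range(len(y)) if j not in basic and r[j] > 0]
Nv, Nu = X[below], X[above]

vv = Nv @ b - y[below]                     # magnitudes of negative residuals
uu = y[above] - Nu @ b                     # magnitudes of positive residuals

wv = (q - 1.0) * np.ones(len(below))       # w_v = (q - 1) 1_v
wu = q * np.ones(len(above))               # w_u = q 1_u
wb = -np.linalg.solve(B.T, Nv.T @ wv + Nu.T @ wu)   # eq. (2.6)

primal = (1 - q) * vv.sum() + q * uu.sum()            # objective of (2.9)
dual = y[basic] @ wb + y[below] @ wv + y[above] @ wu  # y^T w
```

The equal primal and dual objective values anticipate the zero-duality-gap statement below.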
The duality gap [4] is zero at optimality, so

    (1 − q) v_v^T 1_v + q u_u^T 1_u = (y_v^T − y_b^T (B^T)^{−1} N_v^T) w_v + (y_u^T − y_b^T (B^T)^{−1} N_u^T) w_u.
Even these rewrites of the primal and dual LPs still do not take into account the
cardinality range property for a specific q, so a solution can satisfy primal feasibility
without satisfying (2.3) and (2.4).
Koenker and Bassett [32] present regression quantiles for linear models as natu-
ral extensions of the order statistics of a single sample. That is, the residuals are the
order statistics in the quantile regression model. Nonparametric statistics in loca-
tion models can also be computed based on the rankings of the sample observations,
and the concept of ranking observations was extended to the quantile regression class
of models in 1992, when Gutenbrunner et al. defined regression rank scores [25].
Consider the translation a_j = w_j + 1 − q, which shifts the boundaries on the dual
decision variables such that the translated dual LP [38] is

    max_{a ∈ [0,1]^n}  y^T a    (2.10)

subject to

    X^T a = (1 − q) X^T 1_n,
where the solution a is a vector of regression rank scores [25]. This equivalent form of
the dual was first presented by Koenker and Bassett [32]. One distinct advantage of
using (2.10) is that, unlike the standard form of (1.8), any translated dual solution
has identical bounds, a ∈ [0,1]^n, regardless of the given quantile. The feasible
region of (2.10), however, is dependent on the quantile. Conversely, the feasible
region of (1.8) holds for any quantile, while the bounds on w vary by quantile. This
translated form can also be expressed in terms of the nonbasic variables, using the
same partitioning of the design matrix and response vector as was done in (2.8). Let
a = (a_b, a_v, a_u)^T, where a_v = 0_v, a_u = 1_u, and a_b ∈ (0,1)^p such that

    X^T a = (1 − q) X^T 1_n    (2.11)
    B^T a_b + N_v^T a_v + N_u^T a_u = (1 − q) B^T 1_p + (1 − q) N_v^T 1_v + (1 − q) N_u^T 1_u
    B^T a_b + (q − 1) B^T 1_p = (1 − q) N_v^T 1_v − q N_u^T 1_u
    B^T a_b = (1 − q) B^T 1_p + (1 − q) N_v^T 1_v − q N_u^T 1_u.
Since a_v is a zero vector, the objective function simplifies to

    y^T a = y_b^T a_b + y_u^T 1_u    (2.12)
          = (1 − q) y_b^T 1_p + (1 − q) y_b^T (B^{−1})^T N_v^T 1_v − q y_b^T (B^{−1})^T N_u^T 1_u + y_u^T 1_u
          = (1 − q) y_b^T 1_p + (1 − q) y_b^T (B^{−1})^T N_v^T 1_v + (y_u^T − q y_b^T (B^{−1})^T N_u^T) 1_u,

which leads to

    max_{a_b ∈ (0,1)^p}  (1 − q) y_b^T 1_p + (1 − q) y_b^T (B^{−1})^T N_v^T 1_v + (y_u^T − q y_b^T (B^{−1})^T N_u^T) 1_u    (2.13)

subject to

    a_b = (1 − q) 1_p + (1 − q) (B^{−1})^T N_v^T 1_v − q (B^{−1})^T N_u^T 1_u.
These alternative formulations, especially (2.8), exhibit features unique to the
QRMEP and are useful for solving LPs with bounded variables. Chapter 3 shows
how these features are exploited.
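The effect of the translation is easy to verify numerically: any dual-feasible w for (1.8) maps to rank scores a = w + (1 − q) 1_n that lie in [0, 1]^n and satisfy the constraint of (2.10). The sketch below uses hypothetical values built on the Example 3 data of Section 2.2.1 (basic observations 1 and 3) and is illustrative only.

```python
import numpy as np

# Sketch of the rank-score translation a_j = w_j + 1 - q at q = 1/2.
q = 0.5
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]])
n = X.shape[0]

# A dual-feasible w for (1.8): q - 1 below the hyperplane, q above it,
# and values within [q - 1, q] on the two basic observations
w = np.array([0.0, q - 1.0, 0.5, q, q - 1.0])
assert np.allclose(X.T @ w, 0.0)           # w satisfies X^T w = 0_p

a = w + (1.0 - q)                          # translated dual (rank scores)
in_unit_box = bool(np.all((a >= 0.0) & (a <= 1.0)))
constraint_ok = bool(np.allclose(X.T @ a, (1.0 - q) * X.T @ np.ones(n)))
```

Because X^T w = 0_p, adding (1 − q) to every component shifts the right-hand side to (1 − q) X^T 1_n exactly, independent of the data.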
2.2 Pivoting Methods
2.2.1 Barrodale-Roberts Algorithm. Although quantile regression model
coefficients are obtained via the solution to an LP, the unique properties of regression
quantiles eliminate the classic simplex algorithm as a viable solution method. In
1973, Barrodale and Roberts [1] introduced a method, which this research calls the
Barrodale-Roberts algorithm, for the l1-approximation problem; it modifies the
simplex method to take advantage of the special structure of the conditional
median LP. Since q = (1 − q) for median regression, the Barrodale-Roberts
formulation of the l1-approximation problem as an LP differs from (1.4) in two ways:
the objective function weights (1 − q) and q are removed, and the vector of model
coefficients is rewritten as the difference between two nonnegative vectors. Since
median regression is a special case of the QRMEP, (1.4) becomes

    min_{b^+ ∈ R^p, b^− ∈ R^p, u ≥ 0_n, v ≥ 0_n}  u^T 1_n + v^T 1_n

subject to

    X b^+ − X b^− + u − v = y.
Each iteration involves estimating one of the model parameters, so one should expect
to perform at least p pivots in the tableau. Not all columns need to be displayed in
the tableau, and an initial basic feasible solution is readily available for any problem,
namely by letting all observations be basic. That is, for each observation,
    y_j − 0 = u_j
    x_j (b^+ − b^−) = 0
    b_0^+ + x_j b_1^+ − b_0^− − x_j b_1^− = 0

and the initial basis consists of all positive residuals. The initial tableau [1] takes
the form

    Costs →                     0         0         1_n^T    −1_n^T
    Basis ↓        RHS          (b^+)^T   (b^−)^T   u^T      v^T
    1_n    u       y            X         −X        I_n      −I_n
    Marginal Costs →  y^T 1_n   1_n^T X   −1_n^T X  0        1_n^T
The algorithm consists of two stages. Stage 1 is a maximal selection from
among the columns of X or −X [18]. Specifically, Stage 1 chooses the nonbasic
variable with the largest nonnegative marginal cost to enter the basis. Let c = 1_n^T X
be a (1 × p) row vector, and let

    k = arg max_j { c_j, −c_j }

denote the index of the model coefficient selected to enter the basis, where
0 ≤ k ≤ (p − 1). Once k is identified, a basic u_j must be chosen to leave the basis. This is
(p− 1). Once k is identified, a basic uj must be chosen to leave the basis. This is
accomplished by a sequence of three steps. First, candidate slopes must be computed
for each observation. This means that if the jth observation defines the regression
hyperplane, then there exists a unique multiplier b(j)k = b
(j)+k − b(j)−
k such that
yj − xj(b
(j)+k − b(j)−
k
)= 0
b(j)+k − b(j)−
k =yjxj,
where b(j)k denotes the jth candidate slope. The residual vector and objective func-
tion are computed for each candidate slope, and the b(j)k that minimizes uT1n+vT1n
is identified. The row in the tableau for which yj − xjb(j)k = 0 is the pivot row, and
uj leaves the basis. This sequence continues until p observations have been selected
to define the regression hyperplane.
Stage 2 consists of exchanging nonbasic u_j, v_j with basic u_j, v_j. That is,
columns of I_n or −I_n are interchanged to complete the optimal basis [18]. As in
Stage 1, the nonbasic variable having maximum marginal cost is selected to enter
the basis, and the basic variable which minimizes the objective function is selected
to leave the basis. If a residual becomes negative, then v_j replaces u_j in the basis.
The following is a reproduction of an example given in [1].
Example 3 Estimate the median regression model for the following set of 5 observations [1]:

    y = (1, 1, 2, 3, 2)^T

    X = [ 1  1
          1  2
          1  3
          1  4
          1  5 ]
The initial tableau takes the form

    Cost   Basis   r    b_0^+   b_1^+
    1      u_1     1    1       1
    1      u_2     1    1       2
    1      u_3     2    1       3
    1      u_4     3    1       4
    1      u_5     2    1       5

Clearly, b_1^+ has the maximum marginal cost at Σ_{j=1}^{5} x_j = 15. Compute the candidate
slopes for b_1^+, where b_1^{(j)+} = y_j / x_j. Compute the residual vector for each b_1^{(j)+}.
Evaluate the objective function for each candidate slope, and choose the minimizing
value.
    b_1^{(j)+}                       1      1/2     2/3     3/4     2/5
    r^{(j)} = y − x b_1^{(j)+}       0      1/2     1/3     1/4     3/5
                                    −1      0      −1/3    −1/2    1/5
                                    −1      1/2     0      −1/4    4/5
                                    −1      1       1/3     0      7/5
                                    −3     −1/2    −4/3    −7/4    0
    u^T 1_n + v^T 1_n               9/2     7/8     17/12   31/16   3/4
The minimizer is b_1^{(5)+} = 2/5, but the cardinality range property is satisfied only by
b_1^{(3)+} = 2/3. Pivot on the third row of the tableau such that b_1^+ replaces u_3 in the
basis.

    Cost   Basis    r^{(j)}   b_0^+   u_3
    1      u_1      1/3       2/3     −1/3
    −1     v_2      1/3       −1/3    2/3
    0      b_1^+    2/3       1/3     1/3
    1      u_4      1/3       −1/3    −4/3
    −1     v_5      4/3       2/3     5/3
Compute candidate slopes, residuals, and objective function values for b_0^+.

    b_0^{(j)+}               1/2     −1      2      −1      2
    y − x b_0^{(j)+}          0       1      −1      1      −1
                            −1/2      0      −1      0      −1
                             1/2      1       0      1       0
                             1/2      0       1      0       1
                            −1       −2       0     −2       0
    u^T 1_n + v^T 1_n        5/2      4       3      4       3

The minimizer is b_0^{(1)+} = 1/2, and b_0^+ replaces u_1 in the basis.

    Cost   Basis    RHS    u_1    u_3
    0      b_0^+    1/2    3/2    −1/2
    −1     v_2      1/2    1/2    1/2
    0      b_1^+    1/2    −1/2   1/2
    1      u_4      1/2    1/2    −3/2
    −1     v_5      1      −1     2
The two observations used to fit the regression line are (x_1, y_1) and (x_3, y_3), and the
parameter vector (optimal solution) is b = (1/2, 1/2)^T.
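The tableau result can be cross-checked without pivoting: by the exact-fit property, the optimal median line passes through p = 2 of the observations, so enumerating all 2-subsets and scoring u^T 1_n + v^T 1_n = Σ_j |y_j − x_j b| must recover the optimum. The sketch below (illustrative, plain NumPy) confirms that b = (1/2, 1/2)^T attains the minimum objective of 2; LAD fits need not be unique, so other subsets may tie.

```python
import numpy as np
from itertools import combinations

# Tableau-free cross-check of Example 3: score every 2-subset fit by its
# l1 objective; the exact-fit property guarantees the optimum is among them.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]])
y = np.array([1.0, 1.0, 2.0, 3.0, 2.0])

# All x_j are distinct, so every 2-subset submatrix X_h is nonsingular
fits = [np.linalg.solve(X[list(h)], y[list(h)])
        for h in combinations(range(5), 2)]
objs = [np.abs(y - X @ b).sum() for b in fits]       # u^T 1 + v^T 1
best = min(objs)
best_fits = [b for b, o in zip(fits, objs) if abs(o - best) < 1e-9]
```

Any tying subset illustrates the degeneracy noted in Section 2.1.1: more than one p-subset can solve the QRMEP at a given quantile.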
Although the modified simplex method in [1] was developed exclusively for
median regression, the next section presents a generalized algorithm applicable to
any quantile [33].
2.2.2 Koenker-d'Orey Algorithm. Koenker and d'Orey [33] extended the
method in [1] to any probability q ∈ (0, 1), so it is referred to in this research as the
Koenker-d'Orey algorithm. The structure of (2.8) is unique in that the basis vector
is defined exclusively by the nonbasic rows of X. If the partitioning of the design
matrix is relaxed such that N denotes an (n − p) × p matrix containing all nonbasic
rows of X, then (2.6) becomes

    w_b = −(B^T)^{−1} N^T w_N,

where N^T w_N = N_v^T w_v + N_u^T w_u and

    (q − 1) 1_p ≤ −(B^T)^{−1} N^T w_N ≤ q 1_p,
which can be rewritten in primal space according to the subgradient condition [32]

    (q − 1) 1_p ≤ Σ_{j ∈ h̄} (B^T)^{−1} x_j^T (1/2 − (1/2) sgn(y_j − x_j b) − q) ≤ q 1_p,    (2.14)
where sgn(r_j) = 1 if r_j > 0, and sgn(r_j) = −1 if r_j < 0. The p-vector b of model
coefficients is optimal at q if and only if it satisfies the subgradient condition [33].
Bassett showed in [2] that (2.14) can be rewritten in matrix notation as

    (q − 1) 1_p ≤ (B^T)^{−1} N^T ((1/2 − q) I_{n−p} − (1/2) I_{n−p}^{(r)}) 1_{n−p} ≤ q 1_p,    (2.15)
where I_{n−p} is an identity matrix of size (n − p), and I_{n−p}^{(r)} is a diagonal matrix of
size (n − p) whose jth diagonal element is sgn(r_j) = sgn(y_j − x_j b). Applying the
transformation from (2.10) yields

    0_p ≤ (1 − q)(B^T)^{−1} X^T 1_n − (B^T)^{−1} N^T ((1/2) I_{n−p} + (1/2) I_{n−p}^{(r)}) 1_{n−p} ≤ 1_p.    (2.16)
As long as (2.16) is satisfied, b is optimal for a specific range of q ∈ (0, 1). Suppose
that at some iteration t ≥ 1, b^{(t)} solves (1.4) uniquely for some fixed q_t ∈ (0, 1) and
specified basis (p-subset) h ∈ H. Let q denote the quantile of interest and assume
that q_t < q. Iteration t of the Koenker-d'Orey algorithm consists of determining the
least upper bound q_{t+1} > q_t, also called the breakpoint [38], at which b^{(t)} ceases to
be optimal. The algorithm computes these breakpoints by executing line searches of
the form b^{(t)} + δ d_k, where δ denotes the step size, k is the index of the basic variable
selected to leave the basis, and the search direction d_k is the kth column of B^{−1}. The
dual counterpart of the line search is obtained from (2.10). Specifically, the equation in
(2.13) for the translated dual basic vector a_b can be rewritten such that the optimal
a_b^{(t)} at iteration t satisfies the double inequality

    0_p ≤ (1 − q)(B^T)^{−1} X^T 1_n − (B^T)^{−1} N^T a_N^{(t)} ≤ 1_p,    (2.17)
which can be further decomposed into two p-vectors

    f = (B^T)^{−1} (X^T 1_n − N^T a_N^{(t)})
    g = (B^T)^{−1} X^T 1_n

such that

    0_p ≤ f − q g ≤ 1_p.    (2.18)

If the current basis does not satisfy (2.17), then the next breakpoint must be
computed because at least one a_j^{(t)} ∈ a_b^{(t)} is dual infeasible. Either a_j^{(t)} < 0 or a_j^{(t)} > 1
for at least one j ∈ h, so the index of the leaving basic variable corresponds to the
most negative element of the set

    k = arg min_{j ∈ h} { −a_j^{(t)}, a_j^{(t)} − 1 }.

The leaving variable a_k^{(t)} ∈ a_b^{(t)} becomes nonbasic such that either f_k − q_{t+1} g_k = 0
or f_k − q_{t+1} g_k = 1, and it is desirable to find the largest breakpoint which does not
exceed the target quantile. That is,

    q_{t+1} = max { f_k / g_k, (f_k − 1) / g_k : q_t < q_{t+1} ≤ q }.
The boundary to which a_k^{(t)} is driven determines the direction of movement along
b^{(t)} + δ d_k in order to bring a_k^{(t)} into dual feasibility. Let σ denote the direction of
movement such that

    σ = 1,  if q_{t+1} = f_k / g_k
    σ = −1, if q_{t+1} = (f_k − 1) / g_k.

The next task is finding a nonbasic variable to enter the basis, which occurs when
a nonbasic residual is driven to zero. The new residual vector is given by r^{(t+1)} =
y − X(b^{(t)} + δ d_k) = y − X b^{(t)} − δ X d_k = r^{(t)} − δ X d_k, so the index m of the
entering (blocking) variable is determined by the smallest positive step size such
that a nonbasic residual becomes zero,

    m = arg min_{j ∈ h̄} { δ_j = r_j^{(t)} / (σ x_j d_k) > 0 }.

The mth row of the design matrix, x_m, replaces x_k in the basis matrix B, the new
vector of model coefficients is computed as b^{(t+1)} = b^{(t)} + δ_m d_k, t is incremented,
and the next iteration begins.
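The optimality interval that the breakpoints q_{t+1} delimit can be read directly off (2.18): each component j with g_j ≠ 0 restricts q to lie between (f_j − 1)/g_j and f_j/g_j. The sketch below is illustrative only, reusing the hypothetical basis of observations 1 and 3 on the Example 3 data; it is not the Koenker-d'Orey implementation.

```python
import numpy as np

# Sketch of the optimality interval behind the breakpoints: a basis stays
# optimal for every q with 0 <= f - q g <= 1 componentwise (eq. 2.18).
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]])
y = np.array([1.0, 1.0, 2.0, 3.0, 2.0])
n, p = X.shape

basic = [0, 2]                             # hypothetical basis {obs 1, obs 3}
B = X[basic]
nb = [j for j in range(n) if j not in basic]
r = y[nb] - X[nb] @ np.linalg.solve(B, y[basic])
aN = np.where(r > 0, 1.0, 0.0)             # translated duals: 1 above, 0 below

f = np.linalg.solve(B.T, X.T @ np.ones(n) - X[nb].T @ aN)
g = np.linalg.solve(B.T, X.T @ np.ones(n))

# Intersect the componentwise restrictions on q implied by 0 <= f - q g <= 1
lo, hi = 0.0, 1.0
for fj, gj in zip(f, g):
    if gj > 1e-12:
        lo, hi = max(lo, (fj - 1.0) / gj), min(hi, fj / gj)
    elif gj < -1e-12:
        lo, hi = max(lo, fj / gj), min(hi, (fj - 1.0) / gj)
    # g_j == 0: the component constrains f_j alone, not q
```

For this basis and sample the interval evaluates to q ∈ [1/2, 7/10]: the basis that fits the median remains optimal until q exceeds 0.7, at which point a breakpoint pivot of the kind described above is required.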
2.2.3 Interval-Linear Programming. In 1969, Robers and Ben-Israel [56]
presented a method for solving the median regression problem called interval-linear
programming (I-LP). The algorithm proceeds from (1.8) on the interval w ∈ [−1, 1]^n,
and each iteration consists of solving a decomposition of (1.8). The I-LP method
solves problems of the form

    max_{w ∈ [−1,1]^n}  y^T w    (2.19)

subject to

    d^− ≤ A w ≤ d^+,

where y, A, d^−, and d^+ are known. It follows that (1.8) can be rewritten as
(2.19) by replacing the equality X^T w = 0_p with the equivalent pair of inequalities
X^T w ≥ 0_p and X^T w ≤ 0_p. By
augmenting matrices and vectors, (1.8) becomes

    max_{w ∈ [−1,1]^n}  y^T w    (2.20)

subject to

    (0_p; −1_n) ≤ (X^T; I_n) w ≤ (0_p; 1_n),

where d^− = (0_p; −1_n), d^+ = (0_p; 1_n), and A = (X^T; I_n).
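Assembling the I-LP data of (2.20) is mechanical, and the first relaxation already shows the method's character: with A^{(1)} = I_n the maximizer of y^T w over the box is simply w = sgn(y), which then violates the ignored rows 0_p ≤ X^T w ≤ 0_p. A sketch on illustrative data (the Example 3 sample of Section 2.2.1):

```python
import numpy as np

# Sketch of the I-LP data of (2.20): stack X^T over I_n so the equality
# X^T w = 0_p and the box w in [-1, 1]^n both appear as interval constraints
# d^- <= A w <= d^+.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]])
y = np.array([1.0, 1.0, 2.0, 3.0, 2.0])
n, p = X.shape

A = np.vstack([X.T, np.eye(n)])
d_minus = np.concatenate([np.zeros(p), -np.ones(n)])
d_plus = np.concatenate([np.zeros(p), np.ones(n)])

# First relaxation (t = 1): only -1 <= w <= 1 is enforced, so the maximizer
# of y^T w is w = sgn(y).  The ignored rows 0 <= X^T w <= 0 are then
# violated, which is what drives the subsequent I-LP iterations.
w0 = np.sign(y)
violation = X.T @ w0        # would be 0_p at optimality; it is not yet
```

The nonzero `violation` vector is precisely the infeasibility that the quantity Δ in (2.22) measures one constraint at a time.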
Simplex methods pivot from one basic feasible solution to another, so they
can be classified as special cases of pivoting algorithms. I-LP, when applied to the
QRMEP, operates exclusively in the dual space generated by (1.8), where only one
solution is dual feasible. Therefore, I-LP can be classified only as a pivoting
algorithm, because it pivots among dual infeasible solutions. The decomposed problem
takes the form

    max_{w ∈ [−1,1]^n}  y^T w    (2.21)
subject to

d^(t)− ≤ A^(t)w ≤ d^(t)+
d_s^(t)− ≤ a_s^(t)w ≤ d_s^(t)+,

where A^(t) is a coefficient matrix from a set of n constraints chosen from A such that A^(t) is nonsingular, a_s^(t) is a coefficient vector from a single constraint chosen from the remaining p constraints, and t ≥ 1 denotes the current iteration. For t ≥ 1, let w^(t−1) be the maximizer of y^T w, subject only to d^(t)− ≤ A^(t)w ≤ d^(t)+, and let w^(t) be the optimal solution to (2.21). For t = 1, notice that (2.21) is easily formulated by letting d^(1)− = −1_n, d^(1)+ = 1_n, and A^(1) = I_n. The remaining single constraint is therefore chosen from 0_p ≤ X^T w ≤ 0_p, say 0 ≤ 1_n^T w ≤ 0. Once the solution w^(t−1) to max{y^T w : d^(t)− ≤ A^(t)w ≤ d^(t)+} is obtained, it must be substituted into the single constraint. If w^(t−1) satisfies d_s^(t)− ≤ a_s^(t)w^(t−1) ≤ d_s^(t)+, then w^(t) = w^(t−1). The constraints not included in (2.21) are then checked for feasibility. If w^(t−1) satisfies all constraints in (2.19), then the solution is optimal. Otherwise, the quantity a_s^(t)w^(t−1) is either less than its lower bound or greater than its upper bound. Let the amount by which w^(t−1) fails to satisfy d_s^(t)− ≤ a_s^(t)w^(t−1) ≤ d_s^(t)+ be denoted by

∆ = { a_s^(t)w^(t−1) − d_s^(t)+, if a_s^(t)w^(t−1) > d_s^(t)+,
    { a_s^(t)w^(t−1) − d_s^(t)−, if a_s^(t)w^(t−1) < d_s^(t)−.   (2.22)

It follows that moving a_s^(t)w^(t−1) into feasibility requires changing one or more elements of A^(t)w^(t−1), but such changes cannot affect the feasibility of A^(t)w^(t−1). Let γ_j represent the marginal cost of changing the jth element of A^(t)w^(t−1), where

γ_j = [ (y^T (A^(t))^(−1))_j / (a_s^(t) (A^(t))^(−1))_j ] sgn ∆   (2.23)
for all (a_s^(t) (A^(t))^(−1))_j ≠ 0. Let m ≤ n be the number of nonnegative marginal costs, and sort all γ_j ≥ 0 from smallest to largest. Define Q as the set of indices corresponding to the sorted marginal costs,

Q = { j_k : 1 ≤ k ≤ m, (a_s^(t) (A^(t))^(−1))_{j_k} ≠ 0, γ_{j_k} ≥ 0 }   (2.24)

where 1 ≤ j ≤ n. For each j_k ∈ Q, the distance from each element (A^(t)w^(t−1))_{j_k} to its closer boundary is determined by

δ_{j_k} = { (d^(t)− − A^(t)w^(t−1))_{j_k}, if sgn ∆ = sgn (a_s^(t) (A^(t))^(−1))_{j_k},
          { (d^(t)+ − A^(t)w^(t−1))_{j_k}, if sgn ∆ = − sgn (a_s^(t) (A^(t))^(−1))_{j_k}.   (2.25)
The index of the element of w^(t−1) that will become basic, along with the associated relative cost, is obtained by

j_r = min{ j_k : 1 ≤ k ≤ m, | Σ_{j_k∈Q} δ_{j_k} (a_s^(t) (A^(t))^(−1))_{j_k} | ≥ |∆| }   (2.26)

and

θ = [ −∆ − Σ_{j_k=j_1}^{j_(r−1)} δ_{j_k} (a_s^(t) (A^(t))^(−1))_{j_k} ] / (a_s^(t) (A^(t))^(−1))_{j_r}.   (2.27)

Now w^(t) is computed as

w^(t) = w^(t−1) + (A^(t))^(−1) ( Σ_{j_k=j_1}^{j_(r−1)} δ_{j_k} e_{j_k} + θ e_{j_r} ),

where e_{j_k} denotes an n-vector of zeros with a one in the j_k-th position. If w^(t) also satisfies the constraint(s) excluded from (2.21), then w^(t) is the optimal solution to (2.19). Otherwise, the j_r-th constraint is removed from d^(t)− ≤ A^(t)w ≤ d^(t)+ and replaced by d_s^(t)− ≤ a_s^(t)w ≤ d_s^(t)+, and the constraint set becomes d^(t+1)− ≤ A^(t+1)w ≤ d^(t+1)+. The new single constraint d_s^(t+1)− ≤ a_s^(t+1)w ≤ d_s^(t+1)+ is taken as any constraint from (2.20) not satisfied by w^(t), and the next iteration begins.
I-LP can be sufficiently demonstrated by working an example from the Cars93 data set [19]. This data set consists of information on vehicle sales in the United States for the 1993 model year.

Example 4 A sample extracted from Cars93 uses the mean retail price (response) and horsepower (regressor) variables for all vehicle models sold by Ford Motor Company in 1993,

y = (7.4, 10.1, 11.3, 15.9, 19.9, 14, 20.2, 20.9)^T

X = [ 1   1   1   1   1   1   1   1
      63  127 96  105 145 115 140 190 ]^T

where n = 8 and p = 2. For t = 1, the relaxed problem is

max_{w∈[−1,1]^n} y^T w

subject to

−1_n ≤ I_n w ≤ 1_n
0 ≤ 1_n^T w ≤ 0,

where A^(1) = I_n, a_s^(1) = 1_n^T, d^(1)− = −1_n, d^(1)+ = 1_n, and d_s^(1)− = d_s^(1)+ = 0. The optimal solution to max{y^T w : −1_n ≤ I_n w ≤ 1_n} is obviously w^(0) = 1_n, but a_s^(1)w^(0) = 1_n^T w^(0) = n is positive and does not satisfy 0 ≤ 1_n^T w ≤ 0. Let ∆ = n = 8 and γ = y be the vector of marginal costs. Sorting the elements of γ yields Q = {1, 2, 3, 6, 4, 5, 7, 8}. All elements of a_s^(1)(A^(1))^(−1) = 1_n^T I_n are of the same sign as ∆, so δ_{j_k} = (d^(1)− − A^(1)w^(0))_{j_k} = (−1_n − I_n w^(0))_{j_k} = −2 for all j_k ∈ Q. Notice
that |δ_1(1_n^T I_n)_1 + δ_2(1_n^T I_n)_2 + δ_3(1_n^T I_n)_3 + δ_6(1_n^T I_n)_6| = |−2 − 2 − 2 − 2| = 8, so

j_r = min{ j_k : 1 ≤ k ≤ 8, | Σ_{j_k∈Q} δ_{j_k}(1_n^T I_n)_{j_k} | ≥ 8 } = j_4 = 6

and

θ = [ −8 − Σ_{j_k=j_1}^{j_3} δ_{j_k}(1_n^T I_n)_{j_k} ] / (1_n^T I_n)_{j_4} = (−8 − (−2 − 2 − 2)) / 1 = −2,

which leads to

w^(1) = w^(0) + I_n ( Σ_{j_k=j_1}^{j_3} δ_{j_k} e_{j_k} − 2e_{j_r} ) = 1_n + (−2e_1 − 2e_2 − 2e_3 − 2e_6)
      = (−1, −1, −1, 1, 1, −1, 1, 1)^T.
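This t = 1 pivot can be sketched in pure Python, exploiting the fact that A^(1) = I_n (so the inverse is trivial) and a_s^(1) = 1_n^T; variable names are illustrative:

```python
# First I-LP pivot on the Ford subset of Cars93, with A = I_n and a_s = 1_n^T.
y = [7.4, 10.1, 11.3, 15.9, 19.9, 14, 20.2, 20.9]
n = len(y)
w = [1.0] * n                        # maximizer of y^T w over the box [-1, 1]^n
delta_viol = sum(w)                  # Delta: violation of 0 <= 1^T w <= 0

# Marginal costs (2.23): gamma_j = y_j * sgn(Delta), since a_s (A)^{-1} = 1^T.
gamma = [yj * (1 if delta_viol > 0 else -1) for yj in y]
Q = sorted(range(n), key=lambda j: gamma[j])   # indices sorted by cost

# Walk the sorted indices, moving each w_j to its closer bound (step -2),
# until the accumulated change covers |Delta| (2.26); the last index is j_r,
# which takes the partial step theta from (2.27).
covered, steps = 0.0, {}
for j in Q:
    need = abs(delta_viol) - abs(covered)
    step = -2.0                      # distance to the lower bound -1
    if abs(step) >= need:
        steps[j] = -need if delta_viol > 0 else need
        break
    steps[j] = step
    covered += step

for j, s in steps.items():
    w[j] += s

# w is now (-1, -1, -1, 1, 1, -1, 1, 1)^T, matching the example.
```

The four cheapest indices (the cheapest responses to change) absorb the entire violation, reproducing w^(1) above.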
Since X_1^T w^(1) = 179 ≠ 0, the 6th constraint −1 ≤ w_6 ≤ 1 is replaced by 0 ≤ 1_n^T w ≤ 0. For t = 2,

A^(2) = [ 1 0 0 0 0 0 0 0
          0 1 0 0 0 0 0 0
          0 0 1 0 0 0 0 0
          0 0 0 1 0 0 0 0
          0 0 0 0 1 0 0 0
          1 1 1 1 1 1 1 1
          0 0 0 0 0 0 1 0
          0 0 0 0 0 0 0 1 ],

a_s^(2) = X_1^T,
d^(2)− = −1_n + e_6,
d^(2)+ = 1_n − e_6,
d_s^(2)− = d_s^(2)+ = 0,
∆ = 179,
γ = (0.1269, −0.3250, 0.1421, −0.1900, 0.1967, 0.1217, 0.2480, 0.0920)^T,
Q = {8, 6, 1, 3, 5, 7},
δ = (δ_8, δ_6, δ_1, δ_3, δ_5, δ_7)^T = (−2, 0, 2, 2, −2, −2)^T,

j_r = min{ j_k : 1 ≤ k ≤ 6, | Σ_{j_k∈Q} δ_{j_k} (X_1^T (A^(2))^(−1))_{j_k} | ≥ 179 } = 1,

θ = [ −179 − Σ_{j_k=j_1}^{j_2} δ_{j_k} (X_1^T (A^(2))^(−1))_{j_k} ] / (X_1^T (A^(2))^(−1))_{j_3} = 0.2788,

w^(2) = w^(1) + (A^(2))^(−1) ( Σ_{j_k=j_1}^{j_2} δ_{j_k} e_{j_k} + θ e_{j_3} )
      = (−0.7212, −1, −1, 1, 1, 0.7212, 1, −1)^T.
Since X^T w^(2) = 0 and w^(2) ∈ [−1, 1]^8, the vector is optimal to (2.20) with z = y^T w^(2) = 18.4596.

Robers and Ben-Israel [56] claim a computational efficiency advantage over the simplex method when applied to the l1-approximation problem. The algorithm is shown to extend easily to any interval [d−, d+] in [57], and numerical results to support the efficiency claim in [56] are also given. Since I-LP was developed to estimate only the conditional median, a major component of this research involves extending I-LP to the entire class of QRMEPs. This extension is presented in Chapter 3.
2.2.4 Dual Simplex Method for Bounded Variables. The duality properties of the QRMEP prevent the direct implementation of the simplex method to (1.8). To be more specific, a primal simplex algorithm cannot be used because it requires pivoting from one primal basic feasible solution to another until dual feasibility is satisfied. In the case of the QRMEP, (1.8) would have to be treated as the primal LP, and the goal would be to satisfy primal feasibility. It has been previously established, however, that any basis of size p satisfies primal feasibility for the QRMEP. Therefore, implementing an algorithm which proceeds from (1.8) requires a dual approach, namely the dual simplex method.

The dual simplex algorithm pivots from one primal feasible solution to another, as does the Koenker-d'Orey algorithm, while using dual space properties to update the solution at each iteration, so the initial basis need not be dual feasible [4]. Unlike the Koenker-d'Orey algorithm, however, the dual simplex method pivots to an adjacent vertex. The previous discussion on bounded simplex also established that any nonbasic variable in (1.8) is fixed at one of its bounds, which is additional confirmation that the dual simplex method is an appropriate implementation [39].
The standard dual simplex method for bounded variables is designed to solve problems whose general form is max_x {c^T x : Ax = b, x ∈ [l, u]^n}, where c and x are n-vectors, A is (m × n) with full row rank, b is an m-vector, and l, u ∈ R are the lower and upper bounds, respectively. Clearly, (1.8) adheres to this general form, as m = p, c = y, x = w, A = X^T, b = 0_p, l = (q − 1), and u = q. In 2002, Kostina [39] presented the steps of what can be called the short-step dual simplex method, the details of which are reproduced here in the context of the QRMEP. The algorithm is initialized by selecting from the design matrix a starting basis, denoted by the (p × p) matrix B, which is not necessarily dual feasible [4]. In fact, the duality properties of the QRMEP all but guarantee that the initial basis will be dual infeasible, unless the optimal basis is chosen by happenstance. Once a starting basis is selected, the exact-fit property is used to estimate the model parameters. That is, solve the equation Bb = y_b for b. Let N be the ((n − p) × p) matrix of nonbasic observations. The following steps constitute a single iteration of the short-step dual simplex method.
1. Compute the n-vector of reduced costs (i.e., the residual vector) as r = y − Xb. Let the triplet λ = (b, u, v) denote a feasible solution to (1.4), and partition the residual vector as r = (u; v) such that u_j = r_j, v_j = −r_j, and r = u − v. Ensure the numbers of nonzero elements in u and v satisfy the cardinality range property. Define search directions ∆r, ∆b and a step length σ ≥ 0 such that the residual vector for the next iteration takes the form r(σ) = r + σ∆r = y − Xb(σ), where b(σ) = b + σ∆b. Therefore,

r + σ∆r = y − X(b + σ∆b)
y − Xb + σ∆r = y − Xb − σX∆b
∆r = −X∆b.

2. Partition the dual solution vector as w = (w_b; w_r), where w_b denotes the vector of basic variables, and w_r denotes the vector of nonbasic variables corresponding to nonzero residuals. Fix the elements of w_r such that

w_r^(j) = { q − 1, if v_j > 0,
          { q,     if u_j > 0.

Compute the basic variables as w_b = −(B^T)^(−1) N^T w_r.

3. If w_b ∈ [q − 1, q]^p, then the current basis is optimal, and the algorithm terminates. Otherwise, one of the following inequalities holds for at least one i_k ∈ w_b: w_b^(i_k) < (q − 1) or w_b^(i_k) > q. Select the i_k-th variable to leave the basis, where k is the index of the most infeasible element of w_b. That is, for 1 ≤ k ≤ p,

k = max_{i_k} { q − 1 − w_b^(i_k), w_b^(i_k) − q }.

4. Solve Bt = e_k for t, where e_k denotes a p-vector of zeros with the kth element at unity. Compute ∆b = ∆r_{i_k} t, where

∆r_{i_k} = { −1, if w_b^(i_k) > q,
           { 1,  if w_b^(i_k) < (q − 1),

and compute ∆r = −X∆b = −∆r_{i_k} Xt.

5. Find the blocking variable by computing step lengths for each j ∈ N, where ∆r_j ≠ 0. Let σ_j be the step length for the jth nonbasic variable such that

σ_j = { −r_j/∆r_j, if r_j∆r_j < 0,
      { ∞, otherwise.

Select the minimum step length according to σ_h = min{σ_j}, where h is the index of the nonbasic variable chosen to enter the basis. If σ_h = ∞, then the problem is infeasible.

6. Update the basis and model parameters. That is,

b = b + σ_h∆b,
B ← (B \ x_k) ∪ x_h,
N ← (N \ x_h) ∪ x_k,

and return to Step 1 to begin the next iteration.
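The ratio test of Step 5 can be sketched in Python; a minimal sketch, assuming the nonbasic residuals r_j and direction components ∆r_j are given as plain lists (names are illustrative):

```python
import math

def blocking_variable(r, dr):
    """Dual simplex ratio test (Step 5): sigma_j = -r_j / dr_j where
    r_j * dr_j < 0, else infinity; returns (h, sigma_h)."""
    best_h, best_sigma = None, math.inf
    for j, (rj, drj) in enumerate(zip(r, dr)):
        if drj == 0 or rj * drj >= 0:
            continue  # residual moves away from zero: no blocking here
        sigma = -rj / drj
        if sigma < best_sigma:
            best_h, best_sigma = j, sigma
    return best_h, best_sigma  # sigma_h == inf signals infeasibility

h, sigma = blocking_variable([1.5, -0.5, 2.0], [-1.0, -2.0, 0.5])
```

Only residuals moving toward zero generate finite step lengths, so an all-infinite result flags the infeasible case described in Step 5.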
In 2002, Kostina [39] developed a variant of the dual simplex algorithm for a
general maximization problem with bounded variables. This variant modifies Step 5
of the dual simplex method to allow for taking longer dual steps. It can therefore be
called the long-step dual simplex (LSDS) method, and this research extends LSDS
to the class of QRMEPs. This extension is also presented in Chapter 3.
2.3 Interior-Point Methods

Interior-point algorithms are the most commonly used methods for estimating quantile regression models in practice, mainly because of the efficiency advantage they offer over simplex-based methods for moderate and large problems. The general procedure of any interior-point method involves starting from an initial feasible solution, which must lie strictly in the interior of the feasible region, and moving in some direction which improves the objective function value [61]. This action of taking an improving step is repeated until some stopping criterion is satisfied. Two types of interior-point methods, both of which have been successfully adapted to the QRMEP, are reviewed here: affine scaling and log-barrier methods. The affine scaling method is presented first under a general maximization problem with equality constraints, followed by the Koenker-Park [35] adaptation of this method to the QRMEP. This section concludes by discussing two types of log-barrier methods: the primal and primal-dual path following algorithms.
2.3.1 Affine Scaling. Vanderbei [61] presents the affine scaling algorithm as a method of solving the maximization problem,

max_{g≥0_n} c^T g   (2.28)

subject to

Ag = d,

where c and g are each (n × 1) vectors, d is an (m × 1) vector, and A is an (m × n) matrix. Orthogonal gradient projection is employed as an appropriate ascent direction, but g must be scaled such that a step in the steepest ascent direction does not cross the bounding hyperplanes of the feasible region and thereby violate feasibility. The n-vector g can be represented equivalently as G1_n, where G = diag(g), so the transformation

g̃ = G^(−1)g = 1_n,   g = Gg̃   (2.29)

is applied by Vanderbei [61] as well as Koenker and Park [35]. After substituting (2.29) into (2.28), the result is an equivalent LP

max_{g̃>0_n} c̃^T g̃   (2.30)

subject to

Ãg̃ = d,

where Ã = AG and c̃ = Gc.

Let g̃^(0) denote the initial feasible solution to (2.30). Because of the variable transformation from (2.29), Ãg̃^(0) = d and 0 < g̃^(0) = 1_n, so g̃^(0) is an interior solution that lies strictly within the bounds of the polytope defined by Ãg̃ = d and g̃ ≥ 0. This initial solution must now be moved towards optimality by stepping in some direction ∆g̃ such that g̃^(1) = g̃^(0) + α∆g̃ is also a feasible interior solution to (2.30), and c̃^T g̃^(1) > c̃^T g̃^(0). If g̃^(1) is feasible, then

Ãg̃^(1) = d
Ã(g̃^(0) + α∆g̃) = d

must hold. Since Ãg̃^(0) = d, then g̃^(1) cannot satisfy primal feasibility unless αÃ∆g̃ = 0_m, implying that the search direction must lie in the null space of Ã for any improving step to remain feasible. Finding the search direction is the first priority, so the step size α can be ignored for now. The steepest ascent direction is of course nonzero, so an additional constraint must be imposed on (2.30) such that both the current and improved solutions strictly satisfy primal feasibility. This is easily achieved by imposing the unit length requirement on the steepest ascent direction,

max_{∆g̃∈R^n} c̃^T (g̃^(0) + ∆g̃)   (2.31)

subject to

Ã(g̃^(0) + ∆g̃) = d
‖∆g̃‖ = 1,
where ‖∆g̃‖ represents the Euclidean norm of ∆g̃. Introducing h and δ as Lagrange multipliers leads to the Lagrangian and first-order conditions

L(∆g̃, h, δ) = c̃^T (g̃^(0) + ∆g̃) + h^T (d − Ã(g̃^(0) + ∆g̃)) + δ(1 − ∆g̃^T ∆g̃)
            = (c̃^T − h^T Ã − δ∆g̃^T)∆g̃ + (c̃^T − h^T Ã)g̃^(0) + h^T d + δ,

∂L/∂∆g̃ = c̃ − Ã^T h − 2δ∆g̃ = 0_n,   (2.32)
∂L/∂h = d − Ã(g̃^(0) + ∆g̃) = 0_m,   (2.33)
∂L/∂δ = 1 − ∆g̃^T ∆g̃ = 0.   (2.34)

The equations (2.33) and (2.34) are clearly the primal feasibility conditions, while (2.32) is the dual feasibility condition. Dual feasibility implies primal optimality [4], so if it is assumed that δ = 1/2, then the dual feasibility condition reduces to ∆g̃ = c̃ − Ã^T h. Applying this substitution to ∂L/∂h, while letting r = c − A^T h and recalling that Ãg̃^(0) = d, leads to

d − Ã(g̃^(0) + c̃ − Ã^T h) = 0_m   (2.35)
Ãg̃^(0) + Ãc̃ − ÃÃ^T h = d
Ãc̃ − ÃÃ^T h = 0_m
h = (ÃÃ^T)^(−1) Ãc̃
  = (AG²A^T)^(−1) AG²c
and

∆g̃ = c̃ − Ã^T h   (2.36)
    = G(c − A^T h)
    = Gr
    = (I_n − GA^T (AG²A^T)^(−1) AG)Gc
    = PGc
    = c̃_P.

Let P = I_n − GA^T (AG²A^T)^(−1) AG be the matrix which projects c̃ = Gc onto the null space of Ã = AG. In other words, if the null space of AG is defined by N(AG) = {∆g̃ ∈ R^n : AG∆g̃ = 0_m}, then ∆g̃ is the orthogonal projection of Gc onto N(AG) [61]. Because of this attribute, the affine scaling method can also be called a gradient projection method [7]. The step size can now be obtained, and when coupled with the ascent direction from (2.36), the two can be used to compute the new solution vector g^(1).

Gradient projection moves the new solution towards the feasible region boundary. The optimal solution vector to an LP of the same form as (2.28), assuming nondegeneracy, consists of a set of m basic variables which are strictly positive and n − m nonbasic variables which are exactly zero [4]. If a new solution moves in the projected gradient direction until it reaches a bounding hyperplane, then g_j + α∆g_j = 0 for each nonbasic g_j, and the step length is chosen as the smallest multiplier which satisfies this property, or

α = (max_j {−∆g_j / g_j})^(−1)   (2.37)
  = (max_j {−e_j^T G²r / e_j^T g})^(−1).
Since the new solution must be strictly feasible, the step length is further reduced by some θ ∈ (0, 1) such that g^(1) does not reach a bounding hyperplane, so

α = θ (max_j {−e_j^T G²r / e_j^T g})^(−1).   (2.38)

The choice of θ is also known to have an effect on the convergence of the affine scaling algorithm [61]. Specifically, Hall and Vanderbei [27] confirm that convergence of affine scaling is assured as long as the step size is scaled by no more than θ = 2/3. This is a significant result because it is proven to hold even when nondegeneracy cannot be assumed. However, if the true optimal solution is nondegenerate, then affine scaling is guaranteed to converge to optimality for any 0 < θ < 1.

The weak duality property supplies a natural stopping criterion. The solution to the primal maximization problem is bounded above by the dual solution [4],

c^T g ≤ d^T h,

and the two are equal at optimality. It follows that as the estimate in each iteration approaches the true optimal solution, the duality gap (i.e., the absolute difference between the dual and primal solutions) decreases, approaching zero in the limit [60]. That is,

lim_{k→∞} (d^T h^(k) − c^T g^(k)) = 0   (2.39)

where g^(k) and h^(k) are the respective kth-iteration estimates of the primal and dual solutions. Therefore, for some small positive tolerance ξ, the algorithm terminates when (d^T h^(k) − c^T g^(k)) ≤ ξ [35].
Affine scaling can be stated simply by the following steps:

1. Choose a suitable θ ∈ (0, 1), a tolerance ξ > 0, and apply the variable transformation from (2.29).

2. Compute the projected gradient according to (2.36), which also contains formulas for the dual and shadow price vectors.

3. Use (2.38) to obtain the step size and compute the new solution, g^(k+1) = g^(k) + αGr^(k).

4. If (d^T h^(k+1) − c^T g^(k+1)) ≤ ξ, then STOP. Otherwise, return to Step 2.
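These steps can be sketched on a toy instance in pure Python. The sketch below solves max c^T g subject to 1^T g = 1, g ≥ 0 (m = 1, so AG²A^T is a scalar and no matrix factorization is needed). It takes the ascent step in the original variables as g + αG²r, the orthogonal-projection direction implied by (2.36) and (2.37); that reading of Steps 2 and 3 is an assumption, and all names are illustrative:

```python
def affine_scaling(c, theta=2/3, iters=200):
    """Affine scaling for max c^T g s.t. sum(g) = 1, g >= 0 (m = 1)."""
    n = len(c)
    g = [1.0 / n] * n                                  # strictly interior start
    for _ in range(iters):
        denom = sum(gj * gj for gj in g)               # A G^2 A^T (scalar here)
        h = sum(gj * gj * cj for gj, cj in zip(g, c)) / denom   # dual estimate
        r = [cj - h for cj in c]                       # reduced costs c - A^T h
        slope = max(-gj * rj for gj, rj in zip(g, r))  # max_j(-e_j G^2 r / e_j g)
        if slope <= 0:
            break                                      # already optimal
        alpha = theta / slope                          # step length (2.38)
        g = [gj + alpha * gj * gj * rj for gj, rj in zip(g, r)]
    return g

g = affine_scaling([1.0, 2.0, 4.0])
# the iterates stay feasible (sum(g) == 1) and mass concentrates on the
# best coordinate, approaching the vertex (0, 0, 1)
```

Because the dual estimate h makes A(G²r) = 0 by construction, every iterate satisfies the equality constraint exactly, and θ = 2/3 matches the convergence guarantee cited from Hall and Vanderbei [27].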
Vanderbei, Meketon, and Freedman [60] identify affine scaling as an efficient alternative to Karmarkar's algorithm. Koenker and Park [35] adapt affine scaling to quantile regression, since (1.8) is nearly equivalent in form to (2.28). As before, start by applying the transformation

w̃ = W^(−1)w,   w = Ww̃.   (2.40)

Unlike (2.28), the dual vector from (1.8) is bounded above and below, so W ≠ diag(w). Rather, w is centered [35] relative to the bounds on w. That is, each diagonal element of W is determined by the boundary to which it is closest, or

W = diag( min{ 1 − q + w_j^(k), q − w_j^(k) } )   (2.41)

where w^(k) is the dual vector estimate for the kth iteration. This transformation, like (2.30), yields X̃ = WX and ỹ = Wy. Notice that the dual feasible region of (1.8) defines N(X^T) (the null space of X^T), and exactly p residuals are zero (assuming nondegeneracy), so every transposed design matrix has rank p (full row rank). Therefore, under transformation, (WX)^T = X^T W also has full row rank, and the orthogonal projection of Wy onto N(X^T W) (i.e., the steepest ascent direction)
is [60],

∆w̃ = (I_n − X̃(X̃^T X̃)^(−1) X̃^T) ỹ   (2.42)
    = (I_n − WX(X^T W²X)^(−1) X^T W)Wy
    = W(y − X(X^T W²X)^(−1) X^T W²y)
    = W(y − Xb)
    = Wr^(k)

where b = (X^T W²X)^(−1) X^T W²y is the p-vector of estimated model coefficients, and r^(k) is the residual vector estimate. Because the equation for b looks nearly identical to the solution to the OLS normal equations, Koenker and Park describe this application of affine scaling as an IRLS method [35]. To determine the step size, and not to be confused with the α from (2.38), let

α = max_{1≤j≤n} { max{ −e_j^T W²r^(k) / (1 − q + w_j^(k)), e_j^T W²r^(k) / (q − w_j^(k)) } }   (2.43)

and choose some η ∈ (0, 1) such that the dual vector estimate for iteration (k + 1) is

w^(k+1) = w^(k) + (η/α)W²r^(k).   (2.44)

Once the duality gap ( (1 − q) Σ_{j: r_j^(k)<0} |r_j^(k)| + q Σ_{j: r_j^(k)>0} |r_j^(k)| ) − y^T w^(k) is sufficiently small, or less than some defined tolerance ξ, then the current dual vector is optimal.
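One iteration's ingredients can be sketched in pure Python for an illustrative p = 1 model with no intercept; this is a deliberately minimal setting (so the weighted least-squares solve is a scalar division), not the general algorithm, and all names are illustrative:

```python
def centered_weights(w, q):
    """Diagonal of W from (2.41): distance of each w_j to its nearer bound."""
    return [min(1 - q + wj, q - wj) for wj in w]

def irls_step(x, y, w, q, eta=0.95):
    """One affine-scaling (IRLS) update of the dual vector for p = 1."""
    W = centered_weights(w, q)
    # weighted least-squares coefficient b = (x^T W^2 x)^{-1} x^T W^2 y
    b = sum(Wj**2 * xj * yj for Wj, xj, yj in zip(W, x, y)) \
        / sum(Wj**2 * xj * xj for Wj, xj in zip(W, x))
    r = [yj - xj * b for xj, yj in zip(x, y)]          # residual estimate
    # step size (2.43): keeps the update strictly inside (q - 1, q)
    alpha = max(max(-Wj**2 * rj / (1 - q + wj), Wj**2 * rj / (q - wj))
                for Wj, rj, wj in zip(W, r, w))
    return [wj + (eta / alpha) * Wj**2 * rj
            for wj, Wj, rj in zip(w, W, r)]            # update (2.44)

w_new = irls_step([1.0, 2.0, 3.0], [1.1, 1.9, 3.2], [0.0, 0.0, 0.0], q=0.5)
```

The normalization by α means the coordinate closest to a bound moves exactly the fraction η of the way there, so the new dual vector remains strictly feasible.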
Koenker and Park [35] focus only on the median case, so this algorithm was tested for q = 1/4, 3/4, 1/20, 19/20 and compared to the exact results from the bounded LP. Each test was performed on the same subset of Cars93 as that used to test the I-LP method. Slightly more iterations were required to solve for the quartiles than the quintiles, yet convergence to a unique optimal solution was achieved for each test. This problem has a nondegenerate optimal solution, so η was adjusted in order to reduce the required number of iterations and speed up convergence. Beginning at η = 1/20, and continuing in increments of 1/20, the number of iterations was recorded for each run of the affine scaling algorithm. Just as expected from [27], η = 19/20 resulted in the fewest iterations. η was then incremented by 1/100, but this produced no further reduction in iterations.
2.3.2 Log-Barrier Methods. Barrier function methods can be developed for many different LP structures, but they are constructed here to solve (2.28). Barrier function methods are so named because they, like penalty function methods, help transform constrained optimization problems into sequences of unconstrained optimization problems by adding a weighted function to the objective such that only strictly feasible solutions are generated. The barrier function is chosen such that as a solution approaches the boundary of the feasible region, the barrier function approaches infinity, resulting in no objective function improvement as the solution approaches the polytope boundary [5]. Several types of functions satisfy this property, but the most commonly used is the natural logarithm. Interior-point methods using the natural logarithm as a barrier function are therefore called logarithmic barrier (log-barrier) methods [24].

The polytope boundaries in (2.28) are defined by the nonnegativity constraints g ≥ 0, so applying the log-barrier function leads to the parametric form of (2.28)

max_{g≥0_n, µ>0} B(g, µ) = c^T g + µ Σ_{j=1}^n ln g_j   (2.45)

subject to

Ag = d,

where µ > 0. Because lim_{g_j→0+} ln g_j = −∞ for all j, the log-barrier function clearly penalizes the objective function as the solution estimate approaches the polytope boundaries. It follows then that any solution vector g to (2.45) lies strictly in the
interior S of the feasible region, where S = {g : Ag = d, g > 0}. Assume that an optimal solution to (2.28) exists and denote it by g*. Assume also that an optimal solution to (2.45) exists and denote it by g(µ). It follows that when µ is sufficiently small, the objective function approaches its optimal value [8]. In other words,

lim_{µ→0} ( c^T g(µ) + µ Σ_{j=1}^n ln g_j(µ) ) = c^T g*.
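The penalty behavior is easy to see numerically. The small sketch below evaluates the barrier objective B(g, µ) = c^T g + µ Σ ln g_j as one coordinate slides toward the boundary; the vectors and µ are toy values, illustrative only:

```python
import math

def barrier_objective(c, g, mu):
    """Log-barrier objective (2.45): c^T g + mu * sum(ln g_j)."""
    return (sum(cj * gj for cj, gj in zip(c, g))
            + mu * sum(math.log(gj) for gj in g))

c, mu = [1.0, 2.0], 0.1
interior = barrier_objective(c, [0.5, 0.5], mu)
near_edge = barrier_objective(c, [1e-9, 0.999999999], mu)
# near_edge < interior: the log term drives the objective down without
# bound as g_1 -> 0+, even though the linear term c^T g has increased
```

This is exactly the mechanism that keeps every iterate strictly inside the polytope.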
The KKT conditions for (2.45) are obtained by starting with the Lagrangian

L(g, h, µ) = c^T g + µ Σ_{j=1}^n ln g_j + h^T (d − Ag)
           = (c^T − h^T A)g + µ Σ_{j=1}^n ln g_j + h^T d.

Taking partial derivatives results in the primal feasibility and dual feasibility conditions [61], respectively

∂L/∂h = d − Ag = 0_m
∂L/∂g = c − A^T h + µG^(−1)1_n = 0_n.

The complementarity condition [8], which corresponds to the complementary slackness condition when µ = 0, is defined by the substitution s = µG^(−1)1_n, leading to the KKT conditions

Ag = d   (2.46)
A^T h − s = c
GS1_n = µ1_n,

where S = diag(s).
2.3.2.1 Primal Path Following Algorithm. Since B(g, µ) is neither linear nor quadratic, the ascent direction is obtained via Newton's method. The objective function in the Newton problem is the second-order Taylor series expansion (quadratic approximation) of B(g, µ), whose gradient and Hessian take the forms

∇B(g, µ) = c + µG^(−1)1_n
∇²B(g, µ) = −µG^(−2),

respectively. The Newton problem therefore takes the form

max_{φ∈R^n, µ>0} c^T φ + µ1_n^T G^(−1)φ − (µ/2)φ^T G^(−2)φ   (2.47)

subject to

Aφ = 0_m,

where φ denotes the ascent direction. The first-order conditions are obtained by applying again the Lagrange multiplier method, which yields

L(φ, h) = c^T φ + µ1_n^T G^(−1)φ − (µ/2)φ^T G^(−2)φ − h^T Aφ

c + µG^(−1)1_n − µG^(−2)φ = A^T h
Aφ = 0_m.

Although A^T is not invertible, A is assumed to have full row rank, so multiplying the dual feasibility condition through by AG² produces an invertible AG²A^T term on the right-hand side of the first condition, making it possible to solve for the dual solution vector h directly [55]:

h = (AG²A^T)^(−1) AG(Gc + µ1_n).   (2.48)
The result for h is substituted back into the dual feasibility condition such that the ascent direction φ can be obtained,

φ = (I_n − G²A^T (AG²A^T)^(−1) A)(G1_n + (1/µ)G²c)   (2.49)
  = GP1_n + (1/µ)GPGc
  = G(P1_n + (1/µ)c̃_P)

and the new solution estimate for the next iteration is computed as g^(k+1) = g^(k) + φ^(k), where g^(k) and φ^(k) denote the solution estimate and ascent direction, respectively, for the kth iteration.
As µ varies, the solutions to (2.45) form the central path through the polytope, so this type of log-barrier method can be called a path following algorithm. Bertsimas and Tsitsiklis [8] present (2.28) as a primal minimization problem, so the resulting log-barrier method is called a primal path following algorithm. For the QRMEP, however, the translated dual (2.10) is used when applying the path following algorithm, making it instead a dual path following algorithm. It is therefore necessary to present the dual LP of (2.28),

min_{h∈R^m, s≥0_n} d^T h   (2.50)

subject to

A^T h − s = c,

which is of the same form as the QRMEP primal. The following steps summarize the dual path following algorithm:

1. Let k = 0, select an initial solution which strictly satisfies primal feasibility and dual feasibility (i.e., g^(0) > 0_n and s^(0) > 0_n), choose some α ∈ (0, 1), set the tolerance ξ > 0, and set the barrier parameter µ > 0.
2. If (s^(k))^T g^(k) < ξ, then STOP. Otherwise, proceed to Step 3.

3. Use (2.48) and (2.49) to compute the primal solution vector h and ascent direction φ, respectively.

4. Update the dual solution and primal slack vectors, respectively, as follows:

g^(k+1) = g^(k) + φ^(k)
s^(k+1) = A^T h − c.

5. Let µ^(k+1) = αµ^(k), and return to Step 2.
Portnoy and Koenker [55] apply the dual path following algorithm to the QRMEP by first eliminating the upper bound on the translated dual vector a in (2.10) via the substitution a + s = 1_n,

max_{a,s≥0_n} y^T a   (2.51)

subject to

X^T a = (1 − q)X^T 1_n
a + s = 1_n,

which puts (2.10) in the same form as that of (2.28). The log-barrier function becomes B(a, s, µ) = y^T a + µ Σ_{j=1}^n ln(a_j s_j), so its gradient and Hessian with respect to a, writing A = diag(a) and S = diag(s), are respectively

∇B(a, s, µ) = y + µ(A^(−1) − S^(−1))1_n
∇²B(a, s, µ) = −µ(A^(−2) + S^(−2)).
The Newton step φ maximizes the quadratic approximation of B(a, s, µ),

    max_{φ ∈ R^n}  y^T φ + µ φ^T (A^{-1} − S^{-1}) 1_n − (µ/2) φ^T (A^{-2} + S^{-2}) φ    (2.52)

    subject to  X^T φ = 0_p.
If h = b, G^{-1} = (A^{-1} − S^{-1}), and G^{-2} = (A^{-2} + S^{-2}), then the Newton direction
φ satisfies

    y + µ (A^{-1} − S^{-1}) 1_n − µ (A^{-2} + S^{-2}) φ = X b
    X^T φ = 0_p

    b = (X^T G^2 X)^{-1} X^T (G^2 y + µ G 1_n)    (2.53)
      = (X^T (A^{-2} + S^{-2})^{-1} X)^{-1} X^T ((A^{-2} + S^{-2})^{-1} y + µ (A^{-1} − S^{-1})^{-1} 1_n)

    φ = (I_n − G^2 X (X^T G^2 X)^{-1} X^T G^2) ((A^{-1} − S^{-1}) 1_n + (1/µ) y)    (2.54)

where b ∈ R^p is the vector of model parameters.
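As a numerical sanity check on (2.53), b and φ can be computed directly with NumPy. The sketch below is an illustration only, not Portnoy and Koenker's implementation: the data, problem sizes, and the interior iterate a are arbitrary assumptions. It recovers φ from the stationarity condition displayed above and verifies that φ lies in the null space of X^T, as the constraint in (2.52) requires.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, mu = 12, 3, 0.5

X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # design with intercept
y = rng.normal(size=n)
a = rng.uniform(0.1, 0.9, size=n)       # assumed interior iterate: 0 < a < 1
s = 1.0 - a                             # slack from a + s = 1_n

# Diagonals of the matrices used above: G^{-1} = A^{-1} - S^{-1}, G^{-2} = A^{-2} + S^{-2}
Ginv1 = 1.0 / a - 1.0 / s               # the vector (A^{-1} - S^{-1}) 1_n
G2 = 1.0 / (1.0 / a**2 + 1.0 / s**2)    # diagonal of (A^{-2} + S^{-2})^{-1} = G^2

# (2.53): b = (X^T G^2 X)^{-1} X^T (G^2 y + mu G 1_n), using G 1_n = G^2 (A^{-1} - S^{-1}) 1_n
b = np.linalg.solve(X.T @ (G2[:, None] * X), X.T @ (G2 * y + mu * G2 * Ginv1))

# Newton direction from y + mu (A^{-1} - S^{-1}) 1_n - mu (A^{-2} + S^{-2}) phi = X b
phi = G2 * (y + mu * Ginv1 - X @ b) / mu

print(np.abs(X.T @ phi).max())          # ~0: phi satisfies X^T phi = 0_p
```

Substituting b back into the stationarity condition makes X^T φ vanish identically, so the printed residual is zero up to floating-point rounding.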
2.3.2.2 Primal-Dual Path Following Algorithm. The primal-dual
variant also approximates the central path through the polytope. It differs from the
primal(dual) path following algorithm by applying Newton’s method to the KKT
system of (2.46) rather than to the second-order Taylor series expansion described
in (2.47). Because GS1n = µ1n is nonlinear, the algorithm obtains search direc-
tions in both the primal and dual spaces by applying Newton’s method for solving a
nonlinear system of equations. Despite being computationally more complex than
affine scaling, the primal-dual path following IPM performs very well on large prob-
lems. According to Bertsimas and Tsitsiklis [8], it is the preferred algorithm for
commercial solvers implementing interior-point methods.
If the (2n + m) × 1 vector t = (g, h, s) represents the solution to the KKT
conditions in (2.46), and

    F(t) = ( A g − d,  A^T h − s − c,  G S 1_n − µ 1_n )

represents the KKT system (as a stacked vector), then the objective is to find t such that F(t) = 0_{2n+m}.
Start by constructing an approximation of F(t). The first-order Taylor series expansion
is F(t + φ) ≈ F(t) + J(t) φ, where φ is the Newton direction and J(t) is
the (2n + m) × (2n + m) Jacobian matrix

    J(t) = [ A    0     0   ]
           [ 0    A^T  −I_n ]
           [ S    0     G   ].
Letting φ = (φ_g, φ_h, φ_s), the Newton direction is obtained by solving J(t) φ = −F(t),
or

    A φ_g = d − A g    (2.55)
    A^T φ_h − φ_s = c − A^T h + s
                  = c − A^T h + µ G^{-1} 1_n
    S φ_g + G φ_s = µ 1_n − G S 1_n.
Solving (2.55) for the Newton directions yields

    φ_g = G S^{-1} (µ G^{-1} 1_n + c − A^T φ_h − A^T h)
    φ_h = (A G S^{-1} A^T)^{-1} (A G S^{-1} (c + µ G^{-1} 1_n − A^T h) + A g − d)
    φ_s = A^T φ_h + A^T h − µ G^{-1} 1_n − c.
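A minimal sketch of one primal-dual Newton solve, under assumed small random data for A, c, d and strictly positive iterates g and s: it assembles J(t) and F(t) exactly as displayed above and solves J(t)φ = −F(t) with a dense solver (a real implementation would instead exploit the block structure, as the closed-form directions above do).

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, mu = 3, 7, 0.1

A = rng.normal(size=(m, n))              # constraint matrix of the primal LP (2.28)
c = rng.normal(size=n)
d = rng.normal(size=m)
g = rng.uniform(0.5, 1.5, size=n)        # strictly positive primal iterate
h = rng.normal(size=m)                   # dual iterate
s = rng.uniform(0.5, 1.5, size=n)        # strictly positive dual slack

G, S, I = np.diag(g), np.diag(s), np.eye(n)
J = np.block([
    [A,                np.zeros((m, m)), np.zeros((m, n))],
    [np.zeros((n, n)), A.T,              -I              ],
    [S,                np.zeros((n, m)), G               ],
])
F = np.concatenate([A @ g - d, A.T @ h - s - c, g * s - mu])

phi = np.linalg.solve(J, -F)             # Newton direction (phi_g, phi_h, phi_s)
phi_g, phi_h, phi_s = phi[:n], phi[n:n + m], phi[n + m:]
print(np.allclose(J @ phi, -F))          # True
```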
After obtaining the appropriate search directions, proper step lengths must be
computed. The primal-dual path following algorithm requires two step lengths: one
each for the primal and dual directions, respectively. Let θ_P^(k) denote the primal step
length for the kth iteration and θ_D^(k) denote the dual step length for the kth iteration.
The step lengths are computed using a ratio test similar to that of (2.38),

    θ_P^(k) = σ min_j { −(e_j^T g^(k)) / (e_j^T φ_g^(k)) : e_j^T φ_g^(k) < 0 }
    θ_D^(k) = σ min_j { −(e_j^T s^(k)) / (e_j^T φ_s^(k)) : e_j^T φ_s^(k) < 0 },

where σ ∈ (0, 1), e_j^T φ_g^(k) is the jth element of φ_g^(k), and e_j^T φ_s^(k) is the jth element of
φ_s^(k). The scaling factor σ is usually set very close to 1 in practice so that as large
a step as possible can be taken without reaching the polytope boundary. Lustig,
Marsten, and Shanno [42] use σ = 0.99995, as do Portnoy and Koenker [55].
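The ratio test can be sketched as a generic damped step-length routine. This is an illustration, not Lustig, Marsten, and Shanno's code; the cap of 1.0 when no component blocks the step is a common convention assumed here, not something the text specifies.

```python
import numpy as np

def step_length(x, direction, sigma=0.99995):
    """Damped ratio test: sigma times the largest theta keeping x + theta*direction > 0.
    Returns 1.0 when no component of the direction is negative (assumed convention)."""
    blocking = direction < 0
    if not blocking.any():
        return 1.0
    return sigma * float(np.min(-x[blocking] / direction[blocking]))

g = np.array([1.0, 2.0, 3.0])
phi_g = np.array([-0.5, 1.0, -3.0])
theta_P = step_length(g, phi_g)     # blocking ratios are 2.0 and 1.0 -> sigma * 1.0
print(theta_P)                      # 0.99995
```

With σ = 0.99995 the new iterate g + θ_P φ_g stays strictly positive, stopping just short of the polytope boundary.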
2.4 Finite Smoothing Algorithm
Chen [18] proposed an alternative algorithm to the interior-point method which
applies smoothing to the objective function ρq. Testing and comparisons against
the Barrodale-Roberts and interior-point methods revealed finite smoothing to be
computationally superior to the former. Its significance, however, lies in its performance
relative to the interior-point method. For large-sample
problems where the number of regressors is small, the finite smoothing and interior-
point methods perform similarly. Finite smoothing performs much faster than the
interior-point method when the design matrix contains a large number of regressors.
It has another advantage in that it provides the same accuracy, or exact solution, as
the Barrodale-Roberts algorithm.
For the (100q)th conditional quantile, (1.4) can be approximated by the smooth
Huber function [18]

    ∑_{j=1}^n H_{γ,q}(r_j^(q))    (2.56)

where

    H_{γ,q}(r_j^(q)) =  (q − 1) r_j^(q) − (1/2)(q − 1)² γ    if r_j^(q) ≤ (q − 1) γ
                        (1/(2γ)) (r_j^(q))²                  if (q − 1) γ ≤ r_j^(q) ≤ qγ
                        q r_j^(q) − (1/2) q² γ               if r_j^(q) ≥ qγ    (2.57)
and γ ∈ R+ is a threshold value. Notice that the inequalities in H_{γ,q} define three
subregions whose boundaries are the parallel hyperplanes r^(q) = (q − 1) γ 1_n and
r^(q) = qγ 1_n. Each negative residual satisfies r_j^(q) ≤ (q − 1) γ, and each positive
residual satisfies r_j^(q) ≥ qγ. The basic residuals lie strictly between the parallel
hyperplanes, which is further demonstrated by defining a sign vector ξ such that

    ξ_j =  −1    if r_j^(q) ≤ (q − 1) γ
            0    if (q − 1) γ < r_j^(q) < qγ
            1    if r_j^(q) ≥ qγ    (2.58)
for the jth observation. Define also ω_j = 1 − ξ_j² such that the smoothing function
can be rewritten as

    H_{γ,q}(r_j^(q)) = (1/(2γ)) ω_j (r_j^(q))²
        + ξ_j ( (1/2) r_j^(q) + (1/4)(1 − 2q) γ
        + ξ_j ( r_j^(q) (q − 1/2) − (1/4)(2q² − 2q + 1) γ ) ).
The smoothed objective function is now continuously differentiable, so both its
gradient and Hessian exist. The finite smoothing algorithm is therefore a modified
line search method, where ∑_{j=1}^n H_{γ,q}(r_j^(q)) is minimized for a series of decreasing
γ [18]. As γ approaches zero, the minimizer of (2.56) approaches the true minimizer
of (1.4).
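Both forms of the smoothing function are easy to check numerically. The sketch below implements the piecewise definition (2.57) and the sign-vector rewrite, confirms that they agree, and illustrates that H_{γ,q} approaches the check function ρ_q of (1.4) as γ shrinks; the residual grid and the choice of q are arbitrary assumptions.

```python
import numpy as np

def rho(r, q):
    """Check function rho_q from (1.4): q*r for r >= 0, (q - 1)*r for r < 0."""
    return r * (q - (r < 0))

def huber(r, q, gamma):
    """Piecewise definition (2.57) of the smooth Huber function H_{gamma,q}."""
    return np.where(r <= (q - 1) * gamma, (q - 1) * r - 0.5 * (q - 1) ** 2 * gamma,
           np.where(r >= q * gamma,       q * r - 0.5 * q ** 2 * gamma,
                    r ** 2 / (2 * gamma)))

def huber_sign_form(r, q, gamma):
    """Rewrite of H_{gamma,q} using the sign vector xi and omega = 1 - xi^2."""
    xi = np.where(r <= (q - 1) * gamma, -1, np.where(r >= q * gamma, 1, 0))
    omega = 1 - xi ** 2
    return (omega * r ** 2 / (2 * gamma)
            + xi * (0.5 * r + 0.25 * (1 - 2 * q) * gamma
                    + xi * (r * (q - 0.5) - 0.25 * (2 * q ** 2 - 2 * q + 1) * gamma)))

q, r = 0.25, np.linspace(-3.0, 3.0, 41)
print(np.allclose(huber(r, q, 0.5), huber_sign_form(r, q, 0.5)))      # True
print(np.abs(huber(r, q, 1e-8) - rho(r, q)).max() <= 0.5 * 1e-8)      # True: gap is O(gamma)
```

Outside the band between the two hyperplanes, H differs from ρ_q by exactly (1/2)(q − 1)²γ or (1/2)q²γ, so the approximation error is at most γ/2.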
Chen [18] cautions not to view the finite smoothing algorithm as a complete
replacement for other quantile regression algorithms. The algorithm performs best
with a large number of regressors, which occurs commonly in certain types of studies,
such as those involving survey data. Chen [18] suggests possibly using a different
method when this is not the case, as characteristics of the data could cause the finite
smoothing method, and others, to fail. A data set with a significant number of
outliers may lead to both the interior-point and finite smoothing algorithms failing,
but the stability of the Barrodale-Roberts algorithm guarantees a solution, despite
its sluggish performance on large problems.
2.5 Integer Programming Formulations
It can be shown that the special structures of (1.4), (1.8), and (2.10) are similar
to certain well-solved problems in other areas of linear optimization, specifically
integer programming. Through simple scalar multiplication, the primal and dual
LPs can be put into the forms necessary for implementing the out-of-kilter method
for solving the minimum cost network flow problem (MCNFP) [22]. The properties
of the QRMEP can also be used to reconceptualize the problem and generate new
formulations based on two well-known integer programs: the generalized assignment
problem and the knapsack problem. These alternative formulations are developed
in detail in Chapter 3.
The structures of (1.4) and (2.10) are quite similar to the dual and primal
structures, respectively, of the MCNFP as given in [4]. Fulkerson [22] presents the
MCNFP more generally as
    min_{z ∈ [l, u]^n}  c z    (2.59)

    subject to  A z = d,
where A is m × n. The dual LP of the MCNFP therefore takes the form

    max_{π ∈ R^m, λ ≥ 0_n, µ ≥ 0_n}  π d + λ l − µ u    (2.60)

    subject to  π A + λ − µ = c.
Both (1.8) and (2.10) can be rewritten, respectively, as equivalents to the MCNFP,

    min_{w ∈ [q−1, q]^n}  −y^T w    (2.61)

    subject to  X^T w = 0_p

and

    min_{a ∈ [0, 1]^n}  −y^T a    (2.62)

    subject to  X^T a = (1 − q) X^T 1_n.

Similarities can also be found between (1.4) and (2.60). By simply negating the
objective function, (1.4) assumes the form of (2.60),

    max_{b ∈ R^p, u ≥ 0_n, v ≥ 0_n}  (q − 1) v^T 1_n − q u^T 1_n    (2.63)

    subject to  X b + u − v = y.
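Because (2.62) is an ordinary bounded-variable LP, it can be handed to any LP solver. A sketch using SciPy's linprog on synthetic data (recovering b from the solver's dual multipliers is omitted here): note that a = (1 − q) 1_n is always feasible, and at a nondegenerate vertex optimum at most p of the scores lie strictly between the bounds.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
n, p, q = 40, 2, 0.3

X = np.column_stack([np.ones(n), rng.normal(size=n)])   # design with an intercept
y = X[:, 1] + rng.normal(size=n)                        # synthetic responses

# (2.62): min -y^T a  subject to  X^T a = (1 - q) X^T 1_n,  0 <= a <= 1
res = linprog(c=-y,
              A_eq=X.T, b_eq=(1 - q) * X.T @ np.ones(n),
              bounds=[(0.0, 1.0)] * n, method="highs-ds")  # dual simplex -> vertex solution
a = res.x

interior = int(((a > 1e-8) & (a < 1 - 1e-8)).sum())
print(res.status, interior)   # status 0; at most p scores strictly between the bounds
```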
The optimality conditions Fulkerson presents in [22] for the out-of-kilter algo-
rithm are equivalent to a property of the QRMEP. For a feasible z in (2.59), there
exists a pricing vector π such that

    c_j + a_i π_i > 0  ⟹  z_j = l_j    (2.64)
    c_j + a_i π_i < 0  ⟹  z_j = u_j    (2.65)

for each j, where a_i denotes the ith column of A. With y = −c, it follows that
the dual feasibility conditions for nonbasic variables in the QRMEP are equivalent
to the necessary conditions, (2.64) and (2.65), for the MCNFP, specifically

    y_j − x_j b < 0  ⟹  w_j = q − 1    (2.66)
    y_j − x_j b > 0  ⟹  w_j = q,    (2.67)

where x_j denotes the jth row of the design matrix. By complementary slackness,
each component of these necessary conditions can attain one of three possible levels:
c_j + a_i π_i can be positive, negative, or zero. Similarly, z_j can be greater than, less
than, or equal to one of the bounds.
classifications [22] into which each element of z must fall. This is also true for any
feasible basis in the quantile regression dual, but recall from Section 2.1 that not
every basis which satisfies primal feasibility is necessarily a feasible basis for a given
q. For example, a conditional median estimate in which all residuals are positive
may satisfy primal feasibility, but it is an estimate of a lower conditional quantile
rather than the median. The following table lists the nine classes, each with its
corresponding cases, for both the MCNFP and the equivalent quantile regression
problem.
    Class    MCNFP                                  QR
    α        c_j + a_i π_i > 0,  z_j = l_j          y_j − x_j b < 0,  w_j = q − 1
    β        c_j + a_i π_i = 0,  l_j < z_j < u_j    y_j − x_j b = 0,  q − 1 < w_j < q
    γ        c_j + a_i π_i < 0,  z_j = u_j          y_j − x_j b > 0,  w_j = q
    α1       c_j + a_i π_i > 0,  z_j < l_j          y_j − x_j b < 0,  w_j < q − 1
    β1       c_j + a_i π_i = 0,  z_j < l_j          y_j − x_j b = 0,  w_j < q − 1
    γ1       c_j + a_i π_i < 0,  z_j < u_j          y_j − x_j b > 0,  w_j < q
    α2       c_j + a_i π_i > 0,  z_j > l_j          y_j − x_j b < 0,  w_j > q − 1
    β2       c_j + a_i π_i = 0,  z_j > u_j          y_j − x_j b = 0,  w_j > q
    γ2       c_j + a_i π_i < 0,  z_j > u_j          y_j − x_j b > 0,  w_j > q
If all elements of z fall into at least one of the classes α, β, or γ (the in-kilter classes),
then the current solution is optimal. This is also true for quantile regression, and
the exact number of class β elements is known to be p (at optimality), while the
number of class α elements lies within the closed interval from ⌈qn − p⌉ to ⌊qn⌋.
The optimal basis for quantile regression is unique (assuming nondegeneracy), so for
any other basis in the set of feasible bases, at least one of the p elements in the basis
falls into one of two out-of-kilter classes: β1 or β2. The purpose of the algorithm
is to retain the in-kilter elements and gradually bring the out-of-kilter (infeasible)
elements into kilter [4].
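The nine-way classification in the table can be expressed directly in code. The helper below is a hypothetical sketch, not Fulkerson's specification: the numerical tolerance and the assignment of exact-boundary cases with a zero residual to β1/β2 are assumptions.

```python
def kilter_class(residual, w, q, tol=1e-9):
    """Classify one observation into the nine case classes, using the
    quantile-regression column of the table above. Tolerance handling and
    boundary-case assignment are assumptions of this sketch."""
    if residual < -tol:                                  # y_j - x_j b < 0
        if abs(w - (q - 1)) <= tol: return "alpha"
        return "alpha1" if w < q - 1 else "alpha2"
    if residual > tol:                                   # y_j - x_j b > 0
        if abs(w - q) <= tol: return "gamma"
        return "gamma1" if w < q else "gamma2"
    if q - 1 + tol < w < q - tol: return "beta"          # y_j - x_j b = 0
    return "beta1" if w <= q - 1 + tol else "beta2"

q = 1/3
print([kilter_class(r, w, q) for r, w in [(-2.0, q - 1), (0.0, 0.0), (3.7, q)]])
# ['alpha', 'beta', 'gamma'] -> every element in kilter, so such a basis would be optimal
```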
There are several issues to address when considering how to extend the out-
of-kilter method to the class of QRMEPs, chief among them being the fact that the
properties discussed in Section 2.1 hold only for continuous data. Fulkerson [22]
developed the out-of-kilter algorithm to work only with integer, or rational, data.
Any extension to quantile regression models may require some initial transformation
of the data to make it integer or rational. Another option is to modify the steps of
the algorithm such that it converges even with continuous data. Another issue is
the initial solution. This is theoretically unimportant, since even b = 0p yields a
primal feasible solution for any q, but beginning the algorithm at a primal feasible
solution that also satisfies the cardinality range property can decrease computation
time. A feasible solution that also satisfies (2.3) and (2.4) guarantees that no more
than p elements are out-of-kilter in dual space. However, any processing advantage
gained by starting at such a solution should be weighed against the computational
effort required to obtain it.
2.6 Summary
A review of the literature on quantile regression reveals many significant ad-
vancements that have been achieved in the field, particularly in the area of model
estimation. By introducing the concept of regression quantiles in 1978, Koenker
and Bassett extended the idea of order statistics in single-variable samples (loca-
tion models) to the broader class of linear models [32], and they introduced two
properties that follow from the special structure of the QRMEP. The exact-fit and
cardinality range properties, along with the KKT conditions, constitute the set of
necessary optimality conditions which are specific to the QRMEP. These proper-
ties also established three assumptions on the quantile regression model which must
hold to guarantee a unique vector b ∈ R^p of model coefficients: all data are continuous,
a nondegenerate solution exists, and the quantile regression model contains an
intercept.
A simple method of partitioning the design matrix X, one which distinguishes
only between basic (X_h ∈ R^{p×p}) and nonbasic (N ∈ R^{(n−p)×p}) observations, is
sufficient for employing the exact-fit property to estimate the model parameters.
Exploiting the cardinality range property, however, requires a more detailed partitioning
scheme that further decomposes the matrix of nonbasic observations into two matrices:
a matrix of nonbasic observations whose residuals are negative (N_v) and a
matrix of nonbasic observations whose residuals are positive (Nu). The number of
rows in Nv must satisfy (2.3), while the number of rows in Nu must satisfy (2.4).
Gutenbrunner, et al. extended the idea of ranking sample observations to the
class of conditional quantile models by applying a transformation to (1.8). The
result is the translated dual LP (2.10), whose solution a ∈ [0, 1]^n is a vector of
regression rank scores. Two types of interior-point methods, primal path following
and primal-dual path following, proceed from (2.10) to solve the QRMEP. Affine
scaling, by contrast, uses the standard dual LP (1.8). Chen's finite smoothing
algorithm [18] offers an alternative to interior-point methods for certain model
and problem sizes.
Three types of pivoting algorithms have been developed for solving the QRMEP:
a primal method (Barrodale-Roberts), a primal-dual method (Koenker-d’Orey), and
a dual method (I-LP). The dual simplex method, contrary to its name, may be
considered a primal method in the context of the QRMEP because it solves (1.8)
by conducting line searches in primal space. Kostina [39] developed a long-step
variant of the dual simplex method by modifying the step size selection process.
The Barrodale-Roberts algorithm and I-LP were developed to solve a special case
of the QRMEP: the l1-approximation (conditional median). While Koenker and
d’Orey [33] extended the Barrodale-Roberts algorithm to all conditional quantiles,
I-LP has not yet been extended to the entire class of QRMEPs. The next chap-
ter discusses extending both I-LP and the long-step dual simplex algorithm to the
quantile regression model class of problems. Additionally, continuing the concept
presented in Section 2.5, two suboptimal integer programming formulations of the
QRMEP are developed.
III. Methodology
This research explores extending three pivoting algorithms to the class of QRMEPs:
the simplex method for bounded variables, the LSDS method, and I-LP. The simplex
method for bounded variables could not be successfully extended, and Section 3.1
discusses why it could not be implemented. Section 3.2 details how I-LP can be
generalized to solve (1.8) for any quantile. A successful extension of the LSDS
method to the class of QRMEPs is provided in Section 3.3. Chapter 3 concludes by
developing two suboptimal formulations from (1.8) which resemble a familiar integer
programming problem.
3.1 Simplex Method for Bounded Variables
The Barrodale-Roberts algorithm was successfully extended by Koenker and
d’Orey [34] to be a primal pivoting algorithm capable of computing regression quan-
tiles for any q ∈ (0, 1). Proceeding from (1.4), each iteration of the Barrodale-
Roberts algorithm consists of estimating an element of the parameter vector b and
then pivoting to a new set of residual vectors u and v. It seems natural to explore
the efficacy of a simplex-based procedure which proceeds instead from (1.8); that is, a
dual pivoting algorithm. Given the boundary constraints on w in (1.8), the simplex
method for bounded variables (bounded simplex) [4] is a logical starting point. It
turns out that bounded simplex, as described in [4], does not solve (1.8) for a simple
reason. The idea behind any simplex method, and bounded simplex is no exception,
is to pivot from one basic feasible solution to another until optimality is achieved
(dual feasibility is satisfied). If the three model assumptions established in Section
2.1 hold, then only one basic feasible solution exists in dual space for the QRMEP:
the optimal solution. Primal feasibility, on the other hand, is satisfied by any solu-
tion. Pivoting between basic feasible solutions in (1.8) is therefore impossible. It
can be shown that the standard bounded simplex algorithm does not converge to
the optimal basis when applied to (1.8). The theory behind the bounded simplex
algorithm, in the context of the QRMEP, is presented first. A single iteration of the
bounded simplex method is then conducted, using a small sample extracted from
the Cars93 data set, to demonstrate how the algorithm fails to solve (1.8). One
way to demonstrate this result is by rewriting the objective function,

    y^T w = (y_v^T − y_b^T (B^T)^{-1} N_v^T) w_v + (y_u^T − y_b^T (B^T)^{-1} N_u^T) w_u    (3.1)
          = (q − 1) ∑_{j ∈ R_v} (c_j − z_j) + q ∑_{j ∈ R_u} (c_j − z_j),

where R_v denotes the set of indices for the nonbasic variables at the lower bound
(q − 1) and R_u denotes the set of indices for the nonbasic variables at the upper
bound q. Each c_j − z_j corresponds to the raw residual for the jth observation.
In a bounded variable problem that does not possess the special structure of (1.8),
a feasible solution may exist where all nonbasic variables are fixed at one of the
two bounds, but the fact that w must lie in the null space of X^T prevents such a
solution in the QRMEP. Furthermore, each c_j − z_j is a raw residual, so it follows
that c_j − z_j < 0 for all j ∈ R_v and c_j − z_j > 0 for all j ∈ R_u. For a maximization
problem, the stopping criteria for the bounded simplex algorithm are that c_j − z_j < 0
holds for all j ∈ R_v and c_j − z_j > 0 holds for all j ∈ R_u [4]. These conditions are
satisfied by any primal feasible solution to (1.4), so the bounded simplex algorithm
cannot be used to solve (1.8). If a pivoting algorithm is to be designed for solving
(1.8), then it must start with a dual infeasible solution and pivot towards optimality.
An iteration of the bounded simplex method begins by selecting a variable to
enter the basis. Let k be the index of the nonbasic variable selected to enter the
basis,
    k = argmin_j { min_{j ∈ R_v} {z_j − c_j},  min_{j ∈ R_u} {c_j − z_j} }.    (3.2)
For the QRMEP, k is the index of the smallest absolute nonzero residual. If k ∈ R_v,
then w_k enters the basis by increasing from its current value of (q − 1). If k ∈ R_u,
then w_k enters the basis by decreasing from its current value of q.
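Selecting the entering index per (3.2) amounts to finding the smallest nonzero absolute residual. A small sketch, checked against the residual vector that appears in Example 5 below:

```python
import numpy as np

def entering_index(residuals, tol=1e-9):
    """Index of the smallest absolute nonzero residual, per (3.2).
    Basic observations (zero residual) are excluded from the comparison."""
    r = np.abs(np.asarray(residuals, dtype=float))
    r[r <= tol] = np.inf               # mask the basic (zero) residuals
    return int(np.argmin(r))

# Residual vector from Example 5 below: observation 6 (residual 0.4545)
# has the smallest absolute nonzero residual, so k = 6.
r = [0.0, -4.8636, 0.0, 3.5364, 2.8091, 0.4545, 3.7, -1.5091]
print(entering_index(r) + 1)           # 6 (as a 1-based observation index)
```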
Suppose k ∈ R_v. Let ∆_k be the amount by which w_k is increased from (q − 1)
such that w_k = q − 1 + ∆_k. Substituting this into (2.6) and (2.7) yields

    w_b = −(B^T)^{-1} N_v^T w_v − (B^T)^{-1} N_u^T w_u − (B^T)^{-1} x_k^T w_k
        = −(q − 1)(B^T)^{-1} N_v^T 1_v − q (B^T)^{-1} N_u^T 1_u − (q − 1 + ∆_k)(B^T)^{-1} x_k^T
        = (1 − q)(B^T)^{-1} N_v^T 1_v − q (B^T)^{-1} N_u^T 1_u − ∆_k s_k    (3.3)
and

    y^T w = (y_v^T − y_b^T (B^T)^{-1} N_v^T) w_v + (y_u^T − y_b^T (B^T)^{-1} N_u^T) w_u + (y_k − y_b^T s_k) w_k
          = (1 − q) ∑_{j ∈ R_v} (z_j − c_j) + q ∑_{j ∈ R_u} (c_j − z_j) + ∆_k (c_k − z_k)
          = z + ∆_k (c_k − z_k),    (3.4)

where x_k is the kth row of the design matrix (kth observation) and s_k = (B^T)^{-1} x_k^T.
The increase ∆_k can be blocked when one of the basic variables either drops to
(q − 1) or increases to q. Let γ_1 = ∆_k denote the value at which a basic variable
decreases to (q − 1). This increase is bounded above by

    (q − 1) 1_p < w_b    (3.5)
    (q − 1) 1_p < −(B^T)^{-1} N_v^T w_v − (B^T)^{-1} N_u^T w_u − ∆_k s_k
    (q − 1) 1_p < d − ∆_k s_k
    ∆_k s_k < d + (1 − q) 1_p.

If s_k ≤ 0_p, then ∆_k can assume any nonnegative value without violating the
inequality, so compute the following minimum ratio test only for positive elements of
s_k,

    γ_1 = min_{1≤j≤p} { (d_j + 1 − q) / s_jk : s_jk > 0 }    (3.6)
        = (d_r + 1 − q) / s_rk.
The index r identifies the candidate variable w_r ∈ w_b to become nonbasic at the
lower bound q − 1, and w_k takes the place of w_r in the basis.
Let γ_2 = ∆_k denote the value at which a basic variable increases to q; this
increase is bounded below by

    q 1_p > w_b    (3.7)
    q 1_p > d − ∆_k s_k
    ∆_k s_k > d − q 1_p.

If s_k ≥ 0_p, then ∆_k can assume any nonnegative value without violating the
inequality, so compute the following minimum ratio test only for negative elements of
s_k,

    γ_2 = min_{1≤j≤p} { (d_j − q) / s_jk : s_jk < 0 }    (3.8)
        = (d_r − q) / s_rk.
In this case, the index r identifies the candidate variable w_r ∈ w_b to become nonbasic
at the upper bound q, and w_k becomes basic in the place of w_r.
The value of ∆_k is determined by the minimum amount w_k can increase before
being blocked,

    ∆_k = min {γ_1, γ_2}.    (3.9)
Once ∆_k is obtained, a new solution can be computed. The nonbasic variable w_k is
updated by w_k = q − 1 + ∆_k, and this result is substituted into (3.3) to update the
working basis vector.
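The two ratio tests (3.6) and (3.8), together with the step choice (3.9), can be sketched as a single helper. The data below are synthetic, not drawn from the dissertation's example.

```python
import numpy as np

def blocking_step(d, s_k, q):
    """Ratio tests (3.6) and (3.8) for a nonbasic w_k increasing from q - 1.
    d is the current working basis vector and s_k = (B^T)^{-1} x_k^T.
    Returns (delta_k, gamma1, gamma2) with delta_k chosen per (3.9)."""
    pos, neg = s_k > 0, s_k < 0
    gamma1 = np.min((d[pos] + 1 - q) / s_k[pos]) if pos.any() else np.inf  # drops to q - 1
    gamma2 = np.min((d[neg] - q) / s_k[neg]) if neg.any() else np.inf      # rises to q
    return min(gamma1, gamma2), gamma1, gamma2

d = np.array([0.1, -0.5])       # synthetic working basis vector
s_k = np.array([0.8, -0.6])
delta, g1, g2 = blocking_step(d, s_k, q=1/3)
print(delta == g1, g1 < g2)     # True True: the first basic variable blocks first
```

Because (3.3) updates the basis as w_b = d − ∆_k s_k, components with positive s_jk move toward the lower bound (test (3.6)) and components with negative s_jk move toward the upper bound (test (3.8)).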
Now suppose w_k must decrease from the upper bound q (i.e., k ∈ R_u). Let ∆_k
be the amount by which w_k is decreased from q: w_k = q − ∆_k. Substituting this into
(2.6) and (2.7) yields

    w_b = −(B^T)^{-1} N_v^T w_v − (B^T)^{-1} N_u^T w_u − (B^T)^{-1} x_k^T w_k
        = −(q − 1)(B^T)^{-1} N_v^T 1_v − q (B^T)^{-1} N_u^T 1_u − (q − ∆_k)(B^T)^{-1} x_k^T
        = (1 − q)(B^T)^{-1} N_v^T 1_v − q (B^T)^{-1} N_u^T 1_u + ∆_k s_k    (3.10)
and

    y^T w = (y_v^T − y_b^T (B^T)^{-1} N_v^T) w_v + (y_u^T − y_b^T (B^T)^{-1} N_u^T) w_u + (y_k − y_b^T s_k) w_k
          = (1 − q) ∑_{j ∈ R_v} (z_j − c_j) + q ∑_{j ∈ R_u} (c_j − z_j) − ∆_k (c_k − z_k)
          = z − ∆_k (c_k − z_k).    (3.11)
Under the dual feasibility condition, the basic variables are strictly bounded below
by

    (q − 1) 1_p < w_b    (3.12)
    (q − 1) 1_p < −(B^T)^{-1} N_v^T w_v − (B^T)^{-1} N_u^T w_u + ∆_k s_k
    (q − 1) 1_p < d + ∆_k s_k
    (q − 1) 1_p − d < ∆_k s_k
and strictly bounded above by

    q 1_p > w_b    (3.13)
    q 1_p > d + ∆_k s_k
    q 1_p − d > ∆_k s_k.
The equations for γ_1 and γ_2 also take different forms:

    γ_1 = min_{1≤j≤p} { (q − 1 − d_j) / s_jk : s_jk < 0 }    (3.14)
        = (q − 1 − d_r) / s_rk

and

    γ_2 = min_{1≤j≤p} { (q − d_j) / s_jk : s_jk > 0 }    (3.15)
        = (q − d_r) / s_rk.
The value of ∆_k is determined by (3.9), which is the minimum amount w_k can
decrease before being blocked. The nonbasic variable w_k is updated by w_k = q − ∆_k,
and (3.10) updates the working basis vector.
Recall the Cars93 sample from Chapter 2. The following example executes
one iteration of the bounded simplex method [4] on a QRMEP, where q = 1/3 and
p = 2.

Example 5. Let x_1 and x_3 form the initial basis,

    B = [ 1  63 ]
        [ 1  96 ].
Use the exact-fit property to obtain b and generate r:

    b = (−0.0455, 0.1182)^T

and

    r = (0, −4.8636, 0, 3.5364, 2.8091, 0.4545, 3.7, −1.5091)^T.

There are two negative residuals and four positive residuals, which satisfies (2.3)
and (2.4). This leads to the initial basic vector w_b = (−1.303, 1.303)^T, and both
elements w_1 and w_3 are clearly infeasible. The smallest absolute residual corresponds
to k = 6; k ∈ R_u, so the currently nonbasic variable w_6 must be decreased by ∆_6
such that w_6 = 0.3333 − ∆_6, and

    s_6 = (B^T)^{-1} x_6^T = (−0.5758, 1.5758)^T.
Since w_1 is closer to the lower bound (q − 1) than w_3 is to the upper bound q, it
may be falsely concluded that w_1 should become nonbasic at (q − 1). Because γ_1
is defined to be the value at which a basic variable drops to the lower bound, it can
only be computed for a basic variable whose current value is greater than (q − 1).
Similarly, γ_2 is defined to be the value at which a basic variable increases to the
upper bound, so it can only be computed for a basic variable whose current value is
less than q. Therefore, by (3.14), (3.15), and (3.9):

    γ_1 = (0.3333 − 1 − 1.303) / (−0.5758) = 3.4211
    γ_2 = (0.3333 − (−1.303)) / 1.5758 = 1.0385
    ∆_6 = min {γ_1, γ_2} = 1.0385.
It follows that w_1 is indeed the blocking variable, so it becomes nonbasic at q. The
entering variable w_6 is updated to be w_6 = 0.3333 − 1.0385 = −0.7051, and (3.10)
updates the basic vector to w_b = (−1.709, 2.4141)^T. The new basis and residual
vector, respectively, are

    B = [ x_3 ] = [ 1   96 ]
        [ x_6 ]   [ 1  115 ]

    r = (0.7895, −5.6053, 0, 3.3211, 1.6368, 0, 2.6474, −3.7579)^T.

As expected, the residual vector confirms that w_1 is nonbasic at q. The new basic
vector, if obtained using the exact-fit property and (2.6), is

    w_b = (−3.1754, 3.1754)^T ≠ (−1.709, 2.4141)^T.

Clearly, since (−3.1754, 3.1754)^T ≠ (−1.709, 2.4141)^T, the algorithm cannot
continue.
The example reveals another reason why the bounded simplex algorithm fails to
solve (1.8). If the current solution is not optimal, and the three model assumptions
hold, then dual feasibility is not yet satisfied and one of the following inequalities
is true for at least one element of wb: wj < (q − 1) or wj > q. For this reason,
the blocking variable tests (3.6), (3.8), (3.14), and (3.15) fail to update the working
basis such that it also satisfies the exact-fit property.
The bounded simplex method assumes that a finite number of basic feasible
solutions exist for the LP from which it proceeds. Under the three model assump-
tions established for the QRMEP in Chapter 2, only the optimal solution is basic
feasible in dual space. Thus, a pivoting method operating in the dual space of the
QRMEP must converge to the optimal basis by pivoting among infeasible solutions.
3.2 Generalized Interval-Linear Programming
There exist four methods for solving the QRMEP, three of which are found in
most commercial quantile regression solvers [18]. The I-LP method, however, is not
among them because it was designed to solve only a special case of the QRMEP. This
section discusses extending the algorithm proposed by Robers and Ben-Israel [56]
such that regression quantiles for any q ∈ (0, 1) can be computed. The extension be-
gins by changing the bounds on the dual vector from w ∈ [−1, 1]^n to w ∈ [q − 1, q]^n.
Therefore, QRMEPs of the form

    max_{w ∈ [q−1, q]^n}  y^T w    (3.16)

    subject to

    [ 0_p       ]    [ X^T ]       [ 0_p   ]
    [ (q−1) 1_n ] ≤  [ I_n ] w  ≤  [ q 1_n ]
can be solved by a method that can be called generalized interval-linear programming
(GILP). The GILP method solves a finite sequence of decompositions of (3.16), and
each decomposed problem is of the form

    max_{w ∈ [q−1, q]^n}  y^T w    (3.17)

    subject to

    d^− ≤ F w ≤ d^+    (3.18)
    g^− ≤ h^T w ≤ g^+,    (3.19)

where F ∈ R^{n×n}, h ∈ R^{n×1}, and F is nonsingular. As with I-LP, (3.18) is a set of n
constraints selected from the (n + p) constraints in (3.16) such that F is invertible,
and (3.19) is a single constraint selected from the remaining p constraints. Since F
is invertible, apply the transformation s = Fw so that (3.17) becomes

max_{s ∈ R^n}  yTF−1s    (3.20)

subject to

d− ≤ s ≤ d+    (3.21)
g− ≤ hTF−1s ≤ g+.    (3.22)
Letting w∗ denote the optimal solution to (3.17), it follows that w∗ = F−1s∗, where
s∗ is the optimal solution to (3.20). Therefore, solving (3.20) is equivalent to solving
(3.17).
As in [56], the GILP method begins by first solving the subproblem

max_{s ∈ R^n}  yTF−1s    (3.23)

subject to

d− ≤ s ≤ d+.
Let s(t) be the maximizer of (3.23), where t ≥ 1 denotes the current iteration. If, in
addition, s(t) satisfies (3.22), then s(t) is also the maximizer of (3.20). To check for
feasibility in (3.16), the reverse transformation w(t) = F−1s(t) is applied, and w(t) is
substituted into each of the (p− 1) constraints removed from (3.16). If w(t) satisfies
all constraints in (3.16), then optimality has been achieved and the algorithm stops.
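Because (3.23) is separable, its maximizer can be read off coordinate by coordinate. The following Python sketch illustrates this step; the function name and the sample data are illustrative, not part of the dissertation's implementation.

```python
def solve_box_subproblem(c, d_lo, d_hi):
    """Maximize the linear objective sum(c_j * s_j) over the box
    d_lo <= s <= d_hi, i.e., subproblem (3.23) with c playing the
    role of yTF^-1.

    The problem is separable: each s_j independently moves to its
    upper bound when c_j > 0 and to its lower bound otherwise.
    """
    return [hi if cj > 0 else lo
            for cj, lo, hi in zip(c, d_lo, d_hi)]

# With F = I_n the coefficient vector is y itself; for a positive y and
# q = 1/5, every coordinate therefore goes to its upper bound q.
y = [7.4, 10.1, 11.3, 15.9, 19.9, 14.0, 20.2, 20.9]
q = 0.2
s = solve_box_subproblem(y, [q - 1.0] * len(y), [q] * len(y))
```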
Suppose s(t) does not satisfy (3.22). Then, either hTF−1s(t) < g− or g+ <
hTF−1s(t). If the former holds, then there exists a solution to (3.20) such that
hTF−1s = g−, and if the latter holds, then there exists a solution to (3.20) such that
hTF−1s = g+. Let ∆ denote the amount by which (3.22) is violated,

∆ = { hTF−1s(t) − g−, if hTF−1s(t) < g−
      hTF−1s(t) − g+, if hTF−1s(t) > g+ }.    (3.24)

It follows that if hTF−1s(t) < g−, then ∆ < 0. Conversely, if hTF−1s(t) > g+, then
∆ > 0. Let (hTF−1)_j denote the jth element of the vector hTF−1, (yTF−1)_j denote
the jth element of the vector yTF−1, and γ_j denote the marginal cost of changing
the jth element of s(t) [57]. Let Q be the set of indices identifying which elements
of s(t) are candidates to be changed in order to satisfy (3.22) while maintaining
feasibility in (3.21). Let m denote the cardinality of Q such that |Q| = m ≤ n, and

Q = { j : 1 ≤ j ≤ n, (hTF−1)_j ≠ 0, γ_j = (yTF−1)_j / ((hTF−1)_j sgn ∆) ≥ 0 }.
Reorder the indices in Q such that

Q = { j_k : γ_{j1} ≤ γ_{j2} ≤ · · · ≤ γ_{jm} }.
One or more elements from the resulting set { s(t)_j : j = j_k ∈ Q } are altered
until all constraints in (3.20) are satisfied. For each j_k ∈ Q, compute the
distance δ_{jk} from s(t)_{jk} to its opposite boundary. In other words, each
s(t)_{jk} moves to its opposite boundary, one at a time, until s(t) satisfies
(3.22). These distances are determined by

δ_{jk} = { d−_{jk} − s(t)_{jk}, if sgn ∆ = sgn (hTF−1)_{jk}
           d+_{jk} − s(t)_{jk}, if sgn ∆ = − sgn (hTF−1)_{jk} }.

The step length δ_{jk} is equivalent to the direction of movement σ from the
Koenker-d'Orey algorithm. That is, the magnitude of δ_{jk} is equal to the length
of the closed interval [q − 1, q] (i.e., q − (q − 1) = 1), and its sign indicates
in which direction s(t)_{jk} moves. The final altered element, however, need not
move the entire distance in order to satisfy (3.22).
Let s(t)_{jr} denote this "entering" variable, whose index is determined by

j_r = min { j_k ∈ Q : | Σ_{k=1}^{r} δ_{jk} (hTF−1)_{jk} | ≥ |∆| }.
The elements { s(t)_{j1}, s(t)_{j2}, . . . , s(t)_{j_{r−1}} } move to their
respective opposing boundaries, and the step length for s(t)_{jr} is computed as

θ = ( −∆ − Σ_{k=1}^{r−1} δ_{jk} (hTF−1)_{jk} ) / (hTF−1)_{jr}.

Therefore, the optimal solution to (3.20), and thus (3.17), is given by

w(t+1) = F−1 ( s(t) + Σ_{k=1}^{r−1} δ_{jk} e_{jk} + θ e_{jr} ),
where ejk denotes an n-vector of zeros with a one in the jkth position. If w(t+1) also
satisfies dual feasibility (i.e., XTw(t+1) = 0p), then w(t+1) is the optimal solution to
(3.16). Otherwise, the jrth constraint in (3.18) is replaced by (3.19), and a new
g− ≤ hTw ≤ g+ is selected from among the (p− 1) constraints removed from (3.16)
that is not satisfied by w(t+1). Robers and Ben-Israel [57] recommend choosing the
constraint which w(t+1) violates by the greatest amount. Let t = t + 1, and begin
the next iteration.
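The correction step just described can be sketched compactly. The helper below follows the sign convention of (3.24); the function name, the flat-list representation of hTF−1 and yTF−1, and the omission of the basis-exchange bookkeeping are illustrative assumptions rather than the dissertation's code.

```python
def gilp_correction(hF, yF, s, d_lo, d_hi, g_lo, g_hi):
    """Restore g_lo <= hF.s <= g_hi for a box maximizer s of (3.23).

    hF and yF stand for the row vectors hTF^-1 and yTF^-1. Elements of
    s are driven to their opposite boundaries in ascending order of
    marginal cost gamma_j; the entering element j_r takes only the
    partial step theta.
    """
    val = sum(hj * sj for hj, sj in zip(hF, s))
    if g_lo <= val <= g_hi:
        return list(s)                               # (3.22) already holds
    delta = val - g_lo if val < g_lo else val - g_hi     # violation, as in (3.24)
    sgn = 1.0 if delta > 0 else -1.0
    # Candidate set Q: nonzero hF entry and nonnegative marginal cost.
    Q = [j for j in range(len(s)) if hF[j] != 0 and yF[j] / (hF[j] * sgn) >= 0]
    Q.sort(key=lambda j: yF[j] / (hF[j] * sgn))      # cheapest moves first
    s, moved = list(s), 0.0
    for j in Q:
        # Distance delta_jk from s_j to its opposite boundary.
        d = (d_lo[j] - s[j]) if sgn * hF[j] > 0 else (d_hi[j] - s[j])
        if abs(moved + d * hF[j]) >= abs(delta):     # j is the entering index j_r
            s[j] += (-delta - moved) / hF[j]         # partial step theta
            return s
        s[j] += d                                    # full step to the boundary
        moved += d * hF[j]
    raise ValueError("the single constraint cannot be met from this box")
```

Applied to the data of Example 6 with F = I_8, h = 1_8, and g− = g+ = 0, this sketch reproduces the first pivot w(2) = (−0.8, −0.4, 0.2, . . . , 0.2)T.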
The following example uses the Cars93 sample to demonstrate GILP for q = 1/5.
Example 6 Recall the sample extracted from the Cars93 data set [19],

y = (7.4, 10.1, 11.3, 15.9, 19.9, 14, 20.2, 20.9)T,

X = [ 1   1    1   1    1    1    1    1
      63  127  96  105  145  115  140  190 ]T,
where y is the vector of mean retail prices, and X2 is the vector of horsepower
ratings for all vehicle models sold by Ford Motor Company in 1993. Use GILP to
solve (1.8) for the first conditional quintile (q = 1/5). For t = 1, let

d− = (−4/5)1_8,   d+ = (1/5)1_8,   F = I_8,   g− = g+ = 0,   h = X1 = 1_8.
Clearly, F−1 = I_8, s(1) = Fw = w, and the optimal solution to (3.23) is
s(1) = (1/5)1_8, but hTF−1s(1) = 8/5 ≠ 0. Therefore,

∆ = hTF−1s(1) − g+ = 8/5,
hTF−1 = 1T_8,
yTF−1 = yT,
γ_j = y_j / ((1T_8)_j sgn ∆) = y_j.
Since y > 0, it follows that γ_j ≥ 0 for all 1 ≤ j ≤ 8, so

Q = {1, 2, 3, 6, 4, 5, 7, 8},

δ_{jk} = { d−_{jk} − s(1)_{jk} : j_k ∈ Q } = {−1, −1, −1, −1, −1, −1, −1, −1}.
Since | Σ_{k=1}^{2} δ_{jk} (hTF−1)_{jk} | = 2 ≥ 8/5 = |∆|, the index of the
element that does not take the full −1 step is j_r = j_2 = 2. Thus,

θ = ( −∆ − δ_{j1} (hTF−1)_{j1} ) / (hTF−1)_{j2} = −3/5
and

w(2) = F−1 ( s(1) + δ_{j1} e_{j1} + θ e_{j2} ) = (−0.8, −0.4, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2)T.
Because XTw(2) = (0, 57)T, dual feasibility is not satisfied. The vector 1T_8
replaces the row (0, 1, 0, 0, 0, 0, 0, 0) in F, d−_2 = d+_2 = 0, and h = X2. For t = 2,
∆ = hTF−1s(2) − g+ = 57,
hTF−1 = (−64, 127, −31, −22, 18, −12, 13, 63),
yTF−1 = (−2.7, 10.1, 1.2, 5.8, 9.8, 3.9, 10.1, 10.8),
γ_j = {0.0422, 0.0795, −0.0387, −0.2636, 0.5444, −0.325, 0.7769, 0.1714},
Q = {1, 2, 8, 5, 7},
δ_{jk} = {1, 0, −1, −1, −1},
| δ_{j1} (hTF−1)_{j1} | = 64 ≥ 57 = |∆|,
θ = −∆ / (hTF−1)_{j1} = 57/64,
w(3) = F−1 ( s(2) + θ e_{j1} )
     = (0.0906, −1.2906, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2)T,
and w(3)_2 = −1.2906 violates its bounds, so w(3) is not dual feasible. Let XT_2
replace (1, 0, 0, 0, 0, 0, 0, 0) in F, d−_1 = d+_1 = 0, g− = −4/5, g+ = 1/5, and
h = (0, 1, 0, 0, 0, 0, 0, 0)T. The algorithm continues until t = 6, where

w(7) = (−0.1528, −0.8, 0.2, 0.2, 0.2, 0.2, 0.2, −0.0472)T.
Since XTw(7) = 0_p and w(7) ∈ [−0.8, 0.2]^8, it is a dual feasible solution.
Furthermore, since both w(7)_1 and w(7)_8 lie strictly inside the open interval
(−0.8, 0.2), then by (2.2), w(7) is the optimal solution to (3.16).
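The terminal solution is easy to verify numerically; the small residual in the second component of XTw(7) only reflects the four-decimal rounding of w(7) above.

```python
# Check Example 6's terminal solution: dual feasibility X'w = 0 (up to
# the rounding of w(7)) and the bounds [q-1, q] = [-0.8, 0.2].
X = [(1, 63), (1, 127), (1, 96), (1, 105), (1, 145), (1, 115), (1, 140), (1, 190)]
w7 = [-0.1528, -0.8, 0.2, 0.2, 0.2, 0.2, 0.2, -0.0472]
Xtw = [sum(row[k] * wj for row, wj in zip(X, w7)) for k in range(2)]
in_bounds = all(-0.8 <= wj <= 0.2 for wj in w7)
```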
GILP is distinguished from the dual simplex algorithm, as well as its long-
step variant, because it operates exclusively in the dual space of the QRMEP. It
pivots among infeasible dual solutions until arriving at the optimal basis, rather than
switching to primal space and pivoting among primal feasible solutions.
3.3 Long-Step Dual Simplex (LSDS) Method
Kostina [39] proposed a long-step variant of the dual simplex algorithm to
solve general maximization problems with bounded variables. This research extends
the long-step dual simplex (LSDS) method to a specific class of bounded variable
problems: the QRMEP. The following is a detailed description of how the step size
selection procedure is modified in the context of the QRMEP.
Modifying the step size selection to take longer steps is analogous to how the
Barrodale-Roberts algorithm operates in the primal space. As with the short-step
dual simplex method, let the triplet λ = (b,u,v) denote a feasible solution to
(1.4), where r = y − Xb = u − v. Define search directions and a nonnegative
step size such that an improved feasible solution is identified by the triplet λ (σ) =
(b (σ) ,u (σ) ,v (σ)), where b (σ) = b+σ∆b, u (σ) = u+σ∆u, and v (σ) = v+σ∆v.
The equation φ(λ) = 0T_p b + q1T_u u + (1 − q)1T_v v represents the objective
function value generated by the solution λ. Since (XTw)T = 0T_p, the improved
objective function can be written as

φ(λ(σ)) = φ(λ + σ∆λ)    (3.25)
        = φ(λ) + σ ( 0T_p ∆b + q1T_u ∆u + (1 − q)1T_v ∆v )
        = φ(λ) + σ ( wTX∆b + q1T_u ∆u + (1 − q)1T_v ∆v )
        = φ(λ) + σ ( q1T_u ∆u + (1 − q)1T_v ∆v − ∆rTw ).
For either r or r (σ), its jth element is in exactly one of two possible states. That is,
either rj ≥ 0 or rj ≤ 0, and either rj (σ) ≥ 0 or rj (σ) ≤ 0. Therefore, the following
four cases are possible.
1. Let rj ≥ 0 and rj(σ) ≥ 0. Then,

uj = rj,  uj(σ) = uj + σ∆uj = rj(σ),
vj = 0,   vj(σ) = vj + σ∆vj = 0.

If wj ∈ wb in the current iteration and remains basic in the next iteration,
then rj = ∆rj = 0 and the jth element does not decrease the objective function
value. Applying these substitutions to the jth element of the improved
objective function yields

σ (q∆uj + (1 − q)∆vj − ∆rjwj) = σ (q∆rj − ∆rjwj)
                              = σ∆rj (q − wj),

φ(λ(σ)) = qrj + σ∆rj (q − wj)
        = (q − wj) rj(σ) + rjwj
        = qrj + (q − wj) rj(σ) − (q − wj) rj.

If rj = 0 and rj(σ) > 0, then j = i_k, w_b^(ik) > q, and w_{ik} leaves the basis
to become nonbasic at the upper bound. Since σ ≥ 0 and ∆rj > 0, namely
∆rj = 1, it follows that (q − wj) < 0 and the objective function value decreases.
If rj > 0 and rj(σ) = 0, then wj enters the basis from being nonbasic at the
upper bound. That is, the jth residual is driven to zero. Since wj = q, it
follows that φ(λ(σ)) = qrj, so ∆rj can be either positive or negative. The
same result occurs if rj > 0 and rj(σ) > 0.
2. Let rj ≥ 0 and rj(σ) ≤ 0. Then,

uj = rj,  uj(σ) = uj + σ∆uj = 0,
vj = 0,   vj(σ) = vj + σ∆vj = −rj(σ).

Applying these substitutions to the jth element of the improved objective
function yields

σ (q∆uj + (1 − q)∆vj − ∆rjwj) = −qrj + (q − 1)(rj + σ∆rj) − σ∆rjwj
                              = σ∆rj (q − 1 − wj) − rj,

φ(λ(σ)) = (q − 1) rj + σ∆rj (q − 1 − wj)
        = (q − 1 − wj) rj(σ) + rjwj
        = qrj + (q − 1 − wj) rj(σ) − (q − wj) rj.

If rj = 0 and rj(σ) < 0, then j = i_k, w_b^(ik) < (q − 1), and w_b^(ik) leaves
the basis to become nonbasic at the lower bound. It follows that
(q − 1 − w_b^(ik)) > 0, and the objective function value decreases only if
∆r_{ik} < 0, namely ∆r_{ik} = −1. If rj > 0 and rj(σ) = 0, or if rj > 0 and
rj(σ) < 0, then wj = q and φ(λ(σ)) = (q − 1) rj − σ∆rj.
3. Let rj ≤ 0 and rj(σ) ≤ 0. Then,

uj = 0,    uj(σ) = uj + σ∆uj = 0,
vj = −rj,  vj(σ) = vj + σ∆vj = −rj(σ).

Applying these substitutions to the jth element of the improved objective
function yields

σ (q∆uj + (1 − q)∆vj − ∆rjwj) = σ ((q − 1)∆rj − ∆rjwj)
                              = σ∆rj (q − 1 − wj),

φ(λ(σ)) = (q − 1) rj + σ∆rj (q − 1 − wj)
        = (q − 1 − wj) rj(σ) + rjwj
        = (q − 1) rj + (q − 1 − wj) rj(σ) − (q − 1 − wj) rj.

If rj = 0 and rj(σ) < 0, then j = i_k, w_b^(ik) < (q − 1), and w_b^(ik) leaves
the basis to become nonbasic at the lower bound. It follows that
(q − 1 − w_b^(ik)) > 0, and the objective function value decreases only if
∆r_{ik} < 0, namely ∆r_{ik} = −1. If rj < 0 and rj(σ) = 0, or if rj < 0 and
rj(σ) < 0, then wj = (q − 1), φ(λ(σ)) = (q − 1) rj = (q − 1)(rj(σ) − σ∆rj), and
either ∆rj > 0 or ∆rj < 0.
4. Let rj ≤ 0 and rj(σ) ≥ 0. Then,

uj = 0,    uj(σ) = uj + σ∆uj = rj(σ),
vj = −rj,  vj(σ) = vj + σ∆vj = 0.

Applying these substitutions to the jth element of the improved objective
function yields

σ (q∆uj + (1 − q)∆vj − ∆rjwj) = q (rj + σ∆rj) + (1 − q) rj − σ∆rjwj
                              = rj + σ∆rj (q − wj),

φ(λ(σ)) = qrj + σ∆rj (q − wj)
        = (q − wj) rj(σ) + rjwj
        = (q − 1) rj + (q − wj) rj(σ) − (q − 1 − wj) rj.

If rj = 0 and rj(σ) > 0, then j = i_k, w_b^(ik) > q, and w_b^(ik) leaves the
basis to become nonbasic at the upper bound. Since σ ≥ 0 and ∆rj > 0, namely
∆rj = 1, it follows that (q − w_b^(ik)) < 0 and the objective function value
decreases. If rj < 0 and rj(σ) = 0, or if rj < 0 and rj(σ) > 0, then
wj = (q − 1) and φ(λ(σ)) = qrj + σ∆rj = (q − 1) rj + rj(σ).
By summing the results from the four cases, (3.25) can be rewritten as

φ(λ(σ)) = Σ_{rj(σ)≥0} ( qrj + σ∆rj (q − wj) ) + Σ_{rj(σ)≤0} ( (q − 1) rj + σ∆rj (q − 1 − wj) )

        = φ(λ) + σ [ Σ_{rj≥0, rj(σ)≥0} ∆rj (q − wj) + Σ_{rj≤0, rj(σ)≤0} ∆rj (q − 1 − wj) ]

        + Σ_{rj≤0, rj(σ)≥0} ( (q − wj) rj(σ) − (q − 1 − wj) rj )

        + Σ_{rj≥0, rj(σ)≤0} ( (q − 1 − wj) rj(σ) − (q − wj) rj ).    (3.26)
The goal is to choose a σ ≥ 0 such that the amount by which φ (λ (σ)) decreases
is maximized. Whenever a basic variable leaves the basis, the value of φ (λ (σ)) is
guaranteed to decrease according to the rate [39]

dφ(λ(σ)) / dσ = Σ_{rj=0, rj(σ)>0} ∆rj (q − wj) + Σ_{rj=0, rj(σ)<0} ∆rj (q − 1 − wj),
which is equal to the magnitude of infeasibility of the jth dual variable. Since the
value of φ (λ (σ)) does not decrease whenever a nonbasic variable enters the basis,
additional reductions occur when rj and ∆rj have different signs, which can only be
guaranteed when rj and rj (σ) change signs. If wj = (q − 1), then let ∆wj denote
the amount by which wj changes for the next iteration such that wj + ∆wj = q, so
∆wj = 1. Conversely, when wj = q, then the value of φ (λ (σ)) is guaranteed to
decrease if wj + ∆wj = (q − 1), so ∆wj = −1. This result can be generalized in
primal space by the amount

dφ(λ(σ)) / dσ = − Σ_{∆rj≠0} |∆rj| .
The equation (3.26) and the analyses of the four cases lead to the following
modification of Step 5 of the dual simplex algorithm:

σj = { −rj/∆rj, if rj∆rj < 0
       0,       if rj = 0, ∆rj < 0, wj > q
       0,       if rj = 0, ∆rj > 0, wj < (q − 1)
       ∞,       otherwise }.    (3.27)
Unlike Step 5 of the standard algorithm, (3.27) computes finite step lengths for
all residuals with a nonzero search direction, and ∆rj = 0 only when
rj = rj(σ) = 0. The set of all σj is then sorted in ascending order such that
σ_j^(i) ≤ σ_h^(i+1), where j ≠ h.
Rather than selecting the maximum σ_j^(i), the maximum reduction in φ(λ(σ)) is
achieved by computing the net distance remaining after w_j^(i) has traveled to
its opposite boundary [39],

κ_i = κ_{i−1} + |∆r_j^(i)|,

where κ_0 = (w_b^(ik) − q + 1) if w_b^(ik) < (q − 1), and κ_0 = (q − w_b^(ik))
if w_b^(ik) > q.
Notice that κ_0 < 0 in either case. As with GILP, the elements w_j^(i) move to
their opposite boundary, one at a time, until the cumulative sum of distances
Σ |∆r_j^(i)| meets or exceeds the amount |κ_0| by which w_b^(ik) violates dual
feasibility. Thus, the σ_j^(i) corresponding to the first nonnegative κ_i is
selected to be the step length. That is,

σ_j^(s) = { σ_j^(i) : κ_i ≥ 0, κ_{i−1} < 0 },
and w_j^(s) enters the basis. The vector of model coefficients and the design
matrix partitions B and N are updated as before in the short-step dual simplex
algorithm, and the next iteration begins.
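The long-step selection rule can be isolated as a small routine. In this sketch the candidate steps are passed as (σ_j, |∆r_j|) pairs, which is an assumed representation for illustration, not the dissertation's implementation.

```python
def long_step_length(candidates, kappa0):
    """Long-step ratio test in the spirit of Section 3.3.

    candidates holds (sigma_j, abs_dr_j) pairs for the finite step
    lengths produced by (3.27); kappa0 < 0 measures the leaving
    variable's bound violation. Candidates are consumed in ascending
    order of sigma_j until the cumulative residual change first covers
    |kappa0|, and that sigma_j is returned. The short-step rule would
    instead stop at the smallest sigma_j.
    """
    kappa = kappa0
    for sigma, abs_dr in sorted(candidates):
        kappa += abs_dr              # kappa_i = kappa_{i-1} + |dr_j^(i)|
        if kappa >= 0:               # first nonnegative kappa_i
            return sigma
    return float("inf")              # no candidate covers the violation

# A bound violation of 1.5 is covered only after the first two candidates,
# so the long step is 1.2 rather than the short-step choice 0.5.
sigma = long_step_length([(0.5, 1.0), (1.2, 1.0), (2.0, 1.0)], -1.5)
```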
3.4 The QRMEP as an Integer Program
Simplex algorithms are designed to solve LPs possessing nonnegative
variables [4], but the dual variables in (1.8) are unrestricted in sign. A
nonnegativity transformation can be applied, where w ∈ Rn is rewritten as the
difference between two nonnegative variables, w = w+ − w−. The bounds on w can
also be rewritten as

w− ≤ (1 − q)1n + w+    (3.28)

and

w+ ≤ q1n + w−.    (3.29)
Since w+ and w− are both nonnegative by definition, w− must satisfy (3.28), even
when w+ is at its minimum value. Similarly, w+ must satisfy (3.29), even if w−
is at its minimum. This reasoning leads to upper bounds on the n-vectors w+ and
w−, respectively:
w+ ≤ q1n
w− ≤ (1− q)1n.
Thus, substituting w = w+ − w− into (1.8) transforms the dual LP into

max_{w+, w− ≥ 0_n}  yTw+ − yTw−    (3.30)
subject to
XTw+ = XTw−
w+ ≤ q1n
w− ≤ (1− q)1n.
At any extreme point in (3.30), a nonbasic w_j^+ is fixed at exactly one of its
bounds (0 or q). Likewise, any nonbasic w_j^− is fixed at either 0 or (1 − q).
Another transformation must be applied in order to express the QRMEP as
an integer program. Let zu and zv be n-vectors such that

z_u^(j) = { 0 ≤ z_u^(j) ≤ 1, if w_j^+ ≠ 0
            0,               otherwise }

and

z_v^(j) = { 0 ≤ z_v^(j) ≤ 1, if w_j^− ≠ 0
            0,               otherwise },

which yields

w+ = qzu  and  w− = (1 − q)zv.

In other words, w+ and w− can be expressed as convex combinations of zu and zv.
Additionally, the sum of all component vectors must equal the unit vector, so let
w+ = Tu zu and w− = Tv zv be the affine transformations [5] of the (n × 1) column
vectors zu and zv, respectively, where Tu = qIn, Tv = (1 − q)In, and zu + zv = 1n.
Applying these substitutions into (3.30) yields

max_{zu, zv ∈ [0,1]^n}  qyTzu − (1 − q)yTzv    (3.31)

subject to

qXTzu = (1 − q)XTzv
zu + zv = 1n.
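The algebra behind (3.31) can be verified numerically. With zu + zv = 1n, the substitutions give zu = w + (1 − q)1n and zv = q1n − w, so the equality constraint qXTzu = (1 − q)XTzv holds exactly when XTw = 0p. The vectors in the sketch below are illustrative, not the dissertation's data.

```python
# Verify the substitution chain w = w+ - w-, w+ = q*zu, w- = (1-q)*zv,
# zu + zv = 1, on a dual vector satisfying 1'w = 0 for an intercept column.
q = 0.2
w = [-0.8, 0.2, 0.2, 0.2, 0.2]          # entries at the bounds q-1 and q
zu = [wj + (1.0 - q) for wj in w]       # zu = w + (1-q)1
zv = [q - wj for wj in w]               # zv = q1 - w

pairs_sum_to_one = all(abs(a + b - 1.0) < 1e-12 for a, b in zip(zu, zv))
recovers_w = all(abs(q * a - (1.0 - q) * b - wj) < 1e-12
                 for a, b, wj in zip(zu, zv, w))
# q*1'zu equals (1-q)*1'zv precisely because 1'w = 0.
constraint_holds = abs(q * sum(zu) - (1.0 - q) * sum(zv)) < 1e-12
```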
It can be shown that redefining the component vectors zv and zu, rewriting the
boundary condition on (2.6), and applying the cardinality range property produces
a suboptimal formulation of the QRMEP; specifically, a variant of the generalized
assignment problem.
3.4.1 The Bounded Interval Generalized Assignment Problem. Cattrysse
and Van Wassenhove [15] described the generalized assignment problem (GAP) as a
cost minimizing assignment of n jobs to m workers. It can be equivalently described
as a value maximizing assignment of n jobs to m workers, as in [21]. Each job
must be assigned to exactly one worker, so let zij be a binary variable where zij = 1
indicates the jth job being assigned to the ith worker and zij = 0 otherwise. Let the
value of having the ith worker do the jth job be denoted by cij. Each worker has
capacity restrictions such that a single worker can only take on a limited number of
jobs, so let di denote the work capacity for the ith worker. The amount of resources
consumed when the jth job is performed by the ith worker is identified by aij. The
GAP [21] therefore assumes the form
max_{zij ∈ {0,1}}  Σ_{i=1}^{m} Σ_{j=1}^{n} cij zij    (3.32)

s.t.  Σ_{j=1}^{n} aij zij ≤ di, for 1 ≤ i ≤ m,

      Σ_{i=1}^{m} zij = 1, for 1 ≤ j ≤ n.
The dual LP (1.8) can be used to approximate (3.32). Just as the optimal basis
is a unique p-subset, so are the optimal sets of positive and negative residuals. That
is, for any optimal solution, the resulting combination of nonbasic dual variables
constitutes a unique assignment. Consider each observation to be a job, and the
location of each observation relative to the regression hyperplane (above/below) to
be a worker. As with jobs in (3.32), each nonbasic observation in the QRMEP is
assigned to exactly one of two locations: above or below the regression hyperplane.
The inequalities (2.3) and (2.4) are equivalent to the work capacity restrictions in
(3.32). Let the response vector y be the vector of value coefficients. The residual
locations are asymmetrically weighted, so the weights q and (q − 1) are also applied
to the objective function. The amount of resource consumed by assigning the jth
observation to the ith location is unity, or aij = 1, for all i and j. Redefine zv
from (3.31) to be a binary n-vector, where z_v^(j) = 1 when the jth observation
is assigned below the hyperplane, and zero otherwise. Redefine zu to be a binary
n-vector, where z_u^(j) = 1 when the jth observation is assigned above the
hyperplane, and zero otherwise. A simple GAP formulation of the QRMEP therefore
takes the form
max_{zv, zu ∈ {0,1}^n}  (q − 1)yTzv + qyTzu    (3.33)

subject to

zv + zu = 1n
qn − p < zTv 1n < qn
(1 − q)n − p < zTu 1n < (1 − q)n.
Notice that each location also possesses a lower bound, which is not necessarily
zero, on the number of nonbasic observations assigned to it. These result from the
assumption that the quantile regression model contains an intercept [32]. Thus,
(3.33) is in the form of a bounded interval generalized assignment problem (BIGAP)
[58].
There are at least two issues with (3.33): the absence of the design matrix X in
the constraints and the requirement that all observations be assigned (zv + zu = 1n).
Without accounting for the independent variables, the solution to (3.33) is simply the
unconditional quantile (sample quantile) of the response vector y. The structure
of (1.8) must be examined such that additional constraints containing the design
matrix X may be added to (3.33). The basic observations constitute a p-subset
which defines the regression hyperplane, so z_v^(j) = z_u^(j) = 0 must hold for
any basic
observation. Therefore, zv+zu = 1n must be removed from (3.33) and replaced with
a constraint which guarantees that all nonbasic observations are assigned. Under
the assumption that a nondegenerate solution to the QRMEP exists, exactly (n− p)
nonbasic observations must be assigned, so (3.33) becomes
max_{zv, zu ∈ {0,1}^n}  (q − 1)yTzv + qyTzu    (3.34)

subject to

qn − p < zTv 1n < qn
(1 − q)n − p < zTu 1n < (1 − q)n
zTv 1n + zTu 1n = n − p.
The optimal assignment from (3.34) does not correspond to the optimal solu-
tion to (1.8) because the optimal assignment assumes all basic variables are zero,
which is why the BIGAP is called a suboptimal formulation of the QRMEP. The so-
lution to the BIGAP can be improved by rewriting the boundary condition on (2.6).
Consider the QRMEP optimality condition (q − 1)1p < wb < q1p and express wb in
terms of the nonbasic observations. Multiplying through by the (p× p) basis matrix
B leads to

(q − 1)BT1p < (1 − q)NTv 1v − qNTu 1u < qBT1p.    (3.35)
Using the assignment vectors zv, zu and the design matrix, (3.35) can be
represented equivalently by two inequalities:

(q − 1)XT(1n − zv − zu) < (1 − q)XTzv − qXTzu,
i.e., XTzu < (1 − q)XT1n,    (3.36)

and

(1 − q)XTzv − qXTzu < qXT(1n − zv − zu),
i.e., XTzv < qXT1n.    (3.37)
Adding (3.36) and (3.37) to the constraint set in (3.34) further reduces the number
of feasible bases, and the BIGAP form of the QRMEP is

max_{zv, zu ∈ {0,1}^n}  (q − 1)yTzv + qyTzu    (3.38)

subject to

qn − p < zTv 1n < qn
(1 − q)n − p < zTu 1n < (1 − q)n
XTzv < qXT1n
XTzu < (1 − q)XT1n
zTv 1n + zTu 1n = n − p.
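For very small samples, (3.38) can be solved by enumeration, which makes its structure easy to inspect. The sketch below labels each observation basic, below, or above, and keeps the feasible assignment maximizing (3.38)'s objective. It is exponential in n and purely illustrative; the names and interfaces are assumptions, not the dissertation's code.

```python
from itertools import product

def bigap_bruteforce(y, X, q, p):
    """Enumerate feasible assignments for (3.38) on a tiny sample.

    X is a list of n rows of length p. Labels: 0 = basic, 1 = below
    (z_v), 2 = above (z_u). Returns (z_v, z_u, objective) for the best
    feasible assignment, or None if none exists.
    """
    n, tot = len(y), [sum(row[k] for row in X) for k in range(p)]
    best = None
    for lbl in product((0, 1, 2), repeat=n):
        zv = [int(c == 1) for c in lbl]
        zu = [int(c == 2) for c in lbl]
        if sum(zv) + sum(zu) != n - p:
            continue                    # exactly n - p nonbasic observations
        if not (q * n - p < sum(zv) < q * n):
            continue                    # cardinality interval for z_v
        if not ((1 - q) * n - p < sum(zu) < (1 - q) * n):
            continue                    # cardinality interval for z_u
        if any(sum(X[j][k] * zv[j] for j in range(n)) >= q * tot[k]
               or sum(X[j][k] * zu[j] for j in range(n)) >= (1 - q) * tot[k]
               for k in range(p)):
            continue                    # design-matrix constraints (3.36)-(3.37)
        obj = sum(((q - 1) * zv[j] + q * zu[j]) * y[j] for j in range(n))
        if best is None or obj > best[2]:
            best = (zv, zu, obj)
    return best
```

On the eight-observation Ford sample with p = 2 and q = 1/5, the cardinality intervals force either zero or one observation below the hyperplane and exactly six nonbasic observations in total.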
3.4.2 The Bounded Interval Knapsack Problem. If (3.38) is relaxed by
removing the requirement that the sum of assignments equal (n− p), then the result
is the bounded interval knapsack problem (BIKP),

max_{zv, zu ∈ {0,1}^n}  (q − 1)yTzv + qyTzu    (3.39)

subject to

qn − p < zTv 1n < qn
(1 − q)n − p < zTu 1n < (1 − q)n
XTzv < qXT1n
XTzu < (1 − q)XT1n.
For the same reason as (3.38), (3.39) is also considered a suboptimal formulation of
the QRMEP. The BIGAP and BIKP solutions may, however, be useful for providing
starting solutions to other exact algorithms, such as GILP or the LSDS method.
Each of the extensions presented in Sections 3.2 and 3.3 uses the vector w = q1n
as its initial solution, which is relatively close to optimality for small q and n. As
the problem size and/or the target quantile increases, reduced run times can be
achieved with pivoting algorithms through starting solutions which are much closer
to optimality. Since an initial basis can be derived from the solution to the BIKP,
the optimal assignment from (3.39) can be applied in conjunction with the exact-
fit property to generate an initial solution that is much closer to optimality than
w = q1n. The optimal assignment from (3.38), on the other hand, produces an
even closer solution because of the added requirement that exactly (n− p) nonbasic
observations be assigned.
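One hypothetical way to map such an assignment to a starting dual vector is to place observations assigned below the hyperplane at the lower bound q − 1, observations assigned above at the upper bound q, and unassigned (candidate basic) observations at zero. The helper below is a heuristic sketch of that idea, not a procedure taken from the dissertation or from [39].

```python
def warm_start(zv, zu, q):
    """Build an initial dual vector from a BIGAP/BIKP assignment.

    zv marks observations below the hyperplane (start at q - 1), zu
    marks observations above (start at q); unassigned observations are
    treated as candidate basic variables and start at zero.
    """
    return [q - 1.0 if v else (float(q) if u else 0.0)
            for v, u in zip(zv, zu)]

# For the assignment matching Example 6 (observation 2 below, 3-7 above),
# the warm start already agrees with w(7) in six of eight coordinates.
w0 = warm_start([0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 1, 1, 1, 1, 1, 0], 0.2)
```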
3.5 Summary
This chapter presented detailed extensions of two methods, GILP and LSDS, to
the class of QRMEPs. An application of the bounded simplex algorithm to the
QRMEP was also attempted, along with a demonstration of its failure to solve (1.8).
Finally, by reconceptualizing the QRMEP as an integer program, the suboptimal
formulations BIGAP and BIKP were developed. The next chapter presents practical
implementations of GILP and the LSDS method, followed by comparative analyses
of their computational performance against two baseline algorithms, an interior-
point method and the dual simplex method, both of which are implemented in a
commercially available programming environment.
IV. Implementation, Testing, and Numerical Results
This chapter compares the practical performances of GILP and the LSDS algorithm
against an interior-point method and the dual simplex method. An interior-point
method and the dual simplex algorithm were chosen as baselines against which the
extensions of GILP and the LSDS method were measured. Interior-point methods
represent the computational state-of-the-art for solving LPs, particularly for large-
scale problems, while simplex algorithms have been shown to perform best for small
to moderately sized problems [55]. Neither GILP nor the LSDS method, however,
has been evaluated against other algorithms, simplex or interior-point. Therefore,
in this research, it was deemed necessary to include both a simplex and an interior-
point method as part of the evaluation of the extensions of GILP and the LSDS
method. While typically not converging as quickly as interior-point methods, dual
simplex is also a preferred method for solving bounded LPs [39], and its most valuable
feature is that it, like all simplex methods, guarantees exact solutions [18] when the
nondegeneracy assumption holds. The LSDS method, being a long-step variant of
the dual simplex method, also possesses this characteristic [39]. GILP is a non-
simplex pivoting algorithm, yet it is shown in [57] to yield exact solutions as well.
4.1 Implementation
All experimentation with these four methods was conducted in the MATLAB
environment. MATLAB was chosen mainly for its programming simplicity, since any
user-generated MATLAB code implementing an LP algorithm often looks quite similar
to the theoretical linear algebra from which it was derived. Consequently, the time
required to develop such code is significantly shorter than that of other programming
environments [63]. MATLAB has a built-in LP solver, called linprog. The default
algorithm that linprog employs to solve an LP is an interior-point method,
specifically the predictor-corrector variant of the primal-dual path-following
method, a variant
developed by Mehrotra [50] and later extended to the QRMEP class of problems by
Portnoy and Koenker [55]. The dual simplex algorithm is also available as an option
within the linprog function, thus allowing for experimentation with the two methods
best suited for solving large-scale linear programs [39].
4.2 Testing
All test data was derived from the Cars93 data set referenced in [19]. This
data set was compiled by Lock [41] and contains information on 93 vehicles for sale
in the US for the 1993 model year. A subset of the 26 variables in the data set
was retained for experimentation. This subset consisted of 16 continuous variables
in Cars93, such as pricing, fuel efficiency, horsepower ratings, and engine size.
The remainder were discarded because they contained either discrete or categorical
data, or because they were correlated with one or more variables included in the
subset. The mean sale price was selected as the response variable for all
experiments. Chen's experiments in [18] provided guidance for the testing in this
research. The algorithms were tested across a broad range of quantiles,
q ∈ {0.05, 0.25, 0.5, 0.75, 0.95}, and for three different model sizes,
p ∈ {3, 8, 15}. To obtain
the various sample sizes for each test, uniform random samples were extracted from
the data set. Sample sizes ranged from n = 50 to n = 850, in increments of 10
observations. For each triplet (n, p, q), the experiment was replicated 100 times.
The mean run time was computed for each triplet to assess the efficiency of each
algorithm. The average number of iterations required was computed across all n for
each pair (p, q).
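This replication protocol can be sketched as a simple timing harness; the solver below is a stand-in, and the names and sizes are illustrative assumptions rather than the actual test code.

```python
# Hedged sketch of the protocol: for each triplet (n, p, q), draw a
# uniform random subsample, time the solver, and average the run times
# over the replications.  `solve_qrmep` stands in for GILP, LSDS, or
# either linprog method.
import random
import statistics
import time

def solve_qrmep(sample, p, q):
    # placeholder: a real run would estimate the quantile regression model
    return None

def mean_run_time(data, n, p, q, reps=100):
    times = []
    for _ in range(reps):
        sample = random.sample(data, n)      # uniform random subsample
        t0 = time.perf_counter()
        solve_qrmep(sample, p, q)
        times.append(time.perf_counter() - t0)
    return statistics.mean(times)

data = list(range(850))                      # stand-in for the Cars93 rows
mrt = mean_run_time(data, n=50, p=3, q=0.5, reps=5)
```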
4.3 Numerical Results
The mean run times for each algorithm were plotted for each pair (p, q), re-
sulting in 15 total graphs, where each graph contains a performance curve for each
of three methods: the LSDS extension, the dual simplex algorithm, and the interior-
Figure 4.1 (p, q) = (3, 0.05). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
point method. GILP is evaluated separately because, once n > 50, it exhibited much
longer run times than the other three algorithms. Instead, the independent perfor-
mance of GILP was measured across the quantiles by plotting its mean run times
for each value of p. Therefore, each GILP graph contains one performance curve for
each quantile, resulting in three total plots. The average number of iterations for
each pair (p, q) were tabulated for comparison.
Figures 4.1-4.5 show the graphical results for the small model (p = 3).
Clearly, the location of the crossover point is dependent on the value of q;
that is, as the target quantile increases, the problem size n for which a baseline
method gains computational dominance decreases. Details on the crossover points
corresponding to Figures 4.1 - 4.5 are given in Table 4.1. For this research, a
crossover point (CP) was identified as the smallest problem size n at which the
mean run time (MRT) for a baseline method (dual simplex or interior-point) is less
than or equal to the MRT for the LSDS method. The run time range is given in
the form of a closed interval, where the lower bound is the minimum run time of 100
replications and the upper bound is the maximum run time of 100 replications. The
Figure 4.2 (p, q) = (3, 0.25). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
Figure 4.3 (p, q) = (3, 0.5). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
Figure 4.4 (p, q) = (3, 0.75). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
Figure 4.5 (p, q) = (3, 0.95). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
Table 4.1 Crossover Point Data for p = 3

LSDS vs. Interior-Point
  q      CP    LSDS MRT   LSDS Range            IP MRT    IP Range
  0.05   650   0.02039    [0.01816, 0.02477]    0.01933   [0.01853, 0.02413]
  0.25   510   0.01791    [0.01525, 0.02197]    0.01703   [0.01607, 0.01929]
  0.5    470   0.01506    [0.01279, 0.02100]    0.01483   [0.01432, 0.01432]
  0.75   440   0.01595    [0.01379, 0.01937]    0.01496   [0.01463, 0.01785]
  0.95   560   0.02295    [0.02007, 0.03188]    0.02293   [0.02169, 0.02880]

LSDS vs. Dual Simplex
  q      CP    LSDS MRT   LSDS Range            DS MRT    DS Range
  0.05   780   0.02715    [0.02516, 0.03410]    0.02666   [0.02590, 0.03771]
  0.25   650   0.02619    [0.02228, 0.03214]    0.02617   [0.02539, 0.02888]
  0.5    590   0.02814    [0.02499, 0.03567]    0.02594   [0.02518, 0.02927]
  0.75   570   0.02637    [0.02280, 0.03182]    0.02548   [0.02488, 0.02934]
  0.95   600   0.02782    [0.02298, 0.04019]    0.02699   [0.02553, 0.03801]
data in Table 4.1 also shows that the crossover point for the dual simplex method
was consistently higher than for the interior-point method, which was expected. The
crossover points are highest when q is lowest, and this is a consistent trend for all
model sizes. When the model size is increased to p = 8, a commensurate decrease
in the crossover point is observed in Figures 4.6 - 4.10. This trend continued for
p = 15, as evidenced by Figures 4.11 - 4.15. The crossover point data for p = 8
and p = 15 are summarized in Tables 4.2 and 4.3, respectively. The variances for
each set of 100 replications were also computed. Variances were very small for all
algorithms tested, with GILP exhibiting the largest variances (≤ 0.00281). Among
the other three methods, however, the variances were found to be less than 10⁻⁵,
confirming that the run times were consistent for each algorithm.
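The crossover-point rule defined above can be sketched as a simple scan over the tested sample sizes; the run-time values below are illustrative, not taken from the tables.

```python
# Hedged sketch: the crossover point (CP) is the smallest problem size n
# at which a baseline method's mean run time (MRT) is less than or equal
# to the LSDS MRT.
def crossover_point(sizes, mrt_lsds, mrt_baseline):
    for n, t_lsds, t_base in zip(sizes, mrt_lsds, mrt_baseline):
        if t_base <= t_lsds:
            return n
    return None  # baseline never catches up over the tested range

sizes    = [50, 60, 70, 80]            # illustrative values only
mrt_lsds = [0.010, 0.012, 0.015, 0.019]
mrt_ipm  = [0.014, 0.014, 0.014, 0.015]
cp = crossover_point(sizes, mrt_lsds, mrt_ipm)
```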
Figures 4.16 - 4.18 show the performance of GILP for each value of p. In each
of these figures, the run times for q = 0.05 are noticeably lower than those for the
other quantiles tested. This was expected as a result of the all positive slack initial
solution (i.e., w(0) = q1n) used in both GILP and the LSDS algorithm. Depending
on the size of the problem, w(0) = q1n is close to optimality for very low quantiles,
and the cardinality range property can be used to compute an upper bound on the
Figure 4.6 (p, q) = (8, 0.05). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
Figure 4.7 (p, q) = (8, 0.25). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
Figure 4.8 (p, q) = (8, 0.5). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
Figure 4.9 (p, q) = (8, 0.75). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
Figure 4.10 (p, q) = (8, 0.95). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
Figure 4.11 (p, q) = (15, 0.05). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
Figure 4.12 (p, q) = (15, 0.25). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
Figure 4.13 (p, q) = (15, 0.5). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
Figure 4.14 (p, q) = (15, 0.75). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
Figure 4.15 (p, q) = (15, 0.95). LSDS (solid), Dual Simplex (dashed), Interior-Point (dotted).
Table 4.2 Crossover Point Data for p = 8

LSDS vs. Interior-Point
  q      CP    LSDS MRT   LSDS Range            IP MRT    IP Range
  0.05   400   0.01860    [0.00523, 0.00754]    0.01791   [0.01413, 0.01858]
  0.25   260   0.01665    [0.01404, 0.02085]    0.01522   [0.01490, 0.01797]
  0.5    190   0.01344    [0.01064, 0.01733]    0.01307   [0.01241, 0.01575]
  0.75   170   0.01458    [0.01155, 0.01818]    0.01335   [0.01251, 0.01514]
  0.95   210   0.01413    [0.01273, 0.01956]    0.01412   [0.01390, 0.01665]

LSDS vs. Dual Simplex
  q      CP    LSDS MRT   LSDS Range            DS MRT    DS Range
  0.05   500   0.03322    [0.02963, 0.04012]    0.02617   [0.02528, 0.04123]
  0.25   370   0.02565    [0.02310, 0.03251]    0.02492   [0.02436, 0.02648]
  0.5    370   0.02927    [0.02594, 0.04593]    0.02573   [0.02502, 0.02896]
  0.75   330   0.02687    [0.02247, 0.03309]    0.02580   [0.02500, 0.02775]
  0.95   350   0.02900    [0.02326, 0.03442]    0.02479   [0.02423, 0.02642]
Table 4.3 Crossover Point Data for p = 15

LSDS vs. Interior-Point
  q      CP    LSDS MRT   LSDS Range            IP MRT    IP Range
  0.05   390   0.02221    [0.02018, 0.02846]    0.02002   [0.01961, 0.02415]
  0.25   170   0.01636    [0.01437, 0.02218]    0.01434   [0.01401, 0.01797]
  0.5    110   0.01352    [0.01314, 0.01387]    0.01229   [0.01207, 0.01374]
  0.75    80   0.01109    [0.01101, 0.01168]    0.01091   [0.01067, 0.01346]
  0.95   100   0.01143    [0.01022, 0.01180]    0.01127   [0.01093, 0.01449]

LSDS vs. Dual Simplex
  q      CP    LSDS MRT   LSDS Range            DS MRT    DS Range
  0.05   410   0.02912    [0.02587, 0.03622]    0.02752   [0.02572, 0.03885]
  0.25   260   0.02710    [0.02137, 0.04234]    0.02560   [0.02457, 0.02749]
  0.5    160   0.02506    [0.02131, 0.03187]    0.02434   [0.02370, 0.02601]
  0.75   180   0.02754    [0.02348, 0.03619]    0.02443   [0.02397, 0.02589]
  0.95   170   0.02525    [0.02191, 0.03272]    0.02428   [0.02378, 0.02598]
Figure 4.16 GILP for p = 3. 5th Quantile (magenta), 25th Quantile (blue), 50th Quantile (black), 75th Quantile (green), 95th Quantile (red).
Figure 4.17 GILP for p = 8. 5th Quantile (magenta), 25th Quantile (blue), 50th Quantile (black), 75th Quantile (green), 95th Quantile (red).
Figure 4.18 GILP for p = 15. 5th Quantile (magenta), 25th Quantile (blue), 50th Quantile (black), 75th Quantile (green), 95th Quantile (red).
quantiles that can be considered “very low” (q < 1/(2n)). It follows, when q = 0.05,
that GILP converges in less time than at the higher quantiles because fewer pivots
are required to reach the optimal dual basis.
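A minimal illustration of the asymmetry behind this behavior, using the check loss ρ_q(r) = r(q − I(r < 0)); the function name and sample values are assumptions for illustration.

```python
# Hedged sketch of the quantile check loss rho_q(r) = r * (q - I(r < 0)).
# At a very low quantile, positive residuals are penalized only by q,
# while negative residuals are penalized by (1 - q), which is why the
# all-positive-slack start w(0) = q*1 is already close to optimal.
def rho(r, q):
    return r * (q - (1 if r < 0 else 0))

q = 0.05
pos = rho(10.0, q)    # penalty on a positive residual:  10 * 0.05
neg = rho(-10.0, q)   # penalty on a negative residual: -10 * (0.05 - 1)
```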
Now consider the average number of iterations each method requires to solve
the QRMEP, which are given in Table 4.4. Although the interior-point method
clearly performed best in terms of converging in the fewest number of iterations, the
performance differences among the pivoting algorithms are also interesting. GILP
and the LSDS method are competitive with both the dual simplex algorithm and the
interior-point method when the model size is small. As p increases, however, the dual
simplex algorithm appears to be superior, among the pivoting algorithms, in terms of
solving the QRMEP in the fewest iterations possible. The LSDS method converges,
on average, in fewer iterations than the dual simplex algorithm when q = 0.05
because fewer long steps in primal space are necessary. These advantages disappear,
however, as the model size increases because GILP and the LSDS algorithm must
estimate the entire dual basis through pivoting operations, whereas the dual simplex
Table 4.4 Average Number of Iterations
  p    q      GILP   LSDS   Dual Simplex   Interior-Point
  3    0.05     9      8         8              11
  3    0.25    10     11        10              11
  3    0.5     10     12        11              10
  3    0.75    11     12        12              11
  3    0.95    12     12         9              15
  8    0.05    22     17        21              12
  8    0.25    25     17        23              12
  8    0.5     27     27        25              10
  8    0.75    25     30        22              11
  8    0.95    31     31        27              11
  15   0.05    42     41        31              11
  15   0.25    46     55        37              11
  15   0.5     49     59        41              11
  15   0.75    49     61        39              11
  15   0.95    51     61        42              11
algorithm employs preprocessing to obtain a better starting solution before pivoting
operations begin.
Each decomposed problem in GILP generates an (n + 1)-dimensional polytope,
which is of lower dimension than the polytope in (1.8). It follows that the
total number of vertices for each decomposed problem in GILP is also less than that
of (1.8), implying that GILP should converge to optimality in fewer iterations than
dual simplex. As with the Barrodale-Roberts algorithm, the LSDS method does
not pivot among adjacent vertices as classic simplex methods do. The implication is
that because LSDS can “skip” adjacent vertices, it should also converge to optimality
in fewer iterations than dual simplex.
The differences between the theoretical implications of the pivoting methods
and their respective practical performances can be traced to implementation. Ad-
ditional procedures, such as preprocessing and steps for overcoming degeneracy, are
programmed into the interior-point method and dual simplex methods implemented
in MATLAB to reduce run times further. The GILP and LSDS implementations in
this research, on the other hand, are simple in the sense that they are coded exactly
according to their theoretical descriptions in Chapter 3. That is, nondegeneracy is
assumed and no preprocessing is conducted to reduce the size of the polytope and
gain improvements in run times.
4.4 Preprocessing
The goal of preprocessing in LP is to reduce the dimensions of the problem, thus
allowing the chosen algorithm to converge faster. A standard preprocessing strategy
is presented in [9], but Portnoy and Koenker [55] insist that the special structure
of the QRMEP is not conducive to such a standard approach. In response, they
propose an alternative strategy designed exclusively for the QRMEP and specifically
in conjunction with the predictor-corrector variant of the primal-dual path following
algorithm. The following is a brief description of this alternative strategy. Com-
prehensive discussions on this method can be found in [55] and [38].
Consider the optimality condition from Theorem 1, which may be expressed as
the directional derivative of the objective function [38]. Letting R(b) = Σ_{j=1}^{n} ρ_q(y_j − x_j b),
the derivative in direction d_k is computed by

    ∇R(b, d_k) = −Σ_{j=1}^{n} ψ_q(y_j − x_j b, −x_j d_k) x_j d_k,

where d_k is the kth column of B⁻¹, 1 ≤ k ≤ p, and

    ψ_q(y_j − x_j b, −x_j d_k) = q − I(r_j < 0),        if r_j ≠ 0,
                                 q − I(−x_j d_k < 0),   if r_j = 0.
Since x_j d_k = 0 for j ≠ k and x_j d_k = 1 for j = k, ∇R(b, d_k) can be written in
dual space as

    ∇R(b, d_k) = w_j + 1 − q,   if w_j ∈ [q − 1, 0),
                 q − w_j,       if w_j ∈ (0, q].

Therefore, a solution is optimal if ∇R(b, d_k) ≥ 0 for all directions k = 1, . . . , p.
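This piecewise test transcribes directly into code; the function names and the treatment of the boundary case w_j = 0 below are assumptions for illustration, not the author's implementation.

```python
# Hedged, literal transcription of the dual-space directional derivative.
# Each dual variable w_j is assumed feasible: q - 1 <= w_j <= q.
def directional_derivative(w_j, q):
    if w_j < 0:
        return w_j + 1 - q      # case w_j in [q - 1, 0)
    if w_j > 0:
        return q - w_j          # case w_j in (0, q]
    return 0.0                  # boundary convention assumed here

def is_optimal(w_basic, q):
    # optimality: nonnegative derivative in every basic direction
    return all(directional_derivative(w, q) >= 0 for w in w_basic)

q = 0.5
d_lo = directional_derivative(-0.2, q)   # -0.2 + 1 - 0.5
d_hi = directional_derivative(0.2, q)    #  0.5 - 0.2
```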
This alternative statement of optimality provides a means by which the number
of constraints in (1.8) may be reduced. Preprocessing seeks to identify a subset
of observations which are guaranteed to lie above/below the optimal regression hy-
perplane. Once identified, these observations are globbed [55], thus reducing the
dimensionality of (1.8). Let JL, JH be indexing subsets of observations whose dual
variables are anticipated to be nonbasic at (q − 1) and q, respectively. The objective
function can therefore be rewritten such that
    min_{b ∈ R^p}  Σ_{j ∈ S\(J_L ∪ J_H)} ρ_q(y_j − x_j b) + (1 − q)(y_L − x_L b) + q(y_H − x_H b),

where S is the indexing set of all observations, x_L = Σ_{j ∈ J_L} x_j, and x_H = Σ_{j ∈ J_H} x_j.
Note that yL must be made small enough and yH made large enough to guaran-
tee that the residuals in these globs are negative and positive, respectively. The
challenge is determining which observations are included in JL and JH . Let M
be a subsample of m observations. Obtain an initial estimate of b by solving the
QRMEP for the subsample M only, and compute a confidence interval around the
solution. Compute a confidence band of the form [XbL,XbU ], where bL is the lower
confidence estimate of b and bU is the upper confidence estimate of b. Provided the
value of m is appropriately chosen, the set M contains the indices of the observations
falling inside the confidence band. Therefore, JL and JH should contain the indices
of the observations falling outside the confidence band, and two globbed observa-
tions, (yL, xL) and (yH , xH), are constructed. A new estimate of b is then obtained
by solving the globbed LP, which now consists of (m+ 2) observations. If the signs
of the residuals of the observations in the globs match their assignment to JL or JH ,
then the procedure terminates and returns the optimal solution. Otherwise, the
procedure is repeated after adjusting the composition of the globs and updating M .
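The partitioning and globbing step of this strategy can be sketched as follows; the band values, glob construction, and names are illustrative assumptions rather than Portnoy and Koenker's code.

```python
# Hedged sketch of the globbing step: observations predicted to fall
# below (above) the confidence band are collapsed into a single low
# (high) pseudo-observation, shrinking the LP to (m + 2) observations.
def glob(X, y, lo, hi):
    """Partition by the band [lo_j, hi_j] and build the two globs."""
    n = len(y)
    JL = [j for j in range(n) if y[j] < lo[j]]   # below the band
    JH = [j for j in range(n) if y[j] > hi[j]]   # above the band
    M  = [j for j in range(n) if j not in JL and j not in JH]
    p = len(X[0])
    xL = [sum(X[j][k] for j in JL) for k in range(p)]
    xH = [sum(X[j][k] for j in JH) for k in range(p)]
    # yL / yH need only be extreme enough to fix the globbed residual signs
    yL = min(y) - 1.0
    yH = max(y) + 1.0
    return M, (yL, xL), (yH, xH)

X  = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y  = [0.0, 5.0, 2.0, -4.0]
lo = [-1.0, -1.0, -1.0, -1.0]                    # illustrative band
hi = [ 3.0,  3.0,  3.0,  3.0]
M, glob_lo, glob_hi = glob(X, y, lo, hi)
```

A real implementation would then re-solve the globbed LP and check the residual signs of the globbed observations, repeating the partition if any sign disagrees with its assignment.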
Portnoy and Koenker [55] show significant run time improvements over the
Barrodale-Roberts algorithm, at least 10 times better, when preprocessing is applied
to interior-point methods. The improvement is even greater for large problems, since
the run time curves for the BR algorithm in [55] increase quadratically, much like
the run time curves for GILP and LSDS in this research. Chen’s results in [18] also
demonstrated marked run time improvement with preprocessing. Time limitations
unfortunately prevented the application of preprocessing to either GILP or LSDS in
this research.
4.5 Summary
The experimental results in this chapter demonstrate the computational ad-
vantage that the LSDS method has over interior-point methods and the dual simplex
method, for models and problems up to a certain size. The location of the crossover
point, the point at which either an interior-point algorithm or the dual simplex
method gains computational dominance, was also shown to be dependent on the
target quantile. GILP, on the other hand, only exhibited faster average run times
than the baseline algorithms for very small problems. However, GILP was shown
to converge in fewer iterations than the LSDS algorithm in most cases, particularly
as the size of the model increased. With a view to increase the crossover point
locations by improving the respective computational performances of GILP and the
LSDS method, the details of a preprocessing strategy designed specifically for the
QRMEP [55] were also presented. The next chapter summarizes the efforts in this
research, states how this research contributes to the field of Operations Research,
and suggests topics for future research into the QRMEP.
V. Conclusion
Quantile regression is becoming increasingly popular as an alternative to least squares
for describing the conditional distribution of a response variable. Rather than using
a single conditional mean model to make inferences about a distribution, researchers
can use one or more conditional quantile models and provide a more complete picture
of the same distribution. The first two hundred years of research into LAD models
saw limited progress [11], but the advent of LP in the late 1940s paved the way
for Koenker and Bassett [32] to define the regression quantile. Since 1978, quantile
regression has been employed extensively in econometrics, and it is quickly becoming
a popular application in the finance, medical, and environmental industries [17].
Simplex algorithms and interior-point methods are supported by advanced
computing power and have become the standard methods by which the QRMEP
is solved, but the speed of these algorithms degrades when large-scale problems
are encountered, leading many researchers to abandon quantile regression in favor
of OLS. Because of the special structure of the QRMEP,
the dual simplex method is the standard simplex algorithm best suited for solving it.
More efficient pivoting algorithms which leverage the unique properties and exploit
the special structure of the QRMEP have been developed, namely the Barrodale-
Roberts and Koenker-d’Orey methods. Interior-point methods, on the other hand,
stand as the most computationally efficient QRMEP solution techniques by far,
particularly in terms of run time and iterations required.
Affine scaling, primal path-following, and primal-dual path-following algorithms
have been developed specifically for the QRMEP, but it is the predictor-corrector
variant of the primal-dual path-following algorithm that is most popular
with commercial solvers. Experimentation has shown that pivoting methods are
preferred when the problem is small to moderately sized [18], with the added bonus
that they guarantee exact solutions.
This research was interested in finding alternative means of solving the QRMEP
such that large problems can be solved with run times either comparable or supe-
rior to those of dual simplex and interior-point methods. Two alternative pivoting
algorithms were explored in detail: GILP and the LSDS method. I-LP was first
developed by Robers and Ben-Israel [56] to solve the l1-approximation (conditional
median) problem. Extending the algorithm to any q ∈ (0, 1) required few modifi-
cations to the overall method, but it exhibited the slowest run times compared to
the dual simplex, interior-point, and LSDS methods. It is, however, distinct from
other pivoting methods because it operates exclusively in dual space by pivoting
between dual infeasible solutions. The LSDS method was developed for general
bounded-variable maximization LPs with equality constraints [39], so extending it
to the quantile regression model class of problems was straightforward. Unlike
GILP, the LSDS algorithm operates in the primal space of the QRMEP and skips
over adjacent vertices by taking longer steps. Since search directions and step
lengths are computed at each iteration, LSDS can also be considered a line search
method. For small sample sizes, the LSDS algorithm was the fastest performing
method tested. On average, its run times were around 1 to 2 times faster than the
interior-point method, 2 to 4 times faster than the dual simplex algorithm, and 8
to 12 times faster than GILP. The dual simplex and interior-point methods, how-
ever, quickly outpaced the LSDS algorithm in run time as sample sizes increased.
The dual simplex and interior-point methods are two algorithms available in the
MATLAB environment, while the code for GILP and the LSDS algorithm must be
user-generated. The respective implementations of GILP and the LSDS method in
this research did not include a preprocessing strategy to reduce the dimensionality
of the problem. Both coding optimization and preprocessing are therefore necessary
to gain further run time improvements.
5.1 Contributions
This research contributes the following to the Operations Research field.
1. Extensions of Two Alternative Pivoting Algorithms to the Class of Quantile
Regression Model Estimation Problems. The primary contributions of this re-
search are the extensions of two pivoting algorithms to the class of QRMEPs.
Interior-point methods are generally more efficient computationally, but they
do not exhibit the same solution accuracy as that of simplex methods. GILP
and the LSDS algorithm possess two of the essential features identified in
Section 1.3: each produces exact solutions and solves the QRMEP for any
q ∈ (0, 1).
2. Development of Suboptimal Integer Programming Formulations of the Quantile
Regression Model Estimation Problem. Another theoretical contribution of
this research is expressing the QRMEP as an integer program. The QRMEP
is reconceptualized as a generalized assignment problem, and nonnegativity
and affine transformations are applied to (1.8). Applying the cardinality range
property to the affine form (3.31), followed by rewriting the boundary condition
(3.35), produces the BIGAP. The BIKP is easily formed by eliminating the
requirement from the BIGAP that exactly (n− p) nonbasic observations must
be assigned. Because neither formulation accounts for the values of the basic
variables, the BIGAP and BIKP were shown to be suboptimal for solving the
QRMEP. However, they may be employed to obtain starting solutions for
other algorithms in order to decrease processing times.
3. Algorithmic Implementation and Testing. The secondary contributions of this
research are the implementations of the GILP and LSDS algorithms, which
were tested on the Cars93 data set obtained from the literature, and the sub-
sequent experimental results. Uniform random samples were extracted from
Cars93 to produce QRMEPs with sample sizes ranging from n = 50 to n = 850
and model sizes from p = 3 to p = 15. The LSDS algorithm was shown to
converge to optimality faster, in terms of run time, than other simplex meth-
ods for problems and/or quantile regression models up to a certain size. The
LSDS method even performed well, for small problems and models, against
interior-point methods. However, the locations of the crossover points for the
LSDS method were lower than expected. GILP exhibited comparatively slow
run times, but it was shown in most cases to converge in fewer iterations than
the LSDS algorithm. Optimized coding coupled with preprocessing can reduce
the run times for GILP and the LSDS method, increase the crossover points,
and possibly produce run times to rival those of interior-point methods.
5.2 Future Research
The following sections suggest topics for future research into the QRMEP.
5.2.1 Preprocessing. The results from this research indicate the need for
preprocessing when implementing either GILP or LSDS. The procedure in [55] was
developed explicitly for the QRMEP, and it exploits the primal space properties of
(1.4). Since GILP operates exclusively in dual space, a preprocessing strategy that
takes advantage of the features in (1.8) may achieve reductions in the run times
of the algorithm such that it becomes competitive with simplex methods. The
LSDS algorithm, on the other hand, proceeds from (1.8) but conducts line searches
in primal space, so the preprocessing strategy in [55] follows naturally. However,
determining the effects of preprocessing on the computational efficiency of the LSDS
method is also recommended.
5.2.2 Integer Programming Alternatives. This research has shown that the
QRMEP can be reconceptualized in the form of a generalized assignment problem.
Future research on the BIGAP formulation should focus primarily on resolving the
issues identified with (3.38). If the QRMEP is to be expressed as an integer pro-
gram successfully, whether as a BIGAP or a BIKP, then a means of accounting for
the values of the basic variables at optimality must be developed. On the other
hand, because the BIGAP and BIKP are suboptimal formulations of the QRMEP,
future research may also involve studying the effects of using either (3.38) or (3.39)
to generate starting solutions for GILP or the LSDS algorithm. It is recommended
that (3.38) or (3.39), together with preprocessing, be applied to the GILP and LSDS
extensions in order to determine if further improvements on run times can be ob-
tained. Developing a variant of the out-of-kilter method exclusively for the QRMEP
is also suggested.
Bibliography
1. I. Barrodale and F. Roberts, “An Improved Algorithm for Discrete l1 Linear Approximation”, SIAM Journal on Numerical Analysis, Vol. 10, No. 5, pp. 839-848, 1973.

2. G. Bassett, “A p-subset Property of L1 and Regression Quantile Estimates”, Computational Statistics & Data Analysis, Vol. 6, No. 3, pp. 297-304, 1988.

3. D. Batur and F. Choobineh, “A Quantile-Based Approach to System Selection”, European Journal of Operational Research, Vol. 202, No. 3, pp. 764-772, 2009.

4. M. Bazaraa, J. Jarvis and H. Sherali, Linear Programming and Network Flows, John Wiley & Sons, Inc., Hoboken, NJ, 2005.

5. M. Bazaraa, H. Sherali and C. Shetty, Nonlinear Programming: Theory and Algorithms, John Wiley & Sons, Inc., Hoboken, NJ, 2006.

6. A. Ben-Israel and P. Robers, “A Decomposition Method for Interval Linear Programming”, Management Science, Vol. 16, No. 5, pp. 374-387, 1970.

7. D. Bertsekas, Nonlinear Programming, Athena Scientific, 1999.

8. D. Bertsimas and J. Tsitsiklis, Introduction to Linear Optimization, Athena Scientific, Belmont, MA, 1997.

9. R. Bixby, J. Gregory, I. Lustig, R. Marsten and D. Shanno, “Very Large-Scale Linear Programming: A Case Study in Combining Interior Point and Simplex Methods”, Operations Research, Vol. 40, No. 5, pp. 885-897, 1992.

10. M. Buchinsky, “The Dynamics of Changes in the Female Wage Distribution in the USA: A Quantile Regression Approach”, Journal of Applied Econometrics, Vol. 13, No. 1, pp. 1-30, 1998.

11. M. Buchinsky, “Recent Advances in Quantile Regression Models: A Practical Guideline for Empirical Research”, Journal of Human Resources, pp. 88-126, 1998.

12. B. Cade, J. Terrell and R. Schroeder, “Estimating Effects of Limiting Factors with Regression Quantiles”, Ecology, Vol. 80, No. 1, pp. 311-323, 1999.

13. B. Cade and B. Noon, “A Gentle Introduction to Quantile Regression for Ecologists”, Frontiers in Ecology and the Environment, Vol. 1, No. 8, pp. 412-420, 2003.

14. B. Cade, B. Noon and C. Flather, “Quantile Regression Reveals Hidden Bias and Uncertainty in Habitat Models”, Ecology, Vol. 86, No. 3, pp. 786-800, 2005.
15. D. Cattrysse and L. Van Wassenhove, “A Survey of Algorithms for the Generalized Assignment Problem”, European Journal of Operational Research, Vol. 60, No. 3, pp. 260-272, 1992.
16. K-H. Chang, “A Direct Search Method for Unconstrained Quantile-based Simulation Optimization”, European Journal of Operational Research, Vol. 246, No. 2, pp. 487-495, 2015.
17. C. Chen and Y. Wei, “Computational Issues for Quantile Regression”, The Indian Journal of Statistics, Vol. 67, No. 2, pp. 399-417, 2005.
18. C. Chen, “A Finite Smoothing Algorithm for Quantile Regression”, Journal of Computational and Graphical Statistics, Vol. 16, No. 1, pp. 136-164, 2007.
19. C. Davino, M. Furno and D. Vistocco, Quantile Regression: Theory and Applications, John Wiley & Sons, Inc., New York, NY, 2014.
20. E. Eide and M. Showalter, “The Effect of School Quality on Student Performance: A Quantile Regression Approach”, Economics Letters, Vol. 58, No. 3, pp. 345-350, 1998.
21. M. Fisher, R. Jaikumar and L. Van Wassenhove, “A Multiplier Adjustment Method for the Generalized Assignment Problem”, Management Science, Vol. 32, No. 9, pp. 1095-1103, 1986.
22. D. Fulkerson, “An Out-of-Kilter Method for Minimal-Cost Flow Problems”, Journal of the Society for Industrial and Applied Mathematics, Vol. 9, No. 1, pp. 18-27, 1961.
23. J. Garcia, P. Hernandez and A. Lopez-Nicolas, “How Wide is the Gap? An Investigation of Gender Wage Differences Using Quantile Regression”, Empirical Economics, Vol. 26, No. 1, pp. 149-167, 2001.
24. P. Gill, W. Murray, M. Saunders, J. Tomlin and M. Wright, “On Projected Newton Barrier Methods for Linear Programming and an Equivalence to Karmarkar’s Projective Method”, Mathematical Programming, Vol. 36, No. 2, pp. 183-209, 1986.
25. C. Gutenbrunner and J. Jureckova, “Regression Rank Scores and Regression Quantiles”, The Annals of Statistics, pp. 305-330, 1992.
26. C. Gutenbrunner, J. Jureckova, R. Koenker and S. Portnoy, “Tests of Linear Hypotheses Based on Regression Rank Scores”, Journal of Nonparametric Statistics, Vol. 2, No. 4, pp. 307-331, 1993.
27. L. Hall and R. Vanderbei, “Two-thirds is Sharp for Affine Scaling”, Operations Research Letters, Vol. 13, No. 4, pp. 197-201, 1993.
28. L. Hao and D. Naiman, Quantile Regression, No. 149, Sage Publications, Inc., Thousand Oaks, CA, 2007.
29. R. Jackson, P. Boggs, S. Nash and S. Powell, “Guidelines for Reporting Results of Computational Experiments. Report of the Ad Hoc Committee”, Mathematical Programming, Vol. 49, No. 1, pp. 413-425, 1990.
30. L. Jaeckel, “Estimating Regression Coefficients by Minimizing the Dispersion of the Residuals”, The Annals of Mathematical Statistics, Vol. 43, No. 5, pp. 1449-1458, 1972.
31. A. Koberstein, “Progress in the Dual Simplex Algorithm for Solving Large Scale LP Problems: Techniques for a Fast and Stable Implementation”, Computational Optimization and Applications, Vol. 41, No. 2, pp. 185-204, 2008.
32. R. Koenker and G. Bassett, “Regression Quantiles”, Econometrica: Journal of the Econometric Society, Vol. 46, No. 1, pp. 33-50, 1978.
33. R. Koenker and V. d’Orey, “Algorithm AS 229: Computing Regression Quantiles”, Journal of the Royal Statistical Society, Vol. 36, No. 3, pp. 383-393, 1987.
34. R. Koenker and V. d’Orey, “A Remark on Algorithm AS 229: Computing Dual Regression Quantiles and Regression Rank Scores”, Journal of the Royal Statistical Society, Vol. 43, No. 2, pp. 410-414, 1994.
35. R. Koenker and B. Park, “An Interior Point Algorithm for Nonlinear Quantile Regression”, Journal of Econometrics, Vol. 71, No. 1, pp. 265-283, 1996.
36. R. Koenker and O. Geling, “Reappraising Medfly Longevity: A Quantile Regression Survival Analysis”, Journal of the American Statistical Association, Vol. 96, No. 454, pp. 458-468, 2001.
37. R. Koenker and K. Hallock, “Quantile Regression”, Journal of Economic Perspectives, Vol. 15, No. 4, pp. 143-156, 2001.
38. R. Koenker, Quantile Regression, No. 38, Cambridge University Press, Cambridge, UK, 2005.
39. E. Kostina, “The Long Step Rule in the Bounded-Variable Dual Simplex Method: Numerical Experiments”, Mathematical Methods of Operations Research, Vol. 55, No. 3, pp. 413-429, 2002.
40. Y. Li and J. Zhu, “L1-Norm Quantile Regression”, Journal of Computational and Graphical Statistics, 2012.
41. R. Lock, “1993 New Car Data”, Journal of Statistics Education, Vol. 1, No. 1, 1993.
42. I. Lustig, R. Marsten and D. Shanno, “On Implementing Mehrotra’s Predictor-Corrector Interior-Point Method for Linear Programming”, SIAM Journal on Optimization, Vol. 2, No. 3, pp. 435-449, 1992.
43. I. Lustig, R. Marsten and D. Shanno, “Interior Point Methods for Linear Programming: Computational State of the Art”, ORSA Journal on Computing, Vol. 4, No. 1, pp. 1-14, 1994.
44. J. Machado and J. Mata, “Earning Functions in Portugal 1982–1994: Evidence From Quantile Regressions”, Empirical Economics, Vol. 26, No. 1, pp. 115-134, 2001.
45. J. Machado and J. Mata, “Counterfactual Decomposition of Changes in Wage Distributions Using Quantile Regression”, Journal of Applied Econometrics, Vol. 20, No. 4, pp. 445-465, 2005.
46. P. Martins and P. Pereira, “Does Education Reduce Wage Inequality? Quantile Regression Evidence From 16 Countries”, Labour Economics, Vol. 11, No. 3, pp. 355-371, 2004.
47. J. Mata and J. Machado, “Firm Start-Up Size: A Conditional Quantile Approach”, European Economic Review, Vol. 40, No. 6, pp. 1305-1323, 1996.
48. K. McShane, C. Monma and D. Shanno, “An Implementation of a Primal-Dual Interior Point Method for Linear Programming”, ORSA Journal on Computing, Vol. 1, No. 2, pp. 70-83, 1989.
49. S. Mehrotra, “On Finding a Vertex Solution Using Interior Point Methods”, Linear Algebra and Its Applications, Vol. 152, pp. 233-253, 1991.
50. S. Mehrotra, “On the Implementation of a Primal-Dual Interior Point Method”, SIAM Journal on Optimization, Vol. 2, No. 4, pp. 575-601, 1992.
51. L. Meligkotsidou, I. Vrontos and S. Vrontos, “Quantile Regression Analysis of Hedge Fund Strategies”, Journal of Empirical Finance, Vol. 16, No. 2, pp. 264-279, 2009.
52. B. Melly, “Public–Private Sector Wage Differentials in Germany: Evidence From Quantile Regression”, Empirical Economics, Vol. 30, No. 2, pp. 505-520, 2005.
53. R. Mueller, “Public–Private Sector Wage Differentials in Canada: Evidence From Quantile Regressions”, Economics Letters, Vol. 60, No. 2, pp. 229-235, 1998.
54. J. Nocedal and S. Wright, Numerical Optimization, Springer-Verlag New York, Inc., New York, NY, 1999.
55. S. Portnoy and R. Koenker, “The Gaussian Hare and the Laplacian Tortoise: Computability of Squared-Error Versus Absolute-Error Estimators”, Statistical Science, Vol. 12, No. 4, pp. 279-300, 1997.
56. P. Robers and A. Ben-Israel, “An Interval Programming Algorithm for Discrete Linear L1 Approximation Problems”, Journal of Approximation Theory, Vol. 2, No. 4, pp. 323-336, 1969.
57. P. Robers and A. Ben-Israel, “A Suboptimization Method for Interval Linear Programming: A New Method for Linear Programming”, Linear Algebra and Its Applications, Vol. 3, No. 3, pp. 383-405, 1970.
58. G. Ross, R. Soland and A. Zoltners, “A Note on the Bounded Interval Generalized Assignment Problem”, Research Report CCS 253, DTIC, 1976.
59. D. Stifel and S. Averett, “Childhood Overweight in the United States: A Quantile Regression Approach”, Economics & Human Biology, Vol. 7, No. 3, pp. 387-397, 2009.
60. R. Vanderbei, M. Meketon and B. Freedman, “A Modification of Karmarkar’s Linear Programming Algorithm”, Algorithmica, Vol. 1, No. 1-4, pp. 395-407, 1986.
61. R. Vanderbei, Linear Programming, Springer-Verlag New York, Inc., New York, NY, 2015.
62. S. Vaz, C. Martin, P. Eastwood, B. Ernande, A. Carpentier, G. Meaden and F. Coppin, “Modelling Species Distributions Using Regression Quantiles”, Journal of Applied Ecology, Vol. 45, No. 1, pp. 204-217, 2008.
63. Y. Zhang, “Solving Large-Scale Linear Programs by Interior-Point Methods Under the Matlab* Environment”, Optimization Methods and Software, Vol. 10, No. 1, pp. 1-31, 1998.
REPORT DOCUMENTATION PAGE (Form Approved, OMB No. 0704-0188)
1. REPORT DATE (DD-MM-YYYY): 14-09-2017
2. REPORT TYPE: Doctoral Dissertation
3. DATES COVERED (From - To): Oct 2014 - Sep 2017
4. TITLE AND SUBTITLE: Duality Behaviors of the Quantile Regression Model Estimation Problem
6. AUTHOR(S): Robinson II, Paul D., Major, USAF
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): Air Force Institute of Technology, Graduate School of Engineering and Management (AFIT/EN), 2950 Hobson Way, WPAFB, OH 45433-7765
8. PERFORMING ORGANIZATION REPORT NUMBER: AFIT-ENS-DS-17-S-043
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES): United States Army Cyber Command (ARCYBER), ATTN: Cade Saie, LTC, USA, 8825 Beulah St, Fort Belvoir, VA 22060
10. SPONSOR/MONITOR'S ACRONYM(S): ARCYBER
12. DISTRIBUTION/AVAILABILITY STATEMENT: Distribution Statement A: Approved for Public Release; Distribution Unlimited
13. SUPPLEMENTARY NOTES: This work is declared a work of the U.S. Government and is not subject to copyright protection in the United States.
15. SUBJECT TERMS: quantile regression, linear programming, optimization
16. SECURITY CLASSIFICATION OF: a. REPORT: U; b. ABSTRACT: U; c. THIS PAGE: U
17. LIMITATION OF ABSTRACT: UU
18. NUMBER OF PAGES: 127
19a. NAME OF RESPONSIBLE PERSON: Dr. James W. Chrissis, AFIT/ENS
19b. TELEPHONE NUMBER (include area code): (937) 367-6760; [email protected]

Standard Form 298 (Rev. 8-98), Prescribed by ANSI Std. Z39.18