
Submitted to Operations Research


A Fluid Model for an Overloaded Bipartite Queueing System with Heterogeneous Matching Utilities

Yichuan Ding, S. Thomas McCormick, Mahesh Nagarajan

Sauder School of Business, University of British Columbia, Vancouver, BC V6T 1Z2,

{Daniel.Ding, Tom.McCormick, Mahesh.Nagarajan}@sauder.ubc.ca

We consider a bipartite queueing system (BQS) with multiple types of servers and customers, where different

customer-server combinations may generate different utilities. Whenever a server is available, it serves the

customer with the highest index, which is the sum of a customer’s waiting index and the matching index. We

assume that the waiting index is an increasing function of a customer’s waiting time and the matching index

depends on both the customer and the server’s types. We develop a fluid model to approximate the behavior

of such a BQS system, and show that the fluid limit process can be computed over any finite horizon. We

prove that the fluid limit process converges to a unique steady state, which can be efficiently computed.

These results enable the policy designer to predict the behavior of a BQS when an indexing priority rule

takes the form as we have defined, and choose an indexing formula that optimizes a given set of performance

metrics. We illustrate the use of our machinery by analyzing the public housing assignment in the city of

Pittsburgh using a real data set.

Key words : Bipartite Queueing System, Min Cost Max Flow, Nested Cuts for Parameterized Network,

Value-based Routing, Public Housing Assignment, Scarce Resource Allocation

Author: An Overloaded Bipartite Queueing System with Matching Cost. Article submitted to Operations Research.

1. Introduction

Many service systems can be modeled as queueing systems that allocate service capacity between

customers (clients) and servers (resources). In settings such as organ donation or assignment of

public housing, the demand for services is typically higher than the capacity to service the demand.

We refer to such systems as being overloaded. As a result, some customers end up abandoning the

queue without being served.

In applications with homogeneous customers and homogeneous servers where matching customers

to servers is not a significant issue and where it is desirable to have customers be equally likely to

get served, the first-come, first-served (FCFS) rule is conventionally accepted as the gold standard

of fairness.

In other applications that have heterogeneous customers (such as different classes of patients)

and/or heterogeneous servers (such as different qualities of organs), FCFS is less appropriate.

Different server-customer pairs in these settings can generate different levels of welfare from the service

provider’s perspective. For example, for patients with end-stage renal disease who are awaiting kid-

ney transplants, certain donor-recipient pairs can lead to higher post-transplant survival rates; and

certain hospitals can provide better treatment for some types of medical conditions than others. By

implementing priority rules in such settings, the service provider can generate more advantageous

server (resource)-customer pairs and achieve a better system-level efficiency than that achieved by

applying the FCFS rule. Doing so may, however, result in longer waiting times for customers for

whom matching a server is difficult, which may in turn lead to higher abandonment rates.

When one considers settings such as the public sector example mentioned above, the allocation

of scarce resources usually faces two conflicting needs: maximizing efficiency requires an increase

in the proportion of efficient resource-customer pairs, while minimizing inequity (or unfairness)

requires a reduction in the disparity of waiting times across customers of different types. Ideally one

would like to solve a stochastic optimization problem whose objective includes some combination of

the above two factors. Unfortunately, it appears likely for the class of problems that interest us that

this would lead to optimal policies that are too difficult to understand, and so difficult to implement

in practice. Instead we consider a class of sophisticated priority rules that takes into account the

matching outcome as well as the customers’ waiting time in order to heuristically make a tradeoff

between these two conflicting objectives. Understanding the dynamics of the system under such

priority rules can be challenging. The goal of this paper is to develop tools which approximate

the dynamics of the queueing system under a broad but specific class of priority rules, so that the

policy designer can use the understanding of the dynamics to study the impact of using specific

priority rules. This is crucial in helping decide which priority rule one may want to use to achieve

a particular overall objective.


1.1. Overview of the Model

In the next several sections we introduce our model and the notations needed to study service

systems such as the ones above. We use a bipartite queueing system (BQS), which consists of I types

of customers and J types of resource pools or service providers (the terms “resource” and “service”

will be used interchangeably from now on). Customers and resources arrive at the system with

time-varying mean arrival rates λi(t) and µj(t), respectively, for customer type i(= 1,2, . . . , I) and

resource type j(= 1,2, . . . , J). Since this model is motivated by the allocation of scarce resources,

the focus of our study is on scenarios when the BQS on the customer side is overloaded most of the

time. As a result, customers of type i have to buffer in the i-th queue and the servers are busy most

of the time. A waiting customer in class i independently abandons the system after a random time

duration that follows a fixed distribution with continuous cdf Fi(·). We assume that a customer

will permanently leave the system by either being served or abandoning the queue.

To measure efficiency, we assume that a utility U(j, i) is generated whenever a customer

of type i receives service of type j; the system efficiency is then the average utility of all

matches. There are several ways to measure fairness. One possibility is to look at measures that

capture the differences across queues in their likelihood of getting served. This can be rigorously

measured as the variance of an underlying random variable. We elaborate and discuss this carefully

in Section 3, where we study the impact of the scoring rule on the system and the overall objective

of the service provider, which in turn depends on the above two factors.

Since trying to directly optimize these objectives would likely lead to unimplementable policies,

we consider instead an implementable class of policies that ranks the customers by what we call an

M+W index or score (we use the terms “index” and “score” interchangeably from now on), where

“M” and “W” stand for matching score and waiting score, respectively, and “+” indicates that

the total score is a sum of the waiting and matching scores. For the “M” part, we assume that a

matching score L(j, i) is given or can be computed for each resource-customer pair (j, i), indicating

how good it would be to serve customer type i by server type j. One could base L(j, i) on U(j, i),

but it is possible that one could get better performance by choosing L(j, i) different from U(j, i).

The “W” part depends on the amount of waiting time a customer has spent in the queue. Thus

we assume that a waiting score gi(WT) is given or can be computed for customer type i. We

assume that gi(WT) is a strictly increasing and continuous function of WT to incent the system to

choose to serve customers who have waited longer. The special case where gi(·) is a linear function,

gi(WT) = ciWT, has been commonly used in other ranking policies. Therefore the total score of

type i customers as seen by servers of type j, denoted by s(j, i,WT), is

s(j, i,WT)=L(j, i)+ gi(WT). (1)


Whenever a server of type j becomes available, it will be assigned to the head-of-line (HOL) customer in queue

i with the highest M+W index s(j, i,WT). We only consider non-preemptive service disciplines,

because in our model the start of a service means that the resource has been offered to a customer

and this allocation is final.
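
As an illustration only, the index in equation (1) and the highest-index assignment rule can be sketched in Python as follows. The function names `mw_score` and `pick_queue`, the dictionary layout, and all numerical values are hypothetical and are not part of the model specification.

```python
from typing import Callable, Dict, Optional, Tuple


def mw_score(L: Dict[Tuple[int, int], float],
             g: Dict[int, Callable[[float], float]],
             j: int, i: int, wt: float) -> float:
    """Total score s(j, i, WT) = L(j, i) + g_i(WT), as in equation (1)."""
    return L[(j, i)] + g[i](wt)


def pick_queue(L: Dict[Tuple[int, int], float],
               g: Dict[int, Callable[[float], float]],
               j: int,
               hol_wait: Dict[int, Optional[float]]) -> Optional[int]:
    """Queue whose head-of-line customer has the highest score for server type j.

    hol_wait maps each queue i to the waiting time of its HOL customer,
    or to None when the queue is empty (an assumed convention).
    """
    scores = {i: mw_score(L, g, j, i, wt)
              for i, wt in hol_wait.items() if wt is not None}
    if not scores:
        return None  # no customer is waiting
    return max(scores, key=scores.get)


# Hypothetical data: one server type, two queues, linear waiting scores.
L = {(1, 1): 2.0, (1, 2): 0.0}
g = {1: lambda wt: wt, 2: lambda wt: wt}

# Queue 2 has waited long enough to overcome queue 1's matching advantage.
chosen = pick_queue(L, g, 1, {1: 0.5, 2: 3.0})
```

With all matching scores set to zero and gi(WT) = WT, the same sketch always selects the queue whose HOL customer has waited longest.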

Because customers of the same type (so in the same queue) have the same matching score with

respect to the same server, and the waiting score is strictly increasing in each customer’s waiting

time, we know that within each queue, the discipline is first come, first served (FCFS). Customers

at the head of different queues have to compete for service determined by comparing their scores.

Since gi(·) is strictly increasing and continuous, and the inter-arrival and service completion times

both have continuous distributions, the probability that two customers’ indices are tied is zero.

We refer to a BQS equipped with the M+W indexing rule as the M+W-BQS model. This

framework has applications in many settings where scarce resources are allocated across different

types of service requests, particularly in public sectors. We now provide two examples that have

been extensively studied in various literatures as examples of important social problems.

Motivating Example 1: Cadaver Kidney Allocation. The United Network for Organ Sharing

(UNOS) is a non-profit organization which coordinates U.S. organ transplant activities. UNOS

(Committee 2011) continues to face a serious shortage of kidney donations. There were more than

90,000 patients registered on the kidney transplant waitlist at the end of year 2014 (OPTN/UNOS

2015). This number continues to grow. This waitlist can be modeled as a BQS, where patients

(kidneys) with similar physical attributes are regarded as being in the same queue. Patients in this

system renege either due to death or by receiving live donations.

The UNOS has been seeking an evidence-based and transparent ranking policy that strikes

a balance between maximizing the total life years saved by transplant and minimizing inequity

across different types of patients (Zenios et al. 2000). In 2008, the UNOS Scientific Registry of

Transplant Recipients (SRTR) proposed to rank candidates using their kidney allocation score

(KAS) (OPTN/UNOS 2008), which can be expressed as

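\[
\mathrm{KAS} = \frac{0.8 \times (1 - \mathrm{DPI})}{0.8 \times \mathrm{DPI} + 0.2} \times \mathrm{LYFT} + \mathrm{CPRA} \times 4/100 + \mathrm{WT}, \tag{2}
\]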

where WT is the patient’s waiting time, LYFT (life years from transplant) is a measure of the

candidate’s net gain from transplantation (which depends on both the donor and patient type),

DPI (donor profile index) is a function of donor type, and CPRA (calculated panel reactive antibody)

compensates patient types for which it is difficult to find a matched kidney. Formula (2)

is a particular case of the M+W index, obtained by defining

\[
L(j, i) = \frac{0.8 \times (1 - \mathrm{DPI})}{0.8 \times \mathrm{DPI} + 0.2} \times \mathrm{LYFT} + \mathrm{CPRA} \times 4/100
\]

and gi(WT) = WT (we assume gi(0) = 0, so the terms independent of WT have been added into L(j, i)).

The tools and methods developed in this paper using our M+W-BQS framework enable one to

study the performance of the system if one uses such a KAS.
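
The decomposition of formula (2) into a matching part and a waiting part can be sketched as below. This is a minimal illustration that assumes formula (2) is read with 0.8×(1−DPI) divided by 0.8×DPI+0.2 as the coefficient of LYFT; the function names and the input values are hypothetical and do not come from UNOS data.

```python
def matching_score(dpi: float, lyft: float, cpra: float) -> float:
    """L(j, i): every term of formula (2) that does not depend on waiting time."""
    return 0.8 * (1.0 - dpi) / (0.8 * dpi + 0.2) * lyft + cpra * 4.0 / 100.0


def waiting_score(wt: float) -> float:
    """g_i(WT) = WT, which satisfies g_i(0) = 0."""
    return wt


def kas(dpi: float, lyft: float, cpra: float, wt: float) -> float:
    """Kidney allocation score written as an M+W index: L(j, i) + g_i(WT)."""
    return matching_score(dpi, lyft, cpra) + waiting_score(wt)


# Hypothetical donor-patient pair, for illustration only.
example_score = kas(dpi=0.5, lyft=6.0, cpra=50.0, wt=3.0)
```

Because the waiting part enters additively with gi(0) = 0, a candidate who has just joined the waitlist is ranked purely by the matching part.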


Motivating Example 2: Housing Assignment. Kaplan (1988) was the first to study the problem

of assigning limited public housing units to deserving applicants who register sequentially and have

heterogeneous preferences for housing types and locations. Kaplan considers a scenario in which

each applicant has to specify their acceptable housing types, and houses of each type will only

be offered to applicants who accept that type. An extension of Kaplan’s model is to allow

applicants to specify their ordinal preferences over different types of houses, a practice followed

in some North American cities. For example, an applicant may rate each house type as “first

preference”, “second preference”, etc. Our M+W-BQS framework is able to investigate a housing

allocation system that accommodates such ordinal preferences. In Section 5, using a data set from

the Housing Authority of the City of Pittsburgh, we predict and compare allocation outcomes

when different M+W-indexing rules are used in the allocation of public housing in Pittsburgh.

Next, we review the literature relevant to M+W-BQS, and clarify the contributions and the

relative positioning of our paper. This review will reveal that M+W-BQS is in general not in

the class of models that are exactly solvable. Hence this paper will analyze M+W-BQS using a

deterministic fluid approximation of the underlying stochastic process.

1.2. Literature Review

BQS covers a broad set of service systems, and its formulation exhibits subtle differences depend-

ing on the specific application. According to the traditional definition of a BQS in the queueing

literature, each resource type j = 1, . . . , J can only serve a subset of compatible customer types,

say I(j). The topology of the BQS can be represented by a bipartite graph whose vertices on

each side represent the customer and server types, respectively. An arc connecting vertices j and

i represents that (j, i) is a compatible resource-customer pair. Depending on the application that

is being modeled, each vertex on the resource side may represent a type of resource, or a single

server, or a pool of many servers of the same type.

In settings that resonate with our motivating examples, at least two approaches are possible.

The first approach (“optimization”) studies what the optimal control policy would be for a BQS

where one has a clear objective that is being optimized. The policy that one derives using this

approach often yields a priority rule. The optimization approach is practically useful if the priority

rule is not complicated and easy to understand and implement.

In public sector settings such as organ donation, transparency and ease of understanding is a key

requirement, and the policies coming from the optimization approach would be too complicated.

The second and more appropriate approach (“rule-based”) in such cases is to invent a scoring rule

that takes into account different relevant considerations and uses this to allocate available capacity.

In the rule-based approach, understanding the impact of the scoring rule on the system is vital.


We take the rule-based approach in this paper. Our paper has connections to both approaches and

the literature review is loosely organized based on the approach each paper takes to studying the

BQS.

One stream of literature uses the rule-based approach to characterize the dynamics of a BQS

under the FCFS service disciplines (hereafter referred to as FCFS-BQS). The FCFS rule in BQS

specifies that each newly available server j has to serve the first-arrived customer among those in

I(j). The FCFS-BQS model was first proposed by Schwartz (1974) in the study of a “lane-selection”

problem, and later motivated by other applications such as the allocation of public housing (Kaplan

1988) and the adoption of children (Caldentey and Kaplan 2007). Talreja and Whitt (2008) first

studied a fluid approximation for the FCFS-BQS with abandonment, where all flows are regarded

as deterministic. They discussed the conditions under which the system is globally FCFS, that

is, all customers are served FCFS regardless of their types. They pointed out that the fluid process may

not be unique when the residual network of the bipartite graph contains a loop. Caldentey et al.

(2009) studied a stochastic BQS where customers and resources both arrive according to a renewal

(non-Poisson) process with constant mean. They proposed a novel state description to characterize

the system behavior as a Markovian process. Later, Adan and Weiss (2012) modeled the same

BQS with a different Markovian process and derived its stationary distribution, which surprisingly

takes a product form. Using this state description, product-form stationary distributions have been

found for settings when a newly arriving customer was routed to a randomly selected compatible

server with fixed probability (Visschers et al. 2012), or the longest-idle compatible server (Adan

and Weiss 2014). Adan and Weiss (2014) also proved the convergence of the stochastic processes in

the FCFS-BQS to the fluid limit process when each type of service is provided by a single server.

Another stream of literature uses the optimization approach to examine scheduling rules in

BQS, and most of the problems in this literature have an objective of minimizing total delays

in the system. An early non-BQS result by Van Mieghem (1995) introduced the Gcµ rule for a

single-server multi-class queueing system with a holding cost Ci(Qi) for each queue i, where Ci(Qi)

is a convex increasing, differentiable function of the queue length Qi. The Gcµ rule says that a

newly available server j should myopically maximize the decrease in cost by serving the queue

i with the highest index µiC′i(Qi). Mandelbaum and Stolyar (2004) extended this result to the

BQS setting and proved that if each server j follows the Gcµ rule by serving the highest index

µijC′i(Qi), then it asymptotically minimizes both instantaneous and cumulative total holding cost

over all routing policies. An analogous result for a BQS with many servers for each type is proved

by Gurvich and Whitt (2009) under additional conditions. When there is holding cost of c per unit

time and customers renege at rate θ, Atar et al. (2010) proved that the cµ/θ rule minimizes linear

holding costs in a Markovian queue with homogeneous servers and multitype customers. However,


it is not known whether the optimality result of the cµ/θ rule can be extended to the BQS case.

In fact, Stolyar and Tezcan (2011) show that a shadow-routing based control policy numerically

outperforms the cµ/θ rule for the X-model (the simplest BQS with two customer classes and two

server pools) in the sum of the served customers weighted by their types. Other scheduling policies

for a BQS have been studied by Dai and Tezcan (2008), Larranaga et al. (2014), and Ghamami

and Ward (2013).

The above literature on BQS assumes that different customer-server pairs may have different

service speeds but generate the same value. In contrast, in our problem

the total reward depends on the customer-server match. This is an important difference. Papers

that consider a BQS and model this difference resort to numerical approaches and show limited

theoretical results. One possible reason is that such a BQS model, although it captures the trade-

offs and dynamics in public sector examples such as the ones described earlier, is not often relevant

to commonly studied service systems such as call centers. Examples include a recent paper by

Sisselman and Whitt (2007) that proposed the idea of value-based routing in contrast to the

traditional congestion-based routing for a BQS in which different customer-server pairs are assumed

to generate different utility values for the service provider. Related to this, Mehrotra et al. (2012)

numerically compare several scheduling rules in call centers where the resolution probability for

each call request depends on both the server and request types.

An important point to note is that the matching utility cannot be easily incorporated into the

analytical framework developed in the existing literature on BQS to obtain results that reconcile

with customer-server utilities. For example, in Section 4 we develop an optimal indexing rule

for the steady state of the fluid model which cannot be derived from the Gcµ nor cµ/θ rule by

simply including the matching utility U(j, i) into the cost rate c. In addition, the fairness objective

requires us to consider congestion-based measurements such as queue length and waiting time, so

this problem has to be considered under a queueing framework. Our paper takes a first step to

investigate this problem by focusing on the fluid approximation of the system with the underlying

belief that the fluid process offers a robust approximation of the stochastic behavior of the system.

This belief is well-founded for at least two reasons. First, our simulation experiment in Section 5

shows that the fluid model gives a close approximation even though some queues have a thin

arrival stream, such as fewer than ten customers per unit time. Second, Adan and Weiss (2014) have

proved that the scaled stochastic process in FCFS-BQS converges to a fluid process using the state

description introduced in (Adan and Weiss 2012). We expect and anticipate that a similar result

follows for our M+W-BQS. However, we believe that this extension would require a significant

technical overhead since M+W-BQS exhibits much more complicated dynamics than FCFS-BQS.


We thus leave the issue of convergence to a fluid process for future research and in this paper we

focus on the fluid approximation to the M+W-BQS.

Note that the M+W indexing policy represents a general class of priority rules and subsumes

FCFS (by setting L(j, i) = 0 and gi(WT) =WT). In particular, it generalizes the dynamic priority

policy proposed by Jackson (1960) for a queueing system with a single server and multi-type

customers. This dynamic priority rule prioritizes customers using the index

s(i, WT) = L(i) + gi(WT), (3)

where Jackson (1960) calls L(i) the “urgent number” for customer type i. Jackson (1960) studied

the simplest waiting score gi(WT) =WT. Nelson (1990) considered the more general case when

the gi(WT) are affine functions whose slopes and intercepts both depend on i. Kleinrock and

Finkelstein (1967), Netterman and Adiri (1979), Grindlay (1965) further analyzed situations when

gi(WT) are nonlinear functions. The M+W rule generalizes the dynamic priority policy by allowing

function L to depend on both customer and server types.

The M+W index uses three variables in ranking the candidates: customer type, server type,

and customer waiting time. We do not consider other ranking criteria, such as the arrival rates

of customers and resources, or the queue length, which have been widely used in other dynamic

routing policies in overloaded systems, e.g., (Zenios et al. 2000, Stolyar and Tezcan 2011). The

reason is that avoiding these other ranking criteria makes the policy more likely to be used in

practice. For example, it is difficult to rank patients on the kidney waitlist by the queue length or

arrival rates, because the real-time queue length would be difficult to monitor. Moreover, in settings

such as these, which are of interest to the general public, having a simple and somewhat predictable ranking

mechanism is more likely to be publicly acceptable.

1.3. Main Contributions

Our paper makes the following contributions:

1. We present the first study on a BQS where the matching utility depends on both customer and

resource types. The ranking policy we propose, namely the M+W indexing rule, incorporates

two different objectives that a social planner may have, i.e., maximizing matching utility

and reducing inequity across different queues. Including both objectives makes the model

quite relevant for applications. Using this framework, we show several useful and important

theoretical properties of this system, and we show an application of the tools we develop in

this paper by applying them to a case study on public housing.


2. We develop an algorithm to compute the transient state of the fluid process over any finite

horizon. Our characterization of the transient state of the fluid process has both theoretical

and practical significance. From a theoretical perspective, the construction of the fluid process

demonstrates its existence and uniqueness. Moreover, building on the transient analysis, we

show under certain assumptions that the fluid process converges to the steady state, which

means the steady state we characterize can not only be theoretically achieved but can be used

in practice. From a practical perspective, the arrival rates of both resources and customers

can vary in time. Even if the arrival rates are stationary, an overloaded queueing system

with reneging customers typically takes a substantial amount of time before converging to

the steady state. E.g., for the public housing assignment example we study in Section 5, it

takes 50 years to reach steady state. The policy maker is likely to be interested in the system

performance only over a relatively short time horizon. In such cases studying the transient

behavior would be quite relevant for policy evaluation and design. Thus proving results for

the transient case as we do in this paper is important.

3. We characterize the steady state of the fluid process as a solution to a network flow problem,

and propose an optimal M+W index when the fairness is measured by the variance in the

reneging rate across different queues. Although there is no theoretical guarantee of the opti-

mality of this indexing rule for different fairness metrics or for a transient period, the proposed

M+W index can be used as a benchmark policy based on which one could explore possible

adjustments for improvements. Showing results for a fluid model lets us test various rules and

their impact on possible fairness and efficiency metrics. Having the theoretical approximation

has the advantage that one does not have to resort to simulation, which can be more expensive

and less robust. Moreover, we prove structural results that give useful insights on the effect

of the policy which would be hard to imagine without this type of theoretical result.

4. Both the transient and stationary analysis of the fluid model rely heavily on tools from com-

binatorial optimization, which is not typical in the queueing literature. We need these tools

because the dynamics in our M+W-BQS are extremely complicated, and even the fluid process

alone is hard to describe without these optimization techniques. Our study gives an example

where combinatorial optimization techniques are applied to characterizing complicated queue-

ing networks, which is an area where researchers in both the combinatorial optimization and

queueing communities can make contributions.

The rest of the paper is organized as follows. Section 2 formally defines the fluid process as a

solution to a set of dynamic equations. In Section 3 we prove that over any finite horizon, under

certain conditions, the transient states of the fluid process can be constructed by an algorithm we

develop, which also proves that the fluid process is unique. In Section 4 we characterize the steady


state as a solution to a min-convex-cost network flow problem, and derive the optimal form of the

M+W policy with respect to some special efficiency and fairness metrics. Section 5 presents a case

study for public housing allocation in Pittsburgh. Finally, Section 6 discusses the limitations of our

paper and future research directions.

2. The Fluid Model

We consider the system as I parallel double-sided buffers (or queues). Demand fluid of type i flows

into the i-th buffer at a rate λi(t), and as customers renege it leaks out at a rate fi(WT), where

WT is the amount of time that the demand fluid has waited in a buffer. We call WT the age of a

demand fluid, and refer to the demand fluid that arrived at time t as cohort t. Supply fluid of type

j flows into the system at rate µj(t). Following the terminology of Liu and Whitt (2011, 2012a,b),

we assume that both λi(t) and µj(t) are piecewise continuous functions, which means that they

are right-continuous functions with finitely many discontinuities in each finite interval, where left

limits exist at each discontinuity point. This broad assumption on arrival and service rates allows

us to model situations where a rate suddenly changes due to an external shock.
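
A rate function of this kind can be represented, for instance, by a right-continuous step function. The sketch below is one possible encoding under that simplifying assumption (piecewise constant rather than general piecewise continuous); the helper name `piecewise_rate` and the breakpoints are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def piecewise_rate(breakpoints: Sequence[float],
                   values: Sequence[float]) -> Callable[[float], float]:
    """Right-continuous step function.

    The rate equals values[k] on [breakpoints[k], breakpoints[k + 1]) and
    values[-1] from the last breakpoint onward. breakpoints must be sorted.
    """
    if len(breakpoints) != len(values):
        raise ValueError("breakpoints and values must have the same length")

    def rate(t: float) -> float:
        k = bisect_right(breakpoints, t) - 1
        if k < 0:
            raise ValueError("t lies before the first breakpoint")
        return values[k]

    return rate


# Hypothetical arrival rate with an external shock at t = 5; the history
# before time 0 is included because earlier arrivals may still be queued.
lam = piecewise_rate([-10.0, 0.0, 5.0], [8.0, 10.0, 4.0])
```

Right-continuity means that the rate at a breakpoint already takes the new value, so `lam(5.0)` returns the post-shock rate.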

Whenever supply fluid meets demand fluid, each is canceled out by an equal amount of the other.

The supply fluid always cancels out the demand fluid with the highest index s(j, i,WT), where

s(j, i,WT) is computed according to equation (1). According to the M+W index, demand fluid

of the same type will be ranked by age. Hence, FCFS applies in each buffer, and so demand and

supply fluid are virtually flowing into a buffer from two opposite directions. Figure 1 provides a

graphical illustration of the fluid model.

We next give a formal mathematical description of the fluid model for M+W-BQS discussed

informally above.

Model Inputs: Index sets I and J, piecewise continuous arrival rates (λi(t))i∈I and

(µj(t))j∈J, continuous cumulative distribution functions (cdf) of abandonment time (Fi(·))i∈I,

matching scores (L(j, i))j∈J,i∈I, and waiting score functions (gi(·))i∈I.

State Variables: Head-of-line (HOL) waiting time in each buffer W (t) := (Wi(t))i∈I,

amount of demand fluid in each buffer Q(t) := (Qi(t))i∈I, routing rates (or service rates) r(t) :=

(rji(t))j∈J,i∈I, and scores s(t) := (sji(t))j∈J ,i∈I. Note that t is a continuous time index.

Assumptions and Comments on Model Inputs and State Variables: Suppose a unit

of demand fluid arrives at buffer i at time 0 and has not been served by time t. We assume

that the cumulative proportion of demand fluid that has reneged up to time t is given by the

cdf Fi(t). We require Fi(t) to be real analytic on [0,+∞), and its pdf fi(·) to stay positive over

$[0, \bar{W}_i]$ with $\bar{W}_i \in (0, +\infty]$ (we allow $\bar{W}_i = +\infty$ to represent an infinite support of fi(·)). We use

$F_i^C(t) := 1 - F_i(t)$ to denote the complementary cdf associated with Fi(·).

Figure 1 The Fluid Approximation of M+W-BQS.

The HOL waiting time refers to the age of the fluid at the head of a queue. We use Wi(t) and Qi(t) to

denote the HOL waiting time and length of queue i at time t, respectively, and use rji(t) to denote

the rate of supply fluid that is routed to buffer i from server j at time t. At any time t, the demand

fluid at the head of each queue i is assigned an M+W score sji(t) = L(j, i) + gi(Wi(t)) with respect

to each server j. We call sji(t) the score of queue i w.r.t. server j. The waiting score function gi(·)

is assumed to have the following properties: (1) real analytic on [0,+∞); (2) gi(0) = 0; and (3)

strictly increasing.

2.1. Definition of a Fluid Process

Definition 1 A fluid process in an M+W-BQS is a deterministic process {(W (t),Q(t), r(t)) | t≥

0} with W (t) and Q(t) continuous, r(t) piecewise continuous, which satisfies the following con-

straints:


1. Budget Constraint:

\[
\sum_{i \in \mathcal{I}} r_{ji}(t) = \mu_j(t), \quad \forall\, j \in \mathcal{J}. \tag{4}
\]

2. Flow Dynamic Equation: For all i ∈ I, the derivative Q′i(t) exists almost everywhere (a.e.)

with

\[
Q_i'(t) = \lambda_i(t) - \sum_{j \in \mathcal{J}} r_{ji}(t) - \int_{t - W_i(t)}^{t} \lambda_i(s) f_i(t - s)\, ds. \tag{5}
\]

In (5), λi(t) and $\sum_{j \in \mathcal{J}} r_{ji}(t)$ represent the total rates of demand and supply fluid that flow into

buffer i, fi(t − s) gives the reneging rate for cohort s at time t, and $\int_{t - W_i(t)}^{t} \lambda_i(s) f_i(t - s)\, ds$

counts the total reneging rate for queue i at time t. Note that t − Wi(t) could take negative

values, in which case we need to know λi(s) at time s < 0 in order to calculate Q′i(t).

3. One-to-One Correspondence between Qi(t) and Wi(t):

$$Q_i(t) = \int_{t-W_i(t)}^{t} \lambda_i(s)\, F^C_i(t-s)\,ds, \tag{6}$$

where $\lambda_i(s) F^C_i(t-s)$ gives the amount of fluid in cohort s that remains in the buffer without having reneged.

4. Highest-Indexed-Served-First: Let A(t, j) denote the set of buffers which may receive supply

fluid of type j at time t. Often A(t, j) is referred to as the active set, which is the set of queues

with the highest scores for supply fluid of type j at time t, that is,

A(t, j) = argmax {sji(t) | i∈ I}. (7)

However, if all queues in argmax {sji(t) | i ∈ I} are empty at time t, then the active set has

to be expanded to allow the surplus of supply fluid of type j to flow into other buffers. We

do not consider this special case in our construction of the fluid process in order to keep

the discussion simpler (this special case can be handled by an easy reduction step as given

in Appendix EC.1). After identifying A(t, j), our priority rule of serving the highest indexed

fluid requires rji(t) > 0 only in periods when queue i ∈ A(t, j), which leads to the following

complementary slackness condition:

rji(t)> 0 ⇒ i ∈A(t, j) = argmax {sji(t) | i∈ I}. (8)

In our above definition of fluid process, we assumed that the queue length process Qi(t) is contin-

uous, and its derivative exists a.e., which implies the same property for Wi(t). This assumption is

without loss of generality for the following reasons. Process Qi(t) can be alternatively expressed as

a difference of integrals that counts the cumulative arrivals and departures for buffer i, which shows

that Qi(t) is continuous everywhere and differentiable a.e. We thus simply characterize the fluid


process using Q′i(t), by noting that the integral expression for Qi(t) would require us to introduce

a significant amount of notation, but is equivalent to the characterization using Q′i(t).

However, the assumption of piecewise continuity of r(t) is indeed restrictive: in theory there can be reasonable fluid approximations to the stochastic system in which the service rates r(t) do not have such nice properties. Nevertheless, we anticipate that most service disciplines used in practice have piecewise continuous service rates rji(t). Our assumption therefore reflects a reasonable compromise that avoids the technical difficulty of analyzing non-piecewise continuous rji(t) without compromising the practical relevance of our model. The assumption of piecewise continuous service rates has been used in the queueing literature on time-varying arrival rates; see, for example, Liu and Whitt (2012a).

2.2. A Compact Characterization of the Fluid Process

We next propose a compact characterization of the fluid process which only uses state variables

W (t) and r(t). With W (t) and r(t), we can use (6) to recover Q(t).
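As a small illustration of recovering Q(t) from W(t) through (6), the following Python sketch evaluates the integral numerically. It assumes, purely for illustration, a constant arrival rate and an exponential abandonment distribution; the function name `queue_length`, the midpoint rule, and all numerical values are our own choices and are not part of the model.

```python
import math

def queue_length(t, W, lam, Fc, n=2000):
    """Approximate (6): Q(t) = integral over s in [t - W, t] of lam(s) * Fc(t - s) ds.

    lam: arrival-rate function, Fc: complementary cdf of the abandonment time.
    Uses the midpoint rule with n sub-intervals (an illustrative choice).
    """
    if W <= 0.0:
        return 0.0
    h = W / n
    total = 0.0
    for m in range(n):
        s = t - W + (m + 0.5) * h          # midpoint of the m-th sub-interval
        total += lam(s) * Fc(t - s)
    return total * h

# Hypothetical inputs: constant arrival rate 10, exponential abandonment with rate 0.1.
gamma = 0.1
lam = lambda s: 10.0
Fc = lambda tau: math.exp(-gamma * tau)

q_numeric = queue_length(t=5.0, W=2.0, lam=lam, Fc=Fc)
# With constant lam and exponential Fc, (6) has the closed form lam * (1 - exp(-gamma W)) / gamma.
q_closed = 10.0 * (1.0 - math.exp(-gamma * 2.0)) / gamma
```

Under these assumptions the numerical value agrees with the closed form, which is a convenient sanity check when other abandonment distributions are substituted.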

We first look at the constraints that W(t) has to satisfy. Since Qi(t) is differentiable a.e., equation (6) implies that W′i(t) exists a.e. and satisfies the equation

$$Q_i'(t) = \lambda_i(t) - \lambda_i(t-W_i(t))\,F^C_i(W_i(t))\,\big(1 - W_i'(t)\big) - \int_{t-W_i(t)}^{t} \lambda_i(s)\, f_i(t-s)\,ds. \tag{9}$$

The two equations for Q′i(t), (9) and (5), then imply the following constraint on W′i(t):

$$W_i'(t) = \Big(\lambda_i(t-W_i(t))\,F^C_i(W_i(t))\Big)^{-1}\Big(\lambda_i(t-W_i(t))\,F^C_i(W_i(t)) - \sum_{j\in\mathcal{J}} r_{ji}(t)\Big). \tag{10}$$

The scores sji(t) depend on t only through gi(Wi(t)), so the derivative s′ji(t) is independent of j. Thus we can use θi(t) to denote the score change rate of queue i with respect to all servers, and

$$\begin{aligned}
\theta_i(t) &= g_i'(W_i(t))\,W_i'(t)\\
&= g_i'(W_i(t))\,\Big(\lambda_i(t-W_i(t))\,F^C_i(W_i(t))\Big)^{-1}\Big(\lambda_i(t-W_i(t))\,F^C_i(W_i(t)) - \sum_{j\in\mathcal{J}} r_{ji}(t)\Big)\\
&=: \vartheta_{W_i}\Big(\sum_{j\in\mathcal{J}} r_{ji}(t)\Big),
\end{aligned} \tag{11}$$

where the function ϑWi(z) defined above returns the score change rate of queue i when its HOL waiting time is Wi and its aggregate service rate is z.
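The map in (11) and its inverse are simple enough to sketch in code. The Python fragment below is illustrative only: the shorthand `a` for $\lambda_i(t-W_i(t))F^C_i(W_i(t))$, the function names, and the numbers are assumptions introduced here, not notation from the model.

```python
def score_change_rate(z, a, g_prime):
    """(11): theta = g'(W) * (a - z) / a, where a = lam(t - W) * Fc(W) > 0
    and z is the aggregate service rate of the queue."""
    return g_prime * (a - z) / a

def service_rate_for(theta, a, g_prime):
    """Inverse of (11) in z: the aggregate service rate that yields score change rate theta."""
    return a * (1.0 - theta / g_prime)

# Hypothetical state: a = 9.0 and g'(W) = 2.0.
a, g_prime = 9.0, 2.0
theta = score_change_rate(3.0, a, g_prime)     # score change rate at service rate 3
z_back = service_rate_for(theta, a, g_prime)   # recovers the service rate
```

With no service (z = 0) the score grows at rate g′(W), and with z = a it does not change; in between the dependence on z is affine and decreasing.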

By considering equations (9), (10), and (11), piecewise continuity of r(t) implies that Q′i(t), W′i(t), and θ′i(t) have left and right limits everywhere, and have only a finite number of discontinuity points in any finite interval. Note that Q′i(t), W′i(t), and θ′i(t) might not exist at such discontinuity points, but we can safely assign their function values to be their right limits at those points without changing the processes Qi(t) and Wi(t). With this clarification, we henceforth assume that Q′i(t), W′i(t), and θ′i(t) exist everywhere and are piecewise continuous in t.

Next we discuss the constraints that the service rates r(t) have to satisfy.


(a) The budget constraint (4).

(b) Supply fluid flows from j to i only if i ∈ argmax sji(t), that is, (8).

(c) If server j is serving multiple queues at t, then the flow rates rji(t) have to be split such that

all queues being served by server j have the same score change rate at t, that is,

$$r_{ji}(t) > 0 \;\Rightarrow\; \theta_i(t) = \max_{k\in A(t,j)} \theta_k(t). \tag{12}$$

Constraints (a) and (b) follow directly from the constraints for the fluid process. Constraint (c) follows from constraint (b) and piecewise continuity of r(t). If constraint (c) were violated, then without loss of generality we can assume that the score of some queue i ∈ argmax sji(t) with rji(t) > 0 increases more slowly than that of the other queues in argmax sji(t). As a result, queue i can no longer maintain its active status with respect to server j, and the service rate rji jumps to zero right after t. This in turn implies that rji(t) is not right-continuous at t, contradicting the fact that rji is piecewise continuous (our definition of piecewise continuity requires right-continuity). Thus constraint (c) has to be satisfied.

We next propose the following definitions.

Definition 2 We refer to r(t) := (rji(t))j∈J ,i∈I as feasible routing rates or a feasible routing-rate vector with respect to W(t) if r(t) and the associated $\theta_i(t) = \vartheta_{W_i}\big(\sum_{j\in\mathcal{J}} r_{ji}(t)\big)$ (i ∈ I) satisfy conditions (4), (8), and (12).

Since both r(t) and Q(t) can be constructed from W (t), it suffices to use the HOL waiting-time

trajectory {W (t) | t≥ 0} to describe the fluid process.

Definition 3 An HOL waiting-time vector process {W(t) | t ∈ [t0, t1]} is said to be the fluid process in an M+W-BQS if there exist a queue-length vector process {Q(t) | t ∈ [t0, t1]} and a service process {r(t) | t ∈ [t0, t1]} such that {(W(t), Q(t), r(t)) | t ∈ [t0, t1]} satisfies the constraints given in Definition 1.

We can now give compact criteria for verifying an M+W-BQS fluid process, which replace the multiple

equations in Definition 1. Our construction of the fluid process starting from state variables W (t)

and r(t) works by noting that W (t) and r(t) provide sufficient information to recover the other

state variables for the M+W-BQS fluid process.

Lemma 1 An a.e. differentiable HOL waiting-time vector process {W (t) | t ∈ [t0, t1]} is a fluid

process in an M+W-BQS if and only if there exists a piecewise continuous {r(t) | t ∈ [t0, t1]} such

that r(t) are the feasible routing rates with respect to W (t) for all t ∈ [t0, t1], and they satisfy

equation (10).

Proof.


“only if”: At the beginning of this section, we argued that rji(t) (j ∈ J , i ∈ I) must satisfy

conditions (4), (8), and (12) in order to satisfy the constraints in Definition 1.

“if”: We recover Q(t) using (6). By taking the derivative of (6), we obtain equation (9). Plugging

(10) into equation (9) then leads to (5). Conditions (4) and (8) follow from r(t) being feasible

routing rates with respect to W (t).

3. Transient Behavior of the Fluid Process

We start with a simple example to provide some intuition about the transient behavior of the fluid

process.

Example 1 Consider a system with two servers, J = {1,2}, and customers of two types, I = {a, b}, with λa = 1, λb = 10, µ1 = µ2 = 0.5, and exponentially distributed abandonment time Fa(τ) = Fb(τ) = 1 − exp(−0.1τ). We assume Wa(0) = Wb(0) = 0 and L(1, a) = 0, L(1, b) = −100, L(2, a) = 0 and L(2, b) = −1. We assume that ga(WT) = gb(WT) = 2WT. According to this M+W index, at time 0, we have s1a(0) = 0 > s1b(0) = −100 and s2a(0) = 0 > s2b(0) = −1, so that both servers only serve queue a. Since the queues are overloaded, both Wa(t) and Wb(t) and all scores increase. Since queue b does not receive any service, Wb(t) increases faster than Wa(t), and so s1b(t) and s2b(t) increase faster than s1a(t) and s2a(t). Suppose that at time t1, score s2b(t1) = 2Wb(t1) − 1 catches up with s2a(t1) = 2Wa(t1), but that s1a(t1) = 2Wa(t1) is still bigger than s1b(t1) = 2Wb(t1) − 100. As a result, after t1 server 2 will serve both b and a, and server 1 continues to serve only queue a. Because both queues share server 2, their scores must remain identical after t1, i.e., s2b(t) ≡ s2a(t) for t in some interval after t1.
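To make the first phase concrete: for a queue that receives no service, (10) with $\sum_j r_{ji}(t) = 0$ gives $W_i'(t) = 1$, so its score with respect to server j evolves as $L(j,i) + g_i(W_i(0) + t)$. The Python sketch below computes when such a score reaches a given level. The function name `time_to_reach`, the bisection scheme, and the tolerance are illustrative assumptions. For queue b of Example 1 it reports when s2b and s1b would reach 0, the time-0 score of queue a; the actual switch time t1 also depends on how s2a(t) evolves, which this sketch does not model.

```python
def time_to_reach(level, L, g, W0=0.0, t_max=1e6, tol=1e-9):
    """Smallest t >= 0 with L + g(W0 + t) >= level for an unserved queue.

    Relies on W'(t) = 1 when the queue receives no service (from (10))
    and on g being strictly increasing. Solved by bracketing plus bisection.
    """
    score = lambda t: L + g(W0 + t)
    if score(0.0) >= level:
        return 0.0
    lo, hi = 0.0, 1.0
    while score(hi) < level:               # grow the bracket until the level is reached
        hi *= 2.0
        if hi > t_max:
            raise ValueError("level not reached before t_max")
    while hi - lo > tol:                   # bisection on the bracket [lo, hi]
        mid = 0.5 * (lo + hi)
        if score(mid) >= level:
            hi = mid
        else:
            lo = mid
    return hi

g_b = lambda w: 2.0 * w                    # g_b from Example 1
t_2b = time_to_reach(level=0.0, L=-1.0, g=g_b)     # s_2b(t) = 2t - 1
t_1b = time_to_reach(level=0.0, L=-100.0, g=g_b)   # s_1b(t) = 2t - 100
```

The large gap between the two times reflects the gap between L(2, b) = −1 and L(1, b) = −100, which is why server 2 is the first to start serving queue b in the example.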

Because queue b has a much larger arrival rate than queue a, eventually the tie between the two queues for server 2 will break. That is, after some time t2 > t1, the score of queue b will increase faster than that of queue a and s2b(t) > s2a(t) for t > t2, so server 2 can no longer serve both queues. Instead, after t2 servers 1 and 2 will be dedicated to queues a and b, respectively. In Figure 2 we depict the changes by showing how the routing components (the connected components of the graph showing which servers are serving which queues) change over time. Note that it is possible that there are further changes to the routing structure after t2 until a steady state is reached.

To provide some intuition for how the fluid process of Section 2 approximates the original stochastic process, consider the original stochastic process in Example 1. At t1 the difference between the scores of queues a and b is small, and so the top of the waitlist holds a mixture of customers from both queues. As a result, server 2 has to frequently switch back and forth between the two queues. This situation also implies that the average change rates of the highest score in the two queues must stay equal; otherwise one queue will soon dominate the other in the rankings and prevent the other queue from getting served. However, this configuration will last only until time t2, after which server 2 has to stop serving customers in queue a. Throughout the process, the routing rates are not controlled by the service provider, but are endogenously determined by the highest-score-first-served rule.

(a) For t ∈ [0, t1) we have s1a(t) > s1b(t) and s2a(t) > s2b(t), and so both servers serve queue a. The routing components are {1,2, a} and {b}.

(b) For t ∈ [t1, t2) we have s2a(t) = s2b(t) and s1a(t) > s1b(t), and so both queues share server 2 and keep their scores for server 2 tied, and queue a is also served by server 1. The routing component is {1,2, a, b}.

(c) For t ∈ [t2, t3) (for some t3, possibly t3 = ∞) we have s2a(t) < s2b(t) and s1a(t) > s1b(t), and so queue a is served by server 1, and queue b is served by server 2. The routing components are {1, a} and {2, b}.

Figure 2 How the routing components of Example 1 change over time.

Each of the three periods of Example 1 has a different routing structure. As illustrated in Figure 2, this can be represented by a bipartite graph with arcs showing which server(s) are serving which queue(s) (referred to as basic activities in some of the literature, e.g., Stolyar and Tezcan 2011). The

connected components of this graph partition servers and queues into routing components. A more

rigorous definition of routing components is given later in Definition 5. In Example 1, t1 and t2 are

called switch times, at which the routing components change.

From the above example, we see that identifying the routing components is a key step in charac-

terizing the transient behavior of the fluid process. Once the routing components are determined,

we show below that the trajectories of Wi(t) and Qi(t) can be characterized as solutions to an

ordinary differential equation (ODE) subject to boundary conditions. The boundary conditions

specify at which switch times the routing components change, and consequently how the ODEs

have to be reformulated.


3.1. Construction of the Transient Solutions

We focus on constructing the fluid process described by the trajectory {W (t) | t≥ 0}. We identify

the rare condition under which the fluid process can have non-unique trajectories of {W (t) | t≥

0}, and in the absence of this condition show that the construction process guarantees that the

fluid process has a unique trajectory. After obtaining W (t), Q(t) can be uniquely computed from

W (t) using (6). We will show that the service rates r(t) might have multiple solutions for a given

trajectory W (t), but all the feasible solutions r(t) must lead to the same trajectories of W (t) and

Q(t). In later sections, when we refer to the unique fluid process, it always refers to the unique

state description using W (t) or Q(t).

The construction can compute the transient behavior of the fluid process over any finite horizon

[0, T ]. The choice of T depends on the time window of interest to the policy designer. For example,

the time window for developing a kidney allocation policy may be the next several decades, whereas

for the allocation of public housing it may only be the next several years.

Given W (0), we construct {W (t) | t∈ [0, T ]} via the following steps:

Step 0 Inductively suppose that we have already computed (W (t)) up to t = t0. Given W (t0), for

each server j ∈ J , construct active sets A(t0, j) according to (7) (or (EC.1) in the special case

when all queues in argmax {sji(t0) | i∈ I} are empty). Check whether the following condition

holds

$$\exists\, j_1 \neq j_2 \in \mathcal{J},\ i \in A(t_0, j_1)\cap A(t_0, j_2), \text{ such that } W_i(t_0) = 0 \text{ and } \lambda_i(t_0) < \mu_{j_1}(t_0) + \mu_{j_2}(t_0), \tag{13}$$

where i ∈ A(t0, j1) ∩ A(t0, j2) means that the empty queue i is in the active set of both servers j1

and j2. If (13) holds, Appendix EC.2 shows that the fluid process might admit more than one

trajectory of W (t), and so we terminate the construction process (because the construction

does not serve the purpose of being predictive when the trajectories are not unique). Otherwise,

proceed to Step 1.

Step 1 Given W(t0), we define and/or construct the following combinatorial structures: a routing graph G(t0), an associated parameterized network Gθ(t0), a partition of the vertices of G(t0) into K different primitive components (Gk(t0))k=1,...,K, feasible routing rates r(t0) := (rji(t0))j∈J ,i∈I, and score-change rates θ(t0) := (θk(t0))k=1,...,K. We prove structural properties about the primitive components and feasible routing rates.

Step 2 If all score-change rates θk(t0)’s are distinct (which we call the non-degenerate case), then the

routing components in the next period starting from t0 are exactly the primitive components,

and we can skip this step and go to Step 3; otherwise, we call it the degenerate case, and we

compute the routing components (Gk(t0)) by merging certain primitive components.


Step 3 For each routing component that we constructed in Steps 1–2, we formulate an ODE and

identify the boundary conditions at which the routing components change. The solution to

the ODE characterizes the trajectories {W (t) | t ∈ [t0, t1]}, where t1 denotes the next switch

time when the solution hits a boundary. We update t0← t1 and if t1 <T we go back to Step 0.

Next, we provide more details on each step.

Step 0: Appendix EC.2 shows that the fluid process is likely to have more than one trajectory

of W (t) when condition (13) is satisfied, which results in ambiguity when one tries to use the fluid

model to predict the behavior of the waitlist system. However, having empty queues is unusual in

scenarios with overloaded queues, so we expect condition (13) to hold only in a few unusual cases.

Thus we do not expect this condition to be a significant impediment to applying the fluid model for

predictive analytics. The next proposition provides an easy-to-verify condition which guarantees

that condition (13) fails, so that we do not have to worry about Algorithm 1 terminating at Step 0 without

successfully constructing a unique fluid process.
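The check in Step 0 can be sketched directly from (7) and (13). The Python fragment below is a minimal illustration: the data layout (nested dictionaries), the function names, and the tie tolerance are assumptions made here, and the special case of an empty argmax set handled in Appendix EC.1 is not covered.

```python
def active_sets(L, g, W, tol=1e-12):
    """(7): for each server j, the queues i maximizing s_ji = L[j][i] + g[i](W[i])."""
    result = {}
    for j, row in L.items():
        scores = {i: row[i] + g[i](W[i]) for i in row}
        best = max(scores.values())
        result[j] = {i for i, s in scores.items() if s >= best - tol}
    return result

def condition_13_holds(active, W, lam, mu):
    """(13): some empty queue i is active for two distinct servers j1, j2
    and lam[i] < mu[j1] + mu[j2]."""
    servers = sorted(active)
    for pos, j1 in enumerate(servers):
        for j2 in servers[pos + 1:]:
            for i in active[j1] & active[j2]:
                if W[i] == 0.0 and lam[i] < mu[j1] + mu[j2]:
                    return True
    return False

# Example 1 at time 0.
L = {1: {"a": 0.0, "b": -100.0}, 2: {"a": 0.0, "b": -1.0}}
g = {"a": lambda w: 2.0 * w, "b": lambda w: 2.0 * w}
W = {"a": 0.0, "b": 0.0}
act = active_sets(L, g, W)
ambiguous = condition_13_holds(act, W, lam={"a": 1.0, "b": 10.0}, mu={1: 0.5, 2: 0.5})
```

For the data of Example 1, both servers have queue a as their only active queue, and the strict inequality in (13) is not met because λa equals µ1 + µ2.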

Proposition 1 Suppose that λi(t) ≡ λi and µj(t)≡ µj for all i ∈ I and j ∈ J . If condition (13)

fails at state 0 (0 denotes an all-zero vector of dimension I), then condition (13) can never be

satisfied at any HOL waiting-time vector W ≥ 0.

Proof. Proof by contradiction. Suppose condition (13) holds at some W ≥ 0. Then at W, an empty queue, say queue ℓ, must be active for at least two different servers j1, j2, and λℓ < µj1 + µj2. Note that Wℓ = 0 because ℓ is an empty buffer, whereas Wi ≥ 0 for all i ≠ ℓ. Therefore, queue ℓ can only be more competitive at state 0 than at state W. So queue ℓ has to remain active for servers j1 and j2 at 0, with λℓ < µj1 + µj2, which contradicts the assumption that condition (13) fails at 0.

From now on, we assume that condition (13) is never satisfied. We will show that then the

construction always returns a unique fluid process {W (t) | t≥ 0}.

Step 1: We next introduce a parameterized network flow framework (see Ahuja et al. (1993)

for background on network flows) which allows us to compute feasible routing rates r(t), score

change rates θ(t), and routing components for a given W (t). (Some of these tools were used in

the concurrent paper Adan and Weiss (2014) to analyze an FCFS-BQS, but the network we deal

with is more complicated due to the M+W indexing.) The components we compute using this

framework, which we call primitive components, may need to be merged in Step 2 to get the routing

components we are looking for.


Formally, given a fixed t and any W(t) ≥ 0, we define the routing graph G(t) := (V, E(t)), with vertex set

V := {S} ∪ J ∪ I ∪ {T}, (14)

where S and T are artificial source and sink nodes, and directed arc set

E(W(t)) (or simply E(t)) := {(S, j) | j ∈ J} ∪ {(j, i) | i ∈ A(t, j)} ∪ {(i, T) | i ∈ I}. (15)

We also define the following subset of E(t) by restricting to the arcs from J to I:

Eb(t) := {(j, i) ∈ E(t) | j ∈ J, i ∈ I}. (16)

Note that the arc sets E(t) and Eb(t) depend on the scores s(t), and thus on W(t). Therefore, when we later consider a different value of t, both arc sets can change, which is not the case in an FCFS-BQS where the routing graph is invariant with time. Since t is fixed, we often omit t and write G = (V, E) to simplify notation.
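The arc sets in (15) and (16) follow mechanically from the active sets. A minimal Python sketch, assuming the active sets are given as a dictionary from servers to sets of queues and that the labels "S" and "T" are not used as server or queue names (both assumptions of this illustration):

```python
def routing_arcs(active, queues):
    """Return (E, E_b) as in (15)-(16) for active sets {server: set of active queues}."""
    E_b = {(j, i) for j, qs in active.items() for i in qs}      # server-to-queue arcs
    E = ({("S", j) for j in active}                               # source-to-server arcs
         | E_b
         | {(i, "T") for i in queues})                            # queue-to-sink arcs
    return E, E_b

# Active sets of Example 1 at time 0: both servers serve only queue a.
E, E_b = routing_arcs({1: {"a"}, 2: {"a"}}, queues={"a", "b"})
```

Queue b has no incoming arc from a server in this state, but it still has its arc to the sink, as required by (15).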

We define a family of (S,T)-networks (Gθ), whose capacities are parameterized by a real number θ ∈ (−∞,+∞). To get an intuition for the meaning of θ, recall the definition of ϑWi(z) from (11) for $z = \sum_{j\in\mathcal{J}} r_{ji}(t)$. We will see that we can interpret rji(t) as a flow on arc (j, i), and so by flow conservation $z = \sum_{j\in\mathcal{J}} r_{ji}(t)$ is the flow on (i, T). Thus we can think of ϑWi(z) as a function whose input is a flow z on (i, T), and whose output is a score change rate. Then its inverse, $\vartheta_{W_i}^{-1}(\theta)$, is a function whose input is a target score change rate θ, and whose output is a flow z, and so we can think of $\vartheta_{W_i}^{-1}(\theta)$ as the capacity of (i, T). With this in mind, for each θ we define the capacity for the flow on each arc e ∈ E as

$$u_e(\theta) = \begin{cases} \mu_j & \text{when } e = (S, j);\\ \infty & \text{when } e = (j, i) \in E_b;\\ \vartheta_{W_i}^{-1}(\theta) & \text{when } e = (i, T). \end{cases} \tag{17}$$

Since ϑWi(z) is an affine, strictly decreasing function of z, $\vartheta_{W_i}^{-1}(\cdot)$ is also affine and strictly decreasing. It can happen that $\vartheta_{W_i}^{-1}(\theta)$ is negative, reflecting that even with zero service rate, the score change rate of queue i is smaller than the target value θ. This would lead to a negative capacity on (i, T), which seems strange, but which occurs in other applications of parametric network flow. If one prefers to avoid this possibly negative capacity on arc (i, T), a standard trick is to introduce a new arc (S, i) with capacity $(\vartheta_{W_i}^{-1}(\theta))^-$ (the negative part of the capacity on (i, T) in the original graph), and assign the positive part $(\vartheta_{W_i}^{-1}(\theta))^+$ as the capacity of (i, T). Our subsequent analysis works for either formulation. The construction of Gθ is illustrated in Figure 3.
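The capacity rule (17), together with the positive/negative split just described, can be sketched as follows. The Python fragment is illustrative: the shorthand `a[i]` for $\lambda_i(t-W_i(t))F^C_i(W_i(t))$, the dictionary layout, and the function name are assumptions, and the infinite capacities on the arcs in Eb are left implicit.

```python
def arc_capacities(theta, mu, a, g_prime):
    """Capacities of (17) at parameter theta, with the sign split on queue arcs.

    Returns (cap_Sj, cap_iT, cap_Si):
      cap_Sj[j] = mu[j]                       capacity of (S, j)
      cap_iT[i] = positive part of u_iT       capacity of (i, T)
      cap_Si[i] = negative part of u_iT       capacity of the extra arc (S, i)
    where u_iT = a[i] * (1 - theta / g_prime[i]) is the inverse of (11).
    """
    cap_Sj = dict(mu)
    cap_iT, cap_Si = {}, {}
    for i in a:
        u = a[i] * (1.0 - theta / g_prime[i])
        cap_iT[i] = max(u, 0.0)
        cap_Si[i] = max(-u, 0.0)
    return cap_Sj, cap_iT, cap_Si

# Hypothetical single queue with a = 1 and g' = 2; the target rate theta = 3 exceeds g',
# so the raw capacity on (a, T) is negative and is moved to the arc (S, a).
cap_Sj, cap_iT, cap_Si = arc_capacities(3.0, {1: 0.5}, {"a": 1.0}, {"a": 2.0})
```

A negative raw capacity arises exactly when θ exceeds g′i(Wi), the score change rate of an unserved queue.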

Recall that an (S,T) cut in this network is a partition (A,B) of V such that S ∈ A and T ∈ B, and every min (S,T) cut has this form (all cuts discussed in this paper are (S,T) cuts, and so we use “cut” to mean “(S,T) cut” from now on). We often refer to cut (A,B) as just A. It is well-known that every max flow network has a unique minimal min cut Amin and a unique maximal min cut Amax. The parametric capacities in Gθ are strictly decreasing in θ on arcs (i, T) and constant elsewhere, which is a special case of the property called strict source-sink monotone (S-SSM) by Granot et al. (2012). Property S-SSM implies that if A1 is a min cut for θ1, and A2 is a min cut for θ2 > θ1, then $A_1^{\max} \subseteq A_2^{\min}$, i.e., that min cuts are nested.

Figure 3 The max flow network (Gθ) with capacities parametric in θ. The parameterized upper bound ue(θ) is labeled above each arc.

When a max flow network has parametric capacities affine in θ that satisfy S-SSM, it is clear

that the graph of max flow value as a function of θ is piecewise linear with O(|V |) breakpoints

θ1 < θ2 < · · ·<θp. Denote the minimal (maximal) min cut at θ by Amin(θ) (Amax(θ)). Then we have

that both Amax(θk) and Amin(θk+1) (which could be the same) are min cuts for any θ ∈ [θk, θk+1],

which implies that Amin(θk) is a strict subset of Amax(θk) for all k. When S-SSM holds, Gallo

et al. (1989) gave an algorithm to compute all O(|V |) breakpoints and Amin(θk). Since our Gθ has

parametric capacities affine in θ that satisfy S-SSM, we can use their algorithm to compute all

breakpoints θk and Amin(θk), k =1, . . . , p.

Our subsequent analysis also requires knowing all min cuts at each breakpoint θl. Given a max flow X in a network, Picard and Queyranne (1982) showed how to compute a nested sequence of min cuts $A^{\min}(\theta_l) = A^l_0 \subset A^l_1 \subset \cdots \subset A^l_k = A^{\max}(\theta_l)$ that represent all min cuts. We insert these between Amin(θl) and Amin(θl+1) = Amax(θl) in the sequence of Amin(θk) to get a grand nested sequence Π of min cuts

$$\Pi = \{(A_k, B_k) \mid \{S\} = A_0 \subsetneq A_1 \subsetneq \cdots \subsetneq A_K = V\setminus\{T\}\}. \tag{18}$$


The algorithm in Gallo et al. (1989) is often referred to as “GGT” after the authors’ initials, so

we call the overall process of computing Π the GGT Algorithm. Since all cuts in Π are nested, it

contains only O(|V |) = O(I + J) =K min cuts. We re-define θk to be the value of θ where both

Ak−1 and Ak are min cuts, and so there can be several consecutive θk with the same value if a

breakpoint has more than two min cuts, which is referred to as the degenerate case. Based on Π,

we can partition the vertex set I ∪J into K subsets: Gk := Ak\Ak−1, k = 1, . . . ,K. We call each

subset Gk a primitive component.
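For very small instances, the primitive components can also be found by exhaustive search, which may help in checking an implementation of the GGT Algorithm. The Python sketch below is not the GGT Algorithm: it is a brute-force substitute based on our reading that a cut A ⊇ Ak−1 ties with Ak−1 exactly when the servers and queues in A\Ak−1 balance as in (19), so that the next component is a server-closed set with the smallest such rate. The data layout, the function names, the tie-breaking by set size, and the numerical instance are all assumptions of this illustration.

```python
from itertools import combinations

def subsets(items):
    """All subsets of a collection, as sets (exponential; small instances only)."""
    items = sorted(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield set(combo)

def psi(servers, queues, mu, a, gp):
    """Common score change rate at which the candidate set balances as in (19)."""
    surplus = sum(a[i] for i in queues) - sum(mu[j] for j in servers)
    return surplus / sum(a[i] / gp[i] for i in queues)

def primitive_components(mu, a, gp, active, tol=1e-9):
    """Peel off components in order of increasing score change rate.

    mu[j]: service rate; a[i] = lam_i(t - W_i) * Fc_i(W_i); gp[i] = g_i'(W_i);
    active[j]: set of queues in A(t, j). Returns a list of (servers, queues, theta).
    """
    rem_s, rem_q = set(mu), set(a)
    components = []
    while rem_q:
        best = None
        for js in subsets(rem_s):
            # A server can only join together with all of its unassigned active queues.
            forced = set()
            for j in js:
                forced |= active[j] & rem_q
            for extra in subsets(rem_q - forced):
                qs = forced | extra
                if not qs:
                    continue
                cand = (psi(js, qs, mu, a, gp), len(js) + len(qs), js, qs)
                if (best is None or cand[0] < best[0] - tol
                        or (abs(cand[0] - best[0]) <= tol and cand[1] < best[1])):
                    best = cand      # smaller rate wins; ties go to the smaller set
        theta, _, js, qs = best
        components.append((js, qs, theta))
        rem_s -= js
        rem_q -= qs
    return components

# Hypothetical two-server, two-queue state (a variant of Example 1 with a_a = 2).
comps = primitive_components(
    mu={1: 0.5, 2: 0.5},
    a={"a": 2.0, "b": 10.0},
    gp={"a": 2.0, "b": 2.0},
    active={1: {"a"}, 2: {"a"}},
)
```

On this instance the search returns the component {1, 2, a} with rate 1 followed by {b} with rate 2. The enumeration is exponential in the number of servers and queues, so it is only suitable as a cross-check.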

We first summarize some useful properties of max flows in these parametric networks that will

allow us to prove that routing-rates come from max flows. Recall that for cut A, δ+(A) is the

subset of arcs with their tail but not head in A (“exiting A”), and δ−(A) is the subset of arcs with

their head but not tail in A (“entering A”).

Lemma 2 Let $X^k$ be a max flow in Gθ w.r.t. θ = θk, so that both Ak−1 and Ak are min cuts w.r.t. $X^k$. Then

1. $\delta^+(A_{k-1}) \cap E_b = \delta^+(A_k) \cap E_b = \emptyset$;

2. $X^k_e = 0$ for all $e \in \delta^-(A_{k-1}) \cap E_b$;

3. $X^k_{Sj} = \mu_j(t)$ for all j ∈ Gk;

4. $X^k_{iT} = u_{iT}(\theta_k) = \vartheta_{W_i}^{-1}(\theta_k)$ for all i ∈ Ak; and

5. $X^k$ restricted to arcs of the subnetwork {S} ∪ Gk ∪ {T} is again a max flow in the subnetwork, and so

$$\sum_{j\in G_k} \mu_j = \sum_{i\in G_k} \vartheta_{W_i}^{-1}(\theta_k). \tag{19}$$

Proof. 1. Follows because ue = ∞ for e ∈ Eb and no infinite-capacity arc can exit a min cut.

2. Follows because all arcs entering a min cut must have flow 0 in any max flow.

3. For j ∈ Gk = Ak\Ak−1 it follows because then (S, j) ∈ δ+(Ak−1) and so (S, j) must be saturated by Xk. Note that this is true even for k = 1 since A0 = {S} (since θ0 = −∞).

4. Follows because all arcs exiting a min cut must be saturated.

5. By 1 and 2, Xk puts no flow on any arc between the subnetwork and its complement, and so the restriction of Xk is a flow in the subnetwork. By 4 above, all arcs exiting the subnetwork are saturated, and so Xk must be a max flow. Then (19) follows from 3 and flow conservation.

The properties of routing rates rji(t), score change rates θi(t), and their connections with the

breakpoints and primitive components are summarized in the next proposition. Figure 4 provides

an illustration of the construction of the primitive components and feasible routing rates.


(a) The parameterized network Gθ with (ue(θ)) on each arc e.

(b) When θ = −∞, the min cut is A0 = {S}. All arcs (S, j) are in δ+(A0). (Arcs (S, j) and (i, T) have Xe (ue(θ)) on them, and (j, i) has just Xe.)

(c) When θ increases to θ1 = −84, both A0 and A1 = {S,2, a} are min cuts, and arc (a,T) is saturated.

(d) When θ increases to θ2 = −18, both A1 and A2 = {S,1,2, a, b} are min cuts, and both (a,T) and (b,T) are saturated.

(e) When θ increases to θ3 = −5, both A2 and A3 = {S,1,2,3, a, b, c} are min cuts, and all (i, T) arcs are saturated.

(f) The GGT Algorithm computes the three primitive components indicated by dotted lines. The inter-component arc (3, b) goes from a faster component to a slower one, and carries zero flow.

Figure 4 Construction of primitive components on Gθ and feasible routing rates. Bold arcs are those saturated by a max flow. The feasible routing flows are labeled above each arc.


Proposition 2 The nested cuts Π and the sequence of breakpoints θ1 ≤ . . . ≤ θK computed by the GGT algorithm satisfy the following properties:

(a) Queues in the same primitive component Gk = Ak\Ak−1 have the same score change rate, i.e., θi = θk for all i ∈ Gk. Consequently, a primitive component with a smaller (larger) index k corresponds to an equal or smaller (larger) score increment rate θk. If the score change rate is strictly smaller (larger), the primitive component with the smaller (larger) index is referred to as a slower (faster) component.

We can compute θk(t) as follows:

$$\theta_k(t) = \Big(\sum_{i\in G_k} \frac{\lambda_i(t-W_i(t))\,F^C_i(W_i(t))}{g_i'(W_i(t))}\Big)^{-1} \Big(\sum_{i\in G_k} \lambda_i(t-W_i(t))\,F^C_i(W_i(t)) - \sum_{j\in G_k} \mu_j(t)\Big) =: \Psi^{G_k}_{W(t)}, \tag{20}$$

where the function $\Psi^{G_k}_{W(t)}$ represents the score change rate for component Gk at state W(t) when the supply fluid flows from outside into Gk at a net rate 0 (so queues in Gk are served by its own servers at rate $\sum_{j\in G_k} \mu_j(t)$).

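(b) Given W = W(t), the set of feasible routing-rate vectors is the following polytope:

$$\Gamma(W) := \left\{ r \ge 0 \;\middle|\; \begin{array}{ll} \sum_{\{i \mid (j,i)\in E\}} r_{ji} = \mu_j, & \forall\, j \in \mathcal{J}\\[2pt] \sum_{\{j \mid (j,i)\in E\}} r_{ji} = \vartheta_{W_i}^{-1}(\theta_k), & \forall\, i \in G_k,\ k = 1, 2, \ldots, K\\[2pt] r_{ji} = 0, & \forall\, j \in G_k,\ i \in G_{k'},\ k \neq k' \text{ or } (j,i) \notin E \end{array} \right\}. \tag{21}$$

Since (21) restricted to the subnetwork {S} ∪ Gk ∪ {T} w.r.t. θ = θk is just the constraints describing a max flow $X^k = \{X^k_{Sj}, X^k_{ji}, X^k_{iT} \mid j, i \in G_k\}$ on the subnetwork, any feasible routing-rate vector rji can be constructed via

$$r_{ji} = \begin{cases} X^k_{ji} & \text{if } j, i \in G_k \text{ for some } k,\\ 0 & \text{if } j, i \text{ are not in the same } G_k. \end{cases} \tag{22}$$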

(c) There are no arcs going from a slower component to a faster one. Any arc from a faster

component to a slower component must carry zero flow. As a result, no supply fluid can be sent

between different primitive components.

(d) We call H ⊊ Gk a closed subset of Gk if H contains no outgoing arc to other parts of Gk. If H is a closed subset of Gk, then

$$\sum_{j\in H} \mu_j < \sum_{i\in H} \vartheta_{W_i}^{-1}(\theta_k). \tag{23}$$

(e) Each primitive component Gk is connected by Eb. However, a connected component of Eb could

be the union of multiple primitive components.

Remark 1 According to Properties (a)–(c) in Proposition 2, each primitive component forms an

independent subsystem, in which queues share the same score-increment rate and are served only


by servers in the same component. Moreover, the partition of primitive components and the score

change rates in each component can be explicitly computed using the GGT algorithm, and the

feasible routing rates can then be computed from a max flow for each Gk. Properties (d) and (e)

will also be used in subsequent analysis.
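For reference, the closed form of the common rate θk in (20) can be recovered from (11) and (19) by a short calculation. The shorthand $a_i := \lambda_i(t-W_i(t))\,F^C_i(W_i(t))$ is introduced here only for readability, and the lines below are a sketch of the algebra rather than a replacement for the proof.

$$\begin{aligned}
\theta = g_i'(W_i)\,\frac{a_i - z}{a_i}
\quad&\Longleftrightarrow\quad
z = \vartheta_{W_i}^{-1}(\theta) = a_i\Big(1 - \frac{\theta}{g_i'(W_i)}\Big),\\
\sum_{j\in G_k}\mu_j = \sum_{i\in G_k}\vartheta_{W_i}^{-1}(\theta_k)
\quad&\Longleftrightarrow\quad
\theta_k \sum_{i\in G_k}\frac{a_i}{g_i'(W_i)} = \sum_{i\in G_k} a_i - \sum_{j\in G_k}\mu_j .
\end{aligned}$$

Dividing the last identity by $\sum_{i\in G_k} a_i / g_i'(W_i)$ gives the expression for θk. In particular, a component consisting of a single unserved queue i has θk = g′i(Wi), consistent with W′i(t) = 1 in (10) when no service is received.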

Proof. To show that rji constructed in (22) are feasible routing rates, it suffices to prove that

they satisfy conditions (4), (8), and (12).

First, since the rji’s are constructed from the flows Xk, they take positive values only for arcs

(j, i)∈Eb (or E). By the definition of E, a server is only connected to its active queues, so (j, i)∈E

if and only if i ∈A(t, j), which verifies (8).

Now (4) follows from flow conservation at node j and Lemma 2 part 3, and (12) follows from

Lemma 2 part 4 since it implies that all queues being served by server j ∈Ak have the common

score change rate of θk. This proves Properties (a) and (b). Property (c) follows from Lemma 2

parts 1 and 2.

To prove Property (d), consider the max flow Xk in Gθ w.r.t. parameter θ = θk. Since max flow Xk sends zero flow to other components, the RHS of inequality (23) exactly counts the total flow out of subset H, whereas the LHS of inequality (23) is only part of the total flow into subset H, and so

$$\sum_{j\in H} \mu_j \le \sum_{i\in H} \vartheta_{W_i}^{-1}(\theta_k). \tag{24}$$

Notice that if (24) is not strict, then Ak−1 ∪H would satisfy complementary slackness w.r.t. Xk,

and so would also be a min cut. This contradicts the choice of Ak as a minimal superset of Ak−1

that is a min cut.

For Property (e), if Gk is not a connected component, then define H as one of the connected

components of Gk. Then H has no outgoing arcs to other parts of Gk, and so by the same argument

as Property (d), Ak−1 ∪H would also be a min cut for Xk, again a contradiction.

We note that different primitive components might be connected, for example, by arc (1, a) in Figure 4(f). In most cases, such inter-component arcs go from a (strictly) faster component to a slower component, but disappear right away because ties cannot last between two queues with different score change rates. In the degenerate case, an inter-component arc connects two primitive components with the same θk. Then it is unclear whether this arc will remain (so that the two primitive components get merged), or disappear and so disconnect the two components.


Step 2: Step 1 used the GGT algorithm to generate a sequence of nested cuts Ak and break-

points θk, k= 1, . . . ,K such that both Ak−1 and Ak are min cuts at θk. Based on the values of θk

we classify the routing graph at a given state W (t) as being degenerate or non-degenerate:

Definition 4 The network at time t given W (t) is non-degenerate if θ1 < θ2 < . . . < θK, and is

degenerate if θk = θk+1 for some k= 1,2, . . . ,K− 1. Equivalently, the network is non-degenerate at

t if and only if the parameterized network Gθ has at most two different min cuts for all parameter

values θ.
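In floating-point arithmetic the test of Definition 4 needs a tolerance. A minimal Python sketch, where the function name and the tolerance are illustrative assumptions and the breakpoints are taken to be sorted in nondecreasing order:

```python
def is_degenerate(thetas, tol=1e-9):
    """Definition 4: degenerate if two consecutive breakpoints coincide (within tol)."""
    return any(abs(later - earlier) <= tol for earlier, later in zip(thetas, thetas[1:]))

# Breakpoints reported in Figure 4, and a hypothetical sequence with a repeated value.
fig4_case = is_degenerate([-84.0, -18.0, -5.0])
tied_case = is_degenerate([0.0, 2.0, 2.0])
```

The breakpoints shown in Figure 4 are distinct, so that network is non-degenerate; any repeated value triggers the merging procedure of Step 2.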

If the graph is non-degenerate, the primitive components are exactly the routing components

that we are looking for; otherwise, two or more primitive components correspond to the same

breakpoint θk. If these primitive components are also connected, then we do not know which

primitive component is faster at the next moment. When multiple primitive components with the

same score change rate are connected we need an additional procedure to determine which of these

components should be merged and which should stay separate during the next time period. Our

detailed discussion of the degenerate case is in Appendix EC.4.

Step 3: Intuitively, routing components specify which subsets of queues are served by which

subsets of servers. We next formally define the routing components. Recall that given a vector W(t),

its associated arc set E(W (t)) is defined in (15), which connects each server j to its active queues

according to their M+W scores.

Definition 5 Given a vector-valued process {W (t) := (Wi(t))i∈I | t ∈ [t0, t1]}, if its associated arc

set E(W (t)) is invariant over (t0, t1), then the connected components of Eb(t) are the routing

components over (t0, t1).

Since routing components are defined over an open interval (t0, t1) only when the arc set stays

invariant over that interval, the routing components are also invariant over that interval. This

definition allows E(t0) and E(t1) to differ from E(t) for t ∈ (t0, t1) without affecting the routing

components.

The next proposition gives a necessary and sufficient condition for a given piece of a vector-valued process to be the HOL waiting-time trajectory of the fluid process. It uses xk(t) to denote the total amount of score change for queues in a routing component Gk from time t0 up to time t.

Proposition 3 Suppose {W (t) | t ∈ [t0, t1]} is a continuous vector-valued process with associated

arc set E(t)≡E for all t∈ (t0, t1), and suppose {Gk}k=1,...,K are the routing components in (t0, t1).


Then {W (t) | t ∈ [t0, t1]} is the fluid process in an M+W-BQS over [t0, t1] if and only if for each

k= 1, . . . ,K, {W (t) | t∈ [t0, t1]} is the unique solution to the following ODE:

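$$
\text{(ODE)}\qquad
\begin{aligned}
\frac{dx_k(t)}{dt} &= \Psi^{G_k}_{W(t)} && \text{for } k = 1, 2, \ldots, K,\\
W_i(t) &= g_i^{-1}\big[g_i(W_i(t_0)) + x_k(t)\big] && \text{for all } i \in G_k,\ k = 1, 2, \ldots, K,\\
x_k(t_0) &= 0 && \text{for } k = 1, 2, \ldots, K,
\end{aligned}
\tag{25}
$$

where the function $\Psi^{G_k}_{W(t)}$ is defined as in equation (20).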

The proof of this Proposition is in Appendix EC.3.
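The structure of the ODE for a single routing component can be sketched numerically as follows. This is only an illustration under stated assumptions: `psi` is a hypothetical callable standing in for the score change rate of equation (20), which is not reproduced here, and explicit Euler stepping is chosen for brevity rather than accuracy.

```python
def integrate_component(w0, g, g_inv, psi, t0, t1, steps=1000):
    """Explicit Euler for dx/dt = psi(W) with W_i = g_inv_i(g_i(W_i(t0)) + x).

    w0:    dict queue -> W_i(t0) for the queues of one routing component
    g:     dict queue -> waiting-index function g_i
    g_inv: dict queue -> inverse of g_i
    psi:   hypothetical callable mapping the current dict W to the common
           score change rate of the component
    """
    h = (t1 - t0) / steps
    x = 0.0                                       # x_k(t0) = 0
    base = {i: g[i](w) for i, w in w0.items()}    # g_i(W_i(t0))
    w = dict(w0)
    for _ in range(steps):
        x += h * psi(w)                           # common score change x_k(t)
        w = {i: g_inv[i](base[i] + x) for i in base}
    return w, x
```

Because every queue in the component receives the same increment x, the scores of the queues stay tied along the computed trajectory.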

Proposition 3 says that if the arc set is invariant over interval (t0, t1), then the fluid process

{W (t) | t∈ [t0, t1]} can be uniquely determined from the ODE (EC.2). The next proposition shows

that there always exists such an interval over which the arc set stays invariant. Define arc set

Eb(t+0 ) := {(j, i) ∈Eb(t0) | j, i∈Gk for some k}, (26)

i.e., Eb(t+0 ) is the set of edges in Eb(t0) inside the same primitive component.

Proposition 4 If the parameterized network at t0 is non-degenerate, then there exists an interval

(t0, t1) such that a unique fluid process {W (t) | t∈ [t0, t1]} exists as a solution to the ODE (EC.2),

in which {Gk | k= 1, . . . ,K} are exactly the primitive components of the parameterized network at

t0. Moreover, for all t∈ (t0, t1) we have Eb(t)≡Eb(t+0 ).

Proof.

We first prove the “uniqueness” part. If {W (t) | t∈ [t0, t1]} is a fluid process, then its associated

Eb(t) must be in the form of (26) for all t∈ [t0, t1], which contains only arcs within each primitive

component Gk (k= 1, . . . ,K). We now prove subclaims (a)–(c).

(a) Eb(t)⊆Eb(t0), that is, no new arcs can be added to Eb(t).

Suppose j ∈ J , i∈ I, but (j, i) /∈Eb(t0). According to our construction of Eb(t0), we have

$$\delta_{ji}(t_0) := \max_{\ell \in I} s_{j\ell}(t_0) - s_{ji}(t_0) > 0. \tag{27}$$

Because all scores sji(t) are continuous in t, δji(t) has to be continuous in t. Therefore, there

exists an interval [t0, t1] over which δji(t)> 0. By selecting t1 sufficiently small, all (j, i) /∈Eb(t0)

will continue to stay outside Eb(t) for t∈ (t0, t1). Note that the above argument actually proves

that Eb(t) (and E(t)) is upper hemicontinuous2 at all t.

(b) All inter-component arcs in Eb(t0) will not be contained in Eb(t).

Suppose (j, i)∈Eb(t0) and j, i are in two different primitive components Gk(t0) and Gk′(t0),

respectively, with k ≠ k′. Since each primitive component is connected by Eb(t0), node j in Gk

must be connected to some node i′ ∈Gk. Thus, server j is sending supply fluid to both i and


i′. However, since i and i′ are in two different primitive components, which have different score

change rates by the non-degenerate assumption, i and i′ must also have different score change

rates. Since arcs can only go from a faster component to a slower component, i′ must have a

larger score change rate than i and thus (j, i) has to disappear after t0.

(c) All intra-component arcs in Eb(t0) will remain in Eb(t).

Proof by contradiction. If the above claim is not true, then there is a sequence of time points

{tℓ} approaching t0 such that at each of those time points, at least one arc (j, i) ∈ Eb(t0)

disappears. Whenever this happens, the primitive component containing (j, i) will be split into

disconnected sub-components G1, G2, . . . (if that component is still connected, our argument

at the beginning of Proof of Proposition 3 shows that all queues in that component will keep

their scores changing by the same amount, but then arc (j, i) cannot disappear). Therefore,

at each tℓ, we obtain a sequence of sub-components. For each sub-component, we calculate

the score change from t0 until tℓ, which is gi(tℓ)− gi(t0), with queue i in that sub-component.

Let G1ℓ denote the sub-component which has the minimal score change. Because G1ℓ only has a

finite number of possible choices, there must be a subsequence {tℓu} such that G1ℓ ≡:H at all

of those time points, where H must be the sub-component of one of the primitive components,

say, Gk without loss of generality.

We claim that H must be a closed subset of Gk with respect to Eb(t0). Otherwise, there

would be an arc e ∈ Eb(t0) that goes from H to other parts of Gk. By our construction of

H, queues in the other sub-components of Gk must have their scores increased faster than H

during the interval (t0, tℓu), so this edge e has to remain in Eb(tℓu) for all u=1,2, . . .. However,

if e ∈ Eb(tℓu), then H must be connected to other parts of Gk at tℓu, which contradicts that

H is disconnected from other parts of Gk. Therefore, we deduce that Eb(t0) contains no arcs

from H to other parts of Gk, and H thus has to be a closed subset of Gk at t0.

For u = 1,2, . . ., let $\theta(t_{\ell_u})$ denote the score change rate of all queues i ∈ H at $t_{\ell_u}$. By our previous definition of $\vartheta^{-1}_{W_i(t_{\ell_u})}(\cdot)$, the following equality holds at all times $t_{\ell_u}$:

$$\sum_{j\in H}\mu_j(t_{\ell_u}) = \sum_{i\in H}\vartheta^{-1}_{W_i(t_{\ell_u})}\big(\theta(t_{\ell_u})\big) \;\to\; \sum_{i\in H}\vartheta^{-1}_{W_i(t_0)}(\theta) \quad\text{as } u\to\infty, \tag{28}$$

where θ ∈ [−∞,+∞] denotes the limit of $\theta(t_{\ell_u})$, which exists by right continuity. We know that θ can be no more than θk, because otherwise, in a sufficiently small neighborhood of t0, the score change rate of H would be strictly larger than θk, which would not allow H to be the slowest component at the time points $t_{\ell_u} \to t_0$. Thus, we deduce that θ ≤ θk and consequently,

$$\sum_{j\in H}\mu_j(t_0) = \sum_{i\in H}\vartheta^{-1}_{W_i(t_0)}(\theta) \ge \sum_{i\in H}\vartheta^{-1}_{W_i(t_0)}(\theta_k), \tag{29}$$

where the equality follows from (28), and the inequality follows from the fact that $\vartheta^{-1}_{W_i(t_0)}(\theta)$ is decreasing in θ. Inequality (29) then contradicts Property (d) in Proposition 2.

We have thus proved that over a certain interval (t0, t1), Eb(t) is exactly the set of edges inside

each primitive component that is generated by the GGT algorithm at t0. As a result, the routing

components over (t0, t1) are exactly those primitive components. Then by Proposition 3, the fluid

process, if it exists, must have W (t) solving the ODE over [t0, t1] that is formulated based on these

routing components.

We next prove the “existence” part. By formulating the ODE based on the primitive components

Gk, we could solve for the HOL waiting time trajectories {W (t) | t∈ [t0, t1]}. According to the ODE,

queues in the same primitive component will keep their scores tied so will stay connected. Moreover,

we have proved in the “uniqueness” case that no new arc will appear in (t0, t1) by selecting t1

sufficiently small, and arcs between different primitive components will disappear because of the

difference in their score change rate. Therefore, the primitive components will be the connected

components with respect to Eb(t) for t∈ (t0, t1), and thus are exactly the routing components. We

then invoke Proposition 3 to show that the {W (t) | t ∈ [t0, t1]} computed from the ODE is a fluid

process.

If G(t0) is degenerate, certain inter-component edges in Eb(t0) may remain in Eb(t) over some

interval [t0, t1]. We provide a method in Appendix EC.4 to find those inter-component edges. We

also show that Eb(t) stays invariant in some interval [t0, t1] in the degenerate case. Therefore, no

matter whether the graph at t0 is degenerate or non-degenerate, we can always construct an arc

set Eb(t+0 ) and routing components (Gk) which stay invariant over a certain interval (t0, t1) for

some t1 > t0. If we want to construct the fluid process over a finite horizon interval [0, T ], we

can first solve ODE (EC.2) over [t0, T ] and obtain {W (t) | t∈ [t0, T ]}, while being aware that this

construction may not be valid because the arc set and the routing components may change at some

time t∗ ∈ (t0, T ]. We call such a t∗ a switch time. There are three types of time points that could

possibly become a switch time:

• Type-1 : At a Type-1 switch time, an arc (j, i) not in the arc set Eb(t+0 ) is added into Eb(t+0 ).

Recall that (j, i) /∈Eb(t) if and only if δji(t)> 0, where δji(t) represents the difference between the

score of queue i and an active queue of j, which can be computed using equation (27) for a given

W (t). The next type-1 switch time after t0 can thus be characterized as

δji(t) = 0 for some (j, i) /∈Eb(t+0 ). (30)

A Type-1 switch happens when the score of one routing component grows faster and catches up with that of another routing component, which leads to a re-configuration of the routing graph. In


Example 1, t1 is a Type-1 switch time, at which the routing graph changes from Figure 2(a) to

Figure 2(b).

• Type-2 : Right after a Type-2 switch time, a connected component in E(t+0 ) splits into two or

more sub-components, and some edge(s) has to disappear. Recall that in the proof of Proposition

4, we showed that if any arc disappears, there must be a sequence of time points tℓu ↓ t, such that

there is a slowest closed subset H of some routing component Gk for which inequality (29) holds.

Thus, the Type-2 switch is characterized by the following condition,

$$\exists\, k \text{ and a closed subset } H \subsetneq G_k \text{ such that } \sum_{j\in H}\mu_j \ge \sum_{i\in H}\vartheta^{-1}_{W_i(t)}(\theta_k). \tag{31}$$

The above condition cannot hold at t0, because Property (d) in Proposition 2 states that such a

closed subset H cannot exist in G(t0).

A Type-2 switch happens when the scores of some subset of a routing component grow too slowly (because those queues receive too much service) to keep up with the rest of the component, so that the subset splits off from the original routing component. In Example 1, t2 is a Type-2 switch time, at which the routing graph changes from Figure 2(b) to Figure 2(c).

• Type-3 : Discontinuity points of the edge capacities {ue(t) | e ∈ E(t)} may also lead to a re-configuration of the routing components. Since uSj(t) depends on µj(t), and uiT (t) depends on λi(t−Wi(t)), a Type-3 switch time, which is a discontinuity point of the edge capacities, can be characterized as

$$\lambda_i(t - W_i(t)) \text{ is discontinuous at } t \text{ for some } i\in I, \quad\text{or}\quad \mu_j(t) \text{ is discontinuous at } t \text{ for some } j\in J. \tag{32}$$

Since t−Wi(t) is non-decreasing in t, the number of Type-3 switch times has to be finite over any finite interval.
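The Type-1 condition (30) can be illustrated with a short Python sketch that evaluates δji from a table of scores, as in (27), and scans a time grid for the first instant at which an arc outside the current arc set becomes active. The names `score_at`, `arcs_now` and `grid`, the tolerance, and the grid scan itself are assumptions for this example; a root finder would locate the switch time more precisely.

```python
def delta(scores):
    """delta[j][i] = max_l s_jl - s_ji for a dict-of-dicts score table s[j][i]."""
    return {j: {i: max(row.values()) - s for i, s in row.items()}
            for j, row in scores.items()}


def active_arcs(scores, tol=1e-9):
    """Arcs (j, i) with delta_ji = 0, i.e. queue i attains server j's top score."""
    d = delta(scores)
    return {(j, i) for j in d for i in d[j] if d[j][i] <= tol}


def first_type1_switch(score_at, arcs_now, grid, tol=1e-9):
    """First grid time at which some arc outside arcs_now reaches delta = 0.

    score_at: hypothetical callable t -> score table s[j][i] at time t
    arcs_now: set of arcs (j, i) that are currently active
    grid:     increasing candidate times to test
    """
    for t in grid:
        if active_arcs(score_at(t), tol) - arcs_now:
            return t
    return None
```

For instance, if server 1 sees a constant score 1.0 for queue "a" and a score 0.5 + 0.25 t for queue "b", the scan reports the catch-up time t = 2 whenever that time lies on the grid.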

The conditions we have listed above are necessary but not sufficient conditions for a switch time.

If we define

t∗ =min{t∈ (t0, T ] | (30), (31), or (32) holds}, (33)

then t∗ can be interpreted as the next time point which could possibly be a switch time. We call t∗ a

potential switch time. If such a potential switch time does not exist in [0, T ], then we know for sure

that the edge set and routing components will stay invariant in [0, T ] and thus the vector-valued

process {W (t) | t∈ (t0, T ]} constructed from the ODE is the unique fluid process by Proposition 3;

otherwise, we have to compute the new edge set and the routing component at t∗, and reformulate

the ODEs. To do that, we update t0← t∗ and repeat Steps 0–3 with the new t0 so the construction

can continue.

The entire procedure of constructing a fluid process {W (t) | t∈ [0, T ]} over a given finite interval

[0, T ] is summarized in Algorithm 1 below.


Algorithm 1: Construct a Fluid Process over (0, T ].

Data: T , W (0)

Result: {W (t) | t ∈ [0, T ]}

Initialize: t0 ← 0, W (t0) ← W (0)

Step 0: if Condition (13) holds then
    Terminate the Algorithm and return “{W (t) | t ∈ [0, T ]} not unique”
end

Step 1: Construct E(t0) via (15). Apply the GGT Algorithm to get a sequence of breakpoints (θk)k=1,...,K and the corresponding nested cuts Ak, and compute the primitive components Ak \ Ak−1 for k = 1, . . . ,K.

Step 2: if θk−1 = θk for some k = 2, . . . ,K then
    Generate the routing components Gk according to Proposition 6 in Appendix EC.4
else
    Set each routing component Gk equal to the corresponding primitive component
end

Step 3: Calculate {W (t) | t ∈ [t0, T ]} as the unique solution to the ODE (EC.2). Find a potential switch point t∗ using (33).
if t∗ exists then
    t0 ← t∗, go to Step 0
else
    Terminate the Algorithm and return {W (t) | t ∈ [0, T ]}.
end
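The control flow of Algorithm 1 can be sketched as a loop that chains ODE pieces between potential switch times. In this sketch, `solve_piece` is a hypothetical callback that stands in for Steps 0 to 3 (uniqueness check, GGT breakpoints, routing components, ODE solution and switch detection); none of those steps is implemented here, and the function names are assumptions.

```python
def construct_fluid_process(T, w0, solve_piece):
    """Chain ODE pieces between potential switch times, as in Algorithm 1.

    solve_piece(t0, w, T) is a hypothetical callback that returns
    (t_end, w_end), where t_end is the next potential switch time t* if one
    exists in (t0, T], and T otherwise; w_end is the state W(t_end).
    Returns the list of (time, state) pairs at the piece boundaries.
    """
    t0, w = 0.0, dict(w0)
    pieces = [(t0, dict(w))]
    while t0 < T:
        t_end, w = solve_piece(t0, w, T)
        if not t0 < t_end <= T:
            # Guard against a callback that fails to advance time.
            raise ValueError("solve_piece must advance time within (t0, T]")
        t0 = t_end
        pieces.append((t0, dict(w)))
    return pieces
```

The loop terminates only if the callback produces finitely many switch times on [0, T], which is the property established for analytic primitives.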

Algorithm 1 presents a constructive approach to prove the existence of the fluid process and characterizes its dynamic behavior. Unfortunately, there is no performance guarantee for Algorithm 1 because (1) the accuracy of solving an ODE with boundary condition depends on the smoothness of the functions $g_i$ and $F_i^C$; (2) the number of Type-3 switch times depends on the number of discontinuity points of the input arrival rates λi(t) and service rates µj(t); (3) the number of Type-1 and Type-2 switch times depends on the smoothness of $g_i$ and $F_i^C$. By assuming that $g_i$ and $F_i^C$ are analytic functions, we ensure that the fluid process has only a finite number of Type-1 and Type-2 switch times over any finite interval, so the fluid process can be constructed by Algorithm 1 over any finite horizon.

Theorem 1 Given any T > 0 and W (0) ≥ 0, if Algorithm 1 does not terminate because of condition (13), then it constructs {W (t) | 0 ≤ t ≤ T}, which is the unique fluid process in an M+W-BQS.

Proof.


Existence: Given t0, we may compute the routing component Gk in interval [t0, t1) by using

either the primitive component (non-degenerate case) or merged primitive components (degenerate

case). Identifying the routing components allows us to compute {W (t) | t∈ [t0, T )}, based on which

we can find the next switch time t∗ according to (33). If such a switch time does not exist during

[t0, T ], then it implies that over [t0, T ], the arc set stays invariant as E(t+0 ) and thus Proposition

3 states that {W (t) | t ∈ [t0, T )} is a fluid process over [t0, T ]; otherwise, we find the next switch

time t∗. By the definition of switch time, the arc set and routing components stay invariant over [t0, t∗], and thus the constructed W (t) over [t0, t∗] is a fluid process according to Proposition 3. It remains to show that there is a finite number of switch points over a finite horizon [0, T ]. Then by repeating the above procedure a finite number of times, our construction can cover [0, T ].

Proof by contradiction. If there is an infinite number of switch times in [0, T ], then there must be a sequence of switch points tℓ → T∗. We do not need to consider the Type-3 switch times because there are only finitely many discontinuity points of λi(t) and µj(t). Note that if tℓ is a Type-1 switch time, an arc is added to E(tℓ); if tℓ is a Type-2 switch time, an arc is removed from $E(t_\ell^+)$. Since there are finitely many arcs to add (or to remove), at least one arc (j, i) must have been repeatedly added to and removed from the arc set at a sequence of Type-1 and Type-2 switch times, which converge to T∗. Because the arc set $E(t_{\ell_u})$ has finite size, there must be two subsequences of switch times $\{t^v_{\ell_u} \mid u = 1,2,\ldots\}$ (v = 1,2) such that $E(t^v_{\ell_u}) \equiv E^v$ (v = 1,2), and $(j,i) \in E^1$ but $(j,i) \notin E^2$.

As shown in the proof of Proposition 4, the arc set E(t) is upper hemicontinuous at every T∗, so $(j,i) \in E^1 \subseteq E(T^*)$. If (j, i) is an intra-component edge, it must remain in a neighborhood of T∗ by our argument in the proof of Proposition 4, case (c); otherwise, (j, i) is an inter-component edge. According to Proposition 6 in Appendix EC.4, an inter-component edge either remains in the graph, or disappears in a neighborhood of t0 (by using the property that all $g_i(\cdot)$ and $F_i^C(\cdot)$ (i ∈ I) are analytic functions), which contradicts the fact that (j, i) belongs to $E(t^1_{\ell_u})$ but not to $E(t^2_{\ell_u})$ while both sequences satisfy $t^v_{\ell_u} \to T^*$ (v = 1,2). So we have proved that such a limiting point T∗ of switch points cannot exist, and therefore there are only finitely many switch times over any finite horizon.

Uniqueness: Suppose {W (t) | t ∈ [0, T ]} and {W̃ (t) | t ∈ [0, T ]} are two different fluid processes with W (0) = W̃ (0). Let t0 := inf{t | W (t) ≠ W̃ (t)}. We claim that W (t0) = W̃ (t0). Otherwise, continuity of W (t) and W̃ (t) implies that W (t0 − ∆t) ≠ W̃ (t0 − ∆t) for some sufficiently small ∆t > 0, which contradicts the definition of t0. Then by Proposition 3, the fluid process is unique over an interval (t0, t1). Thus W (t) ≡ W̃ (t) for t ∈ [t0, t1), which again contradicts the definition of t0. Thus the fluid process must be unique.


3.2. Calculating Performance Metrics for a Fluid Process

The construction of the fluid process enables the social planner to calculate certain performance

metrics based on the fluid process and to evaluate M+W scoring rules. In the setting of scarce

resource allocation, we assume that the system designer is concerned with two major objectives:

efficiency (Ef) and fairness (Fa). In this section, we assume that the system designer is only inter-

ested in the performance of the BQS over a finite horizon [0, T ]. The fluid process we constructed

earlier can then be used to predict the system performance of a particular M+W indexing policy

over that horizon.

There is no universal consensus in the literature on a measure of fairness (Fa). Here we propose

one possible way to measure fairness using the variance of the likelihood of getting service across

all customers. To calculate this variance, we first note that the average likelihood of getting service

is simply the ratio of supply versus demand, that is,

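$$P(T) = \frac{\mu}{\lambda}. \tag{34}$$

Thus the fairness measure, defined as the negative of this variance so that larger values are fairer, can be computed as

$$\mathrm{Fa}(T) := -\sum_{i\in I}\frac{\lambda_i}{\lambda}\left(\frac{\int_0^T \sum_{j\in J} r_{ji}(t)\,dt}{T\lambda_i} - P(T)\right)^2. \tag{35}$$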

Alternatively, one may measure fairness by the variance in waiting times (Zenios et al. 2000), or by the minimum likelihood of getting service among all types of customers. Most of those fairness metrics can be represented as functions of Wi(t) and $\sum_{j\in J} r_{ji}(t)$. The latter represents the aggregate service rate for each queue and has a one-to-one mapping with $W_i'(t)$ by (10). Therefore, the performance of the system under the above fairness metrics can be determined by the fluid process {W (t) | t ∈ [0, T ]}.

Compared to the measurements of fairness, there is more agreement on measuring efficiency (Ef)

as the expected utility for all resource-customer matches over [0, T ]:

$$\mathrm{Ef}(T) := \frac{\sum_{i\in I,\, j\in J}\int_0^T U(j,i)\,r_{ji}(t)\,dt}{\mu T}. \tag{36}$$
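The metrics (34) to (36) can be evaluated numerically from sampled routing rates, as in the sketch below. It assumes constant arrival and service rates, reads the unsubscripted λ and µ as the totals over all customer and server types, and uses the trapezoidal rule on a user-supplied time grid; the function names and data layout are hypothetical.

```python
def trapezoid(ts, ys):
    """Trapezoidal rule for samples ys taken at increasing times ts."""
    return sum((t1 - t0) * (y0 + y1) / 2.0
               for t0, t1, y0, y1 in zip(ts, ts[1:], ys, ys[1:]))


def efficiency_and_fairness(ts, r, U, lam, mu):
    """Evaluate Ef(T) as in (36) and Fa(T) as in (35).

    ts:  sample times covering [0, T]
    r:   dict (j, i) -> list of r_ji(t) samples on ts
    U:   dict (j, i) -> matching utility U(j, i)
    lam: dict i -> arrival rate (assumed constant)
    mu:  dict j -> service rate (assumed constant)
    """
    T = ts[-1] - ts[0]
    lam_tot, mu_tot = sum(lam.values()), sum(mu.values())  # assumed totals
    served = {i: 0.0 for i in lam}       # integral of sum_j r_ji over [0, T]
    utility = 0.0
    for (j, i), samples in r.items():
        total = trapezoid(ts, samples)
        served[i] += total
        utility += U[(j, i)] * total
    ef = utility / (mu_tot * T)
    p_bar = mu_tot / lam_tot             # average service likelihood (34)
    fa = -sum(lam[i] / lam_tot * (served[i] / (T * lam[i]) - p_bar) ** 2
              for i in lam)
    return ef, fa
```

With one server of rate 1 and two customer types of rate 1 each, serving only the first type gives Fa = −0.25, while an even split gives Fa = 0, which matches the intended reading that larger values are fairer.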

Given a fluid process {W (t) | t∈ [0, T ]}, the corresponding service process rates r(t) do not have

to be unique. In fact, at each W (t), the set of feasible routing rates is a polytope Γ(W (t)) which

is characterized in Proposition 2 property (b). In the proof of Proposition 3, we showed that by

specifying a certain lexicographic order over different routing-rate vectors, we can always pick the

maximum rmax(t) with respect to that order, and rmin(t) is piecewise continuous.

If we define the lexicographic order according to the entries of the utility matrix U (ties are

broken by a fixed order) as

(j, i) ≻ (j′, i′) iff U(j, i) > U(j′, i′), (37)


then the lex-max rmax(t) is also the solution to the following problem

$$\max_{r(t)\in\Gamma(W(t))}\ \sum_{j,i} r_{ji}(t)\,U(j,i). \tag{38}$$

Thus, the expected utility in the fluid model is maximized by choosing the maximum routing rates

rmax(t) with respect to the lexicographic order (37). Similarly, the expected utility is minimized at

routing rates rmin(t) which is the minimal routing rate-vector with respect to that order. Using

rmax(t) and rmin(t), we can derive the best- and worst-case bounds for the efficiency level for the

BQS fluid model that can be achieved by a given M+W index.

Corollary 1 The efficiency (Ef) of an M+W-BQS fluid model, if measured by (36), has the fol-

lowing range

$$\mathrm{Ef}(T)\in\left[\frac{\sum_{i\in I,\, j\in J}\int_0^T U(j,i)\,r^{\min}_{ji}(t)\,dt}{\mu T},\ \frac{\sum_{i\in I,\, j\in J}\int_0^T U(j,i)\,r^{\max}_{ji}(t)\,dt}{\mu T}\right]. \tag{39}$$

Proof. We argued that both rmin(t) and rmax(t) are feasible routing rates and piecewise

continuous on [0, T ]. Thus, the lower and upper bounds are both attainable by rmin(t) and rmax(t),

respectively. If we define the convex combination of rmin and rmax as

rc(t) := c rmin(t) + (1 − c) rmax(t),

then the intermediate value theorem implies that any value between the upper and lower bounds can be attained by rc(t) for some c ∈ [0,1]. Finally, we know that rc(t) ∈ Γ(W (t)) by the convexity of Γ(W (t)), and that rc is piecewise continuous over [0, T ], so {rc(t) | t ∈ [0, T ]} satisfies the criteria for being the service rates of a valid fluid process.
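The attainability argument can be made explicit, because the efficiency measure (36) is linear in the routing rates. Writing $\mathrm{Ef}(T; r)$ for the value of (36) under routing rates r (a notation introduced here only for this remark), we have

$$
\mathrm{Ef}(T; r^c) = \frac{1}{\mu T}\sum_{i\in I,\, j\in J}\int_0^T U(j,i)\,\big[c\,r^{\min}_{ji}(t) + (1-c)\,r^{\max}_{ji}(t)\big]\,dt = c\,\mathrm{Ef}(T; r^{\min}) + (1-c)\,\mathrm{Ef}(T; r^{\max}).
$$

Hence, when the two bounds differ, a target value v between them is attained at

$$
c = \frac{\mathrm{Ef}(T; r^{\max}) - v}{\mathrm{Ef}(T; r^{\max}) - \mathrm{Ef}(T; r^{\min})} \in [0,1].
$$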

Once an M+W indexing rule is implemented, the selection of r(t)∈ Γ(W (t)) is determined by the

stochastic behavior of the system, as argued in Talreja and Whitt (2008), and therefore is out of the

system manager’s control. However, the system manager can eliminate some arcs from the routing

graph by modifying the corresponding matching score L(j, i), so that on the remaining graph only

the optimal r(t) is feasible. More details about this technique will be discussed in Section 4.

3.3. An Application to an Overloaded FCFS-BQS

As we noted in Section 1.1, an FCFS-BQS is a special case of M+W-BQS by defining the M+W

index as follows,

$$g_i(\mathrm{WT}) = \mathrm{WT}, \qquad L(j,i) = \begin{cases} 0 & \text{if } (j,i)\in E_b,\\ -\infty & \text{if } (j,i)\notin E_b, \end{cases} \tag{40}$$


where Eb denotes the set of compatible server-customer pairs in the FCFS-BQS. We can define

a fluid process in an FCFS-BQS in the same way as in an M+W BQS. A fluid process in an

FCFS-BQS is said to be globally FCFS if all queues have equal HOL waiting times at all times (see Lemma 1 in Talreja and Whitt 2008).

Talreja and Whitt (2008) explored the question of when a BQS is globally FCFS, and derived certain sufficient conditions for global FCFS to hold at the steady state when the bipartite graph is fully connected or sparsely connected by Eb. The machinery developed in this paper allows us to extend the result of Talreja and Whitt (2008) by providing two necessary and sufficient conditions for global FCFS on general bipartite graphs. We need some definitions to express the conditions. Define $\lambda_i(t, W_1(t)) := \lambda_i(t - W_1(t))\,F_i^C(W_1(t))$ for all i ∈ I, and define $\lambda(t, W_1(t)) := \sum_{i\in I}\lambda_i(t, W_1(t))$. Following Cong et al. (2015), we say that the arrival rates {λi(t,W1(t))}i∈I and service rates {µj(t)}j∈J satisfy the Generalized Chaining Condition (GCC) if there exists θ1 > 0 such that (1 − θ1)λ(t) = µ(t), and for all non-empty subsets A of J ,

$$(1-\theta_1)\sum_{i\in\bigcup_{j\in A} I(j)}\lambda_i(t) \ge \sum_{j\in A}\mu_j, \tag{41}$$

where I(j) denotes the set of customer types compatible with server j.
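For small instances, the GCC at a single time instant can be checked by brute force, as in the following sketch. It reads A as a set of servers, as the index j ∈ A in (41) suggests, obtains θ1 from the balance (1 − θ1)λ = µ, and enumerates all non-empty subsets, so the running time is exponential in the number of servers. The function name and the dictionary layout are assumptions.

```python
from itertools import combinations


def satisfies_gcc(lam, mu, compatible, tol=1e-9):
    """Check condition (41) at one time instant by enumerating server subsets.

    lam:        dict i -> effective arrival rate lambda_i(t, W_1(t))
    mu:         dict j -> service rate mu_j(t)
    compatible: dict j -> set I(j) of customer types server j can serve
    """
    lam_tot, mu_tot = sum(lam.values()), sum(mu.values())
    theta1 = 1.0 - mu_tot / lam_tot        # from (1 - theta1) * lambda = mu
    if theta1 <= 0:                        # the system must be overloaded
        return False
    servers = list(mu)
    for size in range(1, len(servers) + 1):
        for A in combinations(servers, size):
            reach = set().union(*(compatible[j] for j in A))
            lhs = (1.0 - theta1) * sum(lam[i] for i in reach)
            if lhs < sum(mu[j] for j in A) - tol:
                return False
    return True
```

In the example with two servers that can both serve only one customer type, the pair of servers jointly needs more scaled demand than that type supplies, so the check fails.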

Corollary 2 Suppose a fluid process in FCFS BQS has initial state Wi(0) =W1(0) for all i ∈ I

and λ(t)>µ(t) at all t. Then the following statements are equivalent:

1. The fluid process is globally FCFS.

2. The fluid process is exactly the one constructed by Algorithm 1 using the M+W index (40),

which has Wi(t) =W1(t) for all i∈ I, and W1(t) is the unique solution to the following integral

equation

$$W_1(t) = W_1(0) + \int_0^t\left(1 - \frac{\mu(s)}{\lambda(s, W_1(s))}\right)ds. \tag{42}$$

3. Given W1(t) constructed as above, the arrival rates {λi(t,W1(t))}i∈I and {µj(t)}j∈J satisfy

the GCC.

The proof of this Corollary is in Appendix EC.5.
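In differential form, the common HOL waiting time in statement 2 evolves as W1′(t) = 1 − µ(t)/λ(t,W1(t)). The sketch below integrates this equation with explicit Euler steps under illustrative assumptions that are not part of the corollary: constant rates λ and µ, and patience times uniformly distributed on [0, w_bar], so that the effective arrival rate is λ(1 − W1/w_bar).

```python
def hol_wait(w0, lam, mu, w_bar, T, steps=20000):
    """Euler scheme for W1'(t) = 1 - mu / (lam * (1 - W1(t) / w_bar)).

    w0:    initial common HOL waiting time, assumed below w_bar
    lam:   total arrival rate (assumed constant)
    mu:    total service rate (assumed constant, mu < lam)
    w_bar: upper end of the uniform patience distribution (an assumption)
    """
    h = T / steps
    w = w0
    for _ in range(steps):
        effective = lam * (1.0 - w / w_bar)  # arrivals that have not abandoned
        w += h * (1.0 - mu / effective)
    return w
```

Under these assumptions the trajectory settles at w_bar (1 − µ/λ), the waiting time at which the effective arrival rate equals the service rate.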

4. Steady State of the Fluid Process

Until now we allowed arrival and service rates to vary with time. In this section, by contrast, in order to analyze the steady state we assume throughout that the BQS has stationary arrival rates, that is, λi(t) ≡ λi (i ∈ I), and service rates µj(t) ≡ µj (j ∈ J ). We further assume that the cdf Fi(·) has compact support interval [0, W̄i] with W̄i < ∞. We make this assumption to rule out the case in which Wi(t) → +∞ and a steady state may not exist. We also impose the sufficient condition from Proposition 1 under which (13) fails at state 0, to make sure that the steady state is unique.

An HOL waiting-time vector W ∗ ∈ [0,+∞)I is said to represent a steady state of the fluid process

in an M+W-BQS if and only if starting with W (0) = W ∗, the unique fluid process is given by

W (t) ≡W (0) for all t ≥ 0. A matrix r∗ := (r∗ji)i∈I,j∈J is defined as the steady-state routing-rate

vector, if r∗ is a feasible routing-rate vector at W ∗, which means that r∗ has to satisfy conditions

(4), (8), and (12).

At the moment we do not know whether a steady state W∗ exists, and if it does, how one may compute it. However, we can use the definition of steady state to develop properties W∗ would have to satisfy. By the definition of steady state, we know that, w.r.t. steady-state routing rates $r^*_{ji}$ corresponding to W∗, the inflow rate $\lambda_i = \lambda_i\big(F_i^C(W_i^*) + F_i(W_i^*)\big)$ and the outflow rate $\lambda_i F_i(W_i^*) + \sum_{j\in J} r^*_{ji}$ must be equal at each buffer i at the steady state, so that

$$\lambda_i F_i^C(W_i^*) = \sum_{j\in J} r^*_{ji}, \quad \forall i\in I. \tag{43}$$

We could also derive (43) from (11) by observing that θi = 0 for all i ∈ I at W∗.

We can invert (43) to get

$$W_i^* = (F_i^C)^{-1}\left(\frac{\sum_{j\in J} r^*_{ji}}{\lambda_i}\right), \tag{44}$$

where $(F_i^C)^{-1}$ is well defined over domain [0,1] by its strict monotonicity. This is useful because

we know from Section 3 that routing rates correspond to flows in a bipartite max flow network.

Thus we define the max flow network G∗ := (V,E,u∗,C) as follows. Vertex set V is as defined in

(14). Arc set E is defined as

E := {(S, j) | j ∈J }∪ {(j, i) | j ∈J , i∈ I}∪ {(i, T ) | i∈ I}, (45)

i.e., E consists of all possible arcs in the (S,T )-network, which differs from the arc set E(t) that

we defined in the transient analysis. Arc capacities u∗e for each arc e are

$$u^*_e = \begin{cases} \mu_j & \text{if } e=(S,j);\\ \infty & \text{if } e=(j,i);\\ \lambda_i & \text{if } e=(i,T). \end{cases} \tag{46}$$

We now need to define non-linear costs on G∗ such that, when we compute optimal flows $X^*_e$ on G∗, set $r^*_{ji} = X^*_{ji}$ for all j ∈ J and i ∈ I, and compute W∗ via (44), we indeed get steady-state waiting times. When we write down the first-order conditions for optimality (see (51) below), we discover that if $X^*_{ji} > 0$ we need the score of (j, i) to be $L(j,i) + g_i(W_i^*)$, which should come from the negatives of the derivatives of $C_{ji}(X^*_{ji})$ and $C_{iT}(X^*_{iT})$ (the negative comes from the fact that we maximize scores, but minimize the cost of flows). Thus we choose $C_{iT}(X^*_{iT})$ such that its derivative matches $-g_i(W_i^*)$ (with W∗ as defined in (44)), and we set $C_{ji}(X^*_{ji})$ so that its derivative matches −L(j, i). Therefore when the flow on arc e is Xe, its cost is Ce(Xe), defined as

$$C_e(X_e) = \begin{cases} 0 & \text{if } e=(S,j);\\ -L(j,i)\,X_e & \text{if } e=(j,i);\\ -\int_0^{X_e} g_i\big((F_i^C)^{-1}(u/\lambda_i)\big)\,du & \text{if } e=(i,T). \end{cases} \tag{47}$$

Since $g_i\big((F_i^C)^{-1}(\cdot)\big)$ is strictly decreasing in its domain [0,1], $C_{iT}(X_{iT}) = -\int_0^{X_{iT}} g_i\big((F_i^C)^{-1}(u/\lambda_i)\big)\,du$ has strictly increasing derivatives and is therefore strictly convex over $[0, u^*_{iT}\,(= \lambda_i)]$. See Figure 5 for an example of this construction.
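As a worked instance of the cost on arc (i, T), consider the illustrative choices $g_i(w) = w$ and patience times uniformly distributed on $[0, \bar W_i]$, so that $F_i^C(w) = 1 - w/\bar W_i$ and $(F_i^C)^{-1}(y) = \bar W_i(1-y)$. These choices are assumptions made only for this example. Then

$$
C_{iT}(X) = -\int_0^{X} \bar W_i\Big(1 - \frac{u}{\lambda_i}\Big)\,du = -\bar W_i\Big(X - \frac{X^2}{2\lambda_i}\Big),
\qquad
C_{iT}'(X) = -\bar W_i\Big(1 - \frac{X}{\lambda_i}\Big),
\qquad
C_{iT}''(X) = \frac{\bar W_i}{\lambda_i} > 0.
$$

The derivative equals $-g_i(W_i^*)$ with $W_i^* = \bar W_i(1 - X/\lambda_i)$, which is the waiting time that (44) assigns to the flow X, and the positive second derivative confirms strict convexity on $[0, \lambda_i]$.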

Figure 5 The (S,T )-network we construct for computing the steady state. On a few representative arcs we put

the cost function outside the parenthesis, and the capacity inside the parenthesis.

Theorem 2 below will show that this indeed works. This will imply the following algorithm for

computing a steady-state routing-rate vector r∗ and HOL waiting-time vector W ∗: We compute a

min-convex-cost flow X∗ in G∗ and set r∗ji =X∗ji for j ∈ J and i ∈ I. Then we compute W ∗ from

r∗ via (44).
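The last step of this procedure, recovering W∗ from r∗ via (44), can be sketched as follows. For illustration the sketch assumes patience times uniformly distributed on [0, w_bar_i], so that the inverse of the complementary cdf is y ↦ w_bar_i (1 − y); the function name and data layout are hypothetical, and the routing rates are taken as given rather than computed by a flow solver.

```python
def steady_state_waits(r, lam, w_bar):
    """Apply (44) under uniform patience on [0, w_bar[i]].

    r:     dict (j, i) -> steady-state routing rate r*_ji
    lam:   dict i -> arrival rate lambda_i
    w_bar: dict i -> upper end of the patience support (an assumption)
    """
    waits = {}
    for i in lam:
        served = sum(rate for (_, q), rate in r.items() if q == i)
        fraction = served / lam[i]            # equals F^C_i(W*_i) by (43)
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"queue {i!r}: served fraction {fraction} outside [0, 1]")
        waits[i] = w_bar[i] * (1.0 - fraction)
    return waits
```

A queue whose arrivals are all served has zero steady-state HOL waiting time, and a queue that receives no service sits at the upper end of its patience support.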

After we compute W∗, we can then define E∗ := E(W∗) as the arc set with respect to W∗ using definition (15). Theorem 2 below shows that this W∗ is in fact the unique steady-state waiting-time vector. Since the score change rates of all queues equal zero, condition (12) holds trivially at the steady state. The steady-state routing-rate vector r∗ can thus be characterized by conditions (4), (8), and (43). The set of steady-state routing-rate vectors can thus be formulated as the polytope

$$\Gamma^*(W^*) := \left\{ r\ge 0 \ \middle|\ \begin{array}{ll} \sum_{i\in I} r_{ji} = \mu_j, & \forall\, j\in J\\ \sum_{j\in J} r_{ji} = \lambda_i F_i^C(W_i^*), & \forall\, i\in I\\ r_{ji}=0, & \forall\, (j,i)\notin E^* \end{array} \right\}. \tag{48}$$


When Γ∗(W∗) contains multiple members, the proof of Theorem 2 shows that finding one that maximizes the total matching score, $\sum_{j\in J,\, i\in I} L(j,i)\,r^*_{ji}$, has particular significance, as we show later in this section. The problem of searching for such an r∗ can be formulated as the linear program

$$\text{(LP)}\qquad \max \sum_{j\in J,\, i\in I} L(j,i)\,r^*_{ji} \quad \text{s.t. } r^*\in\Gamma^*(W^*). \tag{49}$$

We call the solution to the LP in (49) a maximum steady-state routing-rate vector.

The next theorem characterizes W ∗ and a maximum steady-state routing-rate vector r∗ via a

min-convex-cost flow solution X∗ to G∗. It shows that there can be multiple steady-state routing-

rate vectors r∗, but the steady-state HOL waiting time W ∗, and thus the steady-state Q∗, must

be unique.

Theorem 2 1. The fluid process under an M+W index (1) always has a unique steady state W∗.

2. A flow $X^* := \{X^*_e \mid e \in E\}$ is a min-convex-cost flow on G∗ if and only if $\{X^*_{ji} \mid j \in J,\ i \in I\}$ is a maximum steady-state routing-rate vector.

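Proof. We first characterize solutions to the min-convex-cost flow problem via KKT points (i.e., points that satisfy the KKT conditions given in (51)). To do this, we formulate the min-convex-cost flow problem as the convex program:

$$
\begin{aligned}
\min\quad & \sum_{i\in I,\, j\in J} -L(j,i)\,X_{ji} - \sum_{i\in I}\int_0^{\sum_{j\in J}X_{ji}} g_i\big((F_i^C)^{-1}(u/\lambda_i)\big)\,du\\
\text{s.t.}\quad & \sum_{i\in I}X_{ji} = \mu_j,\\
& X_{ji}\ge 0.
\end{aligned}
\tag{50}
$$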

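Let sj and νji denote the dual variables that correspond to the constraints $\sum_{i\in I} X_{ji} = \mu_j$ and $X_{ji} \ge 0$, respectively. Then a solution to problem (50) must satisfy the following KKT necessary conditions (Nocedal and Wright 2006):

$$
\begin{aligned}
&\nu_{ji} = s_j - \Big(L(j,i) + g_i\big((F_i^C)^{-1}\big(\textstyle\sum_{j\in J}X_{ji}/\lambda_i\big)\big)\Big),\ \forall j\in J,\ i\in I && \text{first-order condition (FOC)}\\
&\textstyle\sum_{i\in I}X_{ji} = \mu_j,\ \forall j\in J,\quad X_{ji}\ge 0,\ \forall i\in I,\ j\in J && \text{primal feasibility}\\
&\nu_{ji}\ge 0,\ \forall j\in J,\ i\in I && \text{dual feasibility}\\
&X_{ji}>0 \Rightarrow \nu_{ji}=0,\ \forall j\in J,\ i\in I && \text{complementary slackness.}
\end{aligned}
\tag{51}
$$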

In particular, sj can be interpreted as maxi∈I sji, the highest score with respect to server j.

If we define $W_i^*$ from (44), then the above KKT conditions are equivalent to conditions (4), (8), and (43). Specifically, primal feasibility is equivalent to the budget constraint (4); the FOC, dual feasibility, and complementary slackness are equivalent to condition (8); and the definition of $W_i^*$ in (44) is equivalent to condition (43). Thus, there is a one-to-one correspondence between a KKT point and a steady-state routing-rate vector.
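For a single server, the KKT system (51) reduces to a one-dimensional search for the common score s of the active queues, which the following sketch solves by bisection. It is an illustration under assumptions that go beyond the text: one server only, g_i(w) = w, patience times uniform on [0, w_bar_i], and an overloaded system (the sum of arrival rates exceeds µ). A general instance would need a min-convex-cost flow solver instead.

```python
def single_server_steady_state(mu, lam, L, w_bar, iterations=200):
    """Solve the KKT conditions (51) for one server by bisection on its score s.

    mu:    service rate of the single server
    lam:   dict i -> arrival rate, with sum(lam.values()) > mu assumed
    L:     dict i -> matching index L(j, i) for this server
    w_bar: dict i -> upper end of the uniform patience support (an assumption)
    Returns (s, X, W): common score, flows X_i, and waiting times W_i from (44).
    """
    def flow(i, s):
        # FOC with g_i(w) = w: an active queue has W_i = s - L_i, and (43)
        # gives X_i = lam_i * (1 - W_i / w_bar_i). The clip to [0, 1] reflects
        # X_i >= 0 and the arc capacity lam_i from (46).
        frac = 1.0 - (s - L[i]) / w_bar[i]
        return lam[i] * min(1.0, max(0.0, frac))

    lo = min(L.values())                        # every queue fully served
    hi = max(L[i] + w_bar[i] for i in lam)      # no queue served
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if sum(flow(i, mid) for i in lam) > mu:
            lo = mid                            # too much flow: raise the score
        else:
            hi = mid
    s = (lo + hi) / 2.0
    X = {i: flow(i, s) for i in lam}
    W = {i: w_bar[i] * (1.0 - X[i] / lam[i]) for i in lam}
    return s, X, W
```

With µ = 1, arrival rates 1 and 1, matching indices 0 and 1, and w_bar = 4 for both queues, the search returns s = 2.5, flows 0.375 and 0.625, and waiting times 2.5 and 1.5, so both active queues attain the same score L + W = 2.5.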


We now show that all KKT points X∗ have the same value of $\sum_{j\in J} X^*_{ji} = X^*_{iT}$ for all i ∈ I, so that all KKT points lead to the same $W_i^*$ if constructed as in (44). We prove this by contradiction. Suppose that $X^1$ and $X^2$ are two optimal min-convex-cost flows with, say, $X^1_{iT} < X^2_{iT}$. To get flow conservation at all nodes, we consider the extended network with arc (T,S) that carries the total flow through the network. Network flow theory then says that there is a cycle D in the extended network containing (i, T ) forward such that $X^1_e < X^2_e$ for all forward arcs of D, and $X^1_e > X^2_e$ for all backward arcs of D, and some α > 0 such that $X^1 + \beta\chi(D)$ is an optimal flow for all 0 ≤ β ≤ α, implying that

$$C(X^1 + \beta\chi(D)) = 0 \quad \text{for all } 0\le\beta\le\alpha. \tag{52}$$

Cycle D includes exactly two arcs incident to T . One such arc is (i, T ). The other arc cannot be (T,S) as a forward arc, since both $X^1$ and $X^2$ are max flows due to primal feasibility in (51). Thus D must include an arc (k,T ) as a backward arc. Then all arcs in D other than (i, T ) and (k,T ) have linear cost. This would imply from (52) that $-C_{iT}(X^1_{iT} + \beta) - C_{kT}(X^1_{kT} - \beta)$ equals $\sum_{\{e\in D \mid e\ne(i,T),(k,T)\}} c_e\big(X^1_e + \beta\chi(D)_e\big)$ for all 0 ≤ β ≤ α, i.e., that $C_{iT}(X^1_{iT} + \beta) + C_{kT}(X^1_{kT} - \beta)$ is linear in this interval, contradicting that $C_{iT}(X_{iT})$ and $C_{kT}(X_{kT})$ are strictly convex. Thus all KKT points X∗ must indeed have the same value for $X^*_{iT}$ for all i ∈ I. Once we finish proving part 2, this will prove part 1.

To find a min-convex-cost flow, it suffices to search all KKT points to find the one that minimizes the total cost. Because all the KKT points have the same value of W∗, only the first term $\sum_{i\in I,\, j\in J} -L(j,i)\,X_{ji}$ varies with the choice of X in (50). Therefore, to find the min-convex-cost

flow, it suffices to select a KKT point that minimizes the first term, or equivalently maximizes the

objective in the LP (49). Since the set of KKT points is exactly the set of feasible routing rates

Γ∗(W ∗), finding a min-convex-cost flow is equivalent to solving the LP (49), which finishes the

proof of part 2.

Since min-convex-cost flow can be computed to accuracy ε (i.e., the best solution that is an integer multiple of ε) in time polynomial in 1/ε (Ahuja et al. 1993), Theorem 2 provides a useful technique to compute a maximum steady-state routing-rate vector and the unique steady state.

4.1. Searching for an M+W Index that optimizes the Steady-State Performance

Given an M+W index, by leveraging Theorem 2, we can find a unique steady state W ∗ and the set

of steady-state routing-rate vectors Γ∗(W ∗). Given W ∗ and any r∗ ∈ Γ∗(W ∗), we can then measure

efficiency (Ef) and fairness (Fa) of the system according to certain metrics, which are usually


functions of W ∗ and r∗. The question in this section is to find an M+W index that optimizes the

steady-state performance of the fluid process with respect to these measurements.

Suppose that we have functions EfW∗,r∗(∞) and FaW∗,r∗(∞) which measure the average system

efficiency and fairness at the steady state, respectively, which are equivalent to the metrics given

in (36) and (35) when T →∞. If we use scalar parameter η to combine these functions, then we

can formulate this question as the multi-objective planning problem

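$$
\begin{aligned}
\max\quad & \mathrm{Ef}_{W^*,r^*}(\infty) + \eta\,\mathrm{Fa}_{W^*,r^*}(\infty) \\
\text{s.t.}\quad & W^* \text{ is the steady state under a certain M+W index}, \\
& r^* \text{ is the unique steady-state routing-rate vector.}
\end{aligned}
\tag{53}
$$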

Theorem 2 implies that there can be multiple steady-state routing-rate vectors in general, so (53) asks us to search for a particular M+W index under which $\Gamma^*(W^*)$ is a singleton whose only element is the maximum steady-state routing-rate vector. This ensures that sub-optimal steady-state routing-rate vectors are never used.

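For general functional forms of $\mathrm{Ef}_{W^*,r^*}(\infty)$ and $\mathrm{Fa}_{W^*,r^*}(\infty)$, (53) is likely to be intractable. However, for a special class of functions $\mathrm{Ef}_{W^*,r^*}(\infty)$ and $\mathrm{Fa}_{W^*,r^*}(\infty)$, a closed-form M+W index can be derived under which $(W^*, r^*)$ solves the optimization problem (53). Specifically, we consider the following efficiency and fairness measures, where $\mathrm{Ef}_{W^*,r^*}(\infty)$ solely depends on $r^*$ and $\mathrm{Fa}_{W^*,r^*}(\infty)$ solely depends on $W^*$:

$$
\mathrm{Ef}_{r^*}(\infty) = \sum_{j \in \mathcal{J},\, i \in \mathcal{I}} U(j,i)\, r^*_{ji},
\qquad
\mathrm{Fa}_{W^*}(\infty) = -\sum_{i \in \mathcal{I}} \frac{\lambda_i}{\lambda} \left( F^C_i(W^*_i) - \frac{\mu}{\lambda} \right)^2.
\tag{54}
$$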

Note that the function $\mathrm{Ef}_{r^*}(\infty)$ calculates the average matching utility at the steady state, and $\mathrm{Fa}_{W^*}(\infty)$ measures the (negated) variance in the likelihood of getting service before abandoning the queue, in which $\mu/\lambda$ represents the mean likelihood of getting service.

We solve the multi-objective planning problem (53) in two steps.

Step 1. Solve an auxiliary problem: If we relax the second constraint in (53) by allowing multiple

feasible routing-rate vectors at the steady state, we obtain an auxiliary problem

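$$
\begin{aligned}
\max\quad & \mathrm{Ef}_{r^*}(\infty) + \eta\,\mathrm{Fa}_{W^*}(\infty) \\
\text{s.t.}\quad & W^* \text{ is the steady state under a certain M+W index}, \\
& r^* \text{ is a steady-state routing-rate vector.}
\end{aligned}
\tag{55}
$$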

The following Proposition derives a closed-form M+W index at which W ∗ and r∗ solve the above

auxiliary problem.

Proposition 5 Let W ∗ and r∗ denote the steady state and the maximum steady-state routing-rate

vector of the fluid process under the M+W index

$$ s_{ji} = U(j,i) + \eta\,\frac{2}{\lambda}\,F_i(\mathrm{WT}). \tag{56} $$

Then (W ∗, r∗) solves (55).


Proof. We establish an equivalence between the auxiliary problem (55) and the min-convex-

cost flow problem on G∗ so that we can apply Theorem 2 to construct a steady-state routing-rate

vector which solves the auxiliary problem.

If (W ∗, r∗) is a feasible solution to the auxiliary problem (55), it must satisfy the stationarity

condition (43). Plugging equation (43) into the expression of Fa∗(·) in (35) leads to

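$$
\mathrm{Fa}^*(W^*)
= -\sum_{i \in \mathcal{I}} \frac{\lambda_i}{\lambda} \left( \frac{\sum_{j \in \mathcal{J}} r^*_{ji}}{\lambda_i} - \frac{\mu}{\lambda} \right)^2
= -\sum_{i \in \mathcal{I}} \frac{\left( \sum_{j \in \mathcal{J}} r^*_{ji} \right)^2}{\lambda \lambda_i} + \left( \frac{\mu}{\lambda} \right)^2.
\tag{57}
$$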
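The expansion in (57) relies on the routing rates summing to the total service rate µ and the queue arrival rates summing to λ. The following is a minimal numeric sketch of that identity, not part of the original analysis; the arrival rates and routed totals are hypothetical placeholder values, and `R[i]` stands for the total rate routed to queue i.

```python
def fairness_direct(R, lam_i, mu):
    """First form in (57): -sum_i (lam_i/lam) * (R_i/lam_i - mu/lam)^2."""
    lam = sum(lam_i)
    return -sum(l / lam * (r / l - mu / lam) ** 2 for r, l in zip(R, lam_i))


def fairness_expanded(R, lam_i, mu):
    """Second form in (57): -sum_i R_i^2 / (lam * lam_i) + (mu/lam)^2."""
    lam = sum(lam_i)
    return -sum(r * r / (lam * l) for r, l in zip(R, lam_i)) + (mu / lam) ** 2


# Hypothetical data (not from the paper): three queues.
lam_i = [40.0, 35.0, 25.0]   # arrival rate of each queue
R = [12.0, 10.0, 8.0]        # total service rate routed to each queue
mu = sum(R)                  # all service capacity is used (overloaded system)

lhs = fairness_direct(R, lam_i, mu)
rhs = fairness_expanded(R, lam_i, mu)
print(lhs, rhs)  # the two forms agree up to floating-point rounding
```

If the routed totals did not sum to µ, the two expressions would differ, which is why the identity only holds for vectors that use the full service capacity.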

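Removing the constant term $(\mu/\lambda)^2$ from the objective function of (55) will not change the optimal solution. Thus, optimizing (55) is equivalent to solving the following problem

$$
\begin{aligned}
\max\quad & \sum_{j \in \mathcal{J},\, i \in \mathcal{I}} r^*_{ji} U_{ji} - \sum_{i \in \mathcal{I}} \frac{\left( \sum_{j \in \mathcal{J}} r^*_{ji} \right)^2}{\lambda \lambda_i} \\
\text{s.t.}\quad & r^* \text{ is a steady-state routing-rate vector.}
\end{aligned}
\tag{58}
$$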

The constraint of the above problem requires r∗ to satisfy conditions (4), (8), and (43). Condition

(43) holds trivially because we have plugged (43) into the objective (58) to make it a function of

r∗. By removing constraint (8), we obtain the following relaxation of (58),

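$$
\begin{aligned}
\max\quad & \sum_{j \in \mathcal{J},\, i \in \mathcal{I}} r^*_{ji} U_{ji} - \sum_{i \in \mathcal{I}} \frac{\left( \sum_{j \in \mathcal{J}} r^*_{ji} \right)^2}{\lambda \lambda_i} \\
\text{s.t.}\quad & \sum_{i \in \mathcal{I}} r^*_{ji} = \mu_j, \quad \forall\, j \in \mathcal{J}.
\end{aligned}
\tag{59}
$$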

If we define $C_{Sj} = 0$, $C_{ji} = -U_{ji}$ and $C_{iT}(x) = \frac{x^2}{\lambda \lambda_i}$ for all $j \in \mathcal{J}$, $i \in \mathcal{I}$, then solving (59) is equivalent to solving a min-convex-cost flow on network $G^*(V, E, u^*, C)$, where $V$, $E$, and $u^*$ are constructed according to (14), (45), and (46).

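Note that by defining $g_i(\mathrm{WT}) = -\frac{2}{\lambda} F^C_i(\mathrm{WT})$, the above cost function has the alternative expression for all $i \in \mathcal{I}$

$$
C_{iT}(x) = \frac{x^2}{\lambda \lambda_i}
= -\int_0^x \left( -\frac{2}{\lambda} \right) \frac{u}{\lambda_i}\, du
= -\int_0^x g_i\!\left( (F^C_i)^{-1}\!\left( \frac{u}{\lambda_i} \right) \right) du
\tag{60}
$$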

which coincides with the cost function we defined in (47). Then by invoking Theorem 2, the min-

convex-cost flow on G∗ gives a feasible routing rate vector r∗ associated with some steady state

W ∗ under the following M+W index

$$ L(j,i) + \eta\, g_i(\mathrm{WT}) = L(j,i) - \eta\,\frac{2}{\lambda}\,F^C_i(\mathrm{WT}). \tag{61} $$

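Note that the above M+W index is equivalent to the M+W index (56): since $F_i = 1 - F^C_i$, the two indices differ only by the additive constant $2\eta/\lambda$, which is the same for every queue and therefore does not affect the ranking. We propose to use (56) instead of (61) to comply with the assumption $g_i(0) = 0$.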
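The equivalence of (56) and (61) can be illustrated numerically. The sketch below assumes exponential abandonment, so that F_i(w) = 1 − exp(−d·w) and F^C_i(w) = exp(−d·w); the values η = 50, λ = 443 and d = 0.07 are the ones used in the case study, while the queue labels' waiting times are hypothetical.

```python
import math


def index_56(u, wait, eta, lam, d):
    """Index (56): U + eta * (2/lam) * F(wait), with F(w) = 1 - exp(-d*w) (assumed)."""
    return u + eta * (2.0 / lam) * (1.0 - math.exp(-d * wait))


def index_61(u, wait, eta, lam, d):
    """Index (61): U - eta * (2/lam) * F^C(wait), with F^C(w) = exp(-d*w) (assumed)."""
    return u - eta * (2.0 / lam) * math.exp(-d * wait)


eta, lam, d = 50.0, 443.0, 0.07
# (matching utility, HOL waiting time in quarters); waiting times are made up.
queues = {"[12]": (1.0, 2.0), "[21]": (0.66, 6.0), "[31]": (0.66, 9.5)}

s56 = {q: index_56(u, w, eta, lam, d) for q, (u, w) in queues.items()}
s61 = {q: index_61(u, w, eta, lam, d) for q, (u, w) in queues.items()}

gaps = [s56[q] - s61[q] for q in queues]   # same gap for every queue
best56 = max(s56, key=s56.get)             # queue served next under (56)
best61 = max(s61, key=s61.get)             # queue served next under (61)
print(gaps, best56, best61)
```

Because the gap is identical across queues, any server picks the same highest-index queue under either formula.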

Thus we have proved that (i) a min-convex-cost flow on G∗ provides an optimal solution to the

relaxed problem (59); and (ii) the min-convex-cost flow corresponds to a steady-state routing-rate


vector r∗ under the M+W index (61). Therefore we deduce that r∗ maximizes the problem (59), which is a relaxation of (58). From (ii), we further deduce that r∗ is a feasible solution to problem (58). Thus, r∗ must be an optimal solution to problem (58) and to the equivalent optimization problem (55).

Step 2. Removing Sub-optimal r∗s: Using the M+W index (56), one may obtain steady state W ∗

and r∗ ∈ Γ∗(W ∗) which solves the auxiliary problem (55). However, Γ∗(W ∗) may contain multiple

feasible routing-rate vectors, some of which might be sub-optimal in maximizing Efr∗(∞). The

selection of r∗ depends on the stochastic nature of the system (see discussions in Talreja and Whitt

(2008)), so it is not guaranteed that the optimal r∗ would be picked by the fluid process.

To address this issue, we modify the index formula (56) so that under the revised M+W index,

the maximum steady-state routing-rate vector will be the unique steady-state routing-rate vector.

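Let $r^{\mathrm{mfs}}$ denote a minimal-facet solution to the LP (49), which means that $r^{\mathrm{mfs}}$ is an extreme point of the polytope $\Gamma^*(W^*)$. Given $r^{\mathrm{mfs}}$, we construct a $|\mathcal{J}|$ by $|\mathcal{I}|$ perturbation matrix $\Delta$ with

$$
\Delta(j,i) =
\begin{cases}
-\epsilon & \text{if } r^{\mathrm{mfs}}_{ji} = 0 \text{ and } (j,i) \in E^*, \\
0 & \text{otherwise,}
\end{cases}
\tag{62}
$$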

where ε is a positive constant. After changing the matching-score matrix from $U$ to $U + \Delta$, any arc $(j,i)$ that is not used by $r^{\mathrm{mfs}}$ is removed, because the score of queue $i$ with respect to server $j$ has decreased by ε. By doing that, we make sure that $r^{\mathrm{mfs}}$ is the unique steady-state routing-rate vector under the modified index.
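The construction of the perturbation matrix in (62) is mechanical once a minimal-facet solution and the arc set E∗ are known. The sketch below is illustrative only: the routing-rate matrix, the arc set, and the value of ε are hypothetical, and servers and queues are indexed from 0.

```python
def perturbation(r_mfs, arcs, eps):
    """Delta(j, i) = -eps if r_mfs[j][i] == 0 and (j, i) is in E*, else 0 (cf. (62))."""
    return [
        [-eps if (r_mfs[j][i] == 0 and (j, i) in arcs) else 0.0
         for i in range(len(r_mfs[j]))]
        for j in range(len(r_mfs))
    ]


# Hypothetical instance: 2 servers, 3 queues.
r_mfs = [[5.0, 0.0, 3.0],
         [0.0, 4.0, 0.0]]
arcs = {(0, 0), (0, 1), (0, 2), (1, 1), (1, 2)}   # E*: arcs in the steady-state network
eps = 0.01

delta = perturbation(r_mfs, arcs, eps)
print(delta)

# Arcs of E* that carry positive flow under r_mfs keep their score;
# the unused arcs (0, 1) and (1, 2) are penalised by eps.
kept_arcs = {(j, i) for (j, i) in arcs if delta[j][i] == 0.0}
```

The pair (1, 0) carries no flow but is not in E∗, so it is left untouched; only unused arcs that are currently in E∗ are penalised.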

Theorem 3 Suppose $W^*$ is the steady state of the fluid process under the M+W index (56) and $r^{\mathrm{mfs}}$ denotes a minimal-facet solution to the LP (49). Define $\Delta$ according to equation (62), and define a new M+W index

$$ s_{ji} = U(j,i) + \Delta(j,i) + \eta\,\frac{2}{\lambda}\,F_i(\mathrm{WT}). \tag{63} $$

Then $(W^*, r^{\mathrm{mfs}})$ are the unique steady-state HOL waiting-time and routing-rate vectors under the above M+W index, and thus solve the multi-objective planning problem (53).

Proof. Since $W^*$ is the steady state under the M+W index (56) and $r^{\mathrm{mfs}}$ is the associated maximum steady-state routing-rate vector, by Proposition 5 they must solve the auxiliary problem (55), which is a relaxation of the original problem (53). We next show that $(W^*, r^{\mathrm{mfs}})$ is feasible for (53), and thus must also solve (53).

We first show that $(W^*, r^{\mathrm{mfs}})$ are the HOL waiting-time and steady-state routing-rate vectors under the above M+W index. Suppose $r^{\mathrm{mfs}}_{ji} > 0$, which implies that $i$ is in the active set of server $j$. For all other $i' \ne i$ in the active set of server $j$, their scores $s_{j,i'}$ can only decrease after using the matching score $U + \Delta$ rather than $U$. Thus, if $r^{\mathrm{mfs}}_{ji} > 0$, then queue $i$ has to remain in the active set of server $j$ under the new index. Thus, $r^{\mathrm{mfs}}$ and $W^*$ still satisfy condition (8), which says server $j$ only serves queues in its active set. Since the values of $r^{\mathrm{mfs}}$ are not changed, conditions (4) and (43) remain valid. Thus, $r^{\mathrm{mfs}}$ satisfies conditions (4), (8), and (43), and is a steady-state routing-rate vector. Condition (43) also implies that $W^*$ is a steady state under the new index, which is also unique by Theorem 2.

It remains to show that $r^{\mathrm{mfs}}$ is the unique steady-state routing-rate vector under the new index (63). First, we note that the new arc set under (63) is given by

$$ \tilde{E}^* := E^* \setminus \{ (j,i) \in E^* \mid r^{\mathrm{mfs}}_{ji} = 0 \}. \tag{64} $$

Suppose $r^{\mathrm{mfs}} + \Delta r$ is another steady-state routing-rate vector associated with $(s_{ji})$ for some $\Delta r := (\Delta r_{ji})_{j \in \mathcal{J}, i \in \mathcal{I}} \ne 0$. If $r^{\mathrm{mfs}}_{ji} = 0$ for some $(j,i)$, then we know $(j,i) \notin \tilde{E}^*$, so $\Delta r_{ji} = 0$. If $r^{\mathrm{mfs}}_{ji} = \mu_j$, then $r^{\mathrm{mfs}}_{ji'} = 0$ for all $i' \ne i$, and thus $(j,i') \notin \tilde{E}^*$. Thus, $(j,i)$ is the only arc in $\tilde{E}^*$ leaving $j$, so we must have $r^{\mathrm{mfs}}_{ji} + \Delta r_{ji} = \mu_j$, which implies that $\Delta r_{ji} = 0$. Thus, all inequalities that are binding for $r^{\mathrm{mfs}}$ will still be binding for $r^{\mathrm{mfs}} + t\Delta r$, so $r^{\mathrm{mfs}} + t\Delta r$ is on the minimal facet which contains $r^{\mathrm{mfs}}$. By choosing $|t|$ sufficiently small, we ensure that $r^{\mathrm{mfs}} + t\Delta r$ lies in the relative interior of the minimal facet. If $\sum_{ji} U_{ji} \Delta r_{ji} \ne 0$, then for either $t > 0$ or $t < 0$, $r^{\mathrm{mfs}} + t\Delta r$ improves the objective of the LP, which contradicts the optimality of $r^{\mathrm{mfs}}$; if $\sum_{ji} U_{ji} \Delta r_{ji} = 0$, then $r^{\mathrm{mfs}} \pm t\Delta r$ are both maximizers of the LP for sufficiently small $t > 0$. Since $r^{\mathrm{mfs}}$ can be expressed as a convex combination of $r^{\mathrm{mfs}} \pm t\Delta r$, this contradicts that $r^{\mathrm{mfs}}$ is an extreme point of the polytope.

4.2. Convergence to Steady State

Finally, we prove that the fluid process will eventually converge to the unique steady state when the pdf of the abandonment time $f_i(\cdot)$ has compact support $[0, \overline{W}_i]$ for some $\overline{W}_i < \infty$. Therefore it is reasonable to believe that the steady state provides an approximation to the long-run average performance of the fluid process.

Theorem 4 Suppose $\overline{W}_i < \infty$ for all $i \in \mathcal{I}$. Then $W(t) \to W^*$ as $t \to \infty$, where $W^*$ denotes the steady state constructed as in Theorem 2.

The proof of this theorem is in Appendix EC.6.


5. Case Study: Public Housing Assignment in Pittsburgh

In this section we return to one of our motivating examples, i.e., the assignment of public housing

in cities, in order to demonstrate the usefulness of our model in practice. Considering this example

will allow us to show that:

1. The fluid approximation is close to the underlying stochastic system.

2. Our model makes it easy to try out different scoring policies.

3. It is easy to try out the policy implications of different wait time penalties.

5.1. Overview of the example

Most major North American cities provide affordable housing options for eligible low-income families, seniors, and persons with disabilities³. The housing authority (HA) in each city is responsible for the maintenance and assignment of such houses. Currently, such programs accommodate approximately 1.2 million households⁴, but the supply is far from large enough to meet all the

demand for housing. Applicants to such programs typically have strict constraints on the housing

options they prefer based on location, the community type, and features of the housing unit such as

the number of rooms etc., which exacerbates the shortage for some types of housing. For example,

the waiting time for single-bedroom apartments was reported to be around 28 years in the District

of Columbia (Stockton 2013).

We illustrate the use of the tools we developed in this paper using data from the Housing

Authority of the City of Pittsburgh (HACP) that was gathered by Geyer and Sieg (2013). The

HACP requires applicants to specify their preference among the available types of units, and among

the communities where public housing is available. Houses are allotted based on availability in the

preferred community and according to the applicants’ registration dates when joining the waitlist

for housing. Therefore the waitlist can be effectively partitioned into multiple queues according to the applicants’ selections. If we assume that the priority is first-come-first-served in each queue⁵, then the waitlist system is a BQS.

We take the following steps to study the optimal scoring policies that the HACP could use to

allocate their houses. First, Section 5.2 estimates parameters of this BQS using historical data that

was obtained from the HACP and the data from the Survey of Income and Program Participation

(SIPP). Next, Section 5.3 calculates the fluid limit process for this BQS using the tools developed

in this paper. We compare the fluid limit process to the stochastic process that is simulated using

the same set of parameters. The comparison shows that the predictions of the fluid model closely track the simulated sample paths. Lastly, Section 5.4 shows that the predictions of our fluid model make it easy to evaluate different scoring rules with respect to various performance measures.


5.2. Parameter Estimation

Geyer and Sieg (2013) developed a discrete-choice model to predict how low-income households select their housing community by studying the public housing allocation system of HACP. Although our objective in this paper is different from theirs, we use data sets from their study and calibrate many useful system parameters based on estimates reported in their study. They identified 34 community types in the city of Pittsburgh, and further categorized them into six broad

community types, each of which consists of fairly homogeneous housing units. To make our BQS

even simpler, we consider only three of these six broad community types, denoted by PH1, PH2,

and PH3. The majority of the housing units in these three broad types are offered to non-senior

families, so we can focus on these three types independently from the other types.

Applicants specify a first choice among the three types, and are encouraged (but not required)

to add a second choice. An applicant’s choice profile can then be represented by a single number

or an ordered pair. For example, choice profile [12] means that the applicant’s first choice is PH1

and the second choice is PH2, and choice profile [2] means that the applicant’s first choice is PH2

and they did not specify a second choice.

The parameters we estimated are summarized in Table 1, and the technical details of the param-

eter estimation are in Appendix EC.7. Some of these estimates are ad-hoc, but this case study

serves the purpose of illustrating the application of the tools we developed.

Table 1 Summary of Estimated Parameters

Choice Profile (Queue) i   [12]    [13]    [1]     [21]    [23]    [2]     [31]    [32]    [3]
Probability                0.1013  0.0793  0.049   0.1318  0.1939  0.1061  0.0906  0.1703  0.0776

Community Type j           PH1     PH2     PH3
Service Rates µj           118.9   29.5    9.1

Time Window                Quarter 1 and 2   Later
Total Arrival Rate λ       1232              443

Reneging Rate d            0.07

Matching Utility U(·, ·)
Queues   [12]   [13]   [1]    [21]   [23]   [2]    [31]   [32]   [3]
PH1      1      1      1      0.66   -100   -100   0.66   -100   -100
PH2      0.66   -100   -100   1      1      1      -100   0.66   -100
PH3      -100   0.66   -100   -100   0.66   -100   1      1      1
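A few derived quantities follow directly from the entries of Table 1. The sketch below assumes that each queue's arrival rate is the total arrival rate multiplied by the probability of its choice profile, which is our reading of the table rather than something stated explicitly; the ratio of total service rate to total arrival rate is the mean likelihood of getting service, µ/λ.

```python
profiles = ["[12]", "[13]", "[1]", "[21]", "[23]", "[2]", "[31]", "[32]", "[3]"]
prob = [0.1013, 0.0793, 0.049, 0.1318, 0.1939, 0.1061, 0.0906, 0.1703, 0.0776]
mu = {"PH1": 118.9, "PH2": 29.5, "PH3": 9.1}             # service rates per community type
lam_total = {"quarters 1-2": 1232.0, "later": 443.0}     # total arrival rate by time window

total_mu = sum(mu.values())

# Assumed decomposition: lambda_i = (choice-profile probability) * (total arrival rate).
lam_i = {
    window: {q: p * lam for q, p in zip(profiles, prob)}
    for window, lam in lam_total.items()
}

# Mean likelihood of getting service, mu / lambda, in each time window.
service_share = {window: total_mu / lam for window, lam in lam_total.items()}

print(round(total_mu, 1))
for window, share in service_share.items():
    print(window, round(share, 4))
```

In both windows the ratio is well below one, which is consistent with the system being overloaded.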

5.3. Comparison of the Fluid Approximation and the Stochastic Process

A fluid approximation to the waitlist system assumes that both applicants and housing units arrive

according to deterministic processes. Using the machinery developed in Section 3, we can calculate

the HOL waiting times in the fluid model using the parameters reported in Table 1. To see whether


the fluid approximation is an accurate approximation of the underlying stochastic process, we

compare the HOL waiting times predicted by the fluid model to that in a simulated stochastic BQS

using the same set of parameters and assuming that all arrival processes are Poisson. We simulate

the HOL waiting times of this stochastic BQS by bootstrapping each applicant’s attributes from

the waitlist pool and predicting their choice profiles.

We assume that applicants on the waitlist are ranked by scoring formula (56), where the weight

η is set to be 50. According to Proposition 5, this scoring formula optimizes the steady-state performance with respect to the objective function (53), where efficiency (Ef) and fairness (Fa) are defined according to equations (36) and (35), respectively. Figure 6 plots two sets of HOL waiting time curves over a five-year horizon (20 quarters). One set consists of predictions made by the fluid model, and the other set consists of simulated sample paths for the stochastic BQS. The two sets of curves stay fairly close, and the fluid process captures the behavior of the simulated sample paths quite well.

In the first period, each community PHi only serves the queues that indicated PHi as their first

choice. However, this routing configuration changes quickly because scores of queues in routing

components of PH2 and PH3 increase at a higher speed than those of other queues due to the

acute shortage of housing units in PH2 and PH3. Consequently, a subset of queues served by PH2

and PH3 catch up with queues served by PH1 in their scores, and get merged into the component

of PH1. For example, at T3, queues [32] and [3] in the faster component are merged into the slower

component of PH1, which leads to the breakpoints in the waiting-time curves. We plot the curves

for five years, which covers all switch points, though it takes fifteen years in total for the fluid

process to reach the steady state. The change in waiting times is smaller than 0.00001 after 15 years, indicating convergence. The routing configurations in different time periods are depicted in Figure 7.

We repeat the simulation 100 times and calculate the mean absolute deviation (MAD) for the HOL waiting times in each queue. The MADs range from 0.097 to 0.116 quarters, or roughly 8–10 days, so the fluid approximation is accurate to within about ten days. We also observe that the fluid approximation remains accurate even for queues with thin service rates, such as queues [32] and [3], whose throughput rate is less than 10 per quarter. This suggests that fluid models for overloaded systems are quite robust, at least in our experiments.
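One plausible way to compute such a MAD for a single queue is to average the absolute gap between each simulated path and the fluid prediction over all runs and time points. The sketch below follows that reading, which is an assumption about the exact definition used; the fluid path and the two simulated runs are toy values, not data from the case study.

```python
def mean_abs_dev(fluid, sims):
    """Average of |simulated - fluid| over all runs and all time points."""
    total = sum(abs(s - f) for run in sims for s, f in zip(run, fluid))
    return total / (len(sims) * len(fluid))


# Toy data: HOL waiting times (in quarters) at three time points.
fluid = [1.0, 2.0, 3.0]
sims = [
    [1.1, 1.9, 3.2],   # simulated run 1
    [0.9, 2.1, 2.9],   # simulated run 2
]

mad = mean_abs_dev(fluid, sims)
print(mad)
```

With 100 runs per queue, as in the experiment above, `sims` would simply hold 100 such paths sampled on the same time grid as the fluid curve.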

5.4. Policy Evaluation

Our fluid model enables the housing authority to predict the properties of the system when housing

allocations are made according to any scoring policy of the form (1) discussed in this paper.

In practice, most housing authorities try to match the applicants’ choice of community types to the available capacity, but an unintended consequence is that some community types might require a much longer waiting time than others. The queue length or waiting time is usually unobserved by the applicants. As a result, even applicants who are somewhat indifferent to the community type might have to wait much longer than other applicants because they unintentionally selected a housing type that corresponds to a longer queue. Such scenarios can be minimized if a housing authority actively tries to reduce the waiting-time disparity across different queues while aiming to match applicants’ choices. To demonstrate how an HA can accomplish this, in our housing assignment example we assume that HACP uses a multi-objective function in the form of (53), with efficiency (Ef) and fairness (Fa) measured by (36) and (35), respectively. For tractability, we use the optimal index formula (56) for the infinite-horizon problem as the benchmark policy for our finite-horizon problems. We then compare the performance of (56) to several alternative formulas.

[Figure 6: plot of HOL Waiting Time (Quarter) against Time (Quarter), both axes from 0 to 20, with switch times T1, T2, T3 marked. Legend: fluid-model curves (F) and simulated curves (S) for queues [12], [13], [1], [21], [31], [2], [23], [32], [3].]

Figure 6 HOL Waiting Times for the nine queues predicted by Fluid Model (F) versus a Simulated Stochastic Process (S). Note that some queues, such as queues [32] and [3], have the same HOL waiting time curves in the fluid process. T1, T2, T3 are switch times at which the routing components change. Other kinks on the curve are due to discontinuity of the arrival rate functions λi(t).


[Figure 7: four panels, each showing the bipartite routing graph between community types PH1, PH2, PH3 and the nine queues.]

(a) Initially, houses in each community type are allocated only to applicants whose first choice is matched.
(b) After T1 = 2.67, queue [31] starts to be served by PH1 instead of PH3.
(c) After T2 = 3.09, queue [21] starts to be served by PH1 instead of PH2.
(d) After T3 = 19.15, the routing components of PH1 and PH3 merge into a single routing component.

Figure 7 Routing configurations in each period

The matching utility matrix U is estimated in Appendix EC.7. By looking into the fluid process, we can easily identify the routing components and the feasible routing-rate vectors that exist in each time period. For example, as shown in Figure 7(d), after T3, the corresponding residual network contains a cycle PH1 → [32] → PH3 → [3] → PH1. We could construct alternative feasible routing rates by pushing a nonzero flow around such cycles, which would not change the matching utility because the utility around these cycles sums to zero. In other words, different routing rates lead to the same expected matching utility $\sum_{j \in \mathcal{J},\, i \in \mathcal{I}} r_{ji} U_{ji}$. Thus, in this example there is no need to perturb the matching score L. Indeed, our numerical experiments show that different perturbations of L lead to similar performance with respect to the efficiency and fairness measurements (36) and (35).

We next fix the matching score to be L = U and test the following parametric forms of the

waiting score:

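$$
g_i(\mathrm{WT}) = \mathrm{WT}, \qquad
g_i(\mathrm{WT}) = F_i(\mathrm{WT}) = 1 - \exp(-d\,\mathrm{WT}), \qquad
g_i(\mathrm{WT}) = (\mathrm{WT})^2,
\tag{65}
$$

which represent linear, concave, and convex waiting scores, respectively. For a given weight η, we calculate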

the corresponding fluid process over a 20-quarter horizon. We measure efficiency using the average

matching utility (36), but use two different fairness measurements. One is the negative variance


in likelihood of getting service given by (35), and the second is the minimum likelihood of getting

service across all customers, which can be calculated by

$$
\min_{i \in \mathcal{I}} \frac{\int_0^T \sum_{j \in \mathcal{J}} r_{ji}(t)\, dt}{\int_0^T \lambda_i(t)\, dt}.
\tag{66}
$$
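The minimum-likelihood measure (66) can be approximated from sampled routing and arrival rates. The following sketch uses the trapezoidal rule on a uniform time grid; the discretisation and the toy rate paths are assumptions made for illustration, and `served[i][k]` stands for the total rate at which queue i is served at the k-th grid point.

```python
def trapezoid(ys, dt):
    """Trapezoidal rule for samples ys taken on a uniform grid with spacing dt."""
    return dt * (sum(ys) - 0.5 * (ys[0] + ys[-1]))


def min_service_likelihood(served, arrivals, dt):
    """Discrete version of (66): min over queues of served fluid / arrived fluid."""
    return min(
        trapezoid(s, dt) / trapezoid(a, dt)
        for s, a in zip(served, arrivals)
    )


# Toy data: two queues sampled at t = 0, 1, 2 (quarters).
served = [[10.0, 10.0, 10.0],    # queue 0: sum_j r_j0(t)
          [2.0, 4.0, 6.0]]       # queue 1: sum_j r_j1(t)
arrivals = [[40.0, 40.0, 40.0],  # queue 0: lambda_0(t)
            [20.0, 20.0, 20.0]]  # queue 1: lambda_1(t)

worst = min_service_likelihood(served, arrivals, dt=1.0)
print(worst)
```

In this toy instance queue 0 is served with likelihood 0.25 and queue 1 with likelihood 0.2, so the measure reports the value for queue 1.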

By sampling different values of η, we obtain the Pareto frontiers corresponding to the three types of waiting scores and plot them on the efficiency-fairness spectrum in Figure 8, where Figure 8(a) uses the negative variance as the fairness measure, and Figure 8(b) uses the minimum likelihood as the fairness measure. For the first fairness measure, we have a theoretical characterization of the optimal M+W index formula for the steady state, while the second fairness measure is much simpler and possibly more practical in real life. Figure 8(a) shows that the status quo, $g_i(\mathrm{WT}) = F_i(\mathrm{WT}) = 1 - \exp(-d\,\mathrm{WT})$, outperforms the other two forms of waiting scores with respect to the first fairness measure; however, if the second fairness measure is used, then the convex waiting score $g_i(\mathrm{WT}) = (\mathrm{WT})^2$ dominates the other two. Intuitively, if one wishes to lower bound the likelihood of getting an offer, then one may consider using a convex waiting score to prioritize queues with the smallest likelihood.

In practice, the policy designer may wish to maximize the efficiency measure alone while treating the fairness measure as a constraint, for example by requiring customers in all queues to have a probability of at least 10% of getting service. One may then look up the η at which the Pareto frontier intersects the line fairness = 10% and obtain the corresponding M+W index.
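The lookup described above can be sketched as a linear interpolation between sampled frontier points. This is only one possible way to do it and is not described in the paper; it assumes that fairness increases with η along the sampled points, and the frontier samples below are placeholder numbers, not values read off Figure 8.

```python
def eta_at_fairness(samples, target):
    """Interpolate (eta, efficiency) at the point where fairness equals target.

    samples: list of (eta, efficiency, fairness), sorted by eta, with fairness
    assumed to be non-decreasing in eta. Returns None if target is not bracketed.
    """
    for (e0, ef0, f0), (e1, ef1, f1) in zip(samples, samples[1:]):
        if f0 <= target <= f1 and f1 > f0:
            t = (target - f0) / (f1 - f0)
            return e0 + t * (e1 - e0), ef0 + t * (ef1 - ef0)
    return None


# Placeholder frontier samples: (eta, efficiency, minimum service likelihood).
frontier = [
    (0.0, 0.93, 0.06),
    (25.0, 0.91, 0.09),
    (50.0, 0.89, 0.11),
    (100.0, 0.87, 0.13),
]

result = eta_at_fairness(frontier, target=0.10)
print(result)
```

A finer sample of η values near the crossing point would reduce the interpolation error; the fluid process would then be recomputed at the interpolated η to confirm that the constraint is met.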

[Figure 8: two panels plotting Fairness against Efficiency (0.87 to 0.93) at Quarter = 20, with one curve for each of the waiting scores 1 − e^(−0.07 WT), WT, and WT².]

(a) Fairness metric used: negative variance in the probability of getting offer
(b) Fairness metric used: minimum probability of getting offer

Figure 8 The Pareto frontiers when L = U and gi(WT) takes the three forms in (65)


6. Discussions

In this paper we propose a mechanism for allocating scarce resources, namely the M+W indexing

policy, which generalizes several well-known ranking policies, such as FCFS, static priority, and

dynamic priority. The M+W indexing policy is particularly well-suited to contexts where public

goods are allocated to several types of customers, and there are not enough servers to satisfy all

demand. The ranking criteria used by an M+W index are restricted to waiting time and the degree of matching, because the use of other factors, such as queue length, might be criticized as being unfair for public goods. For example, waiting time and blood/tissue type compatibility are the main criteria used in the current policy for allocating cadaver kidneys. Given that the M+W index represents an important class of priority mechanisms, we believe that it deserves more attention from researchers in queueing and applied probability. The fluid model presented in this paper is a first step toward investigating this policy.

The analysis of the fluid model is complicated by both the combinatorial and dynamic nature of

the bipartite queueing system. A key observation is that the fluid process can be characterized as

solutions to ODEs in time intervals where the routing components remain invariant. To identify

the routing components, we utilize two classical combinatorial results: the nested-cut structure

of parametric min cut with SSM capacities, and the existence of a linear complementary pair.

The latter was used to resolve the case when multiple primitive components have the same score-

increment rates.

To extend our model to cover other practical applications, we intend to consider the following extensions in the future: (1) an overloaded BQS in which each candidate has the autonomy to accept or reject a resource when it is offered; (2) a double-ended BQS in which each queue has either excess demand fluid or excess supply fluid; (3) a BQS in which different server-customer combinations lead to different service speeds instead of utility. Regarding (1), Moulin and Sethuraman (2013) and Luss (1999) have studied the resource rationing problem in a single period in the presence of customer choice. However, we are not aware of any literature that has discussed a similar problem in the queueing context. In fact, modeling customers’ dynamic choices can be challenging. Regarding (2), when the priority rule is FCFS, Adan and Weiss (2012) proved that the steady-state distribution of the Markovian process has a product form, and Afeche et al. (2014) derived closed-form results for a double-ended FCFS queue with batch arrivals. It is not clear whether the closed-form results in these works can be generalized to the SDSS case. Model (3) is usually discussed in the context of skill-based routing in a call center (Wallace and Whitt 2005, Gurvich and Ward 2012). Although mathematically it is tempting to study the application of SDSS to systems as described in (3), it is not clear whether SDSS is appealing to call-center managers given that it is less flexible but more transparent than a dynamic routing policy.


Endnotes

1. A real-valued function $f(x)$ is real analytic at $x_0$ if there is a neighborhood of $x_0$ where the Taylor series $\sum_{n=0}^{\infty} \frac{1}{n!} f^{(n)}(x_0)(x - x_0)^n$ converges to $f(x)$ pointwise.

2. Suppose B is a discrete set. A correspondence E : [0,∞)→B is upper hemicontinuous at t if

there exists a neighborhood A of t such that for all t′ ∈A, E(t′)⊆E(t).

3. The rent for public housing is usually around 30% of the renter’s gross income. This can vary

depending on the size of the residence and the type of the community.

4. http://portal.hud.gov/hudportal/HUD?src=/topics/rental_assistance/phprog

5. In fact, most households with the same selection of community types will be sequenced accord-

ing to their registration date. The HACP may prioritize a small number of households with urgent

needs, but this is often negligible.

Acknowledgments

This paper has benefited from the many helpful, deep, and insightful comments from the Area Editor, and

two anonymous referees, and we thank them for their efforts. We would like to extend our sincere appreciation

to Peter Glynn and Stefanos Zenios for their many insightful comments and enormous help with this project.

We are grateful to Judy Geyer for providing much of the data we used in the case study, which is an excerpt

from the data she obtained from the HACP. We thank Professor Yunan Liu and Professor Yehua Wei for

their helpful suggestions and comments. We thank students Jiaxin Liang and Kai Wang for their substantial

help with the computations for the case study. The research of all three authors is partially supported by

their respective NSERC Discovery Grants.

References

Adan, Ivo, Gideon Weiss. 2012. Exact FCFS matching rates for two infinite multitype sequences. Operations Research 60(2) 475–489. doi:10.1287/opre.1110.1027. URL http://or.journal.informs.org/content/60/2/475.abstract.

Adan, Ivo, Gideon Weiss. 2014. A skill based parallel service system under FCFS-ALIS-steady state, over-

loads, and abandonments. Stochastic Systems 4(1) 250–299.

Afeche, Philipp, Adam Diamant, Joseph Milner. 2014. Double-sided batch queues with abandonment: Mod-

eling crossing networks. Operations Research 62(5) 1179–1201. doi:10.1287/opre.2014.1300.

Ahuja, R. K., T. L. Magnanti, J. B. Orlin. 1993. Network Flows: Theory, Algorithms, and Applications .

Prentice-Hall, Englewood Cliffs.

Atar, Rami, Chanit Giat, Nahum Shimkin. 2010. The cµ/θ rule for many-server queues with abandonment.

Operations Research 58(5) 1427–1439.

Caldentey, Rene, Edward H. Kaplan, Gideon Weiss. 2009. FCFS infinite bipartite matching of servers and

customers. Advances in Applied Probability 41(3) 695–730.


Caldentey, Rene A, Edward H Kaplan. 2007. A heavy traffic approximation for queues with restricted

customer-server matchings. working paper .

Committee, The OPTN/UNOS Kidney Transplantation. 2011. Concepts for Kidney Allocations. open resource URL http://www.unos.org/SharedContentDocuments/KidneyAllocationSystem--RequestForInformation.pdf.

Shi, Cong, Yehua Wei, Yuan Zhong. 2015. Process flexibility for multi-period production systems. Available

at SSRN 2655790 .

Cottle, Richard W. 1964. Note on a fundamental theorem in quadratic programming. Journal of the Society

for Industrial & Applied Mathematics 12(3) 663–665.

Dai, JG, Tolga Tezcan. 2008. Optimal control of parallel server systems with many servers in heavy traffic.

Queueing Systems 59(2) 95–134.

Gallo, G., M. D. Grigoriadis, R. E. Tarjan. 1989. A fast parametric maximum flow algorithm and applications.

SIAM J. Comput. 18(1) 30–55. doi:10.1137/0218003. URL http://dx.doi.org/10.1137/0218003.

Geyer, Judy, Holger Sieg. 2013. Estimating a model of excess demand for public housing. Quantitative

Economics 4(3) 483–513.

Ghamami, Samim, Amy R Ward. 2013. Dynamic scheduling of a two-server parallel server system with

complete resource pooling and reneging in heavy traffic: Asymptotic optimality of a two-threshold

policy. Mathematics of Operations Research 38(4) 761–824.

Granot, F., S. T. McCormick, M. Queyranne, F. Tardella. 2012. Structural and algorithmic properties for

parametric minimum cuts. Math. Prog. 135(1-2) 337–367.

Grindlay, Andrew A. 1965. Tandem queues with dynamic priorities. OR 16(4) pp. 439–451. URL http:

//www.jstor.org/stable/3006711.

Gurvich, Itai, Amy Ward. 2012. On the dynamic control of matching queues. Preprint .

Gurvich, Itay, Ward Whitt. 2009. Scheduling flexible servers with convex delay costs in many-server service

systems. Manufacturing & Service Operations Management 11(2) 237–253.

Jackson, James R. 1960. Some problems in queueing with dynamic priorities. Naval Research Logistics Quarterly 7(3) 235–249. doi:10.1002/nav.3800070304. URL http://dx.doi.org/10.1002/nav.3800070304.

Kaplan, Edward H. 1988. A public housing queue with reneging. Decision Sciences 19(2) 383–391.

Kleinrock, Leonard, Roy P. Finkelstein. 1967. Time dependent priority queues. Operations Research 15(1)

pp. 104–116. URL http://www.jstor.org/stable/168514.

Larranaga, Maialen, Urtzi Ayesta, Ina Maria Verloop. 2014. Index policies for a multi-class queue with

convex holding cost and abandonments. ACM SIGMETRICS Performance Evaluation Review , vol. 42.

ACM, 125–137.


Lemke, Carlton E. 1965. Bimatrix equilibrium points and mathematical programming. Management science

11(7) 681–689.

Liu, Yunan, Ward Whitt. 2011. Large-time asymptotics for the Gt/Mt/st+GI many-server fluid queue with

abandonment. Queueing systems 67(2) 145–182.

Liu, Yunan, Ward Whitt. 2012a. The Gt/GI/st+GI many-server fluid queue. Queueing Systems 71(4) 405–444.

Liu, Yunan, Ward Whitt. 2012b. Stabilizing customer abandonment in many-server queues with time-varying

arrivals. Operations research 60(6) 1551–1564.

Luss, Hanan. 1999. On equitable resource allocation problems: A lexicographic minimax approach. Operations

Research 47(3) 361–378.

Mandelbaum, Avishai, Alexander L Stolyar. 2004. Scheduling flexible servers with convex delay costs: Heavy-

traffic optimality of the generalized cµ-rule. Operations Research 52(6) 836–855.

Mehrotra, Vijay, Kevin Ross, Geoff Ryder, Yong-Pin Zhou. 2012. Routing to manage resolution and waiting

time in call centers with heterogeneous servers. Manufacturing & service operations management 14(1)

66–81.

Moulin, Herve, Jay Sethuraman. 2013. The bipartite rationing problem. Operations Research 61(5) 1087–

1100.

Nelson, Randolph D. 1990. Heavy traffic response times for a priority queue with linear priorities. Operations

Research 38(3) pp. 560–563. URL http://www.jstor.org/stable/171370.

Netterman, A., I. Adiri. 1979. A dynamic priority queue with general concave priority functions. Operations

Research 27(6) pp. 1088–1100. URL http://www.jstor.org/stable/172085.

Nocedal, Jorge, Stephen Wright. 2006. Numerical optimization. Springer Science & Business Media.

OPTN/UNOS. 2008. Kidney Allocation Concepts: Request for Information. open resource .

OPTN/UNOS. 2015. OPTN/UNOS online Data Report. open resource URL https://optn.transplant.hrsa.gov/data/.

Picard, Jean-Claude, Maurice Queyranne. 1982. On the structure of all minimum cuts in a network and
applications. Mathematical Programming 22(1) 121–121. doi:10.1007/BF01581031. URL http://dx.doi.org/10.1007/BF01581031.

Schwartz, Benjamin L. 1974. Queuing models with lane selection: a new class of problems. Operations

Research 22(2) 331–339.

Sisselman, Michael E, Ward Whitt. 2007. Value-based routing and preference-based routing in customer

contact centers. Production and Operations Management 16(3) 277–291.

Stockton, Halle. 2013. Pittsburgh, Allegheny County experience critical shortage of public housing. PublicSource.


Stolyar, Alexander L, Tolga Tezcan. 2011. Shadow-routing based control of flexible multiserver pools in

overload. Operations Research 59(6) 1427–1444.

Talreja, Rishi, Ward Whitt. 2008. Fluid models for overloaded multiclass many-server queueing systems with

first-come, first-served routing. Management Science 54(8) 1513–1527. doi:10.1287/mnsc.1080.0868.

URL http://mansci.journal.informs.org/content/54/8/1513.abstract.

Van Mieghem, Jan A. 1995. Dynamic scheduling with convex delay costs: The generalized cµ rule. The

Annals of Applied Probability 809–833.

Visschers, Jeremy, Ivo Adan, Gideon Weiss. 2012. A product form solution to a system with multi-type jobs

and multi-type servers. Queueing Systems 70(3) 269–298.

Wallace, Rodney B, Ward Whitt. 2005. A staffing algorithm for call centers with skill-based routing. Manufacturing & Service Operations Management 7(4) 276–294.

Zenios, Stefanos A., Glenn M. Chertow, Lawrence M. Wein. 2000. Dynamic allocation of kidneys to candidates on the transplant waiting list. Oper. Res. 48(4) 549–569. doi:http://dx.doi.org/10.1287/opre.48.4.549.12418.

e-companion to Author: An Overloaded Bipartite Queueing System with Matching Cost ec1

Appendices

EC.1. The Case when All Queues in argmax Sji(t) are Empty

We discuss how to compute A(t, j) in the case when Qi(t) = 0 for every i ∈ argmax Sji(t). We

define the following sequence of index sets

$$I^{\ell}(t,j) = \arg\max\Big\{ S_{jk}(t) \;\Big|\; k \in I \setminus \bigcup_{u=1}^{\ell-1} I^{u}(t,j) \Big\}.$$

When ℓ = 1, I1(t, j) = argmax Sji(t) is the set of queues with the largest scores with respect to

server j at time t. If all buffers in I1(t, j) are already empty, and the supply fluid of type j (after

cancelling out the demand fluid flowing into buffers in I1(t, j)) still has some surplus, then the

surplus has to flow to other buffers in I2(t, j) which have the second highest scores. By repeating

this step for I1(t, j), I2(t, j), . . ., A(t, j) can be constructed as

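$$A(t,j) = \bigcup_{\ell=1}^{k} I^{\ell}(t,j), \tag{EC.1}$$

where $k$ is the smallest integer such that $Q_i(t) > 0$ for some $i \in I^{k}(t,j)$, or such that $\sum_{i \in \cup_{\ell=1}^{k} I^{\ell}(t,j)} \lambda_i(t) > \mu_j(t)$.
Each buffer $i \in \cup_{\ell=1}^{k-1} I^{\ell}(t,j)$ will receive supply fluid of type $j$ at rate $\lambda_i(t)$, which exactly cancels out
the incoming demand fluid, and the type-$j$ supply fluid still has $\mu_j(t) - \sum_{i \in \cup_{\ell=1}^{k-1} I^{\ell}(t,j)} \lambda_i(t)$ fluid to serve
the other buffers. By removing all queues in $\cup_{\ell=1}^{k-1} I^{\ell}(t,j)$ and replacing $\mu_j(t)$ by $\mu_j(t) - \sum_{i \in \cup_{\ell=1}^{k-1} I^{\ell}(t,j)} \lambda_i(t)$,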

the system reduces to the usual cases with no empty queues with the highest scores. We can then

apply the machinery we developed in Section 3 to construct the fluid process.
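The tier-by-tier construction of A(t, j) in (EC.1) can be sketched in a few lines of Python. This is only an illustrative sketch for a single server j at a fixed time t: the function name `active_set`, the dictionary-based inputs, and the tie tolerance `tol` are assumptions made here and do not appear in the paper.

```python
def active_set(scores, queue_len, arrival_rate, mu_j, tol=1e-12):
    """Build A(t, j) by peeling off score tiers I^1, I^2, ... for one server j.

    scores[i]       : score S_ji(t) of queue i with respect to server j (hypothetical input)
    queue_len[i]    : queue length Q_i(t)
    arrival_rate[i] : demand rate lambda_i(t)
    mu_j            : supply rate mu_j(t)
    Returns (A, k): the queues in the union of the first k tiers, and k itself.
    """
    remaining = set(scores)
    A = []
    cumulative_lambda = 0.0
    k = 0
    while remaining:
        k += 1
        top = max(scores[i] for i in remaining)
        # Tier I^k: all remaining queues tied at the highest remaining score.
        tier = {i for i in remaining if abs(scores[i] - top) <= tol}
        A.extend(sorted(tier))
        remaining -= tier
        cumulative_lambda += sum(arrival_rate[i] for i in tier)
        # Stop at the first tier with a non-empty queue, or once the
        # cumulative demand rate exceeds the supply rate of server j.
        if any(queue_len[i] > 0 for i in tier) or cumulative_lambda > mu_j:
            break
    return A, k


# Example: tier 1 = {a, b} is empty and absorbs rate 2 < 2.5, so the
# surplus spills to tier 2 = {c}, where cumulative demand 3 > 2.5.
example_scores = {"a": 3.0, "b": 3.0, "c": 2.0, "d": 1.0}
example_queues = {"a": 0.0, "b": 0.0, "c": 0.0, "d": 5.0}
example_rates = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
example_A, example_k = active_set(example_scores, example_queues, example_rates, mu_j=2.5)
```

If the loop exhausts all queues without meeting either stopping condition, the sketch simply returns every queue; how the paper treats that boundary case is not stated in this section.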

EC.2. An Example of Non-Unique Trajectory Wi(t) when Condition (13) Holds

Consider a system with two servers, J = {1,2}, and three types of customers, I = {a, b, c}. Suppose

that at time t0 queue b is empty and queues a and c are non-empty. Suppose that queue b has the

highest M+W score with respect to both server 1 and server 2. Moreover, λb < µ1 + µ2, so after

cancelling out all the incoming demand fluid in buffer b, there is still some surplus of supply fluid.

Suppose that queue a has the second highest M+W score for server 1, and queue c has the second

highest M+W score for server 2. In this case, the split of the surplus of supply fluid between queue

a and queue c depends on how much supply fluid has flowed to buffer b from each of server 1 and

server 2. The service rates r1a, r1b, r2b, and r2c can be characterized by the following system of

linear equations, which has infinitely many solutions:

r1a + r1b = µ1

r2b + r2c = µ2

r1b + r2b = λb.

Since Wa and Wc depend on the service rates r1a and r2c, there will be infinitely many possible

waiting time trajectories Wa(t) and Wc(t) starting from t0.
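To make the non-uniqueness concrete, the three equations above leave one degree of freedom, which can be parametrized by s = r1b. The following Python sketch enumerates members of that one-parameter family; the function name `routing_rates` and the numerical values of µ1, µ2, and λb are illustrative assumptions, not data from the paper.

```python
def routing_rates(mu1, mu2, lam_b, s):
    """Return (r1a, r1b, r2b, r2c) for the free parameter s = r1b.

    Nonnegativity of all four rates restricts s to
    [max(0, lam_b - mu2), min(mu1, lam_b)].
    """
    lo, hi = max(0.0, lam_b - mu2), min(mu1, lam_b)
    if not lo <= s <= hi:
        raise ValueError("s must lie in [%g, %g]" % (lo, hi))
    r1b = s
    r2b = lam_b - r1b   # r1b + r2b = lam_b
    r1a = mu1 - r1b     # r1a + r1b = mu1
    r2c = mu2 - r2b     # r2b + r2c = mu2
    return r1a, r1b, r2b, r2c


# Illustrative values with lam_b < mu1 + mu2 (assumed for this example only).
solution_low = routing_rates(1.0, 1.0, 1.5, s=0.5)   # all surplus goes to queue a
solution_high = routing_rates(1.0, 1.0, 1.5, s=1.0)  # all surplus goes to queue c
```

Both tuples satisfy the same three equations, yet they assign different service rates to queues a and c, which is exactly why Wa(t) and Wc(t) are not pinned down.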


EC.3. Proof of Proposition 3

Proposition 3 Suppose {W (t) | t ∈ [t0, t1]} is a continuous vector-valued process with associated

arc set E(t)≡E for all t∈ (t0, t1), and suppose {Gk}k=1,...,K are the routing components in (t0, t1).

Then {W (t) | t ∈ [t0, t1]} is the fluid process in an M+W-BQS over [t0, t1] if and only if for each

k= 1, . . . ,K, {W (t) | t∈ [t0, t1]} is the unique solution to the following ODE:

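$$
\text{(ODE)}\qquad
\begin{cases}
\dfrac{dx^{k}(t)}{dt} = \Psi^{G_k}_{W(t)} & \text{for } k = 1,2,\dots,K,\\[6pt]
W_i(t) = g_i^{-1}\big[\, g_i(W_i(t_0)) + x^{k}(t) \,\big] & \text{for all } i \in G_k,\ k = 1,2,\dots,K,\\[4pt]
x^{k}(0) = 0 & \text{for } k = 1,2,\dots,K,
\end{cases}
\tag{EC.2}
$$

where the function $\Psi^{G_k}_{W(t)}$ is defined as in equation (20).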

Proof. We first argue that all queues in the same component Gk must have the same score

change rate θk(t) for all t∈ [t0, t1]. Otherwise, we can find an (undirected) path that connects two

queues with different score change rates. Along this path, there must be a server that connects two

queues which have different score change rates. Due to the difference in their score change rates,

the tie between scores of the two queues has to break and one of the queues is then disconnected

from the server. This contradicts the assumption that E(t) is invariant over [t0, t1].

We next prove the “only if” and “if” statements of the Proposition.

“only if”: According to Lemma 1, if (W (t), r(t)) describe the state of an M+W-BQS fluid pro-

cess, (10) must be satisfied. Thus, θk, which is the score change rate for all queues i ∈ Gk, has to

satisfy the following equation as a result of equation (11):

$$\theta^{k}(t) = \vartheta_{W_i(t)}\Big( \sum_{j \in G_k} r_{ji}(t) \Big), \quad \forall\, i \in G_k. \tag{EC.3}$$

Summing (EC.3) up over all i∈ Gk leads to equation (19), which has the unique solution

$$\theta^{k}(t) = \Psi^{G_k}_{W(t)}. \tag{EC.4}$$

Since $x^{k}(t)$ is the total amount of score change for queues in component Gk from time t0 up to time
t, we have $\frac{dx^{k}(t)}{dt} = \theta^{k}(t)$. Equation (EC.4) then leads to the first equation in the ODE. Intuitively,

the first equation characterizes the trajectory of score change in each component k given that

component k receives (or sends) no supply fluid from (to) the outside.

By the definition of xk(t), we have

xk(t) = gi(Wi(t))− gi(Wi(t0)), ∀ i∈Gk, (EC.5)

which implies that Wi(t) can be computed via the second equation in (EC.2). We have thus proved

the ODE must be satisfied by the vector-valued process {W (t) | t∈ [t0, t1]}.


“if”: We first argue that the ODE (EC.2) always admits a unique solution W (t). Notice that

the function ΨGkW (t) is continuous in t and W (t), except when t−Wi(t) is a discontinuity point of

λi(·). Since W ′i (t) ≤ 1 (the HOL waiting time of a queue increases at rate one when it receives

no service, which is the maximum possible W ′i (t)), t−Wi(t) is non-decreasing. Therefore, as λi(t)

follows our definition of piecewise continuous and Wi(t) is continuous in t, we can deduce that

λi(t−Wi(t)) is also piecewise continuous in t. Thus, the function $\Psi^{G_k}_{W(t)}$ has only a finite number of
discontinuity points and is therefore integrable. By substituting the expression for Wi(t) in the
second equation of (EC.2) into the RHS of its first equation, we obtain an ODE of the basic form
$\frac{dx^{k}(t)}{dt} = f(x^{k}(t))$, where f is integrable. This ODE always has a unique solution {xk(t) | t∈ [t0, t1]}

under the boundary condition xk(0) = 0. After solving for xk(t), W (t) can be uniquely determined

using the second equation in (EC.2) for all t∈ [t0, t1].
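The ODE (EC.2) can be integrated numerically for a single component. The sketch below uses forward Euler under strong simplifying assumptions that are not part of the proposition: constant rates λi and µj, no abandonment (so that the factor written F^C in (EC.27) equals one), zero net inflow, and a common waiting-index function g for all queues. Equation (20) is not reproduced in this section, so the expression for Ψ is taken from the form that appears in (EC.27) and (EC.28); the function name and its arguments are hypothetical.

```python
def euler_component(lam, mu, w0, g, g_inv, g_prime, t0, t1, steps=1000):
    """Forward-Euler integration of (EC.2) for one routing component.

    lam : constant arrival rates of the queues in the component (assumption)
    mu  : constant service rates of the servers in the component (assumption)
    w0  : HOL waiting times W_i(t0)
    g, g_inv, g_prime : waiting-index function, its inverse, and its derivative
    Returns (x, w): cumulative score change and waiting times at t1.
    """
    x = 0.0
    w = list(w0)
    h = (t1 - t0) / steps
    for _ in range(steps):
        # Psi with F^C = 1 and zero net inflow:
        # (sum_i lam_i / g'(W_i))^{-1} * (sum_i lam_i - sum_j mu_j).
        denom = sum(l / g_prime(wi) for l, wi in zip(lam, w))
        psi = (sum(lam) - sum(mu)) / denom
        x += h * psi
        # Second line of (EC.2): recover W_i from the common score change x.
        w = [g_inv(g(w0i) + x) for w0i in w0]
    return x, w


# Linear waiting index g(w) = w: the component's waiting times grow at
# rate 1 - (sum mu) / (sum lam) = 0.5 in this illustrative setting.
x_end, w_end = euler_component(
    lam=[2.0, 1.0], mu=[1.5], w0=[1.0, 1.0],
    g=lambda w: w, g_inv=lambda s: s, g_prime=lambda w: 1.0,
    t0=0.0, t1=2.0,
)
```

With time-varying rates or abandonment, Ψ would have to be re-evaluated at the delayed argument t − Wi(t), and a higher-order integrator would likely be preferable to Euler.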

Next, we construct a piecewise continuous feasible routing-rate vector r(t) for all t∈ [t0, t1], which

solves equation (10) together with the W (t) derived from the ODE. Lemma 1 then shows that

{W (t) | t∈ [t0, t1]} is a fluid process.

By the first equation in the ODE, at all times t ∈ [t0, t1] and all k = 1,2, . . . ,K, we have $\frac{dx^{k}}{dt} = \Psi^{G_k}_{W(t)}$, which coincides with the expression of breakpoints (20) given in Proposition 2 property
(a). Thus, $\{\frac{dx^{k}}{dt} \mid k = 1, \dots, K\}$ are exactly the sequence of breakpoints we obtain by applying the
GGT algorithm on G(t), and each of those breakpoints corresponds to at least two min cuts in the
sequence of nested cuts Π. Let $A_{\max}(\frac{dx^{k}}{dt})$ and $A_{\min}(\frac{dx^{k}}{dt})$ denote the maximum and minimum min
cut at the breakpoint $\frac{dx^{k}}{dt}$, respectively. Then $G_k = A_{\max}(\frac{dx^{k}}{dt}) \setminus A_{\min}(\frac{dx^{k}}{dt})$ is the union of one (the
non-degenerate case) or multiple (the degenerate case) primitive components, all of which have the
same score change rate $\frac{dx^{k}}{dt}$. Let {Gℓ | ℓ= 1, . . . ,K ′} denote the primitive components. According

to Property (b) in Proposition 2, the set of feasible routing-rate vectors r(t) can be represented by

the following non-empty linear polytope

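$$
\Gamma(W(t)) := \left\{ r(t) \ge 0 \;\middle|\;
\begin{array}{ll}
\sum_{\{i \mid (j,i) \in E\}} r_{ji}(t) = \mu_j(t), & \forall\, j \in J,\\[4pt]
\sum_{\{j \mid (j,i) \in E\}} r_{ji}(t) = \vartheta^{-1}_{W_i(t)}\!\Big(\dfrac{dx^{k}}{dt}\Big), & \forall\, i \in G_k,\ k = 1,2,\dots,K,\\[6pt]
r_{ji}(t) = 0, & \forall\, j \in G_\ell,\ i \in G_{\ell'},\ \ell \neq \ell' \ \text{or}\ (j,i) \notin E
\end{array}
\right\}. \tag{EC.6}
$$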

We next show that any feasible routing-rate vector r(t)∈ Γ(W (t)) and W (t) solve equation (10).

The second equation in the ODE implies equation (EC.5). Taking derivatives on both sides of

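(EC.5) leads to

$$\frac{dx^{k}}{dt} = g_i'(W_i(t))\, W_i'(t), \quad \text{for all } i \in G_k. \tag{EC.7}$$

Equation (EC.7) and the second constraint on r(t) in (EC.6) imply

$$\vartheta_{W_i(t)}\Big( \sum_{\{j \mid (j,i) \in E\}} r_{ji} \Big) = \frac{dx^{k}}{dt} = g_i'(W_i(t))\, W_i'(t), \tag{EC.8}$$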


which leads to equation (10) due to equation (11).

It remains to show that we can ensure r(t) to be piecewise continuous by specifying certain rules

for choosing an r(t) from Γ(W (t)). For this purpose, we write the feasible routing rates r(t) as a

JI-dimensional row vector, i.e.,

(r11(t), r12(t), . . . , r1I(t), r21(t), . . . , r2I(t), . . . , rJI(t)). (EC.9)

We then rank these vectors lexicographically, i.e., r(t)≻ r′(t) if the first non-zero entry in r− r′

is positive. This lexicographical order defines a complete order over the set of feasible routing-rate

vectors. Since the polytope Γ(W (t)) is closed, we can pick a unique lex-max rmax(t) from Γ(W (t)).
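The lexicographic comparison used above is easy to state in code. The following Python sketch compares routing-rate vectors laid out as in (EC.9) and picks the lex-max among a finite list of candidates. Restricting attention to a finite candidate list (for instance, the two endpoints of the solution segment from the example in Section EC.2, with illustrative rates µ1 = µ2 = 1 and λb = 1.5) is an assumption made purely for illustration; the proof itself works with the whole polytope Γ(W(t)).

```python
def lex_greater(r, r_prime, tol=0.0):
    """Return True if r is lexicographically larger than r_prime, i.e.
    the first entry of r - r_prime that is non-zero (beyond tol) is positive."""
    for a, b in zip(r, r_prime):
        if abs(a - b) > tol:
            return a > b
    return False  # the two vectors coincide


def lex_max(candidates):
    """Return the lexicographically largest vector in a non-empty list."""
    best = candidates[0]
    for r in candidates[1:]:
        if lex_greater(r, best):
            best = r
    return best


# Vectors ordered as (r1a, r1b, r1c, r2a, r2b, r2c), following (EC.9).
endpoint_1 = (0.5, 0.5, 0.0, 0.0, 1.0, 0.0)
endpoint_2 = (0.0, 1.0, 0.0, 0.0, 0.5, 0.5)
chosen = lex_max([endpoint_2, endpoint_1])  # picks endpoint_1: 0.5 > 0.0 in the first entry
```

The rule therefore resolves the ambiguity in that example by favouring the split that sends as much of server 1's capacity as possible to queue a.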

We next show that rmax(t) is piecewise continuous in (t0, t1). To this end, consider the following

maximization problem for each t∈ (t0, t1):

$$
\begin{aligned}
\max\ & \sum_{i \in I,\ j \in J} (I \cdot J)^{-(jI+i)}\, r_{ji}(t),\\
\text{s.t.}\ & r(t) \in \Gamma(W(t)).
\end{aligned}
\tag{EC.10}
$$

The objective function is selected to ensure that the objective value of r in (EC.10) is strictly

larger than that of r′ if and only if r ≻ r′. As a result, rmax is the unique maximizer to the

optimization problem (EC.10), and there is a continuous bijection between the objective values

and the feasible set Γ(W (t)). Since the coefficients in the formulation of polytope Γ(W (t)) change

piecewise continuously with t (except at points of discontinuity of λi(t) and µj(t)), the optimal

value of (EC.10) (which is clearly continuous in the coefficients of Γ(W (t))) has to be piecewise

continuous in t. Thus the optimal solution rmax, as there is a continuous bijection between its

domain and the objective function values, is piecewise continuous in t.

EC.4. The Degenerate Case

When G(t0) is degenerate, there are multiple primitive components corresponding to the same

breakpoint, which therefore have the same score change rate at t0 according to Proposition 2

Property (a). If some of those components are further connected, then queues in those components

not only have their current scores tied, but also have their score change rates tied. If that happens,

we cannot determine whether the inter-component arcs between those components will remain

in the arc set or disappear in the next infinitesimal period after t0, which we denote by t+0 for

simplicity. This section provides techniques to determine which of these inter-component arcs will

remain on the graph E(W (t)) in t+0 , so we can determine the routing components in t+0 .

Formally, suppose at t0 there are K primitive components {G1,G2, . . . ,GK}. We define a directed
graph on these components, G(t0) := (V (t0), E(t0)), with vertex set

V (t0) := {1,2, . . . ,K}, (EC.11)

and directed arc set

$$E(t_0) := \{ (k,k') \mid \theta^{k}(t) = \theta^{k'}(t), \text{ and there is at least one outgoing arc of the routing graph from } G_k \text{ to } G_{k'} \}. \tag{EC.12}$$

Figure EC.1 illustrates how this directed graph G(t0) is constructed from the original routing graph.

Figure EC.1 A degenerate graph with (Gk)k=1,2,3,4 (left) is represented by G (right). Each primitive component

Gk corresponds to a vertex in G. Note that vertices 4 and 1 are not connected because θ1(t) ≠ θ4(t).

The directed graph G(t0) on the primitive components summarizes the information about inter-component arcs of the original routing graph.

By Proposition 2 Property (c), we can index the primitive components so that arcs can only enter

a lower indexed component from a higher indexed component, and so there are no directed cycles

in G(t0). We may also construct a network flow r(t) := {re | e ∈ E} on E(t0) from any feasible

routing-rate vector r(t) by defining

$$r_{uu'}(t) = \sum_{j \in G_u,\ i \in G_{u'}} r_{ji}(t). \tag{EC.13}$$

Thus ruu′ can be interpreted as the total amount of flow sent from component Gu to Gu′ according

to some feasible routing-rate vector, so we call r a feasible inter-component routing-rate vector.

Our discussions in the proof of Proposition 4 show that if two primitive components Gk and Gk′

are disconnected at t0, or are connected but have different score change rates θk ≠ θk′ at t0, then

Gk and Gk′ must be disconnected at t+0 . For example, in Figure EC.1, since θ1 <θ4 at t0, the inter-

component arc (5, d) disappears at t+0 . If two components k and k′ are connected and also have


the same score change rate at t0, then the corresponding inter-component arcs might remain in

the graph or disappear at t+0 . Such arcs are all included in E(t0) and are candidate arcs that could
remain in t+0 . We use E(t) to denote the inter-component arcs that remain at t, and we will show
that E(t)≡ E+ for all t∈ t+0 . If (k, k′)∈ (/∈)E+, it implies that queues in component k and k′ will

have the same (different) score change rates during t+0 , so all inter-component arcs from Gk to Gk′

have to remain (disappear) in t+0 . Let {Gu}u=1,...,K′ denote the connected components with respect

to E+. We call each Gu a super-component as it is a union of one or multiple primitive components.

For example, in Figure EC.1, if E+ = {(3,2)}, then the super-components, written as sets of vertices of G, are G1 = {1}, G2 = {2,3}, and G3 = {4}. The
super-component G2 = {2,3} represents the merged primitive components G2 ∪G3, which together consist of servers {1,2}
and queues {a, b}.
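Computing the super-components from a given arc set E+ is a plain connected-components computation on the underlying undirected graph. The sketch below uses a small union-find structure; the function name `super_components` and the list-based representation are illustrative choices, not notation from the paper.

```python
def super_components(vertices, e_plus):
    """Group primitive-component indices into super-components.

    vertices : iterable of primitive-component indices (the vertex set V)
    e_plus   : iterable of arcs (k, k_prime) that remain, i.e. the set E+
    Returns a sorted list of sorted lists, one per super-component.
    """
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent pointers to the representative, halving paths on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for k, k_prime in e_plus:
        parent[find(k)] = find(k_prime)  # arc direction is irrelevant for connectivity

    groups = {}
    for v in vertices:
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(members) for members in groups.values())


# The example of Figure EC.1 with E+ = {(3, 2)}.
figure_example = super_components([1, 2, 3, 4], [(3, 2)])
# figure_example == [[1], [2, 3], [4]]
```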

Recall that we use $\Psi^{G_k}_{W(t)}$ to denote the score change rate of queues in primitive component
Gk provided that the net inflow rate for Gk equals zero. With a slight abuse of notation, we use
$\Psi^{G_u}_{W(t)}(z)$ to denote the score change rate if the net inflow rate for a super-component Gu ⊆ V
equals z ∈ (−∞,+∞). Therefore, if r is a feasible inter-component routing-rate vector on G, then
the score change rate of a super-component u is given by

$$\Psi^{G_u}_{W(t)}\Big( \sum_{e \in \delta^{+}(G_u)} r_e(t) - \sum_{e \in \delta^{-}(G_u)} r_e(t) \Big),$$

where δ+(Gu) and δ−(Gu) denote the set of arcs in E that leave and enter Gu, respectively. We

then define a potential function for any two connected subsets A,B ⊆ V as

$$
\Delta\Psi^{A,B}_{W(t)}(r(t)) := \Psi^{A}_{W(t)}\Big( \sum_{e \in \delta^{+}(A)} r_e(t) - \sum_{e \in \delta^{-}(A)} r_e(t) \Big) - \Psi^{B}_{W(t)}\Big( \sum_{e \in \delta^{+}(B)} r_e(t) - \sum_{e \in \delta^{-}(B)} r_e(t) \Big),
\tag{EC.14}
$$

which gives the difference in the score change rates between A and B at state W (t). In the special case
when A and B themselves are connected components (so their net inflow rates are both equal to
zero), we adopt the following notation to represent the difference in their score change rates at W (t):

$$\Delta\Psi^{A,B}_{W(t)} := \Psi^{A}_{W(t)} - \Psi^{B}_{W(t)}. \tag{EC.15}$$

Let |x| denote the absolute value of cumulative score change since t0. Given |x|> 0, the cumulative score change in primitive component Gk is given by

$$x^{G_k} = \operatorname{sign}(\theta^{k})\,|x|, \tag{EC.16}$$

where $\operatorname{sign}(\theta^{k})$ denotes the sign of $\theta^{k}$. The next Proposition provides three equivalent characterizations of E+ and {Gu}u=1,...,K . All three characterizations are based on the potential functions, but
the fluid processes are indexed by t in the first characterization, and indexed by |x| in the other two
characterizations. If scores in a connected subset A⊆ V have the same trajectory in t+0 , then for
all |x| in an infinitesimal interval $(0, \bar{x})$ (denoted by 0+), a unique time point t(xA) exists at which


the score of queues in A has changed by xA from t0 to t(xA). Consequently, there is a one-to-one

correspondence between the score-change index xA and time index t. In the above notation, A can

be a single vertex {k} ∈ V , or a connected component Gu ⊆ V . Given |x|, the corresponding HOL
waiting-time vector $W^{|x|} := \{W^{|x|}_i\}_{i \in I}$ and feasible inter-component routing-rate vector $r^{|x|}$ with
respect to E+ are given by

$$W^{|x|}_i := g_i^{-1}\big( g_i(W_i(t_0)) + x^{G_k} \big), \quad \text{for all } i \in G_k, \tag{EC.17}$$

and

$$
r^{|x|}_{kk'} :=
\begin{cases}
0 & \text{if } (k,k') \notin E^{+};\\
r_{kk'}\big(t(x^{\{k,k'\}})\big) & \text{if } (k,k') \in E^{+}.
\end{cases}
\tag{EC.18}
$$

Note that the definition of $r^{|x|}$ is always with respect to a given arc set E+. Given $W^{|x|}$ and $r^{|x|}$,
we can define the potential functions $\Delta\Psi^{G_u,G_{u'}}_{W^{|x|}}(r^{|x|})$ and $\Delta\Psi^{G_u,G_{u'}}_{W^{|x|}}$ similarly to (EC.14) and
(EC.15). Because $\Psi^{G_u}_{W^{|x|}}(r^{|x|})$ is an affine function of $r^{|x|}$, $\Delta\Psi^{G_u,G_{u'}}_{W^{|x|}}(r^{|x|})$ is also affine in $r^{|x|}$.

Proposition 6 Suppose at W (t0) the routing graph is degenerate, and G(t0) := (V (t0), E(t0)) is

constructed according to (EC.11) and (EC.12). Then {W (t)} is the fluid process during some

interval [t0, t1] if {W (t) | t ∈ [t0, t1]} solves the ODE (EC.2) associated with the super-components

{Gu}u=1,...,K which are connected components with respect to some arc set E+ ⊆ E(t0), and it

satisfies the following properties for all t∈ (t0, t1):

• (1) If Gu can be partitioned into two subsets, $G_u^{+}$ and $G_u^{-}$, such that there are only arcs in E+
going from $G_u^{+}$ to $G_u^{-}$, then

$$\Delta\Psi^{G_u^{+},G_u^{-}}_{W(t)} \le 0. \tag{EC.19}$$

• (2) If there are arcs in E(t0)\E+ going from Gu to Gu′ , then

$$\Delta\Psi^{G_u,G_{u'}}_{W(t)} > 0. \tag{EC.20}$$

Equivalently, there exists $\bar{x} > 0$ such that for all $|x| < \bar{x}$, with $x := (x^{k})_{k \in V}$ defined according to
(EC.16), the following properties hold.

• (1′) If Gu can be partitioned into two subsets, $G_u^{+}$ and $G_u^{-}$, such that there are only arcs in E+
going from $G_u^{+}$ to $G_u^{-}$, then

$$\Delta\Psi^{G_u^{+},G_u^{-}}_{W^{|x|}} \le 0. \tag{EC.21}$$

• (2′) If there are arcs in E(t0)\E+ going from Gu to Gu′ , then

$$\Delta\Psi^{G_u,G_{u'}}_{W^{|x|}} > 0. \tag{EC.22}$$

Furthermore, Properties (1′) and (2′) hold if and only if there exists a feasible inter-component
routing-rate vector $r^{|x|}$ over E that solves the following linear complementarity problem:

$$
\text{(LCP)}\qquad
\begin{cases}
r^{|x|}_{kk'}\, \Delta\Psi^{G_k,G_{k'}}_{W^{|x|}}(r^{|x|}) = 0 & \forall\, (k,k') \in E(t_0),\\[4pt]
r^{|x|}_{kk'} \ge 0 & \forall\, (k,k') \in E(t_0),\\[4pt]
\Delta\Psi^{G_k,G_{k'}}_{W^{|x|}}(r^{|x|}) \ge 0 & \forall\, (k,k') \in E(t_0),
\end{cases}
\tag{EC.23}
$$

and

$$E^{+} = \{ (k,k') \in E(t_0) \mid \Delta\Psi^{G_k,G_{k'}}_{W^{|x|}}(r^{|x|}) = 0 \}. \tag{EC.24}$$

Before proving the Proposition, we first provide some intuition towards Properties (1) and (2).

Property (1) is equivalent to the condition that there exists a network flow inside each Gu such

that all vertices have the same score change rate, that is,

$$
\Psi^{G_k}_{W(t)}\Big( \sum_{e \in \delta^{+}(k)} r_e(t) - \sum_{e \in \delta^{-}(k)} r_e(t) \Big) = \Psi^{G_u}_{W(t)} \quad \text{for all primitive components } G_k \subseteq G_u. \tag{EC.25}
$$

In other words, Property (1) requires the primitive components contained by the same super-

component have the same score trajectory in t+0 . Property (2) states that if inter-component arcs

from Gu to Gu′ do not remain in E+, then $\Delta\Psi^{G_u,G_{u'}}_{W(t)} > 0$, which means the score change rate of Gu
must dominate that of Gu′ in t+0 .

Unfortunately, Properties (1) and (2) are difficult to verify directly, because at t0 we have no

knowledge about W (t) for t∈ t+0 . The characterization via Properties (1′) and (2′), by indexing the

fluid process using |x| instead of t, is easier to verify. We will show that the signs of $\Delta\Psi^{G_u,G_{u'}}_{W^{|x|}}$ and
$\Delta\Psi^{G_u,G_{u'}}_{W(t)}$ are identical in 0+ and t+0 , and thus Properties (1′) and (2′) are equivalent to Properties

(1) and (2) as a characterization of E+. Properties (1′) and (2′) are introduced to derive the LCP

characterization, which gives a general method to identify E+ in t+0 .

Proof. We first prove that Properties (1′) and (2′) are equivalent to Properties (1) and (2).

We will only prove the equivalence between (2) and (2′), as the proof of the equivalence between

(1) and (1′) is similar. Because Gu and Gu′ are connected at t0, they must have the same score

change rate at t0. If the score change rates of Gu and Gu′ are both equal to zero, then there will

be no change to their waiting times and scores, so Gu and Gu′ will stay connected and no arcs

between Gu and Gu′ will be in E(t0)\E+. In this case, Properties (2) and (2′) both hold trivially.
Otherwise, we can assume that $\frac{dx^{G_u}(t_0)}{dt} = \frac{dx^{G_{u'}}(t_0)}{dt} > 0$ without loss of generality. Then both $x^{G_u}(t)$
and $x^{G_{u'}}(t)$ are strictly increasing in t+0 , and their inverse functions, $t(x^{G_u})$ and $t(x^{G_{u'}})$, must both
exist in 0+.

The existence of $\frac{dx^{G_u}(t)}{dt} > 0$ in t+0 implies that $\frac{dt(x^{G_u})}{dx^{G_u}}$ exists in some 0+. According to Property

(2), if arcs in E\E+ go from Gu to Gu′ , then the score of Gu increases faster than that of Gu′

so it must be that xGu(t)> xGu′ (t) for all t ∈ t+0 . That means that for all |x| ∈ 0+ it always takes


less time for Gu to reach a cumulative increment of |x| than Gu′ . Therefore, for all |x| ∈ 0+, given

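$x^{G_u} = x^{G_{u'}} = |x|$ we have $t(x^{G_u}) - t(x^{G_{u'}}) < 0$. As both Gu and Gu′ are connected components of
E(t), their score change rates are given by $\Psi^{G_u}_{W^{|x|}}$ and $\Psi^{G_{u'}}_{W^{|x|}}$, respectively. We thus have

$$
\begin{aligned}
\frac{dt(x^{G_u})}{dx^{G_u}}\bigg|_{x^{G_u}=|x|} - \frac{dt(x^{G_{u'}})}{dx^{G_{u'}}}\bigg|_{x^{G_{u'}}=|x|}
&= \bigg( \frac{dx^{G_u}(t)}{dt}\bigg|_{x^{G_u}=|x|} \bigg)^{-1} - \bigg( \frac{dx^{G_{u'}}(t)}{dt}\bigg|_{x^{G_{u'}}=|x|} \bigg)^{-1}\\
&= \big(\Psi^{G_u}_{W^{|x|}}\big)^{-1} - \big(\Psi^{G_{u'}}_{W^{|x|}}\big)^{-1}.
\end{aligned}
\tag{EC.26}
$$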

Using the expression of $\Psi^{G_u}_{W}$ given in (20), one may check that $\Psi^{G_u}_{W^{|x|}}$ is real analytic at |x|= 0
(i.e., it has a Taylor series converging to its function value in some neighborhood of 0). Thus the
difference given in (EC.26) must also be real analytic at 0, and its sign has to stay invariant
for all |x| ∈ 0+. Because $t(x^{G_u}) - t(x^{G_{u'}}) < 0$, we deduce that the difference in (EC.26) has
to stay negative in 0+, which leads to $(\Psi^{G_u}_{W^{|x|}})^{-1} < (\Psi^{G_{u'}}_{W^{|x|}})^{-1}$ and thus Property (2′). A similar
argument proves the case where $\frac{dx^{G_u}(t_0)}{dt} = \frac{dx^{G_{u'}}(t_0)}{dt} < 0$. The reverse direction (2′)⇒ (2) can also

be proved by an analogous argument.

We now prove that Properties (1′) and (2′) are equivalent to the LCP characterization. As

previously argued, Property (1′) is equivalent to the condition that there exists a network flow r|x|

that meets the demand for each vertex, where all vertices connected by E+ have the same score

change rate. This condition is equivalent to saying that given r|x|, the potential function on any

arc (k, k′) ∈ E+ equals zero. Property (2′) says that all arcs in E\E+ must connect two super-

components Gu and Gu′ , with Gu having a larger score change rate than Gu′ . This is true if and

only if all arcs (k, k′) ∈ E(t0) with k ∈ Gu and k′ ∈ Gu′ are excluded from E+, provided that the
score increment rate of k is strictly larger than that of k′ (so $\Delta\Psi^{G_k,G_{k'}}_{W^{|x|}}(r) > 0$), which is exactly
how E+ is constructed from the LCP solution.

We next prove that W (t) is a fluid process in some interval [t0, t1] if W (t) solves the ODE formulated
using the super-components {Gu}u=1,...,K that satisfy Properties (1) and (2). Because W (t) is constructed from the ODE, which guarantees that queues in each super-component Gu have the same
score-change rate, it suffices to prove that {Gu}u=1,...,K are routing components, or equivalently,

connected components with respect to arc set E(W (t)) for all t∈ t+0 . Because the super-component

is a collection of primitive components, queues in each primitive component must also have the

same score change rate. Thus, all intra-component arcs inside the same primitive components have

to remain in E(t) for t∈ (t0, t1).

For inter-component arcs, if they are inside the same super-component, then they must remain

in the graph because queues in the same super-component have the same score-change rate as

implied by Property (1); if they are connecting different super-components, say, from Gu to Gu′ ,

then the score of queues in Gu must increase faster than those in Gu′ according to Property (2).


Consequently, all inter-component arcs between Gu and Gu′ must disappear in t+0 . We have thus

proved {Gu}u=1,...,K are the connected components with respect to the arc set E(W (t)). Proposition

3 then implies that {W (t) | t∈ (t0, t1)} is the HOL waiting times of a fluid process.

We can further show that the arc set E+ constructed based on the solution to the LCP is

invariant for all |x| ∈ 0+. So the routing components of the fluid process in t+0 can always be

uniquely determined using (EC.24), which guarantees that the fluid process uniquely exists.

Proposition 7 Suppose at t0 the routing graph is degenerate. Then there always exists a fluid

process in t+0 whose inter-component arc set E+ is constructed from the solution to the LCP for

all |x|’s in some 0+. Furthermore, such a fluid process is unique in t+0 .

Proof.

“Existence”: It suffices to show that for each |x| ∈ 0+, the LCP has a solution, and all the

solutions at different |x|’s lead to the same E+.

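For any given |x| ∈ 0+, $\Psi^{G_k}_{W^{|x|}}(r^{|x|})$ can be expressed as an affine function of $r^{|x|}$ as

$$
\begin{aligned}
\Psi^{G_k}_{W^{|x|}}(r^{|x|})
&= \Bigg( \sum_{i \in G_k} \frac{\lambda_i(t - W^{|x|}_i)\, F^{C}_i(W^{|x|}_i)}{g_i'(W^{|x|}_i)} \Bigg)^{-1}
\Bigg( \sum_{i \in G_k} \lambda_i(t - W^{|x|}_i) - \sum_{j \in G_k} \mu_j(t) + \sum_{e \in \delta^{+}(G_k)} r_e - \sum_{e \in \delta^{-}(G_k)} r_e \Bigg)\\
&= \Psi^{G_k}_{W^{|x|}} + \Bigg( \sum_{e \in \delta^{+}(G_k)} r_e - \sum_{e \in \delta^{-}(G_k)} r_e \Bigg) \Phi^{G_k}_{W^{|x|}},
\end{aligned}
\tag{EC.27}
$$

where $\Psi^{G_k}_{W^{|x|}}$ was defined in (20), and

$$
\Phi^{G_k}_{W^{|x|}} := \Bigg( \sum_{i \in G_k} \frac{\lambda_i(t - W^{|x|}_i)\, F^{C}_i(W^{|x|}_i)}{g_i'(W^{|x|}_i)} \Bigg)^{-1}. \tag{EC.28}
$$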

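Define $\Psi_{W^{|x|}}(r^{|x|}) := \{\Psi^{G_k}_{W^{|x|}}(r^{|x|})\}_{k=1,\dots,K}$, $\Psi_{W^{|x|}} := \{\Psi^{G_k}_{W^{|x|}}\}_{k=1,\dots,K}$, and $\Phi_{W^{|x|}} := \{\Phi^{G_k}_{W^{|x|}}\}_{k=1,\dots,K}$.
Let P denote the vertex-arc incidence matrix of the directed graph G(t0), and let $\mathrm{Diag}(\Phi_{W^{|x|}})$
denote the K-by-K diagonal matrix with diagonal $\Phi_{W^{|x|}}$. Equation (EC.27) can be expressed in
vectorized form as

$$\Psi_{W^{|x|}}(r^{|x|}) = \mathrm{Diag}(\Phi_{W^{|x|}})\, P\, r^{|x|} + \Psi_{W^{|x|}}. \tag{EC.29}$$

The potential-function vector, defined as $\Delta\Psi_{W^{|x|}}(r^{|x|}) := \{\Delta\Psi^{G_k,G_{k'}}_{W^{|x|}}(r^{|x|})\}_{(k,k') \in E(t_0)}$, can then be
expressed as

$$\Delta\Psi_{W^{|x|}}(r^{|x|}) = P^{T} \Psi_{W^{|x|}}(r^{|x|}) = P^{T} \mathrm{Diag}(\Phi_{W^{|x|}})\, P\, r^{|x|} + P^{T} \Psi_{W^{|x|}}. \tag{EC.30}$$

Using the above equation, the LCP can be expressed in its vectorized form

$$
\begin{aligned}
(r^{|x|})^{T} \big( P^{T} \mathrm{Diag}(\Phi_{W^{|x|}})\, P\, r^{|x|} + P^{T} \Psi_{W^{|x|}} \big) &= 0,\\
r^{|x|} &\ge 0,\\
P^{T} \mathrm{Diag}(\Phi_{W^{|x|}})\, P\, r^{|x|} + P^{T} \Psi_{W^{|x|}} &\ge 0.
\end{aligned}
\tag{EC.31}
$$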
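The vectorized conditions in (EC.29) to (EC.31) can be checked directly for a candidate flow. The Python sketch below evaluates the potential vector and tests the three LCP conditions, then reads off E+ as in (EC.24). It is a verifier, not a solver, and it rests on an assumed sign convention for the incidence matrix (entry +1 where an arc leaves a vertex, −1 where it enters), chosen so that P r reproduces the net term in (EC.27) and the transpose of P applied to Ψ gives the tail-minus-head difference of (EC.14). All function names and the two-component numerical data are hypothetical.

```python
def potentials(P, phi, psi, r):
    """Potential vector P^T (Diag(phi) P r + psi), as in (EC.30).

    P   : K x |E| incidence matrix as nested lists (assumed convention:
          +1 if the arc leaves the vertex, -1 if it enters, 0 otherwise)
    phi : vector Phi, one positive entry per primitive component
    psi : vector Psi of score change rates under zero net inflow
    r   : candidate inter-component flow, one entry per arc
    """
    K, E = len(P), len(r)
    net = [sum(P[k][e] * r[e] for e in range(E)) for k in range(K)]
    rate = [psi[k] + phi[k] * net[k] for k in range(K)]        # (EC.29)
    return [sum(P[k][e] * rate[k] for k in range(K)) for e in range(E)]


def is_lcp_solution(P, phi, psi, r, tol=1e-9):
    """Check nonnegativity of r and of the potentials, and complementarity."""
    pot = potentials(P, phi, psi, r)
    nonneg = all(x >= -tol for x in r) and all(p >= -tol for p in pot)
    complementary = abs(sum(x * p for x, p in zip(r, pot))) <= tol
    return nonneg and complementary


def remaining_arcs(arcs, P, phi, psi, r, tol=1e-9):
    """Arcs with zero potential, i.e. the set E+ of (EC.24)."""
    pot = potentials(P, phi, psi, r)
    return [arc for arc, p in zip(arcs, pot) if abs(p) <= tol]


# Two primitive components joined by a single arc (0, 1); illustrative data.
arcs = [(0, 1)]
P = [[1], [-1]]
tied = is_lcp_solution(P, [1.0, 1.0], [-0.2, 0.2], [0.2])   # flow equalizes the two rates
split = is_lcp_solution(P, [1.0, 1.0], [0.2, -0.2], [0.0])  # positive potential, zero flow
```

In the first case the arc keeps a zero potential and stays in E+; in the second the potential is strictly positive with zero flow, so the arc drops out.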


According to classical results on LCP (Cottle 1964, Lemke 1965), the LCP has a solution if (a)

the Hessian matrix P TDiag (ΦW |x|)P is symmetric positive-semidefinite; and (b) there is a pair of

vectors r|x| and ∆ΨW |x|(r|x|) which are both non-negative (not necessarily complementary). (a)

is straightforward by the non-negativity of ΦW |x| . (b) can be proved by constructing r|x| in the

following way: We first set $r^{|x|}_e = 0$ for all e ∈ E(t0). We check if there is any arc (k0, k1) ∈ E(t0)
corresponding to a negative potential $\Delta\Psi^{G_{k_0},G_{k_1}}_{W^{|x|}}(r^{|x|}) < 0$. If such an arc exists, we then push a
positive flow along a directed path (k0, k1, . . . , kℓ) with kℓ being a leaf node (having no outgoing
arcs). Such a path always exists because G(t0) contains no directed cycle. Pushing such a positive
flow increases the score change rate of $G_{k_0}$, namely $\Psi^{G_{k_0}}_{W^{|x|}}(r^{|x|})$, and decreases that of $G_{k_\ell}$, namely $\Psi^{G_{k_\ell}}_{W^{|x|}}(r^{|x|})$,
whereas the score change rates of vertices in the middle of the path stay invariant as their net
inflow rates have not changed. As a result, pushing this positive flow along this path increases the
potential on arc (k0, k1) while keeping the potentials on all other arcs non-decreasing. By pushing
a sufficiently large amount of flow along the path, we can make $\Delta\Psi^{G_{k_0},G_{k_1}}_{W^{|x|}}(r^{|x|}) \ge 0$. We then repeat
the above procedure until all arcs have a non-negative potential value, which then leads to a pair
of nonnegative vectors, $r^{|x|}$ and $\Delta\Psi_{W^{|x|}}(r^{|x|})$.

At each |x|, the LCP may have multiple complementary pairs, each of which is non-negative

and minimizes (r|x|)T (P TDiag (ΦW |x|)P r|x| + P TΨW |x|). Since P TDiag (ΦW |x|)P is positive semi-

definite, a feasible solution has to take the form r|x|+∆r, where r|x| is any feasible solution, and ∆r

lies in the null space of P TDiag (ΦW |x|)P , or equivalently the null space of P , which means ∆r has

to be a circulation over the underlying undirected graph of G(t0). Since P TDiag (ΦW |x|)P∆r= 0,

we deduce that the vector of potential functions, ∆ΨW |x| , has to be unique. Therefore, the arc set

E+ constructed based on ∆ΨW |x| must be unique at each |x|.

Finally, we argue that the arc set E+ constructed using (EC.24) has to be invariant for different

|x| ∈ 0+. Without loss of generality, we assume that for a sequence of points |x|ℓ→ 0 (ℓ→∞), the

LCP solution leads to the same arc set E+. Then by the equivalence between the LCP characterization and Properties (1′) and (2′), we deduce that if there are arcs in E(t0)\E+ going from Gu to
Gu′ , then $\Delta\Psi^{G_u,G_{u'}}_{W^{|x|_\ell}} > 0$. Because the sign of $\Delta\Psi^{G_u,G_{u'}}_{W^{|x|}}$ stays invariant in 0+, Property (2′) holds

for all |x| ∈ 0+. Property (1′) can be proved by the same logic. Thus, Properties (1′) and (2′) hold

for all |x| ∈ 0+, which implies that the LCP leads to the same E+ at all |x| ∈ 0+ by the equivalence

between Properties (1′) and (2′) and the LCP characterization.

Since for all |x| ∈ 0+, the solution to the LCP leads to E+, we deduce that E+ satisfies the LCP

characterization in Proposition 6. Then Proposition 6 guarantees that the fluid process exists.

“Uniqueness”: As we showed in Proposition 4, if $W(t)$ is a fluid process, then in $t_0^+$ the associated arc set $E(W(t))$ must contain all intra-component arcs as well as the inter-component arcs that remain in $G$. Let $E(t)$ denote the inter-component arcs that remain in $G$ at time $t$. We next show that $E(t) \equiv E^+$ for $t \in t_0^+$ and that $E^+$ has to satisfy Properties (1) and (2). We prove this by contradiction. Suppose there exists a sequence of times $\{t_\ell \mid \ell = 1, 2, \ldots\}$ that converges to $t_0$ from the right, with $E(t_\ell) \ne E^+$ for all $\ell$. Then we can find a subsequence $\{t_{\ell_h} \mid h = 1, 2, \ldots\}$ along which $E(t_{\ell_h}) \equiv E' \ne E^+$. Since the arc set $E^+$ that satisfies both Properties (1) and (2) is unique, $E'$ cannot satisfy both Properties (1) and (2). If $E'$ does not satisfy (2), we can find two subsets $G_u$ and $G_{u'}$ such that $\Psi^{G_u,G_{u'}}_{W(t_{\ell_h})} \le 0$, which then implies $\Psi^{G_u,G_{u'}}_{W(t)} \le 0$ for all $t \in t_0^+$ by the “invariant sign” property of $\Psi^{G_u,G_{u'}}_{W(t)}$. However, if the score of $G_u$ cannot increase as fast as that of $G_{u'}$, it is impossible for $G_u$ and $G_{u'}$ to be disconnected in $t_0^+$, because $G_u$ can send flow to $G_{u'}$ to make their score change rates equal. Using the same logic, we can derive a contradiction if $E'$ does not satisfy (1). Thus we have shown that $E(t) \equiv E^+$, which means the connected components $\{G_u\}_{u=1,\ldots,K}$ with respect to $E^+$ are the routing components in $t_0^+$. By invoking Proposition 3, we conclude that the fluid process is unique and is the unique solution to the ODE (EC.2) formulated based on $\{G_u\}_{u=1,\ldots,K}$.

EC.5. Proof for Corollary 2

Corollary 2 Suppose a fluid process in FCFS BQS has initial state Wi(0) =W1(0) for all i ∈ I

and λ(t)>µ(t) at all t. Then the following statements are equivalent:

1. The fluid process is globally FCFS.

2. The fluid process is exactly the one constructed by Algorithm 1 using the M+W index (40),

which has $W_i(t) = W_1(t)$ for all $i \in I$, and $W_1(t)$ is the unique solution to the following integral equation

$$W_1(t) = W_1(0) + \int_0^t \left(1 - \frac{\mu(s)}{\lambda(s, W_1(s))}\right) ds. \tag{EC.32}$$

3. Given W1(t) constructed as above, the arrival rates {λi(t,W1(t))}i∈I and {µj(t)}j∈J satisfy

the GCC.

Proof.

1⇒ 2: If the fluid process is globally FCFS, then Wi(t)≡W1(t) for all i∈ I. Thus, gi(Wi(t)) =

Wi(t) =W1(t) and θi(t) = θ1(t) for all i∈ I. Because all queues always have the same waiting score,

according to the M+W index we defined in (40), i ∈A(t, j) if and only if j and i are compatible,

that is, (j, i)∈Eb. Thus we deduce that Eb(W (t))≡Eb. Since V is connected by Eb, V is thus the

routing component of the fluid process at all t. Thus θ1(t) can be computed as

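$$\theta_1(t) = \Psi^{V}_{W(t)} = \left(\sum_{i \in V} \frac{\lambda_i(t - W_i(t))\, F^C_i(W_i(t))}{g'_i(W_i(t))}\right)^{-1} \left(\sum_{i \in V} \lambda_i(t - W_i(t))\, F^C_i(W_i(t)) - \sum_{j \in V} \mu_j(t)\right) = 1 - \frac{\mu(t)}{\lambda(t, W_1(t))}, \tag{EC.33}$$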


where the last equality follows from $g'_i(W_i(t)) = 1$ and our definition of $\lambda_i(t)$ and $\lambda(t)$. By Proposition 3, the unique fluid process $W_1(t)$ can be constructed by solving the ODE (EC.2), which leads to

$$g_1(W_1(t)) = g_1(W_1(0)) + \int_0^t \theta_1(s)\, ds = W_1(0) + \int_0^t \left(1 - \frac{\mu(s)}{\lambda(s, W_1(s))}\right) ds. \tag{EC.34}$$

Because $W_i(t) = W_1(t) = g_1(W_1(t))$ for all $i \in I$, the above equation leads to the integral equation for $W_1(t)$ as given in (EC.32).

Finally, by our assumption that $\lambda(t) > \mu(t)$, we know that $\lambda(t, 0) = \lambda(t) > \mu(t)$, so $W_1'(t) = 1 - \mu(t)/\lambda(t, W_1(t))$ is positive whenever $W_1(t)$ is sufficiently close to zero. Thus, the solution to (EC.32) must always be positive, which implies that the condition (13) in Step 0 always fails and the above solution can be successfully constructed by Algorithm 1.
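As a rough numerical illustration of (EC.32), the sketch below integrates $W_1'(t) = 1 - \mu(t)/\lambda(t, W_1(t))$ with a forward-Euler scheme. Everything beyond the differential form is an assumption made for the example: the effective arrival rate is taken to be `lam0 * exp(-d * w)` (constant arrivals and exponential patience), the service rate is constant, and the numbers are placeholders borrowed loosely from Section EC.7 rather than a calibrated FCFS system. The function name `solve_w1` is hypothetical.

```python
import math

def solve_w1(w0, lam0, mu, d, horizon, dt=0.01):
    """Forward-Euler sketch of W1'(t) = 1 - mu / lam(W1(t)).

    Assumption (not from the paper): lam(w) = lam0 * exp(-d * w), i.e. constant
    arrival rate lam0 and exponential abandonment with hazard rate d.
    """
    def lam(w):
        # Rate at which fluid that has waited w (and not abandoned) reaches the head of line.
        return lam0 * math.exp(-d * w)

    w, t = w0, 0.0
    path = [(t, w)]
    for _ in range(int(round(horizon / dt))):
        w += dt * (1.0 - mu / lam(w))
        t += dt
        path.append((t, w))
    return path

# Illustrative parameters only (lam0 > mu, so the system is overloaded).
path = solve_w1(w0=0.0, lam0=443.0, mu=157.5, d=0.07, horizon=200.0)

# Under the assumed lam(w), W1' = 0 gives the stationary point ln(lam0 / mu) / d.
w_star = math.log(443.0 / 157.5) / 0.07
print(path[-1][1], w_star)
```

Under these assumptions the computed path stays positive and approaches the stationary waiting time, which matches the qualitative claim that condition (13) never triggers when $\lambda(t) > \mu(t)$.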

2⇒ 1: This direction is straightforward by the definition of global FCFS.

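2⇒ 3: Suppose $\{W(t) \mid t \ge 0\}$ is constructed by Algorithm 1. By the above argument, the score change rate of every queue is given by $\theta_1(t)$, which can be computed as in (EC.33). Consider any non-empty subset of servers $A \subseteq J$. Since servers in $A$ can only serve queues in $\cup_{j \in A} I(j)$, but queues in $\cup_{j \in A} I(j)$ can receive supply fluid from other servers, we have

$$\sum_{i \in \cup_{j \in A} I(j)} \sum_{j \in J} r_{ji}(t) \ge \sum_{i \in \cup_{j \in A} I(j)} \sum_{j \in A} r_{ji}(t) = \sum_{j \in A} \mu_j(t).$$

Thus, for queue $i \in \cup_{j \in A} I(j)$, we have

$$\begin{aligned}
\theta_i(t) &= \left(\sum_{i \in \cup_{j \in A} I(j)} \lambda_i(t - W_i(t))\, F^C_i(W_i(t))\right)^{-1} \times \left(\sum_{i \in \cup_{j \in A} I(j)} \lambda_i(t - W_i(t))\, F^C_i(W_i(t)) - \sum_{i \in \cup_{j \in A} I(j)} \sum_{j \in J} r_{ji}(t)\right) \\
&\le \left(\sum_{i \in \cup_{j \in A} I(j)} \lambda_i(t - W_i(t))\, F^C_i(W_i(t))\right)^{-1} \left(\sum_{i \in \cup_{j \in A} I(j)} \lambda_i(t - W_i(t))\, F^C_i(W_i(t)) - \sum_{j \in A} \mu_j(t)\right) \\
&= 1 - \frac{\sum_{j \in A} \mu_j(t)}{\sum_{i \in \cup_{j \in A} I(j)} \lambda_i(t, W_i(t))},
\end{aligned} \tag{EC.35}$$

which leads to inequality (41).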

3⇒ 2: Proof by contradiction. Suppose 2 fails. Then the routing component cannot always be $V$; otherwise, Algorithm 1 would have constructed a fluid process $W(t)$ that solves the ODE (EC.2) with routing component $V$, and thus the integral equation (EC.32). If $V$ is not always the routing component, then at some time $t$ some queues in $I$ have $\theta_i(t) \ne \theta_1(t)$, and the edge set $E(W(t))$ is disconnected at $t$. We can thus find a connected component $H$ whose score change rate satisfies $\theta_i \ne \theta_1$. Let $A = H \cap J$ denote the set of servers in $H$; then the set of queues in $H$ is exactly $\cup_{j \in A} I(j)$, because $H$ is a connected component of $E(W(t))$. Then for all queues $i \in H$, we have

$$\theta_i = \Psi^{H}_{W(t)} = 1 - \frac{\sum_{j \in A} \mu_j(t)}{\sum_{i \in \cup_{j \in A} I(j)} \lambda_i(t - W_i(t))\, F^C_i(W_i(t))}. \tag{EC.36}$$

If $\theta_i < \theta_1$, then the above equality leads to $1 - \theta_1 < \frac{\sum_{j \in A} \mu_j(t)}{\sum_{i \in \cup_{j \in A} I(j)} \lambda_i(t - W_i(t))\, F^C_i(W_i(t))}$, which contradicts (41). If $\theta_i > \theta_1$, then there must exist another connected component $H' \subseteq V \setminus H$ in which all queues $i \in H'$ have $\theta_i < \theta_1$, and the same argument yields a contradiction. This proves 3⇒ 2.


The terminology of GCC follows recent work by Cong et al. (2015), in which the Generalized Chaining Condition (GCC) is proposed to provide a performance bound for a max-weight routing policy in a non-overloaded BQS. Their definition of GCC differs slightly from ours: they require the existence of $\theta_1 > 0$ such that for all non-empty strict subsets $A \subset J$, $\lambda + \theta_1 \mathbf{1}$ (where $\mathbf{1}$ denotes the all-ones vector) satisfies inequality (41), with ≥ replaced by <, as they consider the non-overloaded case.
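The subset condition can be checked by brute force on small instances. The sketch below assumes, based on how (41) is used around (EC.35) and (EC.36), that the GCC requires $\sum_{j \in A} \mu_j / \sum_{i \in \cup_{j \in A} I(j)} \lambda_i \le \mu / \lambda$ for every non-empty strict subset $A$ of servers; inequality (41) itself is not restated in this section, so this form is an inference. The function name `satisfies_gcc` and the toy data are hypothetical, and the enumeration is exponential in the number of servers.

```python
from itertools import combinations

def satisfies_gcc(lam, mu, compat, tol=1e-12):
    """Brute-force check of the assumed GCC form.

    lam:    dict queue -> effective arrival rate lambda_i(t, W_i(t))
    mu:     dict server -> service rate mu_j(t)
    compat: dict server -> set of compatible queues I(j)
    """
    total_lam = sum(lam.values())
    total_mu = sum(mu.values())
    servers = list(mu)
    for k in range(1, len(servers)):          # non-empty strict subsets only
        for subset in combinations(servers, k):
            queues = set().union(*(compat[j] for j in subset))
            supply = sum(mu[j] for j in subset)
            demand = sum(lam[i] for i in queues)
            # Cross-multiplied form of supply / demand <= total_mu / total_lam.
            if supply * total_lam > total_mu * demand + tol:
                return False
    return True

compat = {"s1": {"a"}, "s2": {"a", "b"}}
balanced = satisfies_gcc({"a": 2.0, "b": 2.0}, {"s1": 1.0, "s2": 1.0}, compat)
skewed = satisfies_gcc({"a": 2.0, "b": 2.0}, {"s1": 1.5, "s2": 0.5}, compat)
print(balanced, skewed)
```

In the second instance the dedicated server `s1` has too much capacity relative to the only queue it can serve, so that queue's score would grow more slowly than the other's and the process could not remain globally FCFS.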

EC.6. Proof for Theorem 4

Theorem 4 Suppose $\overline{W}_i < \infty$ for all $i \in I$. Then $W(t) \to W^*$ when $t \to \infty$, where $W^*$ denotes the steady-state waiting times constructed as in Theorem 2.

Proof. According to Theorem 2, a unique HOL waiting-time vector $W^*$ always exists. We define $\Delta g_i(t) := g_i(W_i(t)) - g_i(W^*_i)$ as the difference between the score of queue $i$ at time $t$ and its score at the steady state $W^*$. Define $\overline{I}(t) := \arg\max\{\Delta g_i(t) \mid i \in I\}$ and $\overline{\Delta g}(t) := \max\{\Delta g_i(t) \mid i \in I\}$; similarly, define $\underline{I}(t) := \arg\min\{\Delta g_i(t) \mid i \in I\}$ and $\underline{\Delta g}(t) := \min\{\Delta g_i(t) \mid i \in I\}$. We want to show that for any $\epsilon > 0$, there exists a $T > 0$ such that for all $t \ge T$, $\overline{\Delta g}(t) \le \epsilon$ (and, symmetrically, $\underline{\Delta g}(t) \ge -\epsilon$). Note that the derivative may not exist at countably many points, but that will not affect our subsequent analysis.

We next prove the $\overline{\Delta g}(t) \le \epsilon$ case. The main idea is to show that the supply fluid that flows into any queue in $\overline{I}(t)$ at time $t$ is no less than that at the steady state. Consequently, the score change rates of queues in $\overline{I}(t)$ are smaller than at the steady state and thus have to be negative.

For any queue $i^* \in \overline{I}(t)$, consider a connected component $A(t)$ at time $t$ (with respect to $E(t)$) that contains queue $i^*$. Let $A(t) \cap \overline{I}(t)$ denote the set of queues in $\overline{I}(t)$ contained in $A(t)$, let $J^*(A(t) \cap \overline{I}(t))$ denote the set of servers connected to queues in $A(t) \cap \overline{I}(t)$ at the steady state, and let $J^t(A(t) \cap \overline{I}(t))$ denote the set of servers that are serving queues in $A(t) \cap \overline{I}(t)$ exclusively at time $t$. We next show that $J^*(A(t) \cap \overline{I}(t)) \subseteq J^t(A(t) \cap \overline{I}(t))$. Intuitively, this is because queues in $\overline{I}(t)$ have the largest difference between their scores at time $t$ and at the steady state, so at time $t$ they should be more competitive than at the steady state.

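Formally, because a server $j \in J^*(A(t) \cap \overline{I}(t))$ must be connected to some queue $i \in \overline{I}(t)$ at the steady state, it means that for any queue $i' \notin \overline{I}(t)$,

$$g_i(W^*_i) + L(j, i) \ge g_{i'}(W^*_{i'}) + L(j, i'). \tag{EC.37}$$

According to the definition of $\overline{I}(t)$, we have

$$\Delta g_i(t) = g_i(W_i(t)) - g_i(W^*_i) > \Delta g_{i'}(t) = g_{i'}(W_{i'}(t)) - g_{i'}(W^*_{i'}). \tag{EC.38}$$

Inequalities (EC.37) and (EC.38) together imply that

$$g_i(W_i(t)) + L(j, i) > g_{i'}(W_{i'}(t)) + L(j, i'). \tag{EC.39}$$

The above inequality implies that server $j \in J^*(A(t) \cap \overline{I}(t))$ is only connected to queues in $\overline{I}(t)$ at time $t$, thus by definition we deduce that $j \in J^t(A(t) \cap \overline{I}(t))$. So we have shown that $J^*(A(t) \cap \overline{I}(t)) \subseteq J^t(A(t) \cap \overline{I}(t))$.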

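Since all queues in $A(t) \cap \overline{I}(t)$ have the same score change rate as queue $i^*$, using logic analogous to that used to derive (11), we can derive the following expression for $\theta_{i^*}(t)$:

$$\theta_{i^*}(t) = \left(\sum_{i \in A(t) \cap \overline{I}(t)} \frac{\lambda_i F^C_i(W_i(t))}{g'_i(W_i(t))}\right)^{-1} \sum_{i \in A(t) \cap \overline{I}(t)} \left(\lambda_i F^C_i(W_i(t)) - \sum_{j \in J} r_{ji}(t)\right). \tag{EC.40}$$

Because

$$\sum_{i \in A(t) \cap \overline{I}(t)} \sum_{j \in J} r_{ji}(t) \ge \sum_{j \in J^t(A(t) \cap \overline{I}(t))} \mu_j \ge \sum_{j \in J^*(A(t) \cap \overline{I}(t))} \mu_j \ge \sum_{i \in A(t) \cap \overline{I}(t)} \sum_{j \in J} r^*_{ji},$$

we can further upper bound the RHS of (EC.40) as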

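$$\begin{aligned}
\theta_{i^*}(t) &= \left(\sum_{i \in A(t) \cap \overline{I}(t)} \frac{\lambda_i F^C_i(W_i(t))}{g'_i(W_i(t))}\right)^{-1} \sum_{i \in A(t) \cap \overline{I}(t)} \left(\lambda_i F^C_i(W_i(t)) - \sum_{j \in J} r_{ji}(t)\right) \\
&\le \left(\sum_{i \in A(t) \cap \overline{I}(t)} \frac{\lambda_i F^C_i(W_i(t))}{g'_i(W_i(t))}\right)^{-1} \sum_{i \in A(t) \cap \overline{I}(t)} \left(\lambda_i F^C_i(W_i(t)) - \sum_{j \in J} r^*_{ji}\right) \\
&= \left(\sum_{i \in A(t) \cap \overline{I}(t)} \frac{\lambda_i F^C_i(W_i(t))}{g'_i(W_i(t))}\right)^{-1} \sum_{i \in A(t) \cap \overline{I}(t)} \left(\lambda_i F^C_i(W_i(t)) - \lambda_i F^C_i(W^*_i)\right),
\end{aligned} \tag{EC.41}$$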

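where the last equality follows from the stationary condition (43). Since $W^*_i < W_i(t) \le \overline{W}_i$ and $\underline{f}_i := \inf\{f_i(x) \mid x \in [0, \overline{W}_i]\} > 0$, we have

$$\begin{aligned}
\sum_{i \in A(t) \cap \overline{I}(t)} \left(\lambda_i F^C_i(W_i(t)) - \lambda_i F^C_i(W^*_i)\right) &= \sum_{i \in A(t) \cap \overline{I}(t)} \lambda_i \left(F^C_i(W_i(t)) - F^C_i(W^*_i)\right) \\
&< -\sum_{i \in A(t) \cap \overline{I}(t)} \lambda_i (W_i(t) - W^*_i)\, \underline{f}_i \\
&< -\lambda_{i^*} (W_{i^*}(t) - W^*_{i^*})\, \underline{f}_{i^*},
\end{aligned} \tag{EC.42}$$

where each summand on the first line equals $\lambda_i \int_{W_i(t)}^{W^*_i} f_i(\tau)\, d\tau$.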

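Since $g_i$ is strictly increasing over its compact domain $[0, \overline{W}_i]$ for each $i \in I$, we can find $\underline{c}, \overline{c} > 0$ such that

$$\underline{c} \le g'_i(\tau) \le \overline{c}, \quad \text{for all } i \in I,\ \tau \in [0, \overline{W}_i]. \tag{EC.43}$$

Consequently, the term $\left(\sum_{i \in A(t) \cap \overline{I}(t)} \frac{\lambda_i F^C_i(W_i(t))}{g'_i(W_i(t))}\right)^{-1}$ can be lower bounded by $\underline{c} \left(\sum_{i \in A(t) \cap \overline{I}(t)} \lambda_i F^C_i(W_i(t))\right)^{-1}$. When $\overline{\Delta g}(t) = g_{i^*}(W_{i^*}(t)) - g_{i^*}(W^*_{i^*}) \ge \epsilon$, (EC.41) and (EC.42) imply that

$$\begin{aligned}
\theta_{i^*}(t) &\le \left(\sum_{i \in A(t) \cap \overline{I}(t)} \frac{\lambda_i F^C_i(W_i(t))}{g'_i(W_i(t))}\right)^{-1} \sum_{i \in A(t) \cap \overline{I}(t)} \left(\lambda_i F^C_i(W_i(t)) - \lambda_i F^C_i(W^*_i)\right) \\
&< -\underline{c} \left(\sum_{i \in A(t) \cap \overline{I}(t)} \lambda_i F^C_i(W_i(t))\right)^{-1} \lambda_{i^*} (W_{i^*}(t) - W^*_{i^*})\, \underline{f}_{i^*} \\
&< -\underline{c} \left(\sum_{i \in A(t) \cap \overline{I}(t)} \lambda_i F^C_i(W_i(t))\right)^{-1} \lambda_{i^*}\, \frac{g_{i^*}(W_{i^*}(t)) - g_{i^*}(W^*_{i^*})}{\overline{c}}\, \underline{f}_{i^*} \\
&= -C \left(g_{i^*}(W_{i^*}(t)) - g_{i^*}(W^*_{i^*})\right) \\
&< -C\epsilon
\end{aligned} \tag{EC.44}$$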


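for some constant $C > 0$ which depends on neither $t$ nor the choice of $i^* \in \overline{I}(t)$. Thus, $\Delta g'_{i^*}(t) = \theta_{i^*}(t) < -C\epsilon$ for all $t$ and all $i^* \in \overline{I}(t)$ when $\overline{\Delta g}(t) \ge \epsilon$, and consequently $\overline{\Delta g}'(t) \le \max\{\Delta g'_i(t) \mid i \in \overline{I}(t)\} < -C\epsilon$. Moreover, as long as $\overline{\Delta g}(t) > 0$, inequalities (EC.41) and (EC.42) imply that $\overline{\Delta g}'(t) \le 0$. Therefore, $\overline{\Delta g}(t)$ is always decreasing, and its rate of decrease is at least $C\epsilon$ when $\overline{\Delta g}(t) \ge \epsilon$. Thus, for sufficiently large $t$, we have

$$\overline{\Delta g}(t) = \overline{\Delta g}(0) + \int_0^t \overline{\Delta g}'(s)\, ds < \epsilon, \tag{EC.45}$$

which proves that $\overline{\Delta g}(t) \to 0$ when $t \to \infty$. An analogous argument can be used to prove that $\underline{\Delta g}(t) \to 0$. Therefore, we have shown $\Delta g_i(t) \to 0$ for all $i \in I$, which leads to $W_i(t) \to W^*_i$ by strict monotonicity of $g_i(\cdot)$ and compactness of $W_i(t)$.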
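The phrase "for sufficiently large $t$" can be made explicit. Writing $\overline{\Delta g}(t)$ for the largest score gap $\max_i \Delta g_i(t)$, the following worked step is a sketch that takes the two rate bounds above as given (a decrease rate of at least $C\epsilon$ while the gap is at least $\epsilon$, and a non-positive derivative while the gap is positive) and assumes $\overline{\Delta g}(0) > \epsilon$. The symbol $T_\epsilon$ is introduced here only for illustration.

```latex
% Assumption: \overline{\Delta g}(0) > \epsilon; otherwise take T_\epsilon = 0.
% While \overline{\Delta g}(s) \ge \epsilon for all s in [0, t], the rate bound gives
\overline{\Delta g}(t)
  = \overline{\Delta g}(0) + \int_0^t \overline{\Delta g}'(s)\,ds
  \le \overline{\Delta g}(0) - C\epsilon\,t ,
% so the level \epsilon must be reached no later than
T_\epsilon := \frac{\overline{\Delta g}(0) - \epsilon}{C\epsilon} .
% Since \overline{\Delta g}'(t) \le 0 whenever \overline{\Delta g}(t) > 0,
% the gap cannot climb back above \epsilon afterwards:
\overline{\Delta g}(t) \le \epsilon \quad \text{for all } t \ge T_\epsilon .
```

Because $\epsilon$ is arbitrary, this yields the upper half of the convergence claim; the lower half follows from the analogous argument for the smallest score gap.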

EC.7. Parameter Estimation Details

Geyer and Sieg (2013) estimated the average net number of households that move out of PH1, PH2, and PH3 per quarter, which can be regarded as the resource arrival rates in the BQS model, namely µ1 = 118.9, µ2 = 29.5, and µ3 = 9.1.

To estimate the proportion of each choice profile, we use the discrete-choice model estimated by Geyer and Sieg (2013), which assumes a renter’s utility to be

$$\begin{aligned}
u_{jt} &= \gamma_j + \beta \ln(y_{jt}) + \delta x_t + mc + \xi_{jt}, \quad j = 1, 2, 3, \\
u_{0t} &= \ln(y_{0t}) + \delta x_t + \xi_{0t},
\end{aligned} \tag{EC.46}$$

where $j$ denotes the community type ($j = 0$ means living outside public housing), $t$ is the index of an applicant, $u_{jt}$ represents the utility for applicant $t$ to accept a unit of type $j$, $\gamma_j$ represents the fixed effect of community type $j$, $y_{jt}$ denotes the net income of the applicant reduced by the rent of housing type $j$, $\beta$ denotes the coefficient for the log of net income (Geyer and Sieg (2013) assumed $\beta = 1$ for the outside housing option), $x_t$ is a vector of a household’s attributes, $mc$ stands for the moving cost for the household (which is incurred when the household accepts public housing), and $\xi_{jt}$ represents idiosyncratic shocks, which are assumed to be i.i.d. with extreme-value distributions.

Geyer and Sieg (2013) studied the choice behavior of householders already residing in public

housing as well as those outside public housing. In order to do that, they created two population

pools to generate xt and yt: the residence pool and the waitlist pool. The residence pool consists of

households who resided in public housing in the city of Pittsburgh from June 2001 to June 2006.

The attributes of such households were recorded in the HACP’s dataset. The waitlist pool consists

of 14,234 randomly selected low-income applicants who were eligible for public housing from 13


metropolitan areas with a public housing-household ratio similar to the city of Pittsburgh. Their

information is recorded in the SIPP dataset. Geyer and Sieg (2013) argued that these applicants

are a good proxy for those on the waitlist for public housing in Pittsburgh. Since our case study

focuses on the dynamics on the waitlist, we ignore possible transfers between different community

types (which are small). Therefore, it suffices to create the waitlist pool using the same method

described in the above paper and then generate xt and yt from that waitlist.

The waitlist pool we use is a subset of the dataset used by Geyer and Sieg (2013). Therefore

their choice model can be applied to predict household choice in our system. Using xt and yt,

we can calculate the deterministic parts of the utility function ujt and u0t using the coefficients

reported in Geyer and Sieg (2013). By generating the random errors (ξjt)j=0,1,2,3 according to

their distributions, we can simulate the values of ujt and u0t and calculate the probabilities for

each choice profile. For example, Pr(1,2) = Pr(u1t > u2t > 0), Pr(1) = Pr(u1t > 0, u2t < 0, u3t < 0).

Although our choice set has only three types rather than the six types of Geyer and Sieg (2013), this does not lead to any estimation bias of the choice probabilities due to the “independence of irrelevant alternatives” property of the discrete choice model.
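The simulation step can be sketched as follows. This is only an illustration: the deterministic utilities passed in are made-up placeholders rather than the coefficients of Geyer and Sieg (2013), the outside option is indexed by 0, and a choice profile is assumed here to list up to two housing types whose utility exceeds that of the outside option, ranked by utility. The helper names `gumbel` and `profile_probabilities` are hypothetical.

```python
import math
import random
from collections import Counter

def gumbel(rng):
    """Draw a standard Gumbel (type-I extreme value) shock by inverse CDF."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def profile_probabilities(v, n_draws=20000, seed=0):
    """Monte Carlo estimate of choice-profile probabilities.

    v maps each option j (0 = outside option) to its deterministic utility.
    Assumption: a profile is the tuple of at most two housing types that beat
    the outside option, ordered from most to least preferred.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_draws):
        u = {j: vj + gumbel(rng) for j, vj in v.items()}
        acceptable = sorted((j for j in u if j != 0 and u[j] > u[0]),
                            key=lambda j: u[j], reverse=True)
        counts[tuple(acceptable[:2])] += 1
    return {profile: c / n_draws for profile, c in counts.items()}

# Placeholder utilities, not estimates from the data.
probs = profile_probabilities({0: 0.0, 1: 0.5, 2: 0.0, 3: -0.5})
for profile, p in sorted(probs.items()):
    print(profile, round(p, 3))
```

The empty profile `()` corresponds to applicants who prefer the outside option to every housing type; with i.i.d. Gumbel shocks its probability should be close to the multinomial-logit value $e^{v_0} / \sum_j e^{v_j}$.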

We estimate our matching function U(·, ·) using the following reasoning. (1) The function values

should have the same scale for different applicants. Thus, we assign U(j, (j1, j2)) = 1 when j = j1,

i.e., when an applicant’s first choice is met; and we assign U(j, (j1, j2)) = 0 if j ∉ {j1, j2}, that is, if j is not in an applicant’s choice set. (2) For each individual applicant, the degrees of matching for the three options, i.e., the “first choice”, “second choice”, and “not interested in public housing”, have to reflect the individual’s degree of interest among the three options. We find that on average,

the second largest value among the random utilities (ujt)j=0,1,2,3 is 0.66 of the largest value after

normalizing “not interested in public housing” to be zero. Therefore, we use 0.66 as the value for

the degree of matching if an offered housing unit is the second choice of the applicant.
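The resulting matching function is small enough to write down directly. The sketch below encodes the three values described above; the function name `matching_utility` and the representation of a choice profile as a tuple of one or two housing types are assumptions made for the example.

```python
def matching_utility(j, profile, second_choice_value=0.66):
    """U(j, (j1, j2)): degree of matching between housing type j and a choice profile.

    profile is a tuple (j1,) or (j1, j2) of acceptable housing types in order
    of preference (a representation assumed for this sketch).
    """
    if len(profile) >= 1 and j == profile[0]:
        return 1.0                    # first choice met
    if len(profile) >= 2 and j == profile[1]:
        return second_choice_value    # second choice, 0.66 of the first
    return 0.0                        # j is not in the applicant's choice set

print(matching_utility(1, (1, 2)), matching_utility(2, (1, 2)), matching_utility(3, (1, 2)))
```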

Based on our estimation from the historical data, new applicants arrive at a constant rate of 443 per quarter. The waitlist was closed in May 2015 and re-opened in July 2015. Assuming that the 1578 applicants on the older closed waitlist for non-senior housing needed to re-register on the new waitlist, and that 50% of these applicants did so in the first two quarters, we obtain a total arrival rate of 50% × 1578 + 443 = 1232 per quarter in the first two quarters, and a stable 443 per quarter from the third quarter onward.
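The piecewise-constant arrival rate can be written as a short function. The sketch follows the arithmetic in the text literally (the re-registration term is added in each of the first two quarters); the function name `arrival_rate` and the zero-based quarter index are assumptions made for the example.

```python
def arrival_rate(quarter, base=443.0, old_waitlist=1578, reregister_share=0.5,
                 surge_quarters=2):
    """Total arrival rate (applicants per quarter) in the given quarter.

    quarter is a zero-based index counted from the re-opening of the waitlist
    (an indexing convention assumed for this sketch).
    """
    if quarter < surge_quarters:
        return reregister_share * old_waitlist + base
    return base

print([arrival_rate(q) for q in range(4)])
```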

Finally, applicants on a waitlist may abandon the queue before they are assigned a housing unit for various reasons. Based on our conversations with HACP, the most typical reason is that an applicant loses eligibility because of an increase in income. The data show that the time for an eligible applicant to lose eligibility has a similar distribution across different queues, and can be approximated by an exponential distribution with a hazard rate of d = 0.07 per quarter. Therefore the cumulative distribution of abandonment time is given by Fi(t) = 1 − exp(−dt).
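For reference, the abandonment distribution and its complement (the fraction of applicants still eligible after waiting $t$ quarters) can be evaluated as below. The function names are hypothetical; only the hazard rate d = 0.07 comes from the text.

```python
import math

HAZARD_RATE = 0.07  # d, per quarter, as reported in the text

def abandonment_cdf(t, d=HAZARD_RATE):
    """F_i(t) = 1 - exp(-d t): fraction that has abandoned by waiting time t."""
    return 1.0 - math.exp(-d * t)

def survival(t, d=HAZARD_RATE):
    """F_i^C(t) = exp(-d t): fraction still eligible after waiting t quarters."""
    return math.exp(-d * t)

# Mean of an exponential distribution with rate d.
mean_time_to_abandon = 1.0 / HAZARD_RATE

print(abandonment_cdf(4.0), survival(4.0), mean_time_to_abandon)
```

With d = 0.07 the mean time to abandonment is 1/0.07, a little over 14 quarters.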