Submitted to Operations Research
A Fluid Model for an Overloaded Bipartite Queueing System with Heterogeneous Matching Utilities
Yichuan Ding, S. Thomas McCormick, Mahesh Nagarajan
Sauder School of Business, University of British Columbia, Vancouver, BC V6T 1Z2,
{Daniel.Ding, Tom.McCormick, Mahesh.Nagarajan}@sauder.ubc.ca
We consider a bipartite queueing system (BQS) with multiple types of servers and customers, where different
customer-server combinations may generate different utilities. Whenever a server is available, it serves the
customer with the highest index, which is the sum of a customer’s waiting index and the matching index. We
assume that the waiting index is an increasing function of a customer’s waiting time and the matching index
depends on both the customer’s and the server’s types. We develop a fluid model to approximate the behavior
of such a BQS, and show that the fluid limit process can be computed over any finite horizon. We
prove that the fluid limit process converges to a unique steady state, which can be efficiently computed.
These results enable the policy designer to predict the behavior of a BQS under an indexing priority rule
of the form defined here, and to choose an indexing formula that optimizes a given set of performance
metrics. We illustrate the use of our machinery by analyzing public housing assignment in the city of
Pittsburgh using a real data set.
Key words : Bipartite Queueing System, Min Cost Max Flow, Nested Cuts for Parameterized Network,
Value-based Routing, Public Housing Assignment, Scarce Resource Allocation
Author: An Overloaded Bipartite Queueing System with Matching CostArticle submitted to Operations Research; manuscript no. (Please, provide the mansucript number!) 1
1. Introduction
Many service systems can be modeled as queueing systems that allocate service capacity between
customers (clients) and servers (resources). In settings such as organ donation or assignment of
public housing, the demand for services is typically higher than the capacity to service the demand.
We refer to such systems as being overloaded. As a result, some customers end up abandoning the
queue without being served.
In applications with homogeneous customers and homogeneous servers, where matching customers
to servers is not a significant issue and where it is desirable to have customers be equally likely to
get served, the first-come, first-served (FCFS) rule is conventionally accepted as the gold standard
of fairness.
In other applications that have heterogeneous customers (such as different classes of patients)
and/or heterogeneous servers (such as different qualities of organs), FCFS is less appropriate. Different
server-customer pairs in these settings can generate different levels of welfare from the service
provider’s perspective. For example, for patients with end-stage renal disease who are awaiting kidney
transplants, certain donor-recipient pairs can lead to higher post-transplant survival rates, and
certain hospitals can provide better treatment for some types of medical conditions than others. By
implementing priority rules in such settings, the service provider can generate more advantageous
server (resource)-customer pairs and achieve better system-level efficiency than that achieved by
applying the FCFS rule. Doing so may, however, result in longer waiting times for customers for
whom finding a matching server is difficult, which may in turn lead to higher abandonment rates.
When one considers settings such as the public sector example mentioned above, the allocation
of scarce resources usually faces two conflicting needs: maximizing efficiency requires an increase
in the proportion of efficient resource-customer pairs, while minimizing inequity (or unfairness)
requires a reduction in the disparity of waiting times across customers of different types. Ideally one
would like to solve a stochastic optimization problem whose objective includes some combination of
the above two factors. Unfortunately, for the class of problems that interest us, the resulting
optimal policies are likely to be too difficult to understand, and hence too difficult to implement
in practice. Instead we consider a class of sophisticated priority rules that take into account the
matching outcome as well as the customers’ waiting time in order to heuristically make a tradeoff
between these two conflicting objectives. Understanding the dynamics of the system under such
priority rules can be challenging. The goal of this paper is to develop tools which approximate
the dynamics of the queueing system under a broad but specific class of priority rules, so that the
policy designer can use the understanding of the dynamics to study the impact of using specific
priority rules. This is crucial in helping decide which priority rule one may want to use to achieve
a particular overall objective.
1.1. Overview of the Model
In the next several sections we introduce our model and the notation needed to study service
systems such as the ones above. We use a bipartite queueing system (BQS), which consists of I types
of customers and J types of resource pools or service providers (the terms “resource” and “service”
will be used interchangeably from now on). Customers and resources arrive at the system with
time-varying mean arrival rates λi(t) and µj(t), respectively, for customer type i = 1, 2, . . . , I and
resource type j = 1, 2, . . . , J. Since this model is motivated by the allocation of scarce resources,
the focus of our study is on scenarios when the BQS on the customer side is overloaded most of the
time. As a result, customers of type i have to buffer in the i-th queue and the servers are busy most
of the time. A waiting customer in class i independently abandons the system after a random time
duration that follows a fixed distribution with continuous cdf Fi(·). We assume that a customer
will permanently leave the system by either being served or abandoning the queue.
A measure of efficiency would be to assume that a utility U(j, i) is generated whenever a customer
of type i receives service of type j, and then the system efficiency is the average utility of all
matches. There are several ways to measure fairness. One possibility is to look at measures that
capture the differences across queues in their likelihood of getting served. This can be rigorously
measured as the variance of an underlying random variable. We elaborate and discuss this carefully
in Section 3, where we study the impact of the scoring rule on the system and the overall objective
of the service provider, which in turn depends on the above two factors.
Since trying to directly optimize these objectives would likely lead to unimplementable policies,
we consider instead an implementable class of policies that ranks the customers by what we call an
M+W index or score (we use the terms “index” and “score” interchangeably from now on), where
“M” and “W” stand for matching score and waiting score, respectively, and “+” indicates that
the total score is a sum of the waiting and matching scores. For the “M” part, we assume that a
matching score L(j, i) is given or can be computed for each resource-customer pair (j, i), indicating
how good it would be to serve customer type i by server type j. One could base L(j, i) on U(j, i),
but it is possible that one could get better performance by choosing L(j, i) different from U(j, i).
The “W” part depends on the amount of waiting time a customer has spent in the queue. Thus
we assume that a waiting score gi(WT) is given or can be computed for customer type i. We
assume that gi(WT) is a strictly increasing and continuous function of WT to incent the system to
choose to serve customers who have waited longer. The special case where gi(·) is a linear function,
gi(WT) = ciWT, has been commonly used in other ranking policies. Therefore the total score of
type i customers as seen by servers of type j, denoted by s(j, i,WT), is
\[ s(j, i, \mathrm{WT}) = L(j, i) + g_i(\mathrm{WT}). \tag{1} \]
Whenever a server of type j becomes available, it will be assigned to the head-of-line (HOL) customer
in the queue i with the highest M+W index s(j, i,WT). We only consider non-preemptive service disciplines,
because in our model the start of a service means that the resource has been offered to a customer
and this allocation is final.
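The assignment rule above can be illustrated with a minimal Python sketch. The function names (`mw_score`, `pick_queue`) and all numerical values of L(j, i), gi(·), and the HOL waiting times are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import Callable, List, Optional, Sequence

def mw_score(L_ji: float, g_i: Callable[[float], float], wt: float) -> float:
    """Total M+W score s(j, i, WT) = L(j, i) + g_i(WT), as in equation (1)."""
    return L_ji + g_i(wt)

def pick_queue(j: int,
               L: Sequence[Sequence[float]],
               g: Sequence[Callable[[float], float]],
               hol_wait: Sequence[Optional[float]]) -> Optional[int]:
    """Return the queue whose HOL customer has the highest score for server type j.

    hol_wait[i] is the waiting time of the HOL customer in queue i,
    or None if queue i is empty. Returns None if every queue is empty.
    """
    best_i, best_score = None, float("-inf")
    for i, wt in enumerate(hol_wait):
        if wt is None:
            continue  # empty queue: nobody to serve
        score = mw_score(L[j][i], g[i], wt)
        if score > best_score:
            best_i, best_score = i, score
    return best_i

# Hypothetical instance: 2 server types (rows) and 3 customer types (columns).
L = [[3.0, 0.0, 1.0],
     [0.0, 2.0, 1.0]]
# Linear waiting scores g_i(WT) = c_i * WT with hypothetical slopes c_i.
g: List[Callable[[float], float]] = [
    lambda wt: 1.0 * wt,
    lambda wt: 0.5 * wt,
    lambda wt: 2.0 * wt,
]
hol_wait = [1.0, 4.0, 0.5]

chosen_for_server_0 = pick_queue(0, L, g, hol_wait)  # scores 4.0, 2.0, 2.0
chosen_for_server_1 = pick_queue(1, L, g, hol_wait)  # scores 1.0, 4.0, 2.0
```

Setting L(j, i) = 0 and gi(WT) = WT for every pair in the same sketch makes `pick_queue` select the queue whose HOL customer has waited longest, regardless of the server type.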
Because customers of the same type (so in the same queue) have the same matching score with
respect to the same server, and the waiting score is strictly increasing in each customer’s waiting
time, we know that within each queue, the discipline is first come, first served (FCFS). Customers
at the head of different queues have to compete for service determined by comparing their scores.
Since gi(·) is strictly increasing and continuous, and the inter-arrival and service completion times
both have continuous distributions, the probability that two customers’ indices are tied is zero.
We refer to a BQS equipped with the M+W indexing rule as the M+W-BQS model. This
framework has applications in many settings where scarce resources are allocated across different
types of service requests, particularly in public sectors. We now provide two examples that have
been extensively studied in various literatures as examples of important social problems.
Motivating Example 1: Cadaver Kidney Allocation. The United Network for Organ Sharing
(UNOS) is a non-profit organization which coordinates U.S. organ transplant activities. UNOS
(Committee 2011) continues to face a serious shortage of kidney donations. There were more than
90,000 patients registered on the kidney transplant waitlist at the end of year 2014 (OPTN/UNOS
2015). This number continues to grow. This waitlist can be modeled as a BQS, where patients
(kidneys) with similar physical attributes are regarded as being in the same queue. Patients in this
system renege either due to death or by receiving live donations.
The UNOS has been seeking an evidence-based and transparent ranking policy that strikes
a balance between maximizing the total life years saved by transplant and minimizing inequity
across different types of patients (Zenios et al. 2000). In 2008, the UNOS Scientific Registry of
Transplant Recipients (SRTR) proposed to rank candidates using their kidney allocation score
(KAS) (OPTN/UNOS 2008), which can be expressed as
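\[ \mathrm{KAS} = \frac{0.8 \times (1 - \mathrm{DPI})}{0.8 \times \mathrm{DPI} + 0.2} \times \mathrm{LYFT} + \mathrm{CPRA} \times \frac{4}{100} + \mathrm{WT}, \tag{2} \]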
where WT is the patient’s waiting time, LYFT (life years from transplant) is a measure of the
candidate’s net gain from transplantation (which depends on both the donor and patient type),
DPI (donor profile index) is a function of donor type, and CPRA (calculated panel reactive antibody)
compensates patient types for which it is difficult to find a matched kidney. Formula (2)
is a particular case of the M+W index, obtained by defining
\[ L(j, i) = \frac{0.8 \times (1 - \mathrm{DPI})}{0.8 \times \mathrm{DPI} + 0.2} \times \mathrm{LYFT} + \mathrm{CPRA} \times \frac{4}{100} \]
and gi(WT) = WT (we assume gi(0) = 0, so the terms independent of WT have been added into L(j, i)).
The tools and methods developed in this paper using our M+W-BQS framework enable one to
study the performance of the system if one uses such a KAS.
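As a small illustration, the decomposition of the KAS into a matching score and a waiting score can be sketched in Python. The sketch assumes that formula (2) is read with the DPI terms forming a ratio that multiplies LYFT, that DPI lies in [0, 1], and that CPRA is expressed in percent; the function names and the input values are hypothetical.

```python
import math

def kas_matching_score(dpi: float, lyft: float, cpra: float) -> float:
    """L(j, i): the part of the KAS that does not depend on waiting time.

    Assumed reading of formula (2): a DPI ratio times LYFT, plus a CPRA term.
    """
    return 0.8 * (1.0 - dpi) / (0.8 * dpi + 0.2) * lyft + cpra * 4.0 / 100.0

def kas_waiting_score(wt: float) -> float:
    """g_i(WT) = WT, which satisfies g_i(0) = 0 and is strictly increasing."""
    return wt

def kas(dpi: float, lyft: float, cpra: float, wt: float) -> float:
    """KAS written as an M+W index: matching score plus waiting score."""
    return kas_matching_score(dpi, lyft, cpra) + kas_waiting_score(wt)

# Hypothetical donor-patient pair.
example_L = kas_matching_score(dpi=0.5, lyft=10.0, cpra=50.0)
example_kas = kas(dpi=0.5, lyft=10.0, cpra=50.0, wt=3.0)
```

Because the waiting score enters additively, two patients with the same matching score for a given kidney are always ranked by waiting time.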
Motivating Example 2: Housing Assignment. Kaplan (1988) was the first to study the problem
of assigning limited public housing units to deserving applicants who register sequentially and have
heterogeneous preferences for housing types and locations. Kaplan considers a scenario in which
each applicant has to specify their acceptable housing types, and houses of each type will only
be offered to applicants who accept that type. An extension of Kaplan’s model is to allow
applicants to specify their ordinal preferences over different types of houses, a practice followed
in some North American cities. For example, an applicant may rate each house type as “first
preference”, “second preference”, etc. Our M+W-BQS framework is able to investigate a housing
allocation system that accommodates such ordinal preferences. In Section 5, using a data set from
the Housing Authority of the City of Pittsburgh, we predict and compare allocation outcomes
when different M+W-indexing rules are used in the allocation of public housing in Pittsburgh.
Next, we review the literature relevant to M+W-BQS, and clarify the contributions and the
relative positioning of our paper. This review will reveal that M+W-BQS is in general not in
the class of models that are exactly solvable. Hence this paper will analyze M+W-BQS using a
deterministic fluid approximation of the underlying stochastic process.
1.2. Literature Review
BQS covers a broad set of service systems, and its formulation exhibits subtle differences depending
on the specific application. According to the traditional definition of a BQS in the queueing
literature, each resource type j = 1, . . . , J can only serve a subset of compatible customer types,
say I(j). The topology of the BQS can be represented by a bipartite graph whose vertices on
each side represent the customer and server types, respectively. An arc connecting vertices j and
i represents that (j, i) is a compatible resource-customer pair. Depending on the application that
is being modeled, each vertex on the resource side may represent a type of resource, or a single
server, or a pool of many servers of the same type.
In settings that resonate with our motivating examples, at least two approaches are possible.
The first approach (“optimization”) studies what the optimal control policy would be for a BQS
where one has a clear objective that is being optimized. The policy that one derives using this
approach often yields a priority rule. The optimization approach is practically useful if the priority
rule is not complicated and easy to understand and implement.
In public sector settings such as organ donation, transparency and ease of understanding are key
requirements, and the policies coming from the optimization approach would be too complicated.
The second and more appropriate approach (“rule-based”) in such cases is to invent a scoring rule
that takes into account different relevant considerations and uses this to allocate available capacity.
In the rule-based approach, understanding the impact of the scoring rule on the system is vital.
We take the rule-based approach in this paper. Our paper has connections to both approaches and
the literature review is loosely organized based on the approach each paper takes to studying the
BQS.
One stream of literature uses the rule-based approach to characterize the dynamics of a BQS
under the FCFS service discipline (hereafter referred to as FCFS-BQS). The FCFS rule in a BQS
specifies that each newly available server j has to serve the first-arrived customer among those in
I(j). The FCFS-BQS model was first proposed by Schwartz (1974) in the study of a “lane-selection”
problem, and later motivated by other applications such as the allocation of public housing (Kaplan
1988) and the adoption of children (Caldentey and Kaplan 2007). Talreja and Whitt (2008) first
studied a fluid approximation for the FCFS-BQS with abandonment, where all flows are regarded
as deterministic. They discussed the conditions under which the system is globally FCFS, that
is, all customers are FCFS regardless of their types. They pointed out that the fluid process may
not be unique when the residual network of the bipartite graph contains a loop. Caldentey et al.
(2009) studied a stochastic BQS where customers and resources both arrive according to a renewal
(non-Poisson) process with constant mean. They proposed a novel state description to characterize
the system behavior as a Markovian process. Later, Adan and Weiss (2012) modeled the same
BQS with a different Markovian process and derived its stationary distribution, which surprisingly
takes a product form. Using this state description, product-form stationary distributions have been
found for settings when a newly arriving customer was routed to a randomly selected compatible
server with fixed probability (Visschers et al. 2012), or the longest-idle compatible server (Adan
and Weiss 2014). Adan and Weiss (2014) also proved the convergence of the stochastic processes in
the FCFS-BQS to the fluid limit process when each type of service is provided by a single server.
Another stream of literature uses the optimization approach to examine scheduling rules in
BQS, and most of the problems in this literature have an objective of minimizing total delays
in the system. An early non-BQS result by Van Mieghem (1995) introduced the Gcµ rule for a
single-server multi-class queueing system with a holding cost Ci(Qi) for each queue i, where Ci(Qi)
is a convex increasing, differentiable function of the queue length Qi. The Gcµ rule says that a
newly available server j should myopically maximize the decrease in cost by serving the queue
i with the highest index µiC′i(Qi). Mandelbaum and Stolyar (2004) extended this result to the
BQS setting and proved that if each server j follows the Gcµ rule by serving the highest index
µijC′i(Qi), then it asymptotically minimizes both instantaneous and cumulative total holding cost
over all routing policies. An analogous result for a BQS with many servers for each type is proved
by Gurvich and Whitt (2009) under additional conditions. When there is holding cost of c per unit
time and customers renege at rate θ, Atar et al. (2010) proved that the cµ/θ rule minimizes linear
holding costs in a Markovian queue with homogeneous servers and multitype customers. However,
it is not known whether the optimality result of the cµ/θ rule can be extended to the BQS case.
In fact, Stolyar and Tezcan (2011) show that a shadow-routing based control policy numerically
outperforms the cµ/θ rule for the X-model (the simplest BQS with two customer classes and two
server pools) in the sum of the served customers weighted by their types. Other scheduling policies
for a BQS have been studied by Dai and Tezcan (2008), Larranaga et al. (2014), and Ghamami
and Ward (2013).
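For comparison with the M+W index, the Gcµ index described above can be sketched in a few lines of Python. The sketch assumes quadratic holding costs Ci(Q) = ai Q²/2, so that C′i(Q) = ai Q; this cost family, the function name `gcmu_pick`, and the numbers are hypothetical choices for illustration and are not taken from the cited papers.

```python
from typing import Callable, Sequence

def gcmu_pick(j: int,
              mu: Sequence[Sequence[float]],
              marginal_cost: Sequence[Callable[[float], float]],
              Q: Sequence[float]) -> int:
    """Return the queue i with the highest index mu[j][i] * C_i'(Q_i) for server j."""
    return max(range(len(Q)), key=lambda i: mu[j][i] * marginal_cost[i](Q[i]))

# Hypothetical instance with one server type and two queues.
mu = [[1.0, 2.0]]                     # service rates mu_{ji}
marginal_cost = [lambda q: 3.0 * q,   # C_0'(Q) for C_0(Q) = 3.0 * Q^2 / 2
                 lambda q: 1.0 * q]   # C_1'(Q) for C_1(Q) = 1.0 * Q^2 / 2
Q = [4.0, 5.0]                        # current queue lengths

gcmu_choice = gcmu_pick(0, mu, marginal_cost, Q)  # indices are 12.0 and 10.0
```

Unlike the M+W index, this index depends on the current queue lengths and on service speeds, and it contains no term for the value of a particular customer-server match.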
The above literature on BQS assumes that different customer-server pairs may have different
service speeds, but generate the same value. In contrast, in our problem
the total reward depends on the customer-server match. This is an important difference. Papers
that consider a BQS and model this difference resort to numerical approaches and show limited
theoretical results. One possible reason is that such a BQS model, although it captures the trade-
offs and dynamics in public sector examples such as the ones described earlier, is not often relevant
to commonly studied service systems such as call centers. Examples include a recent paper by
Sisselman and Whitt (2007) that proposed the idea of value-based routing in contrast to the
traditional congestion-based routing for a BQS in which different customer-server pairs are assumed
to generate different utility values for the service provider. Related to this, Mehrotra et al. (2012)
numerically compare several scheduling rules in call centers where the resolution probability for
each call request depends on both the server and request types.
An important point to note is that the matching utility cannot be easily incorporated into the
analytical framework developed in the existing literature on BQS to obtain results that reconcile
with customer-server utilities. For example, in Section 4 we develop an optimal indexing rule
for the steady state of the fluid model, which cannot be derived from either the Gcµ or the cµ/θ rule by
simply including the matching utility U(j, i) into the cost rate c. In addition, the fairness objective
requires us to consider congestion-based measurements such as queue length and waiting time, so
this problem has to be considered under a queueing framework. Our paper takes a first step to
investigate this problem by focusing on the fluid approximation of the system with the underlying
belief that the fluid process offers a robust approximation of the stochastic behavior of the system.
This belief is well-founded for at least two reasons. First, our simulation experiment in Section 5
shows that the fluid model gives a close approximation even when some queues have a thin
arrival stream, such as fewer than ten customers per unit time. Second, Adan and Weiss (2014) have
proved that the scaled stochastic process in FCFS-BQS converges to a fluid process using the state
description introduced in Adan and Weiss (2012). We expect that a similar result
holds for our M+W-BQS. However, we believe that this extension would require significant
technical overhead since M+W-BQS exhibits much more complicated dynamics than FCFS-BQS.
We thus leave the issue of convergence to a fluid process for future research and in this paper we
focus on the fluid approximation to the M+W-BQS.
Note that the M+W indexing policy represents a general class of priority rules and subsumes
FCFS (by setting L(j, i) = 0 and gi(WT) =WT). In particular, it generalizes the dynamic priority
policy proposed by Jackson (1960) for a queueing system with a single server and multi-type
customers. His dynamic priority rule prioritizes customers using the index
\[ s(i, \mathrm{WT}) = L(i) + g_i(\mathrm{WT}), \tag{3} \]
where Jackson (1960) calls L(i) the “urgent number” for customer type i. Jackson (1960) studied
the simplest waiting score gi(WT) = WT. Nelson (1990) considered the more general case when
the gi(WT) are affine functions whose slopes and intercepts both depend on i. Kleinrock and
Finkelstein (1967), Netterman and Adiri (1979), and Grindlay (1965) further analyzed situations when
the gi(WT) are nonlinear functions. The M+W rule generalizes the dynamic priority policy by allowing
the function L to depend on both customer and server types.
The M+W index uses three variables in ranking the candidates: customer type, server type,
and customer waiting time. We do not consider other ranking criteria, such as the arrival rates
of customers and resources, or the queue length, which have been widely used in other dynamic
routing policies in overloaded systems, e.g., (Zenios et al. 2000, Stolyar and Tezcan 2011). The
reason is that avoiding these other ranking criteria makes the policy more likely to be used in
practice. For example, it is difficult to rank patients on the kidney waitlist by the queue length or
arrival rates, because the real-time queue length would be difficult to monitor. Moreover, in settings
such as these, which are of interest to the general public, a simple and somewhat predictable ranking
mechanism is more likely to be publicly acceptable.
1.3. Main Contributions
Our paper makes the following contributions:
1. We present the first study of a BQS where the matching utility depends on both customer and
resource types. The ranking policy we propose, namely the M+W indexing rule, incorporates
two different objectives that a social planner may have, i.e., maximizing matching utility
and reducing inequity across different queues. Including both objectives makes the model
relevant for applications. Using this framework, we establish several useful
theoretical properties of this system, and we demonstrate the tools we develop
by applying them to a case study on public housing.
2. We develop an algorithm to compute the transient state of the fluid process over any finite
horizon. Our characterization of the transient state of the fluid process has both theoretical
and practical significance. From a theoretical perspective, the construction of the fluid process
demonstrates its existence and uniqueness. Moreover, building on the transient analysis, we
show under certain assumptions that the fluid process converges to the steady state, which
means the steady state we characterize can not only be theoretically achieved but can be used
in practice. From a practical perspective, the arrival rates of both resources and customers
can vary in time. Even if the arrival rates are stationary, an overloaded queueing system
with reneging customers typically takes a substantial amount of time before converging to
the steady state. For example, in the public housing assignment example we study in Section 5, it
takes 50 years to reach steady state. The policy maker is likely to be interested in the system
performance only over a relatively short time horizon. In such cases studying the transient
behavior would be quite relevant for policy evaluation and design. Thus proving results for
the transient case as we do in this paper is important.
3. We characterize the steady state of the fluid process as a solution to a network flow problem,
and propose an optimal M+W index when the fairness is measured by the variance in the
reneging rate across different queues. Although there is no theoretical guarantee of the optimality
of this indexing rule for different fairness metrics or for a transient period, the proposed
M+W index can be used as a benchmark policy from which one could explore possible
adjustments for improvement. Showing results for a fluid model lets us test various rules and
their impact on possible fairness and efficiency metrics. Having the theoretical approximation
has the advantage that one does not have to resort to simulation, which can be more expensive
and less robust. Moreover, we prove structural results that give useful insights on the effect
of the policy which would be hard to imagine without this type of theoretical result.
4. Both the transient and stationary analysis of the fluid model rely heavily on tools from combinatorial
optimization, which is not typical in the queueing literature. We need these tools
because the dynamics in our M+W-BQS are extremely complicated, and even the fluid process
alone is hard to describe without these optimization techniques. Our study gives an example
where combinatorial optimization techniques are applied to characterizing complicated queueing
networks, which is an area where researchers in both the combinatorial optimization and
queueing communities can make contributions.
The rest of the paper is organized as follows. Section 2 formally defines the fluid process as a
solution to a set of dynamic equations. In Section 3 we prove that over any finite horizon, under
certain conditions, the transient states of the fluid process can be constructed by an algorithm we
develop, which also proves that the fluid process is unique. In Section 4 we characterize the steady
state as a solution to a min-convex-cost network flow problem, and derive the optimal form of the
M+W policy with respect to some special efficiency and fairness metrics. Section 5 presents a case
study for public housing allocation in Pittsburgh. Finally, Section 6 discusses the limitations of our
paper and future research directions.
2. The Fluid Model
We consider the system as I parallel double-sided buffers (or queues). Demand fluid of type i flows
into the i-th buffer at a rate λi(t), and as customers renege it leaks out at a rate fi(WT), where
WT is the amount of time that the demand fluid has waited in a buffer. We call WT the age of a
demand fluid, and call the demand fluid that has arrived at time t cohort t. Supply fluid of type
j flows into the system at rate µj(t). Following the terminology in (Liu and Whitt 2012a,b, 2011),
we assume that both λi(t) and µj(t) are piecewise continuous functions, which means that they
are right-continuous functions with finitely many discontinuities in each finite interval, where left
limits exist at each discontinuity point. This broad assumption on arrival and service rates allows
us to model situations where a rate suddenly changes due to an external shock.
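A rate function of this kind can be represented, for instance, by a list of breakpoints and one continuous piece per interval. The following Python sketch is only an illustration under that assumption; the helper name `piecewise_rate` and the example rates are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, Sequence

def piecewise_rate(breakpoints: Sequence[float],
                   pieces: Sequence[Callable[[float], float]]) -> Callable[[float], float]:
    """Build a right-continuous rate function.

    pieces[k] is used on [breakpoints[k], breakpoints[k+1]), and the last
    piece is used from the last breakpoint onward. breakpoints must be sorted.
    """
    if len(breakpoints) != len(pieces):
        raise ValueError("need exactly one piece per breakpoint")

    def rate(t: float) -> float:
        # bisect_right puts t == breakpoints[k] into piece k (right-continuity).
        k = bisect_right(breakpoints, t) - 1
        if k < 0:
            raise ValueError("rate is not defined before the first breakpoint")
        return pieces[k](t)

    return rate

# Hypothetical arrival rate: a linear ramp until t = 5, then a jump to a constant
# level, mimicking a sudden change caused by an external shock.
lam = piecewise_rate([0.0, 5.0], [lambda t: 2.0 + 0.1 * t, lambda t: 4.0])

left_of_jump = lam(4.999)   # value on the first piece, just before the jump
at_jump = lam(5.0)          # value of the second piece, taken at the jump itself
```

The first breakpoint may be negative, which is convenient when the rate has to be known at times before 0.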
Whenever supply fluid meets demand fluid, each is canceled out by an equal amount of the other.
The supply fluid always cancels out the demand fluid with the highest index s(j, i,WT), where
s(j, i,WT) is computed according to equation (1). According to the M+W index, demand fluid
of the same type will be ranked by age. Hence, FCFS applies in each buffer, and so demand and
supply fluid are virtually flowing into a buffer from two opposite directions. Figure 1 provides a
graphical illustration of the fluid model.
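The informal description above can be mimicked by a simple time-stepped sketch in Python. This is not the construction developed in this paper; it is a crude discretization under assumptions made only for illustration: constant arrival rates, exponential abandonment with rate `theta[i]`, empty buffers at time 0, and servers processed one after another within each time step. All names and numbers are hypothetical.

```python
import math
from collections import deque

def simulate_fluid(lam, mu, theta, L, g, horizon, dt=0.01):
    """Time-stepped sketch of the M+W fluid dynamics.

    lam[i], mu[j]: constant demand and supply rates (assumption).
    theta[i]: exponential abandonment rate of demand type i (assumption).
    L[j][i], g[i]: matching scores and waiting score functions.
    Returns (W, Q): HOL ages and buffer contents at time `horizon`.
    """
    n_queues, n_servers = len(lam), len(mu)
    # Each buffer stores cohorts [arrival_time, remaining_mass], oldest first.
    buffers = [deque() for _ in range(n_queues)]
    steps = int(round(horizon / dt))
    for n in range(steps):
        t = n * dt
        for i in range(n_queues):
            survive = math.exp(-theta[i] * dt)  # fraction not reneging during dt
            for cohort in buffers[i]:
                cohort[1] *= survive
            buffers[i].append([t, lam[i] * dt])  # cohort t flows into buffer i
        now = t + dt
        for j in range(n_servers):
            supply = mu[j] * dt
            while supply > 1e-15:
                # Find the nonempty buffer whose head has the highest M+W score.
                best_i, best_score = None, -math.inf
                for i in range(n_queues):
                    if buffers[i]:
                        age = now - buffers[i][0][0]
                        score = L[j][i] + g[i](age)
                        if score > best_score:
                            best_i, best_score = i, score
                if best_i is None:
                    break  # all buffers empty: surplus supply is dropped here
                head = buffers[best_i][0]
                served = min(supply, head[1])  # supply cancels equal demand
                head[1] -= served
                supply -= served
                if head[1] <= 1e-15:
                    buffers[best_i].popleft()  # oldest cohort is used up
    W = [horizon - b[0][0] if b else 0.0 for b in buffers]
    Q = [sum(c[1] for c in b) for b in buffers]
    return W, Q

# Hypothetical single-queue, single-server instance with lam = 2, mu = 1, theta = 1.
W_end, Q_end = simulate_fluid([2.0], [1.0], [1.0], [[0.0]], [lambda w: w],
                              horizon=10.0)
```

For this single-queue instance, balancing inflow, service, and reneging in steady state suggests a HOL age near ln 2 and a buffer content near 1, so the sketch should return values close to these, up to discretization error. Because the oldest cohort is always consumed first, FCFS holds within each buffer by construction.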
We next give a formal mathematical description of the fluid model for M+W-BQS discussed
informally above.
Model Inputs: Index sets $\mathcal{I}$ and $\mathcal{J}$, piecewise continuous arrival rates $(\lambda_i(t))_{i\in\mathcal{I}}$ and
$(\mu_j(t))_{j\in\mathcal{J}}$, continuous cumulative distribution functions (cdf) of abandonment time $(F_i(\cdot))_{i\in\mathcal{I}}$,
matching scores $(L(j, i))_{j\in\mathcal{J}, i\in\mathcal{I}}$, and waiting score functions $(g_i(\cdot))_{i\in\mathcal{I}}$.
State Variables: Head-of-line (HOL) waiting time in each buffer $W(t) := (W_i(t))_{i\in\mathcal{I}}$,
amount of demand fluid in each buffer $Q(t) := (Q_i(t))_{i\in\mathcal{I}}$, routing rates (or service rates) $r(t) :=
(r_{ji}(t))_{j\in\mathcal{J}, i\in\mathcal{I}}$, and scores $s(t) := (s_{ji}(t))_{j\in\mathcal{J}, i\in\mathcal{I}}$. Note that t is a continuous time index.
Assumptions and Comments on Model Inputs and State Variables: Suppose a unit
of demand fluid arrives at buffer i at time 0 and has not been served by time t. We assume
that the cumulative proportion of demand fluid that has reneged up to time t is given by the
cdf $F_i(t)$. We require $F_i(t)$ to be real analytic on $[0,+\infty)$ (see footnote 1), and its pdf $f_i(\cdot)$ to stay positive over
$[0, \overline{W}_i]$ with $\overline{W}_i \in (0,+\infty]$ (we allow $\overline{W}_i = +\infty$ to represent an infinite support of $f_i(\cdot)$). We use
$F^C_i(t) := 1 - F_i(t)$ to denote the complementary cdf associated with $F_i(\cdot)$.
Figure 1 The Fluid Approximation of M+W-BQS.
The HOL waiting time refers to the age of the fluid at the head of a queue. We use Wi(t) and Qi(t) to
denote the HOL waiting time and length of queue i at time t, respectively, and use rji(t) to denote
the rate of supply fluid that is routed to buffer i from server j at time t. At any time t, the demand
fluid at the head of each queue i is assigned an M+W score sji(t) = L(j, i) + gi(Wi(t)) with respect
to each server j. We call sji(t) the score of queue i w.r.t. server j. The waiting score function gi(·)
is assumed to have the following properties: (1) real analytic on [0,+∞); (2) gi(0) = 0; and (3)
strictly increasing.
2.1. Definition of a Fluid Process
Definition 1 A fluid process in an M+W-BQS is a deterministic process {(W (t),Q(t), r(t)) | t≥
0} with W (t) and Q(t) continuous, r(t) piecewise continuous, which satisfies the following con-
straints:
1. Budget Constraint:
\[ \sum_{i\in\mathcal{I}} r_{ji}(t) = \mu_j(t), \quad \forall\, j\in\mathcal{J}. \tag{4} \]
2. Flow Dynamic Equation: For all i ∈ I, the derivative Q′i(t) exists almost everywhere (a.e.)
with
\[ Q_i'(t) = \lambda_i(t) - \sum_{j\in\mathcal{J}} r_{ji}(t) - \int_{t-W_i(t)}^{t} \lambda_i(s)\, f_i(t-s)\,ds. \tag{5} \]
In (5), $\lambda_i(t)$ and $\sum_{j\in\mathcal{J}} r_{ji}(t)$ represent the total rates of demand and supply fluid that flow
into buffer i, $f_i(t-s)$ gives the reneging rate for cohort s at time t, and $\int_{t-W_i(t)}^{t} \lambda_i(s) f_i(t-s)\,ds$
counts the total reneging rate for queue i at time t. Note that t − Wi(t) could take negative
values, in which case we need to know λi(s) at times s < 0 in order to calculate Q′i(t).
3. One-to-One Correspondence between Qi(t) and Wi(t):
\[ Q_i(t) = \int_{t-W_i(t)}^{t} \lambda_i(s)\, F_i^C(t-s)\,ds, \tag{6} \]
where $\lambda_i(s) F_i^C(t-s)$ gives the amount of fluid in cohort s that remains in the buffer without
having reneged.
4. Highest-Indexed-Served-First: Let A(t, j) denote the set of buffers which may receive supply
fluid of type j at time t. Often A(t, j) is referred to as the active set, which is the set of queues
with the highest scores for supply fluid of type j at time t, that is,
A(t, j) = argmax {sji(t) | i∈ I}. (7)
However, if all queues in argmax {sji(t) | i ∈ I} are empty at time t, then the active set has
to be expanded to allow the surplus of supply fluid of type j to flow into other buffers. We
do not consider this special case in our construction of the fluid process in order to keep
the discussion simpler (this special case can be handled by an easy reduction step as given
in Appendix EC.1). After identifying A(t, j), our priority rule of serving the highest indexed
fluid requires rji(t) > 0 only in periods when queue i ∈ A(t, j), which leads to the following
complementary slackness condition:
rji(t)> 0 ⇒ i ∈A(t, j) = argmax {sji(t) | i∈ I}. (8)
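The score definition and the active set in (7) can be illustrated with a few lines of Python. This is only a sketch: the function names `scores` and `active_sets` and the tolerance `tol` are assumptions made for illustration, and the numbers are the matching scores and waiting score function that Example 1 uses later in the paper.

```python
def scores(L, g, W):
    """Return s[j][i] = L[j][i] + g[i](W[i]) for every server j and queue i."""
    return {j: {i: L[j][i] + g[i](W[i]) for i in W} for j in L}

def active_sets(s, tol=1e-12):
    """Return A[j], the set of queues attaining the maximum score for server j, as in (7).

    The tolerance tol is an illustrative choice for floating-point ties.
    """
    A = {}
    for j, row in s.items():
        best = max(row.values())
        A[j] = {i for i, v in row.items() if v >= best - tol}
    return A

# Data of Example 1: matching scores L(j, i) and g_a(w) = g_b(w) = 2w.
L = {1: {"a": 0.0, "b": -100.0}, 2: {"a": 0.0, "b": -1.0}}
g = {"a": lambda w: 2.0 * w, "b": lambda w: 2.0 * w}

# At time 0 both HOL waiting times are zero, so both servers favor queue a.
A0 = active_sets(scores(L, g, {"a": 0.0, "b": 0.0}))

# A hypothetical state in which queue b has caught up with queue a for server 2 only.
A_tie = active_sets(scores(L, g, {"a": 0.0, "b": 0.5}))
```

The sketch ignores the special case of empty queues in the argmax set, which the paper treats separately in Appendix EC.1.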
In our above definition of fluid process, we assumed that the queue length process Qi(t) is contin-
uous, and its derivative exists a.e., which implies the same property for Wi(t). This assumption is
without loss of generality for the following reasons. Process Qi(t) can be alternatively expressed as
a difference of integrals that counts the cumulative arrivals and departures for buffer i, which shows
that Qi(t) is continuous everywhere and differentiable a.e. We thus simply characterize the fluid
process using Q′i(t), by noting that the integral expression for Qi(t) would require us to introduce
a significant amount of notation, but is equivalent to the characterization using Q′i(t).
The assumption of piecewise continuity of r(t), by contrast, is restrictive: in theory there
can be reasonable fluid approximations to the stochastic system in which the service rates r(t) do
not have such nice properties. However, we anticipate that most service disciplines used in practice
have piecewise continuous service rates rji(t). Therefore our assumption reflects a reasonable
compromise that avoids the technical difficulty of analyzing non-piecewise continuous rji(t) without
compromising the practical relevance of our model. The assumption of piecewise continuous service
rates has been used in the queueing literature on time-varying arrival rates, for example,
Liu and Whitt (2012a).
2.2. A Compact Characterization of the Fluid Process
We next propose a compact characterization of the fluid process which only uses state variables
W (t) and r(t). With W (t) and r(t), we can use (6) to recover Q(t).
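As a hedged illustration of how Q(t) can be recovered from W(t) through (6), the sketch below evaluates the integral with the trapezoidal rule. The function name `queue_length`, the grid size `n`, and the numerical values (constant arrival rate 10, exponential abandonment with rate 0.1) are assumptions chosen for illustration; for this particular choice the integral has the closed form λ(1 − exp(−0.1 W))/0.1, which serves as a check.

```python
import math

def queue_length(t, W, lam, Fc, n=2000):
    """Approximate Q_i(t) in (6) by the trapezoidal rule on [t - W, t].

    lam : arrival-rate function s -> lambda_i(s)
    Fc  : complementary cdf x -> F_i^C(x) of the abandonment time
    """
    if W <= 0.0:
        return 0.0
    h = W / n
    total = 0.0
    for k in range(n + 1):
        s = t - W + k * h
        weight = 0.5 if k in (0, n) else 1.0   # trapezoid end-point weights
        total += weight * lam(s) * Fc(t - s)
    return total * h

rate = 0.1                                   # assumed abandonment rate
lam = lambda s: 10.0                         # assumed constant arrival rate
Fc = lambda x: math.exp(-rate * x)           # exponential complementary cdf

Q = queue_length(t=5.0, W=3.0, lam=lam, Fc=Fc)
Q_exact = 10.0 * (1.0 - math.exp(-rate * 3.0)) / rate
```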
We first look at the constraints that W(t) has to satisfy. Since Qi(t) is differentiable a.e., equation
(6) implies that W′i(t) exists a.e. and satisfies the equation
\[ Q_i'(t) = \lambda_i(t) - \lambda_i(t-W_i(t))\, F_i^C(W_i(t))\,\bigl(1 - W_i'(t)\bigr) - \int_{t-W_i(t)}^{t} \lambda_i(s)\, f_i(t-s)\,ds. \tag{9} \]
The two equations for Q′i(t), (9) and (5), then imply the following constraint on W′i(t):
\[ W_i'(t) = \Bigl(\lambda_i(t-W_i(t))\, F_i^C(W_i(t))\Bigr)^{-1} \Bigl(\lambda_i(t-W_i(t))\, F_i^C(W_i(t)) - \sum_{j\in\mathcal{J}} r_{ji}(t)\Bigr). \tag{10} \]
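The step from (6) to (9) is an application of the Leibniz integral rule. The following sketch assumes that $F_i(0) = 0$, so that $F_i^C(0) = 1$, and that $\lambda_i$ is continuous at $t$ and at $t - W_i(t)$; these assumptions are made here for the illustration only.

```latex
\begin{align*}
Q_i'(t)
  &= \frac{d}{dt}\int_{t-W_i(t)}^{t} \lambda_i(s)\,F_i^C(t-s)\,ds \\
  &= \lambda_i(t)\,F_i^C(0)
     - \lambda_i(t-W_i(t))\,F_i^C(W_i(t))\,\bigl(1-W_i'(t)\bigr)
     + \int_{t-W_i(t)}^{t} \lambda_i(s)\,\frac{\partial}{\partial t}F_i^C(t-s)\,ds \\
  &= \lambda_i(t)
     - \lambda_i(t-W_i(t))\,F_i^C(W_i(t))\,\bigl(1-W_i'(t)\bigr)
     - \int_{t-W_i(t)}^{t} \lambda_i(s)\,f_i(t-s)\,ds .
\end{align*}
```

Equating the last line with the right-hand side of the flow dynamic equation (5) cancels $\lambda_i(t)$ and the reneging integral, leaving $\sum_{j} r_{ji}(t) = \lambda_i(t-W_i(t))\,F_i^C(W_i(t))\,(1-W_i'(t))$, which rearranges to the expression for $W_i'(t)$.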
The scores sji(t) depend on t only through gi(Wi(t)), so the derivative s′ji(t) is independent of
j. Thus we can use θi(t) to denote the score change rate of queue i with respect to all servers, and
\[
\begin{aligned}
\theta_i(t) &= g_i'(W_i(t))\, W_i'(t) \\
&= g_i'(W_i(t))\, \Bigl(\lambda_i(t-W_i(t))\, F_i^C(W_i(t))\Bigr)^{-1} \Bigl(\lambda_i(t-W_i(t))\, F_i^C(W_i(t)) - \sum_{j\in\mathcal{J}} r_{ji}(t)\Bigr) \\
&=: \vartheta_{W_i}\Bigl(\sum_{j\in\mathcal{J}} r_{ji}(t)\Bigr),
\end{aligned}
\tag{11}
\]
where the function $\vartheta_{W_i}(z)$ defined above returns the score change rate of queue i when its HOL
waiting time is Wi and it has an aggregate service rate z.
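For a fixed state, the map from aggregate service rate to score change rate in (11) is affine, so both it and its inverse are one-liners. The sketch below is illustrative only: the names `theta_rate` and `theta_rate_inverse` and the numerical inputs are assumptions, with `lam_delayed` standing for λi(t − Wi(t)), `Fc_W` for the complementary cdf evaluated at Wi(t), and `g_prime_W` for g′i(Wi(t)).

```python
import math

def theta_rate(z, lam_delayed, Fc_W, g_prime_W):
    """Score change rate from (11) for aggregate service rate z."""
    inflow = lam_delayed * Fc_W          # rate at which HOL fluid becomes available
    return g_prime_W * (inflow - z) / inflow

def theta_rate_inverse(theta, lam_delayed, Fc_W, g_prime_W):
    """Aggregate service rate z that produces the score change rate theta."""
    return lam_delayed * Fc_W * (1.0 - theta / g_prime_W)

# Hypothetical state: arrival rate 10, exponential abandonment with rate 0.1,
# HOL waiting time 2, and g(w) = 2w so that g'(w) = 2.
lam_delayed, Fc_W, g_prime_W = 10.0, math.exp(-0.1 * 2.0), 2.0

theta = theta_rate(0.5, lam_delayed, Fc_W, g_prime_W)
z_back = theta_rate_inverse(theta, lam_delayed, Fc_W, g_prime_W)
```

With zero service the rate equals g′i(Wi(t)), and it decreases linearly to zero as the service rate approaches the available inflow.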
By considering equations (9), (10), and (11), piecewise continuity of r(t) implies that $Q_i'(t)$, $W_i'(t)$,
and $\theta_i'(t)$ have left and right limits everywhere, and have only a finite number of discontinuity
points in any finite interval. Note that $Q_i'(t)$, $W_i'(t)$, and $\theta_i'(t)$ might not exist at such discontinuity
points, but we can safely assign their function values to be their right limits at those discontinuity
points without changing the processes Qi(t) and Wi(t). With this clarification, henceforth, we assume
that $Q_i'(t)$, $W_i'(t)$, and $\theta_i'(t)$ exist everywhere and are piecewise continuous in t.
Next we discuss the constraints that the service rates r(t) have to satisfy.
(a) The budget constraint (4).
(b) Supply fluid flows from j to i only if i ∈ argmax sji(t), that is, (8).
(c) If server j is serving multiple queues at t, then the flow rates rji(t) have to be split such that
all queues being served by server j have the same score change rate at t, that is,
\[ r_{ji}(t) > 0 \;\Rightarrow\; \theta_i(t) = \max_{k\in A(t,j)} \theta_k(t). \tag{12} \]
Constraints (a) and (b) follow directly from the constraints for the fluid process. Constraint
(c) follows from constraint (b) and piecewise continuity of r(t). If constraint (c) were violated,
then without loss of generality we could assume that the score of some queue i ∈ argmax sji(t) with
rji(t) > 0 increases more slowly than the scores of the other queues in argmax sji(t). As a result, queue i can no longer maintain its
active status with respect to server j, and the service rate rji jumps to zero right after t. This
in turn implies that rji(t) is not right-continuous at t, contradicting the fact that rji is piecewise
continuous (our definition of piecewise continuity requires right-continuity). Thus constraint (c)
has to be satisfied.
We next propose the following definitions.
Definition 2 We refer to r(t) := (rji(t))j∈J,i∈I as feasible routing rates or a feasible routing-rate
vector with respect to W(t) if r(t) and the associated $\theta_i(t) = \vartheta_{W_i}\bigl(\sum_{j\in\mathcal{J}} r_{ji}(t)\bigr)$ (i ∈ I) satisfy
conditions (4), (8), and (12).
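The three conditions in Definition 2 can be checked mechanically at a single time instant. The following sketch is one possible way to do so; the function name `is_feasible_routing`, the tolerance, and the numbers are hypothetical, and the score change rates `theta` are supplied by the caller (in a full implementation they would be computed from the routing rates through (11)).

```python
def is_feasible_routing(r, mu, active, theta, tol=1e-9):
    """Check (4), (8) and (12) for routing rates r[j][i] at one time instant.

    mu[j]     : supply rate of server j
    active[j] : active set A(t, j)
    theta[i]  : score change rate of queue i (assumed consistent with r via (11))
    """
    for j, row in r.items():
        if any(x < -tol for x in row.values()):
            return False                                   # rates must be nonnegative
        if abs(sum(row.values()) - mu[j]) > tol:
            return False                                   # budget constraint (4)
        served = [i for i, x in row.items() if x > tol]
        if any(i not in active[j] for i in served):
            return False                                   # condition (8)
        top = max(theta[k] for k in active[j])
        if any(abs(theta[i] - top) > tol for i in served):
            return False                                   # condition (12)
    return True

# Hypothetical instant: server 1 serves only queue a, server 2 is tied between a and b.
mu = {1: 0.5, 2: 0.5}
active = {1: {"a"}, 2: {"a", "b"}}
r_ok = {1: {"a": 0.5, "b": 0.0}, 2: {"a": 0.2, "b": 0.3}}
r_bad = {1: {"a": 0.4, "b": 0.1}, 2: {"a": 0.2, "b": 0.3}}   # server 1 feeds inactive b

ok = is_feasible_routing(r_ok, mu, active, {"a": 0.4, "b": 0.4})
unequal = is_feasible_routing(r_ok, mu, active, {"a": 0.1, "b": 0.4})
inactive = is_feasible_routing(r_bad, mu, active, {"a": 0.4, "b": 0.4})
```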
Since both r(t) and Q(t) can be constructed from W (t), it suffices to use the HOL waiting-time
trajectory {W (t) | t≥ 0} to describe the fluid process.
Definition 3 An HOL waiting-time vector process {W(t) | t ∈ [t0, t1]} is said to be a fluid process in
an M+W-BQS if there exist a queue-length vector process {Q(t) | t ∈ [t0, t1]} and a service process {r(t) |
t ∈ [t0, t1]} such that {(W(t), Q(t), r(t)) | t ∈ [t0, t1]} satisfies the constraints given in Definition 1.
We can now give compact criteria for verifying an M+W-BQS fluid process, which replace the multiple
equations in Definition 1. Our construction of the fluid process starting from state variables W(t)
and r(t) works because W(t) and r(t) provide sufficient information to recover the other
state variables of the M+W-BQS fluid process.
Lemma 1 An a.e. differentiable HOL waiting-time vector process {W (t) | t ∈ [t0, t1]} is a fluid
process in an M+W-BQS if and only if there exists a piecewise continuous {r(t) | t ∈ [t0, t1]} such
that r(t) are the feasible routing rates with respect to W (t) for all t ∈ [t0, t1], and they satisfy
equation (10).
Proof.
“only if”: At the beginning of this section, we argued that rji(t) (j ∈ J , i ∈ I) must satisfy
conditions (4), (8), and (12) in order to satisfy the constraints in Definition 1.
“if”: We recover Q(t) using (6). By taking the derivative of (6), we obtain equation (9). Plugging
(10) into equation (9) then leads to (5). Conditions (4) and (8) follow from r(t) being feasible
routing rates with respect to W (t).
3. Transient Behavior of the Fluid Process
We start with a simple example to provide some intuition about the transient behavior of the fluid
process.
Example 1 Consider a system with two servers, J = {1, 2}, and customers of two types, I =
{a, b}, with λa = 1, λb = 10, µ1 = µ2 = 0.5, and exponentially distributed abandonment times Fa(τ) =
Fb(τ) = 1 − exp(−0.1τ). We assume Wa(0) = Wb(0) = 0 and L(1, a) = 0, L(1, b) = −100, L(2, a) = 0,
and L(2, b) = −1. We assume that ga(WT) = gb(WT) = 2WT. According to this M+W index, at
time 0, we have s1a(0) = 0 > s1b(0) = −100 and s2a(0) = 0 > s2b(0) = −1, so that both servers only
serve queue a. Since the queues are overloaded, both Wa(t) and Wb(t) and all scores increase. Since
queue b does not receive any service, Wb(t) increases faster than Wa(t), and so s1b(t) and s2b(t)
increase faster than s1a(t) and s2a(t). Suppose at time t1, score s2b(t1) = 2Wb(t1) − 1 catches up with
s2a(t1) = 2Wa(t1), but that s1a(t1) = 2Wa(t1) is still bigger than s1b(t1) = 2Wb(t1) − 100. As a result,
after t1 server 2 will serve both b and a, and server 1 continues to serve only queue a. Because
both queues share server 2, their scores must remain identical after t1, i.e., s2b(t) ≡ s2a(t) for t in
some interval after t1.
Because queue b has a much larger arrival rate than queue a, eventually the tie between the
two queues for server 2 will break. That is, after some time t2 > t1, the score of queue b will
increase faster than that of queue a and s2b(t) > s2a(t) for t > t2, so server 2 can no longer serve
both queues. Instead, after t2 servers 1 and 2 will be dedicated to queues a and b, respectively. In
Figure 2 we depict the changes by showing how the routing components (the connected components
of the graph showing which servers are serving which queues) change over time. Note that it is
possible that there are further changes to the routing structure after t2 until a steady state is reached.
To provide some intuition for how the fluid process of Section 2 approximates the original
stochastic process, consider the original stochastic process in Example 1. At t1 the difference
between the scores of queues a and b is small, and so the top of the waitlist is a mixture of
customers from both queues. As a result, server 2 has to frequently switch back and forth between
the two queues. This situation also implies that the average change rates of the highest score in
the two queues must stay equal, otherwise one queue will soon dominate the other in the rankings
and prevent the other queue from getting served. However, this configuration will last only until
time t2, after which server 2 has to stop serving customers in queue a. Throughout the process,
the routing rates are not controlled by the service provider, but endogenously determined by the
highest-score-first-served rule.
(a) For t ∈ [0, t1) we have s1a(t) > s1b(t) and s2a(t) > s2b(t), and so both servers serve queue a. The routing components are {1, 2, a} and {b}.
(b) For t ∈ [t1, t2) we have s2a(t) = s2b(t) and s1a(t) > s1b(t), and so both queues share server 2 and keep their scores for server 2 tied, and queue a is also served by server 1. The routing component is {1, 2, a, b}.
(c) For t ∈ [t2, t3) (for some t3, possibly t3 = ∞) we have s2a(t) < s2b(t) and s1a(t) > s1b(t), and so queue a is served by server 1, and queue b is served by server 2. The routing components are {1, a} and {2, b}.
Figure 2 How the routing components of Example 1 change over time.
Each of the three periods of Example 1 has a different routing structure. As illustrated in Figure 2
this can be represented by a bipartite graph with arcs showing which server(s) are serving which
queue(s) (referred to as basic activity in some literature, e.g., (Stolyar and Tezcan 2011)). The
connected components of this graph partition servers and queues into routing components. A more
rigorous definition of routing components is given later in Definition 5. In Example 1, t1 and t2 are
called switch times, at which the routing components change.
From the above example, we see that identifying the routing components is a key step in
characterizing the transient behavior of the fluid process. Once the routing components are determined,
we show below that the trajectories of Wi(t) and Qi(t) can be characterized as solutions to an
ordinary differential equation (ODE) subject to boundary conditions. The boundary conditions
specify at which switch times the routing components change, and consequently how the ODEs
have to be reformulated.
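To make the ODE view concrete, the sketch below integrates (10) with a forward-Euler scheme for the simplest possible routing component: one queue served by one dedicated server at a constant rate. It is an illustration under stated assumptions, not the construction used in the paper: the arrival rate is constant, abandonment is exponential, no switch time occurs in the horizon, and the function name `integrate_hol`, the step size, and the numbers (arrival rate 10, service rate 0.5, abandonment rate 0.1) are chosen for illustration.

```python
import math

def integrate_hol(W0, lam, mu, rate, T, dt=0.01):
    """Forward-Euler integration of (10) for one queue served at constant rate mu.

    With constant arrival rate lam and complementary cdf exp(-rate * w),
    equation (10) reads W'(t) = 1 - mu / (lam * exp(-rate * W(t))).
    """
    W = W0
    for _ in range(int(round(T / dt))):
        W += dt * (1.0 - mu / (lam * math.exp(-rate * W)))
        W = max(W, 0.0)          # the HOL waiting time cannot become negative
    return W

W_end = integrate_hol(W0=0.0, lam=10.0, mu=0.5, rate=0.1, T=200.0)

# Stationary point of the ODE: lam * exp(-rate * W) = mu.
W_star = math.log(10.0 / 0.5) / 0.1
```

In this setting the trajectory increases monotonically toward the stationary point, where the surviving inflow at the head of the line exactly matches the service rate.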
3.1. Construction of the Transient Solutions
We focus on constructing the fluid process described by the trajectory {W (t) | t≥ 0}. We identify
the rare condition under which the fluid process can have non-unique trajectories of {W (t) | t≥
0}, and in the absence of this condition show that the construction process guarantees that the
fluid process has a unique trajectory. After obtaining W (t), Q(t) can be uniquely computed from
W (t) using (6). We will show that the service rates r(t) might have multiple solutions for a given
trajectory W (t), but all the feasible solutions r(t) must lead to the same trajectories of W (t) and
Q(t). In later sections, when we refer to the unique fluid process, it always refers to the unique
state description using W (t) or Q(t).
The construction can compute the transient behavior of the fluid process over any finite horizon
[0, T ]. The choice of T depends on the time window of interest to the policy designer. For example,
the time window for developing a kidney allocation policy may be the next several decades, whereas
for the allocation of public housing it may only be the next several years.
Given W (0), we construct {W (t) | t∈ [0, T ]} via the following steps:
Step 0 Inductively suppose that we have already computed W(t) up to t = t0. Given W(t0), for
each server j ∈ J, construct the active set A(t0, j) according to (7) (or (EC.1) in the special case
when all queues in argmax {sji(t0) | i ∈ I} are empty). Check whether the following condition
holds:
\[ \exists\, j_1 \neq j_2 \in \mathcal{J},\ i \in A(t_0, j_1)\cap A(t_0, j_2), \text{ such that } W_i(t_0) = 0 \text{ and } \lambda_i(t_0) < \mu_{j_1}(t_0) + \mu_{j_2}(t_0), \tag{13} \]
where i ∈ A(t0, j1) ∩ A(t0, j2) means that the empty queue i is in the active sets of both servers j1
and j2. If (13) holds, Appendix EC.2 shows that the fluid process might admit more than one
trajectory of W(t), and so we terminate the construction process (because the construction
does not serve the purpose of being predictive when the trajectories are not unique). Otherwise,
proceed to Step 1.
Step 1 Given W(t0), we define and/or construct the following combinatorial structures: a routing
graph G(t0), an associated parameterized network Gθ(t0), a partition of the vertices of
G(t0) into K different primitive components (Gk(t0))k=1,...,K, feasible routing rates r(t0) :=
(rji(t0))j∈J,i∈I, and score-change rates θ(t0) := (θk)k=1,...,K. We prove structural properties
of the primitive components and feasible routing rates.
Step 2 If all score-change rates θk(t0)’s are distinct (which we call the non-degenerate case), then the
routing components in the next period starting from t0 are exactly the primitive components,
and we can skip this step and go to Step 3; otherwise, we call it the degenerate case, and we
compute the routing components (Gk(t0)) by merging certain primitive components.
Step 3 For each routing component that we constructed in Steps 1–2, we formulate an ODE and
identify the boundary conditions at which the routing components change. The solution to
the ODE characterizes the trajectories {W (t) | t ∈ [t0, t1]}, where t1 denotes the next switch
time when the solution hits a boundary. We update t0← t1 and if t1 <T we go back to Step 0.
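The check in Step 0 is purely combinatorial once the active sets are known. A minimal sketch, assuming the active sets, HOL waiting times, and rates are stored in dictionaries (the function name `nonunique_condition` and the tolerance are hypothetical):

```python
from itertools import combinations

def nonunique_condition(active, W, lam, mu, tol=1e-12):
    """Return True if condition (13) holds at the current state.

    active[j] : active set of server j
    W[i]      : HOL waiting time of queue i
    lam[i]    : current arrival rate of queue i
    mu[j]     : current supply rate of server j
    """
    for j1, j2 in combinations(sorted(active), 2):
        for i in active[j1] & active[j2]:
            # An empty queue shared by two servers whose joint supply exceeds its arrivals.
            if W[i] <= tol and lam[i] < mu[j1] + mu[j2]:
                return True
    return False

# State of Example 1 at time 0: both servers have active set {a}.
active = {1: {"a"}, 2: {"a"}}
W = {"a": 0.0, "b": 0.0}
mu = {1: 0.5, 2: 0.5}

holds = nonunique_condition(active, W, {"a": 1.0, "b": 10.0}, mu)
# A hypothetical variant with a slightly smaller arrival rate for queue a.
holds_variant = nonunique_condition(active, W, {"a": 0.9, "b": 10.0}, mu)
```

For the data of Example 1 the inequality in (13) is not strict (the arrival rate of queue a equals the joint supply rate), so the condition fails and the construction proceeds to Step 1.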
Next, we provide more details on each step.
Step 0: Appendix EC.2 shows that the fluid process may have more than one trajectory
of W(t) when condition (13) is satisfied, which results in ambiguity when one tries to use the fluid
model to predict the behavior of the waitlist system. However, having empty queues is unusual in
scenarios with overloaded queues, so we expect condition (13) to hold only in a few unusual cases.
Thus we do not expect this condition to be a significant impediment to applying the fluid model for
predictive analytics. The next proposition provides an easy-to-verify condition which guarantees
that condition (13) fails, so that we do not have to worry about Algorithm 1 terminating at Step 0 without
successfully constructing a unique fluid process.
Proposition 1 Suppose that λi(t) ≡ λi and µj(t)≡ µj for all i ∈ I and j ∈ J . If condition (13)
fails at state 0 (0 denotes an all-zero vector of dimension I), then condition (13) can never be
satisfied at any HOL waiting-time vector W ≥ 0.
Proof. Proof by contradiction. Suppose condition (13) holds at some W ≥ 0. Then at W, an
empty queue, say queue ℓ, must be active for at least two different servers j1, j2, and λℓ < µj1 + µj2.
Note that Wℓ = 0 because ℓ is an empty buffer, whereas Wi ≥ 0 for all i ≠ ℓ. Therefore, queue ℓ
can only be more competitive at state 0 than at state W. So queue ℓ has to remain active
for servers j1 and j2 at 0, with λℓ < µj1 + µj2, which contradicts the assumption that condition (13)
fails at 0.
From now on, we assume that condition (13) is never satisfied. We will show that then the
construction always returns a unique fluid process {W (t) | t≥ 0}.
Step 1: We next introduce a parameterized network flow framework (see Ahuja et al. (1993)
for background on network flows) which allows us to compute feasible routing rates r(t), score
change rates θ(t), and routing components for a given W (t). (Some of these tools were used in
the concurrent paper Adan and Weiss (2014) to analyze an FCFS-BQS, but the network we deal
with is more complicated due to the M+W indexing.) The components we compute using this
framework, which we call primitive components, may need to be merged in Step 2 to get the routing
components we are looking for.
Formally, given a fixed t and any W(t) ≥ 0, we define the routing graph G(t) := (V, E(t)), with
vertex set
V := {S} ∪ J ∪ I ∪ {T}, (14)
where S and T are artificial source and sink nodes, and directed arc set
E(W(t)) (or simply E(t)) := {(S, j) | j ∈ J} ∪ {(j, i) | j ∈ J, i ∈ A(t, j)} ∪ {(i, T) | i ∈ I}. (15)
We also define the following subset of E(t) by restricting to the arcs from J to I:
Eb(t) := {(j, i) ∈ E(t) | j ∈ J, i ∈ I}. (16)
Note that the arc sets E(t) and Eb(t) depend on the scores s(t), and thus on W(t). Therefore, when
we later consider a different value of t, both arc sets can change, which is not the case in an
FCFS-BQS, where the routing graph is invariant with time. Since
t is fixed, we often omit t and write G = (V, E) to simplify notation.
We define a sequence of (S,T)-networks (Gθ), whose capacities are parameterized by a real
number θ ∈ (−∞,+∞). To get an intuition for the meaning of θ, recall the definition of $\vartheta_{W_i}(z)$
from (11) for $z = \sum_{j\in\mathcal{J}} r_{ji}(t)$. We will see that we can interpret rji(t) as a flow on arc (j, i), and so
by flow conservation $z = \sum_{j\in\mathcal{J}} r_{ji}(t)$ is the flow on (i, T). Thus we can think of $\vartheta_{W_i}(z)$ as a function
whose input is a flow z on (i, T), and whose output is a score change rate. Then its inverse, $\vartheta^{-1}_{W_i}(\theta)$,
is a function whose input is a target score change rate θ, and whose output is a flow z, and so we
can think of $\vartheta^{-1}_{W_i}(\theta)$ as the capacity of (i, T). With this in mind, for each θ we define the capacity
for the flow on each arc e ∈ E as
\[
u_e(\theta) =
\begin{cases}
\mu_j & \text{when } e = (S, j); \\
\infty & \text{when } e = (j, i) \in E_b; \\
\vartheta^{-1}_{W_i}(\theta) & \text{when } e = (i, T).
\end{cases}
\tag{17}
\]
Since $\vartheta_{W_i}(z)$ is an affine, strictly decreasing function of z, $\vartheta^{-1}_{W_i}(\cdot)$ is also affine and strictly decreasing.
It can happen that $\vartheta^{-1}_{W_i}(\theta)$ is negative, reflecting that even with zero service rate, the score change
rate of queue i is smaller than the target value θ. This would lead to a negative capacity on (i, T),
which seems strange, but which occurs in other applications of parametric network flow. If one
prefers to avoid this possibly negative capacity on arc (i, T), a standard trick is to introduce a
new arc (S, i) with capacity $(\vartheta^{-1}_{W_i}(\theta))^-$ (the negative part of the capacity on (i, T) in the original
graph), and assign the positive part $(\vartheta^{-1}_{W_i}(\theta))^+$ as the capacity of (i, T). Our subsequent analysis
works for either formulation. The construction of Gθ is illustrated in Figure 3.
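For a single fixed value of θ, the network Gθ is an ordinary max flow instance. The sketch below solves one such instance with the textbook Edmonds-Karp algorithm; it is not the parametric algorithm discussed in the paper, and the function name `max_flow`, the arc set (server 1 active for queue a, server 2 active for queues a and b), and the capacities 0.7 and 0.2 on the arcs (i, T) are hypothetical values standing in for the inverse score-change-rate function evaluated at some θ where it is nonnegative.

```python
from collections import deque

def max_flow(cap, s, t, eps=1e-12):
    """Edmonds-Karp max flow; cap[u][v] is the capacity of arc (u, v).

    Returns the flow value and the set of nodes reachable from s in the
    final residual graph (the source side of the minimal min cut).
    """
    res = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u, nbrs in cap.items():
        for v in nbrs:
            res.setdefault(v, {}).setdefault(u, 0.0)   # reverse residual arcs
    value = 0.0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:               # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in res[u].items():
                if c > eps and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)         # bottleneck capacity
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        value += push
    return value, set(parent)

INF = float("inf")
cap = {
    "S": {1: 0.5, 2: 0.5},            # arcs (S, j) with capacity mu_j
    1: {"a": INF},                    # arcs (j, i) for active queues
    2: {"a": INF, "b": INF},
    "a": {"T": 0.7},                  # hypothetical capacities on arcs (i, T)
    "b": {"T": 0.2},
}
value, source_side = max_flow(cap, "S", "T")
```

In this instance the arcs into T are the bottleneck, so the max flow value is the sum of their capacities and the minimal min cut contains every node except T.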
Figure 3 The max flow network (Gθ) with capacities parametric in θ. The parameterized upper bound ue(θ) is
labeled above each arc.
Recall that an (S,T) cut in this network is a partition (A,B) of V such that S ∈ A and T ∈ B,
and every min (S,T) cut has this form (all cuts discussed in this paper are (S,T) cuts, and so
we use “cut” to mean “(S,T) cut” from now on). We often refer to cut (A,B) as just A. It is
well-known that every max flow network has a unique minimal min cut $A^{\min}$ and maximal min
cut $A^{\max}$. The parametric capacities in Gθ are strictly decreasing in θ on arcs (i, T) and constant
elsewhere, which is a special case of the property called strict source-sink monotone (S-SSM) by
Granot et al. (2012). Property S-SSM implies that if A1 is a min cut for θ1, and A2 is a min cut
for θ2 > θ1, then $A_1^{\max} \subseteq A_2^{\min}$, i.e., that min cuts are nested.
When a max flow network has parametric capacities affine in θ that satisfy S-SSM, it is clear
that the graph of the max flow value as a function of θ is piecewise linear with O(|V|) breakpoints
θ1 < θ2 < · · · < θp. Denote the minimal (maximal) min cut at θ by $A^{\min}(\theta)$ ($A^{\max}(\theta)$). Then we have
that both $A^{\max}(\theta_k)$ and $A^{\min}(\theta_{k+1})$ (which could be the same) are min cuts for any θ ∈ [θk, θk+1],
which implies that $A^{\min}(\theta_k)$ is a strict subset of $A^{\max}(\theta_k)$ for all k. When S-SSM holds, Gallo
et al. (1989) gave an algorithm to compute all O(|V|) breakpoints and $A^{\min}(\theta_k)$. Since our Gθ has
parametric capacities affine in θ that satisfy S-SSM, we can use their algorithm to compute all
breakpoints θk and $A^{\min}(\theta_k)$, k = 1, . . . , p.
Our subsequent analysis also requires knowing all min cuts at each breakpoint θl. Given a max
flow X in a network, Picard and Queyranne (1982) showed how to compute a nested sequence of
min cuts $A^{\min}(\theta_l) = A^l_0 \subset A^l_1 \subset \cdots \subset A^l_k = A^{\max}(\theta_l)$ that represent all min cuts. We insert these
between $A^{\min}(\theta_l)$ and $A^{\min}(\theta_{l+1}) = A^{\max}(\theta_l)$ in the sequence of $A^{\min}(\theta_k)$ to get a grand nested
sequence Π of min cuts
\[ \Pi = \bigl\{(A_k, B_k) \;\big|\; \{S\} = A_0 \subsetneq A_1 \subsetneq \cdots \subsetneq A_K = V\setminus\{T\}\bigr\}. \tag{18} \]
The algorithm in Gallo et al. (1989) is often referred to as “GGT” after the authors’ initials, so
we call the overall process of computing Π the GGT Algorithm. Since all cuts in Π are nested, it
contains only O(|V |) = O(I + J) =K min cuts. We re-define θk to be the value of θ where both
Ak−1 and Ak are min cuts, and so there can be several consecutive θk with the same value if a
breakpoint has more than two min cuts, which is referred to as the degenerate case. Based on Π,
we can partition the vertex set I ∪J into K subsets: Gk := Ak\Ak−1, k = 1, . . . ,K. We call each
subset Gk a primitive component.
We first summarize some useful properties of max flows in these parametric networks that will
allow us to prove that routing rates come from max flows. Recall that for cut A, δ+(A) is the
subset of arcs with their tail but not head in A (“exiting A”), and δ−(A) is the subset of arcs with
their head but not tail in A (“entering A”).
Lemma 2 Let $X^k$ be a max flow in Gθ w.r.t. θ = θk, so that both $A_{k-1}$ and $A_k$ are min cuts w.r.t.
$X^k$. Then
1. $\delta^+(A_{k-1}) \cap E_b = \delta^+(A_k) \cap E_b = \emptyset$;
2. $X^k_e = 0$ for all $e \in \delta^-(A_{k-1}) \cap E_b$;
3. $X^k_{Sj} = \mu_j(t)$ for all $j \in G_k$;
4. $X^k_{iT} = u_{iT}(\theta_k) = \vartheta^{-1}_{W_i}(\theta_k)$ for all $i \in A_k$; and
5. $X^k$ restricted to arcs of the subnetwork {S} ∪ Gk ∪ {T} is again a max flow in the subnetwork,
and so
\[ \sum_{j\in G_k} \mu_j = \sum_{i\in G_k} \vartheta^{-1}_{W_i}(\theta_k). \tag{19} \]
Proof. 1. Follows because ue = ∞ for e ∈ Eb and no infinite-capacity arc can exit a min cut.
2. Follows because all arcs entering a min cut must have flow 0 in any max flow.
3. For j ∈ Gk = Ak\Ak−1 it follows because then (S, j) ∈ δ+(Ak−1) and so (S, j) must be saturated
by $X^k$. Note that this is true even for k = 1 since A0 = {S} (since θ0 = −∞).
4. Follows because all arcs exiting a min cut must be saturated.
5. By 1 and 2, $X^k$ puts no flow on any arc between the subnetwork and its complement. By 4,
all arcs exiting the subnetwork are saturated, and so $X^k$ must be a max flow.
Then (19) follows from 3 and flow conservation.
The properties of routing rates rji(t), score change rates θi(t), and their connections with the
breakpoints and primitive components are summarized in the next proposition. Figure 4 provides
an illustration of the construction of the primitive components and feasible routing rates.
(a) The parameterized network Gθ with (ue(θ)) on each arc e.
(b) When θ = −∞, the min cut is A0 = {S}. All arcs (S, j) are in δ+(A0). (Arcs (S, j) and (i, T) have Xe (ue(θ)) on them, and (j, i) has just Xe.)
(c) When θ increases to θ1 = −84, both A0 and A1 = {S, 2, a} are min cuts, and arc (a, T) is saturated.
(d) When θ increases to θ2 = −18, both A1 and A2 = {S, 1, 2, a, b} are min cuts, and both (a, T) and (b, T) are saturated.
(e) When θ increases to θ3 = −5, both A2 and A3 = {S, 1, 2, 3, a, b, c} are min cuts, and all (i, T) arcs are saturated.
(f) The GGT Algorithm computes the three primitive components indicated by dotted lines. The inter-component arc (3, b) goes from a faster component to a slower one, and carries zero flow.
Figure 4 Construction of primitive components on Gθ and feasible routing rates. Bold arcs are those saturated
by a max flow. The feasible routing flows are labeled above each arc.
Proposition 2 The nested cuts Π and the sequence of breakpoints θ1 ≤ . . .≤ θK computed by the
GGT algorithm satisfy the following properties:
(a) Queues in the same primitive component Gk =Ak\Ak−1 have the same score change rate, i.e.,
θi = θk for all i ∈ Gk. Consequently, a primitive component with a smaller (larger) index k
corresponds to an equal or smaller (larger) score increment rate θk. If the score change rate
is smaller (larger), the primitive component with a smaller (larger) index is referred to as a
slower (faster) component.
We can compute θk(t) as follows:
\[
\theta_k(t) = \Bigl(\sum_{i\in G_k} \frac{\lambda_i(t-W_i(t))\, F_i^C(W_i(t))}{g_i'(W_i(t))}\Bigr)^{-1}
\Bigl(\sum_{i\in G_k} \lambda_i(t-W_i(t))\, F_i^C(W_i(t)) - \sum_{j\in G_k} \mu_j(t)\Bigr)
=: \Psi^{G_k}_{W(t)},
\tag{20}
\]
where the function $\Psi^{G_k}_{W(t)}$ represents the score change rate for component Gk at state W(t)
when the supply fluid flows from outside into Gk at a net rate 0 (so queues in Gk are served
by its own servers at rate $\sum_{j\in G_k} \mu_j(t)$).
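(b) Given W = W(t), the set of feasible routing-rate vectors is the following polytope:
\[
\Gamma(W) := \left\{ r \ge 0 \;\middle|\;
\begin{array}{ll}
\sum_{\{i \mid (j,i)\in E\}} r_{ji} = \mu_j, & \forall\, j \in \mathcal{J} \\[2pt]
\sum_{\{j \mid (j,i)\in E\}} r_{ji} = \vartheta^{-1}_{W_i}(\theta_k), & \forall\, i \in G_k,\ k = 1, 2, \ldots, K \\[2pt]
r_{ji} = 0, & \forall\, j \in G_k,\ i \in G_{k'},\ k \neq k' \ \text{or}\ (j,i) \notin E
\end{array}
\right\}
\tag{21}
\]
Since (21) restricted to the subnetwork {S} ∪ Gk ∪ {T} w.r.t. θ = θk is just the constraints
describing a max flow $X^k = \{X^k_{Sj}, X^k_{ji}, X^k_{iT} \mid j, i \in G_k\}$ on the subnetwork, any feasible routing-rate
vector rji can be constructed via
\[
r_{ji} =
\begin{cases}
X^k_{ji} & \text{if } j, i \in G_k \text{ for some } k, \\
0 & \text{if } j, i \text{ are not in the same } G_k.
\end{cases}
\tag{22}
\]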
(c) There are no arcs going from a slower component to a faster one. Any arc from a faster
component to a slower component must carry zero flow. As a result, no supply fluid can be sent
between different primitive components.
(d) We call $H \subsetneq G_k$ a closed subset of Gk if H contains no outgoing arc to other parts of Gk. If
H is a closed subset of Gk, then
\[ \sum_{j\in H} \mu_j < \sum_{i\in H} \vartheta^{-1}_{W_i}(\theta_k). \tag{23} \]
(e) Each primitive component Gk is connected by Eb. However, a connected component of Eb could
be the union of multiple primitive components.
Remark 1 According to Properties (a)–(c) in Proposition 2, each primitive component forms an
independent subsystem, in which queues share the same score-increment rate and are served only
by servers in the same component. Moreover, the partition of primitive components and the score
change rates in each component can be explicitly computed using the GGT algorithm, and the
feasible routing rates can then be computed from a max flow for each Gk. Properties (d) and (e)
will also be used in subsequent analysis.
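Formula (20) for the common score change rate of a component is easy to evaluate once the component is known. The sketch below does so for a hypothetical component with two queues and two servers and then checks the balance (19), namely that the implied capacities of the arcs (i, T) add up to the total supply of the component. The function name `component_rate` and all numerical values are assumptions for illustration; `inflow[i]` stands for the arrival rate at time t − Wi(t) multiplied by the complementary cdf at Wi(t).

```python
def component_rate(inflow, g_prime, mu):
    """Common score change rate of one component, following (20).

    inflow[i]  : lambda_i(t - W_i) * F_i^C(W_i) for each queue i in the component
    g_prime[i] : g_i'(W_i) for each queue i in the component
    mu[j]      : supply rate of each server j in the component
    """
    weight = sum(inflow[i] / g_prime[i] for i in inflow)
    return (sum(inflow.values()) - sum(mu.values())) / weight

# Hypothetical component {1, 2, a, b}.
inflow = {"a": 0.9, "b": 6.0}
g_prime = {"a": 2.0, "b": 2.0}
mu = {1: 0.5, 2: 0.5}

theta_k = component_rate(inflow, g_prime, mu)

# Service rate each queue must receive to change its score at rate theta_k;
# by (19) these rates sum to the total supply of the component.
served = {i: inflow[i] * (1.0 - theta_k / g_prime[i]) for i in inflow}
```

Whether a given set of queues and servers actually forms a primitive component is decided by the nested min cuts; the formula only gives the rate once that partition is available.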
Proof. To show that rji constructed in (22) are feasible routing rates, it suffices to prove that
they satisfy conditions (4), (8), and (12).
First, since the rji’s are constructed from the flows $X^k$, they take positive values only for arcs
(j, i) ∈ Eb (or E). By the definition of E, a server is only connected to its active queues, so (j, i) ∈ E
if and only if i ∈ A(t, j), which verifies (8).
Now (4) follows from flow conservation at node j and Lemma 2 part 3, and (12) follows from
Lemma 2 part 4 since it implies that all queues being served by server j ∈Ak have the common
score change rate of θk. This proves Properties (a) and (b). Property (c) follows from Lemma 2
parts 1 and 2.
To prove Property (d), consider the max flow $X^k$ in Gθ w.r.t. parameter θ = θk. Since max flow
$X^k$ sends zero flow to other components, the RHS of inequality (23) exactly counts the total flow
out of subset H, whereas the LHS of inequality (23) is only part of the total flow into subset H,
and so
\[ \sum_{j\in H} \mu_j \le \sum_{i\in H} \vartheta^{-1}_{W_i}(\theta_k). \tag{24} \]
Notice that if (24) is not strict, then $A_{k-1} \cup H$ would satisfy complementary slackness w.r.t. $X^k$,
and so would also be a min cut. This contradicts the choice of $A_k$ as a minimal superset of $A_{k-1}$
that is a min cut.
For Property (e), if Gk is not a connected component, then define H as one of the connected
components of Gk. Then H has no outgoing arcs to other parts of Gk, and so by the same argument
as Property (d), Ak−1 ∪H would also be a min cut for Xk, again a contradiction.
We note that different primitive components might be connected, for example, arc (1, a) in Figure
4 (f). In most cases, such inter-component arcs are from a (strictly) faster component to a slower
component, but will disappear right away because ties cannot last between two queues with
different score change rates. In the degenerate case, an inter-component arc connects two primitive
components with the same θk. Then it is unclear whether this arc will remain there (so the two
primitive components get merged), or disappear and so disconnect the two components.
Step 2: Step 1 used the GGT algorithm to generate a sequence of nested cuts Ak and break-
points θk, k= 1, . . . ,K such that both Ak−1 and Ak are min cuts at θk. Based on the values of θk
we classify the routing graph at a given state W (t) as being degenerate or non-degenerate:
Definition 4 The network at time t given W (t) is non-degenerate if θ1 < θ2 < . . . < θK, and is
degenerate if θk = θk+1 for some k= 1,2, . . . ,K− 1. Equivalently, the network is non-degenerate at
t if and only if the parameterized network Gθ has at most two different min cuts for all parameter
values θ.
If the graph is non-degenerate, the primitive components are exactly the routing components
that we are looking for; otherwise, two or more primitive components correspond to the same
breakpoint θk. If these primitive components are also connected, then we do not know which
primitive component is faster at the next moment. When multiple primitive components with the
same score change rate are connected we need an additional procedure to determine which of these
components should be merged and which should stay separate during the next time period. Our
detailed discussion of the degenerate case is in Appendix EC.4.
Step 3: Intuitively, routing components specify which subset of queues is served by which
subset of servers. We next formally define the routing components. Recall that given a vector W (t),
its associated arc set E(W (t)) is defined in (15), which connects each server j to its active queues
according to their M+W scores.
Definition 5 Given a vector-valued process {W (t) := (Wi(t))i∈I | t ∈ [t0, t1]}, if its associated arc
set E(W (t)) is invariant over (t0, t1), then the connected components of Eb(t) are the routing
components over (t0, t1).
Since routing components are defined over an open interval (t0, t1) only when the arc set stays
invariant over that interval, the routing components are also invariant over that interval. This
definition allows E(t0) and E(t1) to differ from E(t) for t ∈ (t0, t1) without affecting the routing
components.
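As an illustration of Definition 5, the following minimal Python sketch computes the connected components of a bipartite arc set with a union-find structure. The function name `routing_components` and the encoding of servers, queues, and arcs as plain Python values are assumptions made for this example; the arc set is taken as given and is not derived from the M+W scores here.

```python
from collections import defaultdict


def routing_components(servers, queues, arcs):
    """Return the connected components of the bipartite graph (servers, queues, arcs).

    Each component is a pair (set_of_servers, set_of_queues). Nodes are tagged
    with "s" or "q" so that a server and a queue may share the same label.
    """
    parent = {("s", j): ("s", j) for j in servers}
    parent.update({("q", i): ("q", i) for i in queues})

    def find(v):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for j, i in arcs:
        a, b = find(("s", j)), find(("q", i))
        if a != b:
            parent[a] = b  # merge the two components joined by arc (j, i)

    groups = defaultdict(lambda: (set(), set()))
    for kind, name in parent:
        root = find((kind, name))
        groups[root][0 if kind == "s" else 1].add(name)
    # Sort only to make the output order deterministic.
    return sorted(groups.values(), key=lambda g: (sorted(g[0]), sorted(g[1])))


components = routing_components([1, 2], ["a", "b", "c"],
                                [(1, "a"), (1, "b"), (2, "c")])
```

In this toy instance server 1 is tied to queues a and b and server 2 to queue c, so two components result.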
The next proposition gives a necessary and sufficient condition for a given segment of a
vector-valued process to be the HOL waiting-time trajectory of the fluid process. It uses xk(t)
to denote the total amount of score change for queues in a routing component Gk from time t0 up
to time t.
Proposition 3 Suppose {W (t) | t ∈ [t0, t1]} is a continuous vector-valued process with associated
arc set E(t)≡E for all t∈ (t0, t1), and suppose {Gk}k=1,...,K are the routing components in (t0, t1).
Then {W (t) | t ∈ [t0, t1]} is the fluid process in an M+W-BQS over [t0, t1] if and only if for each
k= 1, . . . ,K, {W (t) | t∈ [t0, t1]} is the unique solution to the following ODE:
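\[
\text{(ODE)}\quad
\begin{aligned}
\frac{dx_k(t)}{dt} &= \Psi^{G_k}_{W(t)} && \text{for } k = 1,2,\ldots,K,\\
W_i(t) &= g_i^{-1}\bigl[g_i(W_i(t_0)) + x_k(t)\bigr] && \text{for all } i \in G_k,\ k = 1,2,\ldots,K,\\
x_k(t_0) &= 0 && \text{for } k = 1,2,\ldots,K,
\end{aligned}
\tag{25}
\]
where the function $\Psi^{G_k}_{W(t)}$ is defined as in equation (20).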
The proof of this Proposition is in Appendix EC.3.
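To make the structure of the ODE concrete, here is a minimal forward-Euler sketch for a single routing component. It is only an illustration: the callable `psi` is a placeholder standing in for the score change rate defined in equation (20), which is not reproduced here, and the identity waiting index used in the example is an assumption.

```python
def integrate_component(W0, g, g_inv, psi, t0, t1, steps=1000):
    """Forward-Euler sketch of the ODE for ONE routing component.

    W0    : dict queue -> W_i(t0)
    g     : dict queue -> waiting index g_i
    g_inv : dict queue -> inverse of g_i
    psi   : callable(W) -> common score change rate (hypothetical stand-in for (20))
    Returns (x, W): the accumulated score change x_k(t1) and the vector W(t1).
    """
    h = (t1 - t0) / steps
    x = 0.0                                   # x_k(t0) = 0
    base = {i: g[i](w) for i, w in W0.items()}  # g_i(W_i(t0))
    W = dict(W0)
    for _ in range(steps):
        x += h * psi(W)                       # dx_k/dt = Psi
        # All queues in the component share the same score change x.
        W = {i: g_inv[i](base[i] + x) for i in base}
    return x, W


identity = lambda w: w
x_end, W_end = integrate_component(
    W0={"a": 1.0, "b": 2.0},
    g={"a": identity, "b": identity},
    g_inv={"a": identity, "b": identity},
    psi=lambda W: 0.5,   # assumed constant rate, for illustration only
    t0=0.0, t1=2.0,
)
```

With a constant rate of 0.5 over an interval of length 2, both queues gain one unit of score, so the waiting times move from (1, 2) to (2, 3) while their scores stay tied.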
Proposition 3 says that if the arc set is invariant over interval (t0, t1), then the fluid process
{W (t) | t∈ [t0, t1]} can be uniquely determined from the ODE (EC.2). The next proposition shows
that there always exists such an interval over which the arc set stays invariant. Define arc set
\[ E_b(t_0^+) := \{(j,i) \in E_b(t_0) \mid j, i \in G_k \text{ for some } k\}, \tag{26} \]
i.e., $E_b(t_0^+)$ is the set of edges in Eb(t0) whose endpoints lie in the same primitive component.
Proposition 4 If the parameterized network at t0 is non-degenerate, then there exists an interval
(t0, t1) such that a unique fluid process {W (t) | t∈ [t0, t1]} exists as a solution to the ODE (EC.2),
in which {Gk | k= 1, . . . ,K} are exactly the primitive components of the parameterized network at
t0. Moreover, for all t∈ (t0, t1) we have Eb(t)≡Eb(t+0 ).
Proof.
We first prove the “uniqueness” part. If {W (t) | t∈ [t0, t1]} is a fluid process, then its associated
Eb(t) must be in the form of (26) for all t∈ [t0, t1], which contains only arcs within each primitive
component Gk (k= 1, . . . ,K). We now prove subclaims (a)–(c).
(a) Eb(t)⊆Eb(t0), that is, no new arcs can be added to Eb(t).
Suppose j ∈ J , i∈ I, but (j, i) /∈Eb(t0). According to our construction of Eb(t0), we have
\[ \delta_{ji}(t_0) := \max_{\ell\in I} s_{j\ell}(t_0) - s_{ji}(t_0) > 0. \tag{27} \]
Because all scores sji(t) are continuous in t, δji(t) has to be continuous in t. Therefore, there
exists an interval [t0, t1] over which δji(t)> 0. By selecting t1 sufficiently small, all (j, i) /∈Eb(t0)
will continue to stay outside Eb(t) for t∈ (t0, t1). Note that the above argument actually proves
that Eb(t) (and E(t)) is upper hemicontinuous2 at all t.
(b) All inter-component arcs in Eb(t0) will not be contained in Eb(t).
Suppose (j, i)∈Eb(t0) and j, i are in two different primitive components Gk(t0) and Gk′(t0),
respectively, with k ≠ k′. Since each primitive component is connected by Eb(t0), node j in Gk
must be connected to some node i′ ∈Gk. Thus, server j is sending supply fluid to both i and
i′. However, since i and i′ are in two different primitive components, which have different score
change rates by the non-degenerate assumption, i and i′ must also have different score change
rates. Since arcs can only go from a faster component to a slower component, i′ must have a
larger score change rate than i and thus (j, i) has to disappear after t0.
(c) All intra-component arcs in Eb(t0) will remain in Eb(t).
Proof by contradiction. If the above claim is not true, then there is a sequence of time points
{tℓ} approaching t0 such that at each of those time points, at least one arc (j, i) ∈ Eb(t0)
disappears. Whenever this happens, the primitive component containing (j, i) will be split into
disconnected sub-components G1, G2, . . . (if that component is still connected, our argument
at the beginning of Proof of Proposition 3 shows that all queues in that component will keep
their scores changing by the same amount, but then arc (j, i) cannot disappear). Therefore,
at each tℓ, we obtain a sequence of sub-components. For each sub-component, we calculate
the score change from t0 until tℓ, which is gi(tℓ)− gi(t0), with queue i in that sub-component.
Let G1ℓ denote the sub-component which has the minimal score change. Because G1ℓ only has a
finite number of possible choices, there must be a subsequence {tℓu} such that G1ℓ ≡:H at all
of those time points, where H must be the sub-component of one of the primitive components,
say, Gk without loss of generality.
We claim that H must be a closed subset of Gk with respect to Eb(t0). Otherwise, there
would be an arc e ∈ Eb(t0) that goes from H to other parts of Gk. By our construction of
H, queues in the other sub-components of Gk must have their scores increased faster than H
during the interval (t0, tℓu), so this edge e has to remain in Eb(tℓu) for all u=1,2, . . .. However,
if e ∈ Eb(tℓu), then H must be connected to other parts of Gk at tℓu, which contradicts that
H is disconnected from other parts of Gk. Therefore, we deduce that Eb(t0) contains no arcs
from H to other parts of Gk, and H thus has to be a closed subset of Gk at t0.
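For u = 1,2, . . ., let θ(tℓu) denote the score change rate of all queues i ∈H at tℓu . By our
previous definition of $\vartheta^{-1}_{W_i(t_{\ell_u})}(\cdot)$, the following equality holds at all times tℓu:
\[ \sum_{j\in H}\mu_j(t_{\ell_u}) = \sum_{i\in H}\vartheta^{-1}_{W_i(t_{\ell_u})}\bigl(\theta(t_{\ell_u})\bigr) \to \sum_{i\in H}\vartheta^{-1}_{W_i(t_0)}(\theta) \quad \text{as } u\to\infty, \tag{28} \]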
where θ ∈ [−∞,+∞] denotes the limit of θ(tℓu), which exists by right continuity. We know
that θ must be no more than θk, because otherwise, in a sufficiently small neighborhood of t0 the
score change rate of H would be strictly larger than θk, which would not allow H to be the slowest
component at time points tℓu→ t0. Thus, we deduce that θ≤ θk and consequently,
\[ \sum_{j\in H}\mu_j(t_0) = \sum_{i\in H}\vartheta^{-1}_{W_i(t_0)}(\theta) \ge \sum_{i\in H}\vartheta^{-1}_{W_i(t_0)}(\theta_k), \tag{29} \]
where the equality follows from (28), and the inequality holds because $\vartheta^{-1}_{W_i(t_0)}(\theta)$ is
decreasing in θ. Inequality (29) then contradicts Property (d) in Proposition 2.
We have thus proved that over a certain interval (t0, t1), Eb(t) is exactly the set of edges inside
each primitive component that is generated by the GGT algorithm at t0. As a result, the routing
components over (t0, t1) are exactly those primitive components. Then by Proposition 3, the fluid
process, if it exists, must have W (t) solving the ODE over [t0, t1] that is formulated based on these
routing components.
We next prove the “existence” part. By formulating the ODE based on the primitive components
Gk, we could solve for the HOL waiting time trajectories {W (t) | t∈ [t0, t1]}. According to the ODE,
queues in the same primitive component will keep their scores tied so will stay connected. Moreover,
we have proved in the “uniqueness” case that no new arc will appear in (t0, t1) by selecting t1
sufficiently small, and arcs between different primitive components will disappear because of the
difference in their score change rate. Therefore, the primitive components will be the connected
components with respect to Eb(t) for t∈ (t0, t1), and thus are exactly the routing components. We
then invoke Proposition 3 to show that the {W (t) | t ∈ [t0, t1]} computed from the ODE is a fluid
process.
If G(t0) is degenerate, certain inter-component edges in Eb(t0) may remain in Eb(t) over some
interval [t0, t1]. We provide a method in Appendix EC.4 to find those inter-component edges. We
also show that Eb(t) stays invariant in some interval [t0, t1] in the degenerate case. Therefore, no
matter whether the graph at t0 is degenerate or non-degenerate, we can always construct an arc
set Eb(t+0 ) and routing components (Gk) which stay invariant over a certain interval (t0, t1) for
some t1 > t0. If we want to construct the fluid process over a finite horizon interval [0, T ], we
can first solve ODE (EC.2) over [t0, T ] and obtain {W (t) | t∈ [t0, T ]}, while being aware that this
construction may not be valid because the arc set and the routing components may change at some
time t∗ ∈ (t0, T ]. We call such a t∗ a switch time. There are three types of time points that could
possibly become a switch time:
• Type-1 : At a Type-1 switch time, an arc (j, i) not in the arc set $E_b(t_0^+)$ is added to the arc set.
Recall that (j, i) /∈Eb(t) if and only if δji(t)> 0, where δji(t) represents the difference between the
score of an active queue of j and the score of queue i, which can be computed using equation (27) for a
given W (t). The next Type-1 switch time after t0 can thus be characterized as the first time t > t0 at which
\[ \delta_{ji}(t) = 0 \quad \text{for some } (j,i) \notin E_b(t_0^+). \tag{30} \]
A Type-1 switch happens when the score of one routing component grows faster and catches up
with other routing components, which leads to a re-configuration of the routing graph. In
Example 1, t1 is a Type-1 switch time, at which the routing graph changes from Figure 2(a) to
Figure 2(b).
• Type-2 : Right after a Type-2 switch time, a connected component in E(t+0 ) splits into two or
more sub-components, and some edge(s) has to disappear. Recall that in the proof of Proposition
4, we showed that if any arc disappears, there must be a sequence of time points tℓu ↓ t, such that
there is a slowest closed subset H of some routing component Gk for which inequality (29) holds.
Thus, the Type-2 switch is characterized by the following condition,
\[ \exists\, k \text{ and a closed subset } H \subsetneq G_k \text{ such that } \sum_{j\in H}\mu_j \ge \sum_{i\in H}\vartheta^{-1}_{W_i(t)}(\theta_k). \tag{31} \]
The above condition cannot hold at t0, because Property (d) in Proposition 2 states that such a
closed subset H cannot exist in G(t0).
A Type-2 switch happens when the queues in some subset of a routing component have scores that grow too
slowly (because they receive too much service) to keep up with the rest of the routing component, so that
the subset splits off from the original routing component. In Example 1, t2 is a Type-2 switch time, at which
the routing graph changes from Figure 2(b) to Figure 2(c).
• Type-3 : The discontinuity points of the edge capacities {ue(t) | e ∈ E(t)} may also lead to a
re-configuration of the routing components. Since uSj(t) depends on µj(t), and uiT (t) depends on
λi(t−Wi(t)), a Type-3 switch time, which is a discontinuity point of the edge capacities, can be
characterized as
\[ \lambda_i(t - W_i(t)) \text{ is discontinuous at } t \text{ for some } i\in I, \quad \text{or} \quad \mu_j(t) \text{ is discontinuous at } t \text{ for some } j\in J. \tag{32} \]
Since t−Wi(t) is non-decreasing in t, the number of Type-3 switch times has to be finite over any
finite interval.
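The Type-1 condition can be checked numerically once the scores are available. The sketch below assumes, as in the rest of the paper, that the score of arc (j, i) is L(j, i) + gi(Wi); the function names `score_gaps`, `active_arcs`, and `type1_candidates`, the dictionary encoding, and the numerical tolerance are all choices made for this illustration.

```python
def score_gaps(L, g, W):
    """delta[j][i] = max_l s_jl - s_ji, where s_ji = L[j][i] + g[i](W[i]) (cf. (27)).

    Assumes every server has at least one finite score; otherwise the gap is undefined.
    """
    gaps = {}
    for j, row in L.items():
        scores = {i: row[i] + g[i](W[i]) for i in row}
        best = max(scores.values())
        gaps[j] = {i: best - s for i, s in scores.items()}
    return gaps


def active_arcs(gaps, tol=1e-9):
    """Arcs whose gap is (numerically) zero, i.e. each server's active queues."""
    return {(j, i) for j, row in gaps.items() for i, d in row.items() if d <= tol}


def type1_candidates(gaps, current_arcs, tol=1e-9):
    """Arcs outside the current arc set whose gap has closed, as in condition (30)."""
    return active_arcs(gaps, tol) - set(current_arcs)


identity = lambda w: w
L = {1: {"a": 0.0, "b": 1.0}}
g = {"a": identity, "b": identity}

before = score_gaps(L, g, {"a": 3.0, "b": 1.0})   # queue b trails queue a by one unit
after = score_gaps(L, g, {"a": 3.0, "b": 2.0})    # queue b has caught up
arcs_before = active_arcs(before)
new_arcs = type1_candidates(after, arcs_before)
```

Here arc (1, b) is inactive at first and becomes a Type-1 candidate once its score ties with that of queue a.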
The conditions we have listed above are necessary but not sufficient conditions for a switch time.
If we define
t∗ =min{t∈ (t0, T ] | (30), (31), or (32) holds}, (33)
then t∗ can be interpreted as the next time point which could possibly be a switch time. We call t∗ a
potential switch time. If such a potential switch time does not exist in (t0, T ], then we know for sure
that the edge set and routing components will stay invariant in (t0, T ] and thus the vector-valued
process {W (t) | t∈ (t0, T ]} constructed from the ODE is the unique fluid process by Proposition 3;
otherwise, we have to compute the new edge set and the routing component at t∗, and reformulate
the ODEs. To do that, we update t0← t∗ and repeat Steps 0–3 with the new t0 so the construction
can continue.
The entire procedure of constructing a fluid process {W (t) | t∈ [0, T ]} over a given finite interval
[0, T ] is summarized in Algorithm 1 below.
Algorithm 1: Construct a Fluid Process over (0, T ].
Data: T , W (0)
Result: {W (t) | t∈ [0, T ]}
Initialize: t0← 0, W (t0)←W (0)
Step 0: if Condition (13) holds then
    Terminate the Algorithm and return “{W (t) | t∈ [0, T ]} not unique”
end
Step 1: Construct E(t0) via (15). Apply the GGT Algorithm to get a sequence of breakpoints
(θk)k=1,...,K and the corresponding nested cuts Ak, and compute the primitive components
Gk =Ak\Ak−1 for k=1, . . . ,K.
Step 2: if θk−1 = θk for some k=2, . . . ,K then
    Generate the routing components according to Proposition 6 in Appendix EC.4
else
    Take each primitive component Gk as a routing component
end
Step 3: Calculate {W (t) | t∈ [t0, T ]} as the unique solution to the ODE (EC.2). Find a
potential switch point t∗ using (33).
if t∗ exists then
    t0← t∗, go to Step 0
else
    Terminate the Algorithm and return {W (t) | t∈ [0, T ]}.
end
Algorithm 1 presents a constructive approach to prove the existence of the fluid process and
characterizes its dynamic behavior. Unfortunately, there is no performance guarantee for Algorithm
1 because (1) the accuracy of solving an ODE with boundary condition depends on the smoothness
of the functions gi and $F^C_i$; (2) the number of Type-3 switch times depends on the number of
discontinuity points of the arrival rates λi(t) and service rates µj(t); (3) the number of Type-1 and
Type-2 switch times depends on the smoothness of gi and $F^C_i$. By assuming that gi and $F^C_i$ are
analytic functions, we ensure that the fluid process has only a finite number of Type-1 and Type-2
switch times over any finite interval, so the fluid process can be constructed by Algorithm 1 over any
finite horizon.
Theorem 1 Given any T > 0 and W (0)≥ 0, if Algorithm 1 does not terminate because of condition
(13), then it constructs {W (t) | 0≤ t≤ T}, which is the unique fluid process in an M+W-BQS.
Proof.
Existence: Given t0, we may compute the routing component Gk in interval [t0, t1) by using
either the primitive component (non-degenerate case) or merged primitive components (degenerate
case). Identifying the routing components allows us to compute {W (t) | t∈ [t0, T )}, based on which
we can find the next switch time t∗ according to (33). If such a switch time does not exist during
[t0, T ], then it implies that over [t0, T ], the arc set stays invariant as E(t+0 ) and thus Proposition
3 states that {W (t) | t ∈ [t0, T )} is a fluid process over [t0, T ]; otherwise, we find the next switch
time t∗. By the definition of switch time, the arc set and routing components stay invariant over
[t0, t∗], and thus the constructed W (t) over [t0, t∗] is a fluid process according to Proposition 3. It
remains to show that there is a finite number of switch points over a finite horizon [0, T ]. Then by
repeating the above procedure by a finite number of times, our construction can cover [0, T ].
Proof by contradiction. If there is an infinite number of switch times in [0, T ], then there must be
a sequence of switch points tℓ→ T ∗. We do not need to consider the Type-3 switch times because
there are only finitely many discontinuity points of λi(t) and µj(t). Note that if tℓ is a Type-1
switch time, an arc is added to E(tℓ); if tℓ is a Type-2 switch time, an arc is removed from $E(t_\ell^+)$.
Since there are finitely many arcs to add (or to remove), at least one arc (j, i) must have been
repeatedly added to and removed from the arc set at a sequence of Type-1 and Type-2 switch times
that converges to T ∗. Because there are only finitely many possible arc sets, there must be two
subsequences of switch times $\{t^v_{\ell_u} \mid u = 1,2,\ldots\}$ (v = 1,2) such that $E(t^v_{\ell_u}) \equiv E^v$ (v = 1,2), with
$(j,i) \in E^1$ but $(j,i) \notin E^2$.
As shown in the proof of Proposition 4, the arc set E(t) is upper hemicontinuous at all t, and in
particular at T ∗, so $(j,i) \in E^1 \subseteq E(T^*)$. If (j, i) is an intra-component edge, it must remain in a
neighborhood of T ∗ by our argument in the proof of Proposition 4, case (c); otherwise, (j, i) is an
inter-component edge. According to Proposition 6 in Appendix EC.4, an inter-component edge either
remains in the graph, or disappears in a neighborhood of t0 (by using the property that all gi(·) and
$F^C_i(\cdot)$ (i∈ I) are analytic functions), which contradicts the fact that (j, i) is present in $E(t^1_{\ell_u})$ and
absent from $E(t^2_{\ell_u})$ for both sequences $t^v_{\ell_u} \to T^*$ (v= 1,2). So we have proved that such a limit point
T ∗ of switch points cannot exist, and therefore there are only finitely many switch times over any finite
horizon.
Uniqueness: Suppose {W (t) | t ∈ [0, T ]} and {$\tilde{W}(t)$ | t ∈ [0, T ]} are two different fluid processes
with W (0) = $\tilde{W}(0)$. Let t0 := inf{t | W (t) ≠ $\tilde{W}(t)$}. We claim that W (t0) = $\tilde{W}(t_0)$. Otherwise,
continuity of W (t) and $\tilde{W}(t)$ implies that W (t0−∆t) ≠ $\tilde{W}(t_0-\Delta t)$ for some sufficiently small ∆t,
which contradicts the definition of t0. Then by Proposition 3, the fluid process is unique over an
interval (t0, t1). Thus W (t)≡ $\tilde{W}(t)$ for t∈ [t0, t1), which then contradicts the definition of t0. Thus
the fluid process must be unique.
3.2. Calculating Performance Metrics for a Fluid Process
The construction of the fluid process enables the social planner to calculate certain performance
metrics based on the fluid process and to evaluate M+W scoring rules. In the setting of scarce
resource allocation, we assume that the system designer is concerned with two major objectives:
efficiency (Ef) and fairness (Fa). In this section, we assume that the system designer is only inter-
ested in the performance of the BQS over a finite horizon [0, T ]. The fluid process we constructed
earlier can then be used to predict the system performance of a particular M+W indexing policy
over that horizon.
There is no universal consensus in the literature on a measure of fairness (Fa). Here we propose
one possible way to measure fairness using the variance of the likelihood of getting service across
all customers. To calculate this variance, we first note that the average likelihood of getting service
is simply the ratio of supply versus demand, that is,
\[ P(T) = \frac{\mu}{\lambda}. \tag{34} \]
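Thus the fairness measure, taken as the negative of this variance so that a larger value indicates a
fairer allocation, can be computed as
\[ \mathrm{Fa}(T) := -\sum_{i\in I} \frac{\lambda_i}{\lambda} \left( \frac{\int_0^T \sum_{j\in J} r_{ji}(t)\,dt}{T\lambda_i} - P(T) \right)^2. \tag{35} \]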
Alternatively, one may measure fairness by the variance in waiting times (Zenios et al. 2000), or
by the minimum likelihood of getting service among all types of customers. Most of those fairness
metrics can be represented as functions of Wi(t) and $\sum_{j\in J} r_{ji}$. The latter represents the aggregate
service rate for each queue and has a one-to-one mapping with $W_i'(t)$ by (10). Therefore, the
performance of the system under the above fairness metrics can be determined by the fluid process
{W (t) | t∈ [0, T ]}.
Compared to the measurements of fairness, there is more agreement on measuring efficiency (Ef)
as the expected utility for all resource-customer matches over [0, T ]:
\[ \mathrm{Ef}(T) := \frac{\sum_{i\in I,\, j\in J} \int_0^T U(j,i)\, r_{ji}(t)\,dt}{\mu T}. \tag{36} \]
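The two metrics can be evaluated from a sampled routing-rate trajectory. The following sketch is a hedged illustration: it assumes that λ and µ in (34)–(36) denote the total arrival and service rates, that these rates are constant over [0, T], and that the routing rates are given on a time grid and integrated with the trapezoidal rule. The function names and data layout are hypothetical.

```python
def trapezoid(ts, ys):
    """Trapezoidal approximation of the integral of ys over the grid ts."""
    return sum((ts[n + 1] - ts[n]) * (ys[n] + ys[n + 1]) / 2.0
               for n in range(len(ts) - 1))


def fairness_efficiency(ts, r, lam, mu, U):
    """Evaluate Fa(T) as in (35) and Ef(T) as in (36).

    ts  : time grid covering [0, T]
    r   : dict (j, i) -> list of r_ji(t) sampled on ts
    lam : dict i -> arrival rate (assumed constant); mu : dict j -> service rate
    U   : dict (j, i) -> matching utility
    """
    T = ts[-1] - ts[0]
    lam_tot, mu_tot = sum(lam.values()), sum(mu.values())
    P = mu_tot / lam_tot                      # average service likelihood (34)
    served = {i: 0.0 for i in lam}
    utility = 0.0
    for (j, i), ys in r.items():
        integral = trapezoid(ts, ys)
        served[i] += integral
        utility += U[(j, i)] * integral
    Fa = -sum(lam[i] / lam_tot * (served[i] / (T * lam[i]) - P) ** 2 for i in lam)
    Ef = utility / (mu_tot * T)
    return Fa, Ef


Fa, Ef = fairness_efficiency(
    ts=[0.0, 1.0, 2.0],
    r={(1, "a"): [1.0, 1.0, 1.0], (1, "b"): [0.0, 0.0, 0.0]},
    lam={"a": 1.0, "b": 1.0},
    mu={1: 1.0},
    U={(1, "a"): 2.0, (1, "b"): 1.0},
)
```

In this toy instance all service goes to queue a, which yields the highest efficiency (Ef = 2) at the price of the lowest fairness (Fa = −0.25).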
Given a fluid process {W (t) | t∈ [0, T ]}, the corresponding service process rates r(t) do not have
to be unique. In fact, at each W (t), the set of feasible routing rates is a polytope Γ(W (t)) which
is characterized in Proposition 2 property (b). In the proof of Proposition 3, we showed that by
specifying a certain lexicographic order over different routing-rate vectors, we can always pick the
maximum rmax(t) with respect to that order, and rmax(t) is piecewise continuous.
If we define the lexicographic order according to the entries of the utility matrix U (ties are
broken by a fixed order) as
\[ (j,i) \succ (j',i') \iff U(j,i) > U(j',i'), \tag{37} \]
then the lex-max rmax(t) is also the solution to the following problem
\[ \max_{r(t)\in\Gamma(W(t))} \sum_{j,i} r_{ji}(t)\, U(j,i). \tag{38} \]
Thus, the expected utility in the fluid model is maximized by choosing the maximum routing rates
rmax(t) with respect to the lexicographic order (37). Similarly, the expected utility is minimized at
routing rates rmin(t) which is the minimal routing rate-vector with respect to that order. Using
rmax(t) and rmin(t), we can derive the best- and worst-case bounds for the efficiency level for the
BQS fluid model that can be achieved by a given M+W index.
Corollary 1 The efficiency (Ef) of an M+W-BQS fluid model, if measured by (36), has the fol-
lowing range
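\[ \mathrm{Ef}(T) \in \left[ \frac{\sum_{i\in I,\, j\in J} \int_0^T U(j,i)\, r^{\min}_{ji}(t)\,dt}{\mu T},\ \frac{\sum_{i\in I,\, j\in J} \int_0^T U(j,i)\, r^{\max}_{ji}(t)\,dt}{\mu T} \right]. \tag{39} \]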
Proof. We argued that both rmin(t) and rmax(t) are feasible routing rates and piecewise
continuous on [0, T ]. Thus, the lower and upper bounds are attained by rmin(t) and rmax(t),
respectively. If we define the convex combination of rmin and rmax as
\[ r_c(t) := c\, r^{\min}(t) + (1-c)\, r^{\max}(t), \]
then the intermediate value theorem implies that any value between the lower and upper bounds can be
attained by rc(t) for some c∈ [0,1]. Finally, rc(t) ∈ Γ(W (t)) by the convexity of Γ(W (t)), and rc
is piecewise continuous over [0, T ], so {rc(t) | t ∈ [0, T ]} satisfies the criteria for being the service
rates of a valid fluid process.
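The interpolation step in this proof can be made explicit. Because (36) is linear in the routing rates, the efficiency of a convex combination of the two extreme routing-rate processes is the same convex combination of the two bounds. In the short derivation below, $\mathrm{Ef}^{\min}$ and $\mathrm{Ef}^{\max}$ are shorthand introduced here for the two endpoints of (39), and $e$ is an arbitrary target value.

```latex
\begin{align*}
\mathrm{Ef}(T)\big|_{r_c}
  &= \frac{\sum_{i\in I,\, j\in J}\int_0^T U(j,i)\,
        \bigl[c\, r^{\min}_{ji}(t) + (1-c)\, r^{\max}_{ji}(t)\bigr]\,dt}{\mu T}
   = c\,\mathrm{Ef}^{\min} + (1-c)\,\mathrm{Ef}^{\max}, \\
\text{so for } e \in [\mathrm{Ef}^{\min}, \mathrm{Ef}^{\max}]
  \text{ with } \mathrm{Ef}^{\min} < \mathrm{Ef}^{\max}: \quad
c &= \frac{\mathrm{Ef}^{\max} - e}{\mathrm{Ef}^{\max} - \mathrm{Ef}^{\min}} \in [0,1].
\end{align*}
```

When the two bounds coincide, every feasible routing-rate process attains the same efficiency and any c works.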
Once an M+W indexing rule is implemented, the selection of r(t)∈ Γ(W (t)) is determined by the
stochastic behavior of the system, as argued in Talreja and Whitt (2008), and therefore is out of the
system manager’s control. However, the system manager can eliminate some arcs from the routing
graph by modifying the corresponding matching score L(j, i), so that on the remaining graph only
the optimal r(t) is feasible. More details about this technique will be discussed in Section 4.
3.3. An Application to an Overloaded FCFS-BQS
As we noted in Section 1.1, an FCFS-BQS is a special case of M+W-BQS by defining the M+W
index as follows,
\[ g_i(\mathrm{WT}) = \mathrm{WT}, \qquad L(j,i) = \begin{cases} 0 & \text{if } (j,i)\in E_b, \\ -\infty & \text{if } (j,i)\notin E_b, \end{cases} \tag{40} \]
where Eb denotes the set of compatible server-customer pairs in the FCFS-BQS. We can define
a fluid process in an FCFS-BQS in the same way as in an M+W BQS. A fluid process in an
FCFS-BQS is said to be globally FCFS if all queues have equal HOL waiting times at all times
(see Lemma 1 in Talreja and Whitt 2008).
Talreja and Whitt (2008) explored the question of when a BQS is globally FCFS, and derived
certain sufficient conditions for global FCFS to hold at the steady state when the bipartite graph
is fully connected or sparsely connected by Eb. The machinery developed in this paper allows
us to extend the result by Talreja and Whitt (2008) by providing two necessary and sufficient
conditions for global FCFS for general bipartite graphs. We need some definitions to express the
conditions. Define $\lambda_i(t, W_1(t)) := \lambda_i(t - W_1(t))\, F^C_i(W_1(t))$ for all i ∈ I, and define
$\lambda(t, W_1(t)) := \sum_{i\in I} \lambda_i(t, W_1(t))$. Following Cong et al. (2015) we say that the arrival rates
{λi(t,W1(t))}i∈I and service rates {µj(t)}j∈J satisfy the Generalized Chaining Condition (GCC) if
∃ θ1 > 0 such that (1− θ1)λ(t) = µ(t), and for all non-empty subsets A of J ,
\[ (1-\theta_1) \sum_{i\in \cup_{j\in A} I(j)} \lambda_i(t) \ge \sum_{j\in A} \mu_j, \tag{41} \]
where I(j) denotes the set of customer types compatible with server j.
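For small instances the GCC can be checked by brute force at a fixed time t. The sketch below enumerates all non-empty subsets A of servers, which is how the index j ∈ A in (41) is read here; this enumeration is exponential in the number of server types and is meant only as an illustration. The function name `satisfies_gcc` and the dictionary inputs are assumptions.

```python
from itertools import combinations


def satisfies_gcc(lam, mu, compatible, tol=1e-12):
    """Brute-force check of the GCC (41) at one fixed time point.

    lam        : dict i -> effective arrival rate of customer type i at time t
    mu         : dict j -> service rate of server type j at time t
    compatible : dict j -> set I(j) of customer types compatible with server j
    """
    lam_tot, mu_tot = sum(lam.values()), sum(mu.values())
    theta1 = 1.0 - mu_tot / lam_tot          # from (1 - theta1) * lambda = mu
    if theta1 <= 0:
        return False                         # the GCC requires theta1 > 0
    servers = list(mu)
    for size in range(1, len(servers) + 1):
        for A in combinations(servers, size):
            reachable = set().union(*(compatible[j] for j in A))
            demand = (1.0 - theta1) * sum(lam[i] for i in reachable)
            if demand < sum(mu[j] for j in A) - tol:
                return False
    return True


lam = {"a": 2.0, "b": 2.0}
mu = {1: 1.0, 2: 1.0}
chained = satisfies_gcc(lam, mu, {1: {"a"}, 2: {"a", "b"}})
unchained = satisfies_gcc(lam, mu, {1: {"a"}, 2: {"a"}})
```

In the second call both servers can only serve type a, so the thinned demand reachable from the set {1, 2} is 1 while its capacity is 2, and the condition fails.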
Corollary 2 Suppose a fluid process in FCFS BQS has initial state Wi(0) =W1(0) for all i ∈ I
and λ(t)>µ(t) at all t. Then the following statements are equivalent:
1. The fluid process is globally FCFS.
2. The fluid process is exactly the one constructed by Algorithm 1 using the M+W index (40),
which has Wi(t) =W1(t) for all i∈ I, and W1(t) is the unique solution to the following integral
equation
\[ W_1(t) = W_1(0) + \int_0^t \left( 1 - \frac{\mu(s)}{\lambda(s, W_1(s))} \right) ds. \tag{42} \]
3. Given W1(t) constructed as above, the arrival rates {λi(t,W1(t))}i∈I and {µj(t)}j∈J satisfy
the GCC.
The proof of this Corollary is in Appendix EC.5.
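Differentiating the integral equation in statement 2 gives W1′(t) = 1 − µ(t)/λ(t, W1(t)), which can be integrated numerically. The following forward-Euler sketch is illustrative only: the function name `hol_wait_fcfs`, the constant rates, and the exponential complementary cdf used for the example are assumptions, not part of the model specification.

```python
import math


def hol_wait_fcfs(W0, lam, Fc, mu, T, steps=10000):
    """Forward-Euler sketch for W1'(t) = 1 - mu(t) / lambda(t, W1(t)).

    lam : dict i -> callable giving the arrival rate lambda_i(s)
    Fc  : dict i -> callable giving the complementary cdf F^C_i(w)
    mu  : callable giving the total service rate mu(t)
    """
    h = T / steps
    t, W = 0.0, W0
    for _ in range(steps):
        # lambda(t, W) = sum_i lambda_i(t - W) * F^C_i(W)
        effective = sum(lam[i](t - W) * Fc[i](W) for i in lam)
        W += h * (1.0 - mu(t) / effective)
        t += h
    return W


W_final = hol_wait_fcfs(
    W0=0.0,
    lam={"a": lambda s: 2.0},              # assumed constant arrival rate
    Fc={"a": lambda w: math.exp(-w)},      # assumed exponential patience
    mu=lambda t: 1.0,                      # assumed constant service rate
    T=20.0,
)
```

With these assumed inputs the right-hand side vanishes when 2·exp(−W) = 1, so the trajectory settles at W = ln 2.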
4. Steady State of the Fluid Process
Until now we allowed arrival and service rates to vary with time. In this section, by contrast, in
order to analyze the steady state we assume throughout that the BQS has stationary arrival rates,
that is, λi(t) ≡ λi (i ∈ I), and service rates µj(t) ≡ µj (j ∈ J ). We further assume that the cdf
Fi(·) has compact support interval $[0, \overline{W}_i]$ with $\overline{W}_i < \infty$. We make this assumption to rule out the
case when Wi(t)→+∞ and a steady state may not exist. We impose the sufficient condition that
we made in Proposition 1, namely that (13) fails at state 0, to make sure that the steady state is
unique.
An HOL waiting-time vector W ∗ ∈ [0,+∞)I is said to represent a steady state of the fluid process
in an M+W-BQS if and only if starting with W (0) = W ∗, the unique fluid process is given by
W (t) ≡W (0) for all t ≥ 0. A matrix r∗ := (r∗ji)i∈I,j∈J is defined as the steady-state routing-rate
vector, if r∗ is a feasible routing-rate vector at W ∗, which means that r∗ has to satisfy conditions
(4), (8), and (12).
At the moment we do not know whether a steady state W ∗ exists and, if it does, how one
may compute it. However, we can use the definition of steady state to develop properties that
W ∗ would have to satisfy. By the definition of steady state, w.r.t. the steady-state
routing rates r∗ji corresponding to W ∗, the inflow rate $\lambda_i = \lambda_i\bigl(F^C_i(W^*_i) + F_i(W^*_i)\bigr)$ and the outflow rate
$\lambda_i F_i(W^*_i) + \sum_{j\in J} r^*_{ji}$ must be equal at each buffer i at the steady state, so that
\[ \lambda_i F^C_i(W^*_i) = \sum_{j\in J} r^*_{ji}, \quad \forall i\in I. \tag{43} \]
We could also derive (43) from (11) by observing that θi =0 for all i∈ I at W ∗.
We can invert (43) to get
\[ W^*_i = (F^C_i)^{-1}\!\left( \frac{\sum_{j\in J} r^*_{ji}}{\lambda_i} \right), \tag{44} \]
where $(F^C_i)^{-1}$ is well defined over domain [0,1] by its strict monotonicity. This is useful because
we know from Section 3 that routing rates correspond to flows in a bipartite max flow network.
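The inversion in (44) is straightforward to evaluate once the inverse complementary cdf is available. The sketch below assumes, purely for illustration, a uniform distribution on $[0, \overline{W}_i]$, for which $F^C_i(w) = 1 - w/\overline{W}_i$ and the inverse is $\overline{W}_i(1 - x)$; the function name and data layout are hypothetical.

```python
def steady_state_waits(r, lam, Fc_inv):
    """W*_i = (F^C_i)^{-1}( sum_j r_ji / lam_i ), as in (44).

    r      : dict (j, i) -> steady-state routing rate
    lam    : dict i -> arrival rate
    Fc_inv : dict i -> inverse of the complementary cdf on [0, 1]
    """
    served = {i: 0.0 for i in lam}
    for (j, i), rate in r.items():
        served[i] += rate                     # aggregate service rate of queue i
    return {i: Fc_inv[i](served[i] / lam[i]) for i in lam}


# Assumed uniform distributions on [0, 4] and [0, 10] (illustration only).
W_bar = {"a": 4.0, "b": 10.0}
Fc_inv = {i: (lambda x, wb=wb: wb * (1.0 - x)) for i, wb in W_bar.items()}

W_star = steady_state_waits(
    r={(1, "a"): 1.0, (1, "b"): 0.5},
    lam={"a": 2.0, "b": 1.0},
    Fc_inv=Fc_inv,
)
```

Both queues have half of their arrivals served in this example, so each HOL waiting time sits at half of its support.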
Thus we define the max flow network G∗ := (V,E,u∗,C) as follows. Vertex set V is as defined in
(14). Arc set E is defined as
E := {(S, j) | j ∈J }∪ {(j, i) | j ∈J , i∈ I}∪ {(i, T ) | i∈ I}, (45)
i.e., E consists of all possible arcs in the (S,T )-network, which differs from the arc set E(t) that
we defined in the transient analysis. Arc capacities u∗e for each arc e are
\[ u^*_e = \begin{cases} \mu_j & \text{if } e = (S,j); \\ \infty & \text{if } e = (j,i); \\ \lambda_i & \text{if } e = (i,T). \end{cases} \tag{46} \]
We now need to define non-linear costs on G∗ such that when we compute optimal flows X∗e on G∗,
set r∗ji =X∗ji for all j ∈J and i∈ I, and compute W ∗ via (44), we indeed get steady-state
waiting times. When we write down the first-order conditions for optimality (see (51) below),
we discover that if X∗ji > 0 we need the score of (j, i) to be $L(j,i) + g_i(W^*_i)$, which should come
from the negatives of the derivatives of $C_{ji}(X^*_{ji})$ and $C_{iT}(X^*_{iT})$ (the negative comes from the fact
that we maximize scores, but minimize the cost of flows). Thus we choose $C_{iT}(X^*_{iT})$ such that its
derivative matches $-g_i(W^*_i)$ (with W ∗ as defined in (44)), and we set $C_{ji}(X^*_{ji})$ so that its derivative
matches −L(j, i). Therefore when the flow on arc e is Xe, its cost is Ce(Xe), defined as
\[ C_e(X_e) = \begin{cases} 0 & \text{if } e = (S,j); \\ -L(j,i)\, X_e & \text{if } e = (j,i); \\ -\int_0^{X_e} g_i\bigl((F^C_i)^{-1}(u/\lambda_i)\bigr)\,du & \text{if } e = (i,T). \end{cases} \tag{47} \]
Since $g_i\bigl((F^C_i)^{-1}(\cdot)\bigr)$ is strictly decreasing in its domain [0,1], $C_{iT}(X_{iT}) = -\int_0^{X_{iT}} g_i\bigl((F^C_i)^{-1}(u/\lambda_i)\bigr)\,du$
has a strictly increasing derivative and is therefore strictly convex over $[0, u^*_{iT}]$, where $u^*_{iT} = \lambda_i$. See Figure 5
for an example of this construction.
Figure 5 The (S,T )-network we construct for computing the steady state. On a few representative arcs we put
the cost function outside the parenthesis, and the capacity inside the parenthesis.
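A closed-form instance may help to see why the sink-arc cost is strictly convex. The example below assumes, for illustration only, a linear waiting index $g_i(w) = w$ and a uniform distribution on $[0, \overline{W}_i]$, so that $F^C_i(w) = 1 - w/\overline{W}_i$ and $(F^C_i)^{-1}(x) = \overline{W}_i(1-x)$.

```latex
\begin{align*}
C_{iT}(X) &= -\int_0^{X} \overline{W}_i\Bigl(1 - \frac{u}{\lambda_i}\Bigr)\,du
           = -\overline{W}_i\Bigl(X - \frac{X^2}{2\lambda_i}\Bigr), \\
C_{iT}'(X) &= -\overline{W}_i\Bigl(1 - \frac{X}{\lambda_i}\Bigr)
            = -g_i\bigl((F^C_i)^{-1}(X/\lambda_i)\bigr), \\
C_{iT}''(X) &= \frac{\overline{W}_i}{\lambda_i} > 0 .
\end{align*}
```

Under these assumptions the derivative equals the negative of the waiting index evaluated at the waiting time that (44) assigns to a total inflow X, and the positive second derivative confirms strict convexity on $[0, \lambda_i]$.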
Theorem 2 below will show that this indeed works. This will imply the following algorithm for
computing a steady-state routing-rate vector r∗ and HOL waiting-time vector W ∗: We compute a
min-convex-cost flow X∗ in G∗ and set r∗ji =X∗ji for j ∈ J and i ∈ I. Then we compute W ∗ from
r∗ via (44).
After we compute W ∗, we can then define E∗ :=E(W ∗) as the arc set with respect to W ∗ using
definition (15). Theorem 2 below shows that this W ∗ is in fact the unique steady-state waiting-time
vector. Since the score change rates of all queues equal zero, condition (12) holds trivially at the
steady state. The steady-state routing-rate vector r∗ can thus be characterized by conditions (4),
(8), and (43). The set of steady-state routing-rate vectors can thus be formulated as the polytope
\[ \Gamma^*(W^*) := \left\{ r \ge 0 \;\middle|\; \begin{array}{ll} \sum_{i\in I} r_{ji} = \mu_j, & \forall j\in J \\ \sum_{j\in J} r_{ji} = \lambda_i F^C_i(W^*_i), & \forall i\in I \\ r_{ji} = 0, & \forall (j,i)\notin E^* \end{array} \right\}. \tag{48} \]
When Γ∗(W ∗) contains multiple members, the proof of Theorem 2 shows that finding one that
maximizes the total matching score, $\sum_{j\in J,\, i\in I} L(j,i)\, r^*_{ji}$, has particular significance, as we show later
in this section. The problem of searching for such an r∗ can be formulated as the linear program
\[ \text{(LP)} \quad \max \sum_{j\in J,\, i\in I} L(j,i)\, r^*_{ji} \quad \text{s.t. } r^* \in \Gamma^*(W^*). \tag{49} \]
We call the solution to the LP in (49) a maximum steady-state routing-rate vector.
The next theorem characterizes W ∗ and a maximum steady-state routing-rate vector r∗ via a
min-convex-cost flow solution X∗ to G∗. It shows that there can be multiple steady-state routing-
rate vectors r∗, but the steady-state HOL waiting time W ∗, and thus the steady-state Q∗, must
be unique.
Theorem 2
1. The fluid process under an M+W index (1) always has a unique steady state $W^*$.
2. A flow $X^* := \{X^*_e \mid e \in E\}$ is a min-convex-cost flow on $G^*$ if and only if $\{X^*_{ji} \mid j \in J, i \in I\}$
is a maximum steady-state routing-rate vector.
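Proof. We first characterize solutions to the min-convex-cost flow problem via KKT points
(i.e., points that satisfy the KKT conditions (51) below). To do this, we formulate the min-convex-cost
flow problem as the convex program
$$\begin{aligned} \min\ & \sum_{i\in I, j\in J} -L(j,i)\,X_{ji} - \sum_{i\in I}\int_0^{\sum_{j\in J} X_{ji}} g_i\big((F_i^C)^{-1}(u/\lambda_i)\big)\,du \\ \text{s.t. } & \sum_{i\in I} X_{ji} = \mu_j, \quad \forall\, j\in J, \\ & X_{ji} \ge 0, \quad \forall\, j\in J,\ i\in I. \end{aligned} \tag{50}$$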
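Let $s_j$ and $\nu_{ji}$ denote the dual variables that correspond to the constraints $\sum_{i\in I} X_{ji} = \mu_j$ and $X_{ji} \ge 0$,
respectively. Then a solution to problem (50) must satisfy the following KKT necessary conditions
(Nocedal and Wright 2006):
$$\begin{array}{lll} \nu_{ji} = s_j - \Big(L(j,i) + g_i\Big((F_i^C)^{-1}\Big(\frac{\sum_{j\in J} X_{ji}}{\lambda_i}\Big)\Big)\Big), & \forall\, j\in J,\ i\in I & \text{first-order condition (FOC)} \\ \sum_{i\in I} X_{ji} = \mu_j,\ \forall\, j\in J; \quad X_{ji} \ge 0, & \forall\, i\in I,\ j\in J & \text{primal feasibility} \\ \nu_{ji} \ge 0, & \forall\, j\in J,\ i\in I & \text{dual feasibility} \\ X_{ji} > 0 \Rightarrow \nu_{ji} = 0, & \forall\, j\in J,\ i\in I & \text{complementary slackness.} \end{array} \tag{51}$$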
In particular, $s_j$ can be interpreted as $\max_{i\in I} s_{ji}$, the highest score with respect to server $j$.
If we define $W^*_i$ from (44), then the above KKT conditions are equivalent to conditions (4), (8),
and (43). Specifically, primal feasibility is equivalent to the budget constraint (4); the FOC, dual
feasibility, and complementary slackness are equivalent to condition (8); and the definition of $W^*_i$
in (44) is equivalent to condition (43). Thus, there is a one-to-one correspondence between a KKT
point and a steady-state routing-rate vector.
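The conditions in (51) can be checked mechanically for a candidate flow. The sketch below is a minimal checker under the illustrative assumption that $g_i((F_i^C)^{-1}(p)) = 1 - p$ (which holds, for instance, when $F_i$ is exponential and $g_i = F_i$). It uses the interpretation $s_j = \max_i s_{ji}$ stated above; the instance data are hypothetical.

```python
def kkt_violations(X, mu, lam, L, tol=1e-9):
    """Return a list of violated KKT conditions from (51) for a candidate flow X[j][i].

    Illustrative assumption: g_i((F_i^C)^{-1}(p)) = 1 - p.
    """
    J, I = len(mu), len(lam)
    inflow = [sum(X[j][i] for j in range(J)) for i in range(I)]   # X_iT
    violations = []
    for j in range(J):
        if abs(sum(X[j]) - mu[j]) > tol or min(X[j]) < -tol:
            violations.append(("primal feasibility", j))
        score = [L[j][i] + 1.0 - inflow[i] / lam[i] for i in range(I)]
        s_j = max(score)                      # dual variable s_j = max_i s_ji
        for i in range(I):
            nu = s_j - score[i]               # FOC defines nu_ji; nu >= 0 by construction
            if X[j][i] > tol and nu > tol:
                violations.append(("complementary slackness", j, i))
    return violations

# Hypothetical two-server, two-queue instance.
mu, lam = [20.0, 10.0], [100.0, 50.0]
L = [[1.0, 0.5], [0.5, 1.0]]
ok = kkt_violations([[20.0, 0.0], [0.0, 10.0]], mu, lam, L)
bad = kkt_violations([[10.0, 10.0], [0.0, 10.0]], mu, lam, L)
```

In the second candidate, server 0 sends flow to queue 1 although queue 0 has a strictly higher score, so complementary slackness fails on that arc.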
We now show that all KKT points $X^*$ have the same value of $\sum_{j\in J} X^*_{ji} = X^*_{iT}$ for all $i \in I$, so
that all KKT points lead to the same $W^*_i$ if constructed as in (44). We prove this by contradiction.
Suppose that $X^1$ and $X^2$ are two optimal min-convex-cost flows with, say, $X^1_{iT} < X^2_{iT}$. To get flow
conservation at all nodes, we consider the extended network with arc $(T,S)$ that carries the total
flow through the network. Network flow theory then says that there is a cycle $D$ in the extended
network containing $(i,T)$ as a forward arc such that $X^1_e < X^2_e$ for all forward arcs of $D$ and $X^1_e > X^2_e$ for
all backward arcs of $D$, and some $\alpha > 0$ such that $X^1 + \beta\chi(D)$ is an optimal flow for all $0 \le \beta \le \alpha$,
implying that
$$C(X^1 + \beta\chi(D)) = 0 \quad \text{for all } 0 \le \beta \le \alpha. \tag{52}$$
Cycle $D$ includes exactly two arcs incident to $T$. One such arc is $(i,T)$. The other arc cannot
be $(T,S)$ as a forward arc, since both $X^1$ and $X^2$ are max flows due to primal feasibility in (51).
Thus $D$ must include an arc $(k,T)$ as a backward arc. Then all arcs in $D$ other than $(i,T)$ and
$(k,T)$ have linear cost. This would imply from (52) that $-C_{iT}(X^1_{iT} + \beta) - C_{kT}(X^1_{kT} - \beta)$ equals
$\sum_{\{e\in D \mid e \ne (i,T),(k,T)\}} c_e(X^1_e + \beta\chi(D)_e)$ for all $0 \le \beta \le \alpha$, i.e., that $C_{iT}(X^1_{iT} + \beta) + C_{kT}(X^1_{kT} - \beta)$ is
linear in this interval, contradicting that $C_{iT}(X_{iT})$ and $C_{kT}(X_{kT})$ are strictly convex. Thus all KKT
points $X^*$ must indeed have the same value for $X^*_{iT}$ for all $i \in I$. Once we finish proving part 2,
this will prove part 1.
To find a min-convex-cost flow, it suffices to search all KKT points to find the one that minimizes
the total cost. Because all the KKT points have the same value of $W^*$, only the first term
$\sum_{i\in I, j\in J} -L(j,i)\,X_{ji}$ varies with the choice of $X$ in (50). Therefore, to find the min-convex-cost
flow, it suffices to select a KKT point that minimizes the first term, or equivalently maximizes the
objective in the LP (49). Since the set of KKT points is exactly the set of feasible routing rates
$\Gamma^*(W^*)$, finding a min-convex-cost flow is equivalent to solving the LP (49), which finishes the
proof of part 2.
Since a min-convex-cost flow can be computed to accuracy $\epsilon$ (i.e., the best solution that is an
integer multiple of $\epsilon$) in time polynomial in $1/\epsilon$ (Ahuja et al. 1993), Theorem 2 provides a useful
technique to compute a maximum steady-state routing-rate vector and the unique steady state.
4.1. Searching for an M+W Index that Optimizes the Steady-State Performance
Given an M+W index, by leveraging Theorem 2, we can find a unique steady state W ∗ and the set
of steady-state routing-rate vectors Γ∗(W ∗). Given W ∗ and any r∗ ∈ Γ∗(W ∗), we can then measure
efficiency (Ef) and fairness (Fa) of the system according to certain metrics, which are usually
functions of W ∗ and r∗. The question in this section is to find an M+W index that optimizes the
steady-state performance of the fluid process with respect to these measurements.
Suppose that we have functions $\mathrm{Ef}_{W^*,r^*}(\infty)$ and $\mathrm{Fa}_{W^*,r^*}(\infty)$ that measure the average system
efficiency and fairness at the steady state, respectively, and that are equivalent to the metrics given
in (36) and (35) when $T \to \infty$. If we use a scalar parameter $\eta$ to combine these functions, then we
can formulate this question as the multi-objective planning problem
$$\begin{aligned} \max\ & \mathrm{Ef}_{W^*,r^*}(\infty) + \eta\,\mathrm{Fa}_{W^*,r^*}(\infty) \\ \text{s.t. } & W^* \text{ is the steady state under a certain M+W index,} \\ & r^* \text{ is the unique steady-state routing-rate vector.} \end{aligned} \tag{53}$$
Theorem 2 implies that there can be multiple steady-state routing-rate vectors in general, so (53)
asks us to search for a particular M+W index under which $\Gamma^*(W^*)$ is a singleton whose only member is the
maximum steady-state routing-rate vector. Thus, we ensure that sub-optimal steady-state routing-rate
vectors would never be used.
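For general functional forms of $\mathrm{Ef}_{W^*,r^*}(\infty)$ and $\mathrm{Fa}_{W^*,r^*}(\infty)$, (53) is likely to be intractable.
However, for a special class of functions $\mathrm{Ef}_{W^*,r^*}(\infty)$ and $\mathrm{Fa}_{W^*,r^*}(\infty)$, a closed-form M+W index can
be derived under which $(W^*, r^*)$ solves the optimization problem (53). Specifically, we consider the
following efficiency and fairness measures, where $\mathrm{Ef}_{W^*,r^*}(\infty)$ solely depends on $r^*$ and $\mathrm{Fa}_{W^*,r^*}(\infty)$
solely depends on $W^*$:
$$\mathrm{Ef}_{r^*}(\infty) = \sum_{j\in J,\, i\in I} U(j,i)\,r^*_{ji}, \qquad \mathrm{Fa}_{W^*}(\infty) = -\sum_{i\in I} \frac{\lambda_i}{\lambda}\Big(F_i^C(W_i^*) - \frac{\mu}{\lambda}\Big)^2. \tag{54}$$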
Note that the function $\mathrm{Ef}_{r^*}(\infty)$ calculates the average matching utility at the steady state, and
$\mathrm{Fa}_{W^*}(\infty)$ measures the negative of the variance in the likelihood of getting service before abandoning the queue, in
which $\mu/\lambda$ represents the mean likelihood of getting service.
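The two measures in (54) are straightforward to evaluate once a routing-rate matrix is available. The sketch below is illustrative only: it uses the stationarity relation $\lambda_i F_i^C(W_i^*) = \sum_j r^*_{ji}$ from (48) to express each queue's service likelihood through $r^*$, and it assumes $\lambda = \sum_i \lambda_i$ and $\mu = \sum_j \mu_j$. The instance data are hypothetical.

```python
def steady_state_metrics(r, U, lam):
    """Efficiency and fairness in (54) for a routing-rate matrix r[j][i].

    The service likelihood of queue i is taken to be sum_j r_ji / lam_i,
    following the stationarity relation in (48).
    """
    J, I = len(r), len(lam)
    lam_total = sum(lam)
    mu_total = sum(sum(row) for row in r)          # all capacity is used when overloaded
    efficiency = sum(U[j][i] * r[j][i] for j in range(J) for i in range(I))
    likelihood = [sum(r[j][i] for j in range(J)) / lam[i] for i in range(I)]
    mean_likelihood = mu_total / lam_total
    fairness = -sum(lam[i] / lam_total * (likelihood[i] - mean_likelihood) ** 2
                    for i in range(I))
    return efficiency, fairness

# Hypothetical instance: two servers, two queues.
U = [[1.0, 0.5], [0.5, 1.0]]
lam = [100.0, 50.0]
ef_even, fa_even = steady_state_metrics([[20.0, 0.0], [0.0, 10.0]], U, lam)
ef_skew, fa_skew = steady_state_metrics([[10.0, 10.0], [0.0, 10.0]], U, lam)
```

The first routing serves both queues with the same likelihood, so its fairness value is zero; the second routing is both less efficient and less fair in this example.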
We solve the multi-objective planning problem (53) in two steps.
Step 1. Solve an auxiliary problem: If we relax the second constraint in (53) by allowing multiple
feasible routing-rate vectors at the steady state, we obtain an auxiliary problem
$$\begin{aligned} \max\ & \mathrm{Ef}_{r^*}(\infty) + \eta\,\mathrm{Fa}_{W^*}(\infty) \\ \text{s.t. } & W^* \text{ is the steady state under a certain M+W index,} \\ & r^* \text{ is a steady-state routing-rate vector.} \end{aligned} \tag{55}$$
The following proposition derives a closed-form M+W index at which $W^*$ and $r^*$ solve the above
auxiliary problem.
Proposition 5 Let $W^*$ and $r^*$ denote the steady state and the maximum steady-state routing-rate
vector of the fluid process under the M+W index
$$s_{ji} = U(j,i) + \eta\,\frac{2}{\lambda}\,F_i(\mathrm{WT}). \tag{56}$$
Then $(W^*, r^*)$ solves (55).
Proof. We establish an equivalence between the auxiliary problem (55) and the min-convex-
cost flow problem on G∗ so that we can apply Theorem 2 to construct a steady-state routing-rate
vector which solves the auxiliary problem.
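If $(W^*, r^*)$ is a feasible solution to the auxiliary problem (55), it must satisfy the stationarity
condition (43). Plugging equation (43) into the expression of $\mathrm{Fa}^*(\cdot)$ in (35) leads to
$$\mathrm{Fa}^*(W^*) = -\sum_{i\in I} \frac{\lambda_i}{\lambda}\Big(\frac{\sum_{j\in J} r^*_{ji}}{\lambda_i} - \frac{\mu}{\lambda}\Big)^2 = -\sum_{i\in I} \frac{\big(\sum_{j\in J} r^*_{ji}\big)^2}{\lambda\lambda_i} + \Big(\frac{\mu}{\lambda}\Big)^2. \tag{57}$$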
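For readers who want the intermediate step in (57), the expansion can be written out as follows. This is a sketch under the assumptions that $\lambda = \sum_{i\in I}\lambda_i$, that $\mu = \sum_{j\in J}\mu_j$, and that all capacity is used so that $\sum_{i\in I} x_i = \mu$, where $x_i := \sum_{j\in J} r^*_{ji}$ is shorthand introduced here.

```latex
\begin{align*}
-\sum_{i\in I}\frac{\lambda_i}{\lambda}\Big(\frac{x_i}{\lambda_i}-\frac{\mu}{\lambda}\Big)^2
&= -\sum_{i\in I}\frac{x_i^2}{\lambda\lambda_i}
   +\frac{2\mu}{\lambda^2}\sum_{i\in I}x_i
   -\frac{\mu^2}{\lambda^3}\sum_{i\in I}\lambda_i \\
&= -\sum_{i\in I}\frac{x_i^2}{\lambda\lambda_i}
   +\frac{2\mu^2}{\lambda^2}-\frac{\mu^2}{\lambda^2}
 = -\sum_{i\in I}\frac{x_i^2}{\lambda\lambda_i}+\Big(\frac{\mu}{\lambda}\Big)^2 .
\end{align*}
```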
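Removing the constant term $(\mu/\lambda)^2$ from the objective function of (55) will not change the optimal
solution. Thus, optimizing (55) is equivalent to solving the following problem
$$\begin{aligned} \max\ & \sum_{j\in J,\, i\in I} r^*_{ji}U_{ji} - \sum_{i\in I} \frac{\big(\sum_{j\in J} r^*_{ji}\big)^2}{\lambda\lambda_i} \\ \text{s.t. } & r^* \text{ is a steady-state routing-rate vector.} \end{aligned} \tag{58}$$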
The constraint of the above problem requires r∗ to satisfy conditions (4), (8), and (43). Condition
(43) holds trivially because we have plugged (43) into the objective (58) to make it a function of
r∗. By removing constraint (8), we obtain the following relaxation of (58),
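$$\begin{aligned} \max\ & \sum_{j\in J,\, i\in I} r^*_{ji}U_{ji} - \sum_{i\in I} \frac{\big(\sum_{j\in J} r^*_{ji}\big)^2}{\lambda\lambda_i} \\ \text{s.t. } & \sum_{i\in I} r^*_{ji} = \mu_j, \quad \forall\, j\in J. \end{aligned} \tag{59}$$
If we define $C_{Sj} = 0$, $C_{ji} = -U_{ji}$, and $C_{iT}(x) = \frac{x^2}{\lambda\lambda_i}$ for all $j \in J$, $i \in I$, then solving (59) is equivalent
to solving a min-convex-cost flow on network $G^*(V, E, u^*, C)$, where $V$, $E$, and $u$ are constructed
according to (14), (45), and (46).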
Note that by defining $g_i(\mathrm{WT}) = -\frac{2}{\lambda}F_i^C(\mathrm{WT})$, the above cost function has the alternative
expression, for all $i \in I$,
$$C_{iT}(x) = \frac{x^2}{\lambda\lambda_i} = -\int_0^x \Big(-\frac{2}{\lambda}\Big)\frac{u}{\lambda_i}\,du = -\int_0^x g_i\Big((F_i^C)^{-1}\Big(\frac{u}{\lambda_i}\Big)\Big)\,du, \tag{60}$$
which coincides with the cost function we defined in (47). Then by invoking Theorem 2, the
min-convex-cost flow on $G^*$ gives a feasible routing-rate vector $r^*$ associated with some steady state
$W^*$ under the following M+W index
$$L(j,i) + \eta\,g_i(\mathrm{WT}) = L(j,i) - \eta\,\frac{2}{\lambda}F_i^C(\mathrm{WT}). \tag{61}$$
Note that the above M+W index is equivalent to the M+W index (56) because, with $F_i^C = 1 - F_i$, the two M+W indices
differ only by an additive constant. We propose to use (56) instead of (61) to comply with the assumption
$g_i(0) = 0$.
Thus we have proved that (i) a min-convex-cost flow on $G^*$ provides an optimal solution to the
relaxed problem (59); and (ii) the min-convex-cost flow corresponds to a steady-state routing-rate
vector $r^*$ under the M+W index (56). Therefore we deduce that $r^*$ maximizes the problem (59),
which is a relaxation of (58). From (ii), we further deduce that $r^*$ is a feasible solution to problem
(58). Thus, $r^*$ must be an optimal solution to problem (58), and to the equivalent optimization
problem (55).
Step 2. Removing Sub-optimal r∗s: Using the M+W index (56), one may obtain steady state W ∗
and r∗ ∈ Γ∗(W ∗) which solves the auxiliary problem (55). However, Γ∗(W ∗) may contain multiple
feasible routing-rate vectors, some of which might be sub-optimal in maximizing Efr∗(∞). The
selection of r∗ depends on the stochastic nature of the system (see discussions in Talreja and Whitt
(2008)), so it is not guaranteed that the optimal r∗ would be picked by the fluid process.
To address this issue, we modify the index formula (56) so that under the revised M+W index,
the maximum steady-state routing-rate vector will be the unique steady-state routing-rate vector.
Let $r^{\mathrm{mfs}}$ denote a minimal-facet solution to the LP (49), which means that $r^{\mathrm{mfs}}$ is an extreme point
of the polytope $\Gamma^*(W^*)$. Given $r^{\mathrm{mfs}}$, we construct a $J$ by $I$ perturbation matrix $\Delta$ with
$$\Delta(j,i) = \begin{cases} -\epsilon & \text{if } r^{\mathrm{mfs}}_{ji} = 0 \text{ and } (j,i) \in E^*, \\ 0 & \text{otherwise,} \end{cases} \tag{62}$$
where $\epsilon$ is a positive constant. After changing the matching-score matrix from $U$ to $U + \Delta$, any arc
$(j,i)$ that is not used by $r^{\mathrm{mfs}}$ has to be removed, because the score of queue $i$ with respect to server
$j$ has decreased by $\epsilon$. By doing that, we make sure that $r^{\mathrm{mfs}}$ is the unique steady-state routing-rate
vector under the modified index.
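The construction in (62) can be sketched in a few lines. The function name, the tolerance used to decide when a rate counts as zero, and the example data are assumptions made for illustration.

```python
def perturbation_matrix(r_mfs, arcs, eps=1e-3, tol=1e-12):
    """Build Delta from (62): -eps on arcs of E* that r_mfs leaves unused, 0 elsewhere.

    r_mfs[j][i] is a routing-rate matrix and `arcs` is the set E* of (j, i) pairs.
    """
    J, I = len(r_mfs), len(r_mfs[0])
    return [[-eps if (j, i) in arcs and abs(r_mfs[j][i]) <= tol else 0.0
             for i in range(I)] for j in range(J)]

# Hypothetical instance: server 0 may serve both queues but only uses queue 0.
r_mfs = [[20.0, 0.0], [0.0, 10.0]]
E_star = {(0, 0), (0, 1), (1, 1)}
delta = perturbation_matrix(r_mfs, E_star, eps=0.01)
```

Only the arc (0, 1) is penalized here: it belongs to $E^*$ but carries no flow. The pair (1, 0) is left at zero because it is not in $E^*$ to begin with.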
Theorem 3 Suppose $W^*$ is the steady state of the fluid process under the M+W index (56) and
$r^{\mathrm{mfs}}$ denotes a minimal-facet solution to the LP (49). Define $\Delta$ according to equation (62), and
define a new M+W index
$$s_{ji} = U(j,i) + \Delta(j,i) + \eta\,\frac{2}{\lambda}\,F_i(\mathrm{WT}). \tag{63}$$
Then $(W^*, r^{\mathrm{mfs}})$ are the unique steady-state HOL waiting-time and routing-rate vectors under the
above M+W index, and thus solve the multi-objective planning problem (53).
Proof. Since W ∗ is the steady state under the M+W index (56) and rmfs is the associated
maximum steady-state routing-rate vector, by Proposition 5 they must solve the auxiliary problem
(55), which is a relaxation of the original problem (53). We next show that (W ∗, rmfs) is feasible to
(53), and thus must also solve (53).
We first show that $(W^*, r^{\mathrm{mfs}})$ are the HOL waiting-time and steady-state routing-rate vectors
under the above M+W index. Suppose $r^{\mathrm{mfs}}_{ji} > 0$, which implies that $i$ is in the active set of server
$j$. For all other $i' \ne i$ in the active set of server $j$, their scores $s_{j,i'}$ can only decrease after using the
matching score $U + \Delta$ rather than $U$. Thus, if $r^{\mathrm{mfs}}_{ji} > 0$, then queue $i$ has to remain in the active set
of server $j$ under the new index. Thus, $r^{\mathrm{mfs}}$ and $W^*$ still satisfy condition (8), which says server $j$
only serves queues in its active set. Since the values of $r^{\mathrm{mfs}}$ are not changed, conditions (4) and (43)
remain valid. Thus, $r^{\mathrm{mfs}}$ satisfies conditions (4), (8), and (12), and is a steady-state routing-rate
vector. Condition (43) also implies that $W^*$ is a steady state under the new index, which is also
unique by Theorem 2.
It remains to show that $r^{\mathrm{mfs}}$ is the unique steady-state routing-rate vector under the new index
(63). First, we note that the new arc set under (63) is given by
$$\tilde{E}^* := E^* \setminus \{(j,i) \in E^* \mid r^{\mathrm{mfs}}_{ji} = 0\}. \tag{64}$$
Suppose $r^{\mathrm{mfs}} + \Delta r$ is another steady-state routing-rate vector associated with $(s_{ji})$ for some
$\Delta r := (\Delta r_{ji})_{j\in J,\, i\in I} \ne 0$. If $r^{\mathrm{mfs}}_{ji} = 0$ for some $(j,i)$, then we know $(j,i) \notin \tilde{E}^*$, so $\Delta r_{ji} = 0$. If $r^{\mathrm{mfs}}_{ji} = \mu_j$,
then $r^{\mathrm{mfs}}_{ji'} = 0$ for all $i' \ne i$, and thus $(j,i') \notin \tilde{E}^*$. Thus, $(j,i)$ is the only arc in $\tilde{E}^*$ leaving $j$. Thus,
if $r^{\mathrm{mfs}}_{ji} = \mu_j$, we must have $r^{\mathrm{mfs}}_{ji} + \Delta r_{ji} = \mu_j$, which implies that $\Delta r_{ji} = 0$. Thus, all inequalities
that are binding for $r^{\mathrm{mfs}}$ will still be binding for $r^{\mathrm{mfs}} + t\Delta r$, so $r^{\mathrm{mfs}} + t\Delta r$ is on the minimal facet
which contains $r^{\mathrm{mfs}}$. By choosing $|t|$ sufficiently small, we ensure that $r^{\mathrm{mfs}} + t\Delta r$ lies in the relative
interior of the minimal facet. If $\sum_{ji} U_{ji}\Delta r_{ji} \ne 0$, then for either $t > 0$ or $t < 0$, $r^{\mathrm{mfs}} + t\Delta r$ improves the
objective of the LP, which contradicts the optimality of $r^{\mathrm{mfs}}$; if $\sum_{ji} U_{ji}\Delta r_{ji} = 0$, then $r^{\mathrm{mfs}} \pm t\Delta r$
are both maximizers of the LP for sufficiently small $t > 0$. Since $r^{\mathrm{mfs}}$ can be expressed as a convex
combination of $r^{\mathrm{mfs}} \pm t\Delta r$, this contradicts that $r^{\mathrm{mfs}}$ is an extreme point of the polytope.
4.2. Convergence to Steady State
Finally, we prove that the fluid process will eventually converge to the unique steady state when
the pdf of the abandonment time $f_i(\cdot)$ has compact support $[0, \bar{W}_i]$ for some $\bar{W}_i < \infty$. Therefore
it is reasonable to believe that the steady state provides an approximation to the long-run average
performance of the fluid process.
Theorem 4 Suppose $\bar{W}_i < \infty$ for all $i \in I$. Then $W(t) \to W^*$ as $t \to \infty$, where $W^*$ denotes the
steady state constructed as in Theorem 2.
The proof of this theorem is in Appendix EC.6.
5. Case Study: Public Housing Assignment in Pittsburgh
In this section we return to one of our motivating examples, i.e., the assignment of public housing
in cities, in order to demonstrate the usefulness of our model in practice. Considering this example
will allow us to show that:
1. The fluid approximation is close to the underlying stochastic system.
2. Our model makes it easy to try out different scoring policies.
3. It is easy to try out the policy implications of different wait time penalties.
5.1. Overview of the example
Most major North American cities provide affordable housing options for eligible low-income families,
seniors, and persons with disabilities.³ The housing authority (HA) in each city is responsible
for the maintenance and assignment of such houses. Currently, such programs accommodate
approximately 1.2 million households,⁴ but the supply is far from large enough to meet all the
demand for housing. Applicants to such programs typically have strict constraints on the housing
options they prefer based on location, the community type, and features of the housing unit such as
the number of rooms etc., which exacerbates the shortage for some types of housing. For example,
the waiting time for single-bedroom apartments was reported to be around 28 years in the District
of Columbia (Stockton 2013).
We illustrate the use of the tools we developed in this paper using data from the Housing
Authority of the City of Pittsburgh (HACP) that was gathered by Geyer and Sieg (2013). The
HACP requires applicants to specify their preference among the available types of units, and among
the communities where public housing is available. Houses are allotted based on availability in the
preferred community and according to the applicants’ registration dates when joining the waitlist
for housing. Therefore the waitlist can be effectively partitioned into multiple queues according to
the applicants’ selections. If we assume that the priority is first-come-first-served in each queue,⁵
then the waitlist system is a BQS.
We take the following steps to study the optimal scoring policies that the HACP could use to
allocate their houses. First, Section 5.2 estimates parameters of this BQS using historical data that
was obtained from the HACP and the data from the Survey of Income and Program Participation
(SIPP). Next, Section 5.3 calculates the fluid limit process for this BQS using the tools developed
in this paper. We compare the fluid limit process to the stochastic process that is simulated using
the same set of parameters. The comparison shows that the fluid model's predictions track the
simulated sample paths closely. Lastly, Section 5.4 shows that the predictions of our fluid model make it easy to
evaluate different scoring rules with respect to various performance measures.
5.2. Parameter Estimation
Geyer and Sieg (2013) developed a discrete-choice model to predict how low-income households
select their housing community by studying the public housing allocation system of the HACP.
Although our objective in this paper is different than theirs, we use data sets from their study and
calibrate many useful system parameters based on estimates reported in their study. They iden-
tified 34 community types in the city of Pittsburgh, and further categorized them into six broad
community types, each of which consists of fairly homogeneous housing units. To make our BQS
even simpler, we consider only three of these six broad community types, denoted by PH1, PH2,
and PH3. The majority of the housing units in these three broad types are offered to non-senior
families, so we can focus on these three types independently from the other types.
Applicants specify a first choice among the three types, and are encouraged (but not required)
to add a second choice. An applicant’s choice profile can then be represented by a single number
or an ordered pair. For example, choice profile [12] means that the applicant’s first choice is PH1
and second choice is PH2, and choice profile [2] means that the applicant’s first choice is PH2
and they did not specify a second choice.
The parameters we estimated are summarized in Table 1, and the technical details of the param-
eter estimation are in Appendix EC.7. Some of these estimates are ad-hoc, but this case study
serves the purpose of illustrating the application of the tools we developed.
Table 1 Summary of Estimated Parameters

Choice Profile (Queue) i   [12]    [13]    [1]     [21]    [23]    [2]     [31]    [32]    [3]
Probability                0.1013  0.0793  0.049   0.1318  0.1939  0.1061  0.0906  0.1703  0.0776

Community Type j           PH1     PH2     PH3
Service Rates µj           118.9   29.5    9.1

Time Window                Quarter 1 and 2   Later
Total Arrival Rate λ       1232              443

Reneging Rate d            0.07

Matching Utility U(·, ·)
Queues   [12]   [13]   [1]    [21]   [23]   [2]    [31]   [32]   [3]
PH1      1      1      1      0.66   -100   -100   0.66   -100   -100
PH2      0.66   -100   -100   1      1      1      -100   0.66   -100
PH3      -100   0.66   -100   -100   0.66   -100   1      1      1
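A few derived quantities follow directly from Table 1. The sketch below assumes that each queue's arrival rate is the total arrival rate multiplied by the choice-profile probability; that reading of the table is an assumption made here, and the variable names are illustrative.

```python
queues = ["[12]", "[13]", "[1]", "[21]", "[23]", "[2]", "[31]", "[32]", "[3]"]
probability = [0.1013, 0.0793, 0.049, 0.1318, 0.1939, 0.1061, 0.0906, 0.1703, 0.0776]
service_rate = {"PH1": 118.9, "PH2": 29.5, "PH3": 9.1}
total_arrival_rate = {"quarters 1-2": 1232.0, "later": 443.0}

def queue_arrival_rates(total_rate):
    """Split the total arrival rate across queues by choice-profile probability
    (assumed interpretation of Table 1)."""
    return {q: total_rate * p for q, p in zip(queues, probability)}

total_service = sum(service_rate.values())
later_rates = queue_arrival_rates(total_arrival_rate["later"])
# Ratio of total service rate to total arrival rate, i.e. mu / lambda.
served_fraction_later = total_service / total_arrival_rate["later"]
```

With these numbers the total service rate is well below the total arrival rate in both time windows, which is consistent with the overloaded regime studied in the paper.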
5.3. Comparison of the Fluid Approximation and the Stochastic Process
A fluid approximation to the waitlist system assumes that both applicants and housing units arrive
according to deterministic processes. Using the machinery developed in Section 3, we can calculate
the HOL waiting times in the fluid model using the parameters reported in Table 1. To see whether
the fluid approximation accurately represents the underlying stochastic process, we
compare the HOL waiting times predicted by the fluid model to those in a simulated stochastic BQS
using the same set of parameters and assuming that all arrival processes are Poisson. We simulate
the HOL waiting times of this stochastic BQS by bootstrapping each applicant’s attributes from
the waitlist pool and predicting their choice profiles.
We assume that applicants on the waitlist are ranked by scoring formula (56), where the weight
η is set to 50. According to Proposition 5, this scoring formula optimizes the steady-state performance
with respect to the objective function (53), where efficiency (Ef) and fairness (Fa) are
defined according to equations (36) and (35), respectively. Figure 6 plots two sets of HOL waiting
time curves over a five-year horizon (20 quarters). One set are predictions made by the fluid model,
and the other set are simulated sample paths for the stochastic BQS. The two sets of curves stay
fairly close and the fluid process captures the behavior of the simulated sample paths quite well.
In the first period, each community PHi only serves the queues that indicated PHi as their first
choice. However, this routing configuration changes quickly because scores of queues in routing
components of PH2 and PH3 increase at a higher speed than those of other queues due to the
acute shortage of housing units in PH2 and PH3. Consequently, a subset of queues served by PH2
and PH3 catch up with queues served by PH1 in their scores, and get merged into the component
of PH1. For example, at T3, queues [32] and [3] in the faster component are merged into the slower
component of PH1, which leads to the breakpoints in the waiting-time curves. We plot the curves
for five years, which covers all switch points, though it takes fifteen years in total for the fluid
process to reach the steady state. The change in waiting times is smaller than .00001 after 15
years, indicating convergence. The routing configurations in different time periods are depicted in
Figure 7.
We repeat the simulation 100 times and calculate the mean absolute deviation (MAD) of the
HOL waiting times in each queue. The MADs range from 0.097 to 0.116 quarters (roughly
8–10 days), so the fluid approximation is accurate to within about ten days. We
also observe that the fluid approximation remains accurate even for queues with thin service rates,
such as queues [32] and [3], whose throughput rate is less than 10 per quarter. This demonstrates
that fluid models for overloaded systems were quite robust in our experiments.
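The exact MAD formula is not spelled out in this section. One plausible reading, sketched below, averages the absolute gap between each simulated sample path and the fluid prediction over all replications and time points for a given queue. The data in the example are hypothetical and are not results from the paper.

```python
def mean_absolute_deviation(fluid_path, simulated_paths):
    """Average |simulated - fluid| HOL waiting time over replications and time points."""
    total, count = 0.0, 0
    for path in simulated_paths:
        for w_fluid, w_sim in zip(fluid_path, path):
            total += abs(w_sim - w_fluid)
            count += 1
    return total / count

# Hypothetical numbers for one queue at three time points (in quarters).
fluid = [1.0, 2.0, 3.0]
simulated = [[1.2, 1.8, 3.1], [0.9, 2.2, 3.0]]
mad_quarters = mean_absolute_deviation(fluid, simulated)
```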
5.4. Policy Evaluation
Our fluid model enables the housing authority to predict the properties of the system when housing
allocations are made according to any scoring policy of the form (1) discussed in this paper.
In practice, most housing authorities try to match the applicants’ choice of community types
to the available capacity, but an unintended consequence is that some community types might
[Figure 6 plot: HOL Waiting Time (Quarter) against Time (Quarter) over 0 to 20 quarters, with switch times T1, T2, T3 marked and one curve per queue for the fluid model (F) and for the simulation (S).]
Figure 6 HOL Waiting Times for the nine queues predicted by Fluid Model (F) versus a Simulated Stochastic
Process (S). Note that some queues, such as queues [32] and [3], have the same HOL waiting-time
curves in the fluid process. T1, T2, T3 are switch times at which the routing components change. Other
kinks on the curve are due to discontinuity of the arrival rate functions λi(t).
require a much longer waiting time than others. In practice, the queue length or waiting time is
usually unobserved by the applicants. As a result, even applicants who are somewhat indifferent
to the community type might have to wait much longer than the other applicants due to their
unintentionally selecting a housing type that corresponds to a longer queue. Such scenarios can be
minimized if a housing authority actively tries to reduce the waiting-time disparity across different
queues while aiming to match applicants’ choices. To demonstrate how an HA can accomplish this,
in our housing assignment example we assume that HACP uses a multi-objective function in the
form of (53), with efficiency (Ef) and fairness (Fa) measured by (36) and (35), respectively. We use
the optimal index formula for an infinite horizon problem from (56) as the benchmark policy for
our finite horizon problems for tractability. We then compare the performance of (56) to several
alternative formulas.
[Figure 7 panels: each panel shows the bipartite routing graph between communities PH1, PH2, PH3 and queues [12], [13], [1], [21], [23], [2], [31], [32], [3].]
(a) Initially, houses in each community type are allocated only to applicants whose first choice is matched.
(b) After T1 = 2.67, queue [31] starts to be served by PH1 instead of PH3.
(c) After T2 = 3.09, queue [21] starts to be served by PH1 instead of PH2.
(d) After T3 = 19.15, the routing components of PH1 and PH3 merge into a single routing component.
Figure 7 Routing configurations in each period
The matching utility matrix $U$ is estimated in Appendix EC.7. By looking into the
fluid process, we can easily identify the routing components and the feasible routing-rate vectors that exist
in each time period. For example, as shown in Figure 7(d), after T3, the corresponding residual
network contains a cycle PH1 → [32] → PH3 → [3] → PH1. We could construct alternative feasible
routing rates by pushing a nonzero flow around such cycles, which would not change the matching
utility because the utility around these cycles sums to zero. In other words, different routing rates
lead to the same expected matching utility $\sum_{j\in J,\, i\in I} r_{ji}U_{ji}$. Thus, in this example there is no need to
perturb the matching score $L$. Indeed, our numerical experiments show that different perturbations
of $L$ lead to similar performance with respect to the efficiency and fairness measurements (36) and
(35).
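The zero-sum claim for the cycle PH1 → [32] → PH3 → [3] → PH1 can be checked against the entries of Table 1. In the sketch below, arcs from a community to a queue are treated as forward arcs (flow increases) and arcs from a queue back to a community as backward arcs (flow decreases); the helper name is illustrative.

```python
U = {  # matching utilities copied from Table 1, keyed by (community, queue)
    ("PH1", "[32]"): -100.0, ("PH3", "[32]"): 1.0,
    ("PH3", "[3]"): 1.0,     ("PH1", "[3]"): -100.0,
}

def utility_change_around_cycle(forward_arcs, backward_arcs, delta=1.0):
    """Change in sum_ji r_ji * U_ji when delta units of flow are pushed around a cycle:
    flow rises on forward arcs and falls on backward arcs."""
    gain = sum(U[a] for a in forward_arcs)
    loss = sum(U[a] for a in backward_arcs)
    return delta * (gain - loss)

# Cycle PH1 -> [32] -> PH3 -> [3] -> PH1 in the residual network.
change = utility_change_around_cycle(
    forward_arcs=[("PH1", "[32]"), ("PH3", "[3]")],
    backward_arcs=[("PH3", "[32]"), ("PH1", "[3]")],
)
```

The gains and losses cancel exactly for this cycle, so rerouting along it leaves the total matching utility unchanged.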
We next fix the matching score to be L = U and test the following parametric forms of the
waiting score:
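$$g_i(\mathrm{WT}) = \mathrm{WT}, \qquad g_i(\mathrm{WT}) = F_i(\mathrm{WT}) = 1 - \exp(-d\,\mathrm{WT}), \qquad g_i(\mathrm{WT}) = (\mathrm{WT})^2, \tag{65}$$
which represent linear, concave, and convex waiting scores, respectively. For a given weight η, we calculate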
the corresponding fluid process over a 20-quarter horizon. We measure efficiency using the average
matching utility (36), but use two different fairness measurements. One is the negative variance
in likelihood of getting service given by (35), and the second is the minimum likelihood of getting
service across all customers, which can be calculated by
$$\min_{i\in I} \frac{\int_0^T \sum_{j\in J} r_{ji}(t)\,dt}{\int_0^T \lambda_i(t)\,dt}. \tag{66}$$
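The three waiting-score forms in (65) and the minimum-likelihood measure (66) can be sketched as follows. The sketch assumes the rates are sampled on an equally spaced time grid and integrates them with the trapezoid rule. The sampled rates are hypothetical, while the value of d is the reneging rate from Table 1.

```python
import math

D = 0.07  # reneging rate d from Table 1

WAITING_SCORES = {            # the three forms in (65)
    "linear":  lambda wt: wt,
    "concave": lambda wt: 1.0 - math.exp(-D * wt),
    "convex":  lambda wt: wt ** 2,
}

def trapezoid(values, dt):
    """Integral of equally spaced samples by the trapezoid rule."""
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))

def min_service_likelihood(served_rate, arrival_rate, dt):
    """Fairness measure (66): min over queues of served fluid / arrived fluid on [0, T].

    served_rate[i][k] approximates sum_j r_ji(t_k); arrival_rate[i][k] is lambda_i(t_k).
    """
    return min(trapezoid(s, dt) / trapezoid(a, dt)
               for s, a in zip(served_rate, arrival_rate))

# Hypothetical sampled rates for two queues at t = 0, 1, 2 quarters.
served = [[10.0, 10.0, 10.0], [2.0, 4.0, 6.0]]
arrived = [[50.0, 50.0, 50.0], [40.0, 40.0, 40.0]]
fairness = min_service_likelihood(served, arrived, dt=1.0)
```

In this example the second queue has the smaller service likelihood, so it alone determines the fairness value.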
By sampling different values of η, we obtain the Pareto frontiers corresponding to the three types
of waiting scores and plot them on the efficiency-fairness spectrum in Figure 8, where Figure 8(a)
uses the negative variance as the fairness measure, and Figure 8(b) uses the minimum likelihood as the fairness
measure. For the first fairness measure, we have a theoretical characterization of the optimal
M+W index formula for the steady state, while the second fairness measure is much simpler and
possibly more practical in real life. Figure 8(a) shows that the status quo, $g_i(\mathrm{WT}) = F_i(\mathrm{WT}) =
1 - \exp(-d\,\mathrm{WT})$, outperforms the other two forms of waiting scores with respect to the first fairness
measure; however, if the second fairness measure is used, then the convex waiting score $g_i(\mathrm{WT}) =
(\mathrm{WT})^2$ dominates the other two. Intuitively, if one wishes to lower bound
the likelihood of getting an offer, then one may consider using a convex waiting score to prioritize
queues with the smallest likelihood.
In practice, the policy designer may wish to maximize the efficiency measure solely by taking the
fairness measure as a constraint, for example, require customers in all queues to have a probability
of at least 10% to get service. Then one may look up η at which the Pareto frontier intersects the
line fairness= 10% and obtain the corresponding M+W index.
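The lookup described above can be sketched as a linear interpolation between sampled frontier points. Everything in this example is hypothetical: the function name, the sampled weights, and the fairness values are placeholders, not results from the paper.

```python
def eta_at_fairness(etas, fairness_values, target):
    """Linearly interpolate the weight eta at which fairness first reaches `target`.

    Assumes the samples are ordered by increasing eta. Returns None if the
    target is never crossed between consecutive samples.
    """
    pairs = list(zip(etas, fairness_values))
    for (e0, f0), (e1, f1) in zip(pairs, pairs[1:]):
        if (f0 - target) * (f1 - target) <= 0 and f0 != f1:
            return e0 + (target - f0) * (e1 - e0) / (f1 - f0)
    return None

# Hypothetical frontier samples (not results from the paper).
etas = [0.0, 25.0, 50.0, 100.0]
min_likelihood = [0.06, 0.08, 0.12, 0.13]
eta_star = eta_at_fairness(etas, min_likelihood, target=0.10)
```

The returned weight can then be substituted into the chosen index formula. A finer grid of η values near the crossing would reduce the interpolation error.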
[Figure 8 plots: Fairness against Efficiency at Quarter = 20 for the three waiting scores 1 − e^(−0.07 WT), WT, and WT^2.]
(a) Fairness metric used: negative variance in the probability of getting an offer
(b) Fairness metric used: minimum probability of getting an offer
Figure 8 The Pareto frontiers when L=U and gi(WT) takes the three forms in (65)
6. Discussions
In this paper we propose a mechanism for allocating scarce resources, namely the M+W indexing
policy, which generalizes several well-known ranking policies, such as FCFS, static priority, and
dynamic priority. The M+W indexing policy is particularly well-suited to contexts where public
goods are allocated to several types of customers, and there are not enough servers to satisfy all
demand. The ranking criteria used by an M+W index are restricted to waiting time and the degree
of matching, because the use of other factors, such as queue length, might be criticized as being
unfair for public goods. For example, waiting time and blood/tissue type compatibility are
the main criteria used in the current policy for allocating cadaver kidneys. Given that the
M+W index represents an important class of priority mechanisms, we believe that it deserves more
attention from researchers in queueing and applied probability. The fluid model presented in this
paper is a first step toward investigating this policy.
The analysis of the fluid model is complicated by both the combinatorial and dynamic nature of
the bipartite queueing system. A key observation is that the fluid process can be characterized as
solutions to ODEs in time intervals where the routing components remain invariant. To identify
the routing components, we utilize two classical combinatorial results: the nested-cut structure
of parametric min cut with SSM capacities, and the existence of a linear complementary pair.
The latter was used to resolve the case when multiple primitive components have the same score-
increment rates.
To extend our model to cover other practical applications, we intend to consider the following
extensions in the future: (1) an overloaded BQS in which each candidate has the autonomy to accept
or reject a resource when it is offered; (2) a double-ended BQS in which each queue has either excessive
demand fluid or excessive supply fluid; (3) a BQS in which different server-customer combinations
lead to different service speeds instead of utility. Regarding (1), Moulin and Sethuraman (2013) and
Luss (1999) have studied the resource rationing problem in a single period in presence of customer
choice. However, we are not aware of any literature that has discussed a similar problem in the
queueing context. In fact, modeling customers’ dynamic choices can be challenging. In reference to
(2), when the priority rule is FCFS Adan andWeiss (2012) proved that the steady-state distribution
of the Markovian process has a product form, and Afeche et al. (2014) derived
closed-form results for a double-ended FCFS queue with batch arrivals. It is not clear whether
the closed-form results in these works can be generalized to the SDSS case. Model (3) is usually
discussed in the context of skill-based routing in a call center (Wallace and Whitt 2005, Gurvich
and Ward 2012). Although mathematically it is tempting to study the application of SDSS to
systems as described in (3), it is not clear whether SDSS is appealing to call-center managers given
that it is less flexible but more transparent than a dynamic routing policy.
Endnotes
1. A real-valued function f(x) is real analytic at x0 if there is a neighborhood of x0 in which the
Taylor series $\sum_{n=0}^{\infty} \frac{1}{n!} f^{(n)}(x_0)(x - x_0)^n$ converges to f(x) pointwise.
2. Suppose B is a discrete set. A correspondence E : [0,∞)→B is upper hemicontinuous at t if
there exists a neighborhood A of t such that for all t′ ∈A, E(t′)⊆E(t).
3. The rent for public housing is usually around 30% of the renter’s gross income. This can vary
depending on the size of the residence and the type of the community.
4. http://portal.hud.gov/hudportal/HUD?src=/topics/rental_assistance/phprog
5. In fact, most households with the same selection of community types will be sequenced accord-
ing to their registration date. The HACP may prioritize a small number of households with urgent
needs, but this is often negligible.
Acknowledgments
This paper has benefited from the many helpful, deep, and insightful comments from the Area Editor, and
two anonymous referees, and we thank them for their efforts. We would like to extend our sincere appreciation
to Peter Glynn and Stefanos Zenios for their many insightful comments and enormous help with this project.
We are grateful to Judy Geyer for providing much of the data we used in the case study, which is an excerpt
from the data she obtained from the HACP. We thank Professor Yunan Liu and Professor Yehua Wei for
their helpful suggestions and comments. We thank students Jiaxin Liang and Kai Wang for their substantial
help with the computations for the case study. The research of all three authors is partially supported by
their respective NSERC Discovery Grants.
References
Adan, Ivo, Gideon Weiss. 2012. Exact FCFS matching rates for two infinite multitype sequences. Operations
Research 60(2) 475–489. doi:10.1287/opre.1110.1027. URL http://or.journal.informs.org/content/60/2/475.abstract.
Adan, Ivo, Gideon Weiss. 2014. A skill based parallel service system under FCFS-ALIS-steady state, over-
loads, and abandonments. Stochastic Systems 4(1) 250–299.
Afeche, Philipp, Adam Diamant, Joseph Milner. 2014. Double-sided batch queues with abandonment: Mod-
eling crossing networks. Operations Research 62(5) 1179–1201. doi:10.1287/opre.2014.1300.
Ahuja, R. K., T. L. Magnanti, J. B. Orlin. 1993. Network Flows: Theory, Algorithms, and Applications .
Prentice-Hall, Englewood Cliffs.
Atar, Rami, Chanit Giat, Nahum Shimkin. 2010. The cµ/θ rule for many-server queues with abandonment.
Operations Research 58(5) 1427–1439.
Caldentey, Rene, Edward H. Kaplan, Gideon Weiss. 2009. FCFS infinite bipartite matching of servers and
customers. Advances in Applied Probability 41(3) 695–730.
Caldentey, Rene A, Edward H Kaplan. 2007. A heavy traffic approximation for queues with restricted
customer-server matchings. working paper .
Committee, The OPTN/UNOS Kidney Transplantation. 2011. Concepts for Kidney Allocations. open resource
URL http://www.unos.org/SharedContentDocuments/KidneyAllocationSystem--RequestForInformation.pdf.
Cong, Shi, Yehua Wei, Yuan Zhong. 2015. Process flexibility for multi-period production systems. Available
at SSRN 2655790 .
Cottle, Richard W. 1964. Note on a fundamental theorem in quadratic programming. Journal of the Society
for Industrial & Applied Mathematics 12(3) 663–665.
Dai, JG, Tolga Tezcan. 2008. Optimal control of parallel server systems with many servers in heavy traffic.
Queueing Systems 59(2) 95–134.
Gallo, G., M. D. Grigoriadis, R. E. Tarjan. 1989. A fast parametric maximum flow algorithm and applications.
SIAM J. Comput. 18(1) 30–55. doi:10.1137/0218003. URL http://dx.doi.org/10.1137/0218003.
Geyer, Judy, Holger Sieg. 2013. Estimating a model of excess demand for public housing. Quantitative
Economics 4(3) 483–513.
Ghamami, Samim, Amy R Ward. 2013. Dynamic scheduling of a two-server parallel server system with
complete resource pooling and reneging in heavy traffic: Asymptotic optimality of a two-threshold
policy. Mathematics of Operations Research 38(4) 761–824.
Granot, F., S. T. McCormick, M. Queyranne, F. Tardella. 2012. Structural and algorithmic properties for
parametric minimum cuts. Math. Prog. 135(1-2) 337–367.
Grindlay, Andrew A. 1965. Tandem queues with dynamic priorities. OR 16(4) pp. 439–451. URL http:
//www.jstor.org/stable/3006711.
Gurvich, Itai, Amy Ward. 2012. On the dynamic control of matching queues. Preprint .
Gurvich, Itay, Ward Whitt. 2009. Scheduling flexible servers with convex delay costs in many-server service
systems. Manufacturing & Service Operations Management 11(2) 237–253.
Jackson, James R. 1960. Some problems in queueing with dynamic priorities. Naval Research Logis-
tics Quarterly 7(3) 235–249. doi:10.1002/nav.3800070304. URL http://dx.doi.org/10.1002/nav.
3800070304.
Kaplan, Edward H. 1988. A public housing queue with reneging. Decision Sciences 19(2) 383–391.
Kleinrock, Leonard, Roy P. Finkelstein. 1967. Time dependent priority queues. Operations Research 15(1)
pp. 104–116. URL http://www.jstor.org/stable/168514.
Larranaga, Maialen, Urtzi Ayesta, Ina Maria Verloop. 2014. Index policies for a multi-class queue with
convex holding cost and abandonments. ACM SIGMETRICS Performance Evaluation Review , vol. 42.
ACM, 125–137.
Lemke, Carlton E. 1965. Bimatrix equilibrium points and mathematical programming. Management science
11(7) 681–689.
Liu, Yunan, Ward Whitt. 2011. Large-time asymptotics for the Gt/Mt/st+GI many-server fluid queue with
abandonment. Queueing Systems 67(2) 145–182.
Liu, Yunan, Ward Whitt. 2012a. The Gt/GI/st+GI many-server fluid queue. Queueing Systems 71(4) 405–444.
Liu, Yunan, Ward Whitt. 2012b. Stabilizing customer abandonment in many-server queues with time-varying
arrivals. Operations research 60(6) 1551–1564.
Luss, Hanan. 1999. On equitable resource allocation problems: A lexicographic minimax approach. Operations
Research 47(3) 361–378.
Mandelbaum, Avishai, Alexander L Stolyar. 2004. Scheduling flexible servers with convex delay costs: Heavy-
traffic optimality of the generalized cµ-rule. Operations Research 52(6) 836–855.
Mehrotra, Vijay, Kevin Ross, Geoff Ryder, Yong-Pin Zhou. 2012. Routing to manage resolution and waiting
time in call centers with heterogeneous servers. Manufacturing & service operations management 14(1)
66–81.
Moulin, Herve, Jay Sethuraman. 2013. The bipartite rationing problem. Operations Research 61(5) 1087–
1100.
Nelson, Randolph D. 1990. Heavy traffic response times for a priority queue with linear priorities. Operations
Research 38(3) pp. 560–563. URL http://www.jstor.org/stable/171370.
Netterman, A., I. Adiri. 1979. A dynamic priority queue with general concave priority functions. Operations
Research 27(6) pp. 1088–1100. URL http://www.jstor.org/stable/172085.
Nocedal, Jorge, Stephen Wright. 2006. Numerical optimization. Springer Science & Business Media.
OPTN/UNOS. 2008. Kidney Allocation Concepts: Request for Information. open resource .
OPTN/UNOS. 2015. OPTN/UNOS online Data Report. open resource URL https://optn.transplant.
hrsa.gov/data/.
Picard, Jean-Claude, Maurice Queyranne. 1982. On the structure of all minimum cuts in a network and
applications. Mathematical Programming 22(1) 121–121. doi:10.1007/BF01581031. URL http://dx.
doi.org/10.1007/BF01581031.
Schwartz, Benjamin L. 1974. Queuing models with lane selection: a new class of problems. Operations
Research 22(2) 331–339.
Sisselman, Michael E, Ward Whitt. 2007. Value-based routing and preference-based routing in customer
contact centers. Production and Operations Management 16(3) 277–291.
Stockton, Halle. 2013. Pittsburgh, Allegheny County experience critical shortage of public housing.
PublicSource.
Stolyar, Alexander L, Tolga Tezcan. 2011. Shadow-routing based control of flexible multiserver pools in
overload. Operations Research 59(6) 1427–1444.
Talreja, Rishi, Ward Whitt. 2008. Fluid models for overloaded multiclass many-server queueing systems with
first-come, first-served routing. Management Science 54(8) 1513–1527. doi:10.1287/mnsc.1080.0868.
URL http://mansci.journal.informs.org/content/54/8/1513.abstract.
Van Mieghem, Jan A. 1995. Dynamic scheduling with convex delay costs: The generalized cµ rule. The
Annals of Applied Probability 809–833.
Visschers, Jeremy, Ivo Adan, Gideon Weiss. 2012. A product form solution to a system with multi-type jobs
and multi-type servers. Queueing Systems 70(3) 269–298.
Wallace, Rodney B, Ward Whitt. 2005. A staffing algorithm for call centers with skill-based routing. Man-
ufacturing & Service Operations Management 7(4) 276–294.
Zenios, Stefanos A., Glenn M. Chertow, Lawrence M. Wein. 2000. Dynamic allocation of kidneys to candi-
dates on the transplant waiting list. Oper. Res. 48(4) 549–569. doi:http://dx.doi.org/10.1287/opre.48.
4.549.12418.
e-companion to Author: An Overloaded Bipartite Queueing System with Matching Cost ec1
Appendices
EC.1. The Case when All Queues in argmax Sji(t) are Empty
We discuss how to compute $A(t,j)$ in the case when $Q_i(t) = 0$ for every $i \in \arg\max_i S_{ji}(t)$. We
define the following sequence of index sets
$$I^{\ell}(t,j) = \arg\max\Big\{ S_{jk}(t) \;\Big|\; k \in \mathcal{I} \setminus \bigcup_{u=1}^{\ell-1} I^{u}(t,j) \Big\}.$$
When ℓ = 1, I1(t, j) = argmax Sji(t) is the set of queues with the largest scores with respect to
server j at time t. If all buffers in I1(t, j) are already empty, and the supply fluid of type j (after
cancelling out the demand fluid flowing into buffers in I1(t, j)) still has some surplus, then the
surplus has to flow to other buffers in I2(t, j) which have the second highest scores. By repeating
this step for $I^1(t,j), I^2(t,j), \ldots$, $A(t,j)$ can be constructed as
$$A(t,j) = \bigcup_{\ell=1}^{k} I^{\ell}(t,j), \tag{EC.1}$$
where $k$ is the smallest integer with $Q_i(t) > 0$ for some $i \in I^k(t,j)$ or with
$\sum_{i \in \cup_{\ell=1}^{k} I^{\ell}(t,j)} \lambda_i(t) > \mu_j(t)$.
Each buffer $i \in \cup_{\ell=1}^{k-1} I^{\ell}(t,j)$ will receive supply fluid of type $j$ at rate $\lambda_i(t)$, which exactly
cancels out the incoming demand fluid, and the type-$j$ supply fluid still has
$\mu_j(t) - \sum_{i \in \cup_{\ell=1}^{k-1} I^{\ell}(t,j)} \lambda_i(t)$ left to serve the other buffers. By removing all queues in
$\cup_{\ell=1}^{k-1} I^{\ell}(t,j)$ and replacing $\mu_j(t)$ by $\mu_j(t) - \sum_{i \in \cup_{\ell=1}^{k-1} I^{\ell}(t,j)} \lambda_i(t)$,
the system reduces to the usual case with no empty queues among those with the highest scores. We can then
apply the machinery we developed in Section 3 to construct the fluid process.
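The layered construction of A(t, j) in (EC.1) can be sketched in a few lines of Python. This is only an illustrative sketch for a single server type j at a fixed time t: the function name `build_priority_layers`, the dictionary-based inputs, and the use of exact equality to detect tied scores are our assumptions and are not part of the model.

```python
def build_priority_layers(scores, queue_len, arrival_rate, mu_j):
    """Return (layers, A, residual) for one server type j at a fixed time t.

    scores[i]       : S_ji(t), score of queue i with respect to server j
    queue_len[i]    : Q_i(t), fluid content of queue i
    arrival_rate[i] : lambda_i(t), demand inflow rate of queue i
    mu_j            : mu_j(t), supply rate of server type j
    (All names are hypothetical; ties are detected by exact equality.)
    """
    remaining = set(scores)
    layers = []      # I^1(t,j), I^2(t,j), ...
    absorbed = 0.0   # total arrival rate of the layers I^1, ..., I^{k-1}
    while remaining:
        top = max(scores[i] for i in remaining)
        layer = {i for i in remaining if scores[i] == top}
        layers.append(layer)
        remaining -= layer
        layer_rate = sum(arrival_rate[i] for i in layer)
        # Stop at the first layer that contains a non-empty queue or whose
        # inflow, added to the earlier layers, exceeds the supply rate.
        if any(queue_len[i] > 0 for i in layer) or absorbed + layer_rate > mu_j:
            break
        absorbed += layer_rate
    A = set().union(*layers)
    residual = mu_j - absorbed  # supply left for the last layer I^k
    return layers, A, residual
```

Here `residual` plays the role of the reduced supply rate that replaces µj(t) once the fully served empty layers are removed.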
EC.2. An Example of Non-Unique Trajectory Wi(t) when Condition (13) Holds
Consider a system with two servers, J = {1,2}, and three types of customers, I = {a, b, c}. Suppose
that at time t0 queue b is empty and queues a and c are non-empty. Suppose that queue b has the
highest M+W score with respect to both server 1 and server 2. Moreover, λb < µ1 + µ2, so after
cancelling out all the incoming demand fluid in buffer b, there is still some surplus of supply fluid.
Suppose that queue a has the second highest M+W score for server 1, and queue c has the second
highest M+W score for server 2. In this case, the split of the surplus of supply fluid between queue
a and queue c depends on how much supply fluid has flowed to buffer b from each of server 1 and
server 2. The service rates $r_{1a}$, $r_{1b}$, $r_{2b}$, and $r_{2c}$ can be characterized by the following system of
linear equations, which has infinitely many solutions:
$$\begin{aligned}
r_{1a} + r_{1b} &= \mu_1, \\
r_{2b} + r_{2c} &= \mu_2, \\
r_{1b} + r_{2b} &= \lambda_b.
\end{aligned}$$
Since Wa and Wc depend on the service rates r1a and r2c, there will be infinitely many possible
waiting time trajectories Wa(t) and Wc(t) starting from t0.
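To make the non-uniqueness concrete, the three equations above leave one degree of freedom, which can be taken to be r1b. The following Python sketch parameterizes the solution family by r1b; the function name `routing_rates` and the numerical values used afterwards are illustrative assumptions only.

```python
def routing_rates(mu1, mu2, lam_b, r1b):
    """Solve the three balance equations for a chosen value of r1b.

    Nonnegativity of all four rates requires
    max(0, lam_b - mu2) <= r1b <= min(mu1, lam_b).
    """
    low = max(0.0, lam_b - mu2)
    high = min(mu1, lam_b)
    if not low <= r1b <= high:
        raise ValueError("r1b must lie in [%g, %g]" % (low, high))
    r2b = lam_b - r1b
    return {
        "r1a": mu1 - r1b,   # surplus of server 1 sent to queue a
        "r1b": r1b,
        "r2b": r2b,
        "r2c": mu2 - r2b,   # surplus of server 2 sent to queue c
    }
```

For instance, with µ1 = µ2 = λb = 1, the choices r1b = 0.25 and r1b = 0.75 both satisfy all three equations but give different values of r1a and r2c, and hence different trajectories Wa(t) and Wc(t).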
EC.3. Proof of Proposition 3
Proposition 3 Suppose $\{W(t) \mid t \in [t_0,t_1]\}$ is a continuous vector-valued process with associated
arc set $E(t) \equiv E$ for all $t \in (t_0,t_1)$, and suppose $\{\mathcal{G}_k\}_{k=1,\ldots,K}$ are the routing components in $(t_0,t_1)$.
Then $\{W(t) \mid t \in [t_0,t_1]\}$ is the fluid process in an M+W-BQS over $[t_0,t_1]$ if and only if for each
$k = 1,\ldots,K$, $\{W(t) \mid t \in [t_0,t_1]\}$ is the unique solution to the following ODE:
$$\text{(ODE)} \quad
\begin{cases}
\dfrac{dx^k(t)}{dt} = \Psi^{\mathcal{G}_k}_{W(t)} & \text{for } k = 1,2,\ldots,K, \\[4pt]
W_i(t) = g_i^{-1}\big[g_i(W_i(t_0)) + x^k(t)\big] & \text{for all } i \in \mathcal{G}_k,\ k = 1,2,\ldots,K, \\[4pt]
x^k(t_0) = 0 & \text{for } k = 1,2,\ldots,K,
\end{cases} \tag{EC.2}$$
where the function $\Psi^{\mathcal{G}_k}_{W(t)}$ is defined as in equation (20).
Proof. We first argue that all queues in the same component Gk must have the same score
change rate θk(t) for all t∈ [t0, t1]. Otherwise, we can find an (undirected) path that connects two
queues with different score change rates. Along this path, there must be a server that connects two
queues which have different score change rates. Due to the difference in their score change rates,
the tie between scores of the two queues has to break and one of the queues is then disconnected
from the server. This contradicts the assumption that E(t) is invariant over [t0, t1].
We next prove the “only if” and “if” statements of the Proposition.
“only if”: According to Lemma 1, if $(W(t), r(t))$ describes the state of an M+W-BQS fluid process,
(10) must be satisfied. Thus, $\theta^k$, which is the score change rate for all queues $i \in \mathcal{G}_k$, has to
satisfy the following equation as a result of equation (11):
$$\theta^k(t) = \vartheta_{W_i(t)}\Big(\sum_{j \in \mathcal{G}_k} r_{ji}(t)\Big), \quad \forall\, i \in \mathcal{G}_k. \tag{EC.3}$$
Summing (EC.3) up over all $i \in \mathcal{G}_k$ leads to equation (19), which has the unique solution
$$\theta^k(t) = \Psi^{\mathcal{G}_k}_{W(t)}. \tag{EC.4}$$
Since $x^k(t)$ is the total amount of score change for queues in component $\mathcal{G}_k$ from time $t_0$ up to time
$t$, we have $\frac{dx^k(t)}{dt} = \theta^k(t)$. Equation (EC.4) then leads to the first equation in the ODE. Intuitively,
the first equation characterizes the trajectory of score change in each component $k$ given that
component $k$ receives (or sends) no supply fluid from (to) the outside.
By the definition of xk(t), we have
xk(t) = gi(Wi(t))− gi(Wi(t0)), ∀ i∈Gk, (EC.5)
which implies that Wi(t) can be computed via the second equation in (EC.2). We have thus proved
the ODE must be satisfied by the vector-valued process {W (t) | t∈ [t0, t1]}.
“if”: We first argue that the ODE (EC.2) always admits a unique solution $W(t)$. Notice that
the function $\Psi^{\mathcal{G}_k}_{W(t)}$ is continuous in $t$ and $W(t)$, except when $t - W_i(t)$ is a discontinuity point of
$\lambda_i(\cdot)$. Since $W_i'(t) \le 1$ (the HOL waiting time of a queue increases at rate one when it receives
no service, which is the maximum possible $W_i'(t)$), $t - W_i(t)$ is non-decreasing. Therefore, as $\lambda_i(t)$
follows our definition of piecewise continuous and $W_i(t)$ is continuous in $t$, we can deduce that
$\lambda_i(t - W_i(t))$ is also piecewise continuous in $t$. Thus, the function $\Psi^{\mathcal{G}_k}_{W(t)}$ has only a finite number of
discontinuity points and is therefore integrable. By substituting the expression for $W_i(t)$ in the
second equation of (EC.2) into the RHS of its first equation, we obtain an ODE of the basic form
$\frac{dx^k(t)}{dt} = f(x^k(t))$, where $f$ is integrable. This ODE always has a unique solution $\{x^k(t) \mid t \in [t_0,t_1]\}$
under the boundary condition $x^k(t_0) = 0$. After solving for $x^k(t)$, $W(t)$ can be uniquely determined
using the second equation in (EC.2) for all $t \in [t_0,t_1]$.
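The two-step structure of (EC.2), first integrating the score change of a component and then recovering the waiting times through the inverse waiting-index functions, can be illustrated numerically. The sketch below uses a plain forward Euler scheme for a single routing component. It is not the authors' computational procedure: `psi` is a user-supplied stand-in for the function defined in equation (20), which is not reproduced here, and the names `integrate_component`, `g`, and `g_inv` are hypothetical.

```python
def integrate_component(psi, g, g_inv, w_start, t0, t1, steps=1000):
    """Forward Euler sketch of (EC.2) for one routing component.

    psi(t, w) : score change rate of the component (placeholder for eq. (20))
    g[i]      : waiting-index function g_i of queue i
    g_inv[i]  : inverse of g_i
    w_start   : dict of HOL waiting times W_i(t0) for queues in the component
    Returns the cumulative score change at t1 and the waiting times W_i(t1).
    """
    h = (t1 - t0) / steps
    x, t = 0.0, t0            # cumulative score change starts at zero at t0
    w = dict(w_start)
    for _ in range(steps):
        x += h * psi(t, w)    # first equation of (EC.2)
        t += h
        # second equation of (EC.2): map the score change back to waiting times
        w = {i: g_inv[i](g[i](w_start[i]) + x) for i in w_start}
    return x, w
```

With a constant rate `psi = lambda t, w: -0.5`, an identity index for queue a, and g_b(w) = 2w for queue b, one unit of time lowers Wa by 0.5 and Wb by 0.25, as the second equation of (EC.2) predicts.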
Next, we construct a piecewise continuous feasible routing-rate vector r(t) for all t∈ [t0, t1], which
solves equation (10) together with the W (t) derived from the ODE. Lemma 1 then shows that
{W (t) | t∈ [t0, t1]} is a fluid process.
By the first equation in the ODE, at all times $t \in [t_0,t_1]$ and all $k = 1,2,\ldots,K$, we have
$\frac{dx^k}{dt} = \Psi^{\mathcal{G}_k}_{W(t)}$, which coincides with the expression of breakpoints (20) given in Proposition 2
Property (a). Thus, $\{\frac{dx^k}{dt} \mid k = 1,\ldots,K\}$ is exactly the sequence of breakpoints we obtain by applying the
GGT algorithm on $G(t)$, and each of those breakpoints corresponds to at least two min cuts in the
sequence of nested cuts $\Pi$. Let $A_{\max}(\frac{dx^k}{dt})$ and $A_{\min}(\frac{dx^k}{dt})$ denote the maximum and minimum min
cut at the breakpoint $\frac{dx^k}{dt}$, respectively. Then $\mathcal{G}_k = A_{\max}(\frac{dx^k}{dt}) \setminus A_{\min}(\frac{dx^k}{dt})$ is the union of one (the
non-degenerate case) or multiple (the degenerate case) primitive components, all of which have the
same score change rate $\frac{dx^k}{dt}$. Let $\{G_\ell \mid \ell = 1,\ldots,K'\}$ denote the primitive components. According
to Property (b) in Proposition 2, the set of feasible routing-rate vectors $r(t)$ can be represented by
the following non-empty linear polytope
$$\Gamma(W(t)) := \left\{ r(t) \ge 0 \;\middle|\;
\begin{array}{ll}
\sum_{\{i \mid (j,i) \in E\}} r_{ji}(t) = \mu_j(t), & \forall\, j \in \mathcal{J}, \\[2pt]
\sum_{\{j \mid (j,i) \in E\}} r_{ji}(t) = \vartheta^{-1}_{W_i(t)}\big(\tfrac{dx^k}{dt}\big), & \forall\, i \in \mathcal{G}_k,\ k = 1,2,\ldots,K, \\[2pt]
r_{ji}(t) = 0, & \forall\, j \in G_\ell,\ i \in G_{\ell'},\ \ell \ne \ell', \text{ or } (j,i) \notin E
\end{array}
\right\}. \tag{EC.6}$$
We next show that any feasible routing-rate vector r(t)∈ Γ(W (t)) and W (t) solve equation (10).
The second equation in the ODE implies equation (EC.5). Taking derivatives on both sides of
(EC.5) leads to
$$\frac{dx^k}{dt} = g_i'(W_i(t))\, W_i'(t), \quad \text{for all } i \in \mathcal{G}_k. \tag{EC.7}$$
Equation (EC.7) and the second constraint on $r(t)$ in (EC.6) imply
$$\vartheta_{W_i(t)}\Big(\sum_{\{j \mid (j,i) \in E\}} r_{ji}\Big) = \frac{dx^k}{dt} = g_i'(W_i(t))\, W_i'(t), \tag{EC.8}$$
which leads to equation (10) due to equation (11).
It remains to show that we can ensure r(t) to be piecewise continuous by specifying certain rules
for choosing an r(t) from Γ(W (t)). For this purpose, we write the feasible routing rates r(t) as a
JI-dimensional row vector, i.e.,
(r11(t), r12(t), . . . , r1I(t), r21(t), . . . , r2I(t), . . . , rJI(t)). (EC.9)
We then rank these vectors lexicographically, i.e., $r(t) \succ r'(t)$ if the first non-zero entry in $r - r'$
is positive. This lexicographical order defines a complete order over the set of feasible routing-rate
vectors. Since the polytope $\Gamma(W(t))$ is closed and bounded, we can pick a unique lex-max $r^{\max}(t)$ from $\Gamma(W(t))$.
We next show that $r^{\max}(t)$ is piecewise continuous in $(t_0,t_1)$. To this end, consider the following
maximization problem for each $t \in (t_0,t_1)$:
$$\max \sum_{i \in \mathcal{I},\, j \in \mathcal{J}} (I \cdot J)^{-(jI+i)}\, r_{ji}(t) \quad \text{s.t. } r(t) \in \Gamma(W(t)). \tag{EC.10}$$
The objective function is selected to ensure that the objective value of $r$ in (EC.10) is strictly
larger than that of $r'$ if and only if $r \succ r'$. As a result, $r^{\max}$ is the unique maximizer of the
optimization problem (EC.10), and there is a continuous bijection between the objective values
and the feasible set $\Gamma(W(t))$. Since the coefficients in the formulation of the polytope $\Gamma(W(t))$ change
piecewise continuously with $t$ (except at points of discontinuity of $\lambda_i(t)$ and $\mu_j(t)$), the optimal
value of (EC.10) (which is clearly continuous in the coefficients of $\Gamma(W(t))$) has to be piecewise
continuous in $t$. Thus the optimal solution $r^{\max}$, as there is a continuous bijection between its
domain and the objective function values, is piecewise continuous in $t$.
EC.4. The Degenerate Case
When $G(t_0)$ is degenerate, there are multiple primitive components corresponding to the same
breakpoint, which therefore have the same score change rate at $t_0$ according to Proposition 2
Property (a). If some of those components are further connected, then queues in those components
not only have their current scores tied, but also have their score change rates tied. If that happens,
we cannot determine whether the inter-component arcs between those components will remain
in the arc set or disappear in the next infinitesimal period after $t_0$, which we denote by $t_0^+$ for
simplicity. This section provides techniques to determine which of these inter-component arcs will
remain in the arc set $E(W(t))$ in $t_0^+$, so we can determine the routing components in $t_0^+$.
Formally, suppose at $t_0$ there are $K$ primitive components $\{G_1, G_2, \ldots, G_K\}$. We define a directed
graph $\mathcal{G}(t_0) := (\mathcal{V}(t_0), \mathcal{E}(t_0))$ with vertex set
$$\mathcal{V}(t_0) := \{1,2,\ldots,K\}, \tag{EC.11}$$
and directed arc set
$$\mathcal{E}(t_0) := \{(k,k') \mid \theta^k(t_0) = \theta^{k'}(t_0), \text{ and there is at least one outgoing arc in } E(t_0) \text{ from } G_k \text{ to } G_{k'}\}. \tag{EC.12}$$
Figure EC.1 illustrates how the component graph $\mathcal{G}(t_0)$ is constructed based on the original routing graph $G(t_0)$.
Figure EC.1 A degenerate graph with $(G_k)_{k=1,2,3,4}$ (left) is represented by $\mathcal{G}$ (right). Each primitive component
$G_k$ corresponds to a vertex in $\mathcal{G}$. Note that vertices 4 and 1 are not connected because $\theta^1(t) \ne \theta^4(t)$.
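The construction in (EC.11) and (EC.12), together with the aggregation of routing rates over inter-component arcs, can be sketched as follows. This is an illustrative sketch only: the function names, the dictionary representation of component membership, and the numerical tolerance `tol` used to decide whether two score change rates are tied are all our assumptions.

```python
def component_digraph(server_comp, queue_comp, theta, arcs, tol=1e-12):
    """Build the arc set of the component graph as in (EC.12).

    server_comp[j] : index k of the primitive component containing server j
    queue_comp[i]  : index k of the primitive component containing queue i
    theta[k]       : score change rate of primitive component k at t0
    arcs           : iterable of routing arcs (j, i) present at t0
    """
    edges = set()
    for j, i in arcs:
        k, k_prime = server_comp[j], queue_comp[i]
        # keep only inter-component arcs between components with tied rates
        if k != k_prime and abs(theta[k] - theta[k_prime]) <= tol:
            edges.add((k, k_prime))
    return edges


def aggregate_flow(server_comp, queue_comp, r):
    """Sum routing rates r[(j, i)] over each ordered pair of distinct components."""
    flow = {}
    for (j, i), rate in r.items():
        k, k_prime = server_comp[j], queue_comp[i]
        if k != k_prime:
            flow[(k, k_prime)] = flow.get((k, k_prime), 0.0) + rate
    return flow
```

In a toy instance with three single-server, single-queue components whose rates are 0.5, 0.5, and 0.8, an arc from the server of component 2 to the queue of component 1 yields the component arc (2, 1), whereas an arc from component 3 to component 1 is dropped because the rates differ.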
The component graph $\mathcal{G}(t_0)$ summarizes the information about inter-component arcs of the routing graph $G(t_0)$.
By Proposition 2 Property (c), we can index the primitive components so that arcs can only enter
a lower indexed component from a higher indexed component, and so there are no directed cycles
in $\mathcal{G}(t_0)$. We may also construct a network flow $\{r_e(t) \mid e \in \mathcal{E}(t_0)\}$ on $\mathcal{E}(t_0)$ from any feasible
routing-rate vector $r(t) = \{r_{ji}(t)\}$ by defining
$$r_{uu'}(t) = \sum_{j \in G_u,\, i \in G_{u'}} r_{ji}(t). \tag{EC.13}$$
Thus $r_{uu'}$ can be interpreted as the total amount of flow sent from component $G_u$ to $G_{u'}$ according
to some feasible routing-rate vector, so we call $\{r_{uu'}\}$ a feasible inter-component routing-rate vector.
Our discussions in the proof of Proposition 4 show that if two primitive components $G_k$ and $G_{k'}$
are disconnected at $t_0$, or are connected but have different score change rates $\theta^k \ne \theta^{k'}$ at $t_0$, then
$G_k$ and $G_{k'}$ must be disconnected at $t_0^+$. For example, in Figure EC.1, since $\theta^1 < \theta^4$ at $t_0$, the
inter-component arc $(5,d)$ disappears at $t_0^+$. If two components $k$ and $k'$ are connected and also have
the same score change rate at $t_0$, then the corresponding inter-component arcs might remain in
the graph or disappear at $t_0^+$. Such arcs are all included in $\mathcal{E}(t_0)$ and are candidate arcs that could
remain in $t_0^+$. We use $\mathcal{E}(t)$ to denote the inter-component arcs that remain at $t$, and we will show
that $\mathcal{E}(t) \equiv \mathcal{E}^+$ for all $t \in t_0^+$. If $(k,k') \in \mathcal{E}^+$ (respectively, $(k,k') \notin \mathcal{E}^+$), then queues in components $k$ and $k'$ will
have the same (respectively, different) score change rates during $t_0^+$, so all inter-component arcs from $G_k$ to $G_{k'}$
have to remain (respectively, disappear) in $t_0^+$. Let $\{\mathcal{G}_u\}_{u=1,\ldots,K'}$ denote the connected components with respect
to $\mathcal{E}^+$. We call each $\mathcal{G}_u$ a super-component as it is a union of one or multiple primitive components.
For example, in Figure EC.1, if $\mathcal{E}^+ = \{(3,2)\}$, then $\mathcal{G}_1 = \{1\}$, $\mathcal{G}_2 = \{2,3\}$, and $\mathcal{G}_3 = \{4\}$. The
super-component $\mathcal{G}_2$ represents the merged component $G_2 \cup G_3$, which consists of servers $\{1,2\}$
and queues $\{a,b\}$.
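Once the arc set E+ is known, the super-components are simply the connected components of the component graph when arc directions are ignored. A minimal Python sketch using a union-find structure is given below; the function name `super_components` is hypothetical, and the example reproduces the case E+ = {(3,2)} on four primitive components mentioned in the text.

```python
def super_components(vertices, e_plus):
    """Group primitive-component indices into super-components.

    vertices : iterable of primitive-component indices, e.g. [1, 2, 3, 4]
    e_plus   : iterable of remaining inter-component arcs (k, k')
    Returns a list of sets, ordered by their smallest member.
    """
    parent = {v: v for v in vertices}

    def find(v):
        # follow parent pointers to the representative, compressing the path
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for k, k_prime in e_plus:
        a, b = find(k), find(k_prime)
        if a != b:
            parent[max(a, b)] = min(a, b)   # direction of the arc is ignored

    groups = {}
    for v in vertices:
        groups.setdefault(find(v), set()).add(v)
    return sorted(groups.values(), key=min)


print(super_components([1, 2, 3, 4], [(3, 2)]))  # [{1}, {2, 3}, {4}]
```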
Recall that we use $\Psi^{G_k}_{W(t)}$ to denote the score change rate of queues in primitive component
$G_k$ provided that the net inflow rate for $G_k$ equals zero. With a slight abuse of notation, we use
$\Psi^{\mathcal{G}_u}_{W(t)}(z)$ to denote the score change rate if the net inflow rate for a super-component $\mathcal{G}_u \subseteq \mathcal{V}$
equals $z \in (-\infty,+\infty)$. Therefore, if $r$ is a feasible inter-component routing-rate vector on $\mathcal{G}$, then
the score change rate of a super-component $u$ is given by
$\Psi^{\mathcal{G}_u}_{W(t)}\big(\sum_{e \in \delta^+(\mathcal{G}_u)} r_e(t) - \sum_{e \in \delta^-(\mathcal{G}_u)} r_e(t)\big)$,
where $\delta^+(\mathcal{G}_u)$ and $\delta^-(\mathcal{G}_u)$ denote the sets of arcs in $\mathcal{E}$ that leave and enter $\mathcal{G}_u$, respectively. We
then define a potential function for any two connected subsets $A, B \subseteq \mathcal{V}$ as
$$\Delta\Psi^{A,B}_{W(t)}(r(t)) := \Psi^{A}_{W(t)}\Big(\sum_{e \in \delta^+(A)} r_e(t) - \sum_{e \in \delta^-(A)} r_e(t)\Big) - \Psi^{B}_{W(t)}\Big(\sum_{e \in \delta^+(B)} r_e(t) - \sum_{e \in \delta^-(B)} r_e(t)\Big), \tag{EC.14}$$
which gives the difference in the score change rates between $A$ and $B$ at state $W(t)$. In the special case
when $A$ and $B$ themselves are connected components (so their net inflow rates are both equal to
zero), we adopt the following notation to represent the difference in their score change rates at $W(t)$:
$$\Delta\Psi^{A,B}_{W(t)} := \Psi^{A}_{W(t)} - \Psi^{B}_{W(t)}. \tag{EC.15}$$
Let $|x|$ denote the absolute value of the cumulative score change since $t_0$. Given $|x| > 0$, the
cumulative score change in primitive component $G_k$ is given by
$$x^{G_k} = \mathrm{sign}(\theta^k)\,|x|, \tag{EC.16}$$
where $\mathrm{sign}(\theta^k)$ denotes the sign of $\theta^k$. The next Proposition provides three equivalent
characterizations of $\mathcal{E}^+$ and $\{\mathcal{G}_u\}_{u=1,\ldots,K'}$. All three characterizations are based on the potential functions, but
the fluid processes are indexed by $t$ in the first characterization, and indexed by $|x|$ in the other two
characterizations. If scores in a connected subset $A \subseteq \mathcal{V}$ have the same trajectory in $t_0^+$, then for
all $|x|$ in an infinitesimal interval $(0, \bar{x})$ (denoted by $0^+$), a unique time point $t(x^A)$ exists at which
the score of queues in $A$ has changed by $x^A$ from $t_0$ to $t(x^A)$. Consequently, there is a one-to-one
correspondence between the score-change index $x^A$ and the time index $t$. In the above notation, $A$ can
be a single vertex $\{k\} \subseteq \mathcal{V}$, or a connected component $\mathcal{G}_u \subseteq \mathcal{V}$. Given $|x|$, the corresponding HOL
waiting-time vector $W^{|x|} := \{W_i^{|x|}\}_{i \in \mathcal{I}}$ and feasible inter-component routing-rate vector $r^{|x|}$ with
respect to $\mathcal{E}^+$ are given by
$$W_i^{|x|} := g_i^{-1}\big(g_i(W_i(t_0)) + x^{G_k}\big), \quad \text{for all } i \in G_k, \tag{EC.17}$$
and
$$r^{|x|}_{kk'} :=
\begin{cases}
0 & \text{if } (k,k') \notin \mathcal{E}^+; \\
r_{kk'}\big(t(x^{\{k,k'\}})\big) & \text{if } (k,k') \in \mathcal{E}^+.
\end{cases} \tag{EC.18}$$
Note that the definition of $r^{|x|}$ is always with respect to a given arc set $\mathcal{E}^+$. Given $W^{|x|}$ and $r^{|x|}$,
we can define the potential functions $\Delta\Psi^{\mathcal{G}_u,\mathcal{G}_{u'}}_{W^{|x|}}(r^{|x|})$ and $\Delta\Psi^{\mathcal{G}_u,\mathcal{G}_{u'}}_{W^{|x|}}$ similarly to (EC.14) and
(EC.15). Because $\Psi^{\mathcal{G}_u}_{W^{|x|}}(r^{|x|})$ is an affine function of $r^{|x|}$, $\Delta\Psi^{\mathcal{G}_u,\mathcal{G}_{u'}}_{W^{|x|}}(r^{|x|})$ is also affine in $r^{|x|}$.
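Proposition 6 Suppose at $W(t_0)$ the routing graph is degenerate, and $\mathcal{G}(t_0) := (\mathcal{V}(t_0), \mathcal{E}(t_0))$ is
constructed according to (EC.11) and (EC.12). Then $\{W(t)\}$ is the fluid process during some
interval $[t_0,t_1]$ if $\{W(t) \mid t \in [t_0,t_1]\}$ solves the ODE (EC.2) associated with the super-components
$\{\mathcal{G}_u\}_{u=1,\ldots,K'}$ which are connected components with respect to some arc set $\mathcal{E}^+ \subseteq \mathcal{E}(t_0)$, and it
satisfies the following properties for all $t \in (t_0,t_1)$:
• (1) If $\mathcal{G}_u$ can be partitioned into two subsets, $\mathcal{G}_u^+$ and $\mathcal{G}_u^-$, such that there are only arcs in $\mathcal{E}^+$
going from $\mathcal{G}_u^+$ to $\mathcal{G}_u^-$, then
$$\Delta\Psi^{\mathcal{G}_u^+,\mathcal{G}_u^-}_{W(t)} \le 0. \tag{EC.19}$$
• (2) If there are arcs in $\mathcal{E}(t_0) \setminus \mathcal{E}^+$ going from $\mathcal{G}_u$ to $\mathcal{G}_{u'}$, then
$$\Delta\Psi^{\mathcal{G}_u,\mathcal{G}_{u'}}_{W(t)} > 0. \tag{EC.20}$$
Equivalently, there exists $\bar{x} > 0$ such that for all $|x| < \bar{x}$, and $x := (x^k)_{k \in \mathcal{V}}$ defined according to
(EC.16), the following properties hold.
• (1′) If $\mathcal{G}_u$ can be partitioned into two subsets, $\mathcal{G}_u^+$ and $\mathcal{G}_u^-$, such that there are only arcs in $\mathcal{E}^+$
going from $\mathcal{G}_u^+$ to $\mathcal{G}_u^-$, then
$$\Delta\Psi^{\mathcal{G}_u^+,\mathcal{G}_u^-}_{W^{|x|}} \le 0. \tag{EC.21}$$
• (2′) If there are arcs in $\mathcal{E}(t_0) \setminus \mathcal{E}^+$ going from $\mathcal{G}_u$ to $\mathcal{G}_{u'}$, then
$$\Delta\Psi^{\mathcal{G}_u,\mathcal{G}_{u'}}_{W^{|x|}} > 0. \tag{EC.22}$$
Furthermore, Properties (1′) and (2′) hold if and only if there exists a feasible inter-component
routing-rate vector $r^{|x|}$ over $\mathcal{E}$ that solves the following linear complementarity problem:
$$\text{(LCP)} \quad
\begin{array}{ll}
r^{|x|}_{kk'}\, \Delta\Psi^{G_k,G_{k'}}_{W^{|x|}}(r^{|x|}) = 0 & \forall\, (k,k') \in \mathcal{E}(t_0), \\[2pt]
r^{|x|}_{kk'} \ge 0 & \forall\, (k,k') \in \mathcal{E}(t_0), \\[2pt]
\Delta\Psi^{G_k,G_{k'}}_{W^{|x|}}(r^{|x|}) \ge 0 & \forall\, (k,k') \in \mathcal{E}(t_0),
\end{array} \tag{EC.23}$$
and
$$\mathcal{E}^+ = \{(k,k') \in \mathcal{E}(t_0) \mid \Delta\Psi^{G_k,G_{k'}}_{W^{|x|}}(r^{|x|}) = 0\}. \tag{EC.24}$$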
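Because the potential functions are affine in the inter-component routing rates, a candidate solution of the LCP (EC.23) is easy to verify numerically, and the arc set in (EC.24) can then be read off. The sketch below only checks a given candidate; it does not solve the LCP. The affine representation through the coefficients `q` and `M`, the function name `check_lcp`, and the tolerance `tol` are assumptions introduced for illustration; in the model these coefficients would have to be derived from equation (20), which is not reproduced here.

```python
def check_lcp(arcs, r, q, M, tol=1e-9):
    """Verify an LCP candidate and extract the arcs with zero potential.

    arcs : list of component arcs (k, k') in the candidate set
    r    : dict mapping each arc to its inter-component routing rate
    q, M : hypothetical affine coefficients, so that the potential on arc e is
           q[e] + sum over arcs f of M[e].get(f, 0) * r[f]
    Returns (is_solution, e_plus).
    """
    potential = {
        e: q[e] + sum(M[e].get(f, 0.0) * r[f] for f in arcs) for e in arcs
    }
    is_solution = all(
        r[e] >= -tol                          # nonnegative flow
        and potential[e] >= -tol              # nonnegative potential
        and abs(r[e] * potential[e]) <= tol   # complementarity
        for e in arcs
    )
    e_plus = {e for e in arcs if abs(potential[e]) <= tol}
    return is_solution, e_plus
```

In this representation, an arc carries positive flow only if its potential is zero, and arcs with strictly positive potential are excluded from the returned set, mirroring (EC.24).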
Before proving the Proposition, we first provide some intuition for Properties (1) and (2).
Property (1) is equivalent to the condition that there exists a network flow inside each $\mathcal{G}_u$ such
that all vertices have the same score change rate, that is,
$$\Psi^{G_k}_{W(t)}\Big(\sum_{e \in \delta^+(k)} r_e(t) - \sum_{e \in \delta^-(k)} r_e(t)\Big) = \Psi^{\mathcal{G}_u}_{W(t)} \quad \text{for all primitive components } G_k \subseteq \mathcal{G}_u. \tag{EC.25}$$
In other words, Property (1) requires that the primitive components contained in the same
super-component have the same score trajectory in $t_0^+$. Property (2) states that if inter-component arcs
from $\mathcal{G}_u$ to $\mathcal{G}_{u'}$ do not remain in $\mathcal{E}^+$, then $\Delta\Psi^{\mathcal{G}_u,\mathcal{G}_{u'}}_{W(t)} > 0$, which means the score change rate of $\mathcal{G}_u$
must dominate that of $\mathcal{G}_{u'}$ in $t_0^+$.
Unfortunately, Properties (1) and (2) are difficult to verify directly, because at $t_0$ we have no
knowledge about $W(t)$ for $t \in t_0^+$. The characterization via Properties (1′) and (2′), which indexes the
fluid process by $|x|$ instead of $t$, is easier to verify. We will show that the signs of $\Delta\Psi^{\mathcal{G}_u,\mathcal{G}_{u'}}_{W^{|x|}}$ and
$\Delta\Psi^{\mathcal{G}_u,\mathcal{G}_{u'}}_{W(t)}$ are identical in $0^+$ and $t_0^+$, and thus Properties (1′) and (2′) are equivalent to Properties
(1) and (2) as a characterization of $\mathcal{E}^+$. Properties (1′) and (2′) are introduced to derive the LCP
characterization, which gives a general method to identify $\mathcal{E}^+$ in $t_0^+$.
Proof. We first prove that Properties (1′) and (2′) are equivalent to Properties (1) and (2).
We will only prove the equivalence between (2) and (2′), as the proof of the equivalence between
(1) and (1′) is similar. Because $\mathcal{G}_u$ and $\mathcal{G}_{u'}$ are connected at $t_0$, they must have the same score
change rate at $t_0$. If the score change rates of $\mathcal{G}_u$ and $\mathcal{G}_{u'}$ are both equal to zero, then there will
be no change to their waiting times and scores, so $\mathcal{G}_u$ and $\mathcal{G}_{u'}$ will stay connected and no arcs
between $\mathcal{G}_u$ and $\mathcal{G}_{u'}$ will be in $\mathcal{E}(t_0) \setminus \mathcal{E}^+$. In this case, Properties (2) and (2′) both hold trivially.
Otherwise, we can assume that $\frac{dx^{\mathcal{G}_u}(t_0)}{dt} = \frac{dx^{\mathcal{G}_{u'}}(t_0)}{dt} > 0$ without loss of generality. Then both $x^{\mathcal{G}_u}(t)$
and $x^{\mathcal{G}_{u'}}(t)$ are strictly increasing in $t_0^+$, and their inverse functions, $t(x^{\mathcal{G}_u})$ and $t(x^{\mathcal{G}_{u'}})$, must both
exist in $0^+$.
The existence of $\frac{dx^{\mathcal{G}_u}(t)}{dt} > 0$ in $t_0^+$ implies that $\frac{dt(x^{\mathcal{G}_u})}{dx^{\mathcal{G}_u}}$ exists in some $0^+$. According to Property
(2), if arcs in $\mathcal{E}(t_0) \setminus \mathcal{E}^+$ go from $\mathcal{G}_u$ to $\mathcal{G}_{u'}$, then the score of $\mathcal{G}_u$ increases faster than that of $\mathcal{G}_{u'}$,
so it must be that $x^{\mathcal{G}_u}(t) > x^{\mathcal{G}_{u'}}(t)$ for all $t \in t_0^+$. That means that for all $|x| \in 0^+$ it always takes
less time for $\mathcal{G}_u$ to reach a cumulative increment of $|x|$ than $\mathcal{G}_{u'}$. Therefore, for all $|x| \in 0^+$, given
$x^{\mathcal{G}_u} = x^{\mathcal{G}_{u'}} = |x|$ we have $t(x^{\mathcal{G}_u}) - t(x^{\mathcal{G}_{u'}}) < 0$. As both $\mathcal{G}_u$ and $\mathcal{G}_{u'}$ are connected components of
$\mathcal{E}(t)$, their score change rates are given by $\Psi^{\mathcal{G}_u}_{W^{|x|}}$ and $\Psi^{\mathcal{G}_{u'}}_{W^{|x|}}$, respectively. We thus have
$$\begin{aligned}
\frac{dt(x^{\mathcal{G}_u})}{dx^{\mathcal{G}_u}}\bigg|_{x^{\mathcal{G}_u} = |x|} - \frac{dt(x^{\mathcal{G}_{u'}})}{dx^{\mathcal{G}_{u'}}}\bigg|_{x^{\mathcal{G}_{u'}} = |x|}
&= \Big(\frac{dx^{\mathcal{G}_u}(t)}{dt}\Big|_{x^{\mathcal{G}_u} = |x|}\Big)^{-1} - \Big(\frac{dx^{\mathcal{G}_{u'}}(t)}{dt}\Big|_{x^{\mathcal{G}_{u'}} = |x|}\Big)^{-1} \\
&= \big(\Psi^{\mathcal{G}_u}_{W^{|x|}}\big)^{-1} - \big(\Psi^{\mathcal{G}_{u'}}_{W^{|x|}}\big)^{-1}.
\end{aligned} \tag{EC.26}$$
Using the expression of $\Psi^{\mathcal{G}_u}_{W}$ given in (20), one may check that $\Psi^{\mathcal{G}_u}_{W^{|x|}}$ is real analytic at $|x| = 0$
(i.e., it has a Taylor series converging to its function value in some neighborhood of 0). Thus the
difference given in (EC.26) must also be real analytic at 0, and its sign has to stay invariant
for all $|x| \in 0^+$. Because $t(x^{\mathcal{G}_u}) - t(x^{\mathcal{G}_{u'}}) < 0$, we deduce that the difference in (EC.26) has
to stay negative in $0^+$, which leads to $(\Psi^{\mathcal{G}_u}_{W^{|x|}})^{-1} < (\Psi^{\mathcal{G}_{u'}}_{W^{|x|}})^{-1}$ and thus Property (2′). A similar
argument proves the case where $\frac{dx^{\mathcal{G}_u}(t_0)}{dt} = \frac{dx^{\mathcal{G}_{u'}}(t_0)}{dt} < 0$. The reverse direction (2′) ⇒ (2) can also
be proved by an analogous argument.
We now prove that Properties (1′) and (2′) are equivalent to the LCP characterization. As
previously argued, Property (1′) is equivalent to the condition that there exists a network flow r|x|
that meets the demand for each vertex, where all vertices connected by E+ have the same score
change rate. This condition is equivalent to saying that given r|x|, the potential function on any
arc (k, k′) ∈ E+ equals zero. Property (2′) says that all arcs in E\E+ must connect two super-
components Gu and Gu′ , with Gu having a larger score change rate than Gu′ . This is true if and
only if all arcs (k, k′) ∈ E(t0) with k ∈ Gu and k′ ∈ Gu′ are excluded from E+, provided that the
score increment rate of $k$ is strictly larger than that of $k'$ (so $\Delta\Psi^{G_k,G_{k'}}_{W^{|x|}}(r) > 0$), which is exactly how $E^+$ is constructed from the LCP solution.
We next prove that $W(t)$ is a fluid process in some interval $[t_0, t_1]$ if $W(t)$ solves the ODE formulated using super-components $\{G_u\}_{u=1,\dots,K}$ that satisfy Properties (1) and (2). Because $W(t)$ is constructed from the ODE, which guarantees that queues in each super-component $G_u$ have the same score-change rate, it suffices to prove that $\{G_u\}_{u=1,\dots,K}$ are routing components, or equivalently, connected components with respect to the arc set $E(W(t))$ for all $t \in t_0^+$. Because the super-component
is a collection of primitive components, queues in each primitive component must also have the
same score change rate. Thus, all intra-component arcs inside the same primitive components have
to remain in E(t) for t∈ (t0, t1).
For inter-component arcs, if they are inside the same super-component, then they must remain
in the graph because queues in the same super-component have the same score-change rate as
implied by Property (1); if they are connecting different super-components, say, from Gu to Gu′ ,
then the score of queues in Gu must increase faster than those in Gu′ according to Property (2).
Consequently, all inter-component arcs between $G_u$ and $G_{u'}$ must disappear in $t_0^+$. We have thus proved that $\{G_u\}_{u=1,\dots,K}$ are the connected components with respect to the arc set $E(W(t))$. Proposition 3 then implies that $\{W(t) \mid t \in (t_0, t_1)\}$ gives the HOL waiting times of a fluid process.
We can further show that the arc set E+ constructed based on the solution to the LCP is
invariant for all |x| ∈ 0+. So the routing components of the fluid process in t+0 can always be
uniquely determined using (EC.24), which guarantees that the fluid process uniquely exists.
Proposition 7 Suppose at t0 the routing graph is degenerate. Then there always exists a fluid
process in t+0 whose inter-component arc set E+ is constructed from the solution to the LCP for
all |x|’s in some 0+. Furthermore, such a fluid process is unique in t+0 .
Proof.
“Existence”: It suffices to show that for each $|x| \in 0^+$, the LCP has a solution, and that the solutions at different $|x|$ all lead to the same $E^+$.
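For any given $|x| \in 0^+$, $\Psi^{G_k}_{W^{|x|}}(r^{|x|})$ can be expressed as an affine function of $r^{|x|}$ as
\[
\begin{aligned}
\Psi^{G_k}_{W^{|x|}}(r^{|x|})
&= \Bigl(\sum_{i\in G_k} \frac{\lambda_i(t-W^{|x|}_i)\,F^C_i(W^{|x|}_i)}{g'_i(W^{|x|}_i)}\Bigr)^{-1}
\Bigl(\sum_{i\in G_k}\lambda_i(t-W^{|x|}_i) - \sum_{j\in G_k}\mu_j(t) + \sum_{e\in\delta^+(G_k)} r_e - \sum_{e\in\delta^-(G_k)} r_e\Bigr) \\
&= \Psi^{G_k}_{W^{|x|}} + \Bigl(\sum_{e\in\delta^+(G_k)} r_e - \sum_{e\in\delta^-(G_k)} r_e\Bigr)\Phi^{G_k}_{W^{|x|}},
\end{aligned}
\tag{EC.27}
\]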
where $\Psi^{G_k}_{W^{|x|}}$ was defined in (20), and
\[
\Phi^{G_k}_{W^{|x|}} := \Bigl(\sum_{i\in G_k} \frac{\lambda_i(t-W^{|x|}_i)\,F^C_i(W^{|x|}_i)}{g'_i(W^{|x|}_i)}\Bigr)^{-1}.
\tag{EC.28}
\]
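Define $\Psi_{W^{|x|}}(r^{|x|}) := \{\Psi^{G_k}_{W^{|x|}}(r^{|x|})\}_{k=1,\dots,K}$, $\Psi_{W^{|x|}} := \{\Psi^{G_k}_{W^{|x|}}\}_{k=1,\dots,K}$, and $\Phi_{W^{|x|}} := \{\Phi^{G_k}_{W^{|x|}}\}_{k=1,\dots,K}$. Let $P$ denote the vertex-arc incidence matrix of the directed graph $G(t_0)$, and let $\mathrm{Diag}(\Phi_{W^{|x|}})$ denote the $K$-by-$K$ diagonal matrix with diagonal $\Phi_{W^{|x|}}$. Equation (EC.27) can be expressed in vectorized form as
\[
\Psi_{W^{|x|}}(r^{|x|}) = \mathrm{Diag}(\Phi_{W^{|x|}})\,P\,r^{|x|} + \Psi_{W^{|x|}}.
\tag{EC.29}
\]
The potential-function vector, defined as $\Delta\Psi_{W^{|x|}}(r^{|x|}) := \{\Delta\Psi^{G_k,G_{k'}}_{W^{|x|}}(r^{|x|})\}_{(k,k')\in E(t_0)}$, can then be expressed as
\[
\Delta\Psi_{W^{|x|}}(r^{|x|}) = P^T\Psi_{W^{|x|}}(r^{|x|}) = P^T\mathrm{Diag}(\Phi_{W^{|x|}})\,P\,r^{|x|} + P^T\Psi_{W^{|x|}}.
\tag{EC.30}
\]
Using the above equation, the LCP can be expressed in its vectorized form
\[
\begin{aligned}
(r^{|x|})^T\bigl(P^T\mathrm{Diag}(\Phi_{W^{|x|}})\,P\,r^{|x|} + P^T\Psi_{W^{|x|}}\bigr) &= 0, \\
r^{|x|} &\ge 0, \\
P^T\mathrm{Diag}(\Phi_{W^{|x|}})\,P\,r^{|x|} + P^T\Psi_{W^{|x|}} &\ge 0.
\end{aligned}
\tag{EC.31}
\]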
According to classical results on LCP (Cottle 1964, Lemke 1965), the LCP has a solution if (a) the Hessian matrix $P^T\mathrm{Diag}(\Phi_{W^{|x|}})P$ is symmetric positive semidefinite; and (b) there is a pair of vectors $r^{|x|}$ and $\Delta\Psi_{W^{|x|}}(r^{|x|})$ that are both non-negative (not necessarily complementary). Condition (a) follows directly from the non-negativity of $\Phi_{W^{|x|}}$. Condition (b) can be proved by constructing $r^{|x|}$ in the following way. We first set $r^{|x|}_e = 0$ for all $e \in E(t_0)$. We then check whether there is any arc $(k_0, k_1) \in E(t_0)$ with a negative potential $\Delta\Psi^{G_{k_0},G_{k_1}}_{W^{|x|}}(r^{|x|}) < 0$. If such an arc exists, we push a positive flow along a directed path $(k_0, k_1, \dots, k_\ell)$ with $k_\ell$ being a leaf node (having no outgoing arcs). Such a path always exists because $G(t_0)$ contains no directed cycle. Pushing such a flow increases the score change rate of $G_{k_0}$, namely $\Psi^{G_{k_0}}_{W^{|x|}}(r^{|x|})$, and decreases that of $G_{k_\ell}$, namely $\Psi^{G_{k_\ell}}_{W^{|x|}}(r^{|x|})$, whereas the score change rates of vertices in the middle of the path stay invariant because their net inflow rates have not changed. As a result, pushing this positive flow along the path increases the potential on arc $(k_0, k_1)$ while keeping the potentials on all other arcs non-decreasing. By pushing a sufficiently large amount of flow along the path, we can make $\Delta\Psi^{G_{k_0},G_{k_1}}_{W^{|x|}}(r^{|x|}) \ge 0$. We then repeat the above procedure until all arcs have a non-negative potential value, which yields a pair of non-negative vectors $r^{|x|}$ and $\Delta\Psi_{W^{|x|}}(r^{|x|})$.
At each $|x|$, the LCP may have multiple complementary pairs, each of which is non-negative and minimizes $(r^{|x|})^T\bigl(P^T\mathrm{Diag}(\Phi_{W^{|x|}})\,P\,r^{|x|} + P^T\Psi_{W^{|x|}}\bigr)$. Since $P^T\mathrm{Diag}(\Phi_{W^{|x|}})P$ is positive semidefinite, a feasible solution has to take the form $r^{|x|} + \Delta r$, where $r^{|x|}$ is any feasible solution and $\Delta r$ lies in the null space of $P^T\mathrm{Diag}(\Phi_{W^{|x|}})P$, or equivalently the null space of $P$, which means $\Delta r$ has to be a circulation over the underlying undirected graph of $G(t_0)$. Since $P^T\mathrm{Diag}(\Phi_{W^{|x|}})P\,\Delta r = 0$, we deduce that the vector of potential functions, $\Delta\Psi_{W^{|x|}}$, has to be unique. Therefore, the arc set $E^+$ constructed based on $\Delta\Psi_{W^{|x|}}$ must be unique at each $|x|$.
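To make the vectorized LCP concrete, the following minimal Python sketch assembles $M = P^T\mathrm{Diag}(\Phi)P$ and $q = P^T\Psi$ for a small hypothetical instance (three components on a path $0 \to 1 \to 2$, with made-up values of $\Psi$ and $\Phi$) and solves the LCP with a projected Gauss-Seidel iteration. The iteration is chosen only for brevity and is not the method used in the paper; the names `solve_lcp_pgs`, `psi`, `phi`, and `E_plus` are illustrative assumptions. Arcs whose potential is zero at the solution are collected as $E^+$.

```python
def solve_lcp_pgs(M, q, iters=1000, tol=1e-12):
    """Projected Gauss-Seidel for the LCP: r >= 0, w = M r + q >= 0, r.w = 0.

    Assumes M is symmetric positive semidefinite with a positive diagonal.
    """
    n = len(q)
    r = [0.0] * n
    for _ in range(iters):
        change = 0.0
        for e in range(n):
            w_e = q[e] + sum(M[e][f] * r[f] for f in range(n))
            new = max(0.0, r[e] - w_e / M[e][e])
            change = max(change, abs(new - r[e]))
            r[e] = new
        if change < tol:
            break
    return r


# Hypothetical instance (not from the paper): K = 3 components, path 0 -> 1 -> 2.
arcs = [(0, 1), (1, 2)]
psi = [0.1, 0.3, 0.1]   # score-change rates without inter-component flow
phi = [1.0, 1.0, 1.0]   # positive scaling factors, playing the role of (EC.28)
K, m = len(psi), len(arcs)

# Vertex-arc incidence matrix P: +1 at the tail of each arc, -1 at its head.
P = [[0.0] * m for _ in range(K)]
for e, (tail, head) in enumerate(arcs):
    P[tail][e], P[head][e] = 1.0, -1.0

# M = P^T Diag(phi) P and q = P^T psi, matching the structure of (EC.30).
M = [[sum(P[k][e] * phi[k] * P[k][f] for k in range(K)) for f in range(m)]
     for e in range(m)]
q = [sum(P[k][e] * psi[k] for k in range(K)) for e in range(m)]

r = solve_lcp_pgs(M, q)
potential = [q[e] + sum(M[e][f] * r[f] for f in range(m)) for e in range(m)]

# Arcs with zero potential stay in E+; arcs with positive potential are dropped.
E_plus = [arcs[e] for e in range(m) if abs(potential[e]) < 1e-9]
```

In this instance the flow on arc $(0,1)$ equalizes the score-change rates of components 0 and 1, while arc $(1,2)$ keeps a strictly positive potential and is therefore excluded from $E^+$.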
Finally, we argue that the arc set $E^+$ constructed using (EC.24) has to be invariant for different $|x| \in 0^+$. Without loss of generality, we assume that for a sequence of points $|x|_\ell \to 0$ ($\ell \to \infty$), the LCP solution leads to the same arc set $E^+$. Then by the equivalence between the LCP characterization and Properties (1′) and (2′), we deduce that if there are arcs in $E(t_0)\setminus E^+$ going from $G_u$ to $G_{u'}$, then $\Psi^{G_u,G_{u'}}_{W^{|x|_\ell}} > 0$. Because the sign of $\Psi^{G_u,G_{u'}}_{W^{|x|}}$ stays invariant in $0^+$, Property (2′) holds for all $|x| \in 0^+$. Property (1′) can be proved by the same logic. Thus, Properties (1′) and (2′) hold for all $|x| \in 0^+$, which implies that the LCP leads to the same $E^+$ at all $|x| \in 0^+$ by the equivalence between Properties (1′) and (2′) and the LCP characterization.
Since for all |x| ∈ 0+, the solution to the LCP leads to E+, we deduce that E+ satisfies the LCP
characterization in Proposition 6. Then Proposition 6 guarantees that the fluid process exists.
“Uniqueness”: As we showed in Proposition 4, if $W(t)$ is a fluid process, then in $t_0^+$ the associated arc set $E(W(t))$ must contain all intra-component arcs as well as the inter-component arcs that remain in $G$. Let $E(t)$ denote the inter-component arcs that remain in $G$ at time $t$. We next show that
$E(t) \equiv E^+$ for $t \in t_0^+$ and that $E^+$ has to satisfy Properties (1) and (2). We prove this by contradiction. Suppose there exists a sequence of times $\{t_\ell \mid \ell = 1,2,\dots\}$ that converges to $t_0$ from the right, with $E(t_\ell) \ne E^+$ for all $\ell$. Then we can find a subsequence $\{t_{\ell_h} \mid h = 1,2,\dots\}$ at which $E(t_{\ell_h}) \equiv E' \ne E^+$. Since the arc set $E^+$ that satisfies both Properties (1) and (2) is unique, $E'$ cannot satisfy both properties. If $E'$ does not satisfy (2), we can find two subsets $G_u$ and $G_{u'}$ such that $\Psi^{G_u,G_{u'}}_{W(t_{\ell_h})} \le 0$, which then implies $\Psi^{G_u,G_{u'}}_{W(t)} \le 0$ for all $t \in t_0^+$ by the “invariant sign” property of $\Psi^{G_u,G_{u'}}_{W(t)}$. However, if the score of $G_u$ cannot increase as fast as that of $G_{u'}$, it is impossible that $G_u$ and $G_{u'}$ are disconnected in $t_0^+$, because $G_u$ can send flow to $G_{u'}$ to make their score change rates equal. Using the same logic, we can derive a contradiction if $E'$ does not satisfy (1). Thus $E(t) \equiv E^+$, which means that the connected components $\{G_u\}_{u=1,\dots,K}$ with respect to $E^+$ are the routing components in $t_0^+$. By invoking Proposition 3, we have proved that the fluid process is unique: it is the unique solution to the ODE (EC.2) formulated based on $\{G_u\}_{u=1,\dots,K}$.
EC.5. Proof for Corollary 2
Corollary 2 Suppose a fluid process in a FCFS BQS has initial state $W_i(0) = W_1(0)$ for all $i \in I$ and $\lambda(t) > \mu(t)$ at all $t$. Then the following statements are equivalent:
1. The fluid process is globally FCFS.
2. The fluid process is exactly the one constructed by Algorithm 1 using the M+W index (40), which has $W_i(t) = W_1(t)$ for all $i \in I$, and $W_1(t)$ is the unique solution to the following integral equation
\[
W_1(t) = W_1(0) + \int_0^t \Bigl(1 - \frac{\mu(t)}{\lambda(t, W_1(t))}\Bigr)\,dt.
\tag{EC.32}
\]
3. Given $W_1(t)$ constructed as above, the arrival rates $\{\lambda_i(t, W_1(t))\}_{i\in I}$ and $\{\mu_j(t)\}_{j\in J}$ satisfy the GCC.
Proof.
1 ⇒ 2: If the fluid process is globally FCFS, then $W_i(t) \equiv W_1(t)$ for all $i \in I$. Thus, $g_i(W_i(t)) = W_i(t) = W_1(t)$ and $\theta_i(t) = \theta_1(t)$ for all $i \in I$. Because all queues always have the same waiting score, according to the M+W index we defined in (40), $i \in A(t, j)$ if and only if $j$ and $i$ are compatible, that is, $(j, i) \in E_b$. Thus we deduce that $E_b(W(t)) \equiv E_b$. Since $V$ is connected by $E_b$, $V$ is the routing component of the fluid process at all $t$. Thus $\theta_1(t)$ can be computed as
\[
\begin{aligned}
\theta_1(t) = \Psi^{V}_{W(t)}
&= \Bigl(\sum_{i\in V} \frac{\lambda_i(t-W_i(t))\,F^C_i(W_i(t))}{g'_i(W_i(t))}\Bigr)^{-1}
\Bigl(\sum_{i\in V}\lambda_i(t-W_i(t))\,F^C_i(W_i(t)) - \sum_{j\in V}\mu_j(t)\Bigr) \\
&= 1 - \frac{\mu(t)}{\lambda(t, W_1(t))},
\end{aligned}
\tag{EC.33}
\]
where the second equality follows from $g'_i(W_i(t)) = 1$ and our definition of $\lambda_i(t)$ and $\lambda(t)$. By Proposition 3, the unique fluid process $W_1(t)$ can be constructed by solving the ODE (EC.2), which leads to
\[
g_1(W_1(t)) = g_1(W_1(0)) + \int_0^t \theta_1(t)\,dt = W_1(0) + \int_0^t \Bigl(1 - \frac{\mu(t)}{\lambda(t, W_1(t))}\Bigr)\,dt.
\tag{EC.34}
\]
Because $W_i(t) = W_1(t) = g_1(W_1(t))$ for all $i \in I$, the above equation leads to the integral equation for $W_1(t)$ given in (EC.32).
Finally, by our assumption that $\lambda(t) > \mu(t)$, we know that $\lambda(t, 0) = \lambda(t) > \mu(t)$, so $W_1'(t) = 1 - \frac{\mu(t)}{\lambda(t, W_1(t))}$ is positive whenever $W_1(t)$ is sufficiently close to zero. Thus, the solution to (EC.32) must always be positive, which implies that condition (13) in Step 0 always fails and the above solution can be successfully constructed by Algorithm 1.
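As a numerical illustration of the integral equation (EC.32), the sketch below integrates $W_1'(t) = 1 - \mu(t)/\lambda(t, W_1(t))$ with a forward-Euler scheme. It rests on simplifying assumptions that are not part of the corollary: constant rates, exponential abandonment so that $\lambda(t, w) = \lambda_0 e^{-dw}$, and constants borrowed from Section EC.7 (with the three server rates pooled) purely to have concrete numbers. The function name `solve_w1` and the step size are hypothetical choices.

```python
import math


def solve_w1(lam, mu, w0, horizon, dt=0.01):
    """Forward-Euler integration of W1'(t) = 1 - mu(t) / lam(t, W1(t))."""
    t, w = 0.0, w0
    path = [w]
    for _ in range(int(round(horizon / dt))):
        w += dt * (1.0 - mu(t) / lam(t, w))
        t += dt
        path.append(w)
    return path


# Illustrative assumptions: constant rates and exponential abandonment,
# so that lam(t, w) = LAMBDA0 * exp(-D * w).
LAMBDA0 = 443.0              # arrivals per quarter
MU = 118.9 + 29.5 + 9.1      # pooled service rate per quarter (an assumption)
D = 0.07                     # abandonment hazard rate per quarter


def lam(t, w):
    return LAMBDA0 * math.exp(-D * w)


def mu(t):
    return MU


path = solve_w1(lam, mu, w0=0.0, horizon=300.0)

# Under these assumptions the derivative vanishes where lam(t, w) = MU.
w_limit = math.log(LAMBDA0 / MU) / D
```

Under these assumptions the computed path is non-decreasing and levels off at the waiting time where the effective arrival rate equals the pooled service rate, which is the behavior the overload condition $\lambda(t) > \mu(t)$ suggests.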
2⇒ 1: This direction is straightforward by the definition of global FCFS.
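2 ⇒ 3: Suppose $\{W(t) \mid t \ge 0\}$ is constructed by Algorithm 1. By the above argument, the score change rate of every queue is given by $\theta_1(t)$, which can be computed as in (EC.33). Consider any subset of servers $A \subseteq J$. Since servers in $A$ can only serve queues in $\cup_{j\in A} I(j)$, but queues in $\cup_{j\in A} I(j)$ can receive supply fluid from other servers, we have
\[
\sum_{i\in\cup_{j\in A}I(j)}\sum_{j\in J} r_{ji}(t) \ge \sum_{i\in\cup_{j\in A}I(j)}\sum_{j\in A} r_{ji}(t) = \sum_{j\in A}\mu_j(t).
\]
Thus, for any queue $i \in \cup_{j\in A} I(j)$, we have
\[
\begin{aligned}
\theta_i(t) &= \Bigl(\sum_{i\in\cup_{j\in A}I(j)} \lambda_i(t-W_i(t))\,F^C_i(W_i(t))\Bigr)^{-1}
\Bigl(\sum_{i\in\cup_{j\in A}I(j)} \lambda_i(t-W_i(t))\,F^C_i(W_i(t)) - \sum_{i\in\cup_{j\in A}I(j)}\sum_{j\in J} r_{ji}(t)\Bigr) \\
&\le \Bigl(\sum_{i\in\cup_{j\in A}I(j)} \lambda_i(t-W_i(t))\,F^C_i(W_i(t))\Bigr)^{-1}
\Bigl(\sum_{i\in\cup_{j\in A}I(j)} \lambda_i(t-W_i(t))\,F^C_i(W_i(t)) - \sum_{j\in A}\mu_j(t)\Bigr) \\
&= 1 - \frac{\sum_{j\in A}\mu_j(t)}{\sum_{i\in\cup_{j\in A}I(j)} \lambda_i(t, W_i(t))},
\end{aligned}
\tag{EC.35}
\]
which leads to inequality (41).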
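The subset condition behind this step can be checked by brute force on small instances. Inequality (41) is not restated in this section, so the sketch below assumes it takes the form implied by (EC.33) and (EC.35): for every non-empty server subset $A$, the ratio $\sum_{j\in A}\mu_j / \sum_{i\in\cup_{j\in A}I(j)}\lambda_i$ does not exceed the system-wide ratio $\mu/\lambda$. The function name `satisfies_gcc`, the dictionary `compat` (mapping each server to the queues it can serve), and the numbers are hypothetical; the sketch also assumes every server is compatible with at least one queue.

```python
from itertools import combinations


def satisfies_gcc(mu, lam, compat, tol=1e-12):
    """Check sum_{j in A} mu_j / sum_{i in I(A)} lam_i <= sum(mu) / sum(lam)
    for every non-empty server subset A, where I(A) is the set of queues
    that at least one server in A can serve."""
    overall = sum(mu) / sum(lam)
    servers = range(len(mu))
    for size in range(1, len(mu) + 1):
        for A in combinations(servers, size):
            queues = set().union(*(compat[j] for j in A))
            ratio = sum(mu[j] for j in A) / sum(lam[i] for i in queues)
            if ratio > overall + tol:
                return False
    return True


# Hypothetical instance: server 0 serves only queue 0, server 1 serves both.
compat = {0: {0}, 1: {0, 1}}
lam = [2.0, 2.0]

balanced = satisfies_gcc([1.0, 1.0], lam, compat)  # no subset is over-supplied
skewed = satisfies_gcc([1.5, 0.5], lam, compat)    # server 0 over-supplies queue 0
```

The enumeration is exponential in the number of server types, which is acceptable only because the sketch targets instances with a handful of types.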
3 ⇒ 2: Proof by contradiction. Suppose 2 fails. Then the routing component cannot always be $V$; otherwise, Algorithm 1 must have constructed a fluid process $W(t)$ that solves the ODE (EC.2) with routing component $V$, and thus the integral equation (EC.32). If $V$ is not always the routing component, then at some time $t$, some queues in $I$ have $\theta_i(t) \ne \theta_1(t)$, and the edge set $E(W(t))$ is disconnected at $t$. We can thus find a connected component $H$ whose score change rate satisfies $\theta_i \ne \theta_1$. Let $A = H \cap J$ denote the set of servers in $H$; then the set of queues in $H$ is exactly $\cup_{j\in A} I(j)$, because $H$ is a connected component of $E(W(t))$. Then for all queues $i \in H$, we have
\[
\theta_i = \Psi^{H}_{W(t)} = 1 - \frac{\sum_{j\in A}\mu_j(t)}{\sum_{i\in\cup_{j\in A}I(j)} \lambda_i(t-W_i(t))\,F^C_i(W_i(t))}.
\tag{EC.36}
\]
If $\theta_i < \theta_1$, then the above equality leads to $1 - \theta_1 < \frac{\sum_{j\in A}\mu_j(t)}{\sum_{i\in\cup_{j\in A}I(j)} \lambda_i(t-W_i(t))\,F^C_i(W_i(t))}$, which contradicts (41). If $\theta_i > \theta_1$, then there must exist another connected component $H' \subseteq V \setminus H$ such that all queues $i \in H'$ have $\theta_i < \theta_1$, and the same argument yields a contradiction. This proves 3 ⇒ 2.
The terminology of GCC follows recent work by Cong et al. (2015), in which the Generalized Chaining Condition (GCC) is proposed to provide a performance bound for a max-weight routing policy in a non-overloaded BQS. Their definition of GCC differs slightly from ours: they require the existence of $\theta_1 > 0$ such that for all non-empty strict subsets $A \subseteq J$, $\lambda_i + \theta_1\mathbf{1}$ ($\mathbf{1}$ denotes the all-ones vector) satisfies the inequality (41), with $\ge$ replaced by $<$ in their condition because they consider the non-overloaded case.
EC.6. Proof for Theorem 4
Theorem 4 Suppose $\bar{W}_i < \infty$ for all $i \in I$. Then $W(t) \to W^*$ as $t \to \infty$, where $W^*$ denotes the steady-state waiting times constructed as in Theorem 2.
Proof. According to Theorem 2, a unique HOL waiting-time vector $W^*$ always exists. We define $\Delta g_i(t) := g_i(W_i(t)) - g_i(W^*_i)$ as the difference between the score of queue $i$ at time $t$ and its score at the steady state $W^*$. Define $\bar{I}(t) := \arg\max\{\Delta g_i(t) \mid i \in I\}$ and $\overline{\Delta g}(t) := \max\{\Delta g_i(t) \mid i \in I\}$; similarly, define $\underline{I}(t) := \arg\min\{\Delta g_i(t) \mid i \in I\}$ and $\underline{\Delta g}(t) := \min\{\Delta g_i(t) \mid i \in I\}$. We want to show that for any $\epsilon > 0$, there exists a $T > 0$ such that for all $t \ge T$, $\overline{\Delta g}(t) \le \epsilon$ (or $\underline{\Delta g}(t) \ge -\epsilon$). Note that the derivative may not exist at countably many points, but that will not affect our subsequent analysis.
We next prove the $\overline{\Delta g}(t) \le \epsilon$ case. The main idea is to show that the supply fluid that flows into any queue in $\bar{I}(t)$ at time $t$ is no less than that at the steady state. Consequently, the score change rates of queues in $\bar{I}(t)$ are smaller than those at the steady state and thus have to be negative.
For any queue $i^* \in \bar{I}(t)$, consider a connected component $A(t)$ at time $t$ (with respect to $E(t)$) that contains queue $i^*$. Let $A(t) \cap \bar{I}(t)$ denote the set of queues in $\bar{I}(t)$ contained in $A(t)$, let $J^*(A(t) \cap \bar{I}(t))$ denote the set of servers connected to queues in $A(t) \cap \bar{I}(t)$ at the steady state, and let $J^t(A(t) \cap \bar{I}(t))$ denote the set of servers that serve queues in $A(t) \cap \bar{I}(t)$ exclusively at time $t$. We next show that $J^*(A(t) \cap \bar{I}(t)) \subseteq J^t(A(t) \cap \bar{I}(t))$. Intuitively, this is because queues in $\bar{I}(t)$ have the largest difference between their scores at time $t$ and at the steady state, so at time $t$ they should be more competitive than at the steady state.
Formally, because a server $j \in J^*(A(t) \cap \bar{I}(t))$ must be connected to some queue $\bar{i} \in \bar{I}(t)$ at the steady state, for any queue $i \notin \bar{I}(t)$ we have
\[
g_{\bar{i}}(W^*_{\bar{i}}) + L(j, \bar{i}) \ge g_i(W^*_i) + L(j, i).
\tag{EC.37}
\]
According to the definition of $\bar{I}(t)$, we have
\[
\Delta g_{\bar{i}}(t) = g_{\bar{i}}(W_{\bar{i}}(t)) - g_{\bar{i}}(W^*_{\bar{i}}) > \Delta g_i(t) = g_i(W_i(t)) - g_i(W^*_i).
\tag{EC.38}
\]
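Inequalities (EC.37) and (EC.38) together imply that, for $\bar{i} \in \bar{I}(t)$ and $i \notin \bar{I}(t)$,
\[
g_{\bar{i}}(W_{\bar{i}}(t)) + L(j, \bar{i}) > g_i(W_i(t)) + L(j, i).
\tag{EC.39}
\]
The above inequality implies that server $j \in J^*(A(t) \cap \bar{I}(t))$ is only connected to queues in $\bar{I}(t)$; thus by definition we deduce that $j \in J^t(A(t) \cap \bar{I}(t))$. So we have shown that $J^*(A(t) \cap \bar{I}(t)) \subseteq J^t(A(t) \cap \bar{I}(t))$.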
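Since all queues in $A(t) \cap \bar{I}(t)$ have the same score change rate as queue $i^*$, using logic analogous to that used to derive (11), we can derive the following expression for $\theta_{i^*}(t)$:
\[
\theta_{i^*}(t) = \Bigl(\sum_{i\in A(t)\cap\bar{I}(t)} \frac{\lambda_i F^C_i(W_i(t))}{g'_i(W_i(t))}\Bigr)^{-1}
\sum_{i\in A(t)\cap\bar{I}(t)} \Bigl(\lambda_i - \sum_{j\in J} r_{ji}(t)\Bigr).
\tag{EC.40}
\]
Because
\[
\sum_{j\in J} r_{ji}(t) \ge \sum_{j\in J^t(A(t)\cap\bar{I}(t))} \mu_j \ge \sum_{j\in J^*(A(t)\cap\bar{I}(t))} \mu_j \ge \sum_{j\in J} r^*_{ji}(t),
\]
we can further upper bound the RHS of (EC.40) as
\[
\begin{aligned}
\theta_{i^*}(t) &= \Bigl(\sum_{i\in A(t)\cap\bar{I}(t)} \frac{\lambda_i F^C_i(W_i(t))}{g'_i(W_i(t))}\Bigr)^{-1}
\sum_{i\in A(t)\cap\bar{I}(t)} \Bigl(\lambda_i - \sum_{j\in J} r_{ji}(t)\Bigr) \\
&\le \Bigl(\sum_{i\in A(t)\cap\bar{I}(t)} \frac{\lambda_i F^C_i(W_i(t))}{g'_i(W_i(t))}\Bigr)^{-1}
\sum_{i\in A(t)\cap\bar{I}(t)} \Bigl(\lambda_i F^C_i(W_i(t)) - \sum_{j\in J} r^*_{ji}\Bigr) \\
&= \Bigl(\sum_{i\in A(t)\cap\bar{I}(t)} \frac{\lambda_i F^C_i(W_i(t))}{g'_i(W_i(t))}\Bigr)^{-1}
\sum_{i\in A(t)\cap\bar{I}(t)} \Bigl(\lambda_i F^C_i(W_i(t)) - \lambda_i F^C_i(W^*_i)\Bigr),
\end{aligned}
\tag{EC.41}
\]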
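where the last equality follows from the stationary condition (43). Since $W^*_i < W_i(t) \le \bar{W}_i$ and $\underline{f}_i := \inf\{f_i(x) \mid x \in [0, \bar{W}_i]\} > 0$, we have
\[
\begin{aligned}
\sum_{i\in A(t)\cap\bar{I}(t)} \Bigl(\lambda_i F^C_i(W_i(t)) - \lambda_i F^C_i(W^*_i)\Bigr)
&< \sum_{i\in A(t)\cap\bar{I}(t)} \lambda_i\bigl(F^C_i(W_i(t)) - F^C_i(W^*_i)\bigr)
\quad \Bigl(= \lambda_i \int_{W_i(t)}^{W^*_i} f_i(\tau)\,d\tau\Bigr) \\
&< -\sum_{i\in A(t)\cap\bar{I}(t)} \lambda_i\bigl(W_i(t) - W^*_i\bigr)\underline{f}_i \\
&< -\lambda_{i^*}\bigl(W_{i^*}(t) - W^*_{i^*}\bigr)\underline{f}_{i^*}.
\end{aligned}
\tag{EC.42}
\]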
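Since $g_i$ is strictly increasing over its compact domain $[0, \bar{W}_i]$ for each $i \in I$, we can find $\underline{c}, \bar{c} > 0$ such that
\[
\underline{c} \le g'_i(\tau) \le \bar{c}, \quad \text{for all } i \in I,\ \tau \in [0, \bar{W}_i].
\tag{EC.43}
\]
Consequently, the term $\Bigl(\sum_{i\in A(t)\cap\bar{I}(t)} \frac{\lambda_i F^C_i(W_i(t))}{g'_i(W_i(t))}\Bigr)^{-1}$ can be lower bounded by $\underline{c}\Bigl(\sum_{i\in A(t)\cap\bar{I}(t)} \lambda_i F^C_i(W_i(t))\Bigr)^{-1}$. When $\overline{\Delta g}(t) = g_{i^*}(W_{i^*}(t)) - g_{i^*}(W^*_{i^*}) \ge \epsilon$, (EC.41) and (EC.42) imply that
\[
\begin{aligned}
\theta_{i^*}(t) &< \Bigl(\sum_{i\in A(t)\cap\bar{I}(t)} \frac{\lambda_i F^C_i(W_i(t))}{g'_i(W_i(t))}\Bigr)^{-1}
\sum_{i\in A(t)\cap\bar{I}(t)} \Bigl(\lambda_i F^C_i(W_i(t)) - \lambda_i F^C_i(W^*_i)\Bigr) \\
&< -\underline{c}\Bigl(\sum_{i\in A(t)\cap\bar{I}(t)} \lambda_i F^C_i(W_i(t))\Bigr)^{-1} \lambda_{i^*}\bigl(W_{i^*}(t) - W^*_{i^*}\bigr)\underline{f}_{i^*} \\
&< -\underline{c}\Bigl(\sum_{i\in A(t)\cap\bar{I}(t)} \lambda_i F^C_i(W_i(t))\Bigr)^{-1} \lambda_{i^*}\,\frac{g_{i^*}(W_{i^*}(t)) - g_{i^*}(W^*_{i^*})}{\bar{c}}\,\underline{f}_{i^*} \\
&= -C\bigl(g_{i^*}(W_{i^*}(t)) - g_{i^*}(W^*_{i^*})\bigr) \\
&< -C\epsilon
\end{aligned}
\tag{EC.44}
\]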
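for some constant $C > 0$ that depends neither on $t$ nor on the choice of $i^* \in \bar{I}(t)$. Thus, $\Delta g'_{i^*}(t) = \theta_{i^*}(t) < -C\epsilon$ for all $t$ and all $i^* \in \bar{I}(t)$ when $\overline{\Delta g}(t) \ge \epsilon$, and consequently $\overline{\Delta g}'(t) \le \max\{\Delta g'_i(t) \mid i \in \bar{I}(t)\} < -C\epsilon$. Moreover, as long as $\overline{\Delta g}(t) > 0$, inequalities (EC.41) and (EC.42) imply that $\overline{\Delta g}'(t) \le 0$. Therefore, $\overline{\Delta g}(t)$ is always decreasing, and its rate of decrease is at least $C\epsilon$ when $\overline{\Delta g}(t) \ge \epsilon$. Thus, for sufficiently large $t$, we have
\[
\overline{\Delta g}(t) = \overline{\Delta g}(0) + \int_0^t \overline{\Delta g}'(t)\,dt < \epsilon,
\tag{EC.45}
\]
which proves that $\overline{\Delta g}(t) \to 0$ as $t \to \infty$. An analogous argument can be used to prove that $\underline{\Delta g}(t) \to 0$. Therefore, we have shown $\Delta g_i(t) \to 0$ for all $i \in I$, which leads to $W_i(t) \to W^*_i$ by the strict monotonicity of $g_i(\cdot)$ and the compactness of the range of $W_i(t)$.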
EC.7. Parameter Estimation Details
Geyer and Sieg (2013) estimated the average net number of households that move out of PH1, PH2, and PH3 per quarter, which can be regarded as the resource arrival rates in the BQS model, namely $\mu_1 = 118.9$, $\mu_2 = 29.5$, and $\mu_3 = 9.1$.
To estimate the proportion of each choice profile, we use the discrete-choice model estimated by
Geyer and Sieg (2013), which assumes a renter’s utility to be
\[
\begin{aligned}
u_{jt} &= \gamma_j + \beta\ln(y_{jt}) + \delta x_t + mc + \xi_{jt}, \quad j = 1,2,3, \\
u_{0t} &= \ln(y_{0t}) + \delta x_t + \xi_{0t},
\end{aligned}
\tag{EC.46}
\]
where $j$ denotes the community type ($j = 0$ means living outside public housing), $t$ is the index of an applicant, $u_{jt}$ represents the utility for applicant $t$ to accept a unit of type $j$, $\gamma_j$ represents the fixed effect of community type $j$, $y_{jt}$ denotes the net income of the applicant reduced by the rent of housing type $j$, $\beta$ denotes the coefficient for the log of net income (Geyer and Sieg (2013) assumed $\beta = 1$ for the outside housing option), $x_t$ is a vector of a household’s attributes, $mc$ stands for the moving cost for the household (which is incurred when the household accepts public housing), and $\xi_{jt}$ represents idiosyncratic shocks, which are assumed to be i.i.d. and to follow extreme-value distributions.
Geyer and Sieg (2013) studied the choice behavior of householders already residing in public
housing as well as those outside public housing. In order to do that, they created two population
pools to generate xt and yt: the residence pool and the waitlist pool. The residence pool consists of
households who resided in public housing in the city of Pittsburgh from June 2001 to June 2006.
The attributes of such households were recorded in the HACP’s dataset. The waitlist pool consists
of 14,234 randomly selected low-income applicants who were eligible for public housing from 13
metropolitan areas with a public housing-household ratio similar to the city of Pittsburgh. Their
information is recorded in the SIPP dataset. Geyer and Sieg (2013) argued that these applicants
are a good proxy for those on the waitlist for public housing in Pittsburgh. Since our case study
focuses on the dynamics on the waitlist, we ignore possible transfers between different community
types (which are small). Therefore, it suffices to create the waitlist pool using the same method
described in the above paper and then generate xt and yt from that waitlist.
The waitlist pool we use is a subset of the dataset used by Geyer and Sieg (2013). Therefore
their choice model can be applied to predict household choice in our system. Using xt and yt,
we can calculate the deterministic parts of the utility function ujt and u0t using the coefficients
reported in Geyer and Sieg (2013). By generating the random errors (ξjt)j=0,1,2,3 according to
their distributions, we can simulate the values of ujt and u0t and calculate the probabilities for
each choice profile. For example, Pr(1,2) = Pr(u1t > u2t > 0), Pr(2) = Pr(u1t > 0, u2t < 0, u3t < 0).
Although our choice set has only three types rather than the six types of (Geyer and Sieg 2013),
this does not lead to any estimation bias of the choice probabilities due to the “independence of
irrelevant alternatives” property of the discrete choice model.
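The simulation step described above can be sketched as follows. This is a minimal illustration, not the estimation code used for the case study: the deterministic utilities are made-up placeholders rather than the coefficients reported in Geyer and Sieg (2013), the shocks are drawn from a standard Gumbel (type-I extreme-value) distribution, and a choice profile is taken to be the (at most two) housing types whose simulated utility exceeds that of the outside option, ranked by utility. The names `simulate_profiles`, `v_outside`, and `v_housing` are hypothetical.

```python
import math
import random


def gumbel(rng):
    """Draw a standard Gumbel (type-I extreme-value) shock by inversion."""
    u = rng.random()
    while u == 0.0:          # guard against log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_profiles(v_outside, v_housing, n_draws=100_000, seed=0):
    """Estimate choice-profile probabilities by Monte Carlo simulation.

    v_outside: deterministic utility of the outside option (j = 0).
    v_housing: dict mapping housing type j to its deterministic utility.
    """
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_draws):
        u0 = v_outside + gumbel(rng)
        u = {j: v + gumbel(rng) for j, v in v_housing.items()}
        # Keep only types preferred to the outside option, best first.
        acceptable = sorted((j for j in u if u[j] > u0), key=lambda j: -u[j])
        profile = tuple(acceptable[:2])
        counts[profile] = counts.get(profile, 0) + 1
    return {p: c / n_draws for p, c in counts.items()}


# Placeholder utilities for illustration only.
v_housing = {1: 0.5, 2: 0.0, 3: -0.5}
probs = simulate_profiles(0.0, v_housing)

# With i.i.d. Gumbel shocks, the top-ranked alternative follows the logit formula.
denom = 1.0 + sum(math.exp(v) for v in v_housing.values())
p_outside_best = 1.0 / denom
p_type1_first = math.exp(v_housing[1]) / denom
```

The empty profile `()` corresponds to applicants who prefer the outside option to every housing type; its simulated frequency can be compared with the closed-form logit probability as a sanity check on the shock distribution.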
We estimate our matching function U(·, ·) using the following reasoning. (1) The function values
should have the same scale for different applicants. Thus, we assign U(j, (j1, j2)) = 1 when j = j1,
i.e., when an applicant’s first choice was met; and we assign U(j, (j1, j2)) = 0 if j ≠ j1, j2, that is,
if j is not in an applicant’s choice set. (2) For each individual applicant, the degree of matching
for the 3 options, i.e., the “first choice”, “second choice”, and “not interested in public housing”
have to reflect an individual’s degree of interest among the three options. We find that on average,
the second largest value among the random utilities (ujt)j=0,1,2,3 is 0.66 of the largest value after
normalizing “not interested in public housing” to be zero. Therefore, we use 0.66 as the value for
the degree of matching if an offered housing unit is the second choice of the applicant.
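The resulting matching function can be written compactly as below. The function name `matching_utility` and the tuple encoding of a choice profile are illustrative assumptions; the values 1, 0.66, and 0 are those stated above.

```python
def matching_utility(j, profile, second_choice_value=0.66):
    """U(j, (j1, j2)): 1 for the first choice, 0.66 for the second, 0 otherwise.

    profile is a tuple of length one or two listing the applicant's choices
    in order of preference.
    """
    if len(profile) >= 1 and j == profile[0]:
        return 1.0
    if len(profile) >= 2 and j == profile[1]:
        return second_choice_value
    return 0.0


first = matching_utility(1, (1, 2))       # offered the first choice
second = matching_utility(2, (1, 2))      # offered the second choice
unlisted = matching_utility(3, (1, 2))    # offered a type outside the choice set
```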
Based on our estimation from the historical data, new applicants arrive at a constant arrival
rate of 443 every quarter of a year. The waitlist was closed in May 2015 and re-opened in July
2015. Assuming that the 1578 applicants on the older closed waitlist for non-senior housing needed to re-register on the new waitlist, and that 50% of these applicants did so in the first two quarters, yields a total arrival rate of 50% × 1578 + 443 = 1232 per quarter in the first two quarters, and a stable 443 per quarter from the third quarter onward.
Finally, applicants on a waitlist may abandon the queue before they are assigned a housing
unit for various reasons. Based on our conversations with HACP, the most typical reason is that an applicant loses eligibility because of an increase in income. The data show that the time for an eligible applicant to lose eligibility has a similar distribution across different queues, and can be approximated by an exponential distribution with a hazard rate of $d = 0.07$ per quarter. Therefore the cumulative distribution of abandonment time is given by $F_i(t) = 1 - \exp(-dt)$.
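The arrival and abandonment inputs described in this section can be encoded as follows. The sketch assumes that time is measured in quarters from the re-opening of the waitlist, that the elevated rate applies on $[0, 2)$, and that the base rate applies at all other times (including before re-opening); the function names are hypothetical. The last function combines the two inputs into the effective arrival rate $\lambda(t - w)F^C(w)$ that appears in the fluid model.

```python
import math

BASE_ARRIVALS = 443.0          # new applicants per quarter
REREGISTRATION = 0.5 * 1578    # re-registering applicants per quarter, first two quarters
HAZARD = 0.07                  # abandonment hazard rate per quarter


def arrival_rate(t):
    """Arrival rate at time t (quarters since the waitlist re-opened)."""
    if 0.0 <= t < 2.0:
        return BASE_ARRIVALS + REREGISTRATION
    return BASE_ARRIVALS


def abandonment_cdf(w):
    """F(w) = 1 - exp(-d w): probability of abandoning within w quarters."""
    return 1.0 - math.exp(-HAZARD * w)


def effective_arrival_rate(t, w):
    """lambda(t - w) * F^C(w): arrivals from time t - w still waiting at time t."""
    return arrival_rate(t - w) * (1.0 - abandonment_cdf(w))


early = arrival_rate(0.5)   # within the first two quarters
late = arrival_rate(2.0)    # from the third quarter onward
```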