
The Benefit of Capacity Pooling

for

Repairable Spare Parts

by

Pedram Sahba

A thesis submitted in conformity with the requirements

for the degree of Doctor of Philosophy

Graduate Department of Mechanical and Industrial Engineering

University of Toronto

Copyright © 2012 by Pedram Sahba

Abstract

The Benefit of Capacity Pooling

for

Repairable Spare Parts

Pedram Sahba

Doctor of Philosophy

Graduate Department of Mechanical and Industrial Engineering

University of Toronto

2012

Capacity pooling in production systems, in the form of production capacity or in-

ventory pooling, has been extensively studied in the literature. While production capac-

ity pooling has been proven to be beneficial, the impact of inventory pooling has been

less significant. These results cannot be easily extended to repairable systems due to

fundamental differences between repairable and production systems. For one thing, in

repairable systems, the demand rate is a function of the number of operational machines,

whereas it is exogenous and constant in production systems.

In this Thesis, to serve different fleets of machines possibly at different locations, we

study whether repair shop pooling is more cost effective than having dedicated on-site

repair shops for each fleet. In the first model, we consider transportation delays and

related costs, which have been traditionally ignored in the literature. We include on-site

spare-part inventories that operate according to a continuous-review base-stock policy.

Our numerical findings indicate that when transportation costs are reasonable, repair

shop pooling is a better alternative.

Next, we model a pooled repair shop that fixes failed components from different

k-out-of-n:G systems. We permit a shared spare parts inventory serving all systems

and/or reserved spare parts inventories for each system; we call this a hybrid model. The


destination for a repaired component can be chosen either on a first-come-first-served

basis or by following a static priority rule. Our findings show that both hybrid policies

are more cost effective than having separate repair shops and inventories for each system.

We propose implementing the multilevel rationing (MR) policy in systems with shared

inventory. The MR policy prioritizes classes, and stops serving a class from inventory

if the inventory level is below the inventory threshold identified for that class. When

there is no inventory, the repaired component is sent to the highest priority class among

those with down machines. To approximate the cost of the MR policy, we study an

M/G/1//N queueing system serving multiple classes of customers with an unreliable

server. Our numerical findings indicate that the MR policy performs as well as the

ε-optimal policy and outperforms the hybrid policies.


Dedication

To my grandfather,

Ebrahim Vahabzadeh Roudsari

(1926–2011),

who lived a life of honesty and morality.


Acknowledgements

I would like to thank all the people who helped me to prepare this Thesis. First,

I would like to extend my sincere gratitude to my supervisor, Dr. Barış Balcıoğlu, for

giving me the opportunity to work with him and supporting me in every possible aspect.

His knowledge, patience, endless encouragement, and his supervision made this research

possible.

I am grateful to Dr. Dragan Banjevic for his support and his accurate reviews. I

admire his passion for tackling complicated problems. Special thanks go to Dr. Andrew

K.S. Jardine for his visionary supervision. I would like to thank Dr. Elizabeth Thompson

who has patiently proofread this Thesis.

Dr. Chi-Guhn Lee has demonstrated an interest in this research from the beginning,

offering invaluable comments and recommendations. I am indebted to Dr. Lee, Dr.

Banjevic, Dr. Hans Frenk, and Dr. Daniel Frances for reading and reviewing this Thesis.

I acknowledge the experience, knowledge, and insight that I have gained academically,

personally, and culturally through my graduate studies at the University of Toronto. I

appreciate the support of all faculty members and staff at the Department of Mechanical

and Industrial Engineering. It has been a pleasure and an honour to work with my fellow

students and friends, especially Solmaz Azari-Rad, Dr. Hossein Abouee Mehrizi, Vahid

Sarhangian, and Dr. Nima Safaei.

Many thanks to Kim Hindle, Jim Gallagher, and Glen Davidge for their support and

encouragement.

Last but not least, I would like to thank my family for their unwavering support

during every stage of my life.


Contents

1 Introduction

1.1 The Impact of Transportation on Repairshop Capacity Pooling

1.2 Capacity Pooling in k-out-of-n:G systems

1.3 Queue with an Unreliable Server

1.4 Multilevel Rationing (MR) Policy

2 The Impact of Transportation Delays

2.1 The RIF System Network

2.2 The Solution Algorithm

2.3 Numerical Study

2.3.1 Examples with Identical Fleets

2.3.2 Examples with Heterogeneous Fleets

3 Spare Parts in k-out-of-n:G Systems

3.1 The Hybrid FCFS (HF) Policy

3.1.1 Obtaining $p_i(k)$ in the RIF model

3.1.2 Obtaining $p_D$ in the HF model

3.2 The Hybrid Priority (HP) Model

3.2.1 Obtaining $p_i(k)$ in the RIP model

3.3 Numerical Experiment

3.3.1 The Summary of the Numerical Results

3.3.2 The Relative Performance of the HF and HP Policies

4 Queues with an Unreliable Server

4.1 The M/G/1//N Queue

4.1.1 The Process Completion Time with Setup Times

4.1.2 Busy Period Analysis for the M/G/1//N Queue

4.1.3 System Size Distribution in the M/G/1//N Queue

4.1.4 The Conditional Residual Augmented Process Completion Time

4.1.5 The Multi-class M/G/1//N Queue

4.2 The ODD M/G/1//N Queue

4.3 The M/M/1//N Queue

4.3.1 The Multi-class M/M/1//N Queue

4.4.1 The Impact of Customer Arrival Rates

4.4.2 The Impact of Class 2 Service Time Distribution

4.4.3 The Impact of Class 1 Service Time Distribution

4.4.4 The Impact of Interruption Time Distribution

5 The Multilevel Rationing Policy

5.1 The Multilevel Rationing Policy

5.2 Obtaining the Moments of the Server Interruption Time for Class k in Sub-system k

5.2.1 The MR Policy Approximation

5.3.1 The Accuracy of the MR Policy Approximation

5.3.2 Relative Performances of the Policies

5.3.3 The Comparison of the MR and Optimal Policies

6 Conclusions

Bibliography


List of Tables

2.1 The comparison of the RIF and the BC systems when the fleets are heterogeneous.

3.1 The minimum, mean, median and maximum values of cost reduction of the hybrid policies compared to the BC model.

3.2 The minimum, mean, median and maximum values of cost reduction due to the HP policy.

4.1 The maximum $N_2$ that can be served when class 2 service time distribution changes and $\lambda_1 = 0.01$, $\lambda_2 = 0.02$.

4.2 Average number of class 2 customers out of the system ($E[N^O_2]$) when class 2 service time distribution changes and $\lambda_1 = 0.01$, $\lambda_2 = 0.02$.

changes and $\lambda_1 = 0.01$.

4.4 Average number of class 1 customers out of the system ($E[N^O_1]$) when class 1 service time distribution changes and $\lambda_1 = 0.01$.

changes and $\lambda_1 = \lambda_2 = 0.01$.

changes.

4.7 The maximum $N_2$ that can be served when interruption time distribution changes and $\lambda_1 = \lambda_2 = 0.01$.

4.8 The maximum $N_2$ that can be served when interruption time distribution changes and $P_{2,N_2} \geq 0.2$.

5.1 Parameters of the Examples-I

5.2 Parameters of the Examples-II

5.3 The Optimal Inventory Rationing Levels and $C^*_{MR}$ of the MR Policy-I

5.4 The Optimal Inventory Rationing Levels and $C^*_{MR}$ of the MR Policy-II

5.5 Comparison of $C^*_{MR}$ with $C^{sim}_{MR}$-I

5.6 Comparison of $C^*_{MR}$ with $C^{sim}_{MR}$-II

5.7 The minimum, mean, median and maximum values of cost reduction of the MR policy compared to the HF and HP policies.

5.8 The Optimal HF and HP policies-I

5.9 The Optimal HF and HP policies-II

5.10 Comparison of ε-Optimal Policy and the MR Policy-I

5.11 Comparison of ε-Optimal Policy and the MR Policy-II


List of Figures

2.1 The closed queueing network for fleet i

2.2 The closed queueing network for m fleets

2.3 The impact of transportation delays and costs on $C^*_{RIF}$ when $N_i = 10$, $b_i = 10$

2.4 The impact of transportation delays and costs on $\sum_{i=1}^{3} S^*_i$ in the RIF system when $N_i = 10$, $b_i = 10$

$b_i = 10$, and $h^w_i = 0.33$

system when $N_i = 10$, $b_i = 10$, and $h^w_i = 0.33$

$b_i = 100$

system when $N_i = 10$, $b_i = 100$

2.9 The impact of transportation delays and costs on $C^*_{RIF}$ when $N_i = 5$, $b_i = 10$

system when $N_i = 5$, $b_i = 10$

2.11 The impact of $\rho$ on $\Delta$ when $c_2 = c_3 = 2$, $N_i = 5$, $b_i = 10$, $i = 1, 2, 3$

2.12 The impact of $\rho$ on $\Delta$ when $c_2 = c_3 = 2$, $N_i = 5$, $b_i = 100$, $i = 1, 2, 3$

2.15 The impact of $\rho$ on $\Delta$ when $c_2 = c_3 = 2$, $N_i = 10$, $b_i = 100$, $i = 1, 2, 3$

2.16 The impact of $\rho$ on $\Delta$ when $c_2 = c_3 = 8$, $N_i = 10$, $b_i = 100$, $i = 1, 2, 3$

3.1 The hybrid model with both shared and reserved inventories

3.2 A sample path of the HF model

3.3 The cost reduction of HF ($\Delta^{HF}_{BC}\%$) and HP ($\Delta^{HP}_{BC}\%$) compared to the BC system when $u = 0.9$

3.4 The actual $A_{II}$ vs. target $A_{II}$ availabilities for system II when $u = 0.9$

3.5 The maximum $A_{II}$ value below which the HP policy outperforms the HF policy vs. $\mu_{II}$

policy vs. $k_{II}$

3.7 The cost of the HF policy vs. $k_{II}$ when $A_{II} = 0.95$

policy vs. $\lambda_{II}$

policy vs. $\mu_{II} (= 0.01 n_{II})$

4.1 Gamma/Gamma/Gamma case: Service level ($P_{2,N_2}$) for class 2 customers when $\lambda_1 = 0.01$ for different $\lambda_2$ values

4.2 Gamma/Gamma/Gamma case: Service level ($P_{2,N_2}$) for class 2 customers when $\lambda_1 = 0.05$ for different $\lambda_2$ values

5.1 A Sample Path of the Single-Class Sub-system 1

5.2 A Sample Path in Sub-system 2 when $L_3 = L_2$

5.3 Break Down of the Interruption Time of Class k

5.4 A Sample path of the interruption time for class k


Chapter 1

Introduction

For many companies the recent economic downturn has increased the need to extend the

lifetime of their existing production equipment. To achieve this goal, maintenance jobs

are more frequently outsourced to companies such as ABB and Advanced Technology

Services (Economist¹); this has been an increasing trend over the last decade (Hui and

Tsang, 2004; Markeset and Kumar, 2008). Among other things, the outsourcing com-

panies consider consolidating spare part suppliers to reduce asset management costs for

their clients. Consolidation can be in the form of pooling repair shop capacity and/or

spare part inventories. When equipment stoppages arise due to critical component fail-

ures, repairing the failed components can incur less cost than buying new ones. This can

also be a more environmentally friendly practice than using non-repairable components.

To increase the availability of the production system, keeping spare parts of critical

repairable components should be carefully considered. The current investment in in-

ventories is enormous. In the first quarter of 2008, the U.S. Department of Commerce

announced a total inventory investment of $1.284 trillion in the United States. These inventories

include raw materials, intermediate and final products, etc., but a significant portion is

related to spare parts. Unlike work-in-process (WIP) and finished goods, which are directly

¹ “Outsourcing: A quick fix”, Vol. 390, No. 8617, February 7-13, 2009, p. 58



demanded by the market, spare parts are used to replace failed components; spare parts

are also needed for preventive replacement if preventive maintenance is being conducted.

Therefore, the demand pattern for spare parts differs from that of final products. This calls

for a specific type of management for spare parts inventories. Spare parts inventory levels

are determined based on the state of equipment and maintenance policies, whereas other

inventories depend on market demand, production plans, quality, etc. An important

decision in spare parts management is determining an appropriate level of spare parts so

that failed components can be replaced without production stoppages. An insufficient

level causes unacceptably long downtime; an unreasonably high level causes the company

to incur extra holding costs. In order to optimize the spare parts inventory level, the

cost of acquiring and holding spare parts must be compared to the cost and risks of not

having them at the time of a failure.

An example is a Toronto-based mining company, a leading producer of nickel, copper,

cobalt and precious metals. In one of its mines, spanning an area of 60 km by 20 km,

a central repair shop is in charge of repairing failed pumps. During the rainy season,

underground water must be pumped to the surface; therefore, it is vital to maintain the

pumps and keep the system running. Some pumps are on stand-by and can substitute

for a failed pump, and a broken pump can be quickly replaced by an available spare

pump. All failed pumps are sent to the central repair shop for repair; thereafter, they

are stocked and distributed on first demand or returned to the original station.

In this Thesis, we analyze a problem where machines in different fleets are subject to

failure due to a single critical component. We consider different inventory topologies and

various dispatching rules from a single repair shop where the failed components from all

fleets are fixed. After describing each policy, we compare them to address three questions:

(i) Should there be a separate repair resource (e.g., shop, crew) plus a separate inventory

for each system, or should a single repair resource with a higher capacity serve all fleets?

(ii) Given a shared high capacity repair shop (to represent repair resources), should we


reserve a separate spare parts inventory for each system, or should a single spare parts

inventory be shared by all systems, or would a mixture of the two be more cost effective?

(iii) And finally, when a repaired component from the repair shop is dispatched, should

we choose the destination fleet according to a first-come-first-served (FCFS) rule, or could

static or dynamic rules of prioritization of fleets make this decision more cost effective?

The answers to these questions are important for the outsourcing companies and the

maintenance department serving the branches of its parent company.

1.1 The Impact of Transportation on Repairshop Capacity Pooling

Capacity pooling is an important theme in the queueing and operations management

literature. To determine whether capacity pooling is beneficial, a queueing system with

pooled service capacity is usually contrasted with a system of m resources serving different

streams of independent arrivals. Pooling is conceived in two ways. First, in the pooling

of service rates, independent resources are consolidated into a single server providing a

faster service rate (Yu, Benjaafar, and Gerchak, 2009). In this Thesis, the service rate of

the server with pooled capacity is the sum of the service rates of independent resources.

Second, in the consolidation of servers, multiple servers are placed at a single location

(Smith and Whitt, 1981; Benjaafar, 1995). Either way, capacity pooling usually decreases

the total system costs (for design problems on choosing the number of servers and service

rates, see Stidham, 1970). However, when queueing systems are used to analyze supply

chains, production capacity pooling cannot be solely modeled by a faster single server or a

larger group of servers at one location. In such problems, capacity pooling implies that the

products cannot be delivered to different markets instantaneously and that their delivery

imposes a cost on the decision maker. The same issue arises in inventory pooling (Eppen, 1979;

Gerchak and He, 2003; Benjaafar, Cooper, and Kim, 2005). Thus, transportation delays


and costs have to be incorporated into models of the pooled system, something that has

been ignored in earlier research.
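
As a rough illustration of pooling service rates, the sketch below compares m dedicated M/M/1 queues with a single pooled M/M/1 queue whose service rate is the sum of the dedicated rates. The exponential assumptions, the function names, and the parameter values are illustrative assumptions, not part of the models studied in this Thesis, and transportation delays are deliberately left out.

```python
def mm1_mean_sojourn(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system for a stable M/M/1 queue: 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


def compare_pooling(m: int, arrival_rate: float, service_rate: float):
    """Return (dedicated, pooled) mean sojourn times for m identical streams."""
    # Dedicated: each stream has its own server with rate mu.
    dedicated = mm1_mean_sojourn(arrival_rate, service_rate)
    # Pooled: one server with rate m * mu serves the merged stream m * lambda.
    pooled = mm1_mean_sojourn(m * arrival_rate, m * service_rate)
    return dedicated, pooled


# Hypothetical example: three streams, each at 80% utilization.
dedicated, pooled = compare_pooling(m=3, arrival_rate=0.8, service_rate=1.0)
print(f"dedicated: {dedicated:.3f}, pooled: {pooled:.3f}")
```

Under these assumptions the pooled sojourn time is the dedicated one divided by m, which is the benefit that transportation delays and costs may erode.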

In Chapter 2, we explore the effect of transportation delays and costs on the ben-

efits of capacity pooling in a repair/maintenance shop. The decision maker can be an

outsourcing company serving a number of clients or the maintenance department of a

company that serves various branches of the parent company. Accordingly, each client

or branch at a different location is a fleet of identical machines, and each machine is

subject to failure due to a single critical component, e.g., an engine. When a component

fails, it is sent to a repair shop to be fixed. To reduce down time, a stock of criti-

cal components reserved for each fleet is kept as spare parts on-site. If there is stock,

a spare component is instantaneously installed on the failed machine. Otherwise, the

failed machine is down until a repaired component can be dispatched from the repair

shop. If all machines are functional, the repaired component is placed in the spare part

inventory. In production/inventory systems, production and deliveries are usually done

in batches. In contrast, when components are expensive and failures are rare (equivalently,

times to failure are much longer than repair times), it is assumed that broken components

are sent to the repair shop, and fixed components are received from it, one by one (Graves,

1985; Caggiano et al., 2009). In this problem, we consider continuous-review base-stock

policies for controlling spare-part inventories.

In this setting, the decision maker has two alternatives. In the first, a separate on-site

repair shop can be dedicated to the fleet at that location. The advantage is that fleets do

not suffer from transportation delays and the decision maker (alternatively, the system)

does not incur transportation costs. In the second, a centralized repair shop with a higher

capacity serves all fleets. Thus, some locations experience transportation delays and the

system incurs transportation costs. However, a higher capacity drastically reduces repair

times and can prevent lengthy down times. By comparing these two systems, we address

the important question of when repair shop pooling is beneficial if transportation times


and costs are not negligible.

The results of this research are important to maintenance outsourcing companies and

large companies that operate and maintain manufacturing plants at different locations.

In both cases, the goal is decreasing maintenance costs while not increasing production

stoppages and losses. For a company, the down time cost includes the cost of production

losses and the cost of repair. The holding cost includes capital costs of the investment

tied up in stock as well as operational costs of warehousing (Silver, Pyke, and Peterson,

1998, p. 45). Both holding and down time costs are regularly expressed per item per unit

time (Louit et al., 2010). Transportation costs refer to costs incurred for handling and

delivering broken and fixed components between the repair shop and the fleet locations.

In this context, whether to pool repair shop capacity is an important consideration. From

the outsourcing company’s point of view, the down time cost of a fleet may be the penalty

cost it will pay the client if machines go down. It is reasonable to assume that the outsourcing

company is responsible for deliveries without charging transportation costs to the client.

It can also be agreed that inventory or repair facility space should be allocated on-site

by the client, as the outsourcing company will be tying up its capital in spare parts and

will be operating the inventories and the repair shop(s). Given this, cost minimization

is an objective for both an outsourcing company and a maintenance department of the

parent company. In both cases, the decision maker needs to compare the pros and cons

of repair shop pooling at a single location.

In this context, the nature and the formulation of the problem can make analysis

quite difficult. A simple system of one repair shop and one spare parts inventory at each

location can be analyzed by a birth-and-death process, e.g., Taylor and Jackson (1954).

But if failed components from each fleet are treated as a separate class of customers,

and all customer classes are served by a centralized resource (e.g., the centralized repair

shop), the problem turns into a multi-class queueing system. If the repair shop is modeled

by an infinite server group, and the failure rate at each location is considered constant,


assuming deterministic transportation times, the approximation due to Graves (1985)

can be used to determine base-stock levels at each location. Similarly, assuming constant

failure rates from each fleet and considering an infinite centralized inventory (instead

of a centralized repair shop) from which new non-repairable service parts are sent to

local warehouses when needed, Kutanoglu and Mahajan (2009) include transportation

delays in their model. Our problem, on the other hand, is a queueing system with

finite calling populations. In our problem, multiple fleets are served by a single repair

shop, making it a machine interference problem (MIP) (see Haque and Armstrong,

2007 for a recent literature survey). Observe that we consider state-dependent customer

arrival rates and a single server whereas Graves considers constant arrival rates and

an infinite server group. The typical solution for the MIP is to model the underlying

queueing system with finite calling populations (Haque and Armstrong, 2007) to

obtain the steady-state performance measures. However, multi-class systems such as

our problem with state-dependent failure rates (failure rates depend on the number of

functional machines in the fleet) served by a centralized repair shop with local spare

parts inventories at each location are difficult to analyze, even with a first-come-first-served

(FCFS) dispatching policy. Incorporating transportation delays in this model is

even more challenging. In fact, even in a production/inventory setting where demand

rates are assumed to be constant, incorporating transportation delays in the underlying

queueing model is difficult. This may explain why the impact of transportation delays

and costs has not been addressed in the literature on resource pooling. Simulation is a

viable yet costly approach to assess the impact of transportation delays on certain spare

part provisioning problems with a centralized repair shop.
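
For intuition, the simple case of one fleet with its own repair shop and spare parts inventory can be sketched as a birth-and-death process. The code below assumes exponential times to failure and repair times, a single repair server, and a base-stock level of spares; the function names and parameter values are hypothetical and serve only as an illustration, not as the model analyzed in later chapters.

```python
def steady_state(n_machines, base_stock, failure_rate, repair_rate):
    """Stationary distribution of the number of components at the repair shop."""
    total = n_machines + base_stock  # all components in circulation
    weights = [1.0]
    for n in range(total):
        # With n components in repair, at most n_machines can be operating,
        # so the failure (birth) rate depends on the state.
        operating = min(n_machines, total - n)
        weights.append(weights[-1] * failure_rate * operating / repair_rate)
    norm = sum(weights)
    return [w / norm for w in weights]


def expected_down_machines(probs, base_stock):
    """Mean number of machines waiting for a repaired component."""
    return sum(p * max(0, n - base_stock) for n, p in enumerate(probs))


def expected_on_hand(probs, base_stock):
    """Mean number of spare components in the on-site inventory."""
    return sum(p * max(0, base_stock - n) for n, p in enumerate(probs))


# Hypothetical fleet of 5 machines, failure rate 0.1 per machine, repair rate 1.
probs_no_spares = steady_state(5, 0, 0.1, 1.0)
probs_two_spares = steady_state(5, 2, 0.1, 1.0)
down_no_spares = expected_down_machines(probs_no_spares, 0)
down_two_spares = expected_down_machines(probs_two_spares, 2)
print(down_no_spares, down_two_spares, expected_on_hand(probs_two_spares, 2))
```

The state-dependent birth rate is what separates this finite-population setting from models with constant arrival rates.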

To this end, we model this system as a closed queueing network; instead of balance

equations, we exploit the Mean-Value Analysis (MVA) developed by Reiser and Laven-

berg (1980) to obtain the stationary system size distribution. MVA, like the convolution

algorithm (Buzen, 1973), is a numerical algorithm that takes advantage of the product


form property of queueing networks with certain conditions (see Gordon and Newell,

1967). Since closed form cost functions are not available, we perform an extensive nu-

merical study, the results of which, we believe, are important. In Chapters 3 and 5, we

show that repair shop pooling is beneficial when transportation delays and costs are neg-

ligible, but it is not always the case when transportation delays and costs are considered.

However, when transportation costs are not unreasonably high, as one would expect in

land transportation, even fleets at long distances can be served from a centralized repair

shop; the system will incur less cost than would dedicated on-site repair shops. Moreover,

repair shop pooling is more attractive if fleet sizes increase or machines become more un-

reliable. Chapter 2 demonstrates the benefits of capacity pooling in realistic settings.

Since, to the best of our knowledge, this is the first research to include transportation

delays and costs in pooling problems with finite repair capacity, we expect it to trigger interest in

production/inventory systems for which land and sea transportation are widely used. An

additional consideration in repair/spare part inventory and production/inventory systems

would be adding a shared inventory, which is discussed next.
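
To give a flavour of the MVA recursion, the following minimal sketch treats a single-class closed network; the multi-class network with inventories and transportation stages used in this Thesis is more involved. The station layout (one infinite-server "operating" station and one single-server repair station), the function name, and the numbers are assumptions made for illustration, and the sketch returns mean values only.

```python
def mean_value_analysis(service_times, visit_ratios, is_delay, population):
    """Exact single-class MVA: returns throughput and mean queue lengths."""
    queue_lengths = [0.0] * len(service_times)
    throughput = 0.0
    for n in range(1, population + 1):
        # Arrival theorem: an arriving customer sees the network with n - 1
        # customers; delay (infinite-server) stations have no queueing.
        response_times = [
            s if delay else s * (1.0 + q)
            for s, delay, q in zip(service_times, is_delay, queue_lengths)
        ]
        throughput = n / sum(v * r for v, r in zip(visit_ratios, response_times))
        # Little's law at each station.
        queue_lengths = [
            throughput * v * r for v, r in zip(visit_ratios, response_times)
        ]
    return throughput, queue_lengths


# Hypothetical fleet: 5 machines, mean time to failure 10, mean repair time 1.
throughput, queue_lengths = mean_value_analysis(
    service_times=[10.0, 1.0],
    visit_ratios=[1.0, 1.0],
    is_delay=[True, False],
    population=5,
)
print(throughput, queue_lengths)
```

Here `queue_lengths[1]` is the mean number of components at the repair shop, and the recursion avoids solving the balance equations directly.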

1.2 Capacity Pooling in k-out-of-n:G systems

Many complex and technologically advanced systems such as those in the electrical power

industry (Levitin and Amari, 2010) and equipment such as radar or sonar systems used

in mining (de Smidt-Destombes, van der Heijden, and van Harten, 2004) are k-out-of-n:G

systems comprised of identical components. A k-out-of-n:G system consists of n compo-

nents each of which can fail from time to time. The system is deemed functional/available

as long as a minimum of k components are functional. In Chapter 3, we model a repair

shop that fixes failed components from several such systems with spares kept to increase

the availability of these systems. Investigating k-out-of-n:G systems in this chapter was

motivated by a British Columbia-based mining company that uses thickeners in its

processes. A thickener is a large tank with a slow-turning rake used to settle and remove

precipitated solids. In the acidic, neutral, and clarifying parts of the process, different

k-out-of-n:G systems are utilized, but each uses the same type of rake drive. In Chapter

3, we develop two alternatives involving a mixed inventory topology (a shared inventory

together with reserved emergency inventories) and a shared repair shop under the FCFS

and priority-based dispatching policies. We obtain the steady-state system size distribu-

tions at the repair shop; this allows us to compare performance and address the questions

raised above.
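
The definition of a k-out-of-n:G system can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that the n components are up independently with the same probability; in the models of this Thesis, component states are coupled through the shared repair shop, so this is only a simplified reference point. The function name and the numbers are hypothetical.

```python
from math import comb


def k_out_of_n_availability(k: int, n: int, p_up: float) -> float:
    """Probability that at least k of n independent components are up."""
    return sum(
        comb(n, j) * p_up**j * (1.0 - p_up) ** (n - j)
        for j in range(k, n + 1)
    )


# Hypothetical 2-out-of-3:G system with component availability 0.9.
availability = k_out_of_n_availability(k=2, n=3, p_up=0.9)
print(f"{availability:.4f}")
```

Lowering k for a fixed n raises availability under this assumption, which reflects the redundancy built into such systems.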

Considering repairable components in a k-out-of-n:G system with spares is not new

in the literature. Generally, however, the focus is on a single system; therefore, dis-

patching policies (except for 1-out-of-n:G systems) or different inventory structures do

not arise from the problem context. Gupta and Sharma (1981) model such a system

as a Markov chain by introducing operational, repair and installation as possible states

for a component that is not stored in inventory. Fawzi and Hawkes (1991) revisit the

same problem and assume that the single repair server gives installation of a spare part

preemptive repeat priority over repair. In our study, we consider instantaneous instal-

lations; instead, a component is either operational, in inventory, or in the repair shop

(either waiting to be fixed or being repaired). Assuming also a single server to model the

repair shop, Frostig and Levikson (2002) allow the repair times to follow non-Exponential

distributions. In all these studies including ours, each broken component is sent to the

repair shop as soon as it fails. de Smidt-Destombes, van der Heijden, and van Harten

(2004), on the other hand, assume that the repair process starts only after a given number

of failed components accumulate.

The case of a 1-out-of-n:G system, as in Chapters 2 and 5, can represent a fleet of

machines and has been extensively studied in the literature in machine interference or

machine repairperson problems. However, including a joint repair shop is difficult even

when an FCFS dispatching rule is followed for multiple 1-out-of-n:G systems. Earlier work


addresses this without considering inventories. Chandra (1986) employs MVA as the

only suitable analytical/numerical technique for the FCFS repair policy for m fleets of

machines sharing a single repair shop. He models the non-preemptive priority policy in

the same study. When priorities are also introduced in finite population systems, the

analysis becomes more challenging. Miller (1981) presents recursive computational for-

mulae to obtain the steady-state distributions of customers in a two-priority (preemptive

and non-preemptive) class Markovian single server queue. Veran (1984) analyzes the

same system assuming a preemptive-resume policy and avoids the computational complexity

of the method in Jaiswal (1968). Bitran and Caldentey (2002) investigate a

two-priority class queueing system with state-dependent arrival rates operating under

the preemptive-resume priority policy. They present a general approach for computing

the steady-state distribution of the number of customers in the system for each class.

Iravani, Krishnamurthy, and Chao (2007) consider a Markovian finite-population queue-

ing system with heterogeneous fleets of machines repaired by a single server. They prove

that when preemption is not allowed, a simple static non-preemptive priority policy is

optimal, and they present sufficient conditions to prioritize the classes correctly. Iravani

and Kolfal (2005) study the same problem when preemption is permitted and show under

which conditions a static preemptive-resume priority policy is optimal.

In Chapter 3, we analyze a Markovian single server queueing system with multiple

classes of customers whose arrival rates are state-dependent. Under the FCFS and the

priority policy, when the exponential service/repair rate is the same for all classes, we

obtain the exact steady-state distribution of the number of customers in the system for

each class. Then, we extend the model in Bitran and Caldentey (2002) to more than

2 priority classes of customers, in which different exponential repair rates can also be

assumed for each class. An immediate benefit of this extension is that when the optimal

repair policy in a system without inventories is the static preemptive-resume policy, the

cost of the system studied by Iravani and Kolfal with m fleets of machines can now be
computed.

More importantly, these models allow a flexible use of spare part inventory structures.

This is important because earlier studies (Graves and Keilson, 1983, Dshalalow, 1991,

Abboud and Daigle, 1997) with spare part inventories usually assume a single finite

population. Therefore, reserved inventories for each population have not been compared

to a shared inventory for all. Benjaafar, Cooper, and Kim (2005) make this type of

comparison in systems operating under the FCFS policy with constant customer arrival

rates; they prove that a shared inventory results in a cost less than or equal to that of the

alternative with reserved inventories when the holding cost is the same in both models

and the backordering cost rate is the same for each class. In our case, different inventory

levels change the state-dependent arrival rates, rendering comparison more difficult.

We make this comparison by developing two models serving multiple k-out-of-n:G

systems. First, the HP (hybrid priority) model has a shared inventory for all systems

and may have reserved inventories for some systems. The shared inventory is depleted

on an FCFS basis, and when it is empty, the repair shop dispatches repaired components

to systems/reserved inventories according to their priorities. Second, the HF (hybrid

FCFS) model is similar except that when its shared inventory is depleted, the repair

shop dispatches repaired components to the system (or the inventory reserved for that

system) with the longest waiting repair order. Incorporating dynamic priority rules is

overly complex, and it is not studied in this Thesis.

Since it is not possible to show theoretically when one policy is better than the other

one, we provide the results of our extensive numerical study in Section 3.3. The examples

minimize the holding cost rates subject to k-out-of-n:G systems meeting minimum avail-

ability levels set for each system. We can summarize our conclusions in the following way.

First, we allocate a separate repair shop and inventory to each system. We refer to this

as the base case (BC) model. We add the repair rates of the separate repair shops and

use the total as the repair rate of the single repair shop of the HF and HP policies. This


is a common technique for modeling resource capacity pooling (see Yu, Benjaafar, and

Gerchak, 2009, and the references therein for capacity pooling in production/inventory

systems). Our numerical results show that both policies with a pooled repair shop capac-

ity are superior to the BC model in terms of reducing system costs. Next, we compare

the relative performances of the two policies. Our results indicate that the HP policy is

better in most of the examples; however, lower repair capacity degrades its performance,
and if the minimum availabilities set for the systems are close, then the HF policy is better.

The models summarized so far show the benefit of repair shop capacity pooling –

alongside inventory pooling in Chapter 3 – by assuming exponential repair times. If this
assumption is relaxed, for instance by permitting repair shop unavailability and assuming
class-specific non-exponential repair times, the analysis becomes more complex. In
Chapter 4, we develop such a formulation without considering inventories; it is discussed
next.

1.3 Queue with an Unreliable Server

In Chapter 4, we analyze an M/G/1//N queueing system with an unreliable server serving

m finite-source populations/customer classes indexed by k = 1, ..., m. Each population k

consists of Nk customers (type k customer). Such queueing models traditionally consider

only a single finite-source population and a reliable server and, as such, are extensively

studied in the literature. For instance, in the MIP, N can be the number of machines

in a fleet, each subject to failure; upon failure, they are repaired by the repair facility,

modeled as a single server. The repair facility may be unavailable from time to time

(see, e.g., Wang, 1990), thus increasing the wait times of failed machines in the repair

shop. In modeling telecommunication or computer networks (e.g., Sztrik and Gal, 1990,

Almasi and Sztrik, 2004), the finite number (N) of potential customers might correspond

to active terminals generating jobs for the central processor unit (CPU), which can be


modeled as a single server. The CPU might be interrupted and become unavailable from

time to time; jobs generated by the terminals cannot be processed until the CPU is

recovered.

We assume that customers from different classes are served according to the preemptive-

resume priority discipline. We consider setup times prior to picking up the next customer

or resuming the service of an interrupted customer. The server can be disrupted whether

it is idle, under setup or serving a customer. We define the times between interruptions,

or the ON periods, as the times between the end of one interruption and the start of

the next. We assume that ON periods and times between customer arrivals are expo-

nentially distributed (possibly with different arrival rates for different customer types).

A distinctive feature of our model is its capacity to include multiple classes of customers

and setup times. We also are unique in assuming that service, setup and down times (also

called OFF periods) are random variables (r.v.s) with general distributions. Using our

model, via numerical examples in Section 4.4, we demonstrate that the distributions of

service and down times have a significant impact on service levels, which can sometimes

be counterintuitive. For instance, unlike in the classical M/G/1 queue with constant ar-

rival rates, we find that higher service time variability can sometimes reduce the expected

queue length in the M/G/1//N queue. This underlines the importance of incorporating

general distributions for these r.v.s in the analysis to correctly compute the performance

measures for each class.

In Chapter 4, we focus on “operation-independent disruptions (OID)” indicating that

the server can be disrupted at any time – even when it is idle or being set up – except

during its own OFF periods. If we assume that the characteristics of times between

interruptions and down times experienced by an idle server differ from when it is being

set up or serving customers, we arrive at the ODD M/G/1//N queue where ODD stands

for “operation-dependent disruptions”. We present the analysis of the ODD M/G/1//N

queue in Section 4.2. Note that we adopt the definitions of OID and ODD from Altıok


(1997, p. 85). Since this Thesis is primarily on the M/G/1//N queue with OID, for

the sake of simplicity, we simply refer to it as the M/G/1//N queue. The system with

operation-dependent disruptions will always be called the ODD M/G/1//N queue.

Queueing models with unreliable servers have been widely studied since the semi-

nal paper by White and Christie (1957). Although the nature and the context of the

problems analyzed vary considerably, the early body of work loosely revolves around two

considerations: 1) whether the customer population is infinite or finite, and 2) whether

the ON periods of the server(s) are operation-independent or operation-dependent.

We first summarize the papers that consider infinite populations. White and Christie

assume operation-independent exponential ON periods in the M/M/1 queue. Assuming

that OFF periods are also exponential r.v.s, they obtain the steady-state probability

distribution of the time a customer spends in the system. Gaver (1962), Avi-Itzhak

and Naor (1963), and Thiruvengadam (1963) extend this model assuming that service

times and OFF periods have general distributions. In his analysis, Gaver (1962) con-

siders operation-dependent ON periods and assumes that the customer whose service is

interrupted resumes its service from the moment of interruption once the OFF period

is over. He introduces the process completion time, the total time a customer spends

on the server including its actual service time plus possible OFF periods. Avi-Itzhak

and Naor (1963) and Thiruvengadam (1963) consider both operation-dependent and

operation-independent ON periods. Mitrany and Avi-Itzhak (1968) and Neuts and Lu-

cantoni (1979) study the multi-server M/M/c queues with random breakdowns. For

M/G/1 queues with operation-independent ON times, Federgruen and Green (1986) de-

rive bounds and approximations for the mean waiting time, probability of delay and

steady-state system size distribution when ON and OFF periods are general i.i.d. r.v.s.

Federgruen and Green (1988) revisit the problem, this time assuming that ON periods

are phase-type r.v.s. They provide an exact algorithm to obtain the steady-state system

performance measures. For the M/G/1 queue with interruptions, we also refer the reader


to Wang, Cao, and Li (2001), Atencia, Bouza, and Moreno (2008), and Fiems, Maertens,

and Bruneel (2008). Balcıoglu, Jagerman, and Altıok (2007) design an accurate approxi-

mation to obtain the mean waiting time in the GI/D/1 queue with operation-dependent

phase-type ON and general OFF periods.

Next we note the papers that consider finite-calling populations, which are part

of the MIP (see Stecke and Aronson, 1985, and Haque and Armstrong, 2007, for an

extensive bibliography on the MIP) with unreliable servers. Wang (1990) analyzes

the M/M/1//N queue with an unreliable server. For both operation-dependent and

operation-independent interruptions, Wang assumes exponential ON and OFF periods.

Wang and Kuo (1997) extend this model assuming exponential operation-independent

ON periods, Erlangian service times and Erlangian OFF periods. Chakravarthy and

Agrawal (2003) generalize the results of Wang and Kuo by considering phase-type dis-

tributions for service times and OFF periods.

As the literature review suggests, using non-exponential distributions for underlying

r.v.s in these queueing systems is challenging. Neither incorporating non-exponential

times between customer arrivals nor assuming non-exponential ON period distributions

is analytically tractable in systems with a finite-calling population, whether these systems

experience operation-dependent or operation-independent server disruptions (except in

M/G/1 systems with phase-type ON periods as in Federgruen and Green, 1988, and

Balcıoglu, Jagerman, and Altıok, 2007). Similar difficulties arise for general service time

and OFF period distributions. Of the three papers closest to our problem (Wang, 1990,
Wang and Kuo, 1997, and Chakravarthy and Agrawal, 2003), two have successfully
incorporated either Erlang distributions (Wang and Kuo, 1997) or phase-type
distributions (Chakravarthy and Agrawal, 2003) for both r.v.s, considering only a
single finite population of customers to be served by the unreliable server. These studies

employ the matrix-analytic method to find the steady-state system size distribution;

this can be computationally intensive if the structure of the phase-type distribution is


complex.

In Chapter 4, we first study a finite-source single-class queue with an unreliable server.

We then analyze the busy period in this queueing system, and derive the steady-state sys-

tem size distributions at departure/arrival, and arbitrary time epochs. We introduce the

residual augmented process completion times conditioned on the number of customers

in the system to obtain the system time distribution. Using busy period analysis, we

develop a recursive method to extend our result to multi-class queues. In our numerical

experiment, we consider an unreliable server that is attending to two classes of customers.

We explore how the variability in service times and OFF periods changes the maximum

number of customers that can be served from each class while meeting certain service

levels. In these examples, lower variability in service times does not increase the service

level; in some cases, it even decreases the maximum number that can be served. More in-

terestingly, we also observe instances in which higher service time variability shortens the

expected queue lengths more than smaller service time variability. While less variability

in the OFF times of the server and high-priority class service times usually increases the

number of low-priority customers that can be served, there are counterexamples where

higher variance in these r.v.s leads to better performance. These examples reveal the

intricate dynamics among the underlying r.v.s, and indicate the importance of incorpo-

rating the general distributions into our model. The model studied in Chapter 4 is also

useful in studying the multilevel rationing policy in a repair shop, as summarized in the

next section.

1.4 Multilevel Rationing (MR) Policy

In Chapter 5, we develop a model for a system comprised of a repair shop and a centralized

spare part inventory serving several fleets of finitely many machines. All machines fail

due to a single type of repairable component. When a component fails, if a spare cannot


be installed, the machine housing that component also fails and stays down until a

component can be installed. We propose implementing the multilevel rationing (MR)

policy from the literature in this setting. Briefly stated, the MR policy prioritizes classes

and stops serving a class if the inventory level is below the inventory threshold identified

for that class. When there is no inventory, the repaired component is sent to the highest

priority class among those with down machines.

The benefits of holding a single inventory for all classes instead of keeping separate

inventories were first shown by Eppen (1979). Since then, the literature has modeled

production/inventory systems where customers are assumed to place orders according to

homogeneous Poisson processes. Ha (1997a) models a Markovian multi-class single server

system with a centralized inventory in which unsatisfied demands are lost. Ha (1997b)

studies the same problem with two classes of customers when backordering is allowed.

In both studies, he proves that in systems with centralized inventories, the MR policy

is the optimal production control policy. de Vericourt, Karaesmen, and Dallery (2001)

prove that the MR policy is optimal when serving m classes of customers from a pooled

inventory when backordering is allowed. de Vericourt, Karaesmen, and Dallery (2002)

provide an algorithm to compute the optimal rationing levels and the cost of the MR

policy in M/M/1 systems serving m classes of customers when backordering is allowed.

Finally, Abouee-Mehrizi, Balcıoglu and Baron (2011) obtain the optimal cost and the

rationing levels of the MR policy in M/G/1 systems.

The primary difference between our problem and earlier work is the state-dependent

arrival/failure rates of broken components (customers) at the repair shop. One of the

complications arising from state-dependent arrival rates of customers is the difficulty of

identifying the optimal policy. There are few models one can employ as an alternative

to the MR policy developed here; the HF and HP policies proposed in Chapter 3 are

suitable alternatives.

For a system with two classes, the MR policy is a special case of the HP policy.


However, this does not hold for three classes or more. In Chapter 5, to analyze a system

operating under the multi-class MR policy, we break this system down into a number of

sub-systems. A part of these sub-systems is, in fact, a finite-source priority queue with

a server subject to operation-independent failures, studied in Chapter 4. Our numerical

study in Chapter 5 shows that the MR policy is the best known policy that can perform

as well as the ε-optimal policy. Using four models, this Thesis points to the advantages

of having a high capacity repair shop and a spare part inventory controlled by the MR

policy to reduce system costs and/or increase availabilities of multiple fleets of machines.

Chapter 2

The Impact of Transportation

Delays

In this chapter, a system of m fleets at different locations (e.g., manufacturing plants or

mines) is considered. Each fleet i has Ni machines (interchangeably referred to as type i

machine), i = 1, . . . , m, and aims to have all machines functional at all times to continue

production at targeted levels. Each machine is subject to failure due to a single repairable

component, which need not be the same across all locations but is allowed to be

location-specific. We assume that times to failure for each machine/component follow an

independent exponential distribution with possibly different rates, λi. When a machine

fails, the broken component is repaired at a designated repair shop. In this case, the

repair shop is modeled as a single server queueing system with exponential service/repair

times that do not depend on the “origin” of the component. To reduce the unavailability

of a failed machine, Si units of the critical component are kept in a spare parts inventory

at location i (namely, inventory i) for fleet i. Thus, the failed component can be replaced

immediately if a spare part is available, thereby avoiding any production loss. Otherwise,

the machine is down until a component is repaired and re-installed on that machine.

During down times, the system incurs a down time cost of bi per type i machine down


per unit time. Similar to Louit et al. (2010), we assume that the total inventory holding

costs are paid for the entire stock as hi × Si as the capital cost tied up for keeping Si

units of the critical component for fleet i, where hi is incurred per unit per unit time.

One can consider the warehousing cost for the items kept in inventories as well using the

warehousing cost hwi per unit of stock in inventory i per unit time. We let ci denote the unit
time cost incurred for transporting a component from the repair shop to fleet i or in the
opposite direction.

We consider two alternative repair systems. In the first, each location can have its own

repair shop. We refer to this system as the base case (BC) system. In the BC system,

the repair shop at location i has a repair rate of µi. In the second, there is a single

centralized repair shop that serves all fleets with a repair rate of µ. In this case, as soon

as a machine fails at location/in fleet i, the broken component is sent to the centralized

repair shop as a type i order. In dispatching repaired components to locations, the FCFS

policy is the only policy that we can analyze using the MVA as discussed in Section

2.2. Considering priority policies appears to be a more difficult problem which we are

planning to work on in the future. Under the FCFS policy, components sent from fleet i

are returned to the same fleet once their repairs are completed. We refer to this system as

the reserved inventory-FCFS (RIF) system. In both systems, we exclude the possibility

of transhipment of a spare part from the positive inventory of another location.

We assume that transportation times and costs are significant in the RIF system;

these obviously do not apply to the BC system. In the RIF system, when a component

fails, it must be transported to the repair shop; after its repair, it is returned to its

fleet. We assume that the mean transportation times from (to) the repair shop to (from)

location i are equal to 1/γi. In addition to the time a broken component spends in the

repair shop, the time spent in these two stages of transportation can lengthen the down

times of machines and shorten the periods when the inventory level is high. In other

words, transportation times have an indirect impact on the overall system holding and


down time costs. Our objective is to determine, for both the BC and RIF systems, the
optimal base-stock levels $S_i^*$ to reserve for each fleet i so as to minimize the long-run
average system cost.

Let pi(n) be the steady-state probability of having n functional components at location

i, that is, the number of components in use and spare parts in inventory i. The optimal

objective values $C^*_{BC}$ and $C^*_{RIF}$ of the BC and RIF systems, respectively, can be expressed
as follows:

$$C^*_{BC} = \min \sum_{i=1}^{m} C_i(K_i), \tag{2.1}$$

$$C^*_{RIF} = \min \left( \sum_{i=1}^{m} C_i(K_i) + 2c_i \sum_{i=1}^{m} \theta_i(\mathbf{K}) \right), \tag{2.2}$$

where K = (K1, K2, . . . , Km) with Ki = Ni +Si, i = 1, . . . , m and Ci(Ki) is the cost due

to fleet i given by

$$C_i(K_i) = h_i(K_i - N_i) + h_{wi} \sum_{n=N_i}^{K_i} (n - N_i)\, p_i(n) + b_i \sum_{n=0}^{N_i} (N_i - n)\, p_i(n), \tag{2.3}$$

and θi(K) is the expected number of components from fleet i transported per unit time
in either direction, to or from the repair shop. The mean numbers of components from fleet i
transported per unit time to and from the repair shop are equal (shown in Section 2.1).
Thus, we have $2c_i \sum_{i=1}^{m} \theta_i(\mathbf{K})$ in Eq. (2.2). In Eq. (2.3), the term hi(Ki − Ni) captures

the capital cost tied up because of investing in an additional Ki − Ni = Si units of the

critical component. This is the cost incurred irrespective of whether these Si units are in

the warehouse, in transit or the repair shop. The second and third terms on the
right-hand side of the same equation represent the average warehousing and down time

costs, respectively.

Note that Ci(Ki) in Eq. (2.3) differs for the BC and RIF systems because of different

pi(n) values. In the BC system, these probabilities can be derived by a simple birth-

and-death process (e.g., Gross and Harris, 1998, p. 82-83); using Eqs. (2.1) and (2.3),

the optimal inventory levels can then be found by searching on K. For the RIF system,


to find pi(n) and θi(K), we use the queueing networks approach (presented in the next

section). Then, Eqs. (2.2) and (2.3) are used to determine the optimal inventory levels

by searching on K.
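As an illustration of the BC computation, the following Python sketch evaluates Eq. (2.3) for a single location and searches on the base-stock level. It assumes that the birth-and-death process has states n = 0, ..., Ki (functional components at location i), with failure rate min(n, Ni)λi and repair rate µi whenever n < Ki. The function names and the search bound `s_max` are hypothetical and are not part of the original analysis.

```python
def bc_distribution(K, N, lam, mu):
    """Steady-state p(n), n = 0..K, of functional components at one BC location.

    Assumed balance equations: p(n-1) * mu = p(n) * min(n, N) * lam.
    """
    weights = [1.0]
    for n in range(1, K + 1):
        weights.append(weights[-1] * mu / (min(n, N) * lam))
    total = sum(weights)
    return [w / total for w in weights]


def bc_cost(S, N, lam, mu, h, hw, b):
    """Cost of one fleet as in Eq. (2.3), with K = N + S."""
    K = N + S
    p = bc_distribution(K, N, lam, mu)
    capital = h * (K - N)
    warehousing = hw * sum((n - N) * p[n] for n in range(N, K + 1))
    downtime = b * sum((N - n) * p[n] for n in range(0, N + 1))
    return capital + warehousing + downtime


def best_base_stock(N, lam, mu, h, hw, b, s_max=50):
    """Enumerate S = 0..s_max (an assumed bound) and return the cheapest level."""
    costs = [bc_cost(S, N, lam, mu, h, hw, b) for S in range(s_max + 1)]
    s_star = min(range(s_max + 1), key=costs.__getitem__)
    return s_star, costs[s_star]
```

Because the BC locations do not interact, such a search can be run separately for each fleet and the resulting costs summed as in Eq. (2.1).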

The main contribution of this chapter is the introduction of a queueing model in an

unexplored field; more specifically, we consider the impact of transportation delays and

costs on an inventory/repair shop system. The rest of the chapter is organized as follows.

In Section 2.1, we define a repair system network for a centralized repair shop. In Section

2.2, we present the algorithm to obtain the cost of this system. The results of a numerical
study comparing the two alternatives are presented in Section 2.3.

2.1 The RIF System Network

To incorporate transportation delays into the RIF system, we construct a closed queueing

network for each fleet i (network i), as shown in Figure 2.1. In this figure, we see four

queueing systems (alternatively stations) forming a chain for fleet i. The transportation

delays are modeled as the service times in two M/G/∞ queues, which are stations i1

and i3. Between them at station i2, we have fleet i and inventory i, modeled as an

M/M/Ni queue. This queue is interpreted as follows. The busy (idle) servers at any

time correspond to functional (down) machines. Any customers waiting in the queue

correspond to spare parts available in inventory i. For instance, in Figure 2.1, there are

two customers in the queue; that is, all Ni machines of the fleet are functional, and we

have two units of spare parts in inventory i. A service completion in this M/M/Ni queue

corresponds to a component failure. Every departure from this queue is an instantaneous

arrival at the M/G/∞ queue/station i3. The service time of this customer at station i3

is the transportation delay of the broken component from location i to the repair shop.

Thus, each fleet has its own three stations modeling its functional machines and/or spare

parts and two transportation delays.


Figure 2.1: The closed queueing network for fleet i

All fleets share the centralized repair shop at the repair station, which is modeled

as a single server queue with exponential service (repair) times with rate µ. Here, the

queued customers are the broken components waiting to be fixed. Customers arriving

from station i3 at repair station are type i customers and are served on an FCFS basis. In

Figure 2.1, a type i component is being repaired, while the next job in the queue has come

from network/fleet m, followed by another type i customer. Upon service completion at

the repair station, type i customers are instantaneously sent to the M/G/∞ queue at

station i1. The service time of a customer here is the transportation delay of the fixed

component from the repair shop to location i.

Thus, with m fleets, each fleet having its own three stations and a single repair

station shared by all fleets, the RIF system can be represented by I = 3m+ 1 stations.

The entire queueing network with m fleets is given in Figure 2.2. In the RIF system

network, we use the following indexing: station I denotes the repair shop, station i2

corresponds to location i, stations i1 and i3 model transportation delays for components

of fleet i, i = 1, . . . , m. We can represent the state of the RIF system by the vector n =

(n11, n12, n13, . . . , nm1, nm2, nm3, nI), where nij is the number of components at station j

(j = 1, 2, 3) of fleet i, i = 1, . . . , m, and nI is the number of broken components from all

fleets at the repair shop.


Figure 2.2: The closed queueing network for m fleets

In the RIF system network, all servers follow the FCFS discipline. The routing of a

component between stations only depends on its current station (therefore, the routing

is Markovian), and in fact, given its current station, we know which station it will enter

next. A network satisfying these conditions admits a product-form solution. That is, the

steady-state distribution of the number of components from each fleet present in each

station has the following form (Baskett et al., 1975):

$$p(\mathbf{n}) = C\, g_I(\mathbf{n}_I) \prod_{i=1}^{m} \prod_{j=1}^{3} g_{ij}(n_{ij}). \tag{2.4}$$

In Eq. (2.4), gij(nij) is a function of nij , the number of components in station ij, and

gI(nI) is a function of nI = (n1I , n2I , ..., nmI) where niI is the number of components

from fleet i in the repair shop. Since this network is closed, the total number of components
in the system is constant and equals $\sum_{i=1}^{m} K_i$. The use of the normalizing constant,
C, makes the probabilities add up to 1.

In steady-state, the mean arrival and departure rates of any station are equal. Due

to the deterministic routing rule, a departure from a station is an arrival at the next

station in network i; hence, the throughput of network i, θi(K), is equal to the mean

arrival/departure rate of any arbitrary station in it.

Throughputs in a closed network cannot be obtained simply by solving the system of


traffic equations. However, by making use of the product form property of the network,

a number of algorithms have been developed to obtain them and the steady-state system

size distributions. For example, the convolution algorithm (Buzen, 1973) exploits a spe-

cial property of the normalization factor, C. This algorithm starts with a single queuing

system of the network; it adds a new queue at each step and updates the normalization

factor by convolution each time a new queue is added to the system.
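The convolution idea can be sketched in a few lines of Python for the simplest setting. This is a minimal sketch that assumes a single-chain closed network in which every station is a fixed-rate single server with unit visit ratio, so it does not cover the infinite-server and multi-server stations of the RIF network. The function names and the `relative_utilizations` argument (mean service demand per visit at each station) are hypothetical.

```python
def buzen_normalizing_constants(relative_utilizations, population):
    """Return G(0), ..., G(population) for fixed-rate single-server stations."""
    g = [1.0] + [0.0] * population
    # Add one station at a time; each pass convolves the current constants
    # with the geometric weights x**n of the newly added station.
    for x in relative_utilizations:
        for n in range(1, population + 1):
            g[n] += x * g[n - 1]
    return g


def network_throughput(relative_utilizations, population):
    """Throughput G(N-1) / G(N) of the closed network (unit visit ratios)."""
    g = buzen_normalizing_constants(relative_utilizations, population)
    return g[population - 1] / g[population]
```

For two identical stations with unit service demand and two customers, the three feasible states are equally likely, so each server is busy two thirds of the time; the sketch returns a throughput of 2/3 in that case.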

Alternatively, the MVA (Reiser and Lavenberg, 1980) starts with a complete system

containing all queues/stations but no customers. Through the algorithm, customers are

added to the system one by one until the desired number of customers is reached. This

process is similar to adding one more spare part to the system. However, we need to

develop a method to obtain the total cost of the system each time a new spare part is

added so that the search can stop after the base-stock levels minimizing the RIF system

cost are attained. We present this algorithm in the next section.

2.2 The Solution Algorithm

In this section, we employ the MVA to obtain the steady-state system size distribution

of the RIF system network. Recalling that K = (K1, K2, . . . , Km) and Ki = Ni +Si, Eq.

(2.3) is rewritten as

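$$C_i(K_i) = h_i(K_i - N_i) + h_{wi} \sum_{n=N_i}^{K_i} (n - N_i)\, p_{i2}(n, \mathbf{K}) + b_i \sum_{n=0}^{N_i} (N_i - n)\, p_{i2}(n, \mathbf{K}) + 2c_i \sum_{i=1}^{m} \theta_i(\mathbf{K}), \tag{2.5}$$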

where the subscript i2 from ni2 is dropped for the sake of brevity; as before, pi2(n,K) is

the steady-state probability of having n components, either in use or as spare parts, at

location i.

To obtain $p_{i2}(n, \mathbf{K})$, we introduce $N_{ij}(\mathbf{K})$ ($N_{iI}(\mathbf{K})$) and $T_{ij}(\mathbf{K})$ ($T_{iI}(\mathbf{K})$) as the
expected number of components and the expected system time of a component from
fleet i, i = 1, . . . , m, in station j, j = 1, 2, 3 (station I), respectively. We also define
$N_I(\mathbf{K}) = \sum_{i=1}^{m} N_{iI}(\mathbf{K})$ as the mean number of components in station I. The expected
system time in the infinite server queues is simply the mean service/transportation time
of that station.

We start by assuming that Ni = 1 possibly with some spare components (as queued

components) at stations i2, i = 1, . . . , m. At these stations, which are single server queues

for now, and at the repair shop station, the expected system time is composed of the

service time of the component itself and the sum of the service times of the components

present in the system at this component’s arrival instant. The latter can be found by

the arrival theorem or the random observer property. Let qij(n,K) be the steady-state

probability of finding n components at station ij at arrival epochs to this station. Then,

the arrival theorem states that qij(n,K) = pij(n,K − ei) (see Breuer and Baum 2006,

page 93 for proof). Using this theorem, the expected number of components an arrival

sees at station ij is

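$$N_{ij}(\mathbf{K}) = \sum_{n=0}^{K_i-1} n\, q_{ij}(n, \mathbf{K}) = \sum_{n=0}^{K_i-1} n\, p_{ij}(n, \mathbf{K} - \mathbf{e}_i) = N_{ij}(\mathbf{K} - \mathbf{e}_i).$$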

Therefore, for i = 1, . . . , m, we have

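$$\begin{aligned}
T_{i1}(\mathbf{K}) &= T_{i3}(\mathbf{K}) = \frac{1}{\gamma_i}, \\
T_{i2}(\mathbf{K}) &= \frac{1}{\lambda_i} + \frac{1}{\lambda_i} N_{i2}(\mathbf{K} - \mathbf{e}_i), \\
T_{iI}(\mathbf{K}) &= \frac{1}{\mu} + \frac{1}{\mu} N_I(\mathbf{K} - \mathbf{e}_i),
\end{aligned} \tag{2.6}$$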

where ei is an m-dimensional vector of zeroes except for its ith element, which is 1. For

i = 1, . . . , m, the following are direct results of Little’s formula:

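$$\theta_i(\mathbf{K}) = \frac{K_i}{T_{iI}(\mathbf{K}) + \sum_{j=1}^{3} T_{ij}(\mathbf{K})}, \tag{2.7}$$

$$N_{ij}(\mathbf{K}) = \theta_i(\mathbf{K})\, T_{ij}(\mathbf{K}), \quad j = 1, 2, 3, \qquad N_{iI}(\mathbf{K}) = \theta_i(\mathbf{K})\, T_{iI}(\mathbf{K}). \tag{2.8}$$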

Starting from Nij(0) = NiI(0) = 0, Eqs. (2.6-2.8) provide a recursion to find the


mean values. The algorithm has converged for all the problems tested and presented in

Section 2.3.
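The recursion in Eqs. (2.6)-(2.8) for the case Ni = 1 can be sketched in Python as follows. This is an illustrative sketch rather than the implementation used for the numerical study: it assumes that the mean values are evaluated for every population vector k ≤ K, as in the standard exact multi-chain MVA, so that N(k − ei) is available for each fleet i. The function name and the argument layout are hypothetical.

```python
from itertools import product


def mva_single_machine_fleets(K, lam, gamma, mu):
    """Mean value analysis for m fleets with N_i = 1, following Eqs. (2.6)-(2.8).

    K, lam, gamma are length-m sequences; mu is the shared repair rate.
    Returns (theta, N2, NI) at the full population vector K.
    """
    m = len(K)
    N2, NI, theta = {}, {}, {}
    # product() enumerates vectors in lexicographic order, so k - e_i is
    # always visited before k.
    for k in product(*(range(Ki + 1) for Ki in K)):
        th, n2, nI = [0.0] * m, [0.0] * m, [0.0] * m
        for i in range(m):
            if k[i] == 0:
                continue
            prev = k[:i] + (k[i] - 1,) + k[i + 1:]
            t_transport = 1.0 / gamma[i]               # T_i1 = T_i3
            t_fleet = (1.0 + N2[prev][i]) / lam[i]     # T_i2, Eq. (2.6)
            t_repair = (1.0 + sum(NI[prev])) / mu      # T_iI, Eq. (2.6)
            th[i] = k[i] / (t_repair + 2.0 * t_transport + t_fleet)  # Eq. (2.7)
            n2[i] = th[i] * t_fleet                    # Eq. (2.8)
            nI[i] = th[i] * t_repair                   # Eq. (2.8)
        N2[k], NI[k], theta[k] = n2, nI, th
    top = tuple(K)
    return theta[top], N2[top], NI[top]
```

With a single fleet holding one component and all rates equal to 1, the cycle consists of four stages of mean length 1, and the sketch returns a throughput of 0.25.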

However, it is possible that Ni > 1 for some i. To reflect this in the model, starting

with the expected number of components in each station and taking advantage of the

product form property, we can obtain the following sojourn time from Theorem 1 in

Reiser and Lavenberg (1980):

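$$T_{i2}(\mathbf{K}) = 1 + N_{i2}(\mathbf{K} - \mathbf{e}_i) + \sum_{n=1}^{K_i} n \left( \frac{1}{\lambda_i(n)} - 1 \right) p_{i2}(n-1, \mathbf{K} - \mathbf{e}_i), \tag{2.9}$$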

where λi(n) = min(n,Ni)λi (note that the term τr,l denoting the service demand of a

customer in chain r at station l in Reiser and Lavenberg (1980) is one in our case). If the

repair shop is multi-server, the sojourn time in the repair shop, TiI(K), can be obtained

in the same way. Observe that

$$1 + N_{i2}(\mathbf{K} - \mathbf{e}_i) = \sum_{n=1}^{K_i} p_{i2}(n-1, \mathbf{K} - \mathbf{e}_i) + \sum_{n=1}^{K_i} (n-1)\, p_{i2}(n-1, \mathbf{K} - \mathbf{e}_i),$$

which is used to re-write the RHS of Eq. (2.9) to have

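$$\begin{aligned}
T_{i2}(\mathbf{K}) &= \sum_{n=1}^{K_i} n \left(\frac{1}{\lambda_i(n)}\right) p_{i2}(n-1,\mathbf{K}-\mathbf{e}_i) \\
&= \sum_{n=1}^{N_i-1} n\, \frac{1}{n\lambda_i}\, p_{i2}(n-1,\mathbf{K}-\mathbf{e}_i) + \sum_{n=N_i}^{K_i} n\, \frac{1}{N_i\lambda_i}\, p_{i2}(n-1,\mathbf{K}-\mathbf{e}_i) \\
&= \frac{1}{N_i\lambda_i} \left( \sum_{n=1}^{N_i-1} N_i\, p_{i2}(n-1,\mathbf{K}-\mathbf{e}_i) + \sum_{n=N_i}^{K_i} n\, p_{i2}(n-1,\mathbf{K}-\mathbf{e}_i) \right).
\end{aligned} \tag{2.10}$$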

Recalling

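$$\begin{aligned}
N_{i2}(\mathbf{K}-\mathbf{e}_i) &= \sum_{n=1}^{K_i} (n-1)\, p_{i2}(n-1,\mathbf{K}-\mathbf{e}_i) \\
&= \sum_{n=1}^{K_i} n\, p_{i2}(n-1,\mathbf{K}-\mathbf{e}_i) - \sum_{n=1}^{K_i} p_{i2}(n-1,\mathbf{K}-\mathbf{e}_i) \\
&= \sum_{n=1}^{K_i} n\, p_{i2}(n-1,\mathbf{K}-\mathbf{e}_i) - 1,
\end{aligned}$$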


and

$$0 = N_{i2}(\mathbf{K}-\mathbf{e}_i) + 1 - \sum_{n=1}^{N_i-1} n\, p_{i2}(n-1,\mathbf{K}-\mathbf{e}_i) - \sum_{n=N_i}^{K_i} n\, p_{i2}(n-1,\mathbf{K}-\mathbf{e}_i),$$

and adding the RHS of the equation above to the RHS of Eq. (2.10) we get

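$$\begin{aligned}
T_{i2}(\mathbf{K}) = \frac{1}{N_i\lambda_i} \Bigg( & 1 + N_{i2}(\mathbf{K}-\mathbf{e}_i) + \sum_{n=1}^{N_i-1} N_i\, p_{i2}(n-1,\mathbf{K}-\mathbf{e}_i) - \sum_{n=1}^{N_i-1} n\, p_{i2}(n-1,\mathbf{K}-\mathbf{e}_i) \\
& - \sum_{n=N_i}^{K_i} n\, p_{i2}(n-1,\mathbf{K}-\mathbf{e}_i) + \sum_{n=N_i}^{K_i} n\, p_{i2}(n-1,\mathbf{K}-\mathbf{e}_i) \Bigg),
\end{aligned}$$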

which, after cancellations and re-indexing the remaining summation, yields

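$$T_{i2}(\mathbf{K}) = \frac{1}{\lambda_i N_i} \left( 1 + N_{i2}(\mathbf{K}-\mathbf{e}_i) + \sum_{n=0}^{N_i-2} (N_i - n - 1)\, p_{i2}(n,\mathbf{K}-\mathbf{e}_i) \right). \tag{2.11}$$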
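As a quick numerical sanity check on the algebra, the sum form in the first line of Eq. (2.10) and the closed form in Eq. (2.11) can be evaluated for an arbitrary distribution. The sketch below assumes `p[n]` stands for pi2(n, K − ei) for n = 0, ..., Ki − 1; the function names and the sample numbers are hypothetical and chosen only for illustration.

```python
def sojourn_sum_form(p, N, lam):
    """First line of Eq. (2.10): sum over n of n / lambda_i(n) * p(n - 1)."""
    K = len(p)  # p covers n = 0, ..., K_i - 1
    return sum(n / (min(n, N) * lam) * p[n - 1] for n in range(1, K + 1))


def sojourn_closed_form(p, N, lam):
    """Eq. (2.11): (1 + mean + weighted idle-server term) / (lambda_i * N_i)."""
    mean = sum(n * pn for n, pn in enumerate(p))  # N_i2(K - e_i)
    idle = sum((N - n - 1) * p[n] for n in range(min(N - 1, len(p))))
    return (1.0 + mean + idle) / (lam * N)


# Arbitrary illustrative distribution over n = 0, 1, 2, 3 (so K_i = 4)
p = [0.1, 0.2, 0.3, 0.4]
a = sojourn_sum_form(p, N=2, lam=0.5)
b = sojourn_closed_form(p, N=2, lam=0.5)
print(a, b)  # both forms should agree up to round-off
```

The two functions agree for any probability vector `p`, which is what the cancellation argument above establishes.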

Starting from pi2 (0, 0) = 1 and using Lemma 1 in Reiser and Lavenberg (1980), we

obtain the following recursive relation to find the system size distribution:

$$p_{i2}(n,\mathbf{K}) = \frac{1}{\lambda_i(n)}\, \theta_i(\mathbf{K})\, p_{i2}(n-1,\mathbf{K}-\mathbf{e}_i). \tag{2.12}$$

Given these formulae, the solution algorithm performs iterations. We start with
Nij(0) = NiI(0) = 0 and set $\mathbf{K}^0$ as a vector of 0's. At each iteration t, we increase
an arbitrary element $K^{t-1}_i$ of the vector $\mathbf{K}^{t-1}$ by 1 (i.e., we add one more component
in location i) provided that $K^{t-1}_i < K_i$. Then, using Eq. (2.11) for stations i2 and
Eq. (2.6) for other stations, we obtain $T_{ij}(\mathbf{K}^t)$ and $T_{iI}(\mathbf{K}^t)$. Next, using Eq. (2.7) we
obtain $\theta_i(\mathbf{K}^t)$ and from Eq. (2.12) we compute $p_{i2}(n,\mathbf{K}^t)$. For the next iteration, we
need $N_{ij}(\mathbf{K}^t)$ and $N_{iI}(\mathbf{K}^t)$ found from Eq. (2.8). We stop the algorithm after it runs
iteration $M = \sum_{i=1}^{m} K_i$ when $\mathbf{K}^M = \mathbf{K} = (K_1, K_2, \ldots, K_m)$ with $K_i = N_i + S_i$.

In each iteration, i.e., each time $\mathbf{K}^{t-1}$ is updated, we need $p_{i2}(0,\mathbf{K}^t)$. Observe that
the other $p_{i2}(n,\mathbf{K}^t)$ for $n = 1, \ldots, K^t_i$, where $K^t_i$ is the size of fleet i plus the
base-stock level of inventory i at iteration t, can be found from Eq. (2.12) independently of
$p_{i2}(0,\mathbf{K}^t)$. Therefore, we can use $p_{i2}(0,\mathbf{K}^t) = 1 - \sum_{n=1}^{K^t_i} p_{i2}(n,\mathbf{K}^t)$. Because this
approach can be unstable, Reiser and Lavenberg (1980) recommend an alternative way to obtain
$p_{i2}(0,\mathbf{K}^t)$, which may reduce the round-off error of the algorithm. At iteration t, letting $N^t_i$
denote the size of fleet i, from Little's formula, the mean number of idle servers at location
i is

$$\sum_{n=0}^{N^t_i} (N^t_i - n)\, p_{i2}(n,\mathbf{K}^t) = N^t_i - \theta_i(\mathbf{K}^t).$$

With $p_{i2}(n,\mathbf{K}^t)$ for $n = 1, \ldots, K^t_i$ and $\theta_i(\mathbf{K}^t)$, from the equation above one can solve
for $p_{i2}(0,\mathbf{K}^t)$.
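The iteration described above can be sketched for the simplest setting of a single fleet with no transportation delay. This is an illustrative sketch under stated assumptions, not the full multi-fleet algorithm: it uses the simple normalization pi2(0, K) = 1 − Σ pi2(n, K) rather than the more stable alternative recommended by Reiser and Lavenberg (1980), and the function name `single_fleet_mva` is hypothetical.

```python
def single_fleet_mva(N, S, lam, mu):
    """Iterate Eqs. (2.6), (2.7), (2.8), (2.11), (2.12) for one fleet.

    N   : fleet size (number of machines, i.e., parallel 'servers' at station 2)
    S   : base-stock level, so K = N + S components circulate
    lam : failure rate of one operating machine
    mu  : repair rate of the single-server repair shop
    Returns (theta, p), where p[n] is the probability of n components at the fleet.
    """
    p = [1.0]                    # p_2(0, 0) = 1 for the empty network
    n_fleet, n_shop, theta = 0.0, 0.0, 0.0
    for k in range(1, N + S + 1):                 # add one component per iteration
        idle = sum((N - n - 1) * p[n] for n in range(min(N - 1, len(p))))
        t_fleet = (1.0 + n_fleet + idle) / (lam * N)      # Eq. (2.11)
        t_shop = (1.0 + n_shop) / mu                      # Eq. (2.6)
        theta = k / (t_fleet + t_shop)                    # Eq. (2.7)
        new_p = [0.0] * (k + 1)
        for n in range(1, k + 1):
            new_p[n] = theta / (min(n, N) * lam) * p[n - 1]   # Eq. (2.12)
        new_p[0] = 1.0 - sum(new_p[1:])           # simple normalization
        p = new_p
        n_fleet, n_shop = theta * t_fleet, theta * t_shop     # Eq. (2.8)
    return theta, p
```

In this single-fleet case the network is a birth-death process, so the distribution `p` returned by the sketch can be compared with the stationary distribution obtained directly from the balance equations.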

At the final iteration M, having computed pi2(n,K), from Eqs. (2.5) and (2.2), the
cost of the RIF system is found for a given K. The optimal base-stock levels for each
inventory i are found by trying different K vectors. From $\mathbf{K}^* = (K^*_1, \ldots, K^*_m)$, the one that
minimizes the RIF system cost, the optimal number of spares is found as $S^*_i = K^*_i - N_i$
for each fleet i, i = 1, . . . , m.

2.3 Numerical Study

In this section, via our numerical experiments we attempt to determine when the RIF

system with pooled repair capacity is superior to the BC system in reducing system costs.

Obviously, transportation costs compared to down time and holding costs, transportation

delays compared to repair times, and how heavily repair shops need to be used are critical

factors to consider.

All our examples consider three fleets, possibly with different fleet sizes and failure

rates, located at three different locations. In the BC system, each fleet has its own local

repair shop with repair rate µi = 1, i = 1, 2, 3, and the system has no transportation

delays or costs. If these three repair shops are merged into one, in a bid to incur lower

system costs because of higher pooled repair capacity, we have an RIF system. As

discussed by Yu, Benjaafar, and Gerchak (2009), we set the repair rate of the centralized
repair shop to $\mu = \sum_{i=1}^{3} \mu_i = 3$. We assume that repair costs, i.e., cost of building repair


shops, paying the repair crew, and maintaining the repair equipment, increase linearly

with repair rate. Thus, since total repair rates are the same, repair costs are also the

same in the BC and RIF systems and can be ignored in comparisons.

Choosing the location of the centralized repair shop in the RIF system needs to be

considered in a more general setting. In a network where the nodes correspond to possible

sites (including the current fleet locations) for placing the repair shop, and the edges to

distances between sites, we can consider every node as a potential location and find

the one with the minimum cost. However, we do not choose the repair shop’s optimal

location; instead, we assume that it will be located at one of the fleet locations, say

location 1. Thus, fleet 1 will not suffer from transportation delays or incur transportation

costs, whereas fleets 2 and 3, which need to send (await) their broken (fixed) components,

will suffer.

Except for one example, we set hwi = 0 in Eq. (2.3) for i = 1, 2, 3 since the holding cost

due to capital tied up in keeping stocks reflected by hi is much higher than operational

costs of warehousing reflected by hwi (Silver, Pyke, and Peterson, 1998, p. 45). In all

examples, we set hi = 1 but assume different bi values. High backordering/down time

cost to holding cost ratio is a common assumption in the literature (e.g., Benjaafar,

Cooper, and Kim, 2005, Pena Perez and Zipkin, 1997) and in industry. For instance,

while conducting a case study involving a single fleet, Louit et al. (2010) compute a

holding cost of $1.51 per unit per unit time and a down time cost of $2,173.3 per unit

per unit time. In conjunction with a unit holding cost rate, we consider 10, 50, and 100

as down time cost rates.

Since the repair shop is at location 1, we only need to vary the transportation delays

of locations 2 and 3 to location 1, 1/γ2 and 1/γ3, respectively. We assume the same

transportation cost rates for fleets/locations 2 and 3, i.e., c2 = c3. In addition to the case

with negligible costs (c2 = c3 = 0), we consider c2 = c3 > 1. In their numerical analysis,

Kutanoglu and Mahajan (2009) choose transportation cost rates less than the holding


cost rate. Noting that high transportation costs increase the RIF system cost, with our

choice of transportation cost rates that are higher than the holding cost rate, we focus

on environments where the benefit of repair shop pooling may decrease.

We present our numerical study in two sections. In Section 2.3.1, we consider identical

fleets with the same fleet size, failure rate, and down time costs. From these examples,

we observe that if the down time cost rises, the performance of the RIF system increases

when compared to the BC system, and a centralized repair shop can serve fleets at

greater distances, even when transportation cost rates are much higher than the holding

cost rate. If machines become more unreliable, the utilization of the repair shop increases,

and the performance of the RIF system compared to the BC system tends to increase.

We observe that if failure rate is the same, the relative performance of the RIF system

over the BC system improves with larger fleet sizes. Section 2.3.2 presents examples with

non-identical/heterogeneous fleets. Although they use the same spare part, we see that

the fleets at different locations not only have different fleet sizes, but differ in their failure

rates and down time costs. We observe that the RIF system can be more cost-effective

even when the centralized repair shop serves heterogeneous fleets at long distances. In this

section, we discuss the possibility of partial pooling of the repair shop to serve two fleets

while leaving the other fleet with its own repair shop independent of the other two.

2.3.1 Examples with Identical Fleets

We start with three identical fleets with Ni = 10, λi = 0.08, and bi = 10, i = 1, 2, 3.

Using Eq. (2.1), the optimal number of spares and costs for the BC system are S∗i = 5
and C∗i = 8.54, respectively, for fleets i = 1, 2, 3, and C∗BC = 25.62. Thus, a total of 15
spare parts are carried for a total of 30 machines in three fleets. We mark C∗BC and the

total number of spares in the BC system by horizontal dashed lines in Figures 2.3 and

2.4, respectively.

We refer to the RIF system with no transportation delays and costs as the base RIF


(BRIF) system and denote its optimal cost by C∗BRIF . This is the best performance

the RIF system can exhibit; with non-negligible transportation delays and/or costs, its

performance will deteriorate. In this numerical example, C∗BRIF = 12.73 with S∗i = 2 at

each location. Thus, compared to the BC system the total number of spares in the BRIF

system decreases to 6 from 15. We mark C∗BRIF and the total number of spares in the

RIF system by solid horizontal lines in Figures 2.3 and 2.4, respectively.

In Figures 2.3 and 2.4, we increase the mean transportation delays between locations

2 and 3 and the repair shop at location 1 equally (1/γ2 = 1/γ3 = 1/γ) as a percentage

of the mean repair time (µ/γ) shown on the x-axis. We consider c2 = c3 = 0 for the

RIF system with non-negligible transportation delays but negligible transportation costs.

Then we consider c2 = c3 = 2 and c2 = c3 = 3, which are 20% and 30% of the down time

costs. Using the model and its solution algorithm in Sections 2.1 and 2.2, respectively,

we find S∗i, i = 1, 2, 3 and the optimal RIF system cost C∗RIF. With increasing mean
transportation delays and transportation cost rates, as we expect, C∗RIF increases from
C∗BRIF = 12.73 and eventually surpasses C∗BC = 25.62 as seen in Figure 2.3. Note that

when c2 = c3 = 3, the RIF system incurs C∗RIF = 25.4 when µ/γ = 225%. In other

words, if the mean repair time is 1 hour, even if it takes 2.25 hours to bring (send) a

broken (fixed) component from (to) location 2 or 3 to (from) the centralized repair shop

at location 1, the RIF system still incurs less cost than the BC system.

[Figure 2.3: The impact of transportation delays and costs on C∗RIF when Ni = 10, bi = 10. Horizontal axis: µ/γ (%); vertical axis: optimal RIF system cost; curves for c2 = c3 = 0, c2 = c3 = 2, and c2 = c3 = 3; horizontal reference lines C∗BC and C∗BRIF.]

[Figure 2.4: The impact of transportation delays and costs on $\sum_{i=1}^{3} S^*_i$ in the RIF system when Ni = 10, bi = 10. Horizontal axis: µ/γ (%); vertical axis: optimal number of spares; curves for c2 = c3 = 0, c2 = c3 = 2, and c2 = c3 = 3; horizontal reference lines S∗BC and S∗BRIF.]

We expect to see that the total optimal number of spares in the RIF system increases
with higher mean transportation delays and costs. In Figure 2.4, most of the time, the
total number of spares is the same for the three transportation cost rates considered and
there is a trend of increase with longer mean transportation delays. However, at certain
points, the increase in mean transportation delay results in a decreased total number of
spares. For instance, at µ/γ = 25%, the total number of spares is 9 (3 at each location)
but at µ/γ = 50% it decreases to 8 (2 at location 1 and 3 at the other two locations).
This is due to lower repair shop utilization at µ/γ = 50% than at µ/γ = 25%. The repair
shop utilization in the RIF system is

$$\rho = \frac{\sum_{i=1}^{m} \theta_i(\mathbf{K})}{\mu}, \tag{2.13}$$

where θi(K) is the throughput for network i (as defined in Section 2.2); this is also the
arrival rate of broken components at the repair shop from fleet i. In this case, the increase
in transportation delay does not increase the optimal base-stock levels S∗i but decreases
θi(K) for fleets i = 2, 3, and the utilization in Eq. (2.13) goes down. Consequently, by
hosting the repair shop, location 1 can carry one less spare part in its inventory, bringing
the total number of spares down to 8.
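The mechanism behind this observation, namely that a longer transportation delay with unchanged base-stock levels lowers the throughput and hence the utilization in Eq. (2.13), can be illustrated on a single fleet. The sketch below is an assumption-laden illustration rather than the three-fleet model: it considers one fleet only, and it computes the throughput from the product-form normalization constant (Buzen's convolution idea) instead of the mean value recursion of Section 2.2. The function name `throughput` and its arguments are hypothetical.

```python
from math import factorial


def throughput(K, N, lam, mu, one_way_delay):
    """Throughput of a single closed chain with K components.

    Stations: fleet (N machines, each failing at rate lam), a single-server
    repair shop with rate mu, and two transport legs with mean one_way_delay
    each, merged here into one infinite-server station.
    """
    def fleet(n):
        # Product of 1 / (min(k, N) * lam): load-dependent fleet station
        w = 1.0
        for k in range(1, n + 1):
            w /= min(k, N) * lam
        return w

    def shop(n):
        return (1.0 / mu) ** n

    def transit(n):
        return (2.0 * one_way_delay) ** n / factorial(n)

    def G(pop):
        # Normalization constant: sum over all ways to split pop components
        return sum(fleet(a) * shop(b) * transit(pop - a - b)
                   for a in range(pop + 1) for b in range(pop - a + 1))

    return G(K - 1) / G(K)


# Utilization of the repair shop for one fleet, rho = theta / mu,
# evaluated for increasing one-way delays with the population held fixed.
for d in (0.0, 0.25, 0.5, 1.0):
    theta = throughput(K=12, N=10, lam=0.08, mu=1.0, one_way_delay=d)
    print(d, theta / 1.0)
```

With the number of circulating components held fixed, the printed utilization falls as the delay grows, which is the effect that allows the hosting location to hold one less spare in the example above.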

In Figure 2.4, we circle points with a dashed ellipse when the total number of spares

differs for different c2 = c3 values. Consider the first ellipse on the left of the figure: At

µ/γ = 150%, the RIF system carries a total of 11 spares when c2 = c3 = 0 and carries a

total of 8 spares when c2 = c3 = 2 or c2 = c3 = 3. When transportation cost is higher

(2 or 3 instead of 0), the system resists increasing the base-stock levels -even though the

transportation delay increases- to avoid incurring more holding costs.

The second ellipse on the left at µ/γ = 375% points to a reduction to 12 spare parts

from 13 at µ/γ = 350% when c2 = c3 = 0. Again, although the transportation delays

increase by 25%, the base-stock levels at locations 2 and 3 do not change; however, the


component arrival rates from these locations decrease and, thus, repair shop utilization

is reduced. This enables the RIF system to decrease the base-stock level at location 1 by

1, yielding 12 as the total number of spares.

The third ellipse at µ/γ = 450% gives the lowest total number of spares for c2 =

c3 = 3. Here the RIF system decreases base-stock levels by 1 at each location compared

to base-stock levels when c2 = c3 = 0 and c2 = c3 = 2. The ellipse at µ/γ = 925%,

meanwhile, shows a case when the total number of spares is minimum at c2 = c3 = 2.

We cannot predict these behaviors in advance. All we can say is that the RIF system

can sometimes lower the total number of spares, even when the mean transportation

delay increases, as a way to keep the system costs lower. An additional observation is

that the RIF system can carry the same number of spares as the BC system (or even

more) but still be more cost-effective. For instance, when c2 = c3 = 2 at µ/γ = 450%,

C∗RIF = 25.57, which is less than C∗BC while carrying the same number of spares as in the
BC system. When c2 = c3 = 0 at µ/γ = 900%, C∗RIF = 25.32, which is less than C∗BC
but carries 19 spare parts, i.e., 4 more spares than the BC system.

We now revisit the example, this time assuming that hwi = 0.33 as 1/3 of hi = 1

in accordance with the cost ratios estimated by Waters (2003, p. 257). In this case,

C∗BC = 27.72 and the total number of spares is 12 (marked by dashed horizontal lines

in Figures 2.5 and 2.6, respectively). When transportation costs and delays are ignored,

i.e., in the setting where the RIF system can perform the best, we find C∗BRIF = 13.91

and the total number of spares as 6 (marked by solid horizontal lines in figures). As seen

in Figures 2.5 and 2.6, when mean transportation delays increase, C∗RIF and the total

number of spares increase as well. The RIF system carries the same total number of

spares most of the time for the three c2 = c3 values considered, except that when c2 =

c3 = 0, the total inventory increases sooner than in those cases with higher transportation

costs. Compared to the case with hwi = 0, the RIF system carries fewer spare parts, as

seen in Figure 2.6, due to higher holding costs (capital tied up plus warehousing costs).


[Figure 2.5: Optimal RIF system cost versus µ/γ (%) when Ni = 10, bi = 10, and hwi = 0.33; curves for c2 = c3 = 0, c2 = c3 = 2, and c2 = c3 = 3; horizontal reference lines C∗BC and C∗BRIF.]

[Figure 2.6: Optimal number of spares versus µ/γ (%) when Ni = 10, bi = 10, and hwi = 0.33; curves for c2 = c3 = 0, c2 = c3 = 2, and c2 = c3 = 3; horizontal reference lines S∗BC and S∗BRIF.]

Additionally, when c2 = c3 = 2 (c2 = c3 = 3), the RIF system is better than the BC

system at µ/γ = 475% (µ/γ = 250%), the distances at which the BC system was better

in Figure 2.3, where hwi = 0. Thus, incurring higher holding costs can make the RIF

system more cost-effective, even when the centralized repair shop serves fleets located at

greater distances.

To see the impact of the down time cost, after setting hwi = 0 (as is the case in the

remainder of the section), we increased bi to 100. In this case, C∗BC = 54.71 and the total

number of spares is 42 (marked by dashed lines in Figures 2.7 and 2.8, respectively).

When transportation costs and delays are ignored, C∗BRIF = 25.23 and the total number

of spares is 21 (marked by solid horizontal lines in these figures). With higher down

time costs, we see that at higher transportation cost rates, such as c2 = c3 = 8, the RIF

system at µ/γ = 250% with C∗RIF = 54.42 is still better than the BC system. In Figure

2.8, the total number of spares carried by the RIF system is invariant in c2 = c3 and is

non-decreasing in mean transportation time.

[Figure 2.7: Optimal RIF system cost versus µ/γ (%) when Ni = 10, bi = 100; curves for c2 = c3 = 0, c2 = c3 = 6, and c2 = c3 = 8; horizontal reference lines C∗BC and C∗BRIF.]

[Figure 2.8: Optimal number of spares versus µ/γ (%) when Ni = 10, bi = 100; curves for c2 = c3 = 0, c2 = c3 = 6, and c2 = c3 = 8; horizontal reference lines S∗BC and S∗BRIF.]

Next, we consider three identical but smaller fleets with Ni = 5, consisting of less
reliable machines with λi = 0.16, and bi = 10, i = 1, 2, 3. Using Eq. (2.1), we obtain
C∗BC = 20.89 and a total of 8 spare parts would be carried by the BC system. When
transportation costs and delays are ignored, C∗BRIF = 10.99 and the total number of
spares is 6. In Figure 2.9, unlike the case shown in Figure 2.3, the distances at which the
RIF system loses its superiority shorten. For instance, when c2 = c3 = 3 at µ/γ = 50%,
the RIF system is more costly than the BC system, whereas for the case summarized in
Figure 2.3, the RIF system sustains its superiority until µ/γ = 250%. Figure 2.10 shows
that longer mean transportation times usually increase the total number of spares; this
is the same for all c2 = c3 values except at a couple of points, such as at µ/γ = 925%,
when the case with c2 = c3 = 3 carries fewer items as spare parts.

Although not presented here, we have three more examples involving three fleets with

a) Ni = 5, λi = 0.16, and bi = 100, b) Ni = 15, λi = 0.054, and bi = 10, and c) Ni = 15,

λi = 0.054, and bi = 100. These experiments also show that the RIF system can serve

farther-flung fleets at higher transportation costs if down time costs rise.

[Figure 2.9: The impact of transportation delays and costs on C∗RIF when Ni = 5, bi = 10. Horizontal axis: µ/γ (%); vertical axis: optimal RIF system cost; curves for c2 = c3 = 0, c2 = c3 = 2, and c2 = c3 = 3; horizontal reference lines C∗BC and C∗BRIF.]

[Figure 2.10: The impact of transportation delays and costs on $\sum_{i=1}^{3} S^*_i$ in the RIF system when Ni = 5, bi = 10. Horizontal axis: µ/γ (%); vertical axis: optimal number of spares; curves for c2 = c3 = 0, c2 = c3 = 2, and c2 = c3 = 3; horizontal reference lines S∗BC and S∗BRIF.]

However, in this set of experiments, the failure rates change only when we consider
different fleet sizes. To capture the impact of machine reliability on the relative
performances of the RIF and the BC systems, we conduct another set of experiments. This
time, fixing Ni, i = 1, 2, 3, as well as the transportation and down time cost rates, we vary
the machine failure rates λi in each example. Higher failure rates correspond to more
unreliable machines; although not shown in Eq. (2.13), in all examples, repair shop utilization
ρ increases with higher λi. In other words, higher ρ implies that the fleets consist of more
unreliable machines. In Figures 2.11-2.16, the x-axis shows ρ (in %). The y-axis displays

$$\Delta = \frac{C^*_{BC} - C^*_{RIF}}{C^*_{BC}} \times 100\%,$$

namely, the percentage cost decrease of the RIF system over the BC system. In each

example, we assume that 1/γ2 = 1/γ3 = 1/γ and consider µ/γ = 0%, 200%, 500%, moving

from negligible transportation delays to cases in which mean transportation delays for

fleets 2 and 3 are 500% of the mean repair time in the RIF system. In all these figures,

positive ∆ values indicate when the RIF system is better, and negative values indicate

when the BC system is superior. We see that as ρ increases or, equivalently, as the
reliability of machines decreases, the relative performance of the RIF system tends to
increase, although ∆ sometimes gets slightly smaller with an increase in ρ, usually
when ρ = 100%. More interestingly, except for the case in Figure 2.11 when µ/γ = 500%,
∆ eventually becomes positive at some ρ value, and it does not become negative or approach
0 as the utilization increases further. This observation differs from what Benjaafar, Cooper,

and Kim (2005) show when they compare the benefit of production capacity pooling in

systems with constant demand rates for each class of customers (Scenarios c and c′). In


their study, as ρ increases, the ratio of the cost of the system with separate production

facilities to the cost of the system with a pooled production facility increases, and the

absolute cost difference is unbounded in ρ. This is not the case in our problem. In the

system Benjaafar, Cooper, and Kim (2005) analyze, as ρ approaches 100%, system load,
base-stock levels, and system costs in the two alternative systems explode. Thus, the more

efficient use of capacity is critical. On the other hand, due to finite calling populations,

in our problem, base-stock levels remain at finite values.

Examining the results in more detail, in Figure 2.11, for the case with Ni = 5,

bi = 10, i = 1, 2, 3, when µ/γ = 500%, the RIF system never beats the BC system.

If down time costs increase to bi = 100 as in Figure 2.12, however, the RIF system

starts performing better at ρ = 0.52, even when µ/γ = 500%. As we expect, higher

transportation costs will undermine the performance of the RIF system. For instance, if

c2 = c3 = 8 while bi = 100, from Figure 2.13, we see that only at very high utilization

levels or systems in which the machines are very unreliable, can the RIF system be

preferred.

In Figures 2.14 - 2.16, we present the results of the problems assuming the same costs

as in Figures 2.11 - 2.13, but this time considering Ni = 10, i = 1, 2, 3. The observations

made based on the examples with Ni = 5 can be repeated for these examples as well. If

we compare the parallel examples in which all costs and transportation delays are the

same but the fleet sizes are different -such as Figures 2.11 and 2.14- at the same ρ value,

the relative performance of the RIF system is higher when Ni = 10 than when Ni = 5.

Note that the same ρ value in parallel examples does not mean that the failure rate when
Ni = 10 is the same as the failure rate when Ni = 5. In fact, when the fleet size is bigger,

smaller failure rates increase the server utilization rapidly, since there are usually more

machines that can fail at any time (recall state-dependent failure rates). Therefore, in

parallel figures, the same ρ implies more reliable machines when Ni = 10.

Finally, we compare the problems with the same costs, transportation delays, and
failure rates but different fleet sizes. The comparison indicates that if the failure rate
is the same but the fleet size gets bigger, the relative performance of the RIF system
improves.

2.3.2 Examples with Heterogeneous Fleets

In this section, we consider that N1 = 5 at location 1, which also hosts the centralized

repair shop in the RIF system, whereas N2 = 10 and N3 = 15. We consider three different

failure rates for each fleet. Accordingly λ1 ∈ {0.14, 0.16, 0.18}, λ2 ∈ {0.07, 0.08, 0.09},

and λ3 ∈ {0.047, 0.053, 0.06}. We assume that the down time cost rates are not equal and

choose bi ∈ {10, 50, 100}. In Table 2.1, columns 2 to 7, we list the failure and down time
cost rates. Column 8 lists the optimal cost of the BC system, C∗BC. When transportation
delays and costs are negligible, the RIF system displays the best possible performance
(the BRIF system) and its cost C∗BRIF appears in column 9. When C∗BC and C∗BRIF are
compared, the minimum cost decrease of the BRIF system over the BC system is 51.65%,
and the maximum is 64.2%.
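The quoted range can be reproduced with simple arithmetic on the C∗BC and C∗BRIF columns of Table 2.1. The short sketch below recomputes the percentage cost decrease for a handful of rows only (problems 1, 13, 21, and 26, copied from the table); the function name `cost_decrease` is hypothetical.

```python
def cost_decrease(c_bc, c_other):
    """Percentage cost decrease of an alternative system over the BC system."""
    return (c_bc - c_other) / c_bc * 100.0


# (problem number, C*_BC, C*_BRIF) for a subset of rows of Table 2.1
rows = [
    (1, 52.628, 18.839),
    (13, 49.678, 18.332),
    (21, 37.954, 18.349),
    (26, 37.668, 17.510),
]

for no, c_bc, c_brif in rows:
    print(no, round(cost_decrease(c_bc, c_brif), 2))
```

Among these rows, problem 1 gives the largest decrease (about 64.2%) and problem 21 the smallest (about 51.65%), matching the maximum and minimum reported above.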

We assume c2 = c3 = 2 and vary (µ/γ2, µ/γ3) in steps of 100%. Obviously, it is not
feasible to find a surface of the maximum (µ/γ2, µ/γ3) values outside of which C∗RIF will
exceed C∗BC for each of these 36 problems. However, we chose (µ/γ2, µ/γ3) as listed in
column 11 at which C∗RIF listed in column 10 is still less than C∗BC. From Table 2.1, we
see that in the worst case scenario, the transportation delays between fleets 2 and 3 and
the centralized repair shop could be 900% of the mean repair time of the centralized repair
shop, yet the RIF system still costs less.

As we know from Section 2.3.1, higher c2 = c3 will eventually make the RIF system
perform worse than the BC system. For instance, when c2 = c3 = 5, in 18 out of 36
cases, the RIF system is more costly at µ/γ2 = µ/γ3 = 100% than the BC system.
However, even in these cases, partial pooling of the repair shop can be more cost effective.
Partial inventory pooling, in the form of keeping inventories of common components used
by different end-products (Baker, Magazine, and Nuttle, 1986) or considering product
substitution (Yang and Schrage, 2009), is considered in the inventory literature. Since
in both the BC and RIF systems we have separate inventories at each location, partial
pooling of the repair shop is a possibility. According to this scheme, in the 18 cases where
the RIF system fails to decrease costs due to high transportation costs, we consider
having a repair shop at location 1 with rate µ = 2 to serve fleets 1 and 2 and having
a repair shop with rate µ3 = 1 at location 3 solely dedicated to fleet 3. By increasing
µ/γ2 in steps of 10%, and by employing the RIF system analysis in Sections 2.1 and 2.2,
we computed the cost due to fleets 1 and 2, and the cost due to fleet 3 was found
from Eq. (2.3). Their summation gives the total system cost under partial pooling. In
15 out of 18 problems, the partial repair shop pooling scheme results in less cost than the
BC system while 690% ≥ µ/γ2 ≥ 100%. For instance, in problem 25, at µ/γ2 = 100%,
the cost of the system with partial pooling is 38.77, which is less than C∗BC = 38.797. Similarly, in
problem 35, at µ/γ2 = 690%, the cost of the system with partial pooling is 44.10, which is
less than C∗BC = 44.12. However, in problems 9, 15, and 21, even the system with partial
pooling of the repair shop for fleets 1 and 2 cannot cost less than the BC system.

[Figure 2.11: The impact of ρ on ∆ when c2 = c3 = 2, Ni = 5, bi = 10, i = 1, 2, 3. Horizontal axis: ρ (%); vertical axis: cost decrease (%); curves for µ/γ = 0%, 200%, and 500%.]

[Figure 2.12: The impact of ρ on ∆ when c2 = c3 = 2, Ni = 5, bi = 100, i = 1, 2, 3. Same axes and curves as Figure 2.11.]

[Figure 2.13: The impact of ρ on ∆ when c2 = c3 = 8, Ni = 5, bi = 100, i = 1, 2, 3. Same axes and curves as Figure 2.11.]

[Figure 2.14: The impact of ρ on ∆ when c2 = c3 = 2, Ni = 10, bi = 10, i = 1, 2, 3. Same axes and curves as Figure 2.11.]

[Figure 2.15: The impact of ρ on ∆ when c2 = c3 = 2, Ni = 10, bi = 100, i = 1, 2, 3. Same axes and curves as Figure 2.11.]

[Figure 2.16: The impact of ρ on ∆ when c2 = c3 = 8, Ni = 10, bi = 100, i = 1, 2, 3. Same axes and curves as Figure 2.11.]


Table 2.1: The comparison of the RIF and the BC systems when the fleets are heterogeneous.

No   λ1    λ2    λ3     b1   b2   b3   C∗BC    C∗BRIF  C∗RIF   (µ/γ2, µ/γ3) in %
 1   0.14  0.08  0.06    10   50  100  52.628  18.839  52.387  (1700,1800)
 2   0.14  0.08  0.06    10  100   50  49.547  18.804  49.395  (1600,1500)
 3   0.14  0.08  0.06    50   10  100  50.224  18.981  50.219  (1600,1700)
 4   0.14  0.08  0.06    50  100   10  41.293  18.882  40.810  (1000,1000)
 5   0.14  0.08  0.06   100   10   50  46.020  18.933  45.796  (1300,1400)
 6   0.14  0.08  0.06   100   50   10  40.171  18.878  39.977  (1000,900)
 7   0.14  0.09  0.053   10   50  100  48.003  18.746  47.895  (1500,1400)
 8   0.14  0.09  0.053   10  100   50  50.879  18.889  50.874  (1600,1700)
 9   0.14  0.09  0.053   50   10  100  40.550  18.493  40.466  (1000,1000)
10   0.14  0.09  0.053   50  100   10  48.320  19.323  47.718  (1500,1400)
11   0.14  0.09  0.053  100   10   50  39.405  18.511  38.885  (900,900)
12   0.14  0.09  0.053  100   50   10  44.299  19.253  43.803  (1200,1200)
13   0.16  0.07  0.06    10   50  100  49.678  18.332  49.589  (1700,1700)
14   0.16  0.07  0.06    10  100   50  45.490  18.213  45.253  (1400,1400)
15   0.16  0.07  0.06    50   10  100  51.556  19.243  51.524  (1800,1900)
16   0.16  0.07  0.06    50  100   10  39.226  18.663  38.739  (1000,900)
17   0.16  0.07  0.06   100   10   50  48.407  19.285  47.965  (1600,1600)
18   0.16  0.07  0.06   100   50   10  40.265  18.859  39.513  (1000,1000)
19   0.16  0.09  0.047   10   50  100  43.418  18.098  43.109  (1300,1200)
20   0.16  0.09  0.047   10  100   50  47.411  18.302  47.408  (1500,1600)
21   0.16  0.09  0.047   50   10  100  37.954  18.349  37.673  (900,900)
22   0.16  0.09  0.047   50  100   10  49.212  19.473  48.905  (1600,1700)
23   0.16  0.09  0.047  100   10   50  38.981  18.535  38.559  (1000,900)
24   0.16  0.09  0.047  100   50   10  46.246  19.493  46.072  (1400,1500)
25   0.18  0.07  0.053   10   50  100  38.797  17.606  38.011  (1000,1000)
26   0.18  0.07  0.053   10  100   50  37.668  17.510  37.307  (900,1000)
27   0.18  0.07  0.053   50   10  100  44.521  18.992  44.482  (1500,1400)
28   0.18  0.07  0.053   50  100   10  40.944  18.855  40.500  (1200,1100)
29   0.18  0.07  0.053  100   10   50  46.931  19.323  46.522  (1600,1600)
30   0.18  0.07  0.053  100   50   10  44.483  19.285  43.954  (1400,1400)
31   0.18  0.08  0.047   10   50  100  37.162  17.499  36.553  (900,900)
32   0.18  0.08  0.047   10  100   50  38.257  17.602  37.968  (1000,1000)
33   0.18  0.08  0.047   50   10  100  40.593  18.524  40.124  (1200,1100)
34   0.18  0.08  0.047   50  100   10  43.903  19.224  43.505  (1400,1300)
35   0.18  0.08  0.047  100   10   50  44.120  18.897  43.587  (1400,1400)
36   0.18  0.08  0.047  100   50   10  46.336  19.532  45.439  (1500,1500)

Chapter 3

Spare Parts in k-out-of-n:G Systems

In this chapter, we consider a centralized repair shop that serves m systems parame-

terized by i = 1, ..., m. Each system i is a ki-out-of-ni:G system comprised of identical

components and is available if ki or more components out of ni are functional. Although

components fail from time to time, they are repairable. Additionally, spare components

are kept to increase the proportion of times these systems are available. Times to failure,

that is the periods between installation of a new or repaired component in system i and

the next failure instant of this component, follow an exponential distribution with rate λi

(implying that each repair makes the component as good as new, and the failure rate only

depends on the system using it). Different failure rates can be due to the type of service a
system renders or the specific operating conditions it is subject to. When a component

fails, it is sent to a repair shop, which is modeled as an FCFS single server queue where

the repair times are independent and identically distributed (i.i.d.) exponential random

variables (r.v.s) with rate µ. If there is a stock of critical components kept as spare

parts, a spare component can be installed immediately to replace the failed component.

Otherwise, the number of functional components in that system decreases by 1. When

only ki components are functional and no spare is available, if one more component
fails, system i fails and is down until a repaired component can be dispatched from the
repair shop (during such down times, the remaining ki − 1 components do not fail).

Figure 3.1: The hybrid model with both shared and reserved inventories

In other words, keeping spare part inventories might help increase system availability

at the expense of incurring inventory holding cost. Separate inventories for each system

can be reserved; or due to the same component used, a shared inventory can serve all

systems. In this chapter, we model a mixture of the two, with both shared and reserved

inventories calling it the hybrid model as given in Figure 3.1. In the hybrid model,

by setting shared or reserved inventory levels to zero, one can create a system of only

reserved inventories or of only a shared inventory, respectively. Therefore, the optimal

cost of the hybrid model cannot be strictly higher than the optimal cost of having solely

reserved inventories or only a shared inventory.

The problem also involves a component allocation problem. In this chapter, we study

the FCFS and the priority policies which result in two alternative policies, the hybrid

FCFS (HF) policy and the hybrid priority (HP) policy. In both cases, in addition to a

reserved inventory for each system, there is a shared inventory for all systems, and each

inventory operates according to a base-stock policy. First spare parts from the shared

inventory are expended, and only when they are depleted, are the reserved inventories

used. The dispatching decision comes into play when the shared inventory is empty, and

some reserved inventories are below their base-stock levels, or some systems do not have


all of their components functional. When this is the case, the repair shop has pending

repair orders from systems missing functional components or spares in their reserved

inventories. We study the HF policy in Section 3.1. In this policy, the repair shop

dispatches the repaired components in an FCFS manner among systems with pending

orders. Under the HP policy, studied in Section 3.2, the repaired component is used to

serve the highest priority system among those with pending orders. In Section 3.3, we

present numerical results comparing the relative performance of these policies.

3.1 The Hybrid FCFS (HF) Policy

In this section, we analyze the model in which a shared inventory of S > 0 spare parts is

kept for all systems in addition to an (emergency) reserved inventory of Si ≥ 0 spare parts

for each system i, i = 1, ..., m. When a component fails, it is sent to the repair shop.

If there is positive stock in the shared inventory, a spare part is installed. If the shared

inventory happens to be empty but the reserved inventory level is positive, a spare part

from the latter is used. Otherwise, system i lacks one more component until a repaired

one can be sent from the repair shop. When ki−1 functional components remain, system

i fails and no more component failures can be observed until a repaired component can

be sent from the repair shop on an FCFS basis.

Let O(t) be the number of components in the repair shop at time t. If O(t) ≤ S,

the shared inventory level is I(t) = S − O(t) spare parts. All reserved inventories are at

their respective base-stock levels Si, and ni components are functional in each system.

Therefore, whenever a component is repaired, it is placed in the shared inventory, raising

its level by 1.

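We assume w.l.o.g. that O(0) = 0. Letting $\varsigma^D_0 = 0$, we define the following stopping times,

\[
\begin{aligned}
\varsigma^U_m &= \inf\{t : O(t) = S \mid t > \varsigma^D_{m-1}\}, \\
\varsigma^D_m &= \inf\{t : O(t) = S - 1 \mid t > \varsigma^U_m\}.
\end{aligned}
\tag{3.1}
\]

Figure 3.2: A sample path of the HF model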

In other words, ςUm is a failure instant (equivalently, an arrival instant) of a component

(at the repair shop) when the shared inventory level decreases from 1 to 0, and ςDm is a

repair completion instant when the shared inventory level increases from 0 to 1 for the

mth time since time 0. Thus, $D = \bigcup_{m=1}^{\infty} [\varsigma^U_m, \varsigma^D_m)$ is the time period during which each additional component to fail in system i generates a type i repair order (see Figure 3.2

for a realization of the sample path). Let Oi(t) be the number of type i orders at time

t. If Oi(t) ≤ Si, the reserved inventory level for system i is Ii(t) = Si − Oi(t), all ni

components are operational, and if Si < Oi(t) ≤ ni+Si−ki+1, system i lacks Oi(t)−Si

components. When a repair is done, the component is sent to the fleet with the longest

standing order. Let p(k) := P (O = k) (piHF (k) := P (Oi = k)) be the steady-state

probability of having k components (k type i orders) in the repair shop, and pD be the

proportion of time the HF model is in D.

Consider a separate model, the reserved inventory-FCFS (RIF) model, with exactly

the same parameters (e.g., systems served, failure and repair rates, base-stock levels of


the reserved inventories) but no shared inventory (S = 0). Obviously the HF model

during D is probabilistically identical to the RIF model. If we can obtain the steady-

state probability of having k type i orders in the repair shop in the RIF model, denoted

by pi(k), then, the steady-state probability of having k type i orders in the HF model is

simply

piHF (k) = pDpi(k). (3.2)

Therefore, the analysis of the RIF model is necessary for the HF model. In the next

section, we derive the steady-state distribution of the number of orders of each type in

the repair shop of the RIF model. Then we obtain pD (and p(k)) in the HF model in

Section 3.1.2. With them, Eq. (3.2) gives piHF (k), which together with p(k), provides

the steady-state distribution of the number of repair orders in the HF system.

3.1.1 Obtaining pi(k) in the RIF model

In order to obtain pi(k), we start by characterizing the state of the RIF model at an

arbitrary time by the vector (x1, x2, . . . , xn) stating that there are a total of n repair

orders where xj = i means that the jth order from the end of the repair queue is type

i, i.e., originating from system i. Hence, xn and x1 are the first and last repair orders

in the system, respectively, xn being the one under repair. Whenever it is not necessary

to present the entries of the state vector, we will use ωn as a shorthand notation for

(x1, x2, . . . , xn).

We define $Y_i(\omega_n) = \sum_{j=1}^{n} I(x_j = i)$ as the number of type i orders, where I(E) is

the indicator function which equals 1 if event E is true and 0 otherwise. Accordingly,

Yi(ωn) ∈ {0, 1, . . . , ni + Si − ki + 1}. Observe that the failure rate from fleet i depends

only on Yi(ωn) (equivalently, the number of components functional in system i). We

define this failure rate Λi(ωn) as


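\[
\Lambda_i(\omega_n) =
\begin{cases}
n_i\lambda_i, & \text{if } 0 \le Y_i(\omega_n) \le S_i, \\
(n_i + S_i - Y_i(\omega_n))\lambda_i, & \text{if } S_i < Y_i(\omega_n) \le n_i + S_i - k_i + 1, \\
0, & \text{otherwise}
\end{cases}
= (n_i + S_i - \max\{Y_i(\omega_n), S_i\})\lambda_i. \tag{3.3}
\]

Adding the failure rates from all systems, $\Lambda(\omega_n) = \sum_{i=1}^{m} \Lambda_i(\omega_n)$, gives us the state-dependent arrival rate of repair orders at the single server queue.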
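As a minimal sketch of Eq. (3.3), the function below returns the failure rate of one system given its number of outstanding repair orders. The function and parameter names are illustrative and do not appear in the thesis. The sketch assumes, following the model description, that a system that is down (only ki − 1 functional components) generates no further failures, so the rate is set to zero at the largest feasible order count.

```python
def failure_rate(y, n, s, k, lam):
    """Failure rate of one k-out-of-n:G system with y outstanding orders.

    n: components in the system, s: reserved base-stock level,
    k: components needed for the system to work, lam: per-component rate.
    All names are illustrative, not taken from the thesis.
    """
    max_orders = n + s - k + 1          # at this count the system is down
    if y < 0 or y > max_orders:
        raise ValueError("infeasible number of orders")
    if y == max_orders:
        # Assumption from the model description: a down system does not fail.
        return 0.0
    # While spares last (y <= s) all n components run; afterwards n + s - y do.
    return (n + s - max(y, s)) * lam
```

For example, `failure_rate(0, 100, 2, 90, 0.009)` evaluates the rate with all 100 components running, and the rate drops by `lam` for each order beyond the reserved stock.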

Let pn(ωn) be the steady-state probability of being in state ωn. Next, we relate

probabilities of interest to one another. Let p0(0) denote the probability that the repair

shop is idle and ni components are running and Si spare parts are available in the

inventory for each system i. We can write the global balance equations:

\[ -p_0(0)\Lambda + [p_1(1) + \cdots + p_1(m)]\mu = 0, \tag{3.4} \]

where $\Lambda = \sum_{i=1}^{m} n_i\lambda_i$, and, for a feasible $x_1 \in \{1, \ldots, m\}$ and setting $N = \sum_{i=1}^{m} (n_i + S_i - k_i + 1)$ (the maximum number of components that can be in the repair shop),

\[
\begin{aligned}
& p_{n-1}(x_2, \ldots, x_n)\Lambda_{x_1}(x_2, \ldots, x_n) - p_n(x_1, x_2, \ldots, x_n)[\mu + \Lambda(x_1, x_2, \ldots, x_n)] \\
& \quad + [p_{n+1}(x_1, x_2, \ldots, x_n, 1) + \cdots + p_{n+1}(x_1, x_2, \ldots, x_n, m)]\mu = 0, \qquad 1 < n \le N - 1,
\end{aligned}
\tag{3.5}
\]

and finally,

\[ p_{N-1}(x_2, \ldots, x_N)\Lambda_{x_1}(x_2, \ldots, x_N) - p_N(x_1, x_2, \ldots, x_N)[\mu + \Lambda(x_1, x_2, \ldots, x_N)] = 0. \tag{3.6} \]

Note that some states might be infeasible, in which case, the corresponding probabilities

in Eqs. (3.5-3.6) are zero.

We assume that $p_N(\omega_N) = p_N(\omega'_N) = p_N > 0$ for any $\omega_N$ and $\omega'_N$. We obtain

the limiting probabilities expressing them in terms of pN and using a normalization


constraint. Since the underlying Markov chain is irreducible and has a finite number of

states, it is necessarily positive recurrent. Therefore, the limiting distribution is unique

(Bhat and Miller, 2002, p. 222) implying that our solution is the only possible one.

To obtain the limiting probabilities, we start from Eq. (3.6), where $p_N(x_1, x_2, \ldots, x_N) = p_N$. Note that $\Lambda(x_1, \ldots, x_N) = 0$ because in this state, all systems are down (each system i has only $k_i - 1$ components functional), and no more components can fail. For $x_1 = i \in \{1, \ldots, m\}$, in the state $(x_2, \ldots, x_N)$ in Eq. (3.6), system i is the only available system; it fails if one more component in it fails, which happens with rate $\Lambda_{x_1}(x_2, \ldots, x_N) = k_i\lambda_i$. This simply gives $p_{N-1}(x_2, \ldots, x_N) = \mu p_N/(k_i\lambda_i)$. Furthermore, if we fix $x_1 = i$ for some i, $x_1$ with each ordering of $x_2, \ldots, x_N$ is a different $\omega_N$, and the probability of being in any one of these states, according to our assumption, is the same, $p_N$. For each such ordering $\omega_N$, dropping $x_1 = i$ yields the $\omega_{N-1}$ that satisfies Eq. (3.6), and $p_{N-1}(\omega_{N-1}) = \mu p_N/(k_i\lambda_i)$. Thus, $\omega_{N-1}$ is a generic state with $k_i$ ($k_j - 1$) functional components in system i (each system $j \neq i$), and the probability of being in that state does not depend on the sequence of repair orders but on system i being the only available system.

Next, after rearranging Eq. (3.5) and substituting N − 1 for n, we obtain

pN−2(x2, . . . , xN−1)Λx1(x2, . . . , xN−1) = µpN−1(x1, . . . , xN−1)

+pN−1(x1, . . . , xN−1)Λ(x1, . . . , xN−1)

−µpN (x1, . . . , xN−1, xN). (3.7)

Observe that in state $(x_1, \ldots, x_{N-1})$, there is only one system (say system i) available which can fail. This forces $\Lambda(x_1, \ldots, x_{N-1}) = k_i\lambda_i$ and $x_N = i$ in $p_N(x_1, \ldots, x_{N-1}, x_N)$. Given our assumption, we have $p_N(x_1, \ldots, x_{N-1}, i) = p_N(i, x_1, \ldots, x_{N-1})$. In the earlier discussion on making use of Eq. (3.6), we showed that $k_i\lambda_i p_{N-1}(x_1, \ldots, x_{N-1}) = \mu p_N(i, x_1, \ldots, x_{N-1})$, which implies $k_i\lambda_i p_{N-1}(x_1, \ldots, x_{N-1}) = \mu p_N(x_1, \ldots, x_{N-1}, i)$. This makes the last two terms on the RHS of Eq. (3.7) cancel out. As previously, if we fix

x1 = j, x1 with each ordering of x2, . . . , xN−1 is a different ωN−1, and the probability of


being in any one of these states has been shown to be the same, $\mu p_N/(k_i\lambda_i)$. For each such ordering $\omega_{N-1}$, dropping $x_1 = j$ gives the $\omega_{N-2}$ satisfying Eq. (3.7). If $i \neq j$, $\omega_{N-2}$ implies two available systems i and j with $k_i$ and $k_j$ functional components, respectively, and if i = j, this implies only one available system i with $k_i + 1$ functional components. In either case, Eq. (3.7) is satisfied as $p_{N-2}(\omega_{N-2})\Lambda_j(\omega_{N-2}) = \mu p_{N-1}(\omega_{N-1}) = \mu^2 p_N/(k_i\lambda_i)$.

This again brings us to the conclusion that pN−2(ωN−2) (just like Λj(ωN−2)) is indepen-

dent of how different types of repair orders are sequenced; it depends only on how many

repair orders for each type exist (equivalently, how many functional components are

available for which systems).

For the general case (1 ≤ n ≤ N −3), we are going to prove by induction that pn(ωn)

depends only on the number of repair orders from each system. If we assume this to hold

for pn+2(ωn+2) and pn+1(ωn+1), then pn+2(x1, x2, . . . , xn+1, i) = pn+2(i, x1, x2, . . . , xn+1)

and pn+1(x1, x2, . . . , xn+1)Λi(x1, . . . , xn+1)= µpn+2(i, x1, x2, . . . , xn+1). Summing up, over

all i, we have

\[ p_{n+1}(x_1, x_2, \ldots, x_{n+1})\Lambda(x_1, \ldots, x_{n+1}) = \mu \sum_{i=1}^{m} p_{n+2}(x_1, x_2, \ldots, x_{n+1}, i). \]

Using this equality above in Eq. (3.5) results in cancelations, and we arrive at

pn(x2, . . . , xn+1)Λx1(x2, . . . , xn+1) = µpn+1(x1, . . . , xn+1).

In other words, given that the state ωn has one less failed component of type x1 than

the state ωn+1, we have established pn(ωn)Λx1(ωn) = µpn+1(ωn+1). Thus, pn(ωn) also

depends only on the number of failed components from each system, not on how they

are ordered in the repair shop queue.

Let us consider a state $\omega^j_n$ with n repair orders, with $y_1$ of type 1, $y_2$ of type 2, ..., and $y_m$ of type m at the repair shop, i.e., $y_1 = Y_1(\omega^j_n), \ldots, y_m = Y_m(\omega^j_n)$ and $y_1 + y_2 + \cdots + y_m = n$. We use a superscript j because there are K such states differing from one another due to the ordering of repair orders, and $p_n(\omega^j_n) = q(y_1, \ldots, y_m)$ is the same for all j, j = 1, ..., K, where

\[ K = \binom{n}{y_1, \ldots, y_m}. \]

Recall that $\Lambda_i(\omega^j_n)$ only depends on $y_i$, making $\Lambda(\omega^j_n) = \sum_{i=1}^{m} \Lambda_i(\omega^j_n)$ dependent only

on (y1, . . . , ym). Then in the remainder of the discussion, we can make use of Λi(yi), the

failure rate of system i when there are yi type i orders in the repair shop. Our discussion

so far has shown that q(y1, . . . , yi, . . . , ym)Λi(yi) = µq(y1, . . . , yi + 1, . . . , ym). Summing

up over all i, we obtain

\[
q(y_1, \ldots, y_m) = \frac{\mu}{\Lambda(y_1, \ldots, y_m)} \bigl( q(y_1 + 1, y_2, \ldots, y_m) + q(y_1, y_2 + 1, \ldots, y_m) + \cdots + q(y_1, y_2, \ldots, y_m + 1) \bigr), \tag{3.8}
\]

where, due to Eq. (3.3), $\Lambda(y_1, \ldots, y_m) = \sum_{i=1}^{m} \Lambda_i(y_i) = \sum_{i=1}^{m} (n_i + S_i - \max\{y_i, S_i\})\lambda_i$ is the total failure rate given $(y_1, \ldots, y_m)$. However, we are interested in the probability of being in any one of these K states with $y_1$ orders of type 1, $y_2$ orders of type 2, ..., and $y_m$ of type m. We denote this probability by $p(y_1, y_2, \ldots, y_m)$:

\[
p(y_1, y_2, \ldots, y_m) = \sum_{j=1}^{K} p_n(\omega^j_n) = K q(y_1, \ldots, y_m) = q(y_1, \ldots, y_m) \binom{n}{y_1, y_2, \ldots, y_m}.
\]

If we multiply both sides of Eq. (3.8) by K, we arrive at

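\[
\begin{aligned}
p(y_1, y_2, \ldots, y_m) = \frac{\mu}{(n + 1)\Lambda(y_1, \ldots, y_m)} \bigl\{ & (y_1 + 1)\, p(y_1 + 1, y_2, \ldots, y_m) \\
& + (y_2 + 1)\, p(y_1, y_2 + 1, \ldots, y_m) + \cdots \\
& + (y_m + 1)\, p(y_1, y_2, \ldots, y_m + 1) \bigr\}.
\end{aligned}
\tag{3.9}
\]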

Using Eq. (3.9), we can express all p(y1, y2, . . . , ym) in terms of pN, which is the

probability of having N repair orders in the system. After employing the normalization constraint,

\[ \sum_{n=0}^{N} \; \sum_{y_1 + \cdots + y_m = n} p(y_1, y_2, \ldots, y_m) = 1, \]

we obtain $p_N$ and all $p(y_1, y_2, \ldots, y_m)$, where $y_i \in \{0, 1, \ldots, n_i + S_i - k_i + 1\}$ for i = 1, ..., m. Then, the steady-state probability of having k type i orders is

\[
p_i(k) = \sum_{\substack{y_1, \ldots, y_m:\ y_i = k, \\ 0 \le y_j \le n_j + S_j - k_j + 1,\ j \neq i}} p(y_1, y_2, \ldots, y_m). \tag{3.10}
\]
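The computation described above can be sketched numerically as follows. Rather than expressing every probability in terms of pN, this sketch starts from an unnormalized weight of 1 for the empty shop and applies the relation q(y1, ..., yi + 1, ..., ym) = q(y1, ..., ym)Λi(yi)/µ, then multiplies by the number of orderings K and normalizes; because the limiting distribution is unique, this should give the same probabilities. All names (`rate`, `rif_distribution`, `marginal`) are illustrative, and the failure rate is assumed to be zero once a system is down.

```python
from itertools import product
from math import factorial

def rate(y, n, s, k, lam):
    """Failure rate with y outstanding orders (assumed zero once the system is down)."""
    return 0.0 if y >= n + s - k + 1 else (n + s - max(y, s)) * lam

def rif_distribution(systems, mu):
    """Steady-state p(y1, ..., ym) of the RIF model.

    systems: list of (n, S, k, lam) tuples, one per system (illustrative layout).
    Returns a dict mapping each tuple (y1, ..., ym) to its probability.
    """
    caps = [n + s - k + 1 for (n, s, k, lam) in systems]
    weights = {}
    for y in product(*(range(c + 1) for c in caps)):
        # q(y): product of Lambda_i(j) / mu over the orders of each type.
        q = 1.0
        for yi, (n, s, k, lam) in zip(y, systems):
            for j in range(yi):
                q *= rate(j, n, s, k, lam) / mu
        # K: multinomial coefficient counting the orderings in the queue.
        orderings = factorial(sum(y))
        for yi in y:
            orderings //= factorial(yi)
        weights[y] = orderings * q
    total = sum(weights.values())          # normalization constraint
    return {y: w / total for y, w in weights.items()}

def marginal(dist, i, k):
    """p_i(k) as in Eq. (3.10): probability of k orders of type i (0-based i)."""
    return sum(p for y, p in dist.items() if y[i] == k)
```

The state space has one entry per feasible (y1, ..., ym), so the enumeration stays small for the two-system examples considered later but grows quickly with m.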

3.1.2 Obtaining pD in the HF model

To obtain pD, we consider the system when the number of orders is less than or equal

to S. The system behaves as a birth-and-death process with the following local balance

equations

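\[
\begin{aligned}
\Lambda p(k) &= \mu p(k + 1), \quad k = 0, \ldots, S - 2, \\
\Lambda p(S - 1) &= \mu p(S) = \mu p_D\, p_0(0),
\end{aligned}
\]

where, as before, $\Lambda = \sum_{i=1}^{m} n_i\lambda_i$, and $p_0(0)$ is found from Eq. (3.4). After expressing all p(k) in terms of $p_D\, p_0(0)$ as

\[ p(k) = r^{S-k} p_D\, p_0(0), \quad k = 0, \ldots, S, \tag{3.11} \]

where $r = \mu/\Lambda$, using $\sum_{k=0}^{S-1} p(k) = 1 - p_D$, we obtain

\[ p_D = \frac{1}{1 + p_0(0) \sum_{k=0}^{S-1} r^{S-k}} = \frac{1}{1 + p_0(0) \sum_{k=1}^{S} r^{k}}. \tag{3.12} \]

With $p_D$ in Eq. (3.12) and $p_i(k)$ in Eq. (3.10), we can compute $p_{iHF}(k)$ using Eq. (3.2). Note that all these probabilities would change with any change in S or in any $S_i$, i = 1, ..., m.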
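A short numerical sketch of this step is given below. It computes pD directly from Eq. (3.11) together with the condition that p(0), ..., p(S − 1) sum to 1 − pD, and then returns p(k) for k = 0, ..., S. The function names are illustrative, and `p0_rif` stands for p0(0) of the RIF model, which is assumed to be available from the analysis of Section 3.1.1.

```python
def hf_shared_part(S, total_rate, mu, p0_rif):
    """Return (p_D, [p(0), ..., p(S)]) for the HF model.

    S: shared base-stock level, total_rate: Lambda = sum of n_i * lambda_i,
    mu: repair rate, p0_rif: p_0(0) of the RIF model (assumed given).
    """
    r = mu / total_rate
    # Sum of r**(S - k) over k = 0..S-1, from Eq. (3.11) and sum p(k) = 1 - p_D.
    geometric = sum(r ** (S - k) for k in range(S))
    p_d = 1.0 / (1.0 + p0_rif * geometric)
    p = [r ** (S - k) * p_d * p0_rif for k in range(S + 1)]
    return p_d, p

def hf_order_probabilities(p_d, p_i_rif):
    """Eq. (3.2): scale the RIF probabilities p_i(k) by p_D."""
    return [p_d * value for value in p_i_rif]
```

Calling `hf_shared_part(3, 1.8, 2.0, 0.25)` returns a pair whose first entry plus the first three entries of the list add up to one, which is a quick consistency check on the inputs.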

3.2 The Hybrid Priority (HP) Model

The HP model is similar to the HF model except for the dispatching policy employed

during the period D (defined by making use of the stopping times given in Eq. (3.1))

when O(t) > S, and Oi(t) > 0 for some i. While in the HF model, the repaired compo-

nent is sent to the system with the longest awaiting order, in the HP model, it is sent


to the highest-priority system among those with outstanding orders. We assume that

systems/classes 1 to m are prioritized from highest to lowest.

To analyze this system, we consider a separate model, called the reserved inventory-

priority (RIP) model, with exactly the same parameters as the HP model but with no

shared inventory (S = 0). The HP model during D is probabilistically identical to the

RIP model. We obtain pi(k)’s of the RIP model in Section 3.2.1, and with p0(0) in Eqs. (3.11) and (3.12), we find pD and p(k) in the HP model. By substituting these in Eq. (3.2), we

obtain the steady-state probability of having k type i orders in the HP model, piHP (k).

3.2.1 Obtaining pi(k) in the RIP model

We use a matrix approach similar to Bitran and Caldentey (2002) who obtain pi(k)’s for a

two-class preemptive-priority system with state-dependent Poisson arrival rates possibly

with class specific exponential service times. First, we adjust Bitran and Caldentey’s

solution to our problem.

As in Section 3.1.1, let p(y1, y2) be the steady-state probability of having y1 orders

from higher priority class 1 and y2 orders from class 2 in a two-class system. Similarly,

Λi(yi) is the failure rate for class i for i = 1, 2 given that there are yi orders.

Let Mi = ni + Si − ki + 1, for i = 1, 2. Since for class 1 customers,

Λ1(k)p1(k) = µp1(k + 1), for k = 0, · · · ,M1 − 1,

holds, the sequence π0 = 1 and πk = (Λ1(k − 1)/µ)πk−1 can be defined such that

\[ p_1(k) = \frac{\pi_k}{\sum_{j=0}^{M_1} \pi_j}, \quad k = 0, \ldots, M_1. \tag{3.13} \]

For class 2, their algorithm is more complex: For a given k for the number of class 2

orders, we define


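\[
A_k =
\begin{pmatrix}
a_{0,k} & -\mu & & & & 0 \\
-\Lambda_1(0) & a_{1,k} & -\mu & & & \\
 & -\Lambda_1(1) & a_{2,k} & -\mu & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & -\Lambda_1(M_1 - 2) & a_{M_1-1,k} & -\mu \\
0 & & & & -\Lambda_1(M_1 - 1) & a_{M_1,k}
\end{pmatrix},
\qquad
R_k =
\begin{pmatrix}
p(0, k) \\ p(1, k) \\ \vdots \\ p(M_1, k)
\end{pmatrix},
\]

where $a_{y_1,k}$ is given by

\[
a_{y_1,k} =
\begin{cases}
n_1\lambda_1 + n_2\lambda_2, & y_1 = k = 0, \\
n_1\lambda_1 + \Lambda_2(k) + \mu, & y_1 = 0,\ k > 0, \\
\Lambda_2(k) + \mu, & y_1 = M_1,\ k \ge 0, \\
\Lambda_1(y_1) + \Lambda_2(k) + \mu, & \text{otherwise.}
\end{cases}
\]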

Additionally, $(M_1 + 1) \times (M_1 + 1)$ matrices $B_k = A_k - \Lambda_2(k)\, e_1 e^T$ are defined, where $e_1$ is an $(M_1 + 1) \times 1$ vector with 1 as its first entry and 0's for the rest, and $e^T$ is the transpose of an $(M_1 + 1) \times 1$ vector of 1's. Except for $B_0$, which has rank $M_1$, all matrices $B_k$ have full rank. Using P, the right eigenvector of $B_0$ associated with eigenvalue 0, Bitran and Caldentey define another sequence of vectors $C_0 = P$ and $C_k = \Lambda_2(k - 1) B_k^{-1} C_{k-1}$, $k = 1, \ldots, M_2$.

Then,

\[ p_2(k) = \frac{e^T C_k}{\sum_{j=0}^{M_2} e^T C_j}, \quad k = 0, \ldots, M_2, \tag{3.14} \]

\[ R_k = \frac{C_k}{\sum_{j=0}^{M_2} e^T C_j}, \quad k = 0, \ldots, M_2. \]
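The two-class recursion can be sketched in plain Python as below. This is an illustrative reading of the procedure, not Bitran and Caldentey's own implementation: it reads the recursion as C_k = Λ2(k − 1) B_k^{-1} C_{k−1}, obtains P by replacing one (redundant) equation of B_0 P = 0 with the condition that the entries of P sum to 1, and uses a small Gaussian-elimination helper instead of a linear algebra library. All function names are hypothetical, each system is passed as a tuple (n, S, k, lam), and the failure rate is assumed to be zero once a system is down.

```python
def rate(y, n, s, k, lam):
    """Failure rate with y outstanding orders (assumed zero once the system is down)."""
    return 0.0 if y >= n + s - k + 1 else (n + s - max(y, s)) * lam

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            f = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * size
    for r in reversed(range(size)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (aug[r][size] - tail) / aug[r][r]
    return x

def b_matrix(k, sys1, sys2, mu):
    """Build B_k = A_k - Lambda_2(k) e_1 e^T for k class 2 orders."""
    m1 = sys1[0] + sys1[1] - sys1[2] + 1
    lam2 = rate(k, *sys2)
    rows = []
    for y in range(m1 + 1):
        row = [0.0] * (m1 + 1)
        # Diagonal a_{y,k}: total rate out of state (y, k); no service if empty.
        row[y] = rate(y, *sys1) + lam2 + (mu if y + k > 0 else 0.0)
        if y > 0:
            row[y - 1] = -rate(y - 1, *sys1)
        if y < m1:
            row[y + 1] = -mu
        rows.append(row)
    rows[0] = [v - lam2 for v in rows[0]]
    return rows

def rip_two_class(sys1, sys2, mu):
    """Return R with R[k][y1] = p(y1, k); class 1 has priority."""
    m1 = sys1[0] + sys1[1] - sys1[2] + 1
    m2 = sys2[0] + sys2[1] - sys2[2] + 1
    b0 = b_matrix(0, sys1, sys2, mu)
    b0[-1] = [1.0] * (m1 + 1)        # swap one equation for sum(P) = 1
    c = [solve(b0, [0.0] * m1 + [1.0])]
    for k in range(1, m2 + 1):
        rhs = [rate(k - 1, *sys2) * v for v in c[-1]]
        c.append(solve(b_matrix(k, sys1, sys2, mu), rhs))
    total = sum(sum(ck) for ck in c)
    return [[v / total for v in ck] for ck in c]
```

Summing `R[k]` over its entries gives p2(k) as in Eq. (3.14), and summing `R[k][y1]` over k gives the class 1 marginal, which can be compared with Eq. (3.13) as a sanity check.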

We now extend the result to m classes. We compute pi(k)’s in a recursive manner by

adding one new class at a time. Each time a new class is added, we use the two-priority

class model. With Mi = ni + Si − ki + 1, the following Theorem states how pm(k) can

be found, given pi(k) for i = 1, . . . , m− 1:


Theorem 3.1 Given $p_i(k)$ for $i = 1, \ldots, m - 1$, $p_m(k)$ (m > 2) is equal to $p_2(k)$ in Eq. (3.14) of a two-class RIP system with $n_1 = k_1 = 1$, $S_1 = \infty$, $\lambda_1 = \sum_{i=1}^{m-1} \Lambda_i$, where $\Lambda_i = \sum_{k=0}^{M_i} \Lambda_i(k) p_i(k)$, and $n_2 = n_m$, $k_2 = k_m$, $S_2 = S_m$, $\lambda_2 = \lambda_m$.

Proof. We obtain $p_i(k)$'s for i = 1, 2 according to the two-class priority model. Assume that $p_i(k)$'s for $i = 3, \ldots, m - 1$ have been found using the method described in Theorem 3.1. At time t, given that there are k repair orders from class i, the probability of a failure in the next ∆t time units is $\Lambda_i(k)\Delta t$. If we remove the condition on the number of repair orders at time t, $\Lambda_i \Delta t = \sum_{k=0}^{M_i} \Lambda_i(k) p_i(k) \Delta t$ is the probability of a failure in system i in the next ∆t time units. Then, $\Lambda_i$ is the average failure rate from system i; it is also the effective arrival rate of components from system i at the repair shop. Whether or not the arrival processes of components from different fleets are independent of each other, $\sum_{i=1}^{m-1} \Lambda_i$ is the total failure rate of the classes $1, \ldots, m - 1$, which from the point of view of system/class m is a single high-priority class. Additionally, class m perceives a constant failure rate, $\sum_{i=1}^{m-1} \Lambda_i$, for the single high-priority class while it is itself experiencing a state-dependent failure rate (for systems with finite and infinite population interactions, see, e.g., Boxma, 1986 and Kaufman, 1984). Then, we can use an equivalent system, i.e., the RIP system with two priority classes such that $n_1 = k_1 = 1$, $\lambda_1 = \sum_{i=1}^{m-1} \Lambda_i$ and $S_1 = \infty$ (or $S_1 = M$ where M is a large integer to guarantee that there is always one component functional in system 1, and the failure rate is always $\lambda_1$), and $n_2 = n_m$, $k_2 = k_m$, $S_2 = S_m$, $\lambda_2 = \lambda_m$. In this case, $p_2(k)$ of this equivalent RIP system gives $p_m(k)$.

3.3 Numerical Experiment

In previous sections, we have analyzed the HF and HP policies. However, we have not

addressed two important questions. (i) In both policies, we assumed a centralized repair

shop. Alternatively, we could consider a separate repair shop and a separate inventory


for each system, which we will call the Base Case (BC) model. Although the benefit of

server capacity pooling is well-known in the literature of production/inventory systems

(see Yu, Benjaafar, and Gerchak, 2009 and the references therein), what is the benefit, if

any, of repair shop pooling (plus sharing some spare part inventory) in our problem? In

other words, is it worth considering the more complex HF and HP policies if they do not

lead to significant cost reductions? (ii) And more importantly, how do the HF and HP

systems perform with respect to one another? Can we have general insight into when to

use one policy instead of the other?

In order to investigate these questions, we have designed a series of numerical experi-

ments involving two ki-out-of-ni:G systems. These experiments are set up as optimization

problems in which the minimization of the total capital cost tied up in keeping spares is

the objective function (e.g., Louit et al., 2011) and the steady-state availability of each

fleet i has to meet a minimum target level Ai. Let AP,i(S, SI , SII) denote the steady-state

availability of system i under policy P which is

AP,i(S, SI , SII) = 1− piP (ni + Si − ki + 1), i = I, II.

Note that under the HP policy, the high-priority system/class 1 could be either system

I or II. Also observe that piP (ni + Si − ki + 1) is a function of S, SI , and SII , as well as

the policy, and is found from Eq. (3.2) for the HF policy (and for the HP policy, with

adjustments as explained at the beginning of Section 3.2). The optimization problem for

HF and HP policies becomes

min hS + hISI + hIISII ,

subject to

AP,i(S, SI , SII) ≥ Ai, i = I, II,

where h, hI , hII are the holding cost rates due to the capital cost tied up in keeping spares

in shared and reserved inventories, respectively.


Modeling optimization problems of this type is common in the literature (Sasaki,

Kaburaki, and Yanagi, 1977, Yanagi, Sasaki, and Umazume, 1981). A common technique

to solve these models is presented by Lawler and Bell (1966). Two other methods are

dynamic programming (e.g., Messinger and Shooman, 1970) and incremental reliability

(e.g., Barlow and Proschan, 1965). One necessary condition to apply these methods is

that AP,i(S, SI , SII) must be monotone non-decreasing in each of the variables S, SI ,

and SII . However, while AP,i(S, SI , SII) is non-decreasing in Si, it can be shown to be

non-increasing in Sj (j ≠ i): Adding a unit of spare to the reserved inventory of system

i may only improve its availability, and the proportion of time that system i is down

may shorten. Hence, the average failure rate of this system increases, resulting in higher

utilization of the repair shop, which, in turn, lowers the availability of the other fleet.

This fact can be used to skip some sub-optimal solutions in an exhaustive search to

find the optimal solution. For instance, let $S^{L}_{II}$ and $S^{L+}_{II}$ denote the lowest values of $S_{II}$ that satisfy the availability constraints together with the set $\{S, S_I\}$ and $\{S, S_I + 1\}$, respectively. Then, $S^{L+}_{II} \ge S^{L}_{II}$, and therefore, $\{S, S_I + 1, S^{L+}_{II}\}$ cannot be the optimal

solution. Note that this approach can be easily extended to problems involving more

than two k-out-of-n:G systems.
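A generic exhaustive search of this kind can be sketched as follows. The `availability` argument is a hypothetical callable that returns the pair of steady-state availabilities for given (S, SI, SII); in practice it would be supplied by the analysis of Sections 3.1 and 3.2. The upper bound `s_max` on each base-stock level is also an assumption of the sketch. For simplicity, the sketch only skips candidates that cannot beat the best cost found so far; it does not implement the monotonicity-based skipping described above.

```python
from itertools import product

def cheapest_feasible(availability, targets, holding, s_max):
    """Search base-stock levels (S, S_I, S_II) in 0..s_max for the lowest cost.

    availability: hypothetical callable (S, S_I, S_II) -> (A_I, A_II).
    targets: (A_I, A_II) minimum availabilities.
    holding: (h, h_I, h_II) holding cost rates.
    Returns (cost, (S, S_I, S_II)) or None if no candidate is feasible.
    """
    best = None
    for levels in product(range(s_max + 1), repeat=3):
        cost = sum(h * s for h, s in zip(holding, levels))
        if best is not None and cost >= best[0]:
            continue                      # cannot improve on the incumbent
        achieved = availability(*levels)
        if all(a >= t for a, t in zip(achieved, targets)):
            best = (cost, levels)
    return best

# Toy stand-in for the model's availability function (illustration only).
toy = lambda s, s1, s2: (1 - 0.5 ** (s + s1 + 1), 1 - 0.5 ** (s + s2 + 1))
result = cheapest_feasible(toy, (0.9, 0.7), (1, 1, 1), 4)
```

Because feasibility is checked only for candidates cheaper than the incumbent, the expensive availability evaluation is avoided for most of the grid once a good solution has been found.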

The BC model can be analyzed more easily by constructing a separate birth-and-

death process for each system i. With a ki-out-of-ni:G system comprising components,

each having λi failure rate, and µi repair rate, for a given Si, the steady-state system

size distribution at the repair shop for system i can be computed by solving the global

balance equations of the underlying birth-and-death process. In this analysis, the cost

is $h_i S_i$. By searching over the base-stock level, we can determine the optimal $S^*_i$ and the optimal cost $C_i(S^*_i)$. The optimal cost of the BC system is $C^*_{BC} = C_I(S^*_I) + C_{II}(S^*_{II})$.
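The BC computation for a single system can be sketched as below: the birth-and-death weights are built from the state-dependent failure rate, the availability is taken as one minus the probability of the down state, and the base-stock level is increased until the target is met. Function names and the search bound `s_max` are illustrative choices, not part of the thesis.

```python
def bc_availability(n, s, k, lam, mu):
    """Steady-state availability of one k-out-of-n:G system with its own shop.

    n, k: system size and required components, s: base-stock level,
    lam: component failure rate, mu: repair rate of the dedicated shop.
    """
    cap = n + s - k + 1                    # number of orders at which the system is down
    weights = [1.0]                        # unnormalized p(0)
    for y in range(cap):
        birth = (n + s - max(y, s)) * lam  # failure rate with y orders in the shop
        weights.append(weights[-1] * birth / mu)
    return 1.0 - weights[-1] / sum(weights)

def bc_optimal_stock(n, k, lam, mu, target, s_max=200):
    """Smallest base-stock level meeting the target availability, or None."""
    for s in range(s_max + 1):
        if bc_availability(n, s, k, lam, mu) >= target:
            return s
    return None
```

Since the cost hiSi increases with Si, the first level that meets the target is also the cheapest one, so a simple upward scan is enough here.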


3.3.1 The Summary of the Numerical Results

This section provides a brief summary of the numerical examples in an attempt to answer

the two questions raised at the start of Section 3.3. In all numerical examples discussed

in this and subsequent sections, system I is a 90-out-of-100:G system (kI = 90 and

nI = 100) with AI = 0.999, λI = 0.009. The holding cost rates of all inventories are

set to 1. By varying a certain parameter of system II, we have generated four sets of

examples, each consisting of 600 examples, to be discussed in more detail in Section 3.3.2.

In the HF and HP policies with a pooled repair shop, we set µ = µI + µII .

To answer Question (i) in addressing the benefit of using HF and HP policies instead of

the BC model, for each of the 2,100 examples (300 out of 2,400 were repeating examples),

we computed

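\[
\Delta^{HF}_{BC} \equiv \frac{C^*_{BC} - C^*_{HF}}{C^*_{BC}}, \qquad
\Delta^{HP}_{BC} \equiv \frac{C^*_{BC} - C^*_{HP}}{C^*_{BC}},
\]

in which $C^*_{HF}$ and $C^*_{HP}$ are the optimal costs of the system under the HF and HP policies, respectively.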

These ratios measure the cost decrease due to repair shop pooling plus a shared inventory

(if non-zero) under the optimal HF and HP policies with respect to the optimal BC system

cost.

Table 3.1: The minimum, mean, median and maximum values of cost reduction of the hybrid policies compared to the BC model.

              Min (%)   Mean (%)   Median (%)   Max (%)
∆HF_BC           42        78          79         100
∆HP_BC           41        87          95         100

In Table 3.1, we see remarkable cost savings under each policy with a centralized

repair shop. This clearly justifies using the more complex HF or HP policies instead of

the BC model.


As an initial attempt to answer Question (ii), in order to measure the cost decrease

due to using the optimal HP policy instead of the optimal HF policy for 2,000 examples

(100 out of 2,100 had an optimal cost of 0 under the HF policy), we computed

\[ \Delta^{HP}_{HF} \equiv \frac{C^*_{HF} - C^*_{HP}}{C^*_{HF}}. \]

Table 3.2: The minimum, mean, median and maximum values of cost reduction due to the HP policy.

              Min (%)   Mean (%)   Median (%)   Max (%)
∆HP_HF         -800        38          67         100

From Table 3.2, we see that although the HP policy costs, on average, 38% less than the HF policy, in some cases it is costlier (the minimum cost reduction of −800% means that the HP cost exceeds the HF cost by up to 800% of the HF cost). Therefore, in the next section, we compare the two policies in more detail.

3.3.2 The Relative Performance of the HF and HP Policies

As stated at the beginning of Section 3.3.1, we generate four sets of examples by choosing

a certain parameter of system II and assigning it 6 different values. For each parameter

value, we increment the target availability for system II in steps of 0.001 such that

AII ∈ (0.9, 0.999). In other words, in each set, for each parameter value considered, we

obtain the optimal solution under the BC system, HF and HP policies for 100 different

AII values.

3.3.2.1 The Impact of Repair Capacity

In the first set of 600 examples, system II is also a 90-out-of-100:G system (kII = 90 and

nII = 100) with λII = 0.009, i.e., identical to system I. With u ∈ {0.75, 0.8, 0.85, 0.9, 0.95, 0.99}, we vary µi as µI = µII = 0.9/u. Recalling that µ = µI + µII, this corresponds to varying the centralized repair shop capacity under the HF and HP policies as well.

Figure 3.3: The cost reduction of HF ($\Delta^{HF}_{BC}$ %) and HP ($\Delta^{HP}_{BC}$ %) compared to the BC system when u = 0.9 (horizontal axis: AII; vertical axis: cost reduction (%); curves: HP and HF)

In Figure 3.3, u = 0.9 and µII = 1 = µI . We see that the HF policy outperforms

the HP policy for AII ≥ 0.978 only. Thus, the HF policy should be preferred only when

target availabilities of identical systems are close. For the examples presented in Figure

3.3, the HP policy gives priority to system I and stores spare parts – if any – solely in

the reserved inventory of system II. For AII ≤ 0.951, the HP policy does not carry any

inventory at all ($\Delta^{HP}_{BC}$ = 100%). The HF policy, in contrast, stores spares in the shared

and system I reserved inventories, and no matter how low AII gets, the inventory levels

do not reduce to 0.

Let $\hat{A}_i$ ($\ge A_i$) denote the actual availability system i is provided with when the optimal number of spares is obtained under a given policy. In Figure 3.4, we see that under both policies $\hat{A}_{II}$ tends to decrease as $A_{II}$ gets smaller until a minimum is met. This minimum is 0.951 for the HP policy, the availability at which it also starts carrying no inventory. The minimum actual availability under the HF policy, on the other hand, does not decrease below 0.972. At first glance, having a higher actual availability seems better, but from Figure 3.3, we recall that the HF policy incurs a non-zero cost of carrying spares inventories. In other words, higher actual availability under the HF policy comes with a cost.

Figure 3.4: The actual $\hat{A}_{II}$ vs. target $A_{II}$ availabilities for system II when u = 0.9 (horizontal axis: AII; vertical axis: actual availability for system II; curves: HP and HF)

For other u or equivalently µ values, the relative performances of the two policies, as

well as the way inventories are used, remain the same: system I is the high-priority class

under the HP policy that stores all spares – if any – in system II reserved inventory.

In contrast, the HF policy stores the spares in the shared inventory when the target

availabilities of the systems are close. When AII diminishes, the shared inventory level

decreases while the reserved inventory of system I increases, but their sum, i.e., the total

number of spares, reduces.

With lower µ, not only do the optimal levels of spares increase, but also the per-

formance of the HP policy worsens. To see this, let T denote the AII at which the

performances (the optimal costs) of the HP and HF policies are the same. This means

that when AII < T , the HP policy is more cost-effective. Figure 3.5 plots the T values for

six µII (=µI , µ = 2µII) values. At the smallest capacity considered, when µII = 0.909,

we read T = 0.949. This implies that from the 100 examples optimized for both policies, the HP policy was more cost-effective than the HF policy in 48 examples when AII ∈ {0.9, 0.948}. Figure 3.5 shows that with higher µII (hence, higher µ = 2µII), T increases monotonically. At the highest capacity considered, when µII = 1.2, the HP policy is better in 90 examples out of 100 for AII < 0.991 = T. We conclude that if the repair capacity is low, AII should be considerably smaller than T < AI = 0.999 in order for the HP policy to beat the HF policy. With sufficiently high capacity, the HP policy performs better than the HF policy, even when the difference between AI and AII is not significant.

Figure 3.5: The maximum AII value below which the HP policy outperforms the HF policy vs. µII (horizontal axis: µII (= 0.9/u); vertical axis: T)

3.3.2.2 The Impact of System II Reliability

In the second set of examples, we fix µI = µII = 1 and vary kII ∈ {80, 82, 84, 86, 88, 90}

of system II, which is a kII-out-of-100:G system (nII = 100) with λII = 0.009. Lower

kII implies higher reliability for system II. Figure 3.6 plots T (the maximum AII value

below which the HP policy is better than the HF policy) versus kII . Here, we see that

lower kII has a similar impact on the performance of the HP policy as higher µII in

Figure 3.5. In order to prefer the HP policy, the difference between AI and AII does not

need to be large if the system II reliability is high. For instance, when kII = 80, the HP


policy is better in 96 examples out of 100 for AII < 0.997 = T.

Figure 3.6: The maximum AII value below which the HP policy outperforms the HF policy vs. kII (horizontal axis: kII; vertical axis: T)

As a side note, when kII decreases, the repair shop utilization may increase which,

in turn, increases the spare part inventory levels and costs. Figure 3.7 provides such an

example when AII = 0.95.

Figure 3.7: The cost of the HF policy vs. kII when AII = 0.95 (horizontal axis: kII; vertical axis: cost)


3.3.2.3 The Impact of System II Failure Rate

In the third set of examples, we fix µI = µII = 1 but vary nII ∈ {20, 50, 70, 80, 90, 100},

choosing kII = ⌈0.9nII⌉ (smallest integer greater than or equal to 0.9nII). We set λII =

0.9/nII ; in other words, the components become less reliable as the size of system II

decreases. When nII = 20 for which the components are the least reliable, the HF policy

always outperforms the HP policy. In this case, the HP policy prioritizes system II for

AII ≥ 0.992, and system I otherwise. For other nII values, in Figure 3.8, we plot T (the

maximum AII value below which the HP policy is better than the HF policy) versus λII .

Here, we see that if system II has many and more reliable components yielding small

λII values, the difference between AI and AII does not need to be large in order for the HP

policy to beat the HF policy. At nII = 100, with λII = 0.009, the HP policy is better

in 77 examples out of 100 for AII < 0.978 = T . When λII increases, the model tends to

give priority to system II.

[Figure 3.8: T versus λII (= 0.9/nII)]


3.3.2.4 Increasing µII with nII

In the fourth set of examples, we fix λII = 0.009 and µI = 1 but vary nII ∈ {20, 50, 70, 80,

90, 100}, choosing kII = ⌊0.9nII⌋. This time, we set µII = 0.01nII ; in other words, the

repair capacity increases with the size of system II. In this set of examples, the HP

policy prioritizes system II only when AII = 0.999. Figure 3.9 shows that the behavior

of T is not monotone. Only for intermediate values of repair shop capacity (also for

medium size system II), do we observe that the targeted availabilities should be wider

apart, in order for the HP policy to perform better.

[Figure 3.9: T versus µII (= 0.01nII)]

Chapter 4

Queues with an Unreliable Server

In this chapter, we study the M/G/1//N queue with an unreliable server attending

to multiple fleets of finitely many machines. The server goes through exponentially

distributed ON periods followed by OFF periods with general distributions. Since repair

times are assumed to be general i.i.d. r.v.s, without considering inventories, and assuming

a preemptive-resume priority policy among different fleets, we model a system with a

pooled repair shop. In Section 4.1, we first focus on the M/G/1//N queue serving a

single finite-calling population. We redefine the process completion time r.v., this time

including setup times, and obtain its Laplace Transform (LT) in Section 4.1.1. This is

followed in Section 4.1.2 by the busy period analysis of the system. Here, we derive its

LT and the mean length of the busy period. This enables us to obtain the steady-state

system size distribution at departure/arrival and arbitrary time epochs in Section 4.1.3.

For the probabilities at arbitrary time epochs, we need the LT of the residual time left

until the departure of the first customer from the system. This is derived in Section 4.1.4.

After completing the analysis for a single finite-population, we design a recursive method

to include multiple classes in the M/G/1//N queue in Section 4.1.5. The single-class

ODD M/G/1//N queue is analyzed in Section 4.2.

As a special case, we propose an alternative solution for the M/M/1//N queue in


Section 4.3. With exponential service times, this method considers general OFF times;

however, setup times are assumed to be negligible. Its extension to multiple classes still

requires the busy period analysis from the M/G/1//N queue, as explained in Section

4.3.1. In Section 4.4, we present our numerical experiments.

4.1 The M/G/1//N Queue

In this section, we analyze a queueing system with an unreliable single server serving

a finite population of N customers (the case with multiple finite populations is studied

in Section 4.1.5). The times between the completion of a customer’s service and its next

arrival at the system follow an exponential distribution with rate λ. The actual service

times – in the absence of disruptions and excluding setup times – are independent and

identically distributed (i.i.d.) r.v.s with an LT, b(s). From time to time the server

is subject to interruptions making it unavailable to serve even if there are customers

waiting. In other words, the server is subject to “operation-independent” interruptions;

this differentiates the problem from those where a server can be interrupted only when

it is serving a customer. When an interruption occurs, the server becomes unavailable

or “down”. The lengths of the server down/interruption times are i.i.d. r.v.s, denoted by D;
these follow a general continuous distribution $F(y) = \int_0^y f(u)\,du$ with density function
f(y), and have an LT f(s). Letting $\bar{F}(y) = 1 - F(y)$, the first moment of D is
$E[D] = \int_0^\infty \bar{F}(y)\,dy$ and its hazard rate function is $\beta(y) = f(y)/\bar{F}(y)$. The times

between the end of an interruption/down time and the next interruption are exponentially

distributed r.v.s with rate α. If the interruption occurs during a service time, the customer

being serviced is preempted; service resumes from the point of interruption once the server

is ready to serve again. During the service time of a customer, the server may have no

interruption, or it may have one or more. Each time the server attempts to serve a

customer (for the first time or after an interruption), it undergoes a setup/loading time


which is denoted by the i.i.d. r.v. U with a density function g(y) that is independent

of both D and the (remaining) service time r.v. Interruptions can occur during setup

time. At the end of the ensuing down time, a new setup time is generated from the

same distribution until one is not interrupted. Only then does the server start serving or

resume serving the customer.

When the server is not down, it is considered to be “up”, which means that it is either

idle and ready to serve, or is being set up (and the server is considered to be “loading”),

or is serving a customer (and the server is “in-service”). Therefore, at any given time,

the server is in one of the following four states: idle, in-service, loading, or down.

We employ three stochastic processes to characterize the state of the system at time

t: R(t) equals 0 if the server is up, and 1 if it is down; W (t) ∈ {0, 1, ..., N} is the number

of customers out of the queueing system; V (t) is the elapsed time since the server went

down. The elapsed time since the last setup time started is another stochastic process,

but we do not need this information in our derivations. We do not use the stochastic

process that gives the number of customers in the queuing system at time t, which is

N − W (t), because it is easier to express the state dependent arrival rates via W (t) in

our derivations. In the rest of this chapter, we denote the mean for any r.v. X by E[X ].

4.1.1 The Process Completion Time with Setup Times

The process completion time (PCT) r.v. (Gaver, 1962), denoted by C, is the total time

a customer spends on the server; this includes the actual service time plus any possible

OFF periods it may experience. The literature on the PCT ignores setup times, and

considers OFF periods to be down times during which the server cannot experience

new interruptions. We extend this model by incorporating setup times during which

a customer has to wait. In our problem, C is the elapsed time between the instant a

customer’s first setup time begins and the instant the same customer departs from the

system. This means that if interruptions occur, once the subsequent down time is over,


a customer waits for an uninterrupted setup time before it resumes its service. If the

server is interrupted during a setup time, the remaining service time of an interrupted

customer does not change. Only the amount of work done after an uninterrupted setup

time reduces the remaining service time.

Let C(U,Z) be the r.v. denoting the PCT as a function of the setup time r.v., U , the

remaining service time r.v., Z, (Z can also be the service time of the customer finding

the server idle), and the time until the next interruption, y. Then,

\[
C(U,Z) =
\begin{cases}
U + Z, & \text{if } y \ge U + Z,\\
y + D + C(U, Z - (y - U)), & \text{if } U \le y < Z + U,\\
y + D + C'(U,Z), & \text{if } 0 \le y < U,
\end{cases}
\]

where C′(U,Z) is identically distributed as C(U,Z). Given that U = u and Z = z, the
LT of C(U = u, Z = z), c(s|u, z), is given by

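\[
c(s|u,z) = e^{-s(z+u)}e^{-\alpha(z+u)} + \alpha f(s)\int_0^z e^{-(s+\alpha)(z+u-\omega)}\,c(s|u,\omega)\,d\omega
+ f(s)\,c(s|u,z)\int_0^u \alpha e^{-(\alpha+s)y}\,dy,
\]

where the first integral is written in terms of ω = z + u − y. Multiplying both sides by $e^{(s+\alpha)(z+u)}$ and rearranging gives

\[
c(s|u,z)\,e^{(s+\alpha)(z+u)} = 1 + \alpha f(s)\int_0^z e^{(s+\alpha)\omega}\,c(s|u,\omega)\,d\omega
+ \frac{\alpha}{\alpha+s}\bigl(e^{(s+\alpha)u} - 1\bigr)e^{(s+\alpha)z} f(s)\,c(s|u,z),
\]

\[
\Bigl(e^{(s+\alpha)u} + \frac{\alpha}{s+\alpha}\bigl(1 - e^{(s+\alpha)u}\bigr)f(s)\Bigr)e^{(s+\alpha)z}\,c(s|u,z)
= 1 + \alpha f(s)\int_0^z e^{(s+\alpha)\omega}\,c(s|u,\omega)\,d\omega.
\]

After taking the derivative of both sides with respect to z,

\[
\frac{\partial \ln\bigl(e^{(s+\alpha)z}\,c(s|u,z)\bigr)}{\partial z}
= \frac{\alpha f(s)}{e^{(s+\alpha)u} + \frac{\alpha}{s+\alpha}\bigl(1 - e^{(s+\alpha)u}\bigr)f(s)},
\]

we obtain the following solution

\[
c(s|u,z) = \exp\Bigl\{-\Bigl(s + \alpha - \frac{\alpha f(s)}{e^{(s+\alpha)u} + \frac{\alpha}{s+\alpha}\bigl(1 - e^{(s+\alpha)u}\bigr)f(s)}\Bigr)z\Bigr\}.
\]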


If we remove the condition on z by integrating c(s|u, z) over all possible values of z, we

obtain

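\[
c(s|u) = b\Bigl(s + \alpha - \frac{\alpha f(s)}{e^{(s+\alpha)u} + \frac{\alpha}{s+\alpha}\bigl(1 - e^{(s+\alpha)u}\bigr)f(s)}\Bigr).
\]

Similarly, when we remove the condition on u, we obtain the LT of C as

\[
c(s) = \int_0^\infty b\Bigl(s + \alpha - \frac{\alpha f(s)}{e^{(s+\alpha)u} + \frac{\alpha}{s+\alpha}\bigl(1 - e^{(s+\alpha)u}\bigr)f(s)}\Bigr)g(u)\,du.
\]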

Note that when there is no setup time, from the equations given above we arrive at

the LT found in the literature (e.g., Altıok, 1997, p. 94)

c(s) = b(s+ α− αf(s)). (4.1)
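As a small numerical illustration of Eq. (4.1), the sketch below evaluates c(s) for exponentially distributed service and down times and recovers the mean PCT by differentiating the LT at s = 0. The exponential choices, the rates `mu`, `delta`, `alpha`, and the helper names are assumptions made only for this example; the closed form used for comparison, E[C] = E[B](1 + αE[D]) with E[B] the mean service time, follows from differentiating Eq. (4.1).

```python
def exp_lt(rate):
    """LT of an exponential density with the given rate: rate / (rate + s)."""
    return lambda s: rate / (rate + s)

def pct_lt(b, f, alpha):
    """Eq. (4.1): LT of the PCT without setup times, c(s) = b(s + alpha - alpha*f(s))."""
    return lambda s: b(s + alpha - alpha * f(s))

def mean_from_lt(lt, h=1e-6):
    """Approximate the mean, -d/ds lt(s) at s = 0, by a central difference."""
    return -(lt(h) - lt(-h)) / (2.0 * h)

# Assumed example rates: service rate mu, repair (down-time) rate delta,
# interruption rate alpha.
mu, delta, alpha = 2.0, 4.0, 0.5
c = pct_lt(exp_lt(mu), exp_lt(delta), alpha)

mean_pct = mean_from_lt(c)
closed_form = (1.0 / mu) * (1.0 + alpha / delta)  # E[B] * (1 + alpha * E[D])
print(mean_pct, closed_form)
```

The two printed values should agree up to the finite-difference error, which gives a quick sanity check on any hand-coded LT before it is used in the busy period analysis.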

4.1.2 Busy Period Analysis for the M/G/1//N Queue

We define a busy period as an interval starting with either an interruption or a customer’s

arrival when the server is idle, and ending as soon as the server becomes idle again. Thus,

each busy period starts with an “initial delay” either in the form of a down time after an

interruption, or a PCT after a customer’s arrival. Let $P^D_N(n)$ ($P^C_{N-1}(n)$) be the probability

of having 0 ≤ n ≤ N (0 ≤ n ≤ N−1) customers present at the end of a down time (PCT)

initiating a busy period in the M/G/1//N system. Unlike the systems with constant

customer arrival rates, in this system, state dependent arrival rates must be taken into

account.

Before presenting the following Theorem, we define $P^D_N(n|d)$ ($P^C_{N-1}(n|c)$) as the probability of having n customers in the M/G/1//N system at the end of the down time (PCT) initiating a busy period given that D = d (C = c). Further,
$P^D_N(n,s) = \int_0^\infty P^D_N(n|y)\,e^{-sy}f(y)\,dy$
($P^C_{N-1}(n,s) = \int_0^\infty P^C_{N-1}(n|z)\,e^{-sz}c(z)\,dz$).


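Theorem 4.1 The LT $P^D_N(n,s)$ is given by

\[
P^D_N(0,s) = f(s+N\lambda), \tag{4.2}
\]
\[
P^D_N(n,s) = \sum_{i=N-n}^{N} (-1)^{i-(N-n+1)}\binom{N}{i}\binom{i}{N-n}\bigl(f(s) - f(s+i\lambda)\bigr), \quad 0 < n < N, \tag{4.3}
\]
\[
P^D_N(N,s) = \sum_{i=1}^{N} (-1)^{i-1}\binom{N}{i}\bigl(f(s) - f(s+i\lambda)\bigr). \tag{4.4}
\]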

Proof. To prove Theorem 4.1, we need the following Lemma.

Lemma 4.1 During the down time initiating a busy period in the M/G/1//N system, the time-to-arrival r.v. $T_{N,n}$ of the nth customer has the following cumulative distribution function:

\[
H_{N,n}(t) = (N-n+1)\sum_{i=N-n+1}^{N} (-1)^{i-(N-n+1)}\binom{N}{i}\binom{i}{N-n+1}\frac{1 - e^{-i\lambda t}}{i}. \tag{4.5}
\]

Proof. Note that if an interruption initiates a busy period, at the beginning of the

down time, N customers are not yet in the queueing system. During the down time

initiating a busy period, when W (t) = N −n, the time-to-arrival of the next customer is

exponentially distributed with rate of (N − n)λ, and TN,n is the sum of n exponentially

distributed r.v.s with rates of Nλ, (N − 1)λ, . . . , and (N − n + 1)λ, i.e.,

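\[
T_{N,n} = \sum_{i=N-n+1}^{N} T_i,
\]

where $T_i$ follows an exponential distribution with rate iλ. Let $h_{N,n}(s)$ be the LT of $T_{N,n}$; then

\[
h_{N,n}(s) = \frac{N\lambda}{N\lambda+s}\cdot\frac{(N-1)\lambda}{(N-1)\lambda+s}\cdots\frac{(N-n+1)\lambda}{(N-n+1)\lambda+s}
= \frac{N!\,\lambda^n}{(N-n)!}\prod_{i=N-n+1}^{N}\frac{1}{i\lambda+s}. \tag{4.6}
\]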

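Using

\[
\frac{N!\,\lambda^{n-1}}{(N-n+1)!}\prod_{i=N-n+1}^{N}\frac{1}{i\lambda+s}
= \sum_{i=N-n+1}^{N} (-1)^{i-(N-n+1)}\binom{N}{i}\binom{i}{N-n+1}\frac{1}{i\lambda+s}
\]

in Eq. (4.6), we arrive at

\[
h_{N,n}(s) = (N-n+1)\lambda\sum_{i=N-n+1}^{N} (-1)^{i-(N-n+1)}\binom{N}{i}\binom{i}{N-n+1}\frac{1}{i\lambda+s},
\]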

the inversion of which gives Eq. (4.5).
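Lemma 4.1 can be checked numerically. During the down time no customer leaves the system, so each of the N customers arrives independently after an exponential time with rate λ, and the nth arrival has occurred by time t exactly when at least n of these N clocks have rung. The sketch below compares Eq. (4.5) with that binomial tail; the function names and the parameter values are illustrative assumptions.

```python
from math import comb, exp

def cdf_lemma_4_1(N, n, lam, t):
    """Eq. (4.5): CDF of the arrival time of the n-th customer during a down time."""
    m = N - n + 1
    total = 0.0
    for i in range(m, N + 1):
        sign = -1.0 if (i - m) % 2 else 1.0
        total += sign * comb(N, i) * comb(i, m) * (1.0 - exp(-i * lam * t)) / i
    return m * total

def cdf_binomial(N, n, lam, t):
    """P(at least n of N independent exp(lam) arrival clocks have rung by time t)."""
    p = 1.0 - exp(-lam * t)
    return sum(comb(N, k) * p**k * (1.0 - p)**(N - k) for k in range(n, N + 1))

# Assumed example values.
N, lam, t = 6, 0.3, 0.7
for n in range(1, N + 1):
    print(n, cdf_lemma_4_1(N, n, lam, t), cdf_binomial(N, n, lam, t))
```

Because Eq. (4.5) is an alternating sum of binomial coefficients, it can lose precision for large N; the binomial form is the numerically safer of the two.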

To prove Theorem 4.1, given that D = d, and using Lemma 4.1, we have

\[
P^D_N(0|d) = P\{T_{N,1} > d\} = 1 - H_{N,1}(d) = e^{-N\lambda d}, \tag{4.7}
\]

and for 0 < n < N

\[
P^D_N(n|d) = P\{T_{N,n} < d < T_{N,n+1}\} = H_{N,n}(d) - H_{N,n+1}(d)
= \sum_{i=N-n}^{N} (-1)^{i-(N-n+1)}\binom{N}{i}\binom{i}{N-n}\bigl(1 - e^{-i\lambda d}\bigr), \tag{4.8}
\]

and finally,

\[
P^D_N(N|d) = P\{T_{N,N} < d\} = H_{N,N}(d) = \sum_{i=1}^{N} (-1)^{i-1}\binom{N}{i}\bigl(1 - e^{-i\lambda d}\bigr). \tag{4.9}
\]

Taking the LT of Eqs. (4.7)-(4.9) yields Eqs. (4.2)-(4.4), respectively.

Note that Theorem 4.1 can be adjusted to obtain $P^C_{N-1}(n,s)$ (see the proof of Corollary 4.2). The following Corollary directly follows from Theorem 4.1 since $P^D_N(n) = P^D_N(n,0)$.

Corollary 4.1 The steady-state probability of having n customers in the M/G/1//N system at the end of the down time initiating a busy period is given by

\[
P^D_N(0) = f(N\lambda), \tag{4.10}
\]
\[
P^D_N(n) = \sum_{i=N-n}^{N} (-1)^{i-(N-n+1)}\binom{N}{i}\binom{i}{N-n}\bigl(1 - f(i\lambda)\bigr), \quad 0 < n < N, \tag{4.11}
\]
\[
P^D_N(N) = \sum_{i=1}^{N} (-1)^{i-1}\binom{N}{i}\bigl(1 - f(i\lambda)\bigr). \tag{4.12}
\]

Similarly,


Corollary 4.2 The steady-state probability of having n customers in the M/G/1//N system at the end of the PCT initiating a busy period is given by

\[
P^C_{N-1}(0) = c((N-1)\lambda), \tag{4.13}
\]
\[
P^C_{N-1}(n) = \sum_{i=N-1-n}^{N-1} (-1)^{i-(N-n)}\binom{N-1}{i}\binom{i}{N-1-n}\bigl(1 - c(i\lambda)\bigr), \quad 0 < n < N-1, \tag{4.14}
\]
\[
P^C_{N-1}(N-1) = \sum_{i=1}^{N-1} (-1)^{i-1}\binom{N-1}{i}\bigl(1 - c(i\lambda)\bigr). \tag{4.15}
\]

Proof. The fundamental differences between a down time initiating a busy period and

a PCT initiating a busy period are the following. The PCT has a different distribution

from the down time, and at the beginning of the PCT, N − 1 customers are not yet in

the queueing system. Therefore, Lemma 4.1 and Theorem 4.1 can be employed for an

M/G/1//(N − 1) system where the down time has the same distribution as the PCT,

and Eqs. (4.13)-(4.15) can be obtained.
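Corollaries 4.1 and 4.2 share one computational pattern: given M customers outside the system (M = N for a down time, M = N − 1 for a PCT) and the LT of the initial delay, they return the distribution of the number of arrivals during that delay. The following sketch implements this pattern and checks it against a case with a known answer, a deterministic delay of length d, for which the number of arrivals is binomial. The function name `arrivals_pmf`, the deterministic test case, and the numbers are assumptions for illustration.

```python
from math import comb, exp

def arrivals_pmf(M, lam, lt):
    """Probabilities of n = 0..M arrivals from M outside customers during a
    delay whose LT is `lt`, following Eqs. (4.10)-(4.12) with N replaced by M.
    Use M = N and lt = f for Corollary 4.1; M = N - 1 and lt = c for 4.2."""
    pmf = [lt(M * lam)]                       # n = 0
    for n in range(1, M + 1):
        total = 0.0
        # For n = M the i = 0 term vanishes because lt(0) = 1, so this sum
        # coincides with the separate n = M formula of the corollaries.
        for i in range(M - n, M + 1):
            sign = -1.0 if (i - (M - n + 1)) % 2 else 1.0
            total += sign * comb(M, i) * comb(i, M - n) * (1.0 - lt(i * lam))
        pmf.append(total)
    return pmf

def binomial_pmf(M, p):
    """Reference distribution for a deterministic delay."""
    return [comb(M, n) * p**n * (1.0 - p)**(M - n) for n in range(M + 1)]

# Assumed example: deterministic delay d, so that lt(s) = exp(-s*d).
M, lam, d = 5, 0.4, 1.3
pmf = arrivals_pmf(M, lam, lambda s: exp(-s * d))
ref = binomial_pmf(M, 1.0 - exp(-lam * d))
for n in range(M + 1):
    print(n, pmf[n], ref[n])
```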

If there are no customers waiting for service at the end of an initial delay, the busy

period ends; otherwise, it continues until the server becomes idle. If there are n customers

present in the system at the end of an initial delay (1 ≤ n ≤ N if the initial delay is

a down time, and 1 ≤ n ≤ N − 1 if it is a PCT), in addition to the initial delay, the

busy period consists of n sub-cycles. Each sub-cycle starting with i customers in the

system (1 ≤ i ≤ n) is the time it takes until i − 1 customers remain in this system

and is identical in distribution to the busy period of an M/G/1//(N − i + 1) system

(the queuing system with the same underlying stochastic processes but serving a finite

population of N − i+ 1 customers) initiated by a PCT (see Shanthikumar and Sumita,

1985, for a similar approach analyzing the M/G/1//N queue without setup times and

interruptions). Therefore, if there are n customers in the M/G/1//N system at the end

of an initial delay, the first (last) sub-cycle is identical in distribution to the busy period

initiated by a PCT in the M/G/1//(N−n+1) system (M/G/1//N system). If we denote the length of the busy periods initiated by a down time and a PCT in an M/G/1//j system by $T^D_j$ and $T^C_j$, and denote their LT’s by $h^D_j(s)$ and $h^C_j(s)$, respectively, for the M/G/1//N system, we have

\[
T^D_N =
\begin{cases}
D, & \text{if there are no arrivals at the end of } D,\\
D + \sum_{j=N-n+1}^{N} T^C_j, & \text{if there are } 0 < n \le N \text{ arrivals at the end of } D,
\end{cases}
\]
\[
T^C_N =
\begin{cases}
C, & \text{if there are no arrivals at the end of } C,\\
C + \sum_{j=N-n+1}^{N} T^C_j, & \text{if there are } 0 < n \le N-1 \text{ arrivals at the end of } C,
\end{cases}
\]

from which their LT’s can be obtained using Theorem 4.1, respectively, as

\[
h^D_N(s) = f(s+N\lambda) + \sum_{n=1}^{N} P^D_N(n,s)\prod_{j=N-n+1}^{N} h^C_j(s), \tag{4.16}
\]
\[
h^C_N(s) = c(s+(N-1)\lambda) + \sum_{n=1}^{N-1} P^C_{N-1}(n,s)\prod_{j=N-n+1}^{N} h^C_j(s).
\]

Solving the equation above for $h^C_N(s)$ we get

\[
h^C_N(s) = \frac{c(s+(N-1)\lambda)}{1 - \sum_{n=1}^{N-1} P^C_{N-1}(n,s)\prod_{j=N-n+1}^{N-1} h^C_j(s)}.
\]

Since a busy period starts either with an interruption or a customer arrival, the LT of

the length of the busy period r.v. TN in the M/G/1//N system is

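\[
h_N(s) = \frac{\alpha}{\alpha+N\lambda}\,h^D_N(s) + \frac{N\lambda}{\alpha+N\lambda}\,h^C_N(s). \tag{4.17}
\]

Then, the mean length of the busy period is

\[
E[T_N] = \frac{\alpha}{\alpha+N\lambda}\,E[T^D_N] + \frac{N\lambda}{\alpha+N\lambda}\,E[T^C_N], \tag{4.18}
\]

where

\[
E[T^D_N] = -\frac{dh^D_N(s)}{ds}\Big|_{s=0} = E[D] + \sum_{n=1}^{N} E[T^C_n]\sum_{j=N-n+1}^{N} P^D_N(j), \tag{4.19}
\]
\[
E[T^C_N] = -\frac{dh^C_N(s)}{ds}\Big|_{s=0} = \frac{E[C] + \sum_{n=2}^{N-1} E[T^C_n]\sum_{j=N-n+1}^{N-1} P^C_{N-1}(j)}{P^C_{N-1}(0)}.
\]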


Note that the times between two busy periods follow an exponential distribution with rate α+Nλ. By invoking the renewal theorem, the fraction of time the server is idle and up is $(1 + E[T_N](\alpha+N\lambda))^{-1}$, and the fraction of time the server is up is $(1 + \alpha E[D])^{-1}$. Thus, the fraction of time the server is in-service is $(1+\alpha E[D])^{-1} - (1+E[T_N](\alpha+N\lambda))^{-1}$.
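The recursion for the mean busy period lends itself to a short program. The sketch below computes E[T^C_j] for j = 1, ..., N, then E[T^D_N] and E[T_N] from Eqs. (4.18)-(4.19), and finally the idle-and-up fraction. As a sanity check it uses the special case α = 0 with exponential service, where the idle fraction must equal the classical machine-repair result 1/Σ_k N!/(N−k)! ρ^k with ρ = λ/µ. All function names, the exponential distributions, and the numbers are assumptions for this example.

```python
from math import comb, factorial

def exp_lt(rate):
    """LT of an exponential density with the given rate."""
    return lambda s: rate / (rate + s)

def arrivals_pmf(M, lam, lt):
    """Arrival-count distribution of Corollaries 4.1/4.2 for M outside customers."""
    pmf = [lt(M * lam)]
    for n in range(1, M + 1):
        pmf.append(sum((-1.0 if (i - (M - n + 1)) % 2 else 1.0)
                       * comb(M, i) * comb(i, M - n) * (1.0 - lt(i * lam))
                       for i in range(M - n, M + 1)))
    return pmf

def mean_busy_periods(N, lam, alpha, f, mean_d, c, mean_c):
    """Return (E[T^D_N], E[T^C_N], E[T_N]) from Eqs. (4.18)-(4.19)."""
    etc = {}                                  # etc[j] = E[T^C_j] in M/G/1//j
    for j in range(1, N + 1):
        pc = arrivals_pmf(j - 1, lam, c)      # P^C_{j-1}(.)
        num = mean_c + sum(etc[n] * sum(pc[j - n + 1:j]) for n in range(2, j))
        etc[j] = num / pc[0]
    pd = arrivals_pmf(N, lam, f)              # P^D_N(.)
    etd = mean_d + sum(etc[n] * sum(pd[N - n + 1:N + 1]) for n in range(1, N + 1))
    et = (alpha * etd + N * lam * etc[N]) / (alpha + N * lam)
    return etd, etc[N], et

def idle_fraction(N, lam, alpha, mean_t):
    """Fraction of time the server is idle and up."""
    return 1.0 / (1.0 + mean_t * (alpha + N * lam))

def machine_repair_idle(N, rho):
    """Classical M/M/1//N idle probability without interruptions (reference only)."""
    return 1.0 / sum(factorial(N) / factorial(N - k) * rho**k for k in range(N + 1))

# Assumed example with interruptions (exponential service and down times).
N, lam, mu, delta, alpha = 3, 1.0, 2.0, 4.0, 0.5
f, b = exp_lt(delta), exp_lt(mu)
c = lambda s: b(s + alpha - alpha * f(s))     # Eq. (4.1), no setup times
mean_c = (1.0 / mu) * (1.0 + alpha / delta)
etd, etc_n, et = mean_busy_periods(N, lam, alpha, f, 1.0 / delta, c, mean_c)
print(etd, etc_n, et, idle_fraction(N, lam, alpha, et))
```

The dictionary `etc` is filled from j = 1 upward because E[T^C_N] depends only on the values for smaller populations, which mirrors the sub-cycle argument in the text.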

4.1.3 System Size Distribution in the M/G/1//N Queue

In this section, we obtain the steady-state probabilities of having i customers out of the

system at departure/arrival epochs in Section 4.1.3.1; we then provide the system size

distribution at an arbitrary instant in Section 4.1.3.2.

4.1.3.1 System Size Distribution at Arrival/Departure Epochs

We start our analysis by studying the embedded Markov chain of the number of customers

left in the system after a customer departs. Let pi,j be the transition probability that

the next departure leaves j customers in the system, given that the last departure left i

customers. If the last departure left i customers, 0 < i < N , in the queue, the steady-

state probability of the next departure leaving j customers behind (j = i− 1, . . . , N − 1)

is the probability of having j − i + 1 arrivals during the PCT. This probability is the

same as the steady-state probability of having j− i+1 customers at the end of the PCT

that initiates a busy period in the M/G/1//(N − i+ 1) system, and can be obtained by

invoking Corollary 4.2. Any other transition from i, 0 < i < N , is not possible. After a

departing customer leaves an empty system, the next arrival can find the server down, or

up and idle. If the server is found to be down, in steady-state, this arrival waits for the

residual down time before the setup time starts. We denote this r.v. by DR. Following

Fiems, Maertens, and Bruneel (2008), the LT of DR can be found as

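\[
f_R(s) = \frac{N\lambda(N\lambda - s) + N\lambda\alpha\bigl(f(s) - f(N\lambda)\bigr)}{\bigl(N\lambda + \alpha - \alpha f(N\lambda)\bigr)(N\lambda - s)}, \tag{4.20}
\]

with

\[
f_R(N\lambda) = \lim_{s\to N\lambda} f_R(s) = \frac{N\lambda\bigl(1 - \alpha f'(N\lambda)\bigr)}{N\lambda + \alpha - \alpha f(N\lambda)}.
\]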

Only then does the PCT of the customer arriving during a down time start. In order

for such a customer to leave j customers behind (j = 0, 1, . . . , N − 1), there should be j

arrivals during the interval L = DR + C, with an LT of l(s) = fR(s)c(s), and a mean of

\[
E[L] = -\frac{dl(s)}{ds}\Big|_{s=0} = E[D_R] + E[C]. \tag{4.21}
\]

Using Corollary 4.2 by substituting l(s) for c(s), $P^L_{N-1}(j) = p_{0,j}$ (j = 0, 1, . . . , N −1) can be obtained. In summary, we have

\[
p_{i,j} =
\begin{cases}
P^L_{N-1}(j), & i = 0,\ 0 \le j \le N-1,\\
P^C_{N-i}(j-i+1), & 1 \le i < N,\ i-1 \le j \le N-1,\\
0, & \text{otherwise}.
\end{cases}
\]

Now that we have pi,j, we can construct the N ×N transition probability matrix P.

From $\Pi = \Pi P$ and $\sum_{i=1}^{N}\pi_i = 1$, we can solve for the 1×N vector $\Pi = [\pi_N, \pi_{N-1}, \ldots, \pi_1]$.

Here, πi is the steady-state probability of having i customers (including the departing

customer) out of the queueing system at departure instants (or equivalently having N− i

customers left behind in the queueing system). Since this is an ergodic Markov chain, πi

is also the steady-state probability that an arrival finds N − i customers in the system.
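The construction of P and the solution of Π = ΠP can be sketched as follows. The code builds the transition matrix from the arrival-count distributions, forms l(s) = f_R(s)c(s) from Eq. (4.20), and obtains the stationary vector by power iteration (a simple stand-in for any linear solver). The entry `v[j]` is the probability that a departure leaves j customers behind, i.e. π_{N−j} in the notation above. Function names, exponential distributions, and parameter values are assumptions for this example.

```python
from math import comb

def exp_lt(rate):
    """LT of an exponential density with the given rate."""
    return lambda s: rate / (rate + s)

def arrivals_pmf(M, lam, lt):
    """Arrival-count distribution of Corollaries 4.1/4.2 for M outside customers."""
    pmf = [lt(M * lam)]
    for n in range(1, M + 1):
        pmf.append(sum((-1.0 if (i - (M - n + 1)) % 2 else 1.0)
                       * comb(M, i) * comb(i, M - n) * (1.0 - lt(i * lam))
                       for i in range(M - n, M + 1)))
    return pmf

def residual_down_lt(N, lam, alpha, f):
    """Eq. (4.20): LT of D_R; only evaluated here at s < N*lam."""
    a = N * lam
    return lambda s: ((a * (a - s) + a * alpha * (f(s) - f(a)))
                      / ((a + alpha - alpha * f(a)) * (a - s)))

def transition_matrix(N, lam, c, l):
    """p[i][j]: next departure leaves j behind, given the last one left i."""
    P = [[0.0] * N for _ in range(N)]
    P[0] = arrivals_pmf(N - 1, lam, l)            # row i = 0 uses L = D_R + C
    for i in range(1, N):
        pmf = arrivals_pmf(N - i, lam, c)         # P^C_{N-i}(.)
        for j in range(i - 1, N):
            P[i][j] = pmf[j - i + 1]
    return P

def stationary(P, iters=5000):
    """Stationary row vector of P by power iteration."""
    n = len(P)
    v = [1.0 / n] * n
    for _ in range(iters):
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
    return v

# Assumed example values.
N, lam, mu, delta, alpha = 3, 1.0, 2.0, 4.0, 0.5
f, b = exp_lt(delta), exp_lt(mu)
c = lambda s: b(s + alpha - alpha * f(s))         # Eq. (4.1), no setup times
f_r = residual_down_lt(N, lam, alpha, f)
l = lambda s: f_r(s) * c(s)
v = stationary(transition_matrix(N, lam, c, l))
print(v)                                          # v[j] corresponds to pi_{N-j}
```

With α = 0 and exponential service the result can be compared with the known M/M/1//N departure-epoch distribution, which is a convenient regression test for the indexing.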

4.1.3.2 System Size Distribution at an Arbitrary Instant

In this section, we obtain $\bar{P}_i$, the steady-state probability of having i customers out of the system.

Lemma 4.2 With $E[T_N]$ as the mean length of a busy period,

\[
\bar{P}_N = \frac{N\lambda + \alpha - \alpha f(N\lambda)}{N\lambda\bigl(1 + E[T_N](\alpha+N\lambda)\bigr)}. \tag{4.22}
\]


Proof. The probability of the system being empty is

\[
\bar{P}_N = \lim_{t\to\infty} P\{(W(t) = N) \cap (R(t) = 0)\} + \lim_{t\to\infty} P\{(W(t) = N) \cap (R(t) = 1)\}. \tag{4.23}
\]

The probability of having no customers in the system and the server being up (as discussed at the end of Section 4.1.2) is

\[
\lim_{t\to\infty} P\{(W(t) = N) \cap (R(t) = 0)\} = \frac{1}{1 + E[T_N](\alpha+N\lambda)}.
\]

Observe that only during the down time which initiates a busy period can the server be down while no customer exists in the system; the average time the system remains empty during such a down time is given by

\[
\int_0^\infty\Bigl(\int_0^y tN\lambda e^{-N\lambda t}\,dt + y\int_y^\infty N\lambda e^{-N\lambda t}\,dt\Bigr)f(y)\,dy = \frac{1 - f(N\lambda)}{N\lambda}.
\]

Since the fraction of time the system is in a busy period initiated by a down time is

\[
\frac{\alpha E[T^D_N]}{1 + E[T_N](\alpha+N\lambda)},
\]

the fraction of time the system is empty and the server is down is

\[
\lim_{t\to\infty} P\{(W(t) = N) \cap (R(t) = 1)\}
= \frac{\alpha\int_0^\infty\bigl(\int_0^y tN\lambda e^{-N\lambda t}\,dt + y\int_y^\infty N\lambda e^{-N\lambda t}\,dt\bigr)f(y)\,dy}{1 + E[T_N](\alpha+N\lambda)}.
\]

The summation of these in Eq. (4.23) gives $\bar{P}_N$ in Lemma 4.2.

To obtain the entire distribution, we introduce the “augmented PCT” (APCT) r.v. denoted by $\tilde{C}$, which is the PCT for all customers (i.e. $\tilde{C} = C$) except those arriving as the first customers during a down time that initiates a busy period. In the latter case, the APCT is the residual down time such customers wait plus their PCT, that is $\tilde{C} = L$. Then, the residual APCT r.v. $C_R$ with $c_R(x)$ as its density function is the time left until the departure of the first customer (who may be waiting for the down time that initiates a busy period, or is in service, or is preempted) in the system. It is known that $P(C_R = 0) = \bar{P}_N$, i.e., the probability that there are no customers in the queueing system, but we define $c_R(0) = \lim_{x\to 0} c_R(x)$.


Let CR(t) denote the residual APCT at time t and

Pi(t, x)dx = P{W (t) = i, x < CR(t) < x+ dx}, 0 ≤ i ≤ N − 1,

denote the joint probability distribution of having i customers out of the queueing system

at time t (W (t) = i), and the residual APCT of the customer (preempted or currently

receiving service) being in the interval [x, x + dx]. Observe that from t to t + ∆t, the

residual APCT will decrease by ∆t. Assuming that the probability of having more than

one arrival is o(∆t) and P−1(t, x) and its limiting probability are 0,

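\[
P_{N-1}(t+\Delta t, x) = \bigl(1 - (N-1)\lambda\Delta t\bigr)P_{N-1}(t, x+\Delta t) + N\lambda\Delta t\,P_N(t)\,l(x) + P_{N-2}(t,0)\,c(x)\Delta t + o(\Delta t),
\]
\[
P_i(t+\Delta t, x) = (1 - i\lambda\Delta t)P_i(t, x+\Delta t) + (i+1)\lambda\Delta t\,P_{i+1}(t, x+\Delta t) + P_{i-1}(t,0)\,c(x)\Delta t + o(\Delta t), \quad 0 \le i \le N-2,
\]

where $P_N(t)$ is the probability of having N customers out of the system at time t. Here l(x) and c(x) are the density functions of the r.v.s L and C, respectively, and $c(x)\Delta t = P(x \le C \le x+\Delta t)$. Re-arranging the equations given above, we obtain

\[
\Bigl(\frac{\partial}{\partial t} - \frac{\partial}{\partial x}\Bigr)P_{N-1}(t,x) = -(N-1)\lambda P_{N-1}(t,x) + N\lambda P_N(t)\,l(x) + P_{N-2}(t,0)\,c(x),
\]
\[
\Bigl(\frac{\partial}{\partial t} - \frac{\partial}{\partial x}\Bigr)P_i(t,x) = -i\lambda P_i(t,x) + (i+1)\lambda P_{i+1}(t,x) + P_{i-1}(t,0)\,c(x), \quad 0 \le i \le N-2.
\]

Letting $P_i(x) = \lim_{t\to\infty} P_i(t,x)$, if we take the limit of the equations given above as t → ∞,

\[
\frac{d}{dx}P_{N-1}(x) = (N-1)\lambda P_{N-1}(x) - N\lambda\bar{P}_N\,l(x) - P_{N-2}(0)\,c(x), \tag{4.24}
\]
\[
\frac{d}{dx}P_i(x) = i\lambda P_i(x) - (i+1)\lambda P_{i+1}(x) - P_{i-1}(0)\,c(x), \quad 0 \le i \le N-2. \tag{4.25}
\]

Observe that $P_i(x)$ is the joint density of the residual APCT and the event that i customers are out of the queueing system. When i = 0, integrating both sides of Eq. (4.25) gives

\[
P_0(\infty) - P_0(0) = -\lambda\bar{P}_1, \qquad P_0(0) = \lambda\bar{P}_1.
\]

Recursively, we can show that

\[
P_i(0) = (i+1)\lambda\bar{P}_{i+1}, \quad 0 \le i \le N-1. \tag{4.26}
\]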

Note that Pi(0) is the probability that a customer is about to leave the server and there

are i customers out of the queueing system. Then, using Bayes’ theorem

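\[
\pi_{i+1} \equiv P\{i \text{ customers out of the system} \mid \text{a departure is about to occur}\}
= \frac{P_i(0)}{c_R(0)} = \frac{P_i(0)}{\sum_{i=0}^{N-1} P_i(0)}, \quad 0 \le i \le N-1,
\]
\[
\pi_i = \frac{i\lambda\bar{P}_i}{\sum_{j=1}^{N} j\lambda\bar{P}_j}, \quad 1 \le i \le N,
\]
\[
\bar{P}_i = \frac{N\bar{P}_N}{i\,\pi_N}\,\pi_i, \quad 1 \le i \le N. \tag{4.27}
\]

Using Eq. (4.27) together with Lemma 4.2, we derive the solution for $\bar{P}_i$, which is also the steady-state probability of having N − i customers in the system. Eq. (4.27) also helps us obtain $c_R(0) = N\lambda\bar{P}_N/\pi_N$ (we do not use $c_R(0)$ for our computations).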
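Eq. (4.27) converts the departure-epoch probabilities into time-average probabilities once the empty-system probability is known from Lemma 4.2. A minimal sketch follows; the function name and the example numbers are assumptions, and the last line (the probability that no customer is out of the system, obtained by normalization) is added here for completeness rather than taken from the text. The example uses an M/M/1//3 queue without interruptions and λ/µ = 0.5, for which the time-average probabilities are proportional to N!/i! (λ/µ)^{N−i}.

```python
def time_average_probs(pi, p_empty):
    """Eq. (4.27): Pbar_i = N * Pbar_N * pi_i / (i * pi_N) for i = 1..N.

    pi[i-1] holds pi_i (i = 1..N); p_empty is Pbar_N from Lemma 4.2.
    Returns [Pbar_0, Pbar_1, ..., Pbar_N], with Pbar_0 from normalization.
    """
    N = len(pi)
    pbar = [0.0] * (N + 1)
    for i in range(1, N + 1):
        pbar[i] = N * p_empty * pi[i - 1] / (i * pi[N - 1])
    pbar[0] = 1.0 - sum(pbar[1:])
    return pbar

# Assumed example: M/M/1//3 without interruptions, lambda/mu = 0.5.
pi = [0.2, 0.4, 0.4]          # pi_1, pi_2, pi_3 at departure epochs
p_empty = 1.0 / 4.75          # Pbar_3, probability of an empty system
pbar = time_average_probs(pi, p_empty)
print(pbar)
```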

The following theorem provides an alternative solution. Before presenting it, we

introduce the conditional residual APCT, given that there are i customers out of the

system. By definition, its density function is (the LT cR|i(s) is obtained in Section 4.1.4)

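\[
c_{R|i}(x) = \frac{P_i(x)}{\bar{P}_i}. \tag{4.28}
\]

Theorem 4.2 There is a recursive relationship between the steady-state probabilities $\bar{P}_i$ so that

\[
\bar{P}_{N-1} = \frac{N}{N-1}\cdot\frac{1 - l((N-1)\lambda)}{c((N-1)\lambda)}\,\bar{P}_N, \tag{4.29}
\]
\[
\bar{P}_i = \frac{(i+1)\bar{P}_{i+1}}{i\,c(i\lambda)}\bigl(1 - c_{R|i+1}(i\lambda)\bigr), \quad 0 < i \le N-2. \tag{4.30}
\]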

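Proof. After substituting $P_{N-2}(0) = (N-1)\lambda\bar{P}_{N-1}$ from Eq. (4.26) into Eq. (4.24) and multiplying both sides by $e^{-(N-1)\lambda x}$, we have

\[
\frac{d}{dx}\Bigl(e^{-(N-1)\lambda x}P_{N-1}(x)\Bigr) = -N\lambda e^{-(N-1)\lambda x}\bar{P}_N\,l(x) - (N-1)\lambda e^{-(N-1)\lambda x}\bar{P}_{N-1}\,c(x).
\]

Integrating both sides over [x, ∞) gives

\[
-e^{-(N-1)\lambda x}P_{N-1}(x) = -N\lambda\bar{P}_N\int_x^\infty e^{-(N-1)\lambda u}\,l(u)\,du - (N-1)\lambda\bar{P}_{N-1}\int_x^\infty e^{-(N-1)\lambda u}\,c(u)\,du. \tag{4.31}
\]

At x = 0, Eq. (4.31) is

\[
P_{N-1}(0) = N\lambda\bar{P}_N\,l((N-1)\lambda) + (N-1)\lambda\bar{P}_{N-1}\,c((N-1)\lambda).
\]

The equation above together with Eq. (4.26) for $P_{N-1}(0)$ gives Eq. (4.29).

Similarly, by multiplying both sides of Eq. (4.25) by $e^{-i\lambda x}$, and following the same steps as in the first part of the proof, we arrive at

\[
P_i(x) = e^{i\lambda x}\Bigl(\int_x^\infty \lambda e^{-i\lambda u}(i+1)P_{i+1}(u)\,du + i\lambda\bar{P}_i\int_x^\infty e^{-i\lambda u}\,c(u)\,du\Bigr). \tag{4.32}
\]

For x = 0, Eq. (4.32) is

\[
P_i(0) = \int_0^\infty \lambda e^{-i\lambda u}(i+1)P_{i+1}(u)\,du + i\lambda\bar{P}_i\int_0^\infty e^{-i\lambda u}\,c(u)\,du.
\]

Note that by the definition given in Eq. (4.28), $P_{i+1}(s) = \bar{P}_{i+1}\,c_{R|i+1}(s)$, which together with Eq. (4.26), leads us to

\[
P_i(0) = (i+1)\lambda\bar{P}_{i+1} = (i+1)\lambda\bar{P}_{i+1}\,c_{R|i+1}(i\lambda) + i\lambda\bar{P}_i\,c(i\lambda),
\]

from which Eq. (4.30) follows.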

4.1.4 The Conditional Residual Augmented Process Completion Time

In this section, we obtain the LT cR|i(s) of the conditional residual APCT given that

there are i customers out of the system.


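Theorem 4.3 There is a recursive relationship for $c_{R|i}(x)$ such that

\[
c_{R|N-1}(x) = \frac{(N-1)\lambda e^{(N-1)\lambda x}}{1 - l((N-1)\lambda)}\Bigl\{c((N-1)\lambda)\int_x^\infty e^{-(N-1)\lambda u}\,l(u)\,du
+ \bigl(1 - l((N-1)\lambda)\bigr)\int_x^\infty e^{-(N-1)\lambda u}\,c(u)\,du\Bigr\}, \tag{4.33}
\]
\[
c_{R|i}(x) = i\lambda e^{i\lambda x}\int_x^\infty e^{-i\lambda u}\Bigl(c(i\lambda)\frac{c_{R|i+1}(u)}{1 - c_{R|i+1}(i\lambda)} + c(u)\Bigr)du, \quad 0 < i \le N-2. \tag{4.34}
\]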

Proof. Eq. (4.33) follows directly by substituting Eq. (4.29) in Eq. (4.31). Eq. (4.34),

which is the same as Eq. (2) in Kerner (2008), is obtained by substituting Eq. (4.30) in

Eq. (4.32).

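And,

Theorem 4.4 There is a recursive relationship for $c_{R|i}(s)$ such that

\[
c_{R|N-1}(s) = \frac{(N-1)\lambda}{s - (N-1)\lambda}\cdot\frac{c((N-1)\lambda)\bigl(1 - l(s)\bigr) - c(s)\bigl(1 - l((N-1)\lambda)\bigr)}{1 - l((N-1)\lambda)}, \tag{4.35}
\]
\[
c_{R|i}(s) = \frac{i\lambda}{s - i\lambda}\Bigl(c(i\lambda)\frac{1 - c_{R|i+1}(s)}{1 - c_{R|i+1}(i\lambda)} - c(s)\Bigr), \quad 0 < i \le N-2, \tag{4.36}
\]
\[
c_{R|0}(s) = \frac{\bar{P}_1}{\bar{P}_0}\cdot\frac{\lambda\bigl(1 - c_{R|1}(s)\bigr)}{s}. \tag{4.37}
\]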

Proof. After multiplying both sides of Eq. (4.24) by $e^{-sx}$ and integrating, we have

\[
\int_0^\infty e^{-sx}\,dP_{N-1}(x) = (N-1)\lambda\int_0^\infty e^{-sx}P_{N-1}(x)\,dx - N\lambda\bar{P}_N\int_0^\infty e^{-sx}\,l(x)\,dx - P_{N-2}(0)\int_0^\infty e^{-sx}\,c(x)\,dx,
\]
\[
sP_{N-1}(s) - P_{N-1}(0) = (N-1)\lambda P_{N-1}(s) - N\lambda\bar{P}_N\,l(s) - P_{N-2}(0)\,c(s),
\]
\[
P_{N-1}(s) = \frac{N\lambda\bar{P}_N\bigl(1 - l(s)\bigr) - (N-1)\lambda\bar{P}_{N-1}\,c(s)}{s - (N-1)\lambda}.
\]

Note that for the last equation above, we used Eq. (4.26). After multiplying both sides of Eq. (4.29) by λ, we re-arrange it to express $N\lambda\bar{P}_N$. When this is substituted in the last equation above, we get

\[
P_{N-1}(s) = \frac{(N-1)\lambda\bar{P}_{N-1}\Bigl(c((N-1)\lambda)\bigl(1 - l(s)\bigr) - c(s)\bigl(1 - l((N-1)\lambda)\bigr)\Bigr)}{\bigl(1 - l((N-1)\lambda)\bigr)\bigl(s - (N-1)\lambda\bigr)}.
\]

Dividing the equation given above by $\bar{P}_{N-1}$ according to Eq. (4.28) gives Eq. (4.35). Similarly, Eq. (4.36) can be found by starting with Eq. (4.25) and is the same as Eq. (4) in Kerner (2008). When i = 0, multiplying both sides of Eq. (4.25) by $e^{-sx}$, integrating the results, and then using Eq. (4.28), gives

\[
P_0(s) = \frac{\lambda\bigl(\bar{P}_1 - P_1(s)\bigr)}{s} = \frac{\lambda\bar{P}_1\bigl(1 - c_{R|1}(s)\bigr)}{s}.
\]

Dividing this equation by $\bar{P}_0$ according to Eq. (4.28) gives Eq. (4.37).

The following Theorem is presented without a proof since its proof is, in principle,

the same as Theorem 2.2.2 in Kerner (2008). Kerner exploits Theorem 1 by van Doorn

and Regterschot (1988).

Theorem 4.5 The conditional residual APCT at an arrival epoch given that there are i

customers out of the system has cR|i(x) as its density function.

Recall from Section 4.1.3.1 that in steady-state an arrival finds N − i customers in the

system with probability πi. Using Theorem 4.5, the system time of such a customer is

the residual APCT of the customer on the server plus the sum of N − i PCT’s of the

customers waiting in the queue and the new arrival; this has the LT of

\[
w_i(s) = c_{R|i}(s)\,\bigl(c(s)\bigr)^{N-i}, \quad 1 \le i \le N-1.
\]

With probability $\pi_N$, the customer finds no customers in the system and its system time is L. By the law of total probability, the LT of the system time of a customer is given by

\[
w(s) = \sum_{i=1}^{N-1}\pi_i\,w_i(s) + \pi_N\,l(s).
\]

4.1.5 The Multi-class M/G/1//N Queue

In this section, we consider m finite-source populations/customer classes indexed by

k = 1, . . . , m served by a single unreliable server. Each population k consists of Nk


customers (type k customer). The times between the completion of a type k customer’s

service and the next arrival at the queueing system follow an exponential distribution

with rate λk. The actual service times of customers – in the absence of disruptions

and excluding setup times – are i.i.d. r.v.s with an LT, bk(s). The assumptions made

concerning U , D and times between the end of a server interruption/down time and the

next server interruption following an exponential distribution with rate α remain valid.

Customer classes are prioritized as class 1 to m from highest to lowest. Since preemptive-

resume priority policy is used, a class k customer can be serviced only during the periods

the server is not allocated to higher priority classes 1 to k− 1. The busy period for class

k customers and the distribution of the number of type k customers in the system can be

found from the single class M/G/1//Nk queue with an “effective” interruption rate, from

the point of view of type k customers, of $\alpha_k = \alpha + \sum_{n=1}^{k-1} N_n\lambda_n$ (to be used instead of α)

and using the LT of the busy period for type k − 1 customers from Eq. (4.17) instead

of f(s). Obviously, the LT of the busy period for type k − 1 customers can be found

by applying this procedure recursively, starting by analyzing class 2 in the M/G/1//N2

queue.
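The effective interruption rates used in this recursion are simple cumulative sums. The sketch below computes α_k for every class; the function name and the example fleet sizes and rates are assumptions for illustration. It covers only the rate α_k, not the substitution of the higher-priority busy period LT for f(s).

```python
def effective_interruption_rates(alpha, sizes, rates):
    """alpha_k = alpha + sum_{n<k} N_n * lambda_n for classes k = 1..m,
    listed from highest to lowest priority."""
    out, acc = [], alpha
    for n_k, lam_k in zip(sizes, rates):
        out.append(acc)          # rate seen by the current class
        acc += n_k * lam_k       # this class interrupts all lower classes
    return out

# Assumed example: three fleets, highest priority first.
alphas = effective_interruption_rates(0.1, [5, 10, 4], [0.2, 0.1, 0.5])
print(alphas)
```

Class 1 sees only the server's own interruption rate α, while each lower-priority class additionally treats every arrival from a higher-priority fleet as the start of an interruption.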

4.2 The ODD M/G/1//N Queue

In the ODD M/G/1//N queue, the server may experience different disruptions depending

on the state of the server, i.e., the interruption process is altered or halted when the server

becomes idle. If the failure rate and the LT of the length of down times during

the idle state change to αI and fI(s), respectively, different from those of the busy state,

only the time between busy periods and the down times that initiate a busy period are

affected. The process completion time remains unchanged. In this case, Eqs. (4.16),

(4.17), (4.18),(4.20), and (4.22) must be obtained using αI and fI(s) instead of α and

f(s), respectively.


In a special case, when αI = 0 in the single-class ODD system, a busy period can only

be initiated by a PCT, and in our derivation of the system size distribution, no residual

down time should be taken into account (i.e., DR = 0). If one is only interested in the

system size distribution, as an alternative, the PCT r.v. can be used as the service time

r.v. in an M/G/1//N queue without server interruptions, as analyzed by Gupta and

Srinivasa Rao (1996); this will be referred to as the M/PCT/1//N queue. Note that the

M/PCT/1//N queue should have the same λ and N as the original ODD M/G/1//N

queue; additionally, c(s) in Eq. (4.1) should be used instead of the service time LT in the

algorithm by Gupta and Srinivasa Rao. This way, the probability of having i customers

in the M/PCT/1//N equals that of the original ODD M/G/1//N queue.

Unlike in the M/G/1//N queue with OID, a high-priority class is not completely

independent of the lower-priority classes in the ODD M/G/1//N queue. This prevents

us from employing an approach similar to that used in Section 4.1.5 for the ODD case.

To explain the dependency of a high-priority class on the low-priority class, consider a

system with two classes. If the failure rate, or the fleet size of the low-priority class

increases, or its repair rate decreases, the amount of time the server allocates to this fleet

tends to increase. This, in turn, tends to decrease the amount of time the server stays

idle in both the OID and ODD systems. However, in the ODD M/G/1//N queue, this

also decreases (increases) the probability that the system experiences interruptions with

rate αI (α). Consequently, the performance measures for the high-priority class in the

ODD M/G/1//N queue are affected by the characteristics of the low-priority class.

4.3 The M/M/1//N Queue

In this section, we analyze the single class M/M/1//N queue. Our assumptions here

differ from those in Section 4.1 in that there is no setup time and actual service times

in the absence of disruptions are exponentially distributed with rate µ. Due to the


memoryless property of the exponential service times, the remaining service times of

interrupted customers are also exponentially distributed with rate µ. We introduce

Pi,0(t) = Pr{W (t) = i, R(t) = 0}, 0 ≤ i ≤ N,

which is the probability that there are i customers out of the queueing system and the

server is up at time t. Let

Pi,1(t, y)dy = Pr{W (t) = i, R(t) = 1, y ≤ V (t) ≤ y + dy}, 0 ≤ i ≤ N,

be the probability that there are i customers out of the queueing system, the server is

down at time t, and the length of time since the server went down is in the interval

[y, y + dy].

By considering the transitions between states at time t, we have

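\begin{align*}
\frac{d}{dt}P_{N,0}(t) &= -(N\lambda+\alpha)P_{N,0}(t) + \mu P_{N-1,0}(t) + \int_0^\infty P_{N,1}(t,y)\,\beta(y)\,dy, \tag{4.38}\\
\frac{d}{dt}P_{i,0}(t) &= -(i\lambda+\mu+\alpha)P_{i,0}(t) + (i+1)\lambda P_{i+1,0}(t) + \mu P_{i-1,0}(t) \\
&\quad + \int_0^\infty P_{i,1}(t,y)\,\beta(y)\,dy, \qquad 0 \le i \le N-1, \tag{4.39}
\end{align*}

and

\begin{align*}
\left(\frac{\partial}{\partial t}+\frac{\partial}{\partial y}+N\lambda+\beta(y)\right)P_{N,1}(t,y) &= 0, \tag{4.40}\\
\left(\frac{\partial}{\partial t}+\frac{\partial}{\partial y}\right)P_{i,1}(t,y) &= -(i\lambda+\beta(y))P_{i,1}(t,y) + (i+1)\lambda P_{i+1,1}(t,y), \qquad 0 \le i \le N-1. \tag{4.41}
\end{align*}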

Letting Pi,0 = limt→∞ Pi,0(t) and Pi,1(y) = limt→∞ Pi,1(t, y) for 0 ≤ i ≤ N , if we take

the limit as t → ∞ in Eqs. (4.38)-(4.41), we obtain

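\begin{align*}
(N\lambda+\alpha)P_{N,0} &= \mu P_{N-1,0} + \int_0^\infty P_{N,1}(y)\,\beta(y)\,dy, \tag{4.42}\\
(i\lambda+\mu+\alpha)P_{i,0} &= (i+1)\lambda P_{i+1,0} + \mu P_{i-1,0} + \int_0^\infty P_{i,1}(y)\,\beta(y)\,dy, \qquad 0 \le i \le N-1, \tag{4.43}\\
\frac{d}{dy}P_{N,1}(y) &= -(N\lambda+\beta(y))P_{N,1}(y), \tag{4.44}\\
\frac{d}{dy}P_{i,1}(y) &= -(i\lambda+\beta(y))P_{i,1}(y) + (i+1)\lambda P_{i+1,1}(y), \qquad 0 \le i \le N-1, \tag{4.45}
\end{align*}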


and the boundary equation is

Pi,1(0) = αPi,0, 0 ≤ i ≤ N. (4.46)

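We introduce the following quantities, which are used in the theorems that follow:

\begin{align*}
Q_{N-1} &= \frac{N\lambda+\alpha-\alpha f(N\lambda)}{\mu}, \tag{4.47}\\
Q_{i-1} &= \frac{(i\lambda+\mu+\alpha)Q_i-(i+1)\lambda Q_{i+1}-\alpha\sum_{j=i}^{N}Q_j\zeta_{i,j}}{\mu}, \qquad 1\le i\le N-1, \tag{4.48}\\
\zeta_{i,i} &= f(i\lambda), \qquad 0\le i\le N, \tag{4.49}\\
\zeta_{i,j} &= \frac{j}{j-i}\,\zeta_{i,j-1}-\frac{i+1}{j-i}\,\zeta_{i+1,j}, \qquad 0\le i<j\le N, \tag{4.50}\\
D_i &= D_{i+1}-\sum_{j=i}^{N}Q_j\zeta_{i,j}+Q_i, \qquad 1\le i\le N-1. \tag{4.51}
\end{align*}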

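Theorem 4.6 The steady-state probability that there are i customers out of the system is

\begin{align*}
P_{i,0} &= \frac{P_{N,1}(0)}{\alpha}\,Q_i, \qquad 1\le i\le N-1, \tag{4.52}\\
P_{i,1} &= \frac{P_{N,1}(0)}{i\lambda}\,D_i, \qquad 1\le i\le N-1. \tag{4.53}
\end{align*}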

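Proof. If we divide both sides of Eq. (4.44) by $e^{-N\lambda y-\int_0^y\beta(x)\,dx}P_{N,1}(0)$ and Eq. (4.45) by $e^{-i\lambda y-\int_0^y\beta(x)\,dx}P_{N,1}(0)$, we get

\begin{align*}
\frac{d}{dy}\left(\frac{e^{N\lambda y+\int_0^y\beta(x)\,dx}\,P_{N,1}(y)}{P_{N,1}(0)}\right) &= 0, \tag{4.54}\\
\frac{d}{dy}\left(\frac{e^{i\lambda y+\int_0^y\beta(x)\,dx}\,P_{i,1}(y)}{P_{N,1}(0)}\right) &= \frac{(i+1)\lambda\,e^{i\lambda y+\int_0^y\beta(x)\,dx}\,P_{i+1,1}(y)}{P_{N,1}(0)}, \qquad 0\le i\le N-1, \tag{4.55}
\end{align*}

which are first-order differential equations. Next, recalling that $F(y)=e^{-\int_0^y\beta(x)\,dx}$, we define

\[ Q_i(y)=\frac{P_{i,1}(y)}{e^{-i\lambda y}F(y)P_{N,1}(0)}, \qquad 0\le i\le N, \tag{4.56} \]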


and solve Eqs. (4.54) and (4.55) as

\begin{align*}
Q_N(y) &= 1, \tag{4.57}\\
Q_i(y) &= Q_i(0)+(i+1)\lambda\int_0^y Q_{i+1}(x)e^{-\lambda x}\,dx, \qquad 0\le i\le N-1. \tag{4.58}
\end{align*}

Considering the definition given in Eq. (4.56), and employing Eqs. (4.42), (4.43) and

(4.46), we obtain

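\begin{align*}
Q_{N-1}(0) &= \frac{N\lambda+\alpha-\alpha\int_0^\infty Q_N(y)e^{-N\lambda y}f(y)\,dy}{\mu}, \tag{4.59}\\
Q_{i-1}(0) &= \frac{(i\lambda+\mu+\alpha)Q_i(0)-(i+1)\lambda Q_{i+1}(0)-\alpha\int_0^\infty Q_i(y)e^{-i\lambda y}f(y)\,dy}{\mu}, \qquad 1\le i\le N-1. \tag{4.60}
\end{align*}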

For simplicity, we define

\begin{align*}
Q_i &= Q_i(0), \tag{4.61}\\
B_i &= \int_0^\infty Q_i(y)e^{-i\lambda y}f(y)\,dy. \tag{4.62}
\end{align*}

For $Q_i$ and $B_i$ to be finite, we have to show that $Q_i(y)$ is finite for all $i = 0, \ldots, N$, which is proved in the following lemma.

Lemma 4.3 limy→∞Qi(y) = Qi(∞) exists and is finite. We also have Qi(y) ≤ Qi(∞).

Proof. From Eq. (4.56), Qi(y) ≥ 0 and from Eq. (4.58), we see that Qi(y) is increasing

in y. Let Qi(∞) = limy→∞ Qi(y). Then, Qi(y) ≤ Qi(∞), 0 ≤ i ≤ N − 1. If we take the

limit as y → ∞ in Eq. (4.58),

Qi (∞) ≤ Qi (0) + (i+ 1)Qi+1 (∞) , 0 ≤ i ≤ N − 1.

Starting with QN(∞) = 1 (due to Eq. 4.57) and using induction from the above equation,

we see that Qi(∞) is finite for all i = 0, · · · , N .

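Let $\Phi_i(s)=\int_0^\infty Q_i(y)e^{-sy}\,dy$ be the LT of the function $Q_i(y)$. In this case, the LTs of $Q_N(y)$ and $Q_i(y)$ from Eqs. (4.57) and (4.58) will be

\begin{align*}
\Phi_N(s) &= \frac{1}{s}, \tag{4.63}\\
\Phi_i(s) &= \frac{1}{s}\,Q_i+(i+1)\frac{\lambda}{s}\,\Phi_{i+1}(\lambda+s), \qquad 0\le i\le N-1. \tag{4.64}
\end{align*}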


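Starting from Eq. (4.63) and using the recursive formula in Eq. (4.64), we establish

\[ \Phi_i(s)=\sum_{j=i}^{N}\binom{j}{i}\frac{(j-i)!\,\lambda^{j-i}}{s(\lambda+s)\cdots((j-i)\lambda+s)}\,Q_j, \qquad 0\le i\le N-1. \tag{4.65} \]

Using

\[ \frac{k!\,\lambda^{k}}{s(\lambda+s)\cdots(k\lambda+s)}=\sum_{j=0}^{k}(-1)^{j}\binom{k}{j}\frac{1}{j\lambda+s}, \]

Eq. (4.65) can be rewritten as

\[ \Phi_i(s)=\sum_{j=i}^{N}\binom{j}{i}Q_j\sum_{l=0}^{j-i}(-1)^{l}\binom{j-i}{l}\frac{1}{l\lambda+s}, \qquad 0\le i\le N-1. \tag{4.66} \]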

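Observe that $(l\lambda+s)^{-1}$ on the right-hand side of Eq. (4.66) is the LT of $e^{-l\lambda y}$. Using this, when we invert $\Phi_i(s)$, we obtain

\begin{align*}
Q_i(y) &= \sum_{j=i}^{N}\binom{j}{i}Q_j\sum_{l=0}^{j-i}(-1)^{l}\binom{j-i}{l}e^{-l\lambda y} \\
&= \sum_{j=i}^{N}\binom{j}{i}Q_j\sum_{l=0}^{j-i}\binom{j-i}{l}\left(-e^{-\lambda y}\right)^{l} \\
&= \sum_{j=i}^{N}\binom{j}{i}Q_j\left(1-e^{-\lambda y}\right)^{j-i}, \qquad 0\le i\le N-1. \tag{4.67}
\end{align*}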

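Substituting Eq. (4.67) in Eq. (4.62), we have

\[ B_i=\sum_{j=i}^{N}Q_j\zeta_{i,j}, \qquad 0\le i\le N, \tag{4.68} \]

where

\[ \zeta_{i,j}=\binom{j}{i}\int_0^\infty\left(1-e^{-\lambda y}\right)^{j-i}e^{-i\lambda y}f(y)\,dy, \qquad j\ge i. \tag{4.69} \]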

This leads to Eqs. (4.49) and (4.50). Together with Eq. (4.68) as defined in Eq. (4.62),

Eq. (4.61) gives Eqs. (4.47) and (4.48).

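We define $D_i = i\lambda\int_0^\infty \left(P_{i,1}(y)/P_{N,1}(0)\right)dy$. Noting from Eq. (4.58) that $dQ_i(y) = (i+1)\lambda Q_{i+1}(y)e^{-\lambda y}\,dy$, if we rewrite Eq. (4.62) as $B_i = -\int_0^\infty Q_i(y)e^{-i\lambda y}\,dF(y)$, integration by parts yields

\[ B_i = Q_i(0) + (i+1)\lambda\int_0^\infty Q_{i+1}(y)e^{-(i+1)\lambda y}F(y)\,dy - i\lambda\int_0^\infty Q_i(y)e^{-i\lambda y}F(y)\,dy. \]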


Considering Eq. (4.56) for Di, the above given equation gives Eqs. (4.51) and (4.53).

With Eq. (4.46) and the definitions given in Eqs. (4.56) and (4.61), we obtain Eq. (4.52).
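
Eq. (4.51) is stated for 1 ≤ i ≤ N − 1, so the recursion needs a starting value for D_N. The following is a sketch of that base case, not a statement from the original derivation. It assumes that F(y) is the survival function of the down time D (as in F(y) = e^{−∫β}) and that f(s) in Eqs. (4.47) and (4.49) denotes the LT of the down-time density. With Q_N(y) = 1 from Eq. (4.57), the definition of D_i gives

\[
D_N = N\lambda\int_0^\infty e^{-N\lambda y}F(y)\,dy
    = N\lambda\cdot\frac{1-f(N\lambda)}{N\lambda}
    = 1-f(N\lambda),
\]

where the middle step uses $\int_0^\infty e^{-sy}F(y)\,dy = (1-f(s))/s$. The same value follows from applying the form of Eq. (4.51) at i = N with the convention D_{N+1} = 0.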

Corollary 4.3 The probability density function of having i customers out of the system and an elapsed down time of y is

\[ P_{i,1}(y)=e^{-i\lambda y}F(y)P_{N,1}(0)\sum_{j=i}^{N}\binom{j}{i}Q_j\left(1-e^{-\lambda y}\right)^{j-i}, \qquad 0\le i\le N. \tag{4.70} \]

Proof. Substituting Eq. (4.56) in Eq. (4.67), we arrive at Eq. (4.70).

Theorem 4.7 The probability that there are no customers in the system when the server is up is

\[ P_{N,0}=\frac{P_{N,1}(0)}{\alpha}=\left((1+\alpha E[D])\sum_{i=0}^{N}Q_i\right)^{-1}. \tag{4.71} \]

Proof. By definition, $\sum_{i=0}^{N}P_i=\sum_{i=0}^{N}\left(P_{i,0}+\int_0^\infty P_{i,1}(y)\,dy\right)=1$, which, by using Eq. (4.46), becomes $\sum_{i=0}^{N}\left(P_{i,1}(0)/\alpha+\int_0^\infty P_{i,1}(y)\,dy\right)=1$. If we divide this equation by $P_{N,1}(0)$, we have

\[ P_{N,1}(0)=\left(\frac{1}{\alpha}S_N(0)+\int_0^\infty S_N(y)\,dy\right)^{-1}, \tag{4.72} \]

where $S_N(y)=\sum_{i=0}^{N}P_{i,1}(y)/P_{N,1}(0)$.

Summing up Eqs. (4.44) and (4.45), we obtain the first-order differential equation

\[ \frac{d}{dy}S_N(y)=-\beta(y)S_N(y), \]

whose solution is $S_N(y)=S_N(0)e^{-\int_0^y\beta(x)\,dx}=S_N(0)F(y)$. Substituting this in Eq. (4.72) and using the fact that $E[D]=\int_0^\infty F(y)\,dy$ gives us

\[ P_{N,1}(0)=\left(S_N(0)\left(\frac{1}{\alpha}+E[D]\right)\right)^{-1}. \]

Considering Eqs. (4.61) and (4.56), $S_N(0)=\sum_{i=0}^{N}Q_i$; after substituting it in the equation given above, and using the boundary condition in Eq. (4.46), we obtain Eq. (4.71).
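
The recursions in Eqs. (4.47)-(4.50), together with Eqs. (4.52) and (4.71), translate directly into a short numerical routine. The following Python sketch is illustrative only: the function name `up_state_probabilities`, its argument names, and the exponential down-time example are assumptions made here, not part of the original analysis. It takes the LT of the down-time density, `lt_f`, and the mean down time E[D] as inputs, and it applies P_{i,0} = P_{N,0} Q_i for all 0 ≤ i ≤ N, which follows from Eqs. (4.46), (4.56), and (4.61).

```python
def up_state_probabilities(N, lam, mu, alpha, lt_f, mean_down):
    """Return [P_{0,0}, ..., P_{N,0}] for N >= 1 via Eqs. (4.47)-(4.50), (4.71).

    lt_f(s) is the Laplace transform of the down-time density evaluated at s,
    and mean_down is E[D]; both are supplied by the caller (assumed inputs).
    """
    # zeta[i][j] for 0 <= i <= j <= N: the diagonal comes from Eq. (4.49),
    # the rest from Eq. (4.50), filled in order of increasing d = j - i.
    zeta = [[0.0] * (N + 1) for _ in range(N + 1)]
    for i in range(N + 1):
        zeta[i][i] = lt_f(i * lam)
    for d in range(1, N + 1):
        for i in range(N - d + 1):
            j = i + d
            zeta[i][j] = (j / d) * zeta[i][j - 1] - ((i + 1) / d) * zeta[i + 1][j]

    # Q_N = 1 by Eq. (4.57); Q_{N-1} from Eq. (4.47); the rest from Eq. (4.48).
    Q = [0.0] * (N + 1)
    Q[N] = 1.0
    Q[N - 1] = (N * lam + alpha - alpha * lt_f(N * lam)) / mu
    for i in range(N - 1, 0, -1):
        B_i = sum(Q[j] * zeta[i][j] for j in range(i, N + 1))  # Eq. (4.68)
        Q[i - 1] = ((i * lam + mu + alpha) * Q[i]
                    - (i + 1) * lam * Q[i + 1] - alpha * B_i) / mu

    p_N0 = 1.0 / ((1.0 + alpha * mean_down) * sum(Q))  # Eq. (4.71)
    return [p_N0 * q for q in Q]


# Illustrative example: exponential down times with rate beta (an assumption).
beta = 0.5
probs = up_state_probabilities(
    N=3, lam=0.1, mu=1.0, alpha=0.05,
    lt_f=lambda s: beta / (beta + s), mean_down=1.0 / beta)
print(probs)
```

With exponential down times the model reduces to an ordinary finite CTMC, so the output can be checked against the balance equations of that chain. The alternating signs in Eq. (4.50) may cause cancellation for large N, so this sketch is best suited to moderate fleet sizes.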


4.3.1 The Multi-class M/M/1//N Queue

If the single unreliable server attends to multiple finite-source populations as described

in Section 4.1.5, with the difference of no setup time and type k customers requiring

exponential service times with rate µk, the analysis presented above can be used only

for class 1 customers. For type k > 1 customers, we need to obtain the LT of the busy

period of type k − 1 customers; this will be used as the LT of the down time r.v. in the

M/M/1//Nk queue with the αk given in Section 4.1.5 as the effective interruption rate.

The result of the busy period analysis from Section 4.1.2 can be used for this purpose.


In this section, we consider a single unreliable server that is already attending to a

group of (class 1) customers. During the times the server is available and there are no

class 1 customers in the system, instead of keeping it idle, the server can be assigned

to serve a secondary group of (class 2) customers. Recall that class 1 customers have

preemptive-priority, and their service level is unaffected by the presence and underlying

characteristics of class 2 customers. Our main focus in this section is on class 2 customers.

Let P k,i be the probability that i customers of type k are out of the system. Observe

that P k,i is nothing but P i in the single class M/G/1//Nk queue for which the inter-

ruption rate and the LT of the server down time/OFF period are chosen as described in

Section 4.1.5. We use P 2,N2 as the service level. The queueing system may represent a

repair shop (the repair crew or resources of which can become unavailable from time to

time), and its type k customers may be broken machines from system k. Then, Nk is the

number of machines that system k sends to the repair shop when they fail. If the repair

shop is a profit center, one would like to serve more type k customers (higher Nk). In

return, system k may require P k,Nk, i.e., the proportion of time that all Nk machines are

functional (out of the repair shop), to be high. In other words, a higher P 2,N2 indicates a


higher service level provided to class 2 customers. Therefore, we obtain the maximum

number of class 2 customers (N2) the server can serve while keeping P 2,N2 above certain

targeted levels. We design our numerical experiments to explore how the customer arrival

rates (λ1, λ2), the variability of interruption (D) and service time r.v.s for class 1 and

class 2 customers affect the maximum N2 that the server can handle for a given service

level.
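
The search for the maximum N2 described above can be sketched as a simple scan. In the Python sketch below, `max_customers` and `toy_level` are hypothetical names introduced for illustration; `toy_level` is a placeholder curve, not the queueing model of this chapter, and would be replaced by a routine that returns P 2,N2 for a given N2.

```python
def max_customers(service_level, target, n_max=40):
    """Largest n in 1..n_max with service_level(n) >= target (0 if none).

    The scan checks every n rather than stopping at the first failure,
    so it does not rely on the service level being monotone in n.
    """
    best = 0
    for n in range(1, n_max + 1):
        if service_level(n) >= target:
            best = n
    return best


def toy_level(n):
    # Placeholder service-level curve used only to exercise the search.
    return 0.95 ** n


print(max_customers(toy_level, target=0.7))
```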

In all the examples, the server becomes unavailable from time to time with rate

α = 0.05. We fix N1 = 10 when we focus on class 2 customers (we consider additional N1

values when we explore the performance of class 1 customers), and assume that there is no

setup time, i.e., U = 0. We choose class 1 and class 2 service time distributions from the

following four distributions, each having a mean of 1, but a different squared-coefficient

of variation, c2S (variance to squared-mean ratio):

1. The 2-stage hyperexponential distribution (H2(a = 0.9, µ1 = 20, µ2 = 0.1047)) with $c_S^2 = 17.245$ and density

\[ b(x) = a\mu_1 e^{-\mu_1 x} + (1-a)\mu_2 e^{-\mu_2 x}. \]

2. The gamma distribution (Gamma(µ = 0.2, k = 0.2)) with $c_S^2 = 5$ and density

\[ b(x) = \frac{\mu(\mu x)^{k-1}e^{-\mu x}}{\Gamma(k)}. \]

3. The exponential distribution (Exponential(µ = 1)) with $c_S^2 = 1$ and density

\[ b(x) = \mu e^{-\mu x}. \]

4. The 5-stage Erlang distribution (Erlang(µ = 5, k = 5)) with $c_S^2 = 0.2$, which is equivalent to Gamma(5, 5).
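
The stated means and squared coefficients of variation can be checked from the standard moment formulas. This is a minimal Python sketch; the helper names `h2_moments` and `gamma_moments` are introduced here for illustration, and the exponential and Erlang cases are treated as gamma distributions with shape 1 and 5.

```python
def h2_moments(a, mu1, mu2):
    """Mean and squared coefficient of variation of a 2-stage hyperexponential."""
    mean = a / mu1 + (1 - a) / mu2
    second_moment = 2 * (a / mu1 ** 2 + (1 - a) / mu2 ** 2)
    return mean, (second_moment - mean ** 2) / mean ** 2


def gamma_moments(rate, shape):
    """Mean and squared coefficient of variation of Gamma(rate, shape)."""
    return shape / rate, 1.0 / shape


service = {
    "H2": h2_moments(0.9, 20.0, 0.1047),
    "Gamma": gamma_moments(0.2, 0.2),
    "Exponential": gamma_moments(1.0, 1.0),
    "Erlang": gamma_moments(5.0, 5.0),
}
for name, (mean, scv) in service.items():
    print(f"{name}: mean={mean:.4f}, scv={scv:.3f}")
```

The H2 mean is only approximately 1 because µ2 = 0.1047 is a rounded parameter.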

We choose the distribution of the down time r.v. D (let its squared-coefficient of variation

be denoted by c2D) from the following four distributions, each having a mean of E[D] = 2:


H2(0.9,10,0.05236) with c2D = 17.245, Gamma(0.1, 0.2) with c2D = 5, Exponential(0.5)

with c2D = 1, and Erlang(2.5, 5) with c2D = 0.2. For each problem, we use the label

“Dist1/Dist2/Dist3” for the distributions of the interruption and service time r.v.s for

classes 1 and 2, respectively. For example, Erlang/Gamma/H2 refers to the interruption

time following Erlang(2.5, 5), class 1 service time following Gamma(0.2,0.2), and class

2 service time following H2(0.9, 20, 0.1047) distributions. Additionally, we consider

λ1 ∈ {0.01, 0.05}, λ2 ∈ {0.01, 0.02, 0.05, 0.1}, and increment N2 in steps of 1 from 1 to

40 and compute P k,Nk for classes k = 1, 2. We start our numerical analysis in the next

section by considering the impact of customer arrival rates.

4.4.1 The Impact of Customer Arrival Rates

By intuition, one can foresee the degrading impact of a higher arrival rate for any of the

classes on the class 2 service level. Figure 4.1 presents four curves corresponding to four

λ2 values considered for the Gamma/Gamma/Gamma case when λ1 = 0.01. For other

combinations of interruption and service time distributions, we observe similar curves.

For low λ2 values (e.g., 0.01, 0.02), P 2,N2 decreases almost linearly with increasing N2 up

to a critical N2 value beyond which it sharply drops to 0. In contrast, for higher λ2 values,

the degradation of the service level for class 2 displays almost an exponential decay. The

increase in failure rate rapidly reduces the maximum number of class 2 customers that

can be served if the targeted service level is high. In Figure 4.1, we see that at most two

type 2 customers can be served when λ2 = 0.1 as compared to 19 when λ2 = 0.01 if one

aims to have P 2,N2 ≥ 0.7.

Figure 4.2 is like Figure 4.1 except that class 1 customers have a higher arrival rate,

λ1 = 0.05. Here, the server capacity remaining for class 2 reduces so sharply that not

a single class 2 customer can be served when λ2 = 0.1 if the target service level is

P 2,N2 ≥ 0.7. Higher λ1 causes the service level for class 2 to degrade faster too.

[Figure 4.1: Gamma/Gamma/Gamma case: Service level (P 2,N2) for class 2 customers when λ1 = 0.01 for different λ2 values. Horizontal axis: N2 (0 to 40); vertical axis: service level (0 to 1); curves for λ2 = 0.01, 0.02, 0.05, and 0.1; reference line at 0.7.]

[Figure 4.2: Gamma/Gamma/Gamma case: Service level (P 2,N2) for class 2 customers when λ1 = 0.05 for different λ2 values. Horizontal axis: N2 (0 to 40); vertical axis: service level (0 to 1); curves for λ2 = 0.01, 0.02, 0.05, and 0.1; reference line at 0.7.]

When it comes to the impact of variability in the interruption and service time r.v.s, one might expect that higher variability in any of these would decrease the maximum number of type 2 customers that can be served, given that the service level should exceed a targeted level. However, the following sections make several counterintuitive observations.

4.4.2 The Impact of Class 2 Service Time Distribution

We start by observing the impact of class 2 service time variability on the service level.

We choose the same distribution type for the interruption and class 1 service time r.v.s.,

and form a total of 32 groups of examples (4 different class 1 service time (interruption)


distributions × 2 different λ1 values × 4 different λ2 values). In each group of examples,

for each one of the four distributions assumed for class 2 service time, we find the max-

imum N2’s for five target levels for P 2,N2 . In none of the example groups does lowering

class 2 service time variability increase the maximum N2 that can be served. The max-

imum N2 either remains unchanged or becomes lower with less variance. Table 4.1 lists

two examples.

Table 4.1: The maximum N2 that can be served when class 2 service time distribution changes and λ1 = 0.01, λ2 = 0.02.

| Case | P 2,N2 ≥ 0.9 | P 2,N2 ≥ 0.8 | P 2,N2 ≥ 0.7 | P 2,N2 ≥ 0.5 | P 2,N2 ≥ 0.2 |
|---|---|---|---|---|---|
| Gamma/Gamma/H2 | 2 | 6 | 10 | 20 | 34 |
| Gamma/Gamma/Gamma | 2 | 5 | 9 | 19 | 33 |
| Gamma/Gamma/Expo | 2 | 5 | 9 | 18 | 33 |
| Gamma/Gamma/Erlang | 2 | 5 | 9 | 18 | 33 |
| Expo/Expo/H2 | 3 | 7 | 11 | 21 | 34 |
| Expo/Expo/Gamma | 3 | 7 | 11 | 20 | 33 |
| Expo/Expo/Expo | 3 | 7 | 10 | 19 | 33 |
| Expo/Expo/Erlang | 3 | 7 | 10 | 19 | 32 |

For the examples presented in Table 4.1, we can also compute the average number of class 2 customers out of the system as $E[N_2^O] = \sum_{i=0}^{N_2} iP_{2,i}$; these are listed in Table 4.2. Note that $E[N_2^O]$ can be used as a secondary service level measure. It may be a more important service level than P 2,N2; for example, if class 2 customers are machines at a plant that fail from time to time, and are repaired by the unreliable server in our model, higher $E[N_2^O]$ corresponds to a higher production rate. If we compare Tables 4.1 and 4.2, we see that when N2 remains the same, lower service time variance increases $E[N_2^O]$. In all 32 groups of examples, we observed that when N2 was fixed, lower variance in service time for class 2 customers maximized $E[N_2^O]$ (due to space limitations, we do not present them all here). However, further observations to be discussed in Section 4.4.3 hint that this may not be true all the time, i.e., more variable service times may increase $E[N_2^O]$ even when N2 is fixed.

Table 4.2: Average number of class 2 customers out of the system ($E[N_2^O]$) when class 2 service time distribution changes and λ1 = 0.01, λ2 = 0.02.

| Case | P 2,N2 ≥ 0.9 | P 2,N2 ≥ 0.8 | P 2,N2 ≥ 0.7 | P 2,N2 ≥ 0.5 | P 2,N2 ≥ 0.2 |
|---|---|---|---|---|---|
| Gamma/Gamma/H2 | 1.9070 | 5.6132 | 9.1639 | 17.2691 | 20.7502 |
| Gamma/Gamma/Gamma | 1.9120 | 4.7509 | 8.4734 | 17.3614 | 27.4903 |
| Gamma/Gamma/Expo | 1.9139 | 4.7709 | 8.5491 | 16.8653 | 27.7362 |
| Gamma/Gamma/Erlang | 1.9143 | 4.7752 | 8.5660 | 16.9520 | 28.6757 |
| Expo/Expo/H2 | 2.8814 | 6.5951 | 10.1484 | 18.2319 | 19.5825 |
| Expo/Expo/Gamma | 2.8970 | 6.7059 | 10.4401 | 18.4700 | 27.1411 |
| Expo/Expo/Expo | 2.9031 | 6.7511 | 9.6156 | 18.0445 | 27.4068 |
| Expo/Expo/Erlang | 2.9044 | 6.7611 | 9.6385 | 18.1517 | 30.1370 |

If different types of class 2 service time distributions yield different maximum N2

values, things become more complicated. For instance, for P 2,N2 ≥ 0.5, the cases with

Gamma class 2 service time outperform the cases in their respective groups with the

less variable exponential and Erlang distributions: more specifically, one more class 2

customer is served and a higher E[NO2 ] is achieved when class 2 service times are gamma

r.v.s.

These observations force us to ask two questions regarding the impact of class 1 service

time distribution. First, does its variability have a similar effect on class 1 customers

as the variability of class 2 service time distribution has on the performance of class 2

customers? Second, how are the class 2 customer service levels affected by the high-

priority service time variability? We address these questions in the next section.


4.4.3 The Impact of Class 1 Service Time Distribution

The variability of class 1 service time can have an impact on both classes. Although

our focus is on class 2 service levels, we start by observing how class 1 service levels are

affected. For N1 = 10, we have seen that the average number of class 1 customers out

of the system (E[NO1 ]) increases as the variability of class 1 service time decreases. As

in Section 4.4.2, we first try to determine the maximum N1 that the unreliable server

can serve given a target level for P 1,N1 . We form 8 groups of examples (4 different

interruption distributions × 2 different λ1 values). In each group of examples, for each

one of the four distributions assumed for class 1 service time, we find the maximum N1’s

for five target levels for P 1,N1 . In none of the example groups does lowering class 1 service

time variability increase the maximum N1 that can be served. The maximum N1 either

remains unchanged or becomes lower with less variance. Table 4.3 lists two examples.

These observations for class 1 and its service time r.v. are the same as those for class 2

and its service time r.v. discussed in Section 4.4.2.


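Table 4.3: The maximum N1 that can be served when class 1 service time distribution changes and λ1 = 0.01.

| Case | P 1,N1 ≥ 0.9 | P 1,N1 ≥ 0.8 | P 1,N1 ≥ 0.7 | P 1,N1 ≥ 0.5 | P 1,N1 ≥ 0.2 |
|---|---|---|---|---|---|
| H2/H2 | 5 | 13 | 23 | 34 | 36 |
| H2/Gamma | 5 | 13 | 23 | 34 | 35 |
| H2/Expo | 5 | 13 | 23 | 34 | 35 |
| H2/Erlang | 5 | 13 | 23 | 34 | 35 |
| Gamma/H2 | 7 | 15 | 24 | 34 | 35 |
| Gamma/Gamma | 7 | 15 | 24 | 34 | 35 |
| Gamma/Expo | 7 | 15 | 24 | 33 | 35 |
| Gamma/Erlang | 7 | 15 | 24 | 33 | 35 |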

For the examples presented in Table 4.3, we compute the average number of class 1 customers out of the system, $E[N_1^O] = \sum_{i=0}^{N_1} iP_{1,i}$, and present them in Table 4.4. If we compare Tables 4.3 and 4.4, we see that in most of the cases when N1 is the same, lower class 1 service time variance increases $E[N_1^O]$. However, there are exceptions. When N1 = 34 (if we ignore the H2/H2 case) and N1 = 35, higher class 1 service variability increases $E[N_1^O]$. In summary, if higher service time variability increases the expected number of customers out of the system, then, unlike in the traditional M/G/1 queue with constant arrival rates, it will simultaneously decrease the expected queue lengths.

Table 4.4: Average number of class 1 customers out of the system ($E[N_1^O]$) when class 1 service time distribution changes and λ1 = 0.01.

| Case | P 1,N1 ≥ 0.9 | P 1,N1 ≥ 0.8 | P 1,N1 ≥ 0.7 | P 1,N1 ≥ 0.5 | P 1,N1 ≥ 0.2 |
|---|---|---|---|---|---|
| H2/H2 | 4.8564 | 12.5075 | 21.8308 | 29.4580 | 13.1906 |
| H2/Gamma | 4.8686 | 12.6074 | 22.1721 | 29.5487 | 20.6607 |
| H2/Expo | 4.8730 | 12.6439 | 22.3015 | 28.9600 | 18.2805 |
| H2/Erlang | 4.8739 | 12.6515 | 22.3291 | 27.8858 | 17.8359 |
| Gamma/H2 | 6.8453 | 14.5303 | 22.9672 | 28.6312 | 21.6171 |
| Gamma/Gamma | 6.8722 | 14.6712 | 23.3565 | 28.3546 | 18.6321 |
| Gamma/Expo | 6.8819 | 14.7231 | 23.5053 | 31.3935 | 16.7175 |
| Gamma/Erlang | 6.8839 | 14.7341 | 23.5372 | 31.5300 | 15.3754 |

Now we examine how class 1 service time distribution affects class 2 customers. By

choosing the same type of distributions for the interruption and class 2 service time r.v.s,

we generate 32 groups of examples (4 class 2 service time (interruption) distributions ×

2 different λ1 values × 4 different λ2 values). In each group of examples, for each one of

the four distributions assumed for class 1 service time, we find the maximum N2’s for five

target levels for P 2,N2 . Table 4.5 presents two groups of examples in which we see that

lower class 1 service time variability tends to increase the maximum N2. With higher

arrival rates, this improvement vanishes.



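Table 4.5: The maximum N2 that can be served when class 1 service time distribution changes and λ1 = λ2 = 0.01.

| Case | P 2,N2 ≥ 0.9 | P 2,N2 ≥ 0.8 | P 2,N2 ≥ 0.7 | P 2,N2 ≥ 0.5 | P 2,N2 ≥ 0.2 |
|---|---|---|---|---|---|
| Gamma/H2/Gamma | 4 | 10 | 17 | 33 | 35 |
| Gamma/Gamma/Gamma | 5 | 11 | 19 | 33 | 35 |
| Gamma/Expo/Gamma | 5 | 12 | 20 | 33 | 35 |
| Gamma/Erlang/Gamma | 5 | 12 | 20 | 33 | 35 |
| Expo/H2/Expo | 4 | 11 | 18 | 33 | 34 |
| Expo/Gamma/Expo | 5 | 12 | 20 | 33 | 34 |
| Expo/Expo/Expo | 6 | 13 | 21 | 33 | 34 |
| Expo/Erlang/Expo | 6 | 14 | 21 | 33 | 34 |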

In 13 groups of examples, for P 2,N2 ≥ 0.2, we observe that higher variance in class 1

service time leads to better performance. Table 4.6 presents two examples where we see

that lowering class 1 service time variability is not helpful to serve more class 2 customers.


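Table 4.6: The maximum N2 that can be served when class 1 service time distribution changes.

| (λ1, λ2) | Case | Max N2 |
|---|---|---|
| (0.05, 0.05) | Erlang/H2/Erlang | 10 |
| (0.05, 0.05) | Erlang/Gamma/Erlang | 9 |
| (0.05, 0.05) | Erlang/Expo/Erlang | 8 |
| (0.05, 0.05) | Erlang/Erlang/Erlang | 8 |
| (0.05, 0.1) | H2/H2/H2 | 8 |
| (0.05, 0.1) | H2/Gamma/H2 | 7 |
| (0.05, 0.1) | H2/Expo/H2 | 7 |
| (0.05, 0.1) | H2/Erlang/H2 | 7 |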


Since class 2 customers perceive class 1 service times plus the original interruption

time as the effective interruption time (see Section 4.1.5), we expect that the original

interruption time distribution will have an effect on class 2 customers that is similar to

the class 1 service time distribution. This is discussed in the next section.

4.4.4 The Impact of Interruption Time Distribution

To see the impact of the variability in the interruption time (server down time/OFF

period) r.v. on class 2 service levels, we choose the same service time distributions for

both classes and form 32 groups of examples (4 class 1 (class 2) service time distributions

× 2 different λ1 values × 4 different λ2 values). In each group of examples, for each one of

the four distributions assumed for the interruption time r.v., we find the maximum N2’s

for five target levels for P 2,N2 . In most of the example groups, we observe that lowering

interruption time variability enables the system to serve more class 2 customers. Table

4.7 presents two groups of examples in which we see a benefit of less variable interruption

time distribution. In examples that we do not present here due to space limitations, we

observe that higher customer arrival rates for both classes leave less room for performance

improvement through reducing interruption time variability. In Table 4.7, for

P 2,N2 ≥ 0.7, the system can serve three more class 2 customers by having interruption

times follow an Erlang distribution instead of H2; when arrival rates increase, the gain

shrinks to one or no additional class 2 customers.

Similar to the discussion on the impact of class 1 service time distribution in Section

4.4.3, in 5 groups of examples, when P 2,N2 ≥ 0.2, we observe that less variance can

degrade the performance. Table 4.8 lists two groups of examples in which less variance

in interruption times lowers the number of class 2 customers that can be served.


Table 4.7: The maximum N2 that can be served when interruption time distribution changes and λ1 = λ2 = 0.01.

| Case | P 2,N2 ≥ 0.9 | P 2,N2 ≥ 0.8 | P 2,N2 ≥ 0.7 | P 2,N2 ≥ 0.5 | P 2,N2 ≥ 0.2 |
|---|---|---|---|---|---|
| H2/H2/H2 | 3 | 8 | 16 | 33 | 35 |
| Gamma/H2/H2 | 4 | 10 | 18 | 33 | 35 |
| Expo/H2/H2 | 4 | 11 | 19 | 33 | 35 |
| Erlang/H2/H2 | 4 | 11 | 19 | 33 | 35 |
| H2/Erlang/Erlang | 4 | 11 | 19 | 33 | 34 |
| Gamma/Erlang/Erlang | 5 | 12 | 20 | 33 | 34 |
| Expo/Erlang/Erlang | 6 | 14 | 21 | 33 | 34 |
| Erlang/Erlang/Erlang | 7 | 14 | 22 | 33 | 34 |

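Table 4.8: The maximum N2 that can be served when interruption time distribution changes and P 2,N2 ≥ 0.2.

| (λ1, λ2) | Case | Max N2 |
|---|---|---|
| (0.01, 0.05) | H2/Erlang/Erlang | 15 |
| (0.01, 0.05) | Gamma/Erlang/Erlang | 15 |
| (0.01, 0.05) | Expo/Erlang/Erlang | 14 |
| (0.01, 0.05) | Erlang/Erlang/Erlang | 14 |
| (0.05, 0.02) | H2/Gamma/Gamma | 22 |
| (0.05, 0.02) | Gamma/Gamma/Gamma | 21 |
| (0.05, 0.02) | Expo/Gamma/Gamma | 21 |
| (0.05, 0.02) | Erlang/Gamma/Gamma | 21 |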

Chapter 5

The Multilevel Rationing Policy

We consider a repair shop that serves m classes/fleets of machines parameterized by

k = 1, ..., m. Each class k consists of Nk machines (type k machine) that fail from time

to time due to the failure of a single critical component. Failed components are repaired

at a repair shop modeled as a single server queueing system. We assume that repair times

follow an exponential distribution with rate µ. Additionally, spare components are kept

to decrease the proportion of times these classes may have down machines due to the

lack of the critical component. Times to failure, that is the periods between installation

of a new or repaired component in class k and the next failure instant of this component,

follow an exponential distribution with rate λk (implying that each repair makes the

component as good as new, and the failure rate only depends on the system using it).

When a type k machine fails, its broken component is sent immediately to the repair

shop. If there is available stock for class k, a spare component can be installed imme-

diately to replace the failed component, and the machine can stay operational without

experiencing any down time. Otherwise, the number of operational type k machines

decreases by 1, costing the system bk (down time cost) per unit time until a component

can be installed.

The centralized inventory is kept to reduce the down time cost at the expense of

incurring a holding cost of h per unit stocked per unit time. To reduce the long-run

average cost per unit time, one must decide on a) the structure of the inventory, and

b) the allocation rule for a repaired component. Broadly stated, the structure of the

inventory indicates whether there are reserved portions of the inventory for each class

and/or a portion of the inventory shared by/serving all types of classes. The allocation

rule indicates whether repaired components are dispatched on an FCFS basis or according

to a priority rule among different classes requiring a component. Chapter 3 describes the

hybrid FCFS (HF) and the hybrid priority (HP) policies, alternatives we compare to our

proposed multilevel rationing (MR) policy.

The MR policy described in this chapter prioritizes customer classes 1 to m from

highest to lowest. Under the MR policy, there are non-decreasing threshold inventory

levels Lk, k = 1, . . . , m + 1 with L1 = 0 and Lm+1 = S where S is the base-stock level

of the inventory. When the inventory level reaches Lm+1 = S, there are no more broken

components in the repair shop. If the inventory level I is between Lk and Lk+1 (i.e.,

Lk < I ≤ Lk+1), spare components are used only for failed machines of types 1 to k. In

other words, when Lk < I ≤ Lk+1, even if there are down machines in classes k+1 to m,

the repaired component is placed in inventory. When there is no stock on hand, the repaired

component is allocated to the highest-priority class with down machines.
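
The threshold rule above amounts to a lookup in the sorted list of rationing levels. The following Python sketch illustrates it; the function names `highest_eligible_class` and `can_draw_spare` and the example thresholds are assumptions introduced here. A class j may draw a spare exactly when the inventory level exceeds L_j.

```python
from bisect import bisect_left


def highest_eligible_class(inventory, thresholds):
    """Return k such that L_k < inventory <= L_{k+1}, or 0 when there is no stock.

    thresholds is the non-decreasing list [L_1, ..., L_{m+1}] with L_1 = 0
    and L_{m+1} = S; classes 1..k may then be served from stock.
    """
    if not 0 <= inventory <= thresholds[-1]:
        raise ValueError("inventory must lie between 0 and the base-stock level")
    # bisect_left counts the thresholds strictly below the inventory level.
    return bisect_left(thresholds, inventory)


def can_draw_spare(class_index, inventory, thresholds):
    """True if a failed machine of the given class may take a spare from stock."""
    return class_index <= highest_eligible_class(inventory, thresholds)


# Hypothetical example with m = 2 classes: L_1 = 0, L_2 = 2, L_3 = S = 5.
levels = [0, 2, 5]
print([highest_eligible_class(i, levels) for i in range(6)])
```

In this example, class 2 is served from stock only while more than two spares are on hand; the last two units are reserved for class 1.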

In the literature, the MR policy has been modeled in settings where each customer

class arrives according to a homogeneous Poisson process. When this is the case, cus-

tomers can be prioritized according to their backlogging cost (corresponding to our down

time cost); that is, between two classes, the one with the higher backlogging cost has

a higher priority. In our problem setting, for each class, the customer arrival rate is

state-dependent; in other words, it varies based on the number of down machines. This

prevents us from determining the priority of a class by just considering its down time

cost (this is also true for the HP policy).

In the next section, we propose a recursive method to compute the system cost of the


MR policy.

5.1 The Multilevel Rationing Policy

Let $C_{MR} := C(L_1 = 0, L_2, \ldots, L_{m+1} = S)$ be the long-run average cost of the MR policy given rationing levels $L_1 = 0, L_2, \ldots, L_{m+1}$, which is

\[ C_{MR}=\sum_{k=1}^{m}b_k\sum_{i=0}^{N_k}(N_k-i)P_{k,i}+h\sum_{i=0}^{L_{m+1}}i\,\pi(i), \tag{5.1} \]

where $\pi(i)$ and $P_{k,i}$ are the steady-state probabilities of having i spare parts in the inventory and of having i functional machines in class k, respectively. We design a recursive method to obtain $\pi(i)$ and $P_{k,i}$. To do this, we construct a series of sub-systems k, $k = 1, \ldots, m$, in which an inventory with a base-stock level of $L_{k+1}$ is kept. We denote the steady-state probabilities of having i spare parts in inventory and i functional machines in class j, $j = 1, \ldots, k$, in Sub-system k by $\pi^k(i)$ and $P^k_{j,i}$, respectively. Once $\pi^m(i)$ and $P^m_{k,i}$ are obtained in the last round of the algorithm for Sub-system m, which is in fact the original system, we have $\pi(i) = \pi^m(i)$ and $P_{k,i} = P^m_{k,i}$ for $k = 1, \ldots, m$; thus, we can compute the cost in Eq. (5.1). The algorithm starts with the special Sub-system 0 explained below. Once the underlying distributions of Sub-system k − 1 are obtained, the algorithm continues with Sub-system k.
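
Once the steady-state probabilities are available, Eq. (5.1) is a pair of weighted sums. The Python sketch below shows the evaluation; the function name `mr_cost`, the argument layout, and the numbers in the example are hypothetical and chosen only to illustrate the formula.

```python
def mr_cost(down_costs, class_probs, holding_cost, inventory_probs):
    """Evaluate the long-run average cost of Eq. (5.1).

    down_costs[k]      : b_k, down time cost per machine per unit time
    class_probs[k][i]  : P_{k,i}, probability that i machines of class k are up
                         (so N_k = len(class_probs[k]) - 1)
    holding_cost       : h, holding cost per unit stocked per unit time
    inventory_probs[i] : pi(i), probability of i spares on hand
    """
    down_time = 0.0
    for b_k, probs in zip(down_costs, class_probs):
        n_k = len(probs) - 1
        down_time += b_k * sum((n_k - i) * p for i, p in enumerate(probs))
    holding = holding_cost * sum(i * p for i, p in enumerate(inventory_probs))
    return down_time + holding


# Hypothetical single-class example: N_1 = 1, S = 1.
cost = mr_cost(down_costs=[10.0], class_probs=[[0.1, 0.9]],
               holding_cost=2.0, inventory_probs=[0.5, 0.5])
print(cost)
```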

Sub-system 0: We start with a system serving a single fleet of $N_1$ machines for which no inventory of spares is kept. Then, $P^0_{1,i}$ can be obtained by constructing a simple birth-and-death process for which the states are the number of customers (broken components) at the repair shop, and the customer arrival rates depend on the number of down type 1 machines.

Sub-system 1: When we add an inventory of $L_2$ spares to Sub-system 0, we arrive at Sub-system 1. Consider the sample path of Sub-system 1 given in Figure 5.1. The positive values on the x-axis show how many spares are on hand, and the absolute values of the negative values show how many type 1 machines are down. The Markov chain (MC) juxtaposed on the left of the figure shows the failure rate (customer arrival rate at the repair shop) and repair rates based on the number of units in the inventory or the number of down machines given on the x-axis. Note that for a proportion $C_1$ of the time (to be determined), there is no inventory in Sub-system 1, which reduces it to Sub-system 0. Thus, $C_1P^0_{1,i}$ gives the steady-state probability of having i functional machines and no inventory. Clearly, $P^1_{1,i} = C_1P^0_{1,i}$, $i = 0, \ldots, N_1 - 1$, and because $N_1$ machines are up otherwise, including the periods when there are spare parts in the inventory, $P^1_{1,N_1} = 1-\sum_{i=0}^{N_1-1}P^1_{1,i}$.

Figure 5.1: A Sample Path of the Single-Class Sub-system 1

Then, making use of the portion of the MC corresponding to the positive x values, for $1 \le i \le L_2$,

\[ \pi^1(i)=\left(\frac{\mu}{N_1\lambda_1}\right)^{i}C_1P^0_{1,N_1}, \]

where $P^0_{1,N_1}$ is also the proportion of time the server is idle in Sub-system 0, and

\[ C_1=\left(1+P^0_{1,N_1}\sum_{i=L_1+1}^{L_2}\left(\frac{\mu}{N_1\lambda_1}\right)^{i}\right)^{-1}. \]

Recall that the superscript 1 in $\pi^1(i)$ and $P^1_{1,i}$ indicates that these probabilities are found for Sub-system 1.
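
The first two rounds of the recursion can be sketched numerically as follows. This Python sketch uses hypothetical function names (`subsystem0`, `subsystem1`) and illustrative parameter values; it assumes exponential repair times with rate µ and L1 = 0, as in the text. Sub-system 0 is solved as a birth-and-death process on the number of broken components, and Sub-system 1 then follows from the expressions for C1 and π1(i) given above.

```python
def subsystem0(n1, lam1, mu):
    """Return [P^0_{1,0}, ..., P^0_{1,N1}]: distribution of functional machines.

    Birth-and-death process on n = broken components at the repair shop,
    with arrival rate (N1 - n) * lam1 and repair rate mu.
    """
    weights = [1.0]  # unnormalized probability of n = 0 broken components
    for n in range(n1):
        weights.append(weights[-1] * (n1 - n) * lam1 / mu)
    total = sum(weights)
    broken = [w / total for w in weights]
    return broken[::-1]  # index i = N1 - n functional machines


def subsystem1(p0, n1, lam1, mu, l2):
    """Return (C1, pi1, P1) for Sub-system 1 with base-stock level L2."""
    ratio = mu / (n1 * lam1)
    c1 = 1.0 / (1.0 + p0[n1] * sum(ratio ** i for i in range(1, l2 + 1)))
    pi1 = [c1] + [ratio ** i * c1 * p0[n1] for i in range(1, l2 + 1)]
    p1 = [c1 * p for p in p0[:n1]]
    p1.append(1.0 - sum(p1))  # all N1 machines up, with or without spares
    return c1, pi1, p1


# Illustrative values (assumptions): N1 = 3, lambda1 = 0.2, mu = 1, L2 = 2.
P0 = subsystem0(3, 0.2, 1.0)
C1, pi1, P1 = subsystem1(P0, 3, 0.2, 1.0, 2)
print(C1, pi1, P1)
```

Here `pi1[0]` equals C1, the proportion of time with no spares on hand, so that `pi1` sums to one.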

Sub-system 2: When we introduce class 2 customers as the low-priority class in Sub-

system 1, we arrive at Sub-system 2. If no shared inventory is assumed (L3 = L2), any


broken component from class 2 can be repaired only if all machines in class 1 are up,

and the inventory (reserved for high-priority class 1) is at L2. That is, periods the server

repairs to reduce the number of down machines in class 1, or to increase the inventory

level are perceived as an interruption by class 2 customers. Given this, each time the

number of failed type 2 machines increases to 1, the corresponding broken component

either sees an idle server ready to repair if the inventory is at L2, or an interrupted server

otherwise. In the sample path given in Figure 5.2, the dashed lines show the number of

functional type 2 machines and the solid lines the number of spares in inventory. Here we

see that the first and second times a type 2 machine fails leaving N2− 1 type 2 machines

functional, the server is “idle”, and the third time it is “down” for the corresponding

type 2 arrival at the repair shop. A while after the second instance, we see that a type

1 demand takes away one unit from the inventory (lowering its level to L2 − 1). The

component being repaired for the type 2 customer up until then automatically becomes

an order for class 1 or the inventory. Only once the inventory reaches L2, the repair of

the first broken component in the repair shop is initiated for the longest awaiting type 2

down machine.

Figure 5.2: A Sample Path in Sub-system 2 when L3 = L2

These observations help us realize that from the stand point of class 2 customers

(type 2 down machines), Sub-system 2 when L3 = L2 is an M/M/1//N2 queueing system

with a single unreliable server in which they are the only customers served. From their


perspective, the server can be interrupted, at a failure rate of Λ1 = N1λ1, either when it is idle or while it is serving a (type 2) customer. Let D1 denote the interruption times starting

with a class 1 arrival reducing the inventory level to L2−1 and ending once the inventory

level reaches L2 again. In the MC shown in Figure 5.1, the first passage time from the

second state at the top (corresponding to inventory level being L2 − 1) to the state at

the top (corresponding to inventory level being L2) is perceived as an interruption by

class 2 customers. The first passage times in finite-state, continuous-time MCs (CTMCs), and thus $D_1$, follow a phase-type distribution (PTD) (e.g., Kulkarni, 1989). In Chapter 4, a methodology is developed to obtain the steady-state distribution of the number of customers out of this M/M/1//$N_2$ queue, which is also the number of functional type 2 machines in Sub-system 2, denoted by $P^1_{2,i}$ (where the superscript 1 refers to $L_3 = L_2$ since it is, in fact, Sub-system 1). The proportion of time the server is idle in the M/M/1//$N_2$ queue is $(1 + E[B_2](\Lambda_1 + N_2\lambda_2))^{-1}$, where $E[B_2]$ is the expected length of a busy period in the M/M/1//$N_2$ queue, which can be found from Eq. (4.18). Note that

$$P^1_{2,N_2} = \left(1 + E[B_2](\Lambda_1 + N_2\lambda_2)\right)^{-1}.$$

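If $L_3 > L_2$, the inventory is depleted at a rate of $N_1\lambda_1 + N_2\lambda_2$ until the inventory declines to $L_2$. Since the inventory level is at or below $L_2$ (i.e., in Sub-system 2 when $L_3 = L_2$) for a proportion $C_2$ of the time, which is yet to be determined, we can write, for $L_2 < i \le L_3$ and with $\Lambda_2 = N_1\lambda_1 + N_2\lambda_2$,

$$\pi^2(i) = \left(\frac{\mu}{\Lambda_2}\right)^{i-L_2} C_2 P^1_{2,N_2} = \frac{C_2}{1 + \Lambda_2 E[B_2]} \left(\frac{\mu}{\Lambda_2}\right)^{i-L_2},$$

where

$$C_2 = \left[1 + \frac{\sum_{i=L_2+1}^{L_3} \left(\frac{\mu}{N_1\lambda_1 + N_2\lambda_2}\right)^{i-L_2}}{1 + \Lambda_2 E[B_2]}\right]^{-1}.$$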

Using superscript 2 to denote that the relevant probabilities belong to Sub-system 2, $P^2_{1,i} = C_2 P^1_{1,i}$, $i = 0, \ldots, N_1$; $P^2_{2,i} = C_2 P^1_{2,i}$, $i = 0, \ldots, N_2$; and $\pi^2(i) = C_2 \pi^1(i)$, $0 \le i \le L_2$.
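The scaling step for Sub-system 2 can be sketched numerically. The following is a minimal illustration, not the thesis implementation: the function name and the input values are hypothetical, and the busy-period mean E[B2] is taken as a given input (in the text it comes from Eq. (4.18), which is not reproduced here).

```python
def subsystem2_upper_inventory(mu, Lambda2, EB2, L2, L3):
    """Return (C2, pi2) where pi2[i] is the probability of i units on hand
    for L2 < i <= L3. EB2 is assumed to be supplied by the caller."""
    r = mu / Lambda2                       # ratio of repair rate to total demand rate
    idle = 1.0 / (1.0 + Lambda2 * EB2)     # P^1_{2,N2}: server idle, all machines up
    weights = {i: r ** (i - L2) for i in range(L2 + 1, L3 + 1)}
    # C2 is the share of time the inventory is at or below L2; it follows
    # from requiring that all inventory-level probabilities sum to one.
    C2 = 1.0 / (1.0 + idle * sum(weights.values()))
    pi2 = {i: C2 * idle * w for i, w in weights.items()}
    return C2, pi2


# Hypothetical inputs, chosen only for illustration.
C2, pi2 = subsystem2_upper_inventory(mu=3.0, Lambda2=1.5, EB2=2.0, L2=2, L3=5)
```

Because the probabilities for levels at or below L2 are those of Sub-system 1 scaled by C2, the value C2 plus the sum of `pi2` should equal one, which is a convenient sanity check.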

Sub-system k + n − 1: Consider Sub-system k − 1, with $L_k$ as the base-stock level of the inventory, in which classes 1 to k − 1 are served and for which we have the steady-state probabilities of having i units in the inventory, $\pi^{k-1}(i)$, $i = 0, \ldots, L_k$, and, for each class j, the probability of having i functional machines, $P^{k-1}_{j,i}$, $j = 1, \ldots, k-1$, $i = 0, \ldots, N_j$. When classes k to k + (n − 1) are added subject to $L_k = L_{k+1} = \cdots = L_{k+(n-1)} < L_{k+n}$, we arrive at Sub-system k + n − 1. Observe that, under the MR policy, a separate inventory threshold for each class is not necessarily more cost effective. Instead, using the same threshold for a group of classes may decrease costs more. If $L_k = L_{k+1} = \cdots = L_{k+(n-1)} < L_{k+n}$, then once the inventory downcrosses $L_k$, the system stops supplying classes k to k + n − 1 with spares. When the inventory level is at $L_k$, the repair shop dispatches the repaired component to the highest-priority class with down machines.

For classes $j = k, \ldots, k+(n-1)$, we obtain $P^{k-1}_{j,i}$ from an n-class priority M/M/1//N queue (which is also Sub-system k − 1) by setting the server failure rate to $\Lambda_{k-1} = \sum_{i=1}^{k-1} N_i\lambda_i$. In this priority queue, class k has the highest and class k + (n − 1) has the lowest priority. In addition, the server is subject to interruptions caused by classes 1 to k − 1. Although the server interruption time $D_{k-1}$ (the subscript referring to Sub-system k − 1) is a PTD r.v., characterizing it becomes more difficult as the number of classes increases (in Section 5.2.1, we propose a method to approximate it). Using $D_{k-1}$, the n-class priority M/M/1//N queue analysis in Chapter 4 provides $P^{k-1}_{j,i}$.

If an additional inventory of $L_{k+n} - L_{k-1}$ units is to be depleted by all the k + (n − 1) classes, the rest of the analysis is similar to the one we had for Sub-system 2. Letting $\pi^{k+n-1}(i)$ be the steady-state probability of having i units on hand, the inventory is depleted at a rate of $\Lambda_{k+(n-1)} = \sum_{i=1}^{k+(n-1)} N_i\lambda_i$ until it hits $L_{k-1}$. This coincides with the start of the busy period in Sub-system k + n − 1 with $L_{k+n} = L_{k-1}$, or the M/M/1//$N_{k+n-1}$ queue. The mean length of the busy period in this queue, $E[B_{k+(n-1)}]$, can be found from Eq. (4.18), and the steady-state probability that the server is idle is $\left(1 + \Lambda_{k+(n-1)}E[B_{k+(n-1)}]\right)^{-1}$. Then, for $L_{k-1} < i \le L_{k+n}$,

$$\pi^{k+n-1}(i) = \left(\frac{\mu}{\Lambda_{k+(n-1)}}\right)^{i-L_{k-1}} \frac{C_{k+n-1}}{1 + \Lambda_{k+(n-1)}E[B_{k+(n-1)}]},$$

where $C_{k+n-1}$ is the proportion of time the inventory in Sub-system k + n − 1 is less than or equal to $L_{k-1}$, and is given by

$$C_{k+n-1} = \left[1 + \frac{\sum_{i=L_{k-1}+1}^{L_{k+n}} \left(\frac{\mu}{\Lambda_{k+(n-1)}}\right)^{i-L_{k-1}}}{1 + \Lambda_{k+(n-1)}E[B_{k+(n-1)}]}\right]^{-1}.$$

Note that $\pi^{k+n-1}(i) = C_{k+n-1}\pi^{k-1}(i)$ for $i = 0, \ldots, L_k$, and for each class j, $P^{k+n-1}_{j,i} = C_{k+n-1}P^{k-1}_{j,i}$, $j = 1, \ldots, k+n-1$, $i = 0, \ldots, N_j$.

5.2 Obtaining the Moments of the Server Interruption Time for Class k in Sub-system k

In this section, we propose an approximation for the interruption time experienced by

class k customers in Sub-system k. Suppose that the inventory has a base-stock level of

Lk+1 and spares are depleted by all classes as long as the inventory is above Lk(≤ Lk+1).

As explained in Section 5.1, from the point of view of class k, server interruptions, to be denoted by $D_k$, start when the inventory level decreases to $L_k - 1$ and end when it reaches $L_k$ again, and occur at a rate of $\Lambda_{k-1} = \sum_{j=1}^{k-1} N_j\lambda_j$. The interruption times are first-passage times in the finite-state CTMC of the inventory level from the state of having $L_k - 1$ units in inventory to the state of $L_k$ and, thus, follow a PTD, the characterization of which is difficult for high values of k. Instead, we propose obtaining the first n moments of $D_k$ and then fitting a PTD r.v. with a simpler structure by moment-matching techniques.

To obtain these moments, we break down the interruption time for class k customers

as shown in Figure 5.3.

Figure 5.3: Break Down of the Interruption Time of Class k

Once the system enters state $L_k - 1$, two events are possible. Either the inventory goes up to $L_k$ without first hitting $L_{k-1}$, in $T_{L_k-1,L_k}$ time units with probability $Q_{L_k-1,L_k}$, or it declines to $L_{k-1}$ in $T_{L_k-1,L_{k-1}}$ time units with probability $Q_{L_k-1,L_{k-1}}$. Interpreting this as a Gambler's ruin problem that starts in state $L_k - 1$ and ends either in state $L_k$ or $L_{k-1}$, we have

$$Q_{L_k-1,L_{k-1}} = 1 - Q_{L_k-1,L_k} = \frac{1 - \frac{\mu}{\Lambda_{k-1}}}{1 - \left(\frac{\mu}{\Lambda_{k-1}}\right)^{L_k-L_{k-1}}}.$$

If the inventory level reaches $L_k$ first, the interruption ends. Otherwise, after hitting $L_{k-1}$, it takes the inventory level $T_{L_{k-1},L_{k-1}+1}$ time units to climb to $L_{k-1}+1$. From this point in time, the system goes through a random but finite number (since all the states of the underlying CTMC are recurrent) of sub-cycles before it reaches $L_k$. With state $L_{k-1}+1$ being the initial state, in each sub-cycle the inventory level either reaches $L_k$ without re-dropping to $L_{k-1}$, in $T_{L_{k-1}+1,L_k}$ time units with probability $Q_{L_{k-1}+1,L_k}$, or, with probability $Q_{L_{k-1}+1,L_{k-1}}$, declines again to $L_{k-1}$ in $T_{L_{k-1}+1,L_{k-1}}$ time units. Again interpreting this process as a Gambler's ruin problem that starts in state $L_{k-1}+1$ and ends either in state $L_k$ or $L_{k-1}$, we have

$$Q_{L_{k-1}+1,L_{k-1}} = 1 - Q_{L_{k-1}+1,L_k} = \frac{1 - \left(\frac{\mu}{\Lambda_{k-1}}\right)^{L_k-L_{k-1}-1}}{1 - \left(\frac{\mu}{\Lambda_{k-1}}\right)^{L_k-L_{k-1}}}.$$

We will use the following Theorem in obtaining the moments of the random variables $T_{L_k-1,L_k}$, $T_{L_{k-1}+1,L_k}$, $T_{L_{k-1}+1,L_{k-1}}$, and $T_{L_k-1,L_{k-1}}$.
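The two Gambler's ruin probabilities above are instances of a single formula once the inventory levels between $L_{k-1}$ and $L_k$ are relabelled as states 0 to m, with m = $L_k - L_{k-1}$. The sketch below is illustrative only: the function names and parameter values are hypothetical, and it assumes µ differs from $\Lambda_{k-1}$ so that the closed form is well defined.

```python
def prob_hit_lower_first(i, m, mu, Lam):
    """Probability of reaching state 0 before state m, starting from state i,
    when the level moves up at rate mu and down at rate Lam (mu != Lam)."""
    r = mu / Lam
    return (1.0 - r ** (m - i)) / (1.0 - r ** m)


def harmonic_residual(m, mu, Lam):
    """Largest violation of Q_i = p_down * Q_{i-1} + p_up * Q_{i+1} over the
    interior states; this is the first-step equation the closed form solves."""
    p_up = mu / (mu + Lam)
    p_down = Lam / (mu + Lam)
    Q = [prob_hit_lower_first(i, m, mu, Lam) for i in range(m + 1)]
    return max(
        (abs(Q[i] - (p_down * Q[i - 1] + p_up * Q[i + 1])) for i in range(1, m)),
        default=0.0,
    )


# Hypothetical values: m = 5 levels between the two thresholds.
m, mu, Lam = 5, 3.0, 2.1
q_from_top = prob_hit_lower_first(m - 1, m, mu, Lam)     # start one below L_k
q_from_bottom = prob_hit_lower_first(1, m, mu, Lam)      # start one above L_{k-1}
```

Starting from state m − 1 reproduces the first displayed probability, and starting from state 1 reproduces the second.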


Theorem 5.1 In a continuous-time Markov chain with states $i \in \{0, \ldots, m\}$ and transition probabilities $p_{i,j}$, letting states 0 and m be the absorbing states, the n-th moment of the absorption time r.v. from state i to state 0, given that state m is avoided, denoted by $L^{(n)}_i$, for $i \in \{1, \ldots, m-1\}$, is

$$L^{(n)}_i = E(Y_i^n) + \sum_{l=1}^{n-1} \binom{n}{l} \left( E(Y_i^l) \sum_{k \neq 0,\, k \neq m} \frac{Q_k}{Q_i}\, p_{i,k}\, L^{(n-l)}_k \right) + \sum_{k \neq 0,\, k \neq m} \frac{Q_k}{Q_i}\, p_{i,k}\, L^{(n)}_k, \tag{5.2}$$

where $Y_i$ is the r.v. denoting the sojourn time in state i, and $Q_i$ is the probability of reaching state 0 starting from i.

Proof. We introduce the following events and r.v.s to present the proof.

$A_{i,j}$: The event of reaching state j from state i in a single transition

$A_{i,\circ,k}$: The event of eventually reaching state k from state i

$X_i$: The time to reach state 0 or m from state i

Let I(E) denote the indicator function which equals 1 if event E is true and 0 otherwise. Then,

$$X_i = X_i I(A_{i,\circ,0}) + X_i I(A_{i,\circ,m}).$$

Exiting state i, the system can be in any state after the first transition, which implies that $\sum_k I(A_{i,k}) = 1$. If the first state entered after leaving state i is either 0 or m, then the remaining time to reach state 0 is zero. Otherwise it is $X_k$, and

$$X_i I(A_{i,\circ,0}) = Y_i I(A_{i,\circ,0}) + \sum_{k \neq 0,\, k \neq m} I(A_{i,k}) X_k I(A_{k,\circ,0}).$$

By definition, $L^{(n)}_i = E[X_i^n \mid A_{i,\circ,0}] = E[X_i^n I(A_{i,\circ,0})]/Q_i$ (recall that $Q_i$ is the probability that $A_{i,\circ,0}$ is true). Using the fact that $X_i^n I(A_{i,\circ,0}) = (X_i I(A_{i,\circ,0}))^n$,

$$E[(X_i I(A_{i,\circ,0}))^n] = E\left[\left(Y_i I(A_{i,\circ,0}) + \sum_{k \neq 0,\, k \neq m} I(A_{i,k}) X_k I(A_{k,\circ,0})\right)^n\right]$$

$$= E[Y_i^n]\, E[I(A_{i,\circ,0})^n] + \sum_{k \neq 0,\, k \neq m} E[I(A_{i,k})^n]\, E[(X_k I(A_{k,\circ,0}))^n] + \sum_{l=1}^{n-1} \binom{n}{l} \left( E[Y_i^l] \sum_{k \neq 0,\, k \neq m} E[I(A_{i,k})^{n-l}]\, E[X_k^{n-l} I(A_{i,\circ,0})^l I(A_{k,\circ,0})^{n-l}] \right).$$

Noting that $E[I(E)^l] = E[I(E)]$, which is the probability that event E is true, we have $E[I(A_{i,\circ,0})] = Q_i$ and $E[I(A_{i,k})] = p_{i,k}$. Also, $I(A_{i,\circ,0})^l I(A_{k,\circ,0})^{n-l} = I(A_{k,\circ,0})$. Then,

$$E[(X_i I(A_{i,\circ,0}))^n] = E[Y_i^n]\, Q_i + \sum_{k \neq 0,\, k \neq m} p_{i,k}\, E[X_k^n \mid A_{k,\circ,0}]\, Q_k + \sum_{l=1}^{n-1} \binom{n}{l} \left( E[Y_i^l] \sum_{k \neq 0,\, k \neq m} p_{i,k}\, E[X_k^{n-l} \mid A_{k,\circ,0}]\, Q_k \right).$$

Dividing both sides by $Q_i$ yields Eq. (5.2).

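We employ Theorem 5.1 in our problem to obtain the moments of $T_{L_k-1,L_k}$, $T_{L_{k-1}+1,L_{k-1}}$, $T_{L_{k-1}+1,L_k}$, and $T_{L_k-1,L_{k-1}}$ in the following Corollary.

Corollary 5.1 With $m = L_k - L_{k-1}$, in Sub-system k, we have

$$E[T^n_{L_k-1,L_k}] = E[T^n_{L_{k-1}+1,L_{k-1}}] = L^{(n)}_1,$$

$$E[T^n_{L_{k-1}+1,L_k}] = E[T^n_{L_k-1,L_{k-1}}] = L^{(n)}_{L_k-L_{k-1}-1},$$

from Eq. (5.2) by setting

$$Q_i = \frac{1 - \left(\frac{\mu}{\Lambda_{k-1}}\right)^{m-i}}{1 - \left(\frac{\mu}{\Lambda_{k-1}}\right)^{m}}, \qquad p_{i,i-1} = 1 - p_{i,i+1} = \frac{\Lambda_{k-1}}{\mu + \Lambda_{k-1}}, \qquad E[Y_i^n] = n!\,(\mu + \Lambda_{k-1})^{-n}, \quad i \in \{1, \ldots, L_k - L_{k-1} - 1\}.$$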


Since the moments of the sojourn times in each state $i \in \{1, \ldots, L_k - L_{k-1} - 1\}$ and the state transition probabilities among them are not state-dependent, $L^{(n)}_i$ for $i \in \{1, \ldots, L_k - L_{k-1} - 1\}$ can be computed recursively by employing the following Corollary.

Corollary 5.2 The following recursion gives the n-th moment of the absorption time r.v. from state i, $i \in \{1, \ldots, L_k - L_{k-1} - 1\}$, to state 0 given that state m is avoided as

$$L^{(n)}_i = a^{(n)}_i + L^{(n)}_{i-1},$$

where $L_0 = 0$ and

$$a^{(n)}_{i-1} = \frac{C^{(n)}_i + \left(\frac{\mu}{\mu + \Lambda_{k-1}}\right) D_i^{-1} a^{(n)}_i}{1 - \left(\frac{\mu}{\mu + \Lambda_{k-1}}\right) D_i^{-1}},$$

where $a_m = 0$ and

$$C^{(1)}_i = E[Y_i],$$

$$C^{(2)}_i = \left(E[Y_i^2] - 2E[Y_i]^2\right) + 2E[Y_i]\, L_i,$$

$$C^{(3)}_i = E[Y_i^3] + \left(3E[Y_i^2] - 6E[Y_i]^2\right)\left(L_i - E[Y_i]\right) + 3E[Y_i]\left(L^{(2)}_i - E[Y_i^2]\right),$$

and

$$D_i = \frac{Q_{i-1}}{Q_i} = \frac{1 - \left(\frac{\mu}{\Lambda_{k-1}}\right)^{L_k-L_{k-1}-i+1}}{1 - \left(\frac{\mu}{\Lambda_{k-1}}\right)^{L_k-L_{k-1}-i}}.$$

Proof. Defining $P_d = p_{i,i-1}$ and $P_u = p_{i,i+1}$, Eq. (5.2) for the first moment can be simplified to

$$L_i = E[Y_i] + P_d \frac{Q_{i-1}}{Q_i} L_{i-1} + P_u \frac{Q_{i+1}}{Q_i} L_{i+1}, \quad i \in \{1, \ldots, m-1\}.$$

Hence,

$$L_1 = E[Y_1] + P_u D_2^{-1} L_2, \tag{5.3}$$

$$L_i = E[Y_i] + P_d D_i L_{i-1} + P_u D_{i+1}^{-1} L_{i+1}, \quad i \in \{2, \ldots, m-2\}, \tag{5.4}$$

$$L_{m-1} = E[Y_{m-1}] + P_d D_{m-1} L_{m-2}. \tag{5.5}$$

By substitution, $P_d D_{m-1} = 1$. Letting $a_{m-1}$ denote $E[Y_{m-1}]$, Eq. (5.5) can be rewritten as

$$L_{m-1} = a_{m-1} + L_{m-2}.$$

Given the fact that $L_i = a_i + L_{i-1}$ holds for $i = m-1$, from Eq. (5.4), we get

$$L_{i-1} = E[Y_{i-1}] + P_d D_{i-1} L_{i-2} + P_u D_i^{-1} (a_i + L_{i-1}),$$

$$L_{i-1} = E[Y_{i-1}] + P_u D_i^{-1} a_i + P_d D_{i-1} L_{i-2} + P_u D_i^{-1} L_{i-1},$$

$$L_{i-1} = \frac{E[Y_{i-1}] + P_u D_i^{-1} a_i}{1 - P_u D_i^{-1}} + \frac{P_d D_{i-1}}{1 - P_u D_i^{-1}} L_{i-2}.$$

Using the first expression on the RHS of the equation above, we redefine $a_i$ as

$$a_i = \frac{E[Y_i] + P_u D_{i+1}^{-1} a_{i+1}}{1 - P_u D_{i+1}^{-1}},$$

where $a_m = 0$. By substitution, $P_d D_{i-1}/(1 - P_u D_i^{-1}) = 1$. Therefore, we arrive at

$$L_i = a_i + L_{i-1},$$

which by induction holds for $i \in \{2, \ldots, m-1\}$. By using this finding in Eq. (5.3), we obtain

$$L_1 = E[Y_1] + P_u D_2^{-1} (a_2 + L_1) = E[Y_1] + P_u D_2^{-1} a_2 + P_u D_2^{-1} L_1,$$

$$L_1 = \frac{E[Y_1] + P_u D_2^{-1} a_2}{1 - P_u D_2^{-1}} = a_1.$$

The proof for the other moments is similar.
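The first-moment recursion used in the proof can be sketched as follows. This is a minimal illustration under stated assumptions rather than the thesis code: the function name and parameter values are hypothetical, only n = 1 is covered, the sojourn time in every interior state is taken to be exponential with rate µ + Λ, and µ is assumed to differ from Λ.

```python
def first_moments(m, mu, Lam):
    """Return (L, Q): L[i] is the mean time to reach state 0 from state i given
    that state m is avoided (i = 0..m-1), and Q[i] is the probability of
    reaching state 0 before state m from state i (i = 0..m)."""
    r = mu / Lam
    p_up = mu / (mu + Lam)
    ey = 1.0 / (mu + Lam)                       # mean sojourn time, E[Y_i]
    Q = [(1.0 - r ** (m - i)) / (1.0 - r ** m) for i in range(m + 1)]
    # Backward pass: a_m = 0 and
    # a_i = (E[Y_i] + P_u * a_{i+1} / D_{i+1}) / (1 - P_u / D_{i+1}),
    # where 1 / D_{i+1} = Q_{i+1} / Q_i.
    a = [0.0] * (m + 1)
    for i in range(m - 1, 0, -1):
        d_inv = Q[i + 1] / Q[i]
        a[i] = (ey + p_up * d_inv * a[i + 1]) / (1.0 - p_up * d_inv)
    # Forward pass: L_0 = 0 and L_i = a_i + L_{i-1}.
    L = [0.0] * m
    for i in range(1, m):
        L[i] = a[i] + L[i - 1]
    return L, Q


# Hypothetical values for illustration.
L, Q = first_moments(m=6, mu=3.0, Lam=2.1)
```

A useful check is to substitute the returned values back into Eqs. (5.3) to (5.5): each equation should hold up to floating-point error. For m = 2 the only interior state is state 1, so the conditional mean reduces to the mean sojourn time.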

Figure 5.4 shows a sample path of the interruption time experienced by class k customers. Starting from the state $L_k - 1$, there are two possible trajectories before the inventory level reaches $L_k$ again. With Corollary 5.1, we have the moments $E[T^n_{L_k-1,L_k}] = L^{(n)}_1$ corresponding to the time represented by the upper branch, which is realized with probability $Q_{L_k-1,L_k}$. The time it takes on the other trajectory, represented by the lower branch, which is realized with probability $Q_{L_k-1,L_{k-1}}$, is the sum of $T_{L_k-1,L_{k-1}}$, the time it takes the inventory level to decline to $L_{k-1}$ from $L_k - 1$, and the r.v. denoting the time it takes the inventory level to reach $L_k$ starting from $L_{k-1}$. We denote the latter by $T_u$. We have the moments $E[T^n_{L_k-1,L_{k-1}}] = L^{(n)}_{L_k-L_{k-1}-1}$ from Corollary 5.1. Recalling that $D_k$ is the interruption time r.v. for class k, denoting the time it takes the inventory level to reach $L_k$ from $L_k - 1$, the following Theorem provides its moments plus the moments of $T_u$. Before presenting the Theorem, note that $E[I(A_{L_{k-1}+1,L_k})] = Q_{L_{k-1}+1,L_k}$ and $E[I(A_{L_{k-1}+1,L_{k-1}})] = Q_{L_{k-1}+1,L_{k-1}}$.

Figure 5.4: A Sample path of the interruption time for class k

Theorem 5.2 The nth moment of $D_k$, i.e., the interruption time experienced by class k, is

$$E[D_k^n] = Q_{L_k-1,L_k}\, E[T^n_{L_k-1,L_k}] + Q_{L_k-1,L_{k-1}}\, E\left[\left(T_{L_k-1,L_{k-1}} + T_u\right)^n\right], \tag{5.6}$$

for which $E[T_u^n]$ can be obtained from

$$E[T_u^n] = E\left[\left(T_{L_{k-1},L_{k-1}+1} + I(A_{L_{k-1}+1,L_k})\, T_{L_{k-1}+1,L_k} + I(A_{L_{k-1}+1,L_{k-1}})\left(T_u + T_{L_{k-1}+1,L_{k-1}}\right)\right)^n\right]. \tag{5.7}$$

Proof. From Figure 5.4, we see that Eq. (5.6) is a direct result of the two possible trajectories the inventory level can follow: $D_k$ is the sum of $T_{L_k-1,L_{k-1}}$ and $T_u$ with probability $Q_{L_k-1,L_{k-1}}$, and equals $T_{L_k-1,L_k}$ otherwise. Figure 5.4 also shows that

$$T_u = T_{L_{k-1},L_{k-1}+1} + I(A_{L_{k-1}+1,L_k})\, T_{L_{k-1}+1,L_k} + I(A_{L_{k-1}+1,L_{k-1}})\left(T_u + T_{L_{k-1}+1,L_{k-1}}\right),$$

from which Eq. (5.7) follows. Here, the moments of all the terms except $T_{L_{k-1},L_{k-1}+1}$ are given in Corollary 5.1. Observe that, in parallel to our discussion in Section 5.1, $T_{L_{k-1},L_{k-1}+1}$ is the busy period in the M/M/1//$N_{k-1}+1$ queue with an unreliable server that fails at a rate of $\Lambda_{k-2}$ with $D_{k-1}$ as the interruption time r.v., and $\lambda_k$ as the arrival rate for each of $N_{k-1}+1$ customers. To obtain $D_{k-1}$, we need the same recursion, which will start from the M/M/1//$N_1$ queue serving class 1 customers, for which the server does not experience any interruptions.
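As a worked instance of Theorem 5.2, consider n = 1. The sketch below assumes that the expectation of each indicator-weighted term in Eq. (5.7) can be written as the corresponding probability times the conditional mean, in line with the note preceding the Theorem. The shorthand $Q_u = Q_{L_{k-1}+1,L_k}$ and $Q_d = Q_{L_{k-1}+1,L_{k-1}} = 1 - Q_u$ is introduced here only for readability.

```latex
\begin{align*}
E[T_u] &= E[T_{L_{k-1},L_{k-1}+1}] + Q_u\, E[T_{L_{k-1}+1,L_k}]
          + Q_d \left( E[T_u] + E[T_{L_{k-1}+1,L_{k-1}}] \right) \\
\Longrightarrow\quad
E[T_u] &= \frac{E[T_{L_{k-1},L_{k-1}+1}] + Q_u\, E[T_{L_{k-1}+1,L_k}]
          + Q_d\, E[T_{L_{k-1}+1,L_{k-1}}]}{Q_u}, \\
E[D_k] &= Q_{L_k-1,L_k}\, E[T_{L_k-1,L_k}]
          + Q_{L_k-1,L_{k-1}} \left( E[T_{L_k-1,L_{k-1}}] + E[T_u] \right).
\end{align*}
```

The second line uses $1 - Q_d = Q_u$. Higher moments follow the same pattern after expanding the n-th power in Eqs. (5.6) and (5.7), at the cost of cross terms that involve the lower moments.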

Remark (1): If $L_k = L_{k-1} + 2$, then $T_{L_k-1,L_k} = T_{L_k-1,L_{k-1}} = T_{L_{k-1}+1,L_k} = T_{L_{k-1}+1,L_{k-1}}$.

Remark (2): If $L_k = L_{k-1} + 1$, then $T_{L_k-1,L_k} = T_{L_k-1,L_{k-1}} = T_{L_{k-1}+1,L_k} = T_{L_{k-1}+1,L_{k-1}} = 0$.

Remark (3): If Lk = Lk−1, then we have the Sub-system k + n− 1 in Section 5.1.

5.2.1 The MR Policy Approximation

If the actual $D_k$ in each Sub-system k can be characterized and used in the recursive method designed in Section 5.1, then $C_{MR}$ can be computed exactly. However, as the number of classes served grows, it becomes difficult to characterize the distribution of $D_k$ and, thus, to compute $C_{MR}$ exactly. As an alternative, we propose the MR policy approximation, which modifies the algorithm presented in Section 5.1 in the following way: In each Sub-system k, the first three moments of $D_k$ are found from Theorem 5.2. Then, using moment-matching techniques (e.g., Altıok, 1997, page 52), PTDs with simpler forms are identified so that the first n moments of these PTDs equal the first n moments of $D_k$. In practice, n is usually taken to be two or three. When these PTDs are employed in analyzing Sub-system k, we have the MR policy approximation. Using this approximation, we eventually compute $C_{MR}$ for given rationing levels $L_1 = 0, L_2, \ldots, L_{m+1}$. In the next section, we test how accurate the MR policy approximation is.



In this section, we address three questions: (i) How accurate is CMR found by using the MR policy approximation proposed in Section 5.2.1? (ii) What is the relative performance of the MR policy with respect to the HF and HP policies? Does it lead to significant cost savings? (iii) How close is the cost of the optimal MR policy to the optimal cost?

To address these questions, we consider a system in which three classes with NI = 5,

NII = 10, and NIII = 15 are served. The repair shop repairs the failed components

with an exponentially distributed repair time with rate µ = 3. The system incurs a

holding cost of h = 1 for each spare part in the inventory per unit time. We choose a

different down time cost for each class from the set {10, 50, 100}. Furthermore, we set a

different failure rate for each class by equating Nkλk/µ, k = I, II, III, to a value in the

set {0.7, 0.8, 0.9}, different from the values used for other classes. This gives a total of

36 examples which are presented in Tables 5.1 and 5.2.

In Section 5.3.1, where we address our first question, the results indicate that the MR policy approximation is highly accurate in computing CMR. In Section 5.3.2, where we address the second question, the numerical results show that the MR policy outperforms both the HF and HP policies. Finally, in Section 5.3.3, we numerically obtain the optimal cost of the problem studied while addressing the third question, and examine whether the MR policy is close to the optimal policy. It appears that, in most of the cases, the MR policy turns out to be optimal.

5.3.1 The Accuracy of the MR Policy Approximation

In this section, we use the MR policy approximation proposed in Section 5.2.1 by identifying the parameters of a 2-stage Mixture of Generalized Erlang (MGE) distribution that has the same first three moments as Dk in each Sub-system k. Note that a 2-stage


Table 5.1: Parameters of the Examples-I

No NIλI/µ NIIλII/µ NIIIλIII/µ λI λII λIII bI bII bIII

1 0.7 0.8 0.9 0.14 0.08 0.06 10 50 100

2 0.7 0.8 0.9 0.14 0.08 0.06 10 100 50

3 0.7 0.8 0.9 0.14 0.08 0.06 50 10 100

4 0.7 0.8 0.9 0.14 0.08 0.06 50 100 10

5 0.7 0.8 0.9 0.14 0.08 0.06 100 10 50

6 0.7 0.8 0.9 0.14 0.08 0.06 100 50 10

7 0.7 0.9 0.8 0.14 0.09 0.053 10 50 100

8 0.7 0.9 0.8 0.14 0.09 0.053 10 100 50

9 0.7 0.9 0.8 0.14 0.09 0.053 50 10 100

10 0.7 0.9 0.8 0.14 0.09 0.053 50 100 10

11 0.7 0.9 0.8 0.14 0.09 0.053 100 10 50

12 0.7 0.9 0.8 0.14 0.09 0.053 100 50 10

13 0.8 0.7 0.9 0.16 0.07 0.06 10 50 100

14 0.8 0.7 0.9 0.16 0.07 0.06 10 100 50

15 0.8 0.7 0.9 0.16 0.07 0.06 50 10 100

16 0.8 0.7 0.9 0.16 0.07 0.06 50 100 10

17 0.8 0.7 0.9 0.16 0.07 0.06 100 10 50

18 0.8 0.7 0.9 0.16 0.07 0.06 100 50 10

MGE r.v. is an exponential r.v. with rate µ1 with probability a, and the sum of two exponential r.v.s with rates µ1 and µ2 with probability 1 − a. In the rest of the discussion on numerical results, CMR is the cost found via the MR policy approximation. Here, our objective is to test whether using 2-stage MGEs in computing CMR is accurate enough.
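To make the 2-stage MGE definition concrete, the sketch below computes its first three raw moments from the parameters (µ1, µ2, a). It only covers the forward direction (parameters to moments), which is what a moment-matching routine would compare against the moments of Dk; it is not the fitting procedure of Altıok (1997), and the function name is hypothetical.

```python
def mge2_moments(mu1, mu2, a):
    """First three raw moments of X = E1 + B * E2, where E1 ~ Exp(mu1),
    E2 ~ Exp(mu2), and B = 1 with probability 1 - a (all independent).
    With probability a the r.v. is just the first exponential stage."""
    b = 1.0 - a                                   # probability of visiting stage 2
    m1 = 1.0 / mu1 + b / mu2
    m2 = 2.0 / mu1**2 + 2.0 * b / (mu1 * mu2) + 2.0 * b / mu2**2
    m3 = (6.0 / mu1**3 + 6.0 * b / (mu1**2 * mu2)
          + 6.0 * b / (mu1 * mu2**2) + 6.0 * b / mu2**3)
    return m1, m2, m3


# Two limiting cases: a = 1 is a plain exponential, while a = 0 with equal
# rates is an Erlang-2 distribution.
exp_case = mge2_moments(mu1=2.0, mu2=5.0, a=1.0)
erlang_case = mge2_moments(mu1=2.0, mu2=2.0, a=0.0)
```

The limiting cases give a quick check of the formulas, since the moments of the exponential and Erlang-2 distributions are known in closed form.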

Since down time cost is not sufficient to determine how to prioritize classes before


Table 5.2: Parameters of the Examples-II

No NIλI/µ NIIλII/µ NIIIλIII/µ λI λII λIII bI bII bIII

19 0.8 0.9 0.7 0.16 0.09 0.047 10 50 100

20 0.8 0.9 0.7 0.16 0.09 0.047 10 100 50

21 0.8 0.9 0.7 0.16 0.09 0.047 50 10 100

22 0.8 0.9 0.7 0.16 0.09 0.047 50 100 10

23 0.8 0.9 0.7 0.16 0.09 0.047 100 10 50

24 0.8 0.9 0.7 0.16 0.09 0.047 100 50 10

25 0.9 0.7 0.8 0.18 0.07 0.053 10 50 100

26 0.9 0.7 0.8 0.18 0.07 0.053 10 100 50

27 0.9 0.7 0.8 0.18 0.07 0.053 50 10 100

28 0.9 0.7 0.8 0.18 0.07 0.053 50 100 10

29 0.9 0.7 0.8 0.18 0.07 0.053 100 10 50

30 0.9 0.7 0.8 0.18 0.07 0.053 100 50 10

31 0.9 0.8 0.7 0.18 0.08 0.047 10 50 100

32 0.9 0.8 0.7 0.18 0.08 0.047 10 100 50

33 0.9 0.8 0.7 0.18 0.08 0.047 50 10 100

34 0.9 0.8 0.7 0.18 0.08 0.047 50 100 10

35 0.9 0.8 0.7 0.18 0.08 0.047 100 10 50

36 0.9 0.8 0.7 0.18 0.08 0.047 100 50 10

computing the system cost, for each problem, we consider the 6 different ways of prioritizing the classes (call each a priority sequencing), and compute the corresponding optimal rationing levels and costs as follows: For a given L4 = 0, . . . , 12, L2 can assume L4 + 1 values, from 0 to L4. Given L4 and L2, L3 can assume L4 − L2 + 1 values. This gives a total of (L4/2 + 1)(L4 + 1) combinations of L3 and L2 for each L4, i.e., a total of 455 threshold sets for each priority sequencing in each example. Using the algorithm in Section 5.2.1, we obtain the optimal MR policy with the threshold set and priority sequencing that give the minimum cost. These are presented in Tables 5.3 and 5.4. In each example, the best priority sequence orders classes with respect to their down time costs. For instance, in example 1, the highest priority class 1 is class III, and the lowest priority class 3 is class I. In all the examples, the base-stock level (L4) is 9, 10, or 11, and L2 = 0. In each case, the MR policy allows all classes to deplete the inventory until the inventory level hits L3. If the inventory level is positive but less than or equal to L3 and a machine from class 3 fails, no spare part is sent from the inventory. If there is no inventory, a repaired component is sent out to the highest-priority class with down machines.
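The threshold search space described above can be enumerated directly. The sketch below is illustrative: the function name is hypothetical, and it only lists the candidate triples (L2, L3, L4) with 0 ≤ L2 ≤ L3 ≤ L4 ≤ 12, without evaluating any cost.

```python
def threshold_sets(max_base_stock):
    """All (L2, L3, L4) with 0 <= L2 <= L3 <= L4 <= max_base_stock."""
    return [
        (L2, L3, L4)
        for L4 in range(max_base_stock + 1)      # base-stock level
        for L2 in range(L4 + 1)                  # L4 + 1 choices for L2
        for L3 in range(L2, L4 + 1)              # L4 - L2 + 1 choices for L3
    ]


candidates = threshold_sets(12)
per_base_stock = {L4: sum(1 for t in candidates if t[2] == L4) for L4 in range(13)}
```

For each L4 the count in `per_base_stock` equals (L4/2 + 1)(L4 + 1), and an exhaustive search would evaluate the cost of every entry in `candidates` for each of the 6 priority sequencings.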

To assess the accuracy of the MR policy approximation, we simulated each problem using the optimal inventory thresholds/rationing levels listed in Tables 5.3 and 5.4 and obtained the estimate $C^{sim}_{MR}$, to be interpreted as the actual cost of the MR policy. Each problem was simulated with 10 replications, each with a run length of 87600 time units. Each run of the simulation took around one hour on a desktop computer with a 2.33GHz CPU. We present the 95% confidence interval (CI) around the cost estimate $C^{sim}_{MR}$ found from the simulation in Tables 5.5 and 5.6. We see that the ratio of the CI half-width to $C^{sim}_{MR}$ is less than 2% in all the examples (with an average of 1.25%). Based on this, we conclude that the 95% CIs are tight enough and that the simulation setup was appropriate for using $C^{sim}_{MR}$ as reference values. In all the examples, the $C^*_{MR}$ values from Tables 5.3 and 5.4, which are also listed in the second column of Tables 5.5 and 5.6, are contained in the corresponding 95% CIs. Moreover, the difference between $C_{MR}$ and $C^{sim}_{MR}$ is less than 1% in all 36 examples, with an average of 0.13%. In summary, we can state that using the 2-stage MGE r.v. by matching the first three moments of $D_k$ does not cause a significant error in computing $C_{MR}$.


Table 5.3: The Optimal Inventory Rationing Levels and C∗MR of the MR Policy-I

No L2 L3 L4 C∗MR

1 0 3 10 10.966

2 0 3 10 10.906

3 0 3 11 11.129

4 0 2 11 11.063

5 0 3 11 11.062

6 0 2 11 11.023

7 0 3 10 10.853

8 0 3 10 10.984

9 0 2 10 10.722

10 0 2 11 11.431

11 0 2 10 10.690

12 0 2 11 11.306

13 0 3 10 10.485

14 0 3 10 10.357

15 0 3 11 11.413

16 0 2 10 10.946

17 0 3 11 11.428

18 0 2 11 11.067

5.3.2 Relative Performances of the Policies

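To compare the relative performances of the MR, HF and HP policies, we computed

$$\Delta^{MR}_{HF} \equiv \frac{C^*_{HF} - C^*_{MR}}{C^*_{HF}}, \qquad \Delta^{MR}_{HP} \equiv \frac{C^*_{HP} - C^*_{MR}}{C^*_{HP}}, \qquad \Delta^{HP}_{HF} \equiv \frac{C^*_{HF} - C^*_{HP}}{C^*_{HF}},$$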


Table 5.4: The Optimal Inventory Rationing Levels and C∗MR of the MR Policy-II

No L2 L3 L4 C∗MR

19 0 3 10 10.314

20 0 3 10 10.479

21 0 2 10 10.581

22 0 2 11 11.703

23 0 2 10 10.729

24 0 2 11 11.686

25 0 3 10 10.024

26 0 3 9 9.972

27 0 3 11 11.290

28 0 2 11 11.173

29 0 3 11 11.453

30 0 2 11 11.406

31 0 3 9 9.949

32 0 3 10 10.026

33 0 2 10 10.888

34 0 2 11 11.547

35 0 3 11 11.107

36 0 3 11 11.733

in which $C^*_{HF}$ and $C^*_{HP}$ are the optimal costs of the system under the HF and HP policies, respectively, discussed in Chapter 3. We present $C^*_{HF}$ and $C^*_{HP}$ in Tables 5.8 and 5.9. The ratios $\Delta^{MR}_{HF}$ and $\Delta^{MR}_{HP}$ measure the cost decrease due to using the optimal MR policy instead of the optimal HF and HP policies, respectively. The ratio $\Delta^{HP}_{HF}$ captures how much more cost reduction the HP policy makes when compared to the HF policy.
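As a small worked instance of these ratios, the sketch below evaluates them for example 1, using the costs reported for that example in Tables 5.3 and 5.8. The helper name is hypothetical.

```python
def relative_saving(cost_reference, cost_alternative):
    """Fraction of the reference cost saved by switching to the alternative."""
    return (cost_reference - cost_alternative) / cost_reference


# Optimal costs of example 1 (Tables 5.3 and 5.8).
c_hf, c_hp, c_mr = 13.554, 11.834, 10.966

delta_mr_hf = relative_saving(c_hf, c_mr)   # MR versus HF
delta_mr_hp = relative_saving(c_hp, c_mr)   # MR versus HP
delta_hp_hf = relative_saving(c_hf, c_hp)   # HP versus HF
```

For this example the three ratios come out at roughly 19%, 7%, and 13%, respectively.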


Table 5.5: Comparison of $C^*_{MR}$ with $C^{sim}_{MR}$-I

No $C^*_{MR}$ $C^{sim}_{MR}$ 95% C.I. $(C^*_{MR} - C^{sim}_{MR})/C^{sim}_{MR}$ (%)

1 10.966 11.027 (10.792,11.262) 0.56

2 10.906 10.914 (10.769,11.059) 0.07

3 11.129 11.159 (11.03,11.289) 0.27

4 11.063 11.163 (11.029,11.298) 0.90

5 11.062 11.075 (10.935,11.215) 0.12

6 11.023 11.127 (10.998,11.255) 0.93

7 10.853 10.890 (10.762,11.018) 0.34

8 10.984 11.060 (10.854,11.266) 0.69

9 10.722 10.714 (10.574,10.854) -0.08

10 11.431 11.457 (11.323,11.591) 0.23

11 10.690 10.720 (10.602,10.838) 0.28

12 11.306 11.289 (11.102,11.476) -0.15

13 10.485 10.459 (10.356,10.562) -0.25

14 10.357 10.356 (10.246,10.465) -0.01

15 11.413 11.360 (11.254,11.465) -0.46

16 10.946 10.942 (10.758,11.127) -0.03

17 11.428 11.380 (11.266,11.493) -0.43

18 11.067 11.108 (10.982,11.234) 0.36

In Table 5.7, we see remarkable cost savings under the MR policy when compared to

the HF policy. The HF policy also underperforms with respect to the HP policy. The HP

policy performs relatively better since it increases the system cost by an average of 6% if

used instead of the MR policy. Tables 5.8 and 5.9 present the optimal inventory control

parameters and costs of the HF and HP policies. The HF policy stores more spare parts


Table 5.6: Comparison of $C^*_{MR}$ with $C^{sim}_{MR}$-II

No $C^*_{MR}$ $C^{sim}_{MR}$ 95% C.I. $(C^*_{MR} - C^{sim}_{MR})/C^{sim}_{MR}$ (%)

19 10.314 10.362 (10.291,10.433) 0.46

20 10.479 10.504 (10.379,10.629) 0.24

21 10.581 10.583 (10.44,10.727) 0.02

22 11.703 11.774 (11.632,11.915) 0.60

23 10.729 10.765 (10.606,10.925) 0.33

24 11.686 11.638 (11.456,11.821) -0.41

25 10.024 10.046 (9.949,10.144) 0.22

26 9.972 9.982 (9.914,10.05) 0.10

27 11.290 11.290 (11.121,11.458) -0.01

28 11.173 11.231 (11.14,11.322) 0.51

29 11.453 11.483 (11.319,11.647) 0.26

30 11.406 11.400 (11.212,11.587) -0.06

31 9.949 9.940 (9.843,10.036) -0.09

32 10.026 10.052 (9.944,10.16) 0.25

33 10.888 10.856 (10.712,11) -0.30

34 11.547 11.509 (11.304,11.715) -0.33

35 11.107 11.103 (10.978,11.228) -0.03

36 11.733 11.675 (11.48,11.87) -0.50

than the other two policies. The shared inventory S is never 0, and sometimes reserved

inventories are kept for one or two classes. The HP policy also prioritizes the classes based

on their down time costs. The columns S1 to S3 are the reserved inventories for class 1

(with highest down time cost) to class 3 (lowest down time cost). The shared inventory

S is never 0, and the HP policy sometimes keeps reserved inventories for classes 1 and


Table 5.7: The minimum, mean, median and maximum values of cost reduction of the MR policy compared to the HF and HP policies.

Ratio (%) Min Mean Median Max

$\Delta^{MR}_{HF}$ 14 17 18 21

$\Delta^{MR}_{HP}$ 4 6 6 8

$\Delta^{HP}_{HF}$ 10 12 13 14

2. The total number of spares of the optimal HP policy is never less than the number

of spares of the optimal MR policy. In the optimal MR policy, we recall from Tables 5.3

and 5.4 that there are no spare parts reserved solely for class 1. Instead, classes 1 and 2

share 2 to 3 units while the inventory level is less than or equal to L3. As a result of this

flexibility, the MR policy outperforms the HP policy in reducing the system cost.

5.3.3 The Comparison of the MR and Optimal Policies

As for the final question, regarding the performance of the MR policy with respect to the optimal policy, we first note that the optimal policy is as yet unknown for this problem. However, the optimal cost can be computed numerically. To do this, we model

the system as a semi-Markov decision process using the average cost criterion. Here, an

action can be decided on either when a new component fails, or a repair is over. The

possible actions after a failure instant are either dispatching an available spare part from

the inventory, or taking no action. Assuming a repaired component immediately joins

the inventory, the possible actions are dispatching the component to one of the classes

with at least one down machine, or taking no action and letting the component stay in

the spare parts inventory. With the assumption that a repaired component first enters

the inventory, the possible actions at both decision epochs become the same. We define

the state of the system as the number of down machines in each class and the inventory


Table 5.8: The Optimal HF and HP policies-I

No S SI SII SIII $C^*_{HF}$ S S1 S2 S3 $C^*_{HP}$

1 8 0 2 3 13.554 9 1 1 0 11.834

2 8 0 3 2 13.499 8 1 2 0 11.792

3 9 1 0 3 13.447 9 1 1 0 11.752

4 10 1 2 0 13.293 9 1 1 0 11.591

5 9 2 0 2 13.437 9 1 1 0 11.777

6 10 2 1 0 13.317 9 1 1 0 11.620

7 8 0 2 3 13.395 9 1 1 0 11.673

8 8 0 3 2 13.546 9 1 1 0 11.929

9 9 1 0 3 13.076 9 1 1 0 11.285

10 9 1 3 0 13.618 10 1 1 0 12.017

11 9 2 0 2 13.118 9 1 1 0 11.370

12 10 2 1 0 13.550 9 1 1 0 11.953

13 8 0 2 3 13.132 9 1 1 0 11.400

14 8 0 3 2 13.026 8 1 2 0 11.206

15 11 0 0 2 13.627 10 1 1 0 12.023

16 10 1 2 0 13.057 9 1 1 0 11.431

17 9 2 0 2 13.805 10 1 1 0 12.229

18 10 2 1 0 13.321 9 1 1 0 11.690

level as

$$i = (n_1, n_2, \ldots, n_m, l), \quad 0 \le n_k \le N_k, \quad k = 1, \ldots, m, \quad 0 \le l \le S,$$

and the possible actions are

$$a \in A(i) = \{0, 1, \ldots, m\},$$


Table 5.9: The Optimal HF and HP policies-II

No S SI SII SIII $C^*_{HF}$ S S1 S2 S3 $C^*_{HP}$

19 7 0 2 3 12.880 9 1 1 0 11.132

20 8 0 3 2 13.080 9 1 1 0 11.437

21 9 1 0 2 12.830 9 1 1 0 11.119

22 11 0 2 0 13.657 10 1 1 0 12.212

23 10 2 0 1 13.157 9 1 1 0 11.417

24 10 2 1 0 13.785 10 1 1 0 12.318

25 7 0 2 3 12.512 8 1 1 0 10.848

26 7 0 3 2 12.495 8 1 1 0 10.793

27 11 0 0 2 13.376 9 1 1 0 11.797

28 10 1 2 0 13.146 9 1 1 0 11.646

29 10 2 0 1 13.797 10 1 1 0 12.274

30 10 2 1 0 13.566 10 1 1 0 12.089

31 7 0 2 3 12.416 8 1 1 0 10.707

32 7 0 3 2 12.507 8 1 1 0 10.894

33 10 1 0 2 13.003 9 1 1 0 11.359

34 11 0 2 0 13.436 10 1 1 0 12.038

35 10 2 0 1 13.451 9 1 1 0 11.878

36 10 2 1 0 13.798 10 1 1 0 12.391

such that if a = 0, then no action is taken, and if a = k, a component is dispatched to class k. Therefore, at each decision epoch the system may move into m + 1 possible states as a result of a failure or a repair completion. We assume a limited capacity of S for the inventory, i.e., when the inventory level increases to S + 1 after the completion of a repair, taking no action is not allowed, and the component must be dispatched to a class. Let $c_i(a)$ and $\tau_i(a)$ be the expected cost and the expected time until the next decision epoch if action a is chosen in state i. Then,

$$c_i(a) = \begin{cases} \dfrac{\sum_{k=1}^{m} n_k b_k - b_a + (l-a)h}{\mu - \lambda_a + \sum_{k=1}^{m} (N_k - n_k)\lambda_k}, & \sum_{k=1}^{m} n_k (S - l) > 0, \\[2ex] \dfrac{\sum_{k=1}^{m} n_k b_k - b_a + (l-a)h}{-\lambda_a + \sum_{k=1}^{m} (N_k - n_k)\lambda_k}, & \text{otherwise}, \end{cases}$$

and

$$\tau_i(a) = \begin{cases} \left(\mu - \lambda_a + \sum_{k=1}^{m} (N_k - n_k)\lambda_k\right)^{-1}, & \sum_{k=1}^{m} n_k (S - l) > 0, \\[1ex] \left(-\lambda_a + \sum_{k=1}^{m} (N_k - n_k)\lambda_k\right)^{-1}, & \text{otherwise}, \end{cases}$$

where $b_0 = 0$ and $\lambda_0 = 0$.

In this finite-state semi-Markov decision process model, a stationary deterministic average optimal policy exists (see Theorem 11.4.6, page 557, Puterman, 2005). We first convert the model into a discrete-time Markov decision model, and then employ a version of the value-iteration algorithm (Tijms, 2003) to find a policy within 0.01% of the optimal policy (ε-optimal policy) in the numerical examples, i.e., the algorithm stops after finitely many iterations with policy R(n) whose average cost function g(R(n)) satisfies

$$0 \le \frac{g(R(n)) - g^*}{g^*} \le \varepsilon,$$

where $g^*$ denotes the optimal average cost. In all the examples, the difference between the cost functions in two consecutive iterations is less than 0.0012. The algorithm run time is around 24 hours for each example on a desktop computer with a 2.33GHz CPU. The final policies coincide with the MR policy in the priority sequencing of classes and inventory thresholds. Each resultant policy determines an action for each state, and there are between 10560 and 12672 states in each example. Tables 5.10 and 5.11 present the cost of the ε-optimal policy, $C^*_\varepsilon$, and demonstrate a near-perfect match between the MR policy and the ε-optimal policy, based on the number of states with equal actions in both policies. The difference between $C^*_{MR}$ and $C^*_\varepsilon$ is less than 0.16% in all 36 examples, with an average of 0.12%.
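The state counts quoted above follow from the state definition i = (n1, . . . , nm, l). The sketch below counts the states for the three fleets used in the examples. It assumes, as an interpretation of the tables rather than a statement from the text, that the inventory capacity S in the decision model equals the base-stock level L4 of the corresponding MR policy. The function name is hypothetical.

```python
from math import prod


def num_states(fleet_sizes, capacity):
    """Number of states (n_1, ..., n_m, l) with 0 <= n_k <= N_k and 0 <= l <= S."""
    return prod(n + 1 for n in fleet_sizes) * (capacity + 1)


fleets = (5, 10, 15)                      # N_I, N_II, N_III from the examples
counts = {S: num_states(fleets, S) for S in (9, 10, 11)}
```

With capacities of 9, 10, and 11 the counts are 10560, 11616, and 12672, which are the values that appear in the "No of States" column of Tables 5.10 and 5.11.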


Table 5.10: Comparison of ε-Optimal Policy and the MR Policy-I

No No of Iterations No of States % Matches $C^*_{MR}$ $C^*_\varepsilon$ $(C^*_{MR} - C^*_\varepsilon)/C^*_{MR}$ (%)

1 432 11616 99.99 10.966 10.952 0.12

2 439 11616 99.99 10.906 10.893 0.12

3 521 12672 100.00 11.129 11.116 0.12

4 585 12672 99.95 11.063 11.053 0.08

5 533 12672 100.00 11.062 11.049 0.12

6 592 12672 99.96 11.023 11.014 0.08

7 432 11616 100.00 10.853 10.840 0.12

8 438 11616 99.99 10.984 10.970 0.12

9 479 11616 99.94 10.722 10.705 0.16

10 602 12672 99.94 11.431 11.418 0.11

11 491 11616 99.97 10.690 10.675 0.14

12 612 12672 99.94 11.306 11.297 0.08

13 412 11616 100.00 10.485 10.474 0.11

14 414 11616 100.00 10.357 10.345 0.11

15 542 12672 100.00 11.413 11.398 0.13

16 554 11616 99.97 10.946 10.934 0.11

17 560 12672 100.00 11.428 11.414 0.12

18 592 12672 99.95 11.067 11.058 0.08


Table 5.11: Comparison of ε-Optimal Policy and the MR Policy-II

No.  No. of Iterations  No. of States  % Matches  C∗MR  C∗ε  (C∗MR − C∗ε)/C∗MR (%)

19 406 11616 100.00 10.314 10.302 0.11

20 419 11616 100.00 10.479 10.468 0.11

21 481 11616 99.96 10.581 10.569 0.11

22 618 12672 99.73 11.703 11.685 0.15

23 491 11616 99.94 10.729 10.712 0.16

24 630 12672 99.90 11.686 11.669 0.14

25 388 11616 100.00 10.024 10.015 0.09

26 365 10560 100.00 9.972 9.961 0.12

27 538 12672 100.00 11.290 11.276 0.13

28 600 12672 99.97 11.173 11.164 0.08

29 558 12672 100.00 11.453 11.439 0.12

30 609 12672 99.91 11.406 11.395 0.10

31 358 10560 100.00 9.949 9.937 0.12

32 394 11616 100.00 10.026 10.017 0.09

33 494 11616 99.94 10.888 10.874 0.13

34 618 12672 99.76 11.547 11.538 0.08

35 530 12672 100.00 11.107 11.094 0.11

36 628 12672 100.00 11.733 11.718 0.13

Mean 509 12085 99.97 10.941 10.929 0.12

Chapter 6

Conclusions

Although capacity pooling has been well studied in the context of production systems,

it has received less attention in repair systems. The fundamental differences between

production and repair systems do not permit an easy extension of production system

findings to repair systems. In this Thesis, we model and analyze multiple repair systems.

In each system, a repair shop is responsible for repairing the failed components that are

used in fleets, with each fleet consisting of a certain number of machines. To increase

the availability of machines, spare parts are stored in each system. We study the effect

of resource pooling for these systems. We consider different topologies for spare part

inventories and apply static and dynamic policies for dispatching repaired components.

In Chapter 2, we consider a system of fleets of machines at different locations and

assume that each machine is subject to failure due to a critical repairable component. To

minimize down time costs, a spare part inventory for that fleet is kept at each location. We

address whether these fleets should be served by smaller on-site repair shops dedicated to

them or by a centralized repair shop serving all fleets on an FCFS basis. In the literature,

the latter, i.e., repair shop pooling, is shown to be beneficial when transportation times

and costs are negligible. Our contribution is to develop a model which explores when

repair shop pooling is preferable if transportation times and costs are not negligible.

To this end, we model a system with a centralized single-server repair shop as a closed

queueing network and obtain the steady-state distribution of the number of functional

components at each location using the MVA. In our extensive numerical study comparing

the two systems, we find that when transportation costs are not prohibitively high, repair

shop pooling is beneficial, and fleets located quite far away from the repair shop can be

serviced. If we pool the repair shops into a multi-server repair shop, rather than a faster

single server, we expect similar results for high utilizations. However, when utilization is

low, the benefit of pooling might be diminished.
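As a rough illustration of how mean value analysis proceeds for a closed network, the sketch below treats a deliberately simplified case: a single fleet, no spare parts, exponential failure and repair times, and transportation modelled as a pure delay in each direction. These simplifications, the function name, and the parameter names are assumptions made for the example; the model of Chapter 2 is richer.

```python
def mva_machine_repair(n_machines, failure_rate, repair_rate, transport_time=0.0):
    """Exact MVA for one fleet served by a single-server FCFS repair shop.

    Each component cycles through: operating (mean 1/failure_rate),
    transport to the shop, repair, and transport back. Operating and
    transport phases are lumped into a single delay station.
    Returns (throughput of repairs, mean number of components at the shop).
    """
    delay = 1.0 / failure_rate + 2.0 * transport_time
    service = 1.0 / repair_rate
    queue_len = 0.0
    throughput = 0.0
    for n in range(1, n_machines + 1):
        # Arrival theorem: an arriving component sees the network
        # as it would be with n - 1 components.
        response = service * (1.0 + queue_len)
        # Little's law applied to the whole cycle.
        throughput = n / (delay + response)
        # Little's law applied to the repair shop.
        queue_len = throughput * response
    return throughput, queue_len


if __name__ == "__main__":
    # Hypothetical numbers: 5 machines, failure rate 0.2, repair rate 1.0.
    print(mva_machine_repair(5, 0.2, 1.0))
    print(mva_machine_repair(5, 0.2, 1.0, transport_time=0.5))
```

Without transportation, the result can be checked against the steady-state distribution of the classical machine-repair birth-death model, since both describe the same product-form network.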

In Chapter 3, we analyze two repair shop/inventory models, namely, the Hybrid-

Priority (HP) and the Hybrid-FCFS (HF) models, for a spare part provisioning problem.

The arrival rates at the repair shop are state-dependent since the systems served are

k-out-of-n:G systems. Our analysis for the FCFS policy can be easily applied to different

Markovian settings involving multi-classes of customers with state-dependent arrival rates

as long as the service rate is the same for all classes. With our extension, the preemptive-
resume priority policy can also be handled when there are more than two classes of

customers sharing the same server. As far as the setting in this chapter is concerned,

we demonstrate via our numerical study that both repair shop pooling and a hybrid

inventory structure with shared and reserved inventories for each system decrease the

overall cost significantly. The hybrid inventory structure may have further areas
of application, for example in production/inventory systems. Finally, we observe that

the HP policy is better if the minimum availability expected from each system is not

close to the minimum availability of another system.

In Chapter 4, we develop a method to obtain the exact steady-state system size distri-

bution and conduct the busy period analysis of the M/G/1//N queue with an unreliable

server subject to operation-independent interruptions. We extend the classical queueing
model by incorporating general OFF period, service, and setup time distributions
and by considering multiple classes of customers. Including non-exponential distributions to


model times between customer arrivals and/or times between server interruptions re-

mains challenging and is an open research question. We provide an alternative analysis

of the M/M/1//N queue with exponential service times where setup times are negligible.

The numerical analysis points to the impact of the service and OFF period distributions

on customer service levels, which in certain cases can be counterintuitive. We observe

that higher variability in service time does not necessarily increase the expected queue

lengths. We also analyze the single-class M/G/1//N queue where the distributions of
times to interruptions and down times change depending on whether the server is in its
idle or busy period.

In Chapter 5, we analyze a system with a pooled repair shop and inventory that

is operated under the MR policy while serving multiple classes of machines. The MR

policy prioritizes classes and rations the inventory based on these priorities. In other

words, when the inventory level is below the inventory threshold identified for a class,

that class is not served. When no inventory is available for a class, down machines in

this class have to wait until all shortages in classes with higher priority are cleared, and

the inventory level reaches the identified threshold. At this point, a repaired component

is sent to this class if it has down machines. Due to the complexity of the MR policy

when it is used with finite source populations, we propose an approximation to compute

its cost. Our findings in Chapter 4 are used to develop this approximation method, the

accuracy of which has been shown to be high by comparing it to simulation. We also use
Markov decision processes (MDPs) to obtain an ε-optimal policy for the system with a pooled
repair shop and spare part inventory. Our numerical findings in this chapter indicate
that the MR policy performs as well as the ε-optimal policy and outperforms the hybrid
policies.

In Chapter 2, we analyze the pooling of the repair shop with significant transporta-

tion cost and delay. In our future work, we hope to determine whether the developed

model can be extended to incorporate a centralized inventory of spare parts next to


the repair shop to serve all fleets at different locations. Introducing a more cost-efficient
static or dynamic priority dispatching rule among fleets is another interesting research
issue. In Chapter 3, through a numerical experiment for k-out-of-n systems, we

identify conditions under which one hybrid policy outperforms another. Characterizing

these conditions analytically can contribute to the literature. In Chapter 4, we analyze
multi-class M/G/1//N queues with unreliable servers and employ this analysis to approximate
the average cost of the MR policy in Chapter 5 with exponentially distributed repair times.
In the future, we plan to consider general repair times. An exact method, together with an
algorithm implementing it, could also be formulated to replace the approximation when
computing the average cost of the MR policy. We also show that the MR policy

is as good as the ε-optimal policy. A challenging problem is to develop a theoretical proof

that an optimal policy has the structure of the MR policy. Finally, the hybrid policies
of Chapter 3 and the MR policy of Chapter 5 should be compared in the presence of
transportation delays; we foresee analytical difficulties here which, if overcome, would
allow us to make a significant contribution to the field.

Bibliography

[1] Abboud, E. N. and J. N. Daigle. 1997. “A Little’s result approach to the service

constrained spares provisioning problem for repairable items,” Operations Research,

Vol. 45, No. 4, 577–583.

[2] Abouee-Mehrizi, H., B. Balcıoglu, and O. Baron. 2011. “Strategies for a centralized

single product multi-class M/G/1 make-to-stock queue”, under review.

[3] Almasi, B. and J. Sztrik. 2004. “Reliability investigations of heterogeneous terminal

systems using MOSEL”, Journal of Mathematical Sciences, Vol. 123, No. 1, 3795–

3801.

[4] Altıok, T. 1997. Performance Analysis of Manufacturing Systems, Springer-Verlag,

New York, NY.

[5] Atencia, I., G. Bouza, and P. Moreno. 2008. “An M[X]/G/1 retrial queue with server

breakdowns and constant rate of repeated attempts,” Annals of Operations Research,

Vol. 157, No. 1, 225–243.

[6] Avi-Itzhak, B. and P. Naor. 1963. “Some queueing problems with the service station

subject to breakdown”, Operations Research, Vol. 11, No. 3, 303–320.

[7] Baker, K. R., M. J. Magazine, and H. L. W. Nuttle. 1986. “The effect of commonality

on safety stock in a simple inventory model”, Management Science, Vol. 32, No. 8,

982–988.

[8] Balcıoglu, B., D. L. Jagerman, and T. Altıok. 2007. “Approximate mean waiting

time in a GI/D/1 queue with autocorrelated times to failures”, IIE Transactions,

Vol. 39, 985-996.

[9] Barlow, R. E. and F. Proschan. 1965. Mathematical theory of reliability, John Wiley
and Sons, New York.

[10] Baskett, F., K. M. Chandy, R. R. Muntz, and F. G. Palacios. 1975. “Open, closed,

and mixed networks of queues with different classes of customers”, Journal of the

Association for Computing Machinery, Vol. 22, No. 2, 248–260.

[11] Benjaafar, S. 1995. “Performance bounds for the effectiveness of pooling in multi-
processing systems”, European Journal of Operational Research, Vol. 87, 375–388.

[12] Benjaafar, S., W. L. Cooper, and J. S. Kim. 2005. “On the benefits of pooling in

production-inventory systems”, Management Science, Vol. 51, No. 4, 548–565.

[13] Bhat, U. N. and G. K. Miller. 2002. Elements of applied stochastic processes, Wiley,

New Jersey.

[14] Bitran, G. and R. Caldentey. 2002. “Two-class priority queueing system with state-

dependent arrivals”, Queueing Systems, Vol. 40, 355–82.

[15] Boxma, O. J. 1986. “A queueing model of finite and infinite source interaction,”

Operations Research Letters, Vol. 5, 245–254.

[16] Breuer, L. and D. Baum. 2006. An introduction to queueing theory and matrix-

analytic methods, Springer, The Netherlands.

[17] Buzen, J. P. 1973. “Computational algorithms for closed queueing networks with

exponential servers”, Communications of the ACM, Vol. 16, No. 9, 527-531.

[18] Caggiano, K. E., P. L. Jackson, J. A. Muckstadt, and J. A. Rappold. 2009. “Efficient

computation of time-based customer service levels in a multi-item, multi-echelon

supply chain: A practical approach for inventory optimization”, European Journal

of Operational Research, Vol. 199, 744–749.

[19] Chakravarthy, S. R. and A. Agarwal. 2003. “Analysis of a machine repair prob-

lem with an unreliable server and phase type repairs and services”, Naval Research

Logistics, Vol. 50, No. 5, 462–480.

[20] Chandra, M. J. 1986. “A Study of multiple finite-source queueing models”, Journal

of the Operational Research Society, Vol. 37, 275–283.

[21] van Doorn, E. A. and G. J. K. Regterschot. 1988. “Conditional PASTA”, Operations

Research Letters, Vol. 7, No. 5, 229–232.

[22] Dshalalow, J. 1991. “On single-server closed queues with priorities and state depen-
dent parameters”, Queueing Systems, Vol. 8, 237–253.

[23] Eppen, G. D. 1979. “Effects of centralization on expected costs in a multi-location

newsboy problem”, Management Science, Vol. 25, No. 5, 498–501.

[24] Fawzi, B. B. and A. G. Hawkes. 1991. “Availability of an R-out-of-N system with

spares and repairs”, Journal of Applied Probability, Vol. 28, No. 2, 397–408.

[25] Federgruen, A. and L. Green. 1986. “Queueing systems with service interruptions”,

Operations Research, Vol. 34, No. 5, 752–768.

[26] Federgruen, A. and L. Green. 1988. “Queueing systems with service interruptions

II”, Naval Research Logistics, Vol. 35, 345–358.

[27] Fiems, D., T. Maertens, and H. Bruneel. 2008. “Queueing systems with different

types of server interruptions”, European Journal of Operational Research, Vol. 188,

No. 3, 838–845.

[28] Frostig, E. and B. Levikson. 2002. “On the availability of R out of N repairable

systems”, Naval Research Logistics, Vol. 49, 483–498.

[29] Gaver, D. P. 1962. “A waiting line with interrupted service, including priorities”,

Journal of the Royal Statistical Society, Vol. 24, No. 1, 73–90.

[30] Gerchak, Y. and Q. M. He. 2003. “On the relation between the benefits of risk pooling

and the variability of demand”, IIE Transactions, Vol. 35, No. 1, 1027–1031.

[31] Gordon, W. J. and G. F. Newell. 1967. “Closed queuing systems with exponential

servers”, Operations Research, Vol. 15, No. 2, 254–265.

[32] Graves, S. C. and J. Keilson. 1983. “System balance for extended logistic systems”,

Operations Research, Vol. 31, 234–252.

[33] Graves, S. C. 1985. “A multi-echelon inventory model for a repairable item with

one-for-one replenishment”, Management Science, Vol. 31, No. 10, 1247–1256.

[34] Gross, D. and C. M. Harris. 1998. Fundamentals of Queueing Theory, John Wiley

& Sons, New York.

[35] Gupta, H. and J. Sharma. 1981. “State transition matrix and transition diagram

of k-out-of-n-:G system with spares”, IEEE Transactions on Reliability, Vol. R-30,

No. 4, 395–396.

[36] Gupta, U.C. and T.S.S. Srinivasa Rao. 1996. “Computing the steady state proba-

bilities in λ(n)/G/1/K queue”, Performance Evaluation, Vol. 24, 265–275.

[37] Ha, A. 1997a. “Inventory rationing policy in a make-to-stock production system with
several demand classes and lost sales”, Management Science, Vol. 43, 1093–1103.

[38] Ha, A. 1997b. “Stock-rationing policy for a make-to-stock production system with

two priority classes and backordering”, Naval Research Logistics, Vol. 44, 457–472.

[39] Haque, L. and M. J. Armstrong. 2007. “A survey of the machine interference prob-

lem”, European Journal of Operational Research, Vol. 179, No. 2, 469-482.

[40] Hui, E. Y. Y. and A. H. C. Tsang. 2004. “Sourcing strategies of facilities manage-

ment”, Journal of Quality in Maintenance Engineering, Vol. 10, No. 2, 85–92.

[41] Iravani, S. M. R., and B. Kolfal. 2005. “When does the cµ rule apply to finite-

population queueing systems?”, Operations Research Letters, Vol. 33, 301–304.

[42] Iravani, S. M. R., V. Krishnamurthy, and G. H. Chao. 2007. “Optimal server

scheduling in nonpreemptive finite-population queueing systems”, Queueing Sys-

tems, Vol. 55, 95-105.

[43] Jaiswal, N. K. 1968. Priority Queues, Academic Press.

[44] Kerner, Y. 2008. “The conditional distribution of the residual service time in the

Mn/G/1 queue,” Stochastic Models, Vol. 24, 364–375.

[45] Kulkarni, V. G. 1989. “A new class of multivariate phase type distributions”, Oper-

ations Research, Vol. 37, No. 1, 151–158.

[46] Kumar, R. and T. Markeset. 2007. “Development of performance-based service

strategies for the oil and gas industry: A case study”, Journal of Business and

Industrial Marketing, Vol. 22, No. 4, 272–280.

[47] Kutanoglu, E., and M. Mahajan. 2009. “An inventory sharing and allocation method

for a multi-location service parts logistics network with time-based service levels”,

European Journal of Operational Research, Vol. 194, 728-742.

[48] Lawler, E. L. and M. D. Bell. 1966. “A method for solving discrete optimization

problems”, Operations Research, Vol. 14, 1098–1112.

[49] Levitin, G. and S. V. Amari. 2010. “Approximation algorithm for evaluating time-to-

failure distribution of k-out-of-n system with shared standby elements”, Reliability

Engineering and System Safety, Vol. 95, 396–401.

[50] Louit, D., R. Pascual, D. Banjevic, and A.K.S. Jardine. 2011. “Optimization models

for critical spare parts inventories— a reliability approach”, Journal of the Opera-

tional Research Society, Vol. 62, 992–1004.

[51] Messinger, M. and M. L. Shooman. 1970. “Techniques for optimum spares allocation:

A tutorial review”, IEEE Transactions on Reliability, Vol. R-19, No. 4, 156–166.

[52] Miller, D. R. 1981. “Computation of steady-state probabilities for M/M/1 priority

queues”, Operations Research, Vol. 29, 945–958.

[53] Mitrany, I. L. and B. Avi-Itzhak. 1968. “A many server queue with service interrup-
tions”, Operations Research, Vol. 16, No. 3, 628–638.

[54] Neuts, M. F. and D. M. Lucantoni. 1979. “A Markovian queue with N servers subject

to breakdowns and repair”, Management Science, Vol. 25, No. 9, 849–861.

[55] Pena Perez, A., and P. Zipkin. 1997. “Dynamic Scheduling Rules for a Multiproduct

Make-to-Stock Queue”, Operations Research, Vol. 45, No. 6, 919–930.

[56] Puterman, M. L. 2005. Markov Decision Processes, John Wiley & Sons, New Jersey.

[57] Reiser, M. and S. S. Lavenberg. 1980. “Mean-value analysis of closed multichain

queuing networks”, Journal of the ACM, Vol. 27, No. 2, 313-322.

[58] Sasaki, M., S. Kaburaki, and S. Yanagi. 1977. “System availability and optimum

spare units”, IEEE Transactions on Reliability, Vol. 26, No. 3, 182–188.

[59] Shanthikumar, J. G. and U. Sumita. 1985. “On the busy-period distributions of

M/G/1/K queues with state-dependent arrivals and FCFS/LCFS-P service disci-

plines”, Journal of Applied Probability, Vol. 22, No. 4, 912–919.

[60] Silver, E. A., D. F. Pyke, and R. Peterson. 1998. Inventory Management and Pro-
duction Planning and Scheduling, 3rd Edition, John Wiley & Sons.

[61] Smith, D. R. and W. Whitt. 1981. “Resource sharing for efficiency in traffic systems”,

The Bell System Technical Journal, Vol. 60, 39–55.

[62] de Smidt-Destombes, K. S., M. C. van der Heijden, and A. van Harten. 2004. “On

the availability of a k-out-of-N system given limited spares and repair capacity under

a condition based maintenance strategy”, Reliability Engineering and System Safety,

Vol. 83, 287–300.

[63] Stecke, K. E. and J. E. Aronson. 1985. “Review of operator/machine interference

models”, International Journal of Production Research, Vol. 23, No. 1, 129–151.

[64] Stidham, S. 1970. “On the optimality of single-server queueing systems”, Operations

Research, Vol. 18, 708–732.

[65] Sztrik, J. and T. Gal. 1990. “A recursive solution of a queueing model for a multi-

terminal system subject to breakdowns”, Performance Evaluation, Vol. 11, No. 1,

1–7.

[66] Taylor, J. and R. R. P. Jackson. 1954. “An application of the birth and death process

to the provision of spare machines”, Operational Research Quarterly, Vol. 5, 95–108.

[67] Thiruvengadam, K. 1963. “Queueing with breakdown”, Operations Research,

Vol. 11, 62–71.

[68] Tijms, H. C. 2003. A First Course in Stochastic Models, John Wiley & Sons Ltd,

West Sussex, England.

[69] Veran, M. 1984. “Exact analysis of a priority queue with finite source”, Proceedings of

the International Seminar on Modelling and Performance Evaluation Methodology,

Paris.

[70] de Vericourt, F., F. Karaesmen, and Y. Dallery. 2001. “Assessing the Benefits of

Different Stock-Allocation Policies for a Make-to-Stock Production System”, Man-

ufacturing & Service Operations Management, Vol. 3, 105–121.

[71] de Vericourt, F., F. Karaesmen, and Y. Dallery. 2002. “Optimal Stock Allocation

for a Capacitated Supply System”, Management Science, Vol. 48, 1486–1501.

[72] Wang, K.-H. 1990. “Profit analysis of the machine-repair problem with a single

service station subject to breakdowns”, Journal of the Operational Research Society,

Vol. 41, No. 12, 1153–1160.

[73] Wang, K.-H. and M.-Y. Kuo. 1997. “Profit analysis of the M/Ek/1 machine repair

problem with a non-reliable service station”, Computers and Industrial Engineering,

Vol. 32, No. 3, 587–594.

[74] Wang, J., J. Cao, and Q. Li. 2001. “Reliability analysis of the retrial queue with

server breakdowns and repairs”, Queueing Systems, Vol. 38, No. 4, 363–380.

[75] Waters, D. 2003. Logistics: An Introduction to Supply Chain Management, Palgrave

Macmillan, New York.

[76] White, H. and L. Christie. 1958. “Queueing with preemptive priorities or with break-

down”, Operations Research, Vol. 6, No. 1, 79–95.

[77] Yanagi, S., M. Sasaki, and K. Umazume. 1981. “Optimal inventory problem of a re-

pairable K-out-of-N :G system”, IEEE Transactions on Reliability, Vol. R-30, No. 5,

478–480.

[78] Yang, H. and L. Schrage. 2009. “Conditions that cause risk pooling to increase

inventory”, European Journal of Operational Research, Vol. 192, 837–851.

[79] Yu, Y., S. Benjaafar, and Y. Gerchak. 2009. “Capacity pooling and cost allocation

among independent firms in the presence of congestion”, Under Review.
