

Computers & Operations Research 35 (2008) 3530–3561 www.elsevier.com/locate/cor

Dynamic modeling and control of supply chain systems: A review

Haralambos Sarimveis a,∗, Panagiotis Patrinos a, Chris D. Tarantilis b, Chris T. Kiranoudis a

a School of Chemical Engineering, National Technical University of Athens, 9 Heroon Polytechniou str., Zografou Campus, 15780 Athens, Greece

b Department of Management Science and Technology, Athens University of Economics and Business, 47A Evelpidon Street/33 Lefkados Street, Athens 113-62, Greece

Available online 7 February 2007

Abstract

Supply chains are complicated dynamical systems triggered by customer demands. Proper selection of equipment, machinery, buildings and transportation fleets is a key component for the success of such systems. However, efficiency of supply chains mostly depends on management decisions, which are often based on intuition and experience. Due to the increasing complexity of supply chain systems (which is the result of changes in customer preferences, the globalization of the economy and the stringent competition among companies), these decisions are often far from optimum. Another factor that causes difficulties in decision making is that different stages in supply chains are often supervised by different groups of people with different managing philosophies. From the early 1950s it became evident that a rigorous framework for analyzing the dynamics of supply chains and taking proper decisions could improve substantially the performance of the systems. Due to the resemblance of supply chains to engineering dynamical systems, control theory has provided a solid background for building such a framework. During the last half century many mathematical tools emerging from the control literature have been applied to the supply chain management problem. These tools vary from classical transfer function analysis to highly sophisticated control methodologies, such as model predictive control (MPC) and neuro-dynamic programming. The aim of this paper is to provide a review of this effort. The reader will find representative references of many alternative control philosophies and identify the advantages, weaknesses and complexities of each one. The bottom line of this review is that a joint co-operation between control experts and supply chain managers has the potential to introduce more realism to the dynamical models and develop improved supply chain management policies.
© 2007 Elsevier Ltd. All rights reserved.

Keywords: Supply chain management; Control; Dynamic modeling; Review; Dynamic programming; Model predictive control

1. Introduction

A supply chain is a network of facilities and distribution entities (suppliers, manufacturers, distributors, retailers) that performs the functions of procurement of raw materials, transformation of raw materials into intermediate and finished products and distribution of finished products to customers. A supply chain is typically characterized by a forward flow of materials and a backward flow of information. Recently, enterprises have shown a growing interest in efficient supply chain management. This is due to the rising cost of manufacturing and transportation, the globalization of market economies and the customer demand for diverse products of short life cycles, which are all factors that

∗ Corresponding author. Tel.: +30 210 7723237; fax: +30 210 7723138. E-mail address: [email protected] (H. Sarimveis).

0305-0548/$ - see front matter © 2007 Elsevier Ltd. All rights reserved. doi:10.1016/j.cor.2007.01.017


increase competition among companies. Efficient supply chain management can lead to lower production cost, inventory cost and transportation cost and improved customer service throughout all the stages that are involved in the chain.

Various alternative methods have been proposed for modeling supply chains. According to Beamon [1], they can be grouped into four categories: deterministic models where all the parameters are known, stochastic models where at least one parameter is unknown but follows a probabilistic distribution, economic game-theoretic models and models based on simulation, which evaluate the performance of various supply chain strategies. The majority of these models are steady-state models based on average performance or steady-state conditions. However, static models are insufficient when dealing with the dynamic characteristics of the supply chain system, which are due to demand fluctuations, lead-time delays, sales forecasting, etc. In particular, they are not able to describe, analyze and find remedies for a major problem in supply chains, which recently became known as “the bullwhip effect”.

The “bullwhip” phenomenon is the amplification of demand variability as we move from a downstream level to an upstream level in a supply chain. Lee et al. [2] identified four major causes of the bullwhip effect:

1. Demand forecasting, which often is performed independently by each element in the supply chain based on its immediate customers.
2. Batching of orders to reduce processing and transportation costs.
3. Price fluctuations due to special promotions like price discounts and quantity discounts.
4. Supply shortages, which lead to artificial demands.

Two recent publications provide excellent reviews on the subject of the bullwhip effect. They both report additional causes of this undesired behavior and pinpoint methods for eliminating the problem. In particular, Miragliotta [3] notes that academicians and managers follow conflicting approaches to describing and analyzing the bullwhip phenomenon. A new taxonomy was proposed, which shares the scientific rigor of the former with the practical attitude of the latter. According to Geary et al. [4], “human” factors, such as ignorance, arrogance and indifference, contribute to the bullwhip effect, but proper re-engineering of the supply chain (such as smooth production strategies) can eliminate the causes of this undesired phenomenon.

From the above discussion, it is clear that consideration of the dynamic characteristics offers a competitive advantage in modeling supply chain systems. It is not surprising that dynamic analysis and design of supply chain systems as a whole has attracted a lot of attention, both from academia and industry. A recent review paper [5] focused on the alternative approaches that have been proposed for modeling the dynamics of supply chains, which were categorized as follows: continuous-time differential equation models, discrete-time difference models, discrete event models and classical operational research methods.

Control theory provides sufficient mathematical tools to analyze, design and simulate supply chain management systems, based on dynamic models. In particular, control theory can be used to study and find solutions to the “bullwhip” phenomenon. The aim of this paper is to review the research efforts of the last half century regarding the application of control theory to the supply chain management problem. We believe that this work will help researchers and practitioners who would like to get involved in this exciting scientific area to gain knowledge about the major developments that have emerged throughout the years and get informed about the state-of-the-art methods of today. We should note that excellent reviews on the application of control theory to the production and inventory control problem have been published previously [6–8]. However, the first two review papers [6,7] are almost two decades old, so they do not cover the major advances that have taken place since then. The paper of Ortega and Lin [8] is recent, but it focuses on classical control methodologies. The present work can be considered as a complement to the paper of Ortega and Lin, since it presents an extensive review of the application of advanced control methodologies to the production and inventory control problem. For the sake of completeness, a major section of this paper is devoted to classical control applications. The classical control section is updated with the recent advances that have emerged over the last few years.

The rest of the review paper is organized as follows: the next section contains applications of classical control to the supply chain modeling problem, where most of the analysis concerns linear systems and is performed in the frequency domain. In particular, Laplace transfer functions and z transfer functions are used to model the dynamics of continuous and discrete linear systems in the frequency domain. Standard analysis tools, such as the Bode and Nyquist plots, the Routh and Hurwitz stability criteria and transient responses, are used to analyze and evaluate the alternative designs.


Section 3 is devoted to the application of advanced control theory, where the system dynamics are examined in the time domain and are described by state space models. Advanced control methodologies are basically optimal control methods aiming at the optimization of an objective function that describes the performance of the system. Dynamic programming and Hamilton–Jacobi–Bellman (HJB) equations [9,10] are prevalent in optimal control theory. Another major issue that is often taken into account in advanced control theory is the presence of uncertainties, which complicates the process of making effective decisions regarding production, storage and distribution of products. Uncertainties are involved in future demand prediction, lead time estimation, estimation of failure probabilities, etc. Many problems in supply chain theory can be cast as stochastic optimal control problems. Therefore, much of the literature that considers supply chain networks from a system theoretic point of view is largely based on optimal control and dynamic programming.

Due to the curse of dimensionality, many models that are based on dynamic programming and optimal control cannot be solved analytically. Eventually one must resort to some kind of approximation. Model predictive control (MPC) [11], built on the rolling horizon concept, is a viable approach to cope with intractability in optimal closed-loop feedback control design. Its main idea is to solve on-line a finite-horizon open-loop optimal control problem, considering the current state as the initial state for the problem. The problem is formulated and solved at each discrete-time instance. MPC techniques have recently been applied to supply chain problems and are reviewed in Section 4.
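As a concrete illustration of the rolling horizon idea, the sketch below re-solves a small finite-horizon ordering problem at every period and applies only the first order of the optimized sequence. It is a minimal example under our own assumptions (quadratic inventory and ordering costs, zero lead time, an invented demand series), not a formulation taken from the reviewed papers:

```python
import numpy as np

def mpc_order(inv0, demand_forecast, target=100.0, smooth=0.5):
    """One rolling-horizon step: solve the finite-horizon open-loop
    problem in closed form (ridge least squares) and return only the
    first order of the optimal sequence."""
    H = len(demand_forecast)
    A = np.tril(np.ones((H, H)))              # cumulative-sum matrix
    base = inv0 - A @ np.asarray(demand_forecast, dtype=float)
    # min_u ||A u + base - target||^2 + smooth * ||u||^2
    u = np.linalg.solve(A.T @ A + smooth * np.eye(H),
                        A.T @ (target - base))
    return max(u[0], 0.0)                     # orders cannot be negative

# Closed loop: re-plan every period with the latest measured inventory.
demands = [20.0, 25.0, 30.0, 22.0, 18.0, 20.0]
inv, orders = 80.0, []
for t in range(len(demands)):
    forecast = (demands[t:t + 4] + [20.0] * 4)[:4]   # pad with a flat guess
    u = mpc_order(inv, forecast)
    orders.append(u)
    inv += u - demands[t]                     # zero lead time for simplicity
```

Only the first order of each solved sequence ever reaches the system; feedback enters because the next problem starts from the realized, not the predicted, inventory.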

Another way of dealing with uncertainty is to model it as a deterministic uncertain-but-bounded quantity. In this case, no information regarding the probability distribution of the disturbances is required. For example, future demand can be bounded between lower and upper limits, without needing to define the likelihood of occurrence of each possible event within these limits. Systems in which disturbances are described as uncertain-but-bounded quantities are the main concern of robust control. Specifically, robust optimal control [12] seeks a feedback controller that minimizes the worst-case value of a cost criterion over all possible realizations of the uncertain parameters. Furthermore, constraints regarding the operation of the system must be fulfilled for every possible value of the uncertain parameters. Articles describing applications of robust control theory to supply chain management problems are reviewed in Section 5.
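A toy numerical instance of the uncertain-but-bounded viewpoint: pick one order quantity that minimizes the worst-case holding/shortage cost over every demand realization in a given interval. The cost coefficients and demand bounds are invented for the example:

```python
def worst_case_cost(u, inv, d_lo, d_hi, hold=1.0, short=4.0):
    """Worst-case cost of ordering u when demand can take any value in
    [d_lo, d_hi]; the cost is convex piecewise linear in demand, so the
    worst case sits at one of the two interval endpoints."""
    def cost(d):
        onhand = inv + u - d
        return hold * max(onhand, 0.0) + short * max(-onhand, 0.0)
    return max(cost(d_lo), cost(d_hi))

def robust_order(inv, d_lo, d_hi):
    """Min-max order quantity via a simple grid search."""
    best_u, best_c = 0.0, float("inf")
    u = 0.0
    while u <= d_hi:
        c = worst_case_cost(u, inv, d_lo, d_hi)
        if c < best_c:
            best_u, best_c = u, c
        u += 0.5
    return best_u
```

With zero starting inventory and demand bounded in [80, 120], the worst-case optimum balances hold·(u − 80) against short·(120 − u), giving u = 112 for the weights above; note that no probability distribution over demand was needed anywhere.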

In Section 6 we review alternative methods that have been proposed to combat the curse of dimensionality in dynamic programming and the lack of an accurate model for the stochastic system under investigation. These methods are usually based on some form of approximation of the value function combined with simulations. They grew out of the artificial intelligence community and they are usually referred to as reinforcement learning techniques, neuro-dynamic programming or approximate dynamic programming [13].
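To make the idea concrete, here is a minimal tabular Q-learning sketch on a toy single-echelon inventory problem (discrete stock levels, uniform random demand, holding and shortage costs). The model and all numbers are our own illustrative choices; realistic applications replace the table with a function approximator, which is where neuro-dynamic programming gets its name:

```python
import random

MAX_INV, MAX_ORDER = 15, 5
HOLD, SHORT = 1.0, 5.0

def q_learning_inventory(episodes=20000, horizon=10,
                         alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    """Learn an ordering policy from simulated experience alone; no
    explicit model of the demand distribution is handed to the learner."""
    rng = random.Random(seed)
    Q = [[0.0] * (MAX_ORDER + 1) for _ in range(MAX_INV + 1)]
    for _ in range(episodes):
        inv = rng.randint(0, MAX_INV)              # exploring starts
        for _ in range(horizon):
            if rng.random() < eps:                 # epsilon-greedy action
                a = rng.randint(0, MAX_ORDER)
            else:
                a = max(range(MAX_ORDER + 1), key=lambda x: Q[inv][x])
            d = rng.randint(0, 3)                  # uniform demand in 0..3
            net = inv + a - d
            cost = HOLD * max(net, 0) + SHORT * max(-net, 0)
            nxt = min(max(net, 0), MAX_INV)        # lost sales, capped store
            Q[inv][a] += alpha * (-cost + gamma * max(Q[nxt]) - Q[inv][a])
            inv = nxt
    return [max(range(MAX_ORDER + 1), key=lambda x: Q[s][x])
            for s in range(MAX_INV + 1)]

policy = q_learning_inventory()   # greedy order quantity per stock level
```

The learned policy orders several units when the shelf is empty and little or nothing when it is full, recovering base-stock-like behavior purely from simulation.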

The paper ends with the concluding remarks and some suggestions for further research.

2. Classical control theory

The utilization of classical control techniques in the supply chain management problem can be traced back to the early 1950s, when Simon [14] applied servomechanism continuous-time theory to manipulate the production rate in a simple system involving just a single product. The idea was extended to discrete-time models by Vassian [15], who proposed an inventory control framework based on the z-transform methodology. A breakthrough, however, was experienced in the late 1950s with the so-called “industrial dynamics” methodology, which was introduced by the pioneering work of Forrester [16,17]. The methodology, later referred to as “system dynamics”, used a feedback perspective to model, analyze and improve dynamic systems, including the production-inventory system. The scope of the methodology was later broadened to cover complex systems from various disciplines such as social systems, corporate planning and policy design, public management and policy, micro- and macro-economic dynamics, educational problems, biological and medical modeling, energy and the environment, theory development in the natural and social sciences, dynamic decision-making research, strategic planning and more [18]. The book written recently by Sterman [19] is an excellent source of information on the “system dynamics” philosophy and its various applications and includes special chapters on the supply chain management problem.

Forrester’s work was appreciated for providing powerful tools to model and simulate complex dynamical phenomena, including nonlinear control laws. However, the “industrial dynamics” methodology was criticized for not containing sufficient analytical support [20] and for not providing guidelines to the systems engineers on how to improve performance [21]. Motivated by the need to develop a new framework that could be used as a base for seeking novel control laws and/or new feedback paths in production/inventory systems, Towill [21] presented the inventory and order based production control system (IOBPCS) in a block diagram form, extending the work of Coyle [22].


[Fig. 1 block diagram omitted: the demand policy Ga(s/z), inventory policy Gi(s/z), pipeline policy Gw(s/z), lead-time block Gp(s/z), desired-WIP block Gd(s/z) and target stock gain k connect the signals CONS, AVCON, ORATE, COMRATE, AINV, DINV, EINV, DWIP, AWIP and EWIP.]

Fig. 1. The family of IOBPCS models.

It was considered that the system deals with aggregate product levels or, alternatively, that it reflects a single product. The system was subject to many modifications and improvements in subsequent years, including extensions to discrete-time systems, thus leading to the IOBPCS family that is presented in a block diagram form in Fig. 1. Standard nomenclature used in industrial dynamics is adopted to represent input, output and intermediate signals in the block diagram:


AINV actual inventory holding
AVCON average consumption
AWIP actual WIP holding
COMRATE completion rate
CONS consumption or market demand
DINV desired inventory level
DWIP desired work in progress
EINV error in inventory holding
EWIP error in work in progress
ORATE order rate

Using control terminology, the actual inventory level (AINV) is the controlled variable, while market demand (CONS) is a disturbance and the order rate (ORATE) is the manipulated variable. The two integrators are used to accumulate the inventory and work in process (WIP) deficits over time.

Each member of the IOBPCS family is constructed by defining some or all of the following five components [23]:

• The lead time, which represents the time between placing an order and receiving the goods into inventory. In manufacturing sites, lead time incorporates production delays. Alternatively, this component can be interpreted as a production smoothing element, representing how slowly the production unit adapts to changes in ORATE [24].

• The target stock setting, which can be either fixed or a multiple of current average sales rates.

• The demand policy, which in essence is a forecasting mechanism that averages the current market demand. The demand policy is a feed-forward loop within the replenishment policy.

• The inventory policy, which is a feedback loop that controls the rate at which inventory deficit (difference between desired stock setting and AINV) is recovered.

• The pipeline policy, which is a feedback loop that determines the rate at which WIP deficit (difference between desired WIP level and actual WIP level) is recovered.

The lead time is a characteristic of the system to be controlled. Although the designer of the control system cannot manipulate the lead time, it is important to model the delay in the best possible way. A generic lead time model with two parameters was proposed by Wikner [25] in the continuous-time formulation but can be easily extended to the discrete-time formulation:

Gp(s) = 1/((Tp/n)s + 1)^n.    (1)

There are three common choices for the parameter n:

n = 1: first order delay.
n = 3: third order delay.
n → ∞: pure (infinite order) delay.

For the first two choices Tp is the average lead time of the unit, while for the last choice Tp is the fixed lead time. For the last choice of the parameter n, the transfer functions Gp(s/z) can be written as follows:

Gp(s) = e^(−Tp·s),   Gp(z) = z^(−q),    (2)

where Tp = qTm and Tm is the sampling interval in the discrete-time case.

The designer has to decide on how the target stock will be set (fixed value or multiple of average sales) and select

the three policies (demand policy, inventory policy and pipeline policy), in order to optimize the system with respect to the following performance objectives:

(a) Inventory level recovery.
(b) Attenuation of demand rate fluctuations on the ordering rate.

The second objective aims at the reduction of the “bullwhip” effect. The term “bullwhip” was only recently introduced, as mentioned in the introduction, but the phenomenon where a small random variation in sales at the marketplace is amplified at each level in the supply chain was already identified by the pioneering work of Forrester in industrial dynamics [17]. This was later postulated by Burbidge under the “Law of Industrial Dynamics” [26]. The utilization of control engineering principles in tackling the problem by providing supply chain dynamic modeling and re-engineering methodologies was soon recognized, as reported by Towill [27].

The two performance objectives are conflicting. Thus, for each particular supply chain, the control system designer seeks the best inventory level and ordering rate trade-off. A qualitative look at the two extreme scenarios (perfect satisfaction of each one of the two objectives) clearly shows that a compromise is needed to arrive at a well designed control system. If a fixed ordering rate is used, then large inventory deviations are observed, since inventory levels follow any demand variation. This policy (known as Lean Production in manufacturing sites) obviously results in large inventory costs. On the other hand, a fixed inventory level (known as Agile Production in manufacturing sites [28]) results in highly variable production schedules and hence large production costs.

Standard control metrics are used in the literature to quantify the performance of alternative control policies with respect to the aforementioned objectives. Regarding the first objective, the dynamic behavior of the system when a step input is introduced to the demand rate is studied. The inventory response is then evaluated with respect to performance criteria such as the rise time, the settling time and the maximum overshoot, to name a few. Another useful metric to quantify inventory recovery is the integral of time × absolute error (ITAE) criterion. Frequency response tests are typically used to evaluate the performance of the system with respect to the second objective. For a particular transfer function G(s), frequency response (Bode) plots draw the magnitude and the phase angle of the complex number G(jω) as a function of ω. The frequency response plots provide valuable information, since when a sinusoidal input is presented to the system, the output is a sine wave of the same frequency ω. The ratio of the amplitude of the output signal over the amplitude of the input signal (amplitude ratio, AR) is equal to the magnitude of G(jω), while the phase shift is equal to the angle of G(jω). Based on the frequency responses, the noise bandwidth metric can be easily computed to


Table 1
Demand, inventory and pipeline policies for several models in the IOBPCS family

Model | Target stock setting | Demand policy | Inventory policy | Pipeline policy
IBPCS (inventory based production control system) | Constant | Ga(s) = 0; Ga(z) = 0 | Gi(s) = 1/Ti; Gi(z) = 1/Ti | Gw(s) = 0; Gw(z) = 0
IOBPCS (inventory and order based production control system) | Constant | Ga(s) = 1/(Ta·s + 1); Ga(z) = a/(1 − (1 − a)z^−1) | Gi(s) = 1/Ti; Gi(z) = 1/Ti | Gw(s) = 0; Gw(z) = 0
VIOBPCS (variable inventory and order based production control system) | Multiple of average market demand | Ga(s) = 1/(Ta·s + 1); Ga(z) = a/(1 − (1 − a)z^−1) | Gi(s) = 1/Ti; Gi(z) = 1/Ti | Gw(s) = 0; Gw(z) = 0
APIOBPCS (automatic pipeline, inventory and order based production control system) | Constant | Ga(s) = 1/(Ta·s + 1); Ga(z) = a/(1 − (1 − a)z^−1) | Gi(s) = 1/Ti; Gi(z) = 1/Ti | Gw(s) = 1/Tw, Gd(s) = Tp; Gw(z) = 1/Tw, Gd(z) = Tp
APVIOBPCS (automatic pipeline, variable inventory and order based production control system) | Multiple of average market demand | Ga(s) = 1/(Ta·s + 1); Ga(z) = a/(1 − (1 − a)z^−1) | Gi(s) = 1/Ti; Gi(z) = 1/Ti | Gw(s) = 1/Tw, Gd(s) = Tp; Gw(z) = 1/Tw, Gd(z) = Tp

quantify the noise amplification (bullwhip effect). This is defined as the area under the squared frequency response of the system. Disney and Towill [29] showed that the noise bandwidth metric divided by π is equivalent to the variance ratio measure (variance of the ordering rate over the variance of the demand rate), which was proposed by Chen et al. [30] to quantify the “bullwhip” effect.
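The variance-ratio measure is easy to compute directly from demand and order series. The sketch below uses an invented demand sequence and a first order exponential smoothing demand policy, and checks that smoothing yields a ratio below one (attenuation), whereas a ratio above one would signal bullwhip amplification:

```python
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def bullwhip_ratio(orders, demand):
    """Var(ORATE) / Var(CONS): a value above 1 means demand
    variability is amplified as it travels upstream."""
    return variance(orders) / variance(demand)

demand = [100, 120, 80, 130, 70, 110, 90, 125, 75, 100]
alpha = 0.2                        # smoothing constant of the demand policy
avg, smooth_orders = float(demand[0]), []
for d in demand:
    avg += alpha * (d - avg)       # first order exponential smoothing
    smooth_orders.append(avg)

ratio = bullwhip_ratio(smooth_orders, demand)
```

Passing the raw demand straight through as orders gives a ratio of exactly one, the boundary between attenuation and amplification.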

It can be shown mathematically and experimentally that, using the feed-forward component (demand policy), we can achieve zero steady-state error between the actual and desired inventory level when a step change is introduced in the consumption rate, even if no integral action is included in the inventory policy. However, if market demand is used in the forward component without any form of averaging, i.e. if we set Ga(s/z) = 1, excessive fluctuations are observed in the production completion/order rates, thus failing to satisfy the second performance objective. This is alleviated by utilizing an average measure of current market demand.
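The zero steady-state error property is easy to check numerically. Below is a crude explicit-Euler integration of the continuous-time IOBPCS loop (first order demand smoothing, fractional inventory correction 1/Ti, first order lead time lag, no integral action); the parameter values are arbitrary illustrative choices, not recommendations from the literature:

```python
def simulate_iobpcs(Ta=4.0, Ti=6.0, Tp=4.0, dinv=100.0, dt=0.1, T=400.0):
    """Return the final inventory after a step in demand at t = 0."""
    avcon, comrate, ainv = 0.0, 0.0, dinv   # at rest, on target, zero demand
    cons = 10.0                             # step in consumption at t = 0
    for _ in range(int(T / dt)):
        orate = avcon + (dinv - ainv) / Ti      # inventory policy (proportional)
        d_avcon = (cons - avcon) / Ta           # demand policy: 1st order lag
        d_comrate = (orate - comrate) / Tp      # lead time: 1st order lag
        d_ainv = comrate - cons                 # inventory balance
        avcon += dt * d_avcon
        comrate += dt * d_comrate
        ainv += dt * d_ainv
    return ainv
```

At steady state AVCON equals CONS, so the ORATE balance forces AINV back to DINV exactly with no integrator in the loop; dropping the feed-forward term instead (ORATE driven by the inventory deficit alone) would leave a permanent offset of Ti·CONS.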

The inventory policy defines the rate at which inventory deficits are recovered by manipulating the ORATE. The inventory policy should take into account the dynamics of the system and mainly the lead time. A decision made at a given time instance on the ORATE will result in an actual modification of the inventory level only after a time period has passed which is equal to the lead time. An inventory policy that aims at recovering all the inventory deficit in a single time period will result in a significant excess WIP on the shop floor and eventually in an oscillatory behavior, as far as both the completion rate (COMRATE) and the inventory level are concerned. The consequences of such a dynamic behavior are: higher handling/production costs (since ORATE is not smooth), higher inventory costs (when there is a surplus of inventory) and poor customer service (when actual inventory is below the target value). Moreover, a higher capacity is required in both the production and storage facilities. Therefore, only a fraction of the inventory discrepancy should be recovered by the inventory policy.

The pipeline policy is a correction mechanism which uses information that is not included in the AINV. In essence, the WIP signal cancels out the inventory signal and increases the contribution of AVCON in reaching a steady state. The WIP deficit is formulated by subtracting the actual WIP signal from the desired WIP level, which is produced based on the average measure of current market demand. The pipeline policy aims at the reduction of this discrepancy and is the third element (along with the demand policy and the inventory policy) that is used in the construction of ORATE. Compared to the inventory policy, the pipeline policy identifies faster the need to increase or reduce ORATE, especially when sudden changes are observed in the market demand. In general, the inclusion of the WIP control loop reduces the rise time and increases the percentage overshoot of ORATE, but increases the time required to reach the steady state.

A number of models belonging to the IOBPCS family are presented in Table 1, along with the respective definitions of the four components that the designer can manipulate. When a first order lag is used as the demand smoothing policy, the link between the tuning parameter Ta in the s-domain and the parameter a in the z-domain can be approximated as follows:

a = 1/(1 + (Ta/Tm)),    (3)

where Tm is the sampling interval as indicated above.

The IOBPCS model was the first to be studied extensively in its continuous-time format by Towill [21]. From Table 1,

we can observe that the WIP feedback loop is not considered in the IOBPCS model. Moreover, the target inventory level is fixed, as it is not influenced by modifications/fluctuations in customer demands. The completed production (lead time) is a delayed version of the ORATE and is modeled by a first order lag (time constant Tp), while the demand-averaging process (demand policy) is also represented by a first order lag (time constant Ta). The ordered production (inventory policy) is computed as the summation of the average consumption and a fraction (1/Ti) of the inventory deficit. The transfer functions between the variables COMRATE, AINV and the disturbance CONS are given below:

AINV/CONS = −Ti·[TaTp·s^2 + (Ta + Tp)s] / [(Ta·s + 1)(TiTp·s^2 + Ti·s + 1)],    (4)

COMRATE/CONS = [(Ta + Ti)s + 1] / [(Ta·s + 1)(TiTp·s^2 + Ti·s + 1)].    (5)

The characteristic equation is common for the two transfer functions. It is a third order polynomial, defined as the product of a first order term and a quadratic term. Since all coefficients of both terms are positive, the transfer functions are stable for all positive choices of the tuning parameters. It is important to note that the tuning parameter of the feed-forward component, Ta, is not involved in the quadratic term and thus it does not affect the generation of any oscillatory behavior. Regarding the first performance objective, selection of the parameter Ti in the range [Tp, 2Tp] leads to well designed second order systems, since this corresponds to damping ratios of 0.5–0.707. Based on the calculation of the noise bandwidth of transfer function (5) under the assumption that the disturbance signal is white noise, it was found that both the feed-forward and feedback signals contribute to the attenuation of consumption rate fluctuations as the values of the tuning parameters Ta, Ti increase. However, higher values of Ti are more effective in disturbance attenuation. In order to compromise between inventory recovery and random disturbance rejection, transformation of the transfer function (5) to a third order coefficient plane model proved extremely useful. Based on this analysis, it was found that both objectives are met successfully by selecting the time-to-adjust inventory and the demand averaging time of comparable magnitude to the production delay time.
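The damping-ratio figures quoted above follow directly from matching the quadratic factor TiTp·s^2 + Ti·s + 1 against the standard second order form τ^2·s^2 + 2ζτ·s + 1, which gives τ = √(TiTp) and ζ = (1/2)√(Ti/Tp); a two-line check:

```python
import math

def iobpcs_damping_ratio(Ti, Tp):
    # Ti*Tp*s^2 + Ti*s + 1  vs  tau^2*s^2 + 2*zeta*tau*s + 1:
    # tau = sqrt(Ti*Tp) and 2*zeta*tau = Ti  =>  zeta = 0.5*sqrt(Ti/Tp)
    return 0.5 * math.sqrt(Ti / Tp)
```

Ti = Tp gives ζ = 0.5 and Ti = 2Tp gives ζ ≈ 0.707, exactly the recommended range [Tp, 2Tp].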

The IOBPCS was also studied by Agrell and Wikner [31], using a multi-criteria decision-making (MCDM) approach. They used the model in a case study, where the proposed generic MCDM design method for dynamical systems was applied by considering the response speed and response smoothness as different objectives to be optimized. More specifically, three criteria were utilized, namely the rise time of COMRATE, the overshoot of COMRATE and the undershoot of the inventory level (AINV) following a step change in the market demand (CONS). Several sets of the two tuning parameters Ta, Ti were obtained, according to the importance that is given to the three conflicting criteria.

The only difference of the variable inventory and order based production control system (VIOBPCS) compared to the IOBPCS model is that instead of using a fixed desired inventory level, DINV is set as a multiple k of the current average sales rate (AVCON). This way the target inventory stock is reduced in a falling market and, conversely, increased in a rising market. The two models were compared using frequency analysis tools [32]. It was found that for a reasonable choice of the tuning parameters, the IOBPCS model shows lower ARs and larger phase shifts. These observations in the frequency domain translate to lower production capacity requirements and lower risks of stock-outs, but at the same time slower responses of the IOBPCS model compared to the VIOBPCS model.

The automatic pipeline inventory and order based production control system (APIOBPCS) model [33] uses a constant inventory level set point and utilizes all three control policies (demand, inventory and pipeline policies) to determine ORATE. Compared to the IOBPCS model, the addition of the WIP controller allows us to decouple the damping ratio from the natural frequency. This in turn leads to more successful results, including a more effective filtering of the high frequency noise present in customer demand signals, although at the expense of a slight increase in COMRATE. In the APIOBPCS model, the designer has to choose values for three tuning parameters, namely Ta, Ti and Tw. Table 1 indicates that the target WIP level is formulated by multiplying the average sales with Tp, which is the time

H. Sarimveis et al. / Computers & Operations Research 35 (2008) 3530–3561 3537

constant associated with the first order time lag representing the lead time. Pure time delays in the WIP feedback loop are usually due to inaccuracies in the recording of WIP on the real world shop floor. Disney et al. [34] selected the tuning parameters so that robustness to production lead times and pipeline level information fidelity, as well as system selectivity, are achieved. More specifically, a single objective function was formulated describing the performance of the system in terms of the aforementioned criteria for various lead times, orders of approximation of the production delays and time delays in the WIP feedback loop. The optimization problem was solved using a standard genetic algorithm and led to values for the tuning parameters similar to those derived by more conventional techniques [33]. The procedure was repeated several times, giving more emphasis to inventory recovery as opposed to capacity costs and vice versa. The results showed that when attenuation of demand fluctuations becomes more important, larger values of the Ti and Tw tuning parameters are obtained, meaning that the inventory and, most importantly, the WIP information become negligible. The opposite happens when more emphasis is given to the inventory recovery objective.

Disney and Towill [29] studied a special case of the discrete-time APIOBPCS model, where Ti is set equal to Tw. It was named the DE-APIOBPCS model after Deziel and Eilon, who first studied this case [35]. This particular choice of the tuning parameters simplifies the model considerably, since the lead time cancels out of the ORATE/CONS transfer function. Moreover, the system is guaranteed to be stable and is robust to a number of nonlinear effects. It was found that the bullwhip effect can be reduced by increasing the average age of the forecast and by reducing lead times.

Riddalls and Bennett presented a modified version of the APIOBPCS model in the continuous-time domain [36], where instead of the standard exponential smoothing forecasting mechanism, a moving average forecasting approach was employed. An infinite order model (pure delay) was used to represent the lead time component. The authors obtained stability criteria for the APIOBPCS model by recasting it into a Smith predictor and using the theorem of Bellman and Cooke [37]. They showed that if the ratio Ti/Tw is greater than 0.5, the model is stable independently of the delay. In the opposite case, stability is achieved only if the lead time is below an upper bound defined as a function of Ti and Tw. The stability boundary was later corrected by Warburton et al. [38], who verified their results using a second order Padé approximation.

Based on the APIOBPCS model, Zhou et al. presented a hybrid system containing both manufacturing and remanufacturing [39]. Remanufacturing of products has received a lot of academic and industrial interest in recent years, because of the considerable savings it offers to companies and the significant environmental benefits that help in meeting strict environmental legislation. Zhou et al. [39] studied the effect of including a remanufacturing process in the dynamics of the system. The manufacturing process was modeled by a typical continuous-time APIOBPCS model. In particular, the lead time was modeled by a first order delay and standard exponential smoothing was used as the forecasting mechanism. In the remanufacturing loop, a kanban policy [40] was employed to represent a pull system. The kanban policy is designed specifically to replenish inventory in just-in-time manufacturing. It was shown that the kanban policy can be modeled as an inventory based production control system (IBPCS) (see Table 1). It was assumed that the products produced by the remanufacturing process are as good as new, so that a common finished goods inventory is used to store the production of both processes. The common inventory level triggers the manufacturing and remanufacturing processes with a predefined routing probability, which for the remanufacturing process is equal to the return yield. The performance of the system was analyzed in terms of a step change in the customer demand with respect to all the tuning parameters, which now include those of the remanufacturing loop. It was found that good settings determined in previous studies for the single loop APIOBPCS provide satisfactory transient responses. The most interesting result, however, was that by including the remanufacturing process, we can achieve faster responses to market demands and lower risks of stockout and over-ordering. The hybrid system was proven robust to changes in the return yield and the lead times of the two processes. However, the performance of the system was investigated only with respect to the first objective stated at the beginning of this section. It was not tested with inputs other than step changes, such as random inputs, which are closer to real world applications.

Dejonckheere et al. studied the AR of the ORATE/CONS transfer function in single stage discrete-time systems [41]. They showed that when order-up-to replenishment policies are used, the AR is high at all frequencies regardless of the demand forecasting mechanism, and this leads to the generation of the bullwhip effect. More specifically, in any order-up-to policy, the ordering decision is as follows:

ORATE(t) = S(t) − (AINV(t) + AWIP(t)), if AINV(t) + AWIP(t) < S(t),
ORATE(t) = 0, if AINV(t) + AWIP(t) ≥ S(t), (6)


where S(t) is the time varying order-up-to level and the summation in the parenthesis (AINV plus actual WIP) is the so-called inventory position. The various order-up-to policies differ in the way the order-up-to level is updated with time. When this level is estimated using an exponential smoothing demand forecast, the AR of the ORATE/CONS transfer function is greater than 1 at all frequencies. This means that any demand pattern will be amplified. The bullwhip effect increases as the parameter Ta decreases. When a moving average demand forecast is used, S(t) is calculated as follows:

S(t) = (1/L) ∑_{i=0}^{L−1} CONS(t − i). (7)

For this type of forecasting mechanism, the AR plot has a sinusoidal shape. For some frequencies the AR is below 1, while for others it is greater than 1. Obviously, the bullwhip generated by this type of forecasting mechanism is much smaller than the one generated when exponential forecasts are used. Finally, a “demand signal processing” policy [42] was studied, where the order-up-to level is updated as follows:

S(t) = S(t − 1) + θ(CONS(t) − CONS(t − 1)), (8)

where θ is a constant updating gain.

The frequency response plot for this policy showed that the AR is greater than 1 at all frequencies and increases proportionally with frequency, meaning that high frequency noise signals are highly amplified. It was shown that the order-up-to policy is a special case of the automatic pipeline variable inventory and order based production control system (APVIOBPCS) model, where the tuning parameters are set equal to 1. With proper tuning of the two parameters involved in the APVIOBPCS model, a desired frequency response plot can be obtained, where the AR is greater than 1 only at small frequencies. For the rest of the frequency spectrum, which represents undesired noisy signals, the AR is smaller than one, thus decreasing the bullwhip effect.
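The bullwhip behavior of the order-up-to rule (6) under an exponential smoothing forecast is easy to reproduce by simulation. The following Python sketch is our own minimal construction, not the model of [41]: the smoothing factor, safety multiplier and demand distribution are arbitrary illustrative assumptions, and lead times are omitted. The variance of the orders nevertheless exceeds the variance of the demand, which is the bullwhip effect discussed above.

```python
import random

def simulate_order_up_to(periods=2000, alpha=0.3, safety=2.0, seed=1):
    """Single-stage order-up-to rule of Eq. (6) with an exponentially
    smoothed demand forecast.  All parameter values are illustrative."""
    rng = random.Random(seed)
    avcon = 10.0              # smoothed demand estimate
    position = 30.0           # inventory position = AINV + AWIP
    demands, orders = [], []
    for _ in range(periods):
        cons = rng.gauss(10.0, 1.0)        # stationary customer demand
        avcon += alpha * (cons - avcon)    # exponential smoothing forecast
        s = (1.0 + safety) * avcon         # time-varying order-up-to level S(t)
        orate = max(s - position, 0.0)     # Eq. (6)
        position += orate - cons           # orders raise the position, demand lowers it
        demands.append(cons)
        orders.append(orate)
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    return var(orders) / var(demands)

print(simulate_order_up_to())   # ratio above 1: orders are more volatile than demand
```

The ratio of order variance to demand variance is the usual bullwhip metric; with these illustrative settings it comes out well above 1, and it grows as the smoothing factor increases, consistent with the observation that the bullwhip effect increases as Ta decreases.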

The results of the above paper were extended by Dejonckheere et al. [43] to the case of a centralized supply chain, where customer demand data are shared throughout the chain. It was found that for order-up-to replenishment rules, the bullwhip effect is reduced, but not completely eliminated. This finding agrees with the results reported by Chen et al. [30]. On the contrary, the APVIOBPCS model with a proper selection of the design parameters is able to reduce the variance of demand and have a smoothing or dampening impact.

Lalwani et al. [44] presented discrete-time state space representations (matrices A, B, C, D) for several models in the IOBPCS family. This allows the analysis of IOBPCS models and the development of control strategies using the advanced control methods that are presented in subsequent sections. The state space models were derived by first transforming the discrete-time transfer functions into the “control canonical form” [45], which is a block diagram involving z only as the delay operator z^−1. Particular emphasis was given to the state space representation of the APVIOBPCS model, which was checked for stability based on the eigenvalues of the A matrix. The results matched those obtained by applying the Routh criterion after the Tustin transformation was applied to map the z-domain into the w-domain [46]. The state space model also passed the controllability and observability tests. This is, however, expected, since the model was derived from the discrete transfer function, which contains only the controllable and observable part of the system.

Several other modifications of the components constituting the IOBPCS family of models have been proposed by various researchers. White [47] showed that a more sophisticated inventory control policy, such as a proportional-integral-derivative (PID) inventory controller, can reduce stock levels by 80% and hence reduce cost. It is important to note that if the PID approach is adopted, the feedforward forecasting unit is no longer necessary to eliminate the discrepancy between DINV and AINV. This is now accomplished by the integral action offered by the I element of the controller. However, the PID approach has not received much attention in the literature, probably because it does not correspond to what is actually performed in real production–inventory systems, where the forecast is present explicitly. Moreover, the addition of the integral and derivative elements complicates the tuning effort, since two more tuning parameters are included in the model. A PID approach was also presented in Wikner et al. [24].

Both the APIOBPCS and APVIOBPCS models yield successful results, based on the assumption that the lead time is estimated accurately. Otherwise, zero inventory offset cannot be achieved. However, this assumption is unrealistic in many situations. There are several sources of uncertainty involved in lead time estimation, especially when the model describes the dynamics of a manufacturing site [48]. Examples are: lack of raw materials, inconsistencies in the human decision-making process, variations in shop-floor lead time due to the large number of products flowing through the shop floor, etc. In order to remove this assumption from the APIOBPCS model, an additional feedback loop was proposed in the continuous-time format of the model [48,49]. The additional lead-time loop is nonlinear and


time-varying and is used to provide updated estimates of the current lead time, which in turn update the desired level of WIP, DWIP. It was shown that significant advantage is gained by adapting the system to lead-time changes. Moreover, it was shown that by including an integral element in the inventory policy, we can avoid long-term stock drifts and improve the system performance during a lead-time increase.

As far as the demand policy is concerned, apart from the standard exponential smoothing presented in Table 1, several different approaches have been proposed. Dejonckheere et al. [50] investigated the utilization of a linear (Type II) or quadratic (Type III) instead of a constant (Type I) exponential smoothing forecasting mechanism in the continuous-time APIOBPCS model. The transfer functions between AVCON and CONS for the two alternative forecasting mechanisms are the following:

Type II forecasting mechanism:

Ga(s) = (2Ta s + 1) / (Ta^2 s^2 + 2Ta s + 1). (9)

Type III forecasting mechanism:

Ga(s) = (3Ta^2 s^2 + 3Ta s + 1) / (Ta^3 s^3 + 3Ta^2 s^2 + 3Ta s + 1). (10)

The two mechanisms are able to produce zero steady-state offsets in AVCON for ramp and parabolic changes, respectively, in the input variable CONS. However, as we move to higher order models, we observe more oscillatory transient responses when the same value of the parameter Ta is used. The demand amplification problem (“bullwhip” effect) can be resolved by adjusting the value of the parameter Ta downwards as we move to higher order systems. Calculation of the COMRATE/CONS transfer functions corresponding to the three different forecasting techniques shows that no significant benefits are obtained by using more sophisticated forecasting methods. All three configurations track step and ramp input changes adequately and exhibit a constant error when a parabolic change is given in the input variable CONS. The only benefit of using a higher order forecasting mechanism is that by a careful selection of the parameter Ta the production adaptation and inventory costs are slightly reduced.
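The tracking properties of Eqs. (9) and (10) can be verified numerically. The sketch below simulates the three smoothing mechanisms (Type I is the standard first order filter 1/(Ta s + 1)) with a crude forward-Euler integration of the controllable canonical form; the value Ta = 4 and the discretization step are arbitrary choices made for illustration only.

```python
def lti_response(num, den, u, t_end=80.0, dt=0.002):
    """Simulate the SISO transfer function num(s)/den(s) driven by input u(t)
    using the controllable canonical form and forward-Euler integration.
    A crude numerical sketch, good enough for steady-state checks."""
    n = len(den) - 1
    d = [c / den[0] for c in den[1:]]               # s^(n-1) ... s^0 coefficients
    b = [0.0] * (n - len(num)) + [c / den[0] for c in num]
    x = [0.0] * n                                   # phase-variable states
    for k in range(int(t_end / dt)):
        t = k * dt
        xdot = x[1:] + [u(t) - sum(di * xi for di, xi in zip(reversed(d), x))]
        x = [xi + dt * xd for xi, xd in zip(x, xdot)]
    return sum(bi * xi for bi, xi in zip(reversed(b), x))   # output y(t_end)

Ta = 4.0  # demand averaging time, an arbitrary illustrative value
type1 = ([1.0], [Ta, 1.0])                                    # first order filter
type2 = ([2 * Ta, 1.0], [Ta**2, 2 * Ta, 1.0])                 # Eq. (9)
type3 = ([3 * Ta**2, 3 * Ta, 1.0],
         [Ta**3, 3 * Ta**2, 3 * Ta, 1.0])                     # Eq. (10)

for name, (num, den) in [("I", type1), ("II", type2), ("III", type3)]:
    y = lti_response(num, den, lambda t: 1.0)
    print("Type", name, "step response settles at", round(y, 3))  # 1.0 for all

# Ramp input: Type I lags the ramp by Ta; Type II and III track it exactly.
print("Type I  ramp lag:", round(80.0 - lti_response(*type1, lambda t: t), 2))
print("Type II ramp lag:", round(80.0 - lti_response(*type2, lambda t: t), 2))
```

All three mechanisms have unit DC gain, so they settle at 1 after a step in CONS; for a ramp input, the Type I output lags the input by Ta while Types II and III show zero steady-state ramp error, which is the tracking property stated above.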

Grubbström and Wikner [51] studied traditional inventory replenishment systems in terms of control theory. In these systems, inventory is replenished in batches after a certain lead time, when the stock level reaches or falls below a trigger level, which is the reorder point. It was shown that inventory trigger control policies can be mathematically described by difference or differential equations involving Heaviside and Dirac impulse functions, which are able to reproduce the typical sawtooth inventory pattern.

Based on the funnel model and the theory of the logistic operating curve, Wiendahl and Breithaupt [52] developed a continuous-time model for a single working center, which contains four input and four output variables. However, since there are dependencies among the output variables, which are linked through the funnel formula, only two control loops are required to control the system. More specifically, the first controller adjusts the capacity of the work system to reduce the backlog to zero as fast as possible. In the second control loop the target WIP is the reference value, while the input rate of the system is the respective manipulated variable.

A quite different approach from the IOBPCS family of models has been developed by Grubbström’s group for deciding on the production schedule in a single working center. In contrast to the previously mentioned techniques, this approach explicitly takes into account costs and/or revenues. More specifically, the following problem was posed:

Determine the optimal sequence of production quantities over a finite horizon, with respect to the number of batches, the batch sizes and their timings, assuming that:

(a) Production takes place in batches of possibly different sizes.
(b) External demand is a stochastic process where stochastic events are separated by stochastic time intervals with a given probability function.
(c) The production lead times are deterministic.

The problem was solved for one-level systems [53] (i.e. assembly of products in the working center does not require other products from the same center), under the assumption that demand follows a Poisson process, which is the simplest possible stochastic process. The objective function to be maximized was the annuity stream, which is a variation


of the net present value (NPV). More precisely, it is the constant stream of payments corresponding to a given NPV determined from the cash flow within the finite horizon. The cash flow in turn is made up of the in-payments for sold units and the out-payments for set-up costs and variable production costs. Laplace transforms were found extremely useful in solving the problem, since they were used to model the dynamics of the system, capture the stochastic properties by serving as generating functions and assess the resulting cash flows when adopting the NPV principle. In a previous publication [54] the system was optimized with respect to a different objective function, consisting of the set-up cost, the inventory holding cost and the backlog cost.

This approach, together with input–output analysis, is suitable for describing multi-level, multi-stage (MLMS) production–inventory systems. An extra degree of complexity is introduced in those systems because there is often a high degree of commonality of components and materials between products at different stages. Thus, every external order generates internal orders that have to be accounted for. Input–output analysis is a technique used to describe in matrix form the multi-item production case with a linear or proportional dependence [55]. Preliminary results using simple ordering policies, such as fixed order quantities, are presented by Grubbström and Ovrin [56]. More recently, publications from the same group presented results for MLMS capacity constrained systems with zero lead times and stochastic demands [57] or non-zero lead times and deterministic demands [58]. Dynamic programming was adopted as the solution procedure in both cases. An extensive overview of publications focusing on MLMS systems using input–output analysis and Laplace transforms can be found in the paper of Grubbström and Tang [59].

Popplewell and Bonney [60] used discrete-time linear control theory for the analysis and simulation of MLMS systems. They considered each level and stage as a different element whose dynamics are represented by a z-transfer function. Inputs and outputs to each element were considered as time-series signals, also represented by z-transforms. Two different ordering policies were considered: the re-order cycle system, where the only input to each element is the demand arising from production schedules at the next level, and a material requirements planning (MRP)-type system, where in addition each element receives the external demands as input. The models were checked for stability and were used to provide transient responses and responses to random noise.

Wikner [61] presented a methodology that introduces the structure dependencies of MLMS systems into the IOBPCS production control framework. The methodology uses a matrix representation to account for multiple informational channels. It was shown that for a single-level single-stage system, the model reduces to the standard IOBPCS format. The extended model has the capability to describe the dynamics of both pull-driven (base stock, kanban) and push-driven (MRP) policies.

Burns and Sivazlian [62] considered a multi-echelon supply chain, where each echelon uses a typical discrete-time decision rule for placing orders, consisting of a replenishment term and an inventory adjustment term. The first term equals the orders received during the same time period, while the second term removes a fraction of the gap between the desired and the actual inventory. The inventory adjustment term involves a forecasting mechanism that exponentially smooths the demand from the previous supply point. The order received by the first supply point in the chain constitutes the demand imposed upon the total system. Using z-transforms, the transfer functions between the discrete-time signals representing the orders placed by one echelon and the orders received by the same echelon were derived. The dynamic response of the system was tested by introducing unit step changes and uniformly distributed random numbers to the external demand. The results showed that even in a two-echelon system, minor variations in the consumer demand are amplified into major disturbances at the last supply point. The disturbances become more severe as the number of echelons that constitute the system increases. The amplification is due to the unavoidable inventory adjustment, but also to an unwanted “false order” effect. The latter arises because the adjustment is based solely on the order received from the next lower level, which in turn contains the adjustments from all lower levels. Based on the discrete-time transfer function, a recovery operator was proposed, so that in each echelon the original input received by the first supply point is recovered. The application of this recovery operator finally led to the derivation of a new decision rule that suppresses the “false order” effect and experiences fewer stock-outs, lower average inventories and smoother amplification throughout the supply chain.
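The qualitative amplification described above (a replenishment term plus a fractional inventory adjustment at each echelon) can be reproduced with a toy simulation. The sketch below is our own simplified construction, not the model of [62]: lead times are omitted, replenishment is instantaneous, and the inventory target multiplier, smoothing factor and adjustment fraction are arbitrary illustrative values.

```python
import random

def simulate_chain(echelons=3, periods=4000, alpha=0.3, beta=0.4, seed=7):
    """Toy multi-echelon ordering rule: each echelon orders what it received
    (replenishment term) plus a fraction beta of the gap between a
    forecast-based inventory target and its stock (adjustment term).
    All parameter values are illustrative."""
    rng = random.Random(seed)
    inv = [50.0] * echelons
    avg = [10.0] * echelons               # exponentially smoothed demand
    orders = [[] for _ in range(echelons + 1)]
    for _ in range(periods):
        incoming = rng.gauss(10.0, 1.0)   # end-customer demand
        orders[0].append(incoming)
        for i in range(echelons):
            avg[i] += alpha * (incoming - avg[i])
            target = 5.0 * avg[i]                          # desired inventory
            placed = max(incoming + beta * (target - inv[i]), 0.0)
            inv[i] += placed - incoming                    # instant replenishment
            orders[i + 1].append(placed)
            incoming = placed                              # demand seen upstream
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    base = var(orders[0])
    return [var(o) / base for o in orders[1:]]

print(simulate_chain())   # variance ratios grow along the chain
```

Even without lead times, the forecast-driven adjustment term makes each echelon's orders more variable than the orders it receives, and the variance ratio compounds from echelon to echelon, which is the amplification reported in the study.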

Wikner et al. [63] considered a three-echelon simplified Forrester production system, which was transformed into a block diagram representation in order to test methods for improving total dynamic performance. Five alternative strategies for smoothing the supply chain dynamics were tested in terms of the response of the three echelons to a step input in the market demand: tuning the existing echelon rules, reducing time delays, removing an echelon from the supply chain, changing individual echelon rules by taking into account pipeline information, and making true market demand available to all the echelons in the supply chain. All five strategies, which can be used in combinations of two or


more, improved the dynamic response of the system. The most effective, however, was the last strategy, thus illustrating the importance of information flow.

Disney and Towill [64] studied a simple vendor managed inventory (VMI) supply chain consisting of one production unit and one distributor. In VMI systems all supply points in the chain have access to stock positions for setting production and distribution targets. The discrete-time APIOBPCS model was used to describe the dynamics of the manufacturing unit. A pure delay was initially utilized to model the production delay. The only difference from the APIOBPCS structure presented previously is that instead of the demand signal CONS, the manufacturing facility receives a “virtual” consumption signal. This is produced by adding, in each time period, the demand signal received by the distributor to the difference between the reorder points of the current and the previous time period. The reorder point is time varying and is defined as a multiple of the average (exponentially smoothed) demand received by the distributor. The system was checked for stability. The stability criteria that were produced are also valid for the standard APIOBPCS model, since the distributor’s policy described previously is a stable feed-forward element. It was found that when Ti = Tw the stability of the system is guaranteed. If the equality does not hold, stability criteria were obtained, which are expressed as inequalities involving the tuning parameters Ti, Tw. Another important result is that when Ti = Tw the system is robust to changes in the distribution of the production delay. However, setting Ti = Tw leads to a conservative design.

An integrated continuous-time approach for modeling the dynamics of supply chain management systems was presented by Perea et al. [65,66]. The framework they presented considers the entire supply chain, including manufacturing sites with single-unit multi-product processes, a multi-product multi-stage distribution network and the end customers. The dynamic behavior of the supply chain is captured by modeling the flows of materials and information separately. The delivery rate between nodes is modeled so that the amplification (“bullwhip”) effect can be reproduced. A heuristic production policy is chosen, according to which, each time a decision has to be made on the product to be manufactured, the product with the highest backlog is selected. The plant continues making this product until the batch is completed. The production policy in general is not optimal, but ensures that the system is stable. The orders placed between nodes are considered as manipulated variables and the ordering policies as control laws. Four different ordering policies were analyzed with respect to their influence on the dynamic behavior of supply chain systems. The first (base) policy is the standard policy where the set points for the inventory levels are fixed. According to the second policy, the ordering rate of each node to its upstream product node is proportional to the total amount of orders received by the node. With the third policy, the ordering rate is proportional to the difference between the inventory level and the total existing backlog at the node. Finally, the fourth policy augments the ordering rate of the third policy by a term which accounts for providing all nodes with information about the actual orders from the customers. The four policies were tested in case studies where step changes or periodic changes were introduced in the customer demands. The policies were compared in terms of quantitative indices describing the total operational (storage and production) cost and the customer satisfaction level. The “bullwhip” effects of the four policies were also examined. The results show that policies 1 and 4 offer a higher customer satisfaction level at an extra storage cost. The base policy proved more successful as far as the dampening of amplification is concerned. Finally, there was agreement with previous studies on the improvement of the performance of the system when information about end customers’ demand is available throughout the entire supply chain.

Lin et al. [67] presented a discrete-time model of a supply chain system, using z-transforms to obtain the transfer functions for each unit. The supply chain is assumed to have no branches, so that each logistic echelon has only one upstream node and one downstream node. The dynamics of each particular node in the chain was modeled in detail in block diagram form. The bullwhip effect was studied by obtaining the transfer function between the order signal placed to the upstream node and the signal representing the customer demand. By using a proportional controller on the difference between the desired set point and the actual inventory position, the authors found that the bullwhip effect can be avoided when the gain is less than 1. When the inventory position target is not fixed, but changes according to a demand forecast based on an exponential filter, an even smaller gain is required to suppress the bullwhip effect. For this case (variable inventory position target), and assuming stochastic demand from downstream orders, three ordering policies were examined: P and PI controllers acting on the difference between the set point and the actual inventory position, and a cascade PI controller whose input is the filtered trend of the same difference. The performance of the alternative ordering policies was evaluated with respect to the transient response, the bullwhip effect, the back-order level and the excess inventory caused when stochastic changes are introduced in the customer demand. It was found that the P controller was not able to drive the system to the desired set point. The two PI controllers improved all


the performance criteria. In particular, the cascade PI structure was found superior in meeting customer demand and suppressing the bullwhip effect, but as far as the two remaining performance criteria are concerned it was slightly inferior to the standard PI scheme.
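The steady-state offset of the proportional policy is easy to reproduce in a toy discrete-time simulation. The sketch below is our own minimal construction, not the model of [67]: there is no lead time, demand is constant, and the gains and initial conditions are arbitrary illustrative values. It shows why a P ordering rule settles away from the inventory set point while adding integral action removes the offset.

```python
def run(controller, periods=300, demand=4.0, target=20.0):
    """Single stock node with constant demand; the order rate is set by
    `controller` from the inventory error and its running sum.
    A toy sketch: no lead time, illustrative numbers throughout."""
    inv, acc = 10.0, 0.0
    for _ in range(periods):
        error = target - inv
        acc += error                      # running sum (integral) of the error
        inv += controller(error, acc) - demand
    return inv

kp, ki = 0.5, 0.05
p_final = run(lambda e, s: kp * e)            # proportional only
pi_final = run(lambda e, s: kp * e + ki * s)  # proportional-integral
print("P :", round(p_final, 2))   # settles at 12: offset of 8 below the target
print("PI:", round(pi_final, 2))  # settles at 20: integral action removes the offset
```

The P loop settles exactly where the proportional order equals the demand (kp·error = demand, i.e. error = demand/kp = 8), which is the steady-state offset reported for the P controller; the integral term keeps accumulating until the error is zero.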

3. Dynamic programming and optimal control

Due to their dynamic and uncertain nature, production/inventory problems can be naturally formulated as dynamic programs. Dynamic programming is the standard procedure for obtaining an optimal state feedback control law for stochastic optimal control problems.

For the sake of completeness, we briefly describe the dynamic programming philosophy in the discrete-time, finite-horizon setting [9]. In the continuous-time framework the philosophy is basically the same, although more technicalities are involved. Consider the discrete-time dynamic system:

x(t + 1) = f(x(t), u(t), d(t)), t = 0, 1, …, T − 1. (11)

The state x(t) is constrained to lie in a set X ⊆ R^n, while the control (also called the vector of manipulated variables) u(t) must belong to U ⊆ R^m. The exogenous disturbance d(t) is a random vector characterized by a probability distribution with support D ⊆ R^p that may depend explicitly on x(t) and u(t), but not on prior disturbances. Furthermore, consider a function φ : X × U × D → R representing the one-stage cost. An admissible state feedback control law (or, equivalently, control policy or decision rule) π is a sequence π = {k0, …, kT−1}, where the kt are vector functions mapping states x(t) into controls u(t) = kt(x(t)) and are such that kt(x(t)) ∈ U. Finally, we denote by Π(x) := {π : x(0) = x, kt(x(t)) ∈ U, f(x(t), kt(x(t)), d(t)) ∈ X, t = 0, …, T − 1} the set of all admissible policies. The finite-horizon cost associated with an admissible control policy π starting from a given initial state x(0) is

Vπ(x(0)) = E[ ∑_{t=0}^{T−1} φ(x(t), kt(x(t)), d(t)) ]. (12)

Our goal is to find an optimal control policy π∗, i.e. a policy that minimizes the finite-horizon cost (we make the simplifying assumption that such a policy exists, so that we can use min instead of inf in the following equations):

Vπ∗(x(0)) = min_{π ∈ Π(x(0))} Vπ(x(0)). (13)

Notice that the control policies we are interested in are closed-loop policies, in the sense that they determine values for the manipulated variables once the state of the system becomes known at each time period. This is the main difference of feedback policies resulting from closed-loop optimization, as opposed to open-loop policies, which determine values (and not functions of the state) for the manipulated variables over the time horizon. Closed-loop policies can take advantage of the extra information revealed in each time period and thus they lead to lower costs than open-loop policies.

Dynamic programming is based on the principle of optimality to solve the optimal control problem. The principle of optimality simply states that if a policy π∗ = {k∗0, …, k∗T−1} is optimal for the optimal control problem over the interval t = 0, …, T − 1, then it is necessarily optimal over the subinterval t = τ, …, T − 1 for any τ ∈ {0, 1, …, T − 1}. Dynamic programming uses this concept to formulate the problem as a recurrence relation. Thus, the dynamic programming algorithm decomposes the optimal control problem by solving the associated sub-problems starting from the last time period and proceeding backwards in time. Mathematically, the algorithm is described by the following equations:

V_T(x(T)) = 0,

V_t(x(t)) = min_{u(t)∈U_t(x(t))} E_{d_t}[φ(x(t), u(t), d(t)) + V_{t+1}(f(x(t), u(t), d(t)))], t = 0, 1, ..., T − 1.  (14)

Hence, applying the dynamic programming algorithm we get the optimal cost V_0(x(0)). Furthermore, if u∗(t) = k∗_t(x(t)) minimizes the right-hand side of (14) for each x(t) and t, the policy π∗ = {k∗_0, ..., k∗_{T−1}} is optimal. In this case, in each iteration the dynamic programming algorithm gives the optimal cost-to-go for every possible state, which we denote by V∗_t.

H. Sarimveis et al. / Computers & Operations Research 35 (2008) 3530–3561 3543

The concept of dynamic programming has been prevalent in formulating, analyzing and solving the very first problem from inventory theory, which is the basic building block of supply chain models. We will now present the simplest application of dynamic programming in inventory management [68]. We consider a single-echelon, single-product system where the problem is to optimally select orders u(t) of the product in order to meet uncertain demand d(t), while minimizing the total expected purchasing, inventory and shortage cost. The dynamics of the system are described by the following state space equation:

x(t + 1) = x(t) + u(t) − d(t), (15)

where x(t) is the inventory level at the beginning of the tth period. It is also the state variable of the dynamic system, since it summarizes the sufficient information we need to make a decision. The order u(t) is the control or manipulated variable and the demand d(t) is the exogenous disturbance. Furthermore, we assume that disturbances between stages are independent random variables and that excess demand is backlogged and filled as soon as additional inventory becomes available. The one-stage cost function is

φ(x, u, d) = cu + h(x + u − d)^+ + p(x + u − d)^−,  (16)

where c, h and p are the purchasing, holding and shortage unit costs, respectively, z^+ = max(0, z) and z^− = max(0, −z). By making the transformation y(t) = x(t) + u(t) and applying the dynamic programming algorithm, through an inductive argument one can show that the cost-to-go functions are convex and that minimizing scalars y∗(t) := S(t) exist for the unconstrained problem. Thus, an optimal policy is determined by the sequence of scalars {S(0), S(1), ..., S(T − 1)} and has the form:

k∗_t(x(t)) = S(t) − x(t) if x(t) < S(t);  0 if x(t) ≥ S(t).  (17)

This type of policy is often referred to as a base-stock or order-up-to policy (see also Eq. (6)). Clark and Scarf [69] were the first to show that the optimal feedback law for a multi-echelon system is a base-stock policy for each echelon when the demand volumes in different periods are independent and identically distributed. They formulated the problem as a discrete-time finite-horizon optimal control problem and showed that the optimal base-stock levels can be found by solving a series of single-location inventory problems with appropriately adjusted penalty functions. Scarf [70] derived the optimal ordering policy for a single facility facing uncertain demand and with setup costs associated with inventory orders. By formulating the respective dynamic programming equations for the finite-horizon problem and showing that the value function is K-convex, he arrived at the so-called (s, S) policy with s < S, meaning "order up to S whenever the inventory level falls to or below s". Iglehart [71] proved that a stationary (s, S) policy is optimal for the infinite-horizon problem. These results were of great theoretical and practical importance and served as the precursors of modern supply chain theory.
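The backward recursion (14) applied to the model (15)–(16) can be sketched numerically. The discretization, grid bounds and cost parameters below are our own illustrative choices, not values from the cited papers:

```python
import numpy as np

def base_stock_levels(T, c, h, p, demand_pmf, x_lo=-10, x_hi=20):
    """Backward dynamic programming (Eq. (14)) for the single-echelon
    inventory problem (15)-(16) on an integer grid, returning the
    order-up-to levels S(t) of the optimal policy (17)."""
    x_grid = np.arange(x_lo, x_hi + 1)
    demands = np.arange(len(demand_pmf))
    V = np.zeros(len(x_grid))                  # V_T = 0
    S = []
    for t in reversed(range(T)):
        # G(y) = c*y + E[h*(y-d)^+ + p*(d-y)^+] + E[V_{t+1}(y-d)]
        G = np.empty(len(x_grid))
        for k, y in enumerate(x_grid):
            nxt = np.clip(y - demands, x_lo, x_hi) - x_lo   # grid indices
            stage = (h * np.maximum(y - demands, 0)
                     + p * np.maximum(demands - y, 0))
            G[k] = c * y + demand_pmf @ (stage + V[nxt])
        S_t = int(x_grid[np.argmin(G)])        # unconstrained minimizer of G
        S.append(S_t)
        # V_t(x) = G(max(x, S_t)) - c*x : order up to S_t if below it
        y_idx = np.maximum(x_grid, S_t) - x_lo
        V = G[y_idx] - c * x_grid
    S.reverse()
    return S

# Uniform demand on {0,...,4}; shortage much costlier than holding
S = base_stock_levels(T=3, c=1.0, h=1.0, p=9.0, demand_pmf=np.array([0.2] * 5))
```

For these parameters the last-period level S(T−1) matches the classical newsvendor critical-ratio quantile, and earlier periods yield similar order-up-to levels, illustrating the structure of policy (17).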

Following the pioneering work of Clark, Scarf and coworkers, many researchers extended this work by treating more complicated problems which are closer to reality. Hausman and Peterson [72] considered a single-echelon, capacity-constrained multi-product system with terminal demand, where the forecasts for total sales follow a lognormal model. They formulated the problem as a dynamic programming problem and provided heuristic solutions. Federgruen and Zipkin [73] extended the multi-echelon model of Clark and Scarf [69] to the infinite-horizon case. They concluded that the infinite-horizon discounted cost case is simpler to solve than the long-term average cost case, a result that is typical in the dynamic programming literature. They also provided a closed-form solution for a two-echelon system when inventory holding and shortage costs are linear and customer demand follows a normal distribution. However, the computations for systems with more than two echelons become prohibitively complex.

On the other hand, many researchers have recognized the dynamic nature of the demand process in many cases, and thus attempted to incorporate it into production-inventory models, digressing from stationary demand models. Iglehart and Karlin [74] examined a discrete-time, continuous-state formulation of the single location inventory problem, where demand is a function of a discrete-time, finite-state Markov chain. They proved that a state feedback base-stock policy is optimal for the discounted infinite-horizon problem. Song and Zipkin [75] studied the continuous-time, discrete-state, discounted-cost model with fixed costs and, by formulating the problem as a dynamic program with two state variables (inventory position and exogenous information), showed that a state feedback (s, S) policy is optimal. Sethi and Cheng [76] proved the optimality of a state feedback (s, S) policy for a single location discrete-time system with a more general cost structure than that of Song and Zipkin [75], in the case where demand follows a Markov process and fixed costs are present. Beyer and Sethi [77] proved the existence of an optimal state dependent (s, S) policy for the long-run average cost problem. Based on the theory of impulse control, Bensoussan et al. [78] proved that an (s, S) policy is optimal in a continuous-time stochastic inventory model with a fixed ordering cost when demand is a mixture of a diffusion process and a compound Poisson process with exponentially distributed jump sizes, or a mixture of a constant demand and a compound Poisson process. The Bellman equation of dynamic programming for such a problem reduces to a set of quasi-variational inequalities (QVI). Note that all the above papers discuss problems involving a single location.

Dong and Lee [79] used approximations for the induced penalty term introduced by Clark and Scarf [69] to provide an approximate, simple closed-form solution to the multi-echelon inventory problem. They extended their results to the case of time-correlated demand processes using the Martingale model of forecast evolution (MMFE) of Heath and Jackson [80] and Graves et al. [81] to model the forecast process.

In order to exploit the advance information about customer demand that some companies have the ability to obtain, dynamic programming models were developed by various authors taking into account this extra flow of information. These studies manage to further reduce costs by successfully incorporating forecast updates in the stochastic dynamic models. Gallego and Ozer [82,83] studied optimal replenishment policies for single and multi-echelon uncapacitated inventory systems with advance demand information. They modeled the forecast evolution as a supermartingale and proved optimality of a state-dependent (s, S)-type policy. Ozer and Wei [84] established optimal policies for a capacitated inventory system with advance demand information. Sethi et al. [85] developed a model of forecast updates that is analogous to peeling layers of an onion, in the sense that for each demand period more information is revealed as time passes. They also considered the ability to order from a fast and a slow supply source, the former with the larger ordering cost. They established the existence of an optimal Markov policy by showing that the value function is convex and by utilizing, under appropriate assumptions, the measurable selection theorem appearing in Bensoussan et al. [86]. They further proved that the optimal policy has the structure of a forecast-update-dependent base-stock policy. Sethi et al. [87] extended the work of Sethi et al. [85] to include fixed costs, ending up with a forecast-update-dependent (s, S)-type optimal policy, while in Feng et al. [88] delivery modes were considered.

Simchi-Levi and Zhao [89] analyzed the value of information sharing between the retailer and the producer in a two-echelon supply chain with finite production capacity over a finite horizon. They examined three scenarios, each with a different level of information sharing. They used dynamic programming to derive qualitative results for each scenario and concluded through computational experiments that information sharing can be very beneficial, especially when the manufacturer has excess capacity.

It is worth noting that in the above references, dynamic programming serves as a tool for proving the existence of optimal feedback control laws and characterizing their general form. However, it is not employed as a computational tool, due to the curse of dimensionality, which is prevalent even for simplified, medium-scale supply networks. In order to solve these complex stochastic control problems, some kind of simplifying assumptions, decompositions or approximations needs to be considered.

Another production problem that has been investigated thoroughly by the control research community is the production planning of manufacturing systems with unreliable machines. Olsder and Suri [90] were the first to formulate the problem of controlling manufacturing systems with unreliable machines as a stochastic control problem. The related optimal control problem falls under the general class of systems with jump Markov disturbances [91,92] or piecewise deterministic processes [93]. In their model, each machine is subject to random breakdown according to a homogeneous Markov process. However, they recognized that the problem was intractable due to the difficulty of solving the HJB equations characterizing the optimal control.

We will now briefly describe the generic model of a flexible manufacturing system (FMS) with m machines producing n parts. We assume that the machines are completely flexible, meaning that there are no setup costs and times involved. We also assume that machines are failure prone and repairable. We denote by {s_j(t), t ≥ 0} the stochastic process describing the operational mode of machine j, j = 1, ..., m. We have s_j(t) = 1 when machine j is in the functional mode and s_j(t) = 0 when machine j is under repair, i.e. s_j(t) ∈ S_j := {0, 1}. Now, the operational mode of the stochastic FMS can be described by the random vector s(t) = (s_1(t), ..., s_m(t))′ describing the mode of the system and taking values in S = S_1 × ··· × S_m. Obviously the set S has 2^m states. We further assume that the stochastic process {s(t), t ≥ 0} is modeled by a continuous-time Markov chain with a known generator Q, where Q = {q_ab}, a, b ∈ S, is a 2^m × 2^m matrix such that q_ab ≥ 0 for a ≠ b and q_aa = −∑_{b≠a} q_ab. The production flow dynamics are described by the following state equations:

ẋ(t) = u(t) − d, x(0) = x,  (18)

where x is the initial vector of surplus levels, x(t) ∈ R^n is the surplus level (inventory/shortage) at time t, u(t) ∈ R^n_+ denotes the production rate at time t such that u_i(·) = ∑_{j=1}^m u_ij(·), i = 1, ..., n, with u_ij describing the production rate of product i on machine j, and d ∈ R^n is a known constant positive vector denoting the demand rate.

The set of feasible feedback control policies depends on the stochastic process {s(t), t ≥ 0} and is given by

U(a) = {u(t) ∈ R^n : 0 ≤ u_j(t) ≤ u_jmax(a), j = 1, ..., n},

where u_jmax(a) = ∑_{i=1}^m u_max,ij is the maximal production rate of part type j at mode a. We denote by F_t the natural filtration generated by {s(t), t ≥ 0}, i.e. F_t = σ(s(τ) : 0 ≤ τ ≤ t). A control u(·) = {u(t), t ≥ 0} is said to be admissible if u(·) is F_t-measurable and u(t) ∈ U(s(t)) for all t ≥ 0. An F_t-measurable function u(x, s) is an admissible feedback control if, for any given initial x, the state equations have a unique solution and u(·) = u(x(·), s(·)) is an admissible control. It is easy to see that the system dynamics are described by a hybrid state containing both a discrete and a continuous component.

In the first class of problems considered in the literature, the objective is to find an admissible feedback control law u(·) ∈ U(·) so as to minimize the expected discounted production and inventory/backlog cost over an infinite horizon, given by

J^d(a, x, u(·)) = E[ ∫_0^∞ e^{−ρt}(h(x(t)) + c(u(t))) dt | x(0) = x, s(0) = a ],  (19)

where ρ is the discount factor, h(·) denotes the instantaneous inventory/backlog cost function and c(·) the instantaneous production cost function. The second class of problems deals with the long-run average cost of the form:

J^a(a, x, u(·)) = lim sup_{T→∞} (1/T) E[ ∫_0^T (h(x(t)) + c(u(t))) dt | x(0) = x, s(0) = a ].  (20)

We define the value function of the infinite-horizon discounted stochastic optimal control problem, given initial surplus vector x and initial mode a, as

V(x, a) = inf_{u(·)∈U(a)} J^d(a, x, u(·)).  (21)

It can be shown that the optimal value function satisfies the following set of HJB equations [91]:

ρV(x, a) = min_{u∈U(a)} {(u − d)′V_x(x, a) + c(u)} + h(x) + ∑_b q_ab V(x, b),  (22)

where f_x denotes the partial derivative of the function f with respect to x. The HJB equation associated with the long-run average cost criterion is of the following form:

λ = min_{u∈U(a)} {(u − d)′W_x(x, a) + c(u)} + h(x) + ∑_b q_ab W(x, b),  (23)

where λ is a constant and W is the so-called potential function. A solution to the HJB equation is a pair (λ, W). To this end, let us stress the fundamental difference between these two problem classes: the discounted-cost criterion considers short-term costs to be more important than long-term costs, whereas the average cost criterion ignores the short-term costs and considers only the distant future costs.


Based on these concepts, Akella and Kumar [94] studied the infinite-horizon discounted-cost optimal control problem of a single, failure prone machine producing a single product (m = 1, n = 1), in order to fulfill a deterministic demand. The inventory and shortage costs are assumed to be linear functions, h(x) = c^+x^+ + c^−x^−, and there is no production cost (c(u) = 0). The transition between the "functional" and the "breakdown" state is described by a continuous-time Markov chain. Based on the HJB equation for the system under investigation, they derived a simple formula for the optimal policy, which involves the determination of an optimal inventory level known as the "hedging point". In simple words, the optimal policy is as follows: when the state of the machine is "functional" and the current inventory level exceeds the hedging point, do not produce at all; if the current inventory level is below the hedging point, produce at the maximum rate; if it is exactly equal, produce on demand. The idea behind this policy is that a non-negative production surplus should be maintained at times of excess capacity to hedge against future capacity shortages. Furthermore, they proved that the value function is continuously differentiable. Note that this solution is valid only when the inventory and shortage costs are linear. In the case of general convex costs and more than two machine states, an explicit solution cannot be obtained; eventually one has to resort to a numerical approach to find an approximate value function by solving the associated HJB equation. Bielecki and Kumar [95] extended the work of Akella and Kumar [94] to the case of long-run average cost, obtaining an optimal hedging point policy as well. Kimemia and Gershwin [96] studied the multi-machine multi-part-type problem and showed that the optimal feedback control is a hedging point policy. They recognized that for such complex problems the derivation of the optimal policy in closed form is not possible, so they proposed an approximation scheme that calculates an approximate value function based on off-line discretization, followed by the on-line solution of a linear program whenever the manufacturing system changes operational state.
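A hedging point policy of the Akella–Kumar type can be sketched as follows. The failure/repair rates, cost coefficients and hedging level in the simulation are illustrative assumptions of ours, not values from [94]:

```python
import random

def hedging_point_policy(x, machine_up, z, u_max, d):
    """Hedging-point-type policy (single machine, single product):
    produce at full rate below the hedging point z, do not produce
    above z, and produce on demand exactly at z."""
    if not machine_up:
        return 0.0
    if x < z:
        return u_max
    if x > z:
        return 0.0
    return d  # at the hedging point: produce exactly the demand rate

def simulate(z, u_max=2.0, d=1.0, p_fail=0.05, p_repair=0.2,
             T=10_000, dt=0.1, seed=0):
    """Discrete-time Euler simulation of x' = u - d with Markovian
    breakdowns (illustrative parameters); returns average cost with
    holding cost 2 and shortage cost 10 per unit time."""
    rng = random.Random(seed)
    x, up, cost = 0.0, True, 0.0
    for _ in range(T):
        u = hedging_point_policy(x, up, z, u_max, d)
        x += (u - d) * dt
        cost += (2.0 * max(x, 0.0) + 10.0 * max(-x, 0.0)) * dt
        if up and rng.random() < p_fail * dt:
            up = False
        elif not up and rng.random() < p_repair * dt:
            up = True
    return cost / (T * dt)
```

Sweeping `simulate(z)` over candidate hedging levels z is a simple way to visualize why a positive surplus is held: it buffers demand during breakdown periods.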

Sethi et al. [97] investigated the general infinite-horizon discounted problem for multiple machines and multiple part types, where the demand and the capacity level are finite-state continuous-time Markov chains and the inventory/shortage and production costs are general convex functions, using the viscosity solution technique [98]. They proved that the value function is convex and continuously differentiable. Finally, through rigorous arguments they showed that the value function is the unique viscosity solution of the associated HJB equation and, based on this fact, they defined the turnpike sets as the attractors of the optimal trajectories for the system under investigation. It turned out that the hedging point policy of Akella and Kumar [94] is a special case of the result of Sethi et al. [97].

Presman et al. [99] investigated the problem of optimal feedback production planning in a machine flowshop. Since the number of parts in the internal buffers of the flowshop cannot be negative, state constraints must be imposed and certain boundary conditions need to be taken into account. Thus, the authors formulated the HJB equations in terms of directional derivatives at inner and boundary points. They also proved existence and uniqueness of an optimal feedback control. The optimal feedback control for this kind of problem does not possess the structure of hedging point policies. Presman et al. [100] extended these results to flowshops with limited storage capacities. Sethi and Zhou [101] obtained the explicit solution for the deterministic two-machine flowshop problem with the infinite-horizon discounted cost criterion.

Gharbi and Kenne [102] examined the production control problem for a multiple-part, multiple-machine manufacturing system. Due to the inherent complexity of the HJB equations, they resorted to the heuristic, simulation-based determination of the parameters of a modified hedging point policy which gives the best approximation of the value function.

Boukas and Haurie [103] addressed the optimal control problem of a single-product, multiple-machine manufacturing system where the failure probabilities and the repair times are age dependent, and they added the possibility of performing preventive maintenance in order to increase the availability of the production system. However, the complexity of the stochastic optimal control problem is increased due to the additional state variables representing machine ages and the additional control variables representing transitions from the "functional" mode to "preventive maintenance". Therefore, they proposed a numerical method based on Markov chain approximation [104] in order to solve the underlying HJB equation approximately. This method is based on the discretization of the corresponding continuous-time, continuous-state HJB equation, formulating an approximate discrete-time, discrete-state Markov decision problem and then solving it by policy iteration. However, the proposed numerical scheme still suffers from the curse of dimensionality, so its use is restricted to small-scale problems. In Boukas and Yang [105] a somewhat different problem is considered, in the sense that the maintenance procedure is executed while the machine is operating. They showed that the optimal production policy is described by a critical surface. However, their results are valid only for the single-machine problem. Recognizing the increased complexity of optimal control problems addressing both production and maintenance, Boukas et al. [106], Kenne et al. [107], Kenne and Gharbi [108,109] and Gharbi and Kenne [102] developed sub-optimal age-dependent hedging point policies. However, the determination of the threshold levels seems to be a difficult task involving simulation and heuristic arguments.
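For the one-machine, one-product discounted problem, the discretize-and-iterate idea can be sketched as a value iteration on a grid. All numbers are illustrative, and this simple Euler/interpolation scheme is cruder than the carefully constructed transition probabilities of the Markov chain approximation method in Kushner and Dupuis [104]:

```python
import numpy as np

# Discretize the continuous-time, continuous-state HJB into a
# discrete-time, discrete-state Markov decision problem and solve it
# by value iteration (illustrative parameters throughout).
dx, dt, rho = 0.1, 0.05, 0.1
x_grid = np.arange(-5.0, 5.0 + dx / 2, dx)
u_opts = np.array([0.0, 1.0, 2.0])        # candidate production rates
d, lam, mu = 1.0, 0.5, 2.0                # demand, failure, repair rates
h = 2.0 * np.maximum(x_grid, 0) + 10.0 * np.maximum(-x_grid, 0)
gamma = 1.0 / (1.0 + rho * dt)            # one-step discount factor
V = np.zeros((2, len(x_grid)))            # V[s, :], s = 0 down, s = 1 up
for _ in range(3000):
    V_new = np.empty_like(V)
    for s in (0, 1):
        rates = u_opts if s == 1 else u_opts[:1]   # no production when down
        p_sw = (mu if s == 0 else lam) * dt        # mode-switch probability
        Q = np.empty((len(rates), len(x_grid)))
        for k, u in enumerate(rates):
            # Euler step of the flow dynamics x' = u - d, kept on the grid
            xn = np.clip(x_grid + (u - d) * dt, x_grid[0], x_grid[-1])
            stay = np.interp(xn, x_grid, V[s])
            switch = np.interp(xn, x_grid, V[1 - s])
            Q[k] = h * dt + gamma * ((1 - p_sw) * stay + p_sw * switch)
        V_new[s] = Q.min(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new
```

Because gamma < 1 the iteration is a contraction, and the functional mode is never worse than the repair mode (V[1] ≤ V[0] pointwise); a hedging point can then be read off from where the minimizing production rate switches.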

All the above papers rely on the assumption that the machines are completely flexible, meaning that no setup costs or setup times are incurred while switching from one product to another. This assumption was relaxed by Sethi and Zhang [110], where stochastic manufacturing systems with setups were considered. Their study focused on the characterization of the exact optimal policy via viscosity solutions of HJB equations. However, the optimal policies cannot be explicitly computed, and one has to resort to numerical schemes in order to obtain a sub-optimal feedback controller. Yan and Zhang [111] developed a numerical method for the solution of the optimal production and setup scheduling problem for a single machine producing multiple parts, based on the Markov chain approximation scheme of Kushner and Dupuis [104]. They proved that the resulting policy is asymptotically optimal as the length of the finite-difference interval approaches zero. Boukas and Kenne [112] developed near-optimal control policies for production, maintenance and setup scheduling for age-dependent failure prone machines using the discretization methods of Kushner and Dupuis [104]. Liberopoulos and Caramanis [113] investigated several examples numerically in order to characterize the value function as well as the optimal production and setup policy of the problem. Bai and Elhafsi [114] provided a suitable, heuristic production and setup policy structure, the so-called hedging corridor policy (HCP). Gharbi et al. [115] proposed a modified HCP that guarantees lower cost than the HCP. In order to calculate the optimal set of values for the parameters of the proposed policy, they proposed a heuristic scheme based on stochastic optimal control theory, discrete event simulation and experimental design. Through two case studies they showed the superiority of their proposed policy over the HCP.

Demand uncertainty is one of the major factors affecting decision making in production and control. Feng and Yan [116] incorporated Markovian demand in a discrete-state version of Akella and Kumar [94]. They considered an optimal inventory/production control problem in a stochastic manufacturing system with a discounted cost criterion, in which the demand, the capacity of production and the processing time per unit are random variables. Song and Sun [117] considered the optimal control problem of a serial production line with failure-prone machines and random demand. They showed that the optimal policy is a bang–bang control and that it can be determined by a set of switching manifolds. Based on the structure of these switching manifolds, they proposed sub-optimal policies which are easy to implement in real systems.

Boukas and Liu [118] investigated a continuous-time inventory–production planning problem where the products deteriorate and their value reduces with time. Examples of this kind of product are electronic devices (due to rapid technological changes), clothing and of course perishable goods such as foodstuff and cigarettes. In this case the state equation (1) becomes

ẋ(t) = −Θx^+(t) + u(t) − d, x(0) = x,  (24)

with Θ = diag(θ_1, ..., θ_n) the matrix of deterioration rates for products i = 1, ..., n and x^+(t) = max(x(t), 0). They proved that the value function is convex and Lipschitz in x and that it is the unique viscosity solution of the underlying HJB equation. For the special case of one machine (m = 1) producing one product (n = 1), they showed that the optimal policy is a modified version of the hedging point policy presented by Akella and Kumar [94]. However, they too emphasize the fact that a closed-form solution for general complex problems is very difficult to obtain.

Research progress on the problem with the long-run average cost criterion was largely based on the explicit solution of Bielecki and Kumar [95] for the single-machine, single-product problem. Sharifnia [119] extended the Bielecki and Kumar model to a machine with more than two states and showed how to calculate the optimal hedging point. Using Sharifnia's method, Liberopoulos and Hu [120] dealt with an extension of the Bielecki–Kumar model in which there are more than two machine states and the transition rates of the machine states depend on the production rate. However, the preceding papers were mostly based on heuristic arguments and on the ability to explicitly compute the value function.

Perkins and Srikant [121] investigated the problem of a single failure prone machine producing multiple parts with the objective of minimizing a long-run average cost. They restricted their investigation to the class of linear switching curve (LSC) and prioritized hedging point (PHP) policies. They provided a characterization of the set of problem parameters for which a just-in-time policy is optimal.

Sethi et al. [122] used the vanishing discount approach to prove optimality of the hedging point policy for convex surplus cost and linear or convex production cost. Sethi et al. [123] examined the optimal production planning problem for multi-product FMSs. They developed the appropriate dynamic programming equations and proved an existence theorem and a verification theorem for optimality, by starting from the discounted cost problem and using a vanishing discount approach. Sethi and Zhang [124] provided the explicit form of the optimal control policy for the long-run average cost problem of a single machine producing multiple part types when inventory and shortage costs are equal. The policy they came up with can be considered as a variant of the kanban policy.

Fig. 2. Hierarchical control methodology: a singular perturbation argument reduces the stochastic optimal control problem (stochastic HJB) to a limiting deterministic optimal control problem (deterministic HJB), from whose solution an asymptotically optimal control is constructed.

One can see that sufficient progress has taken place concerning theoretical results for FMSs with unreliable machines. However, the explicit solution is available only for the simplest problems. Unfortunately, today's manufacturing systems are large scale and complex, characterized by several decision subsystems with different time scales. The purpose of hierarchical control methods is to solve such problems approximately by exploiting their structure. The main idea of hierarchical control is to reduce the problem's complexity by replacing fast evolving processes with their mean values. By fast evolving processes we mean those processes that reach their stationary distributions in a time period during which there are few, if any, fluctuations in the other processes. For example, in the above problem the stochastic process s(t) is fast evolving if the rate of change in the machine states is much larger than the rate at which the cost is discounted. This way, a deterministic limiting problem is constructed which is computationally more tractable. Then, based on the solution of the limiting problem, we can construct controls for the original system which are asymptotically optimal as the fluctuation rate of the fast evolving processes goes to infinity. However, it is not clear how to construct controls for the original systems; usually, a lot of experimentation and intuition is required [125,126]. The essence of this approach is that strategic-level management can ignore the day-to-day fluctuations in machine capacities, or more generally the details of the shop floor events, in carrying out long-term planning decisions. The lower, operational-level management can then derive approximate optimal policies for running the actual (stochastic) manufacturing systems. A schematic representation of hierarchical control methods appears in Fig. 2. A detailed exposition of hierarchical control methods is presented in Sethi and Zhang [127], while a recent review presenting up-to-date results is that of Sethi et al. [128]. Extensive numerical results and comparisons with heuristic policies are presented in Samaratunga et al. [129].
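The first step of the hierarchical approach, replacing the fast mode process by its stationary mean, can be sketched for a single two-state machine (the failure and repair rates below are illustrative):

```python
import numpy as np

# Two-state machine: generator Q with failure rate lam (functional -> down)
# and repair rate mu (down -> functional); illustrative numbers.
lam, mu = 0.5, 2.0
Q = np.array([[-mu, mu],     # state 0: under repair
              [lam, -lam]])  # state 1: functional
# Stationary distribution: solve pi @ Q = 0 together with sum(pi) = 1
A = np.vstack([Q.T, np.ones(2)])
b = np.array([0.0, 0.0, 1.0])
pi = np.linalg.lstsq(A, b, rcond=None)[0]
u_max = 3.0                  # production capacity while functional
mean_capacity = pi[1] * u_max
# The limiting deterministic problem then uses the averaged constraint
# 0 <= u(t) <= mean_capacity in place of the random mode process s(t).
```

Here pi = (0.2, 0.8), so the deterministic limiting problem sees an effective capacity of 2.4 rather than the fluctuating 0/3 capacity of the stochastic system.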

4. Model predictive control

MPC, or receding horizon control, has become a standard control methodology for industrial and process systems. Its wide adoption by industry is largely based on the inherent ability of the method to handle efficiently the constraints and nonlinearities of multi-variable dynamical systems. MPC is based on the following simple idea: at each discrete time instance, the control action is obtained by solving on-line a finite-horizon open-loop optimal control problem, using the current state of the system as the initial state. A finite optimal control sequence is obtained, of which only the first element is kept and applied to the system (Fig. 3). The procedure is repeated after each state transition [130–132]. Its main difference from stochastic dynamic programming and optimal control is that the control input is not computed a priori as an explicit function of the state vector. Thus, MPC is prevalent in the control of complex systems where the off-line solution of the dynamic programming equations is computationally intractable due to the curse of dimensionality. However, when the optimal control problem is stochastic in nature, one can only obtain suboptimal solutions, due to the open-loop nature of the methodology. A formulation of the MPC on-line optimization problem

Fig. 3. Model predictive control philosophy: at time t, predicted states x(t + l|t) over the prediction horizon t, ..., t + P are driven toward the set point by a sequence of open-loop inputs u(t + l|t), of which only the first is applied.

can be written as follows. At time t we solve the following finite-horizon optimal control problem:

min_{u(t+i|t), i=0,...,M−1} { ∑_{i=0}^P ‖y(t + i|t) − y_SP‖²_{Q_x} + ∑_{i=0}^P ‖u(t + i|t) − u_SP‖²_{Q_u} + ∑_{i=0}^P ‖Δu(t + i|t)‖²_{Q_Δu} }

s.t. x(t + l + 1|t) = f(x(t + l|t), u(t + l|t), d(t + l|t)), l = 0, 1, ..., P − 1,

y(t + l|t) = g(x(t + l|t), u(t + l|t), e(t + l|t)), l = 0, 1, ..., P,

u_min ≤ u(t + l|t) ≤ u_max, l = 0, 1, ..., P − 1,

y_min ≤ y(t + l|t) ≤ y_max, l = 0, 1, ..., P,

Δu_min ≤ Δu(t + l|t) ≤ Δu_max, l = 0, 1, ..., P − 1,  (25)

where u_SP, y_SP are the set points (desired values) of the input and output vectors, d(·|t), e(·|t) are predictions for the disturbances, x(·|t), y(·|t) are predictions for the state and output vectors, f(·,·,·) and g(·,·,·) are the functions of the state space model describing the discrete-time dynamics of the system, Δ is the backward difference operator and P is the prediction horizon. ‖·‖_Q denotes the Q-weighted norm, i.e. ‖x‖_Q = √(x^T Q x). Notice that the objective function penalizes deviations from the desired values of the input and output as well as excess movement of the control vector. The matrix Q_Δu is usually termed the move suppression penalty matrix. When the state space model is linear, the optimization problem reduces to a quadratic program, for which efficient solution algorithms exist. To this end, let us mention that this is the simplest form of an MPC configuration, with no terminal constraints and terminal costs, which are required so as to ensure stability of the closed-loop system. These types of features have not yet been used in applications of MPC to supply chain management problems.
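A minimal receding-horizon sketch for the single-site inventory balance x(t+1) = x(t) + u(t) − d(t) with a quadratic objective in the spirit of (25): the cost weights, horizon and demand forecasts are our own illustrative choices, and the input constraint is handled by a crude clipping of the first move rather than by a proper constrained quadratic program.

```python
import numpy as np

def mpc_order(x0, d_forecast, x_sp, P, u_max, q=1.0, r=0.1):
    """One receding-horizon step for x(t+1) = x(t) + u(t) - d(t).

    Minimizes sum_i q*(x_i - x_sp)^2 + r*u_i^2 over the horizon via an
    unconstrained least-squares problem, then clips the first move to
    [0, u_max] -- a simple projection heuristic, not a full QP."""
    # Predicted states: x_k = x0 + sum_{i<k}(u_i - d_i), i.e. x = x0 + L(u - d)
    L = np.tril(np.ones((P, P)))
    target = x_sp - x0 + L @ d_forecast[:P]        # want L @ u ~= target
    # min q*||L u - target||^2 + r*||u||^2 as stacked least squares
    A = np.vstack([np.sqrt(q) * L, np.sqrt(r) * np.eye(P)])
    b = np.concatenate([np.sqrt(q) * target, np.zeros(P)])
    u = np.linalg.lstsq(A, b, rcond=None)[0]
    return float(np.clip(u[0], 0.0, u_max))        # apply only the first move

# Receding horizon: re-solve at every period as new demand is realized
x, orders = 2.0, []
demand = [3.0, 4.0, 2.0, 5.0, 3.0, 3.0, 4.0, 2.0]
for t in range(5):
    u = mpc_order(x, np.array(demand[t:t + 3]), x_sp=10.0, P=3, u_max=8.0)
    orders.append(u)
    x = x + u - demand[t]
```

The loop embodies the receding-horizon idea of Fig. 3: a whole input sequence is optimized at each period, but only its first element is ever applied before the problem is solved again from the newly observed state.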

The significance of the basic idea implicit in MPC was recognized long ago in the operations management literature as a tractable scheme for solving stochastic multi-period optimization problems, such as production planning and supply chain management, under the term rolling horizon [133–136]. For a review of rolling horizons in operations management problems and interesting trade-offs between horizon lengths and costs of forecasts, we refer the reader to Sethi and Sorger [137] and Chand et al. [138].

Kapsiotis and Tzafestas [139] were the first to apply MPC to an inventory management problem, for a single inventory site. They included a penalty term for deviations from an inventory reference trajectory in order to compensate for production lead times. Tzafestas et al. [140] considered a generalized production planning problem that includes both production/inventory and marketing decisions. They employed a linear econometric model relating sales to advertisement effort so as to approximate a nonlinear Vidale–Wolfe process. The dynamics of sales are coupled with an inventory balance equation. The optimal control problem is formulated as an MPC, where the control variables are the advertisement effort and the production levels. The objective function penalizes deviations from desired sales and inventory levels.

Perea-Lopez et al. [141] employed MPC to manage a multi-product, multi-echelon production and distribution network with lead times, allowing no backorders. They formulated the optimal control problem as a large scale mixed integer linear programming (MILP) problem, due to the discontinuous decisions allowed in their model. In their formulation the demand is considered to be deterministic. They tested their formulation on a quite complex supply chain producing three products and consisting of three factories, three warehouses, four distribution centers and 10 retailers servicing 20 customers. They compared their centralized approach against two decentralized approaches. The first decentralized approach optimizes distribution only and uses heuristic rules for production/inventory planning. The second approach optimizes manufacturing while allowing the distribution network to follow heuristic rules. Through simulations, they inferred that the centralized approach exhibits superior performance.

Seferlis and Gianellos [142] developed a two-layered hierarchical control scheme, where a decentralized inventory control policy is embedded within an MPC framework. Inventory levels at the storage nodes and backorders at the order receiving nodes are the state variables of the linear state space model. The control variables are the product quantities transferred through the permissible network routes and the amounts delivered to the customers. Backorders are considered as output variables. Deterministic transportation delays are also included in the model. The cost function of the MPC consists of four terms, the first two being inventory and transportation costs, the third being a quadratic function that penalizes backorders at retailers and the last term being a quadratic move suppression term that penalizes deviations of decision variables between consecutive time periods. In order to account for demand uncertainty, they employed an autoregressive integrated moving average (ARIMA) forecasting model for the prediction of future product demand variation. Based on historical demand, they identified the order and parameters of the ARIMA model.

PID controllers were embedded for each inventory node and each product. These local controllers are responsible for maintaining the inventory levels close to the pre-specified target levels. Hence, the incoming flows to the inventory nodes are selected as the manipulated variables for the PID controllers. This way a decoupling between inventory level maintenance and satisfaction of primary control objectives (e.g. customer satisfaction) is achieved, permitting the MPC configuration to react faster to disturbances in demand variability and transportation delays. However, tuning of the localized PID controllers requires a time-consuming trial-and-error procedure based on simulations. In their experiments, assuming deterministic demand and performing a step change, they observed an amplification of set point deviations for upstream nodes (bullwhip). For stochastic demand variation, they noted that the centralized approach requires a much larger control horizon to achieve performance comparable to their two-layered strategy.

Braun et al. [143] developed a linear MPC framework for large scale supply chain problems arising in the semiconductor industry. Through experiments, they showed that MPC can adequately handle uncertainty resulting from model mismatch (lead times) and demand forecasting errors. Due to the complexity of large scale supply chains, they proposed a decentralized scheme where a model predictive controller is created for each node, i.e. production facility, warehouse and retailers. Inventory levels are treated as state variables for each node, the manipulated variables are orders and production rates, and demands are treated as disturbances. The goal of the MPC controller is to keep the inventory levels as close as possible to the target values while satisfying constraints with respect to production and transportation capacities. Their simulations showed that using move suppression (i.e. the term in the objective function that penalizes large deviations of control variables between two consecutive time instants), backorders can be eliminated. It is well known in the MPC community that the move suppression term has the effect of making the controller less sensitive to prediction inaccuracies, although usually at the price of degrading set point tracking performance. Through simulations, Braun et al. [144] and Wang et al. [145] further justified the significance of move suppression penalties as a means of increased robustness against model mismatch and of hedging against inaccurate demand forecasts.

Wang et al. [146] treated demand as a load disturbance and considered it as a stochastic signal driven by integrated white noise (the discrete-time analog of Brownian motion). They applied a state estimation-based MPC in order to increase the system performance and robustness with respect to demand variability and erroneous forecasts. Assuming no information on disturbances, they employed a Kalman filter to estimate the state variables, where the filter gain is a tuning parameter based on the signal-to-noise ratio. Through simulations they concluded that when there is a large error between the average of actual demands and the forecast, a larger filter gain can make the controller compensate for the error sufficiently fast.
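The estimation idea can be sketched as follows for a scalar inventory node whose demand follows integrated white noise. This is a fixed-gain observer in the spirit of the scheme of Wang et al. [146], with a hand-picked gain L rather than one computed from a Riccati equation; all numbers are illustrative.

```python
import numpy as np

# Augmented state [inventory, demand]; demand is integrated white noise.
A = np.array([[1.0, -1.0],    # x+ = x + u - d
              [0.0,  1.0]])   # d+ = d + w   (random walk)
B = np.array([1.0, 0.0])
C = np.array([1.0, 0.0])      # only the inventory level is measured

def kf_update(xhat, u, y, L):
    """Model-based prediction followed by a measurement correction.
    The gain L is hand-tuned: larger entries trust the measurement more."""
    xpred = A @ xhat + B * u
    return xpred + L * (y - C @ xpred)

rng = np.random.default_rng(1)
x = np.array([50.0, 14.0])    # true demand starts above the estimate
xhat = np.array([50.0, 10.0])
L = np.array([0.6, -0.3])     # negative demand gain: low stock => raise d-hat
for t in range(40):
    u = xhat[1]                               # order the estimated demand
    x = A @ x + B * u + np.array([0.0, rng.normal(0.0, 0.1)])
    y = x[0] + rng.normal(0.0, 0.5)           # noisy inventory measurement
    xhat = kf_update(xhat, u, y, L)
```

The demand component of the gain is negative because inventory falling faster than predicted (a negative innovation) should push the demand estimate upward; a larger gain corrects a forecast bias faster, at the price of passing more measurement noise into the estimate.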

Dunbar and Desa [147] applied a recently developed distributed/decentralized implementation of nonlinear MPC [148] to a dynamic supply chain management problem reminiscent of the classic MIT “Beer Game” [19]. In this implementation, each subsystem optimizes locally for its own policy and communicates the most recent policy to the subsystems to which it is coupled. The supply network consists of three nodes: a retailer, a manufacturer and a supplier. Information flows (i.e., flows moving upstream) are assumed to have no time delays (lead times). On the other hand, material flows (i.e., flows moving downstream) are assumed to have transportation delays. The proposed continuous-time dynamic model is characterized by three state variables for each node, namely, inventory level, unfulfilled orders and backlog. The control inputs are the order rates for each node. Demand rates and acquisition rates (i.e., number of items per day acquired from the upstream node) are considered as disturbances. The control objective is to minimize the total cost, which includes avoiding backorders and keeping unfulfilled orders and inventory levels low. Their model demonstrates bidirectional coupling between nodes, meaning that the differential equation model of each stage depends upon the states and inputs of other nodes. Hence, cycles of information dependence are present in the chain. These cycles complicate decentralized/distributed MPC implementations, since at each time period coupled stages must estimate states and inputs of one another. To address this issue, the authors assumed that coupled nodes receive the previously computed predictions from neighboring nodes prior to each update, and rely on the remainder of these predictions as the assumed prediction at each update. To bound the discrepancy between actual and assumed predictions, a move suppression term is included in the objective function. Thus, with the decentralized scheme, an MPC controller is designed for each node, which updates its policy in parallel with the other nodes based on estimates of the interconnected variables. Through simulations, they concluded that the decentralized MPC scheme performs better than a nominal feedback control derived in Sterman [19], especially when accurate forecasts of customer demand exist. However, both approaches exhibit non-zero steady-state error with respect to unfulfilled demands when a step increase is applied to the customer demand rate. Furthermore, the bullwhip effect is observed in their simulations.

Based on the model of Lin et al. [67], Lin et al. [149] presented a minimum variance control (MVC) system, where two separate set points are posed: one for the AINV and one for the WIP level. The system is in essence an MPC configuration where the objective function to be minimized consists of the deviations of the predicted inventory and WIP levels from the desired set points over two (in general different) prediction horizons and the order changes over a control horizon. An ARIMA model is used as a mechanism to forecast customer demands. The system proved superior to other approaches, such as the order-up-to-level policy, PI control and the APVIOBPCS model, in maintaining proper inventory levels without causing the “bullwhip” effect, whether the customer demand trend is stationary or not.

Yildirim et al. [150] studied a dynamic planning and sourcing problem with service level constraints. Specifically, the manufacturer must decide how much to produce, where to produce, when to produce, how much inventory to carry, etc., in order to fulfill random customer demands in each period. They formulated the problem as a multi-period stochastic programming problem, where service level constraints appear in the form of chance constraints [151]. In order to obtain the optimal feedback control one should be able to solve the resulting stochastic dynamic program. However, due to the curse of dimensionality the problem is computationally intractable. Thus, in order to obtain a sub-optimal solution they formulated the problem as a static deterministic optimization problem. They approximated the service level chance constraints with deterministic equivalent constraints by specifying certain minimum cumulative production quantities that depend on the service level requirements [152]. The rolling horizon procedure is applied on-line following the MPC philosophy, i.e. by solving the resulting mathematical programming problem at each discrete-time instance, applying only the first decision and moving to a new state, where the procedure is repeated. The authors compared their approach to certain threshold subcontracting policies, obtaining similar results.
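The deterministic-equivalent construction can be sketched under the additional assumption (made here for illustration, not taken from [150]) of independent normally distributed demands per period: a chance constraint requiring cumulative production to cover cumulative demand with probability at least alpha translates into a minimum cumulative production quantity at the corresponding demand quantile.

```python
from math import sqrt
from statistics import NormalDist

def min_cum_production(mu, sigma, alpha=0.95):
    """Deterministic equivalent of P(cum. production >= cum. demand) >= alpha,
    assuming independent N(mu_t, sigma_t^2) demands: cumulative production
    through period t must reach the alpha-quantile of cumulative demand."""
    z = NormalDist().inv_cdf(alpha)
    cum_mu, cum_var, levels = 0.0, 0.0, []
    for m, s in zip(mu, sigma):
        cum_mu += m
        cum_var += s * s
        levels.append(cum_mu + z * sqrt(cum_var))
    return levels

# Three periods of N(10, 2^2) demand at a 95% service level.
levels = min_cum_production([10.0, 10.0, 10.0], [2.0, 2.0, 2.0])
```

The resulting minimum levels grow with the mean demand plus a safety margin proportional to the standard deviation of cumulative demand; these become ordinary linear constraints in the static optimization problem solved at each rolling horizon step.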

5. Robust control

Describing uncertainties in a stochastic framework is the standard practice of the operations research community. For example, in the majority of papers reviewed so far, uncertainties concerning customer demands, machine failures and lead times were mostly described by probability distributions and stochastic processes. However, in many practical situations one may not be able to identify the underlying probability distributions, or such a stochastic description may simply not exist. On the other hand, based on historical data or experience one can easily infer bounds on the magnitude of the uncertain parameters.

Having realized this fact long ago, the control engineering community has developed the necessary theoretical and algorithmic machinery for this type of problem, the so-called robust control theory [153–155]. In this framework, uncertainties are unknown-but-bounded quantities, and constraints dictated by performance specifications and physical limitations are usually hard, meaning that they must be satisfied for all realizations of the uncertain quantities. In the robust control framework, models can be “infected” with two types of uncertainty: exogenous disturbances (e.g. customer demands) and plant-model mismatch, that is, uncertainties due to modeling errors.

In a series of papers, Blanchini and coworkers [156–158] studied the general dynamic production/distribution problem described by the following discrete-time state space model:

x(t + 1) = x(t) + Bu(t) + Ed(t), (26)

where the state x(t) is the vector of inventory levels in the distribution network nodes, the control u(t) is the vector of resource flows between the nodes and the exogenous disturbance d(t) is the vector of demands. State and control vectors must satisfy the following constraints:

x(t) ∈ X = {x ∈ Rn: xmin ≤ x ≤ xmax}, (27)

u(t) ∈ U = {u ∈ Rm: umin ≤ u ≤ umax}. (28)

The demands are unknown but belong to

d(t) ∈ D = {d ∈ Rq: dmin ≤ d ≤ dmax}. (29)

Blanchini et al. [156] studied the existence of a feedback controller K: X × [0, ∞) → U and a set Xf ⊆ X such that for all x(0) ∈ Xf and for all d(t) ∈ D we have u(t) = K(x(t), t) ∈ U and x(t) ∈ X for all times. In words, they studied the problem of keeping inventory levels inside prescribed bounds for all possible demands by using flows that are subject to hard bounds. They proved the existence of such a controller if and only if two conditions hold, the first being ED ⊂ −BU and the second being xmin,i − δmin,i ≤ xmax,i − δmax,i, where δmin,i = min_{d∈D} E_i d and δmax,i = max_{d∈D} E_i d. The first condition involves the controlled admissible flow, which must dominate the uncontrolled flow. The second involves the inventory capacities, which must be large enough to fulfill current customer demands. It should be mentioned that the authors did not consider any cost function, that is, they did not seek an optimal controller. They only examined necessary and sufficient conditions for stabilizability. Their procedure is largely motivated by the work of Bertsekas and Rhodes [159] and Bertsekas [160] on target tube reachability.
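The first condition lends itself to a numerical check. The sketch below tests ED ⊂ −BU for a hypothetical two-node chain (matrices and bounds are illustrative) by solving a feasibility LP at each vertex of the demand box; vertex checking suffices because the set of demands coverable by admissible flows is convex.

```python
import numpy as np
from itertools import product
from scipy.optimize import linprog

def controlled_flow_dominates(B, E, d_min, d_max, u_min, u_max):
    """Check E*D subset of -B*U: every extreme demand d must admit a
    feasible flow u with B u = -E d and u_min <= u <= u_max. Checking
    the vertices of the demand box is enough because the set of
    demands coverable by admissible flows is convex."""
    for vertex in product(*zip(d_min, d_max)):
        d = np.array(vertex)
        lp = linprog(c=np.zeros(B.shape[1]), A_eq=B, b_eq=-E @ d,
                     bounds=list(zip(u_min, u_max)), method="highs")
        if not lp.success:
            return False
    return True

# Hypothetical chain: u1 = external supply -> node 1, u2 = node 1 -> node 2,
# a single demand d in [5, 20] draining node 2.
B = np.array([[1.0, -1.0],
              [0.0,  1.0]])
E = np.array([[0.0],
              [-1.0]])
ok = controlled_flow_dominates(B, E, [5.0], [20.0], [0.0, 0.0], [30.0, 30.0])
tight = controlled_flow_dominates(B, E, [5.0], [20.0], [0.0, 0.0], [10.0, 10.0])
```

With flow capacities of 30 the worst-case demand of 20 can be matched along the chain, so the condition holds; shrinking the capacities to 10 makes the vertex d = 20 uncoverable and the check fails.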

For the same problem, Blanchini et al. [157] studied the problem of driving the system to the least worst-case inventory level. Furthermore, they derived a way of calculating the least inventory levels. The least inventory level is actually the steady state to which we want to steer the dynamic production/inventory system. They also provided an upper bound for the number of time periods required to reach that level. It turns out that the feedback controller is a periodic-review, order-up-to level policy, with the order-up-to level equal to the least inventory level. Finally, they provided a method based on linear programming for computing the optimal control strategy on-line. Blanchini et al. [158] extended the work of Blanchini et al. [157] by considering lead times in their model.

Blanchini et al. [161] investigated the same problem, this time in a continuous-time setting. They showed that the existence of an admissible feedback controller is guaranteed as long as the first of the two necessary and sufficient conditions of the discrete-time counterpart holds. They also investigated the set point tracking problem, i.e. finding an admissible feedback controller that steers the inventory levels to target values. They showed that such a controller exists and is discontinuous, in the sense that at any time any controlled process is required to work either at its maximal or at its minimal intensity. This bang–bang controller has some noteworthy properties, such as decentralization and robustness against failures. Blanchini et al. [162] extended the work of Blanchini et al. [161] by taking setup times into consideration.

Blanchini et al. [163] considered the problem of optimally controlling the system (26)–(29) with respect to a finite-horizon integral cost criterion. The cost function they considered depends only on the inventory levels at the nodes. This is a min–max problem where the control goal is to minimize the worst-case cost over all admissible values of customer demands. They showed that the optimal cost of a suitable auxiliary problem with no uncertainties is always an upper bound for the original problem. Finally, they provided a numerical method for the implementation of the guaranteed cost control.

Boukas et al. [164] considered an FMS where machines are subject to failure and demand is unknown-but-bounded. They cast the system in the framework of Eqs. (18) and (20), but this time demand is not a constant vector. They considered an objective function that is the supremum of the expected discounted cost over all demand realizations, a problem which is closely related to H∞ control. In terms of dynamic game theory, H∞ optimal control deals with zero-sum games where the controller can be considered as the minimizing player and the disturbance as the maximizing player. The controller resulting from this minimax approach is certainly more conservative in terms of performance. Boukas et al. [164] derived the Hamilton–Jacobi–Isaacs equations for this problem. They proved that the value function is convex, hence locally Lipschitz and almost everywhere differentiable. Furthermore, they provided a verification theorem which gives sufficient conditions that an optimal feedback controller must satisfy. As an example, they considered the problem of a single machine producing a single part and derived the optimal controller in closed form. However, for complex problems the derivation of closed form solutions is almost impossible.

Boukas et al. [165] investigated a continuous-time production–inventory problem with deteriorating items, similar to that of Boukas and Liu [118]. The deterioration rate of the products depends on the demand rate, which in turn is a function of a continuous-time Markov chain. They cast the model in the framework of systems with Markovian jumps. They considered the problem of minimizing a finite-horizon quadratic production and inventory/shortage cost. In order to solve the optimal control problem, they derived the HJB equation and solved for the optimal feedback law as a function of the partial derivative (with respect to the continuous state) of the value function, assuming that the value function is continuously differentiable. Then, they substituted back into the HJB equation and obtained a first order partial differential equation for the value function. Following the standard procedure of guessing an expression for the value function, they derived a set of coupled Riccati equations. They also dealt with the infinite-horizon problem. In that case, more stringent conditions are required to guarantee existence of the optimal solution, namely stochastic stabilizability and stochastic detectability. In their approach, apart from the stochasticity due to uncertain demands, they addressed model uncertainty, that is, uncertainty corresponding to modeling errors. They showed how to design a controller which guarantees stochastic quadratic stability for the closed-loop system and at the same time achieves a guaranteed adequate level of performance. They presented a tracking problem with one failure prone machine producing one deteriorating item. By solving the coupled Riccati equations, they derived the explicit piecewise linear feedback controller. Through simulations they showed that the tracking error converges to a neighborhood of the origin. However, in their example they did not consider uncertainty due to plant-model mismatch.

Boukas et al. [166] studied an inventory–production system with uncertain processing time and delay in control. The demand rate is composed of a constant term plus an unknown time-varying component with finite energy: d(t) = d̄ + w(t), where ∫₀^∞ ‖w(τ)‖² dτ < ∞. The time delay uncertainty was assumed to lie in a measurable domain (τ̄ and m known):

T = {τ(t): 0 ≤ τ(t) ≤ τ̄, τ̇(t) ≤ m < 1}.

The state equations, in the case of multiple machines producing multiple products, are

ẋ(t) = Ax(t) + B0u(t − τ(t)) − D + B1w(t), x(t) = 0, t ≤ 0, (30)

where x(t) is the vector of stock levels of the different products, u(t) is the vector of the production rates at time t, D is the vector of the constant demand rates, w(t) is the vector of disturbances of the demand rate at time t and A, B0, B1 are constant matrices. They also impose the input constraint ‖u(t)‖² ≤ a‖x(t)‖², ∀t.

Their main goal was to render the closed-loop system asymptotically stable and to satisfy an H∞ performance criterion. In order to achieve this goal, they designed a memoryless linear state feedback controller based on sufficient Riccati-like conditions. They used Schur complements to derive sufficient linear matrix inequality (LMI) conditions [167] for the satisfaction of the input constraints by the feedback controller. LMIs are very popular in robust control, since numerous stability conditions can be stated as LMIs and very efficient algorithms exist for their solution [168].

They also extended their results to the problem of robust H∞ control, where the uncertain system under consideration is modeled as follows:

ẋ(t) = [A + ΔA(t)]x(t) + Bv(t − τ(t)) + B1w(t), x(t) = 0, t ≤ 0,

z(t) = Cx(t) + Gv(t − τ(t)), (31)

where ΔA(t) is a real time-varying matrix representing norm-bounded parameter uncertainty. In both cases the parameters of the controller are computed via LMI techniques.

Boukas et al. [169] modeled the inventory problem with deteriorating items as a switched linear system [170]. The switching variable is the inventory level. Specifically, the system dynamics are described by the following piecewise affine model:

ẋ(t) = −αᵢx(t) + u(t) − d(t), i = 1, 2, (32)

where α₁ = α (the deterioration rate) if x(t) > 0 and α₂ = 0 if x(t) < 0 (backorders). They considered the problem of rendering the system quadratically stable while keeping the inventory level close to zero when there are fluctuations in the demand, using H∞ control theory. The demand rate is modeled as in Boukas et al. [166]. Therefore, their goal is to minimize the L2-induced norm from w(t) to x(t). A piecewise affine state feedback controller was designed based on the solution of a sufficient Lyapunov-like LMI condition.

Laumanns and Lefeber [171] modeled supply networks as directed connected graphs. They considered information flows and material flows through the arcs of the graph, while the vertices represent the facilities of the supply chain network. Transportation delays were modeled by adding auxiliary nodes to the graph. It was shown that the dynamics of the entire system can be written as a linear state space model with the customer demands as external unknown-but-bounded disturbances. Eventually, a constrained robust optimal control problem was formulated, where the worst-case cost is minimized. This problem has recently been solved analytically through multi-parametric programming [172]. The optimal state feedback law was shown to be piecewise affine. Laumanns and Lefeber [171] applied this formulation to the classical beer distribution game and found that the optimal control law is the well known order-up-to policy.

6. Approximate dynamic programming

As we have already noted, dynamic programming is a very elegant framework for analyzing optimal control problems relating to production/inventory/distribution systems. However, it is mostly used at a theoretical level to prove existence of solutions and characterize the optimal policy. In order to utilize dynamic programming as a practical tool for dynamic decision making in supply chain management, a way of combating the curse of dimensionality needs to be found. This is the main goal of approximate dynamic programming techniques. Starting from the field of artificial intelligence [173], a cornucopia of algorithms based on dynamic programming, simulation and some form of approximation has been suggested in the literature for the solution of large scale discrete-time stochastic optimal control problems. These methods usually combine classical dynamic programming algorithms, like value and policy iteration, with an approximation architecture for the value function (critic methods) or the optimal policy (actor methods). The data needed for training the models are provided through simulations. Early encouraging results, the most notable being Tesauro’s backgammon player [174], have drawn the attention of researchers from operations research and control theory. The book of Bertsekas and Tsitsiklis [175] describes such algorithms thoroughly and analyzes their performance both theoretically and through simulations. However, no algorithm has been proven to be of general use; the selection of the appropriate algorithm, approximation architecture and tuning parameters is problem specific and requires a lot of experimentation.
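To fix ideas, the following toy example runs exact tabular value iteration on a single-item inventory problem with lost sales (costs, capacity and the Poisson demand are all illustrative). Approximate dynamic programming methods replace the exact table V with a fitted approximation architecture precisely when the state space makes this enumeration impossible.

```python
import numpy as np
from math import exp, factorial

S_MAX, U_MAX, GAMMA = 20, 10, 0.95
H, SHORT, ORDER = 1.0, 4.0, 0.5     # holding, shortage, ordering unit costs

# Truncated Poisson(5) demand distribution.
pd = np.array([exp(-5.0) * 5.0**k / factorial(k) for k in range(16)])
pd /= pd.sum()

V = np.zeros(S_MAX + 1)
for _ in range(200):                 # value iteration to (near) convergence
    Q = np.full((S_MAX + 1, U_MAX + 1), np.inf)
    for x in range(S_MAX + 1):
        for u in range(min(U_MAX, S_MAX - x) + 1):   # respect capacity
            c = ORDER * u
            for d, p in enumerate(pd):
                left = x + u - d                      # post-demand stock
                c += p * (H * max(left, 0) + SHORT * max(-left, 0)
                          + GAMMA * V[max(left, 0)])  # lost sales model
            Q[x, u] = c
    V = Q.min(axis=1)
policy = Q.argmin(axis=1)            # optimal order quantity per stock level
```

With linear ordering cost and convex holding/shortage costs the computed policy is of the familiar base-stock type: order up to a fixed level when stock is below it, order nothing otherwise.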

In this section we present some selected applications of approximate dynamic programming to supply chain management problems. Van Roy et al. [176] studied a multi-retailer inventory system with random demands and transport delays. Their goal was to minimize the discounted infinite-horizon storage, shortage and transportation costs. If a customer order cannot be fully satisfied by the retailer, the customer has the option of requesting a special delivery, that is, a direct delivery from the warehouse. The problem was formulated as a dynamic programming problem and three test cases were examined: a system with one warehouse and one retailer having three state variables, a system with one warehouse and 10 stores having 33 state variables, and a problem with the same number of warehouses and stores having 46 state variables, the increase being due to longer transportation delays. In all the case studies, the control vector consisted of two order-up-to levels: one for the warehouse and one for the stores. Thus the policy resulting from their formulation was an “order-up-to” policy with a threshold value that depends on the current state. The authors found that on-line temporal difference learning with feature-based linear approximation of the value function and active exploration performs better than approximate policy iteration and temporal difference learning coupled with multi-layer perceptrons. The policy resulting from the proposed neuro-dynamic programming scheme cuts costs by about 10% relative to an optimized stationary “order-up-to” policy. However, the selection of features, that is, quantities that capture the necessary information about the system dynamics, is quite delicate and requires a lot of experimentation.

Patrinos and Sarimveis [177] developed an optimistic variant of policy iteration coupled with least-squares policy evaluation [178]. They used a radial basis function network as an approximation of the value function. The centers of the Gaussian basis functions were selected so as to span the entire state space. Once this is accomplished, the network combines good approximation properties throughout the state space with a simple linear structure with respect to the weight parameters. Through simulations, it was shown that the resulting policy performs better than an optimized “order-up-to” policy.
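The flavor of least-squares policy evaluation with Gaussian radial basis features can be sketched as follows, here evaluating a fixed order-up-to policy on a simulated single-retailer system; the policy, costs and centers are illustrative and not those of [177].

```python
import numpy as np

CENTERS = np.linspace(0.0, 12.0, 5)      # RBF centers spanning the states
GAMMA, S = 0.95, 12                      # discount factor, order-up-to level

def phi(x, width=3.0):
    """Gaussian radial basis features of the inventory level x."""
    return np.exp(-((x - CENTERS) ** 2) / (2.0 * width ** 2))

rng = np.random.default_rng(2)
A = np.zeros((5, 5))
b = np.zeros(5)
x = 10.0
for t in range(5000):                    # simulate and accumulate LSTD(0)
    u = max(S - x, 0.0)                  # fixed order-up-to-S policy
    d = rng.poisson(5)
    x_next = max(x + u - d, 0.0)         # lost sales
    cost = 1.0 * x_next + 4.0 * max(d - x - u, 0.0)
    f, f_next = phi(x), phi(x_next)
    A += np.outer(f, f - GAMMA * f_next)
    b += f * cost
    x = x_next
w, *_ = np.linalg.lstsq(A, b, rcond=None)
value = lambda s: float(phi(s) @ w)      # fitted cost-to-go
```

The linear-in-the-weights structure is what makes the least-squares solve possible: only the weight vector w is fitted, while the Gaussian features are fixed in advance.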

Powell and coworkers [179–181] developed an interesting approximate dynamic programming framework for large scale dynamic finite-horizon resource allocation problems, such as supply chain management. Their methods combine dynamic programming, mathematical programming, simulation and stochastic approximation [182]. They usually employ linear or separable concave approximations of the value function [183]. Another main characteristic is the reformulation of the dynamic programming equation around the post-decision state. Topaloglu and Powell [181] presented a comparison of their approximate dynamic programming methods with the rolling horizon approach (or MPC). Through simulations they showed that their methods, combined with separable concave approximations of the value function, perform better than rolling horizon. For more information regarding this approach we refer the reader to the forthcoming book of Powell [184].

Bauso and coworkers [185,186] considered a multi-retailer inventory system where each retailer shares a limited amount of information with the other retailers so as to coordinate their decisions and share set-up costs. Thus, their main goal was to design a consensus protocol [187]. Each retailer chooses a threshold policy, with the threshold being the number of active retailers. That is, a retailer decides to order only if at least a certain number of retailers are willing to do the same. In order to compute locally the threshold level for each retailer, depending on the inventory level and the expected demand, they developed a distributed neuro-dynamic programming algorithm. The proposed algorithm is a variant of approximate policy iteration with linear function approximation, where the policy evaluation step is performed using quasi-Monte Carlo simulations and temporal difference learning. They also used active exploration at the initial iterations of the algorithm, so as to explore the state space sufficiently and avoid getting stuck in local minima. The resulting feedback law is of the form:

μ(I_i^k) = { S_i^k − x_i^k if a^k ≥ θ_i^k; 0 if a^k < θ_i^k }, (33)

where x_i^k is the inventory level of retailer i, a^k is the information transmitted through the consensus protocol regarding active retailers, θ_i^k ∈ {1, . . . , n} is the threshold of active retailers and S_i^k is the order-up-to level for retailer i at time k. They considered an example with three retailers facing a random demand that follows a Poisson distribution. Through simulations, they show that the algorithm converges to a Nash equilibrium in six iterations.
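Feedback law (33) itself is simple to state in code. The sketch below (all names and numbers hypothetical) applies it to three retailers for a single period, counting a retailer as active when its inventory is below its order-up-to level.

```python
def threshold_order(x_i, a, theta_i, S_i):
    """Feedback law (33): order up to S_i only when the number of active
    retailers a has reached retailer i's threshold theta_i."""
    return S_i - x_i if a >= theta_i else 0

# Three retailers with inventories, thresholds and order-up-to levels.
inventories = [2, 7, 4]
thresholds = [1, 2, 3]
order_up_to = [10, 10, 10]

# A retailer is "active" (willing to order) when below its target level.
a = sum(x < s for x, s in zip(inventories, order_up_to))
orders = [threshold_order(x, a, th, s)
          for x, th, s in zip(inventories, thresholds, order_up_to)]
```

With all three retailers active (a = 3), every threshold is met and each retailer orders up to its target; if only one retailer were active, those with thresholds above 1 would order nothing and wait, which is how the protocol pools set-up costs.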

7. Conclusions

The aim of this review paper was to present alternative control philosophies that have been applied to the dynamic supply chain management problem. Representative references were provided that can guide the reader to explore in depth the methodologies of his/her choice. The efforts started in the early 1950s with the application of classical control techniques, where the analysis was performed in the frequency domain. More recently, highly sophisticated optimal control methods have been proposed, mainly based on the time domain. However, many recent reports state that the majority of companies worldwide still suffer from poor supply chain management. Moreover, undesired phenomena, such as the “bullwhip” effect, have not yet been remedied. The applicability of control methodologies to real life supply chain problems is thus naturally questioned.

It is true that the assumptions on which many of the methodologies presented in this paper are based are often not valid in reality. For example, lead times are not fixed and are not known with accuracy, as many models assume. Inventory levels should be bounded below by zero and bounded above by warehouse capacities, but these bounds are not always taken into account. The same happens with the production rates, which are limited by the machinery capacities. Another limitation is that single stage systems are usually studied, assuming production of a single product or aggregated production. In real life systems, various products are produced with different production rates and different lead times, which, however, share common machinery and storage facilities. Horizontal integration is often represented by considering the supply chain stages in a row, while interconnections between different-level and same-level stages are ignored. Finally, raw material costs (which may be variable), labor costs and inventory costs are rarely taken explicitly into account.

From the above discussion it is evident that, despite the considerable advances that have occurred throughout the years in controlling supply chain systems, there is still plenty of room for further improvement. Eliminating the above limitations will lead to new methodologies of wider applicability. Dynamic control of supply chain systems therefore remains an open and active research area. Among the alternative methodologies presented in this review paper, we would like to draw the reader’s attention to the MPC framework, which has become extremely popular in the engineering community, as it has proved successful in facing problems similar to the ones mentioned above. Among other advantages, the MPC framework can easily incorporate bounds on the manipulated and controlled variables, and it leads to the formulation of computationally tractable optimization problems.
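To make the receding-horizon idea concrete, the sketch below applies an MPC-style policy to a single inventory. Everything here is an illustrative assumption (horizon, action grid, costs, forecasts), and a real implementation would solve a linear or quadratic program rather than enumerate plans; the point is the mechanism: at each period, evaluate every admissible production plan over the horizon, discard plans that violate the warehouse capacity, and apply only the first move of the cheapest feasible plan:

```python
import itertools

H = 3                     # prediction horizon (assumed)
ACTIONS = (0, 50, 100)    # admissible production quantities; 100 = machine cap
INV_MAX = 400.0           # warehouse capacity (hard constraint)
HOLD, BACKLOG = 1.0, 5.0  # per-unit holding and shortage penalties (assumed)

def stage_cost(x: float) -> float:
    return HOLD * max(x, 0.0) + BACKLOG * max(-x, 0.0)

def mpc_action(inv: float, forecast: list) -> int:
    """Enumerate H-step plans; return the first move of the cheapest feasible one."""
    best_move, best_cost = ACTIONS[0], float("inf")
    for plan in itertools.product(ACTIONS, repeat=H):
        x, total, feasible = inv, 0.0, True
        for u, d in zip(plan, forecast[:H]):
            x = x + u - d          # inventory balance (negative = backlog)
            if x > INV_MAX:        # warehouse bound rules the plan out
                feasible = False
                break
            total += stage_cost(x)
        if feasible and total < best_cost:
            best_move, best_cost = plan[0], total
    return best_move

# Receding-horizon loop: apply only the first move, then re-solve with new state.
inv = 150.0
forecast = [80.0, 90.0, 70.0, 100.0, 60.0]
for t in range(3):
    u = mpc_action(inv, forecast[t:])
    inv = inv + u - forecast[t]
print(f"closed-loop inventory after 3 periods: {inv}")
```

The hard bound on inventory and the finite action set enter the controller directly as feasibility checks, which is the property highlighted above: constraints are handled inside the optimization rather than patched onto an unconstrained policy afterwards.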
