parallel discrete event simulation algorithm for manufacturing supply chains

Parallel Discrete Event Simulation Algorithm for Manufacturing Supply Chains
Author(s): R. Roy and R. Arunachalam
Source: The Journal of the Operational Research Society, Vol. 55, No. 6 (Jun., 2004), pp. 622-629
Published by: Palgrave Macmillan Journals on behalf of the Operational Research Society
Stable URL: http://www.jstor.org/stable/4101966
Accessed: 23/05/2011 00:07


Journal of the Operational Research Society (2004) 55, 622-629 © 2004 Operational Research Society Ltd. All rights reserved. 0160-5682/04 $30.00

www.palgrave-journals.com/jors

Parallel discrete event simulation algorithm for manufacturing supply chains

R Roy* and R Arunachalam

University of Warwick, UK

Parallel discrete event simulation (PDES) is concerned with the distributed execution of large-scale system models on multiple processors. It is an enabler in the implementation of the virtual enterprise concept, integrating semi-autonomous models of production cells, factories, or units of a supply chain. The key issue in PDES is to maintain causality relationships between system events while maximizing parallelism in their execution. Events can be executed conservatively, only when it is safe to do so, at the cost of limiting how much of the system's potential parallelism can be exploited. Alternatively, they can be processed optimistically without a guarantee of correctness, incurring the overhead of a rollback to an earlier saved state when a causality error is detected. The paper proposes a modified optimistic scheme for the distributed simulation of the constituent models of a manufacturing supply chain, which exploits the inherent operating characteristics of its domain. Journal of the Operational Research Society (2004) 55, 622-629. doi:10.1057/palgrave.jors.2601688

Keywords: simulation; distributed models; manufacturing; supply chain

Introduction

Parallel discrete event simulation (PDES) has received attention in many applications involving large, complex systems, for example, telecommunications. However, its use in manufacturing has been limited.1 Early studies (eg, reference 2) of distributed execution used fine-grained decomposition to map system entities (eg, machines, transportation devices) onto different processors. Later work also considered system architectures with loosely coupled submodels,3 in which the relative independence of the processes reduces communication overhead and makes them particularly suitable for efficient PDES. For many years, research on parallel and distributed simulation concentrated on algorithms for parallel processing, but more recent studies also include work on web-based cooperative model development4 and shared processing,5 decomposition methods for large-scale models based on the Discrete Event System Specification (DEVS) formalism,6 and results from the implementation of parallel processing of a fine-grained virtual factory model.7

Manufacturing supply chains are usually large, complex systems consisting of semi-autonomous cells, factories, etc, that are interconnected by material and information flows. They are asynchronous in nature and, hence, using a global clock to advance the simulation in lock-step is inefficient. Modular development and distributed processing of the models of the constituent units have the potential to make the modelling process more manageable and to improve execution time significantly, thus making the simulation of such systems more feasible in practice. The main focus of research into PDES has been on algorithms for maintaining causality relationships between system events. Conservative approaches allow the simulation to proceed only up to safe time limits that avoid causality errors, but can limit the extent to which inherent parallelism can be exploited. Optimistic protocols, on the other hand, allow a logical process (LP) to proceed without regard to the future events it will receive, but the LP rolls back to an earlier saved state when a causality error is detected; the procedure can be computationally expensive. The relative efficiency of the two approaches is application dependent.8 This paper proposes an algorithm based on the optimistic protocol that is modified to improve performance by taking advantage of the operating characteristics of supply chain systems.

PDES algorithms

The physical system in PDES is viewed as a number of interacting physical processes (PPs) and is modelled by constructing a simulator consisting of logical processes (LPs), one for every PP. Interactions between PPs are modelled by the corresponding LPs sending and receiving timestamped messages. Simulation proceeds by each LP processing the events in its input queue in timestamp order. A causality error, however, can occur if an LP finds a message in its queue with a timestamp less than its own clock value.
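As a minimal sketch of this model, assuming a single LP whose input queue is a binary heap keyed on timestamp (the class and method names are illustrative, not taken from the paper), the following shows timestamp-ordered processing and how a late-arriving message is recognised as a causality error. It only detects the error; repairing it is the job of the synchronization protocols discussed next.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    timestamp: float
    payload: str = field(compare=False)  # ordering uses the timestamp only


class LogicalProcess:
    """Illustrative LP: processes its input queue in timestamp order."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.clock = 0.0                      # local simulation clock
        self.queue: list[Event] = []          # min-heap of pending events
        self.processed: list[Event] = []
        self.causality_errors: list[Event] = []

    def receive(self, event: Event) -> None:
        # A message older than the local clock arrived too late: the LP has
        # already simulated past it, so processing it now breaks causality.
        if event.timestamp < self.clock:
            self.causality_errors.append(event)
        heapq.heappush(self.queue, event)

    def step(self) -> bool:
        """Process the pending event with the smallest timestamp, if any."""
        if not self.queue:
            return False
        event = heapq.heappop(self.queue)
        # This naive LP never moves its clock backwards; it only records errors.
        self.clock = max(self.clock, event.timestamp)
        self.processed.append(event)
        return True


lp = LogicalProcess("factory")
lp.receive(Event(2.0, "order A"))
lp.receive(Event(5.0, "order B"))
while lp.step():
    pass                                      # clock is now 5.0
lp.receive(Event(3.0, "late order"))          # 3.0 < 5.0: causality error
```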

In the conservative approach,9,10 a clock is associated with each incoming link of an LP; it is set to the timestamp of the message at the head of the link's queue or, if the queue is empty, to that of the last received message. Each LP repeatedly selects the link with the smallest clock time; if the associated queue contains a message, it is processed, or else the LP blocks (waits). The mechanism guarantees avoidance of causality errors, but deadlocks can occur (ie, each LP in a cycle waiting for a message from another). Deadlocks are usually avoided through the use of null messages, which an LP sends on all its output links after processing each event and which provide lower bounds on the timestamps of future messages that it will send. Empirical evidence suggests that performance is in many instances degraded by a large proportion of null messages.11 An alternative approach uses a detection and recovery algorithm:12 when a deadlock is detected, it resorts to sequential processing of events to advance simulation time and resolve the deadlock. The basic conservative algorithm is inefficient in exploiting the inherent parallelism of events, particularly in the case of loosely coupled systems, and most implementations rely on look-ahead information to improve performance.13 A global synchronization function is used to choose, among all LPs, a set of events that are safe to be processed, usually based on the notion of distances between LPs. In practice, a large proportion of time may be wasted in searching for a safe event.14 The use of look-ahead information requires the simulation modeller to be involved in the details of the synchronization mechanism and to verify that any modifications to the model will not affect the look-ahead properties.

*Correspondence: R Roy, Warwick Manufacturing Group, International Manufacturing Centre, University of Warwick, Coventry CV4 7AL, UK. E-mail: [email protected]
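A sketch of the link-clock rule and null messages follows, assuming a fixed look-ahead value supplied by the model (the class, its parameters and the look-ahead constant are illustrative assumptions). It shows the blocking rule and the lower bound a null message advertises; deadlock detection and recovery are not modelled.

```python
from collections import deque


class ConservativeLP:
    """Illustrative conservative LP with one FIFO queue and one clock per input link."""

    def __init__(self, input_links, lookahead):
        self.queues = {link: deque() for link in input_links}
        self.link_clock = {link: 0.0 for link in input_links}
        self.lookahead = lookahead        # assumed model-supplied look-ahead
        self.clock = 0.0
        self.log = []                     # (timestamp, payload) of real events

    def receive(self, link, timestamp, payload):
        """Enqueue a message; payload None marks a null message.
        Messages on a single link are assumed to arrive in timestamp order."""
        self.queues[link].append((timestamp, payload))
        # Link clock = timestamp of the message at the head of the queue.
        self.link_clock[link] = self.queues[link][0][0]

    def step(self):
        """Process one message and return the timestamp of the null message to
        send on every output link, or None if the LP must block."""
        link = min(self.link_clock, key=self.link_clock.get)
        if not self.queues[link]:
            return None                   # smallest-clock link is empty: wait
        timestamp, payload = self.queues[link].popleft()
        if self.queues[link]:
            self.link_clock[link] = self.queues[link][0][0]
        # If the queue is now empty, the link clock keeps the timestamp of the
        # last received message, which is the one just removed.
        self.clock = timestamp
        if payload is not None:           # null messages only advance clocks
            self.log.append((timestamp, payload))
        # Promise: no future message from this LP is earlier than clock + lookahead.
        return self.clock + self.lookahead


lp = ConservativeLP(["supplier", "customer"], lookahead=1.0)
lp.receive("supplier", 2.0, "parts")
blocked = lp.step()                  # customer link (clock 0.0) is empty
lp.receive("customer", 3.0, None)    # null message from the customer LP
null_ts = lp.step()                  # processes "parts" at t = 2.0
blocked_again = lp.step()            # supplier link (clock 2.0) is now empty
```

The LP blocks twice even though only one real message exists, which illustrates why loosely coupled systems with infrequent messages depend heavily on null messages or look-ahead.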

In optimistic approaches, each LP has a single input queue, and the only constraint on its execution is that it must follow the local causality principle. In the commonly employed Time Warp paradigm,15 when a message arrives with a timestamp smaller than the local clock, the LP rolls back to an earlier time; this may result in a cascading series of actions to undo the effects of messages it has sent to other LPs. Antimessages are used to provide the event trail for the cascaded rollbacks.15 Whenever an LP sends a message to another, an antimessage is also created and stored in the corresponding output queue; when it rolls back, the antimessages of all messages sent after the rollback point are sent to the destination LPs to cancel those messages. Global virtual time (GVT), the minimum of the virtual clocks of all LPs and the timestamps of all messages in transit, bounds how far back any LP may need to roll back and, hence, determines how long saved state variables must be retained.
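The rollback and cancellation steps can be sketched as below, assuming every event's state is checkpointed and that the state is a simple counter (all names are illustrative). The sketch omits the receiving side, where an antimessage annihilates its matching positive message, and the re-execution of rolled-back events.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    send_time: float   # sender's virtual time when the message was sent
    recv_time: float   # timestamp at which the receiver must process it
    dest: str
    sign: int = 1      # +1 for a message, -1 for its antimessage

    def anti(self) -> "Message":
        return Message(self.send_time, self.recv_time, self.dest, -self.sign)


class TimeWarpLP:
    """Illustrative Time Warp LP with per-event state saving and antimessages."""

    def __init__(self) -> None:
        self.lvt = 0.0                    # local virtual time
        self.state = 0                    # toy state: number of events executed
        self.saved = [(0.0, 0)]           # (virtual time, state) checkpoints
        self.output_queue = []            # antimessages for messages sent

    def process(self, t, sends):
        """Execute an event at time t that sends (dest, recv_time) messages."""
        self.lvt = t
        self.state += 1
        self.saved.append((t, self.state))
        out = []
        for dest, recv_time in sends:
            m = Message(t, recv_time, dest)
            self.output_queue.append(m.anti())  # keep the antimessage locally
            out.append(m)
        return out

    def rollback(self, straggler_time):
        """Restore the last state saved before the straggler and return the
        antimessages that cancel every message sent at or after that time."""
        while len(self.saved) > 1 and self.saved[-1][0] >= straggler_time:
            self.saved.pop()
        self.lvt, self.state = self.saved[-1]
        cancel = [a for a in self.output_queue if a.send_time >= straggler_time]
        self.output_queue = [a for a in self.output_queue if a.send_time < straggler_time]
        return cancel


def gvt(local_clocks, in_transit):
    """Global virtual time: minimum of all LP clocks and in-transit timestamps."""
    return min([*local_clocks, *(m.recv_time for m in in_transit)])


lp = TimeWarpLP()
lp.process(1.0, [("B", 4.0)])
lp.process(3.0, [("B", 6.0), ("C", 7.0)])
lp.process(5.0, [("C", 9.0)])
antis = lp.rollback(2.0)    # straggler stamped 2.0 arrives at LVT 5.0
```

Three antimessages are produced here because three messages were sent after the straggler's timestamp; the count grows with every message sent during the optimistic execution.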

Optimistic schemes try to exploit as much parallelism as possible. The drawback can be a significant memory-management overhead16 and 'thrashing' behaviour, in which most of the time is spent executing incorrect events and undoing their effects with long, cascaded rollbacks.17 The executive of the protocol is more complex to develop than that of a conservative scheme, but the issues of synchronization are more transparent to the modeller; however, selecting a suitable interval for state-saving operations is a problem.18 Table 1 summarizes the relative merits of the two approaches.

Table 1 Comparison of conservative versus optimistic schemes

| | Conservative | Optimistic |
|---|---|---|
| Parallelism | Limited by worst-case scenario | Not limited |
| Performance | Depends on the quality of 'look-ahead' information present in the simulation | Can exhibit 'thrashing' behaviour; significant overhead of memory management |
| Simulation executive | Simple to develop | Complex; harder to verify robustness |
| Model development | Complicated; requires the modeller to be aware of synchronization issues | More transparent and robust to changes in model, but selection of state-saving interval is a problem |

The choice of protocol still remains a problem.8 Much recent research has been directed at throttling optimistic behaviour by limiting event computations beyond GVT to a simulation time window.19 Others have proposed limiting speculative execution and rollbacks to a local level, with remote LPs being sent messages only when it is safe to do so; hence, antimessages are not needed.20 Adaptive protocols that combine conservative and optimistic schemes have also been suggested.21

The ratio of the number of external events scheduled by one LP on another to the number of internal events scheduled on itself could be regarded as a measure of the degree of coupling between the LPs in a distributed system, and it affects the extent of parallelism present. In the distributed modelling of a manufacturing supply chain, each of its constituent units (cells, factories, etc) is modelled as an LP. The external events (eg, placement of orders) are relatively few compared with the internal events (eg, start of a machining cycle). The use of a conservative algorithm for such a loosely coupled system will lead to significant blocking due to the infrequent flow of messages. The occurrence of deadlocks can also be high due to multiple loops in the customer/supplier relationships that exist between the LPs. Optimistic approaches do not suffer from these consequences, but performance is affected by rollbacks and the potential flood of messages that can follow. Jefferson15 argues that for most applications, each input to an LP (eg, of a part) results in a few internal events (processing of the part) but only a single external event (output of the processed part), which would limit antimessages. When an LP receives an order in a manufacturing supply chain, however, it would normally need to place orders for a number of different components with its suppliers. Hence, the assumption does not hold and significant degradation of performance may result.
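The coupling measure described above can be computed from an event log as below. This is a sketch under assumptions: the log format and function name are hypothetical, and the ratio is aggregated per LP over all of its targets rather than per LP pair. Low ratios, as in the invented counts, correspond to the loosely coupled case discussed here.

```python
from collections import Counter


def coupling_ratios(scheduled):
    """Ratio of external to internal events scheduled by each LP.

    scheduled: iterable of (scheduling_lp, target_lp) pairs, one per event.
    An event is internal when an LP schedules it on itself, external otherwise.
    Returns None for an LP that scheduled no internal events.
    """
    internal, external = Counter(), Counter()
    for src, dst in scheduled:
        if src == dst:
            internal[src] += 1
        else:
            external[src] += 1
    lps = set(internal) | set(external)
    return {lp: external[lp] / internal[lp] if internal[lp] else None for lp in sorted(lps)}


# Illustrative counts: machining steps dominate; orders are rare.
log = ([("factory", "factory")] * 40 + [("factory", "supplier")] * 2
       + [("supplier", "supplier")] * 10 + [("supplier", "factory")] * 1)
ratios = coupling_ratios(log)
```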

Proposed algorithm

A modified algorithm is proposed here, based on the assumed feature of a manufacturing supply chain that external events (eg, shipments) are typically batched together for action. The use of weekly MRP planning buckets is an extreme example in relation to order processing. Even in a lean production environment, the consequences of external events are not as instantaneous as those of internal events and, at least for the purposes of modelling, external events can therefore be batched with little or no loss of integrity in the analysis of supply chains. A rollback then needs to occur only when the timestamp of a received message is less than the simulation clock at which the LP began to process the last batch of external messages; any other rollback would be wasted.
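The difference between the two rollback rules can be illustrated with a few late messages, each considered on its own (the clock values and function names are made up for illustration). The boundary case uses 'less than or equal to', following the queue clock test stated for the message controller below.

```python
def time_warp_rolls_back(msg_ts: float, local_clock: float) -> bool:
    # Standard Time Warp: any message older than the local clock is a straggler.
    return msg_ts < local_clock


def batched_rolls_back(msg_ts: float, last_batch_clock: float) -> bool:
    # Batched scheme: only a message belonging to an already delivered batch
    # forces a rollback; later ones simply wait for the next batch.
    return msg_ts <= last_batch_clock


local_clock = 37.0         # LP's current simulation time (illustrative)
last_batch_clock = 32.0    # time of the simulator's last request for messages
late = [34.0, 30.0, 36.5]  # timestamps of messages arriving now

tw = [ts for ts in late if time_warp_rolls_back(ts, local_clock)]
batched = [ts for ts in late if batched_rolls_back(ts, last_batch_clock)]
```

All three messages would trigger a Time Warp rollback, but only the one stamped 30.0 falls inside an already delivered batch; the other two are simply picked up by the next request.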

In Time Warp, the input and output queues are part of the simulator. A modified LP architecture is proposed here with the aim of avoiding wasted rollbacks (Figure 1). It has three functional units: a message controller, a simulator, and a state-saving mechanism. The simulator and the message controller share a client-server relationship. The message controller incorporates the input and output queues and acts as a message server to the simulator, which performs the actual simulation of the PP. At appropriate intervals, the simulator requests messages from the message controller, which responds by serving messages from the input queue of the LP. When the simulator outputs a message, it is stored in the appropriate output queue and transmitted by the message controller. The state-saving mechanism is used to enable the LP to roll back.

Each message controller has a (queue) clock associated with it that stores the simulation time at which the simulator last requested messages. When the message controller receives a request for messages, it sends all messages with timestamps greater than its own queue clock value (initially set to zero) and less than or equal to the current simulation time of the LP. The message controller's queue clock is then advanced to the value of the simulation clock. Rollbacks occur when an incoming message from another LP has a timestamp less than or equal to its queue clock value, that is, only if the message would have been processed with a previous batch had it arrived earlier. A sequential list of all previous clock values is maintained to determine the point of rollback, which is the first value in the list that is greater than or equal to the timestamp of the message that caused the rollback. The procedure is summarized below.

Wait until request by simulator (P1)
    send all messages messg in input queue that satisfy tqueueclk[i] < messg.t ≤ sim.t;
        where tqueueclk[i] = last entry in queue clock list tqueueclk,
        messg.t = timestamp of message, sim.t = clock time of simulator
    tqueueclk[i + 1] = sim.t
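Procedure P1 can be sketched as a small server object, assuming an in-memory input queue sorted by timestamp (the class and method names are hypothetical). The sketch keeps served messages in the queue, on the assumption that they must be re-served if the simulator later rolls back; the procedure itself does not say how delivered messages are stored.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Msg:
    ts: float
    payload: str


class MessageController:
    """Sketch of procedure P1: batch delivery of input messages on request."""

    def __init__(self) -> None:
        self.input_queue: list[Msg] = []   # kept sorted by timestamp
        self.queue_clock = [0.0]           # tqueueclk, one entry per request

    def enqueue(self, msg: Msg) -> None:
        keys = [m.ts for m in self.input_queue]
        self.input_queue.insert(bisect.bisect_right(keys, msg.ts), msg)

    def serve_request(self, sim_t: float) -> list[Msg]:
        """Return every message with tqueueclk[i] < ts <= sim_t, then append sim_t."""
        last = self.queue_clock[-1]
        batch = [m for m in self.input_queue if last < m.ts <= sim_t]
        self.queue_clock.append(sim_t)     # tqueueclk[i + 1] = sim.t
        return batch


mc = MessageController()
for ts in (3.0, 8.0, 12.0):
    mc.enqueue(Msg(ts, f"order@{ts}"))
first = mc.serve_request(10.0)    # delivers the messages stamped 3.0 and 8.0
second = mc.serve_request(20.0)   # delivers the message stamped 12.0
```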

Figure 1 Modified LP architecture. [Diagram components: message controller (input queue, output queue), simulator, and state-saving mechanism; labelled flows: request input message, message from input queue, message to output queue, save state, load state, request rollback.]

An alternative technique to antimessages is also proposed, using rollback counters; an LP's rollback count is the number of times it has rolled back since the start. The count is added to every outgoing message. When an LP rolls back, its count is incremented and a control message with the current value is sent on all its output links. The rollback counts and the control messages are used by the message controller to detect invalid messages. Two types of message can be received from another LP: a normal message and a rollback control message. For a normal message, its timestamp is used to determine the position in the time-sequenced input queue where it is to be inserted. The queue is checked for any messages from the same LP with a smaller timestamp and a higher rollback count than that of the message received; this would indicate that the sending LP has since performed a rollback, and the new (invalid) message is not inserted.

Otherwise, the next step is to check whether the simulator needs to roll back. If the timestamp of the message is less than or equal to the current input queue clock value, the message should have been received earlier to satisfy a previous request for messages, and a rollback is initiated. The point of rollback is determined by finding the earliest value in the queue clock list that is greater than or equal to the timestamp of the message.
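The handling of a normal message can be sketched as follows, assuming an in-memory list for the input queue (the message type and function name are hypothetical). The rollback point follows the rule stated for the queue clock list, the first value greater than or equal to the message's timestamp; the rollback itself is not performed here.

```python
import bisect
from dataclasses import dataclass


@dataclass
class QMsg:
    ts: float            # timestamp
    sender: str          # sending LP
    rbc: int             # sender's rollback count when the message was sent
    control: bool = False


def handle_normal(queue, m, queue_clock):
    """Sketch of the normal-message step for message m.

    queue: input queue as a list of QMsg sorted by timestamp.
    queue_clock: list of past request times (tqueueclk), ascending.
    Returns (inserted, rollback_time); rollback_time is None if no rollback.
    """
    i = bisect.bisect_right([q.ts for q in queue], m.ts)
    # An earlier message from the same sender with a higher rollback count
    # shows the sender has rolled back since sending m, so m is invalid.
    if any(q.sender == m.sender and q.rbc > m.rbc for q in queue[:i]):
        return False, None
    queue.insert(i, m)
    if m.ts <= queue_clock[-1]:
        # m belonged to a batch already served: roll back to the first
        # queue clock value that is >= m.ts.
        return True, queue_clock[bisect.bisect_left(queue_clock, m.ts)]
    return True, None


queue = [QMsg(5.0, "S1", 0), QMsg(7.0, "S1", 1, control=True)]
clk = [0.0, 6.0, 10.0]
stale = handle_normal(queue, QMsg(9.0, "S1", 0), clk)    # S1 has rolled back
late = handle_normal(queue, QMsg(4.0, "S2", 0), clk)     # falls in batch at 6.0
future = handle_normal(queue, QMsg(12.0, "S2", 0), clk)  # waits for next batch
```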

If the incoming message from another LP is, instead, a rollback control message, it is inserted at the position determined, as before, by timestamp values. All messages from the same LP lying ahead of the inserted control message (ie, with greater than or equal timestamps) are checked for any with a lower rollback count than its own, which would indicate that the message was sent before that LP rolled back; every such (normal or control) message is deleted. Finally, as before, if the timestamp of the control message is less than or equal to the input queue clock value, a rollback of the simulator is initiated and the point of rollback is determined in the same way. The procedures are summarized below.
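Control-message handling can be sketched in the same hypothetical setting; the message type is redefined so the block stands alone. Inserting before any equal timestamps is an implementation choice that makes "lying ahead" cover every message with a timestamp greater than or equal to that of the control message.

```python
import bisect
from dataclasses import dataclass


@dataclass
class QMsg:
    ts: float
    sender: str
    rbc: int             # sender's rollback count when the message was sent
    control: bool = False


def handle_control(queue, cm, queue_clock):
    """Sketch of the control-message step for control message cm.

    Returns (number of messages deleted, rollback_time or None).
    """
    # Insert before any equal timestamps so that every message with
    # ts >= cm.ts lies ahead of the control message.
    i = bisect.bisect_left([q.ts for q in queue], cm.ts)
    queue.insert(i, cm)
    ahead = queue[i + 1:]
    # Messages sent by the same LP before it rolled back carry a lower count.
    kept = [q for q in ahead if not (q.sender == cm.sender and q.rbc < cm.rbc)]
    queue[i + 1:] = kept
    deleted = len(ahead) - len(kept)
    if cm.ts <= queue_clock[-1]:
        return deleted, queue_clock[bisect.bisect_left(queue_clock, cm.ts)]
    return deleted, None


queue = [QMsg(3.0, "S1", 0), QMsg(6.0, "S1", 0), QMsg(8.0, "S2", 0), QMsg(9.0, "S1", 0)]
clk = [0.0, 4.0, 7.0]
deleted, rollback_to = handle_control(queue, QMsg(5.0, "S1", 1, control=True), clk)
```

One control message removes both stale messages from S1, whereas Time Warp would need one antimessage for each of them.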

For every normal message m received from an LP l (P2)
{
    determine i such that i = position in the input queue Q at which the message is to be inserted
    if for any j = i − 1, ..., 1 message Q[j] is from l and Q[j].rbc > m.rbc
        delete m;   where rbc = rollback count
    else
        insert m at position i in Q
        if m.ts ≤ clk, issue rollback;   where ts = timestamp and clk = input queue clock value
    endif
}

For every control message cm received from LP l (P3)
{
    determine i such that i = position in the input queue Q at which the message is to be inserted
    insert cm at position i in Q
    for all messages Q[j] such that j > i and Q[j] is from l
        if Q[j].rbc < cm.rbc, delete Q[j];   rbc = rollback count
    if cm.ts ≤ clk, issue a rollback;   where ts = timestamp and clk = input queue clock value
}

When rollback required as a result of (normal or control) message m (P4)
{
    find the last value of i such that tqueueclk[i − 1] < m.ts;   where tqueueclk is the queue clock list
    rollback to t = tqueueclk[i]
    rbc++;   increment rollback count rbc of LP by one
    broadcast rollback control message (t, rbc) on all output links;   t = timestamp of control message
}
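P4 can be sketched as below, assuming states are checkpointed at each queue clock value (the class and field names are hypothetical). Truncating the queue clock list so that the resumed simulator re-requests its batch is an implementation assumption, not something the procedure specifies. Whatever the number of messages invalidated, only one control message is produced per output link.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class RollbackLP:
    """Hypothetical LP state needed by P4."""
    queue_clock: list = field(default_factory=lambda: [0.0])   # tqueueclk
    saved_states: dict = field(default_factory=dict)  # queue clock value -> state
    output_links: list = field(default_factory=list)
    rbc: int = 0                                      # rollback count

    def rollback(self, msg_ts):
        """Sketch of P4: restore the state saved at the earliest queue clock
        value >= msg_ts and build one control message per output link."""
        if not self.queue_clock[0] < msg_ts <= self.queue_clock[-1]:
            raise ValueError("timestamp outside the range of served batches")
        # bisect_left gives the last i with tqueueclk[i - 1] < msg_ts.
        i = bisect.bisect_left(self.queue_clock, msg_ts)
        t = self.queue_clock[i]
        state = self.saved_states[t]
        # Forget later requests and checkpoints; the simulator resumes at t and
        # is assumed to re-request its batch, which appends t again.
        del self.queue_clock[i:]
        self.saved_states = {k: v for k, v in self.saved_states.items() if k <= t}
        self.rbc += 1
        control = {link: (t, self.rbc) for link in self.output_links}
        return state, control


lp = RollbackLP(queue_clock=[0.0, 4.0, 7.0, 10.0],
                saved_states={0.0: "s0", 4.0: "s1", 7.0: "s2", 10.0: "s3"},
                output_links=["supplier", "warehouse"])
state, control = lp.rollback(5.0)
```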

Proof of correctness

Consider a true representation PS of a physical system composed of $N$ processes that communicate exclusively through message passing. PS is represented by a directed graph consisting of $N$ vertices $\{P_1, P_2, \ldots, P_N\}$, each of which represents a process in PS. A PDES model MS is composed such that, for every process $P_i$, there exists an LP, $LP_i$, in MS that represents the behaviour of that process. If a message-passing link exists between $P_i$ and $P_j$, a corresponding link exists between $LP_i$ and $LP_j$, and message delivery is guaranteed. PS could be viewed as the equivalent model based on a global synchronization clock.

For a process $P_k$ ($k = 1, 2, \ldots, N$) in PS, define $t_0^k = 0$ and $t_{n(k)}^k = Z$, the start and end of the simulation period. Further, $T^k = \{t_0^k, t_1^k, t_2^k, \ldots, t_{n(k)}^k\}$ represents the sequence of times at which $P_k$ processes incoming messages; the sequence is monotonically increasing (this is the assumption of batch processing of messages, and the times are not necessarily the same for all LPs). Let $I_j^k$, termed the input set, contain all messages that arrive at process $P_k$ during the time interval $[t_{j-1}^k, t_j^k]$, for $j = 1, \ldots, n(k)$, and $I_0^k = \{\}$. Similarly, define $O_j^k$, termed the output set, to contain all messages that $P_k$ generates in the time interval $[t_{j-1}^k, t_j^k]$, with $O_0^k = \{\}$. Let $I^k(j)$ and $O^k(j)$ denote the message input and output histories of process $P_k$, that is, the sequences of messages received and transmitted by it until time $t_j^k$:

$$I^k(j) = I_0^k + I_1^k + \cdots + I_j^k, \qquad O^k(j) = O_0^k + O_1^k + \cdots + O_j^k$$

A function $F^k$ exists for the process such that $O^k(j) = F^k(I^k(j-1))$. The state of process $P_k$ at time $t_j^k$ is represented by $S_j^k$, which depends on the input received by it; hence, there exists a state transition function $G^k$ such that

$$S_j^k = G^k(S_{j-1}^k, I_{j-1}^k)$$

Define the similar terms in MS by using lower-case letters; that is, for the corresponding LP, $LP_k$, define the input set $i_j^k$, output set $o_j^k$, input history $i^k(j)$, output history $o^k(j)$, output function $f^k$, state $s_j^k$, and state transition function $g^k$. As MS is modelled on an optimistic algorithm, its input and output histories may be incomplete and invalid messages may exist.

Define the input history $i^k(j)$ of $LP_k$ to be correct, that is, $i^k(j) \equiv I^k(j)$, if:

1. All valid messages with timestamp $t$, $t_0^k < t \le t_j^k$, are present in the input message stream for $LP_k$, but not necessarily all delivered to it yet. (CI1)

2. A control message from another LP with a lower timestamp value and a higher rollback count than any normal message from it does not exist anywhere in the input message stream of $LP_k$. (CI2)

3. If an invalid message $m$ from another LP exists in $i^k(j)$ with timestamp $t_m$ and rollback count $r_m$, then a control message $c$ from that LP (as yet undelivered) also exists in the input message stream of $LP_k$, with timestamp $t_c$ and rollback count $r_c$ such that $t_c \le t_m$ and $r_c > r_m$. (CI3)

Condition CI1 ensures that $i^k(j)$ will become complete (message delivery is guaranteed). Conditions CI2 and CI3 ensure that current invalid messages in $i^k(j)$ will be deleted through the application of procedure P3.

Define the output history $o^k(j)$ of $LP_k$ to be correct, that is, $o^k(j) \equiv O^k(j)$, if:

1. All valid messages with timestamp $t$, $t_0^k < t \le t_j^k$, are present in the history. (CO1)

2. A control message with a lower timestamp value and a higher rollback count than any normal message does not exist. (CO2)

3. If an invalid message $m$ to another LP exists in $o^k(j)$ with timestamp $t_m$ and rollback count $r_m$, then a control message $c$ to it also exists in $o^k(j)$ with timestamp $t_c$ and rollback count $r_c$ such that $t_c \le t_m$ and $r_c > r_m$. (CO3)

The three conditions are necessary to ensure that the messages the LP sends out to another LP do not violate conditions CI1, CI2, and CI3, respectively, for the correct input history of that LP.

A state $s_j^k$ of $LP_k$ is defined recursively as valid ($s_j^k \equiv S_j^k$) as follows:

* $s_0^k$ is valid (ie, the initial state of an LP is valid);
* $s_j^k$ is valid if $s_m^k = g^k(s_{m-1}^k, i_{m-1}^k)$, $m = 1, \ldots, j$, where $i_{m-1}^k$ is complete and contains no invalid messages and $s_{m-1}^k$ is a valid state.

This implies that the LP has no outstanding or invalid messages with timestamp less than or equal to $t_{j-1}^k$. Hence, it cannot roll back to a state earlier than $s_j^k$ and will produce a valid output $o^k(j) \equiv O^k(j)$. If, however, $i_j^k$ is incomplete or contains invalid messages, the LP will next transform to a state that is invalid.

The proof is structured as follows. It is first proved that if an LP is at some valid state $s_j^k$ then, given correct input history $i^k(j)$, it will transform to the next valid state $s_{j+1}^k$. This property is then used to prove that any LP, given correct input history $i^k(j)$, will generate correct output history $o^k(j)$. Finally, using these results, the correctness of MS is proved.

Theorem 1 Assuming correct input history $i^k(j)$, if an LP is at some valid state $s_j^k$, then at some time in the simulation the LP can be guaranteed to be at the next valid state $s_{j+1}^k$.

Proof If $i_j^k$ is complete and contains no invalid messages, the LP will process $i_j^k$ and transform to the next valid state $s_{j+1}^k$ by the definition of valid states (note: if $i_j^k$ contains any message with timestamp greater than $t_j^k$, it will not be processed (from P1)). However, if $i_j^k$ is incomplete and/or incorrect, the LP will transform to some invalid state.

Let us first consider the case of $i_j^k$ being incomplete, that is, one or more messages in $I_j^k$ (the true process input set) are not included in $i_j^k$. Since, by the definition of correct history, all valid messages are in the input stream, at some point in the future $i_j^k$ will be complete (message delivery is guaranteed). From the definition of valid state, the input history up to time $t_{j-1}^k$ is complete; hence, the last message to complete $i_j^k$ will cause the LP to roll back to time $t_j^k$ (from P2 and P4) and state $s_j^k$. The LP now processes $i_j^k$ and transforms to state $s_{j+1}^k$, since all messages with timestamps in the interval $[t_{j-1}^k, t_j^k]$ are now in $i_j^k$. (NB: before the rollback, the LP will have been at some simulation time $t$, $t_j^k \le t \le Z$.)

Next, consider the case where $i_j^k$ contains invalid messages. From the definition of correct input history, a control message with a timestamp less than or equal to that of an invalid message and a higher rollback count must exist in the input stream; when it arrives, it will delete the invalid message and make the LP roll back to time $t_j^k$ and state $s_j^k$ (from P3 and P4). Note that, from the definition of valid states, the LP cannot roll back to a state earlier than $s_j^k$. Combining the two cases, the last (normal or control) message to complete $i_j^k$ and make it void of invalid messages will cause the LP to roll back to $s_j^k$, from which it processes $i_j^k$ and transforms to state $s_{j+1}^k$.

Theorem 2 If an LP receives correct input during the course of a simulation, it can be guaranteed to generate correct output.

Proof For the output generated by an LP to be correct, it must not violate conditions CO1, CO2, and CO3.

Condition CO1 From Theorem 1, given correct input the LP will go through all valid states and, as valid states generate valid output, Condition CO1 is satisfied.

Condition CO2 There are two ways an LP can send a control message with a timestamp less than that of a valid message: (a) the LP sends the control message first and then sends the normal message; (b) the LP sends the normal message first and, at some later point in the simulation, rolls back to a time less than its timestamp and transmits the control message.

Consider the first case. In the proposed algorithm, the rollback count of an LP is initialised to zero, and the only operation performed on it thereafter is to increment its value every time the LP rolls back. Hence, the rollback count cannot decrease during the course of the simulation and, if a control message is sent before the normal message, it cannot have a greater rollback count. For the second case, let us assume that the LP is at a valid state $s_j^k$ at time $t_j^k$. As $s_j^k$ is valid, all inputs in the time interval $0 < t \le t_{j-1}^k$ have arrived. Hence, the LP cannot roll back to a time earlier than $t_j^k$, and a control message with timestamp less than $t_j^k$ cannot be sent. Taking the two cases together, CO2 is satisfied; that is, a control message with a lower timestamp value and a higher rollback count than any normal message does not exist in the output history.

Condition CO3 Let us assume that the LP outputs an invalid message with timestamp $t_m$ and rollback count $r$ from an invalid state. Then some $i_j^k$, where $t_j^k < t_m$, must exist that is not complete. As correct input history is assumed (and message delivery is guaranteed), at some point $i_j^k$ will be complete and the LP will roll back to $t_j^k$. This will cause the rollback count to be incremented and a control message with timestamp $t_j^k$ and rollback count $r + 1$ to be transmitted (from P4), thus satisfying CO3.

Since all three conditions are satisfied, we can conclude that, given correct input history, an LP is guaranteed to generate valid output. □

Theorems 1 and 2 show that, given correct input, any LP in MS will produce (in time) correct output that, in turn, forms the (correct) input to all the other LPs with which it has an output link. Next we consider the behaviour of MS as a whole. Let us define a simulation run in the interval $[0, Z]$ to be described by a set $R = \{[LP_1], [LP_2], \ldots, [LP_N]\}$, where $[LP_k]$ denotes the output history of $LP_k$ for the entire simulation run, that is, $o^k(n(k))$ at time $t_{n(k)}^k = Z$. Since messages are processed at discrete intervals, states of LPs are defined at these intervals (state transition points) only, that is, at $T^k = \{t_0^k, t_1^k, t_2^k, \ldots, t_{n(k)}^k\}$ for $LP_k$, $k = 1, \ldots, N$. If such message processing times are the same for all the LPs, PS and any correct implementation of MS will both go through $M + 1 = n + 1$ valid states at these times. If they are not all the same, the state transition points will be defined by $T = \{t_0, t_1, \ldots, t_M\} = T^1 \cup T^2 \cup \cdots \cup T^N$. Let $R_e$ be the output history at the $e$th state transition point, at time $t_e$.
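For illustration, the merged set of state transition points can be computed directly from per-LP batch times. The times below are invented (for example, two LPs with different shift patterns), and the function name is hypothetical.

```python
def state_transition_points(per_lp_times):
    """T = T^1 ∪ T^2 ∪ ... ∪ T^N in increasing order."""
    return sorted(set().union(*per_lp_times))


# Illustrative batch times (hours) for two LPs with different schedules.
T1 = [0, 8, 16, 24]
T2 = [0, 12, 24]
T = state_transition_points([T1, T2])
M = len(T) - 1
```

With these schedules $T = \{0, 8, 12, 16, 24\}$ and $M = 4$; if both LPs used `T1`, the merged set would equal `T1` and $M + 1 = n + 1 = 4$.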

Theorem 3 Every simulation run will produce correct output histories, $R_e$, $0 \le e \le M$, at the state transition points $\{t_0, t_1, \ldots, t_M\}$ in $[0, Z]$.

Proof The proof is by induction on $e$. For $e = 0$, $R_e = \{\}$ and, hence, the theorem holds trivially. Let us assume that for some $e > 0$, $R_{e-1}$ is correct and, hence, the output histories of all LPs at time $t_{e-1}$ are correct. Consider $LP_k$, an LP in MS. Since the input history of $LP_k$ is made up of the output histories of all LPs with which it has an input link, it will be correct at time $t_{e-1}$ and, hence, $LP_k$ will be in a valid state $s_{j-1}^k$ at time $t_{j-1}^k$, where $t_{e-2} < t_{j-1}^k \le t_{e-1}$. Then, from Theorems 1 and 2, it will reach the valid state $s_j^k$ with a correct output history $o^k(j)$ at time $t_j^k$, where $t_{e-1} < t_j^k \le t_{e+1}$. Continuing the argument for all LPs, we can conclude that the simulation run will reach the next valid state with a correct output history $R_e$. By induction, the simulation will produce a correct output history $R$ at time $Z$.

Performance of the algorithm

The performance of a PDES algorithm in any application depends on a number of factors, for example, the extent of parallelism that is present and the 'thrashing' behaviour that may follow. Formulating general rules on the relative performance of different algorithms is hard, even for the debate on conservative versus optimistic protocols, and any objective assessment of a particular technique has to deal with substantive issues.8,21 The objective of the paper has been to present an alternative to the standard Time Warp protocol that addresses some weaknesses of the latter, with benefits that could be significant, particularly in applications with the potential for high levels of rollback; a true assessment, however, will require further research.

The algorithm has the potential to reduce the number of rollbacks significantly. However, it uses an interrogative approach of batch delivery of messages when an LP requests them, rather than an imperative scheme of delivery and processing based simply on timestamp. The message controller polls for messages ('wait until...'), which requires additional computation, but this is expected to be small compared with the savings in expensive operations for the much larger number of rollbacks that will often result from a standard Time Warp implementation. Bagrodia and Liao22 drew the same conclusion in their investigation of wasted rollbacks in the context of priority servers, and a similar interrogative approach was implemented in Maisie, a distributed simulation language.

The algorithm requires an LP to send only one message per output link to delete all erroneous messages, instead of one per message affected by the rollback, and thus has the potential to greatly reduce message overhead. Cascaded rollbacks may still occur, but again at each stage only one (control) message needs to be sent on each output link. The maximum number of messages sent by the LP network to undo the effects of a rollback is equal to the length of the longest path in the graph that visits no vertex more than once, where the vertices represent the LPs and the arcs the message-passing links.
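The bound stated above depends only on the LP graph. A brute-force sketch of the quantity it names, the longest path that visits no vertex twice, is shown below; the supply chain graph is an illustrative assumption with customer/supplier loops, and the exhaustive search is exponential in the worst case, so it suits only small networks.

```python
def longest_simple_path(graph):
    """Number of arcs on the longest path that repeats no vertex.

    graph: dict mapping each LP to the LPs on its output links.
    """
    best = 0

    def dfs(node, visited, length):
        nonlocal best
        best = max(best, length)
        for nxt in graph.get(node, ()):
            if nxt not in visited:
                visited.add(nxt)
                dfs(nxt, visited, length + 1)
                visited.remove(nxt)

    for start in graph:
        dfs(start, {start}, 0)
    return best


supply_chain = {
    "retailer": ["distributor"],
    "distributor": ["factory", "retailer"],   # orders upstream, goods downstream
    "factory": ["supplier", "distributor"],
    "supplier": ["factory"],
}
bound = longest_simple_path(supply_chain)
```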

The algorithm also provides the system developer with a well-defined and efficient mechanism to implement state-saving operations that are transparent to the modeller. Saving the state of the simulation at every clock update is computationally expensive, so in practice Time Warp implementations usually employ periodic saving, or checkpointing. Infrequent state saving, however, can mean excessive rollback distances and inefficient execution, while too short an interval would undo the benefits of checkpointing. The checkpoint frequency is thus an important parameter that determines performance, and it is difficult to select.18 In the modified algorithm, the simulator always rolls back to a point in the queue clock list, so it is sufficient to perform state-saving operations only when this list is updated.
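The state-saving rule can be sketched as follows, assuming that each LP keeps a sorted list of the batch times it has processed (a stand-in for the queue clock list) and takes one snapshot just before each batch. A straggler then belongs to the first batch at or after its timestamp, so the LP restores the snapshot taken before that batch. The class `CheckpointedLP`, its methods and the inventory example are hypothetical illustrations, not the paper's implementation.

```python
import bisect
import copy


class CheckpointedLP:
    """Hypothetical LP that saves state only when its queue clock list is
    updated, i.e. once per processed batch."""

    def __init__(self, initial_state):
        self.state = initial_state
        self.batch_points = []  # stand-in for the queue clock list (sorted)
        self.snapshots = []     # snapshots[i]: state just before batch i

    def process_batch(self, batch_time, messages, handler):
        # The single state-saving point: the queue clock list is updated.
        # Batch times are assumed to arrive in nondecreasing order.
        self.batch_points.append(batch_time)
        self.snapshots.append(copy.deepcopy(self.state))
        for msg in messages:
            self.state = handler(self.state, msg)

    def rollback(self, straggler_time):
        # The straggler belongs to the first batch at or after its timestamp.
        # Restore the snapshot saved before that batch, discard later batch
        # points, and return the batch times that must be re-executed.
        i = bisect.bisect_left(self.batch_points, straggler_time)
        if i == len(self.batch_points):
            return []  # later than every processed batch: no rollback needed
        self.state = self.snapshots[i]
        undone = self.batch_points[i:]
        del self.batch_points[i:]
        del self.snapshots[i:]
        return undone


def apply_stock_change(stock, qty):
    return stock + qty


lp = CheckpointedLP(initial_state=100)
lp.process_batch(8.0, [-10, -5], apply_stock_change)  # stock 85
lp.process_batch(16.0, [40], apply_stock_change)      # stock 125
lp.process_batch(24.0, [-30], apply_stock_change)     # stock 95
print(lp.rollback(12.0))  # [16.0, 24.0]
print(lp.state)           # 85
```

The number of snapshots equals the number of batches, so the checkpoint interval is fixed by the batching interval rather than chosen separately.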

Conclusion

A manufacturing supply chain is typically a loosely coupled system, which makes it particularly suitable for parallel processing.23 Using a conservative algorithm for such a system is expected to lead to significant blocking, but there are also concerns about performance when optimistic schemes are applied to large models. However, rationalizing the model by batch processing messages at discrete intervals allows rollback points to be clearly defined and allows the standard version of the Time Warp algorithm to be modified to address three key issues: reducing rollbacks, controlling the extent of message passing required to undo the effects of invalid operations, and making the state-saving mechanism more efficient. The proposed algorithm does this without, in the terminology of Reynolds,24 affecting the 'aggressiveness' and 'risk' inherent in Time Warp.

Most physical systems exhibit a delay in processing messages, so the requirement to batch them is not in itself very restrictive. However, the efficiency of the algorithm, and hence its benefit, will depend on the batch frequency. The appropriate interval clearly depends on the system environment and will be a trade-off between computing performance and model integrity, but processing messages only a few times per simulated day would typically be considered sufficient for manufacturing supply chains. The batch times could be parameters defined by the modeller and, since they need not be the same for each LP, set at the local level, which is particularly useful for modelling supply chains that operate in different time zones. This is the only decision related to the algorithm that the modeller needs to take. The transparency of the algorithm is an important feature, since the level of sophistication required of the modeller to exploit the technology successfully has often been cited as a reason for the limited use of PDES by the general simulation community.21,25
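As an illustration of setting batch times at the local level, the sketch below maps each LP's local batch times onto a shared global simulation clock. It assumes that the global clock runs in hours on UTC and that each LP is described by its local times of day and a fixed UTC offset; `BatchSchedule`, `batch_points` and the two plants are hypothetical, and daylight-saving changes are ignored.

```python
from dataclasses import dataclass


@dataclass
class BatchSchedule:
    """Hypothetical per-LP batching parameters set by the modeller."""
    local_times: list[float]  # local hours of day at which batches run
    utc_offset: float         # hours the LP's local clock is ahead of UTC

    def batch_points(self, horizon_days):
        # Map local batch times onto the global (UTC) simulation clock in
        # hours, keeping only points inside [0, horizon_days * 24).
        horizon = horizon_days * 24
        points = []
        for day in range(-1, horizon_days + 1):  # extra days cover offsets
            for t in self.local_times:
                g = day * 24 + t - self.utc_offset
                if 0 <= g < horizon:
                    points.append(g)
        return sorted(points)


uk_plant = BatchSchedule(local_times=[8.0, 16.0], utc_offset=0.0)
shanghai_plant = BatchSchedule(local_times=[9.0, 17.0], utc_offset=8.0)
print(uk_plant.batch_points(2))        # [8.0, 16.0, 32.0, 40.0]
print(shanghai_plant.batch_points(2))  # [1.0, 9.0, 25.0, 33.0]
```

Each LP's list of global batch points would then serve as the times at which it requests messages from the controller.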

Batch processing, however, does not allow urgent orders to be handled immediately. Since even such orders incur a processing delay, any loss of model integrity may not be significant provided the batching interval is not too long, but a useful enhancement would be a mechanism for the message controller to interrupt the LP to handle such orders. The interrogative approach used in the algorithm should help to facilitate this.

References

1 Peng C and Chen FF (1996). Parallel discrete event simulation of manufacturing systems: a technology survey. Comput Ind Eng 31: 327-330.

2 Hon KKB and Ismail HS (1991). Application of transputers for simulation of manufacturing systems: a preliminary study. Proc Inst Mech Eng-Part B, J Eng Manuf 205: 19-23.

3 Fujii S, Tsunoda H, Ogita A and Kidani Y (1994). Distributed simulation model for Computer Integrated Manufacturing. In: Tew JD and Manivannan S (eds). Proceedings of the 1994 Winter Simulation Conference. IEEE: USA, pp 946-953.

4 Ferscha A and Richter M (1997). Java based co-operative distributed simulation. In: Andradóttir S, Healy KJ, Withers DH and Nelson BL (eds). Proceedings of the 1997 Winter Simulation Conference. IEEE: USA, pp 381-388.

5 Pidd M and Cassel RA (2000). Using Java to develop discrete event simulations. J Opl Res Soc 51: 405-412.

6 Lutz R (1998). High Level Architecture object model development and supporting tools. Simulation 71: 401-409.

7 Gan B-P and Turner SJ (2000). An asynchronous protocol for virtual factory simulation on shared memory multi-processor systems. J Opl Res Soc 51: 413-422.

8 Ferscha A (1995). Parallel and distributed simulation of discrete event systems. In: Zomaya AYH (ed). Parallel and Distributed Computing Handbook. McGraw-Hill: New York, pp 1003-1041.

9 Bryant RE (1977). Simulation of Packet Communications Architecture Computer Systems, MIT-LCS-TR-188, Massachusetts Institute of Technology.

10 Chandy KM and Misra J (1979). Distributed simulation: a case study in design and verification of distributed programs. IEEE Trans Software Eng SE-5: 440-452.

11 Seethalaksmi M (1978). Performance Analysis of Distributed Simulation, MS Thesis, University of Texas, Austin.

12 Chandy KM and Misra J (1981). Asynchronous distributed simulation via a sequence of parallel computations. Commun ACM 24: 198-205.

13 Fujimoto RM (1990). Parallel discrete event simulation. Commun ACM 33: 30-53.

14 Lubachevsky BD (1989). Efficient distributed event-driven simulations of multiple-loop networks. Commun ACM 32: 111-123.

15 Jefferson DR (1985). Virtual time. ACM Trans Program Lang Sys 7: 404-425.

16 Das SR and Fujimoto RM (1997). An empirical evaluation of performance trade-offs in Time Warp. IEEE Trans Parallel Distrib Sys 8: 210-224.

17 Lubachevsky BD, Shwartz A and Weiss A (1991). An analysis of rollback-based simulation. ACM Trans Model Comp Simul 1: 154-193.

18 Fleischmann J and Wilsey PA (1995). Comparative analysis of periodic state saving techniques in Time Warp simulators. In: Corporate IEEE (ed). Proceedings of the 9th Workshop on Parallel and Distributed Simulation. IEEE: USA, pp 50-58.

19 Turner SJ and Xu MQ (1992). Performance evaluation of the bounded Time Warp algorithm. In: Abrams MA (ed). Proceedings of the 6th Workshop on Parallel and Distributed Simulation. Society for Computer Simulation: USA, pp 117-126.


20 Dickens PM and Reynolds Jr PF (1990). SRADS with local rollback. In: Nicol D and Fujimoto R (eds). Proceedings of the SCS Multiconference on Distributed Simulation. Society for Computer Simulation: USA, pp 161-164.

21 Das SR (2000). Adaptive protocols for parallel discrete event simulation. J Opl Res Soc 51: 385-394.

22 Bagrodia RL and Liao WT (1990). Maisie: a language and optimising environment for distributed simulation. In: Nicol D and Fujimoto R (eds). Proceedings of the SCS Multiconference on Distributed Simulation. Society for Computer Simulation: USA, pp 205-210.

23 Arunachalam R (2000). An agent based computational framework for supply chain simulation. PhD thesis, University of Warwick.

24 Reynolds Jr PF (1988). A spectrum of options for parallel simulation. In: Abrams MA (ed). Proceedings of the 1988 Winter Simulation Conference. IEEE: USA, pp 325-332.

25 Fujimoto RM (1993). Parallel discrete event simulation: will the field survive? ORSA J Comp 5: 213-230.

Received February 2003; accepted November 2003 after one revision
