SOLUTION OF LARGE DENSE TRANSPORTATION
PROBLEMS USING A PARALLEL PRIMAL ALGORITHM

by

Donald L. Miller, Joseph F. Pekny, and Gerald L. Thompson

June 1988

Graduate School of Industrial Administration
Carnegie Mellon University
Pittsburgh, Pennsylvania 15213
W.P. No. 87-88-42
Management Science Research Report No. 546
SOLUTION OF LARGE DENSE TRANSPORTATION
PROBLEMS USING A PARALLEL PRIMAL ALGORITHM

by

Donald L. Miller*, Joseph F. Pekny**, and Gerald L. Thompson***

June 1988

*   Central Research and Development Department,
    E. I. du Pont de Nemours and Company, Inc.
**  Department of Chemical Engineering, Carnegie Mellon University
*** Graduate School of Industrial Administration, Carnegie Mellon University

This report was prepared as part of the activities of the Management
Sciences Research Group, Carnegie Mellon University, under Contract No.
N00014-85-K-0139 NR 047-048 with the Office of Naval Research. Reproduction
in whole or in part is permitted for any purpose of the U.S. Government.

Management Sciences Research Group
Graduate School of Industrial Administration
Carnegie Mellon University
Pittsburgh, PA 15213
Abstract
We implemented a version of the primal transportation code on a 14
processor BBN Butterfly computer and solved a variety of large, fully dense,
randomly generated transportation and assignment problems ranging in size up
to m = n = 3000. We found that the search phase of the primal transportation
algorithm was well suited for implementation on the Butterfly, but the pivot
phase could not be parallelized. Computational experience is presented
showing that speedup factors for a parallel over a sequential computer
of approximately 70 percent of the theoretical maximum were obtained. With
the parallel code the empirical difficulty of solving an n x n transportation
problem was proportional to n^α, where α varied between 2.0 and 2.2.
1. INTRODUCTION
Among the most commonly solved and applied linear programs are
transportation problems and their close relations, assignment and network
problems. They have applications of their own, as well as serving as
relaxations of other problems such as the travelling salesman problem.
In the 1960's most researchers believed that dual (Kuhn [15]) and/or
primal-dual (Ford-Fulkerson [9]) methods provided the most efficient
algorithms for this class of problems. This belief was based on limited
computation performed on very small problems by hand or on first generation
computers.
In the early 1970's two groups of researchers, Srinivasan and Thompson
[21,22] and Glover, Karney, and Klingman [11], wrote codes using the primal
transportation algorithm, also called the stepping stone method or MODI
method, see Charnes and Cooper [4] and Dantzig [6]. They used some newly
invented data structures, as well as some provided in the computer science
literature, to greatly improve the efficiency of the resulting primal codes,
and concluded that primal transportation codes ran 100 to 200 times as fast as
ordinary linear programming simplex codes and about 50 times as fast as primal
dual methods on the same problems. These conclusions are summarized in
Charnes, Karney, Klingman, Stutz and Glover [5]. Later Bradley, Brown and
Graves [3] came to the same conclusion for network problems. Because of the
computer memory limitations of that time, fully dense transportation problems
were solved for sizes up to about m = n = 200. For larger dimensions, only
sparse transportation problems having relatively few arcs (approximately
10^4) were considered. Further improvements in data structures used by the
primal codes appeared in Barr, Glover and Klingman [1].
In the late 1970's and early 1980's there was a resurgence of interest in
dual codes. Auction and bidding dual codes were proposed by Bertsekas [2]
and Thompson [24]. Srinivasan and Thompson [23] proposed cost operator
algorithms. A new class of dual codes, using shortest augmenting paths, based
on the paper of Tomizawa [25] and improved by Dorhout [8], was developed.
Hung and Rom [13], Derigs [7], and Martello and Toth [16] provided
computational studies that compared their own shortest augmenting path codes
with those based on the other ideas. McGinnis [17] programmed both primal and
dual codes, made computational studies, and concluded that dual codes were
superior. See also Hatch [12] and Glover and Klingman [10].
All of the computational experience discussed so far was obtained using
sequential computers to solve small dense, or larger sparse problems. In the
last five years a number of hardware companies have built computers which have
a parallel architecture, that is, which consist of several small independent
computers (processors) that can be programmed to work simultaneously on
different parts of the same problem.
The purpose of the present paper is to discuss the implementation of a
primal transportation code on a specific parallel machine, a 14 processor BBN
Butterfly computer, and to give computational experience obtained from using
it to solve a variety of large, fully dense, randomly generated transportation
and assignment problems ranging in size up to m = n = 3000. Our conclusions
are:
(a) The primal transportation algorithm is well suited for implementation
    on the Butterfly computer because of the way the memory is distributed
    among the processors, which makes the search step efficient to
    parallelize.

(b) The pivot step of the primal code is not amenable to parallelization.

(c) The search for pivot candidates becomes the dominant activity of the
    primal algorithm as problem size increases.

(d) Speedup increases as the problem size increases.

(e) Parallel architecture permits the use of coding strategies that are
    difficult or not profitable to implement on sequential computers.

(f) The empirical difficulty of solving an n x n transportation or
    assignment problem using the parallel code is proportional to n^α,
    where α varies between 2.0 and 2.2.
2. THE PRIMAL ALGORITHM
The transportation problem is to ship at minimum cost a homogeneous good
from a set of m warehouses to a set of n markets. Let x_ij be the amount of
the good shipped from warehouse i to market j, let c_ij be the unit cost
of this shipment, let a_i be the availability (supply) of the good at
warehouse i, and let b_j be the amount of the good required (demanded) at
market j. Assume Σ_i a_i = Σ_j b_j so that there is a feasible shipping
pattern. The mathematical statement of the transportation problem then is:
    Minimize      Σ_{i=1}^{m} Σ_{j=1}^{n} c_ij x_ij                  (1)

    Subject to    Σ_{j=1}^{n} x_ij = a_i      for i = 1,...,m        (2)

                  Σ_{i=1}^{m} x_ij = b_j      for j = 1,...,n        (3)

                  x_ij ≥ 0                    for all i and j        (4)
Letting u_i and v_j be the dual variables associated with (2) and (3)
respectively, the dual problem can be stated as:

    Maximize      Σ_{i=1}^{m} u_i a_i + Σ_{j=1}^{n} v_j b_j          (5)

    Subject to    u_i + v_j ≤ c_ij            for all i and j        (6)
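The relationship between (1)-(4) and (5)-(6) can be checked numerically. The following sketch is not from the report; the data and function names are illustrative. It verifies a shipping plan against (2)-(4), a dual pair against (6), and confirms weak duality between objectives (1) and (5):

```python
# Check a shipping plan x against the primal constraints (2)-(4) and a
# dual pair (u, v) against the dual constraint (6), then compare the two
# objective values (weak duality).  All data are illustrative.

def primal_feasible(x, a, b):
    row_ok = all(sum(x[i]) == a[i] for i in range(len(a)))              # (2)
    col_ok = all(sum(x[i][j] for i in range(len(a))) == b[j]
                 for j in range(len(b)))                                 # (3)
    return row_ok and col_ok and all(t >= 0 for row in x for t in row)  # (4)

def dual_feasible(u, v, c):
    return all(u[i] + v[j] <= c[i][j]                                   # (6)
               for i in range(len(u)) for j in range(len(v)))

a = [30, 40]             # supplies
b = [20, 20, 30]         # demands (sum matches supplies)
c = [[8, 6, 10],
     [9, 12, 13]]        # unit costs

x = [[0, 20, 10],
     [20, 0, 20]]        # a feasible shipping plan
u, v = [0, 3], [6, 6, 10]    # a dual-feasible pair

cost = sum(c[i][j] * x[i][j] for i in range(2) for j in range(3))        # (1)
bound = sum(u[i] * a[i] for i in range(2)) + \
        sum(v[j] * b[j] for j in range(3))                               # (5)

assert primal_feasible(x, a, b) and dual_feasible(u, v, c)
assert bound <= cost     # weak duality
print(cost, bound)       # 660 660
```

Here the two objectives coincide, so this particular x is in fact optimal; in general the dual value only bounds the primal cost from below.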
The following facts are well known concerning these two problems:

(1) The problem has the natural integer property: that is, if a_i and
    b_j are positive integers for all i and j, then every feasible
    basic solution satisfying conditions (2), (3), and (4) gives x_ij
    equal to a nonnegative integer.

(2) In any basic solution, at most m+n-1 values of x_ij are positive
    and exactly m+n-1 variables x_ij are basic.

(3) Let V = {R_1,...,R_m, C_1,...,C_n} be the set of rows and columns of
    the problem, let x be a feasible solution and let B =
    {(i,j) | x_ij is basic}; then the graph G = (V,B) is a connected
    acyclic graph called the basis tree.

(4) There are well known procedures such as northwest corner, VAM,
    matrix minimum, modified row minimum, etc. (see [22]) for finding an
    initial basic primal feasible solution to (2), (3), and (4).

(5) Given a primal feasible basis tree, any node (row or column) can be
    denoted as the root node and the tree can be rehung so that the
    root node is at the top.

(6) The nodes of degree one in a basis tree are called pendant nodes.
    Starting from each pendant node and working upward to the root node
    it is possible to determine the shipping amounts x_ij. Similarly,
    starting at the root node and working downwards it is possible to
    determine the optimal dual variables u_i and v_j associated with
    nodes R_i and C_j, respectively.
(7) If k is any (row or column) node, not the root node r, of a basis
    tree, then there is a unique path in the tree from k to r. The
    first encountered node on this path is the predecessor p(k) of k,
    and the number of arcs on the path is the distance d(k) of k from r.
    Note that r has no predecessor and we define d(r) = 0.

(8) If x_ij is basic then (i,j) is an arc in the basis tree so that
    either (a) p(R_i) = C_j or (b) p(C_j) = R_i. In case (a) we let
    x(R_i) = x_ij be the shipping amount and in case (b) we let x(C_j) =
    x_ij.

(9) If h and k are distinct nodes of the basis tree and k is on
    the path between h and the root node r, then node h is a
    descendent of k. The basis tree is said to be stored in a recursive
    list if it has the following property: if node h is stored after
    node k on the list and h is not a descendent of k, then all
    descendents of k are stored on the list between k and h. The
    descendents of a node in a recursive list can be searched by calling
    a recursive subroutine.

(10) The dual variables are stored as node lists where u(R_i) = u_i and
     v(C_j) = v_j. (If k is any node then exactly one of u(k) or v(k)
     is defined depending on whether k is a row or column node.)
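Fact (4)'s northwest corner rule is easy to sketch. The version below is an illustrative Python rendering, not the report's code; it scans cells from the top-left, shipping as much as possible at each cell:

```python
# Northwest corner rule (fact 4): starting at the top-left cell, ship as
# much as possible, then move right when a column is exhausted or down
# when a row is exhausted.  Produces a basic feasible solution of (2)-(4).

def northwest_corner(a, b):
    a, b = a[:], b[:]                      # work on copies of supply/demand
    m, n = len(a), len(b)
    x = [[0] * n for _ in range(m)]
    basis = []                             # collects the basic cells
    i = j = 0
    while i < m and j < n:
        t = min(a[i], b[j])                # ship as much as possible
        x[i][j] = t
        basis.append((i, j))
        a[i] -= t
        b[j] -= t
        if a[i] == 0 and i < m - 1:        # row exhausted: move down
            i += 1
        else:                              # column exhausted: move right
            j += 1
    return x, basis

x, basis = northwest_corner([30, 40], [20, 20, 30])
print(x)           # [[20, 10, 0], [0, 10, 30]]
print(len(basis))  # 4, i.e. m + n - 1
```

For nondegenerate data the rule visits exactly m+n-1 cells, matching fact (2); ties between row and column exhaustion would need a degeneracy convention that this sketch glosses over.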
Figure 1 gives the flow diagram of the primal transportation code. It
was implemented by using the five node functions p(k), d(k), x(k), u(k),
v(k), and by storing the basis tree in a recursive list. The sequential
implementation of the primal code is discussed in more detail in [11], [14],
and [21].
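Facts (6)-(7) and the node functions can be illustrated concretely. The sketch below (illustrative, not the report's recursive-list data structure) roots the basis tree at R_1, computes predecessors p(k) and distances d(k) by a breadth-first walk, and propagates the duals down from u(root) = 0 using u_i + v_j = c_ij on basic cells:

```python
# Compute predecessors p(k), distances d(k), and duals u, v from a set of
# basic cells B (facts 6-7).  Nodes are ('R', i) and ('C', j); the root
# is R_1 and u[root] = 0.  Illustrative sketch only.
from collections import deque

def duals_from_basis(m, n, B, c):
    adj = {('R', i): [] for i in range(m)}
    adj.update({('C', j): [] for j in range(n)})
    for i, j in B:                              # basis tree edges
        adj[('R', i)].append(('C', j))
        adj[('C', j)].append(('R', i))
    root = ('R', 0)
    p, d = {root: None}, {root: 0}
    u, v = {0: 0}, {}                           # u = 0 at the root
    q = deque([root])
    while q:                                    # walk down from the root
        k = q.popleft()
        for h in adj[k]:
            if h not in d:
                p[h], d[h] = k, d[k] + 1
                if h[0] == 'C':                 # h = C_j hung under R_i
                    v[h[1]] = c[k[1]][h[1]] - u[k[1]]
                else:                           # h = R_i hung under C_j
                    u[h[1]] = c[h[1]][k[1]] - v[k[1]]
                q.append(h)
    return p, d, u, v

c = [[8, 6, 10], [9, 12, 13]]
B = [(0, 0), (0, 1), (1, 1), (1, 2)]            # m + n - 1 basic cells
p, d, u, v = duals_from_basis(2, 3, B, c)
assert all(u[i] + v[j] == c[i][j] for i, j in B)
print(u, v)   # {0: 0, 1: 6} {0: 8, 1: 6, 2: 7}
```

The assertion checks the defining property of the duals: every basic cell has zero reduced cost.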
A. GENERATE PROBLEM
   Given m and n, generate the initial data a_i, b_j, and c_ij for the
   problem at random within the stated ranges for each coefficient.

B. BUILD INITIAL TREE
   Use a starting method to find a feasible x_ij. Calculate the
   corresponding dual solution u and v and the other lists used to
   represent the initial tree.

C. SEARCH
   Search by row for a pair (i,j) such that (6) is violated.

D. PIVOT
   Add arc (i,j) to the basis tree, find the outgoing cell, and change
   the primal solution on the cycle.

E. UPDATE
   Rehang the tree, find the new duals, update all the tree lists.

F. OUTPUT
   Stop if no violation is found. Print the optimal solution.

Figure 1. Flow diagram for the sequential primal transportation algorithm.
3. THE PARALLEL PRIMAL ALGORITHM
The efficient implementation of any algorithm on a parallel computer
necessarily depends on the exact architecture of that computer as well as the
characteristics of the algorithm. We discuss the implementation of the primal
algorithm of the previous section on a BBN Butterfly Plus computer. The
Butterfly Plus is a tightly coupled, homogeneous, shared memory
multiprocessor consisting of several Motorola 68020/68881 processors, each
having 4 megabytes of its own local memory and accessing the memories of the
other processors through a packet switched network. Our code was implemented
on a 14 processor machine. Discussions of the implementation on the same
parallel computer of a shortest augmenting path algorithm for assignment and
travelling salesman problems can be found in Miller and Pekny [18] and Pekny
and Miller [19].
Because the Butterfly computer has its memory distributed over the 14
processors, it was necessary to distribute the m x n cost matrix C so that
each processor stored several contiguous rows of C in its local memory.
From this fact it is obvious from Figure 1 that steps A and C can readily be
done in parallel. The time to generate the problem data, step A, was not
counted in the computation time and so will not be discussed further. The
search subroutine, step C, was performed in parallel and was, in fact, the
only major subroutine that we were able to perform in parallel in this
version of the code.
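What each processor does in step C is a row scan for the most negative reduced cost c_ij - u_i - v_j; a strictly negative value marks a violation of (6) and hence a pivot candidate. A minimal sketch of the scan for one row (illustrative data, not the Butterfly code):

```python
# Step C for one row: find the entry with the most negative reduced cost
# c[i][j] - u[i] - v[j].  A negative value violates (6) and identifies a
# pivot candidate.  Illustrative sketch.

def search_row(i, c, u, v):
    best_j, best_rc = None, 0
    for j in range(len(v)):
        rc = c[i][j] - u[i] - v[j]         # reduced cost of cell (i, j)
        if rc < best_rc:                   # strictly negative and smallest
            best_j, best_rc = j, rc
    return best_j, best_rc                 # (None, 0) if no violation

c = [[8, 6, 10], [9, 12, 13]]
u, v = [0, 6], [8, 6, 7]
print(search_row(0, c, u, v))   # (None, 0): row 0 has no violation
print(search_row(1, c, u, v))   # (0, -5): cell (1, 0) is a pivot candidate
```

On the Butterfly each processor runs this scan over the contiguous block of rows held in its own local memory, which is what makes the search step parallelize so cleanly.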
Subroutine B, build initial tree, was carried out sequentially because
(a) it was called only once and (b) it requires only 5 to 10 percent of the
total computation time. Computational experiments with parallel starting
tree bases will be reported on elsewhere.
A. GENERATE PROBLEM

B. BUILD INITIAL TREE

C. SEARCH
   Each of the p processors searches the next row it owns for the most
   negative reduced cost; if one is found it is put in the pivot queue.

D. SYNCHRONIZE
   After finishing row searches, processors wait until all others finish.

E. Is the pivot queue list empty? (If no, go to F; if yes, go to H.)

F. PIVOT
   Each processor carries out each of the pivots on its own copy of the
   basis tree.

G. SYNCHRONIZE
   After finishing pivoting, processors wait until all others finish;
   control then returns to C.

H. Have all rows in the entire matrix been searched without finding a
   pivot? (If no, go to C; if yes, go to I.)

I. STOP
   Print the optimal solution.

Figure 2. Flow diagram for the parallel primal transportation algorithm.
We now describe the parallel primal transportation algorithm shown in
Figure 2 on the assumption that p (> 1) processors are available when the
problem is being solved. Each processor stores the rows it owns in a
wrap-around list so that when it reaches the end of its list it goes back to
the beginning. It maintains a pointer to the current row that is next to be
searched; the pointer initially points to the first row it owns.
After the initial tree is built each processor obtains its own copy of
the lists defining the initial tree. Then subroutine C is invoked, and each
processor searches its current row for the entry having the most negative
reduced cost. If such a most negative reduced cost is found by a processor
it writes the row and column in which it was found and its reduced cost in the
pivot queue list; then the processor updates its current row pointers and goes
to synchronization step D where it waits until all processors have finished
their searches. This synchronization step is necessary in order that the
pivot queue list is completely defined before pivot subroutine F is called.
Control then passes to step E in which the question is asked whether the
pivot queue list is empty. If the answer is yes control passes to step H in
which the question is asked whether all of the rows in the entire matrix have
been searched without finding a pivot. If the answer is yes, control is
passed to step I and the optimal solution is printed. An answer of no sends
the computer back to search step C.
In the case that the answer to the question in E is no then subroutine F
is called and all p processors simultaneously perform all the pivots listed
in the pivot queue on their own copies of the basis tree. The reason this is
done is that it is faster for each processor to carry out the calculation
rather than have just one of them do the calculation and communicate the
result. As discussed in the next section, we also tried to break the pivoting
process down into small tasks and have individual processors carry out these
tasks. However, the sizes of the tasks, sometimes referred to as their
granularity, were found to be too small for a parallel processing strategy to
reduce the time over the previously reported strategy.
After each processor completes its pivot task, it moves to
synchronization step G where it waits until all processors have finished
pivoting. The reason for the synchronization step here is that it is
necessary to prevent the pivot queue list from being altered before one or
more of the processors have completed their pivot tasks.
Once all processors indicate in step G that they have finished pivoting
control returns to search step C, completing the main computational loop.
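The search/synchronize pattern of steps C and D can be mimicked with ordinary threads and a barrier. The toy sketch below shows only the synchronization structure (it is not the Butterfly code; the reduced-cost search is replaced by a stand-in, and rows are dealt round-robin rather than in contiguous blocks for brevity):

```python
# Toy version of steps C and D: p workers each scan their own rows for
# pivot candidates, append them to a shared pivot queue under a lock,
# then meet at a barrier before pivoting begins.  Illustrative only.
import threading

p = 4
rows = list(range(8))                       # 8 rows dealt round-robin
owned = {w: rows[w::p] for w in range(p)}
pivot_queue, lock = [], threading.Lock()
barrier = threading.Barrier(p)

def candidate(i):
    # stand-in for the reduced-cost search of row i;
    # pretend even rows (other than row 0) violate (6)
    return (i, -i) if i % 2 == 0 and i > 0 else None

def worker(w):
    found = [candidate(i) for i in owned[w]]
    with lock:                              # step C: fill the pivot queue
        pivot_queue.extend(f for f in found if f is not None)
    barrier.wait()                          # step D: wait for all searches

threads = [threading.Thread(target=worker, args=(w,)) for w in range(p)]
for t in threads: t.start()
for t in threads: t.join()
print(sorted(pivot_queue))   # [(2, -2), (4, -4), (6, -6)]
```

The lock plays the role of serialized writes to the pivot queue list, and the barrier is the synchronization that guarantees the queue is completely defined before any pivoting starts.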
4. ATTEMPT TO PARALLELIZE THE PIVOT STEP
At the end of the search step each processor has located the most
negative element in its current row, or has determined that the row has no
negative elements. Hence the pivot queue has up to 14 possible pivots that
can be performed. However, it is not always possible to perform two (or more)
pivots simultaneously because the pivot operations may require changes to be
made to the node functions of some of the same nodes.
To make these ideas precise let Q be the set of potential pivots, and
let k be an element of Q. Define T(k) to be the set of basis tree nodes
at which one of the five node functions p(·), d(·), x(·), u(·), v(·) is
changed by pivot k. Two pivots h and k are said to be independent if
T(h) ∩ T(k) = ∅, and are said to be dependent if T(h) ∩ T(k) ≠ ∅. Consider
the graph P = (Q,E) whose nodes consist of the pivots k in the pivot queue,
and whose edges (h,k) belong to E if pivots h and k are dependent. We would
like to find a maximal independent subset S in P, which would then consist
of a set of pivots which can be done in parallel. Since finding a maximum
independent subset of a graph is a well known NP hard problem, we used the
following heuristic program:
(1) Let S = ∅.
(2) Find a node k of largest degree in P.
(3) Remove k from Q; put k in S.
(4) For each edge (k,h) in E, remove h from Q, and remove (k,h) from E.
(5) If Q ≠ ∅ go to (2); else go to (6).
(6) Stop. Set S consists of independent pivots.

Since at least one node is removed from Q at each step of the algorithm, it
runs very quickly. The set S will not necessarily be the largest
independent subset of P, but will usually be quite good.
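Steps (1)-(6) translate directly into code. The sketch below runs the greedy heuristic on a small illustrative dependency graph (the pivot labels and edges are made up for the example):

```python
# Greedy heuristic (1)-(6): repeatedly move a largest-degree node of P
# into S, then delete it and all of its neighbours from Q, so that S
# contains only pairwise independent pivots.  Illustrative data.

def independent_pivots(Q, E):
    Q, E = set(Q), set(E)                   # work on copies
    S = set()                               # (1) S = empty
    while Q:                                # (5) loop until Q is empty
        k = max(Q, key=lambda q: sum(q in e for e in E))  # (2) max degree
        Q.discard(k)
        S.add(k)                            # (3) move k into S
        hit = {e for e in E if k in e}      # (4) edges incident to k ...
        for e in hit:
            for h in e:
                Q.discard(h)                # ... and their endpoints go
        E -= hit
    return S                                # (6) independent pivots

Q = {1, 2, 3, 4, 5}                         # five queued pivots
E = {(1, 2), (1, 3), (2, 3), (4, 5)}        # dependency edges
S = independent_pivots(Q, E)
# no two chosen pivots are joined by a dependency edge
assert all((h, k) not in E for h in S for k in S)
print(len(S))   # 2
```

On this graph the heuristic always returns two pivots (one from the triangle {1,2,3} and one from the pair {4,5}), which is in fact the largest independent set here, though the heuristic carries no such guarantee in general.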
We implemented the above procedure but found that it was not
computationally as efficient as the approach described in the previous
section. The main difficulty is that the computational effort of determining
the set T(k) for each k in Q is essentially the same as that of actually
performing the pivot on k. To that time must be added the work of
determining the independent set S and of actually carrying out the
independent pivots in S. Overall the computation time was increased by the
attempt to parallelize the pivot step, so it was abandoned.
5. COMPUTATIONAL RESULTS
In the course of our computational experiments we solved more than 500
randomly generated fully dense transportation problems on the Butterfly
computer. The averaged data is presented graphically in Figures 1 - 5. The
raw computational data appears in the appendix.
Figure 1 shows the performance of the parallel primal algorithm on square
transportation problems for sizes n = 500, 1000, ..., 3000, and for shipping
amounts ranging from 1 (assignment problems) to 1000. As noted in [22] for
much smaller problem sizes, the execution times increase with problem size
and with shipping amounts. We fitted these data points with a function of
the form Kn^α and found that the exponent α ranged from α = 2.0
for assignment problems to α = 2.2 for transportation problems having
shipping amounts in the range [1,1000]. The correlation coefficients for all
of these fits were greater than .999, indicating an extremely good fit.
Figure 2 shows the effect of varying cost ranges and problem size on
execution times for assignment problems. The larger the cost range the less
the dual degeneracy the problem has. To put it differently, the smaller the
cost range, the larger the number of alternative optimal solutions. The
primal transportation algorithm is able to take advantage of dual degeneracy,
and is able to solve problems having smaller cost ranges much faster than
those having larger cost ranges. In contrast, some dual methods find the
opposite phenomenon to be the case; see [19].
In order to determine the speedup factor of the parallel primal algorithm
we solved assignment problems with n = 500, 750, and 1000 on the Butterfly
computer with just one processor (no larger problems could be solved on a
single processor) and also solved the same problems with 2, 4, 6, 7, 8, 10,
12, and 14 processors. The averages of the execution times of these runs are
plotted in Figure 3. For problems having the same cost range the speedup
factor is the ratio of the execution time on one processor divided by the
smallest execution time achieved by using any number of processors. Note that
the speedup factor increases with problem size being about 2.2 for n = 500
and increasing to about 2.5 for n = 1000. Note also that the most
effective number of processors seems to be in the range 6 to 8 for this range
of problem sizes. As problems get larger, the primal algorithm should be
able to use more processors effectively to improve the efficiency of the
search part of the algorithm, as will become evident in the next two figures.
Since we were able to parallelize the search phase of the algorithm but
not the pivot phase, it is important to find out whether the percentage of
time spent in search increased with problem size. We made runs on the
sequential version of the algorithm using a SUN workstation. The average
percentages of search time for three assignment problems at each size, for
sizes ranging from n = 100 to n = 2500 and for two cost ranges [0,1000] and
[0,10000], are shown in the bar graph of Figure 4. The search percentage
ranged from about 55% for n = 100 to about 95% for n = 2500. The fact that
we parallelized the part of the sequential algorithm that grows with n was
the main reason for the good performance of the parallel primal algorithm.
In order to investigate further the speedup of the parallel versus the
sequential code we made use of a speedup formula due to Rettberg and Thomas
[20]. Let T be the total execution time, let f be the fraction of time spent
in the search phase of the algorithm, and let S be the speedup factor. Then
the speedup obtained with p processors is

    S = T / [ (1-f)T + fT/p ] = p / [ (1-f)p + f ]
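A quick numerical check of this formula (a sketch; f = .83 is the search fraction used in the text, and the numbers printed are what the formula itself implies, not measured values):

```python
# Rettberg-Thomas style speedup: a fraction f of the work parallelizes
# perfectly over p processors while the remaining (1 - f) stays serial.
def speedup(p, f):
    return p / ((1 - f) * p + f)

f = 0.83
print(round(speedup(8, f), 2))   # 3.65, the theoretical S at p = 8
print(round(1 / (1 - f), 2))     # 5.88, the limit as p grows large
```

An observed speedup of about 2.5 at p = 8 is roughly 70 percent of the 3.65 the formula predicts, which matches the figure quoted below.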
In Figure 5 we have plotted this curve for problems having costs in the range
[0,1000] with f = .83 as can be observed in Figure 4. We also computed the
observed speedup values by dividing the execution time in Figure 3 for each
value of p into its value for p = 1. These points are also plotted in Figure
5. The maximum speedup observed, when p = 8, was 70 percent of its
theoretical maximum value. We were unable to continue this study for larger
values of p because of the memory limitation of a single processor of the
Butterfly computer.
6. CONCLUSIONS
This paper is a continuation of the experimental investigations begun in
the 1970's on primal simplex transportation codes. It presents evidence that
the primal code is well suited for parallel computation at least with the
computer architecture of the Butterfly computer. Significant speedup factors
for parallel over sequential machines were achieved, which enabled the
parallel computer to solve fully dense transportation and assignment problems
at least one order of magnitude larger than those previously reported.
[Figure 1: execution time versus problem size n = 500,...,3000, one curve
for each shipping amount range: ship [1], ship [1,10], ship [1,100], and
ship [1,1000].]

Figure 1. The effect on execution time of varying shipping amounts for
square dense transportation problems. The cost range was [0,1000]. The
lowest curve gives solution times for assignment problems. Fourteen
parallel processors were used.
[Figure 2: execution time versus problem size, one curve for each cost
range: [0,100], [0,1000], and [0,10000].]

Figure 2. The effect on execution times of varying cost ranges for square
assignment problems. Fourteen parallel processors were used.
[Figure 3: execution time versus number of processors (1 to 14) for
assignment problems with n = 500, n = 750, and n = 1000.]

Figure 3. The effect on execution times of changing the number of parallel
processors for three different assignment problem sizes. For each curve the
maximum speedup factor is the ratio of the time for one processor divided
by the smallest time for any number of processors.
[Figure 4: bar graph of search time as a percentage of run time versus
problem size, for cost ranges [0,1000] and [0,10000].]

Figure 4. Search time as a percentage of run time on a sequential (SUN)
computer. Two different cost ranges, [0,1000] and [0,10000], were used.
[Figure 5: speedup versus number of processors; theoretical curve and
actual data points.]

Figure 5. Theoretical maximum speedup and actual speedup as a function of
the number of processors in solving assignment problems with cost range
[0,1000]. The maximum speedup, achieved with 8 processors, was
approximately 70 percent of the theoretical maximum.
REFERENCES
[1] Barr, R. S., F. Glover and D. Klingman, "The Alternating Basis Algorithm
    for Assignment Problems," Mathematical Programming 13 (1977) 1-13.

[2] Bertsekas, D. P., "A New Algorithm for the Assignment Problem,"
    Mathematical Programming 21 (1981) 152-171.

[3] Bradley, G. H., G. G. Brown and G. W. Graves, "Design and Implementation
    of Large Scale Primal Transshipment Algorithms," Management Science 24
    (1977) 1-34.

[4] Charnes, A., and W. W. Cooper, Management Models and Industrial
    Applications of Linear Programming, Vol. I. Wiley, New York, 1961.

[5] Charnes, A., D. Karney, D. Klingman, J. Stutz and F. Glover, "Past,
    Present and Future of Large Scale Transshipment Computer Codes and
    Applications," Comp. Opns. Res. 2 (1975) 71-89.

[6] Dantzig, G. B., Linear Programming and Extensions. Princeton U. Press,
    Princeton, N.J., 1963.

[7] Derigs, U., "The Shortest Augmenting Path Method for Solving Assignment
    Problems - Motivation and Computational Experience," Annals of
    Operations Research 4 (1985/6) 57-102.

[8] Dorhout, B., "Het lineaire toewijzingsprobleem, vergelijking van
    algorithmen," Report BN21, Stichting Mathematisch Centrum, Amsterdam
    (1973).

[9] Ford, L. R. and D. R. Fulkerson, Flows in Networks, Princeton University
    Press, Princeton, N.J., 1962.

[10] Glover, F. and D. Klingman, "Comment on a Note by Hatch on Network
     Algorithms," Operations Research 26 (1978) 370-374.

[11] Glover, F., D. Karney and D. Klingman, "Implementation and Computational
     Comparisons of Primal, Dual and Primal-Dual Computer Codes for Minimum
     Cost Network Flow Problems," Networks 4 (1974) 191-212.

[12] Hatch, R. S., "Bench Marks Comparing Transportation Codes Based on
     Primal Simplex and Primal-Dual Algorithms," Operations Research 23
     (1975) 1167-1171.

[13] Hung, M. S. and W. O. Rom, "Solving the Assignment Problem by
     Relaxation," Operations Research 28 (1980) 969-982.

[14] Kennington, J. L., and R. V. Helgason, Algorithms for Network
     Programming, Wiley-Interscience, New York, 1980.

[15] Kuhn, H. W., "The Hungarian Method for the Assignment Problem," Naval
     Research Logistics Quarterly 2 (1955) 83-97.

[16] Martello, S. and P. Toth, "Linear Assignment Problems," Annals of
     Discrete Math. 31 (1987) 259-282.

[17] McGinnis, L. F., "Implementation and Testing of a Primal-Dual Algorithm
     for the Assignment Problem," Operations Research 31 (1983) 277-291.

[18] Miller, D. L., and J. F. Pekny, "Results from a Parallel Branch and
     Bound Algorithm for Solving Large Asymmetric Travelling Salesman
     Problems," Working Paper, April 1988.

[19] Pekny, J. F. and D. L. Miller, "A Parallel Branch and Bound Algorithm
     for Solving Large Asymmetric Traveling Salesman Problems," Working
     Paper, May 1988.

[20] Rettberg, R., and R. Thomas, "Contention is no Obstacle to Shared-Memory
     Multiprocessing," Comm. A.C.M. 29 (1986) 1202-1212.

[21] Srinivasan, V. and G. L. Thompson, "Accelerated algorithms for labelling
     and relabelling of trees, with applications to distribution problems,"
     J. Assoc. Comput. Mach. 19 (1972) 712-726.

[22] Srinivasan, V. and G. L. Thompson, "Benefit-cost analysis of coding
     techniques for the primal transportation algorithm," J. Assoc. Comput.
     Mach. 20 (1973) 194-213.

[23] Srinivasan, V. and G. L. Thompson, "Cost Operator Algorithms for the
     Transportation Problem," Mathematical Programming 12 (1977) 372-391.

[24] Thompson, G. L., "A Recursive Method for Solving Assignment Problems,"
     in: Studies on Graphs and Discrete Programming, Annals of Discrete
     Mathematics 11, ed. P. Hansen (North-Holland, Amsterdam, 1981) 319-343.

[25] Tomizawa, N., "On Some Techniques Useful for Solution of Transportation
     Network Problems," Networks 1 (1971) 173-194.
Appendix
The following four tables contain the run time information presented in
Figures 1-5, and also additional information concerning the dispersion of run
times, build initial tree time, search and pivot time, number of rows
searched, number of pivots, and number of pivots per search. This information
may be useful to others who are implementing the primal or other
transportation algorithms on parallel computers.
runs size   run time      run time       build  search and  rows      number  pivots
            avg/std       min/max        tree   pivot time  searched  pivots  /search

Ship [1]
10   500     22.3/1.3      20.5/24.7       5.3    16.3       19961      9601  6.74
10  1000     86.3/5.3      78.0/93.5      20.9    64.2       59424     30536  7.21
10  1500    203.4/12.1    188.5/229.1     47.8   153.6      118073     60914  7.23
10  2000    370.4/16.7    346.8/402.4     82.0   285.8      192375    103853  7.56
10  2500    578.1/19.1    556.0/610.0    130.2   444.8      254412    145625  8.02
10  3000    801.4/32.5    753.1/854.3    181.8   615.6      309771    187116  8.46

Ship [1,10]
10   500     25.8/2.1      23.5/29.2       4.6    20.6       15187      5954  5.47
10  1000     99.9/13.6     87.9/134.4     17.8    80.9       40616     16829  5.75
10  1500    223.6/20.6    196.6/270.8     39.2   182.4       71565     30166  5.89
10  2000    405.4/35.4    375.0/485.0     71.4   331.3      109752     46122  5.87
10  2500    653.4/34.5    609.3/707.0    111.8   538.5      156575     66126  5.92
 9  3000    931.2/85.9    849.2/1143.5   158.5   768.7      194868     85376  6.15

Ship [1,100]
10   500     56.3/66.1     32.0/244.4      4.5    51.2       14978      5148  4.80
10  1000    142.9/27.0    125.8/218.8     17.5   124.2       37595     13044  4.82
10  1500    337.9/43.2    308.3/447.2     40.0   296.1       64167     22166  4.84
10  2000    635.5/84.5    577.8/843.6     69.0   564.0       98075     33001  4.68
10  2500   1025.8/94.2    947.7/1250.0   109.0   913.7      124921     44339  4.96
10  3000   1522.7/179.5  1404.3/1948.3   155.8  1363.2      161806     56584  4.90

Ship [1,1000]
10   500     38.5/3.8      33.8/45.2       4.5    33.4       14683      5154  4.90
10  1000    168.4/33.9    146.3/261.3     17.5   149.6       39022     12891  4.60
10  1500    399.5/48.3    351.5/524.6     38.8   358.8       64849     21729  4.69
10  2000    784.2/97.0    696.0/1027.6    70.1   711.3      101129     32223  4.45
10  2500   1279.5/110.8  1185.2/1525.9   110.2  1166.1      128667     42681  4.66
 8  3000   1937.1/262.6  1795.2/2578.5   156.7  1776.3      163039     54173  4.67

Table 1. Raw data for the graphs in Figure 1. All problems were randomly
generated with costs in the range [0,1000]. All problems were solved using
a 14 processor Butterfly computer.
runs size   run time      run time       build  search and  rows      number  pivots
            avg/std       min/max        tree   pivot time  searched  pivots  /search

Cost [0,100]
10   500     13.9/0.8      12.5/15.5       7.2     5.9        7996      3880  6.81
10  1000     41.1/1.3      38.9/42.9      28.0    11.7        9805      5124  7.33
10  1500     79.3/3.1      74.1/83.2      60.7    16.4       11725      5871  7.02
10  2000    135.4/5.9     125.6/142.3    107.0    25.4       13386      6470  6.78
10  2500    200.3/4.0     195.5/207.5    168.5    28.0       15190      7222  6.66
10  3000    281.9/10.6    264.9/301.9    243.7    33.7       17133      7844  6.42

Cost [0,1000]
10   500     22.3/1.3      20.5/24.7       5.3    16.3       19961      9601  6.74
10  1000     86.3/5.3      78.0/93.5      20.9    64.2       59424     30536  7.21
10  1500    203.4/12.1    188.5/229.1     47.8   153.6      118073     60914  7.23
10  2000    370.4/16.7    346.8/402.4     82.0   285.8      192375    103853  7.56
10  2500    578.1/19.1    556.0/610.0    130.2   444.8      254412    145625  8.02
10  3000    801.4/32.5    753.1/854.3    181.8   615.6      309771    187116  8.46

Cost [0,10000]
10   500     23.6/1.6      19.6/25.2       5.3    17.7       22605     10718  6.64
10  1000     98.4/6.7      89.3/111.4     21.4    75.7       73509     38064  7.26
10  1500    237.9/10.8    219.6/257.5     46.6   189.3      157858     84369  7.50
10  2000    449.8/29.3    422.3/504.4     82.7   364.4      265005    146606  7.75
10  2500    748.7/41.0    668.8/819.5    135.7   609.8      402653    221668  7.71
10  3000   1136.4/60.0   1001.1/1220.7   185.2   947.2      567511    324035  8.00

Table 2. Raw data for Figure 2. All the problems were randomly generated
assignment problems of the stated sizes with costs chosen in the given
ranges. All were solved using a 14 processor Butterfly computer.
Execution Time vs. Processors

Processors   n = 500   n = 750   n = 1000
     1         45.7     104.3      199.3
     2         32.0      73.7      138.3
     4         23.5      52.3      102.2
     6         22.2      49.2       86.5
     7         22.3        -        86.3
     8         21.6      47.7       77.6
    10         21.0      48.3       81.8
    12         21.0      48.7       86.8
    14         21.0      49.3       83.9

Table 3. Data used for Figures 3 and 5. All problems were randomly
generated assignment problems with costs in the range [0,1000].
Search Time as a Percentage of Execution Time

   n     Cost [0,1000]   Cost [0,10000]
  100        56.4             54.1
  250        61.7             63.9
  500        78.1             78.6
 1000        82.8             94.2
 1500        94.2             95.5
 2000        95.0             96.0
 2500        94.8             96.4

Table 4. Data for the graph in Figure 4. The numbers are average times for
the solution of three randomly generated assignment problems for various
problem sizes and two different cost ranges. All problems were solved by a
sequential version of the primal code on a SUN workstation.