oblivious routing for lc permutations on hypercubes

Oblivious routing for LC permutations on hypercubes 1

Zhiyong Liu a,2, David W. Cheung b,*

a Laboratory for Computer Science, Massachusetts Institute of Technology, USA
b Department of Computer Science and Information Systems, The University of Hong Kong, Pokfulam Road, Hong Kong

Received 10 January 1996; received in revised form 11 December 1998

Abstract

We propose an oblivious algorithm to route linear-complement (LC) permutations on hypercubes in circuit switched and wormhole routing. The algorithm guarantees that N independent paths can be set up simultaneously for any LC permutation with only a comparison of two bits in one routing step for any path. An LC permutation is determined by a transformation matrix T and a constant modifier C. For all the LC permutations with the same transformation matrix T (we call them a type of permutations), an algorithm is executed to find an ordered sequence of dimensions without knowing a particular permutation. When the sequence of dimensions is used in the routing process for all the packets of a permutation of this type, a comparison of two bits is carried out in each routing step in the packet transmission process. It is guaranteed that no contention will occur between any two paths on the use of the dimensional links, thus N independent paths can be set up simultaneously for the N packets of an LC permutation. The time complexity of the algorithm for finding an ordered sequence for the use of the n dimensions is $O(n^3)$ for any type of LC permutations (rather than one particular LC permutation), and it can be carried out off-line. The routing process itself is distributed and oblivious. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Parallel processing; Hypercube; Linear-complement permutations; Routing algorithms; Circuit switched routing; Wormhole routing

Parallel Computing 25 (1999) 445–460

* Corresponding author. E-mail: [email protected]
1 This work is supported in part by USA ARPA under Contract N00014-95-1-1246.
2 On leave from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; also supported in part by K.C. Wang Education Foundation, Hong Kong.

0167-8191/99/$ – see front matter © 1999 Elsevier Science B.V. All rights reserved.

PII: S0167-8191(99)00008-3

1. Introduction

Linear-complement (LC) permutations are an important class of permutations which are frequently used in various parallel computation tasks. They were identified and named LC permutations in Ref. [1]. BPC permutations [9,10] are a subset of LC permutations; their usefulness is described in Refs. [9,10]. In addition, LC permutations are important because the data alignment requirements in many non-linear parallel storage schemes, known as XOR-schemes [3], are LC permutations. Since they are frequently used as kernel communication tasks in many application programs, such as numerical analysis, image processing, and scientific and engineering computations, the efficiency of their realization has an important effect on the efficiency of the whole computation. A great deal of research effort has been devoted to their efficient realization [1,2,4,5,7–14] on hypercubes and multistage networks. We focus on the efficient realization of LC permutations on hypercubes.

Conflicts in message transmission have important effects on the hardware requirements and the communication speed.

In a packet switched model, a whole message packet is maintained as an entity; in a routing step it is forwarded one link along the path from its source to its destination. For a routing algorithm in the packet switched model, if a node is to hold more than one message at any time, we say that a network conflict occurs in the node. If a routing algorithm can guarantee that no network conflict occurs, no buffer is needed for the transmission and no computation for arbitration is needed to "select" a message to send.

In the circuit switched model, a complete dedicated path needs to be established for a message packet from its source to its destination. No link on this path can be used by any other packet during the transmission of this packet. In the wormhole routing model, a packet is partitioned into a number of flits, which are the units for routing. In a routing step, a flit can be forwarded one link along its path, and contiguous flits of a packet are always on the same or adjacent nodes along their path. A link in a network is also called an edge in a graph in this paper.

For a routing algorithm in the circuit switched and wormhole routing models, if more than one message is to be sent on one edge, we say that a network conflict occurs on the edge. When no edge is shared by two paths, we say that the two paths are independent of each other. The number of messages needing to use an edge is also referred to as the congestion of the edge in the literature [6]. The maximum congestion of the edges in the network is called the congestion of the routing algorithm. We say that a routing algorithm is conflict-free if its congestion is one. A conflict-free algorithm makes the transmission efficient in terms of both communication time and hardware.

On the other hand, the simplicity of a routing algorithm also has an important effect on the required hardware and the communication time. We say that a routing algorithm is oblivious if the path for a packet is decided before routing and, at routing time, it does not need to know the routing of other message packets in the system, nor does it need to remember its own routing history. An oblivious routing algorithm is simple to implement since a node (a source or an intermediate node) routes a message packet on it based solely on the destination address of the packet (and the address of the current node, of course).

The algorithms given in Refs. [2,14] are efficient in that they can realize any LC permutation in n steps on an N-node hypercube (where $n = \log_2 N$) when used in the packet switched model, with one node needing to handle at most two packets in some steps; when used in the circuit switched model, they can set up N independent paths for the N packets. But a path cannot be decided for any particular packet before the routing is actually carried out. In a routing step, a comparison of the destination addresses of two packets is needed to decide which dimension will be used to send a packet; then the bit-comparison between the node address and the two destinations (on the selected bit) can be carried out to decide which packet will be sent out on the selected dimension. The algorithm proposed in Ref. [7] can realize an LC permutation in n steps with any node needing to handle only one packet in any step, and it needs only one bit-comparison in each routing step. However, it cannot be used to set up N independent paths for the N packets (it may cause a congestion of two on some edges), since it may transfer packets circularly along two dimensions in some steps.

We develop an algorithm for routing LC permutations in the circuit switched and wormhole routing models on hypercubes, which can set up N independent paths while only one bit-comparison is needed in each routing step. First, an algorithm is presented which finds an ordered sequence of the n dimensions of the hypercube for any type of LC permutations without knowing the particular permutation itself (thus it can be carried out off-line for any type of LC permutations). When this ordered sequence of dimensions is used at communication time to route the individual packets, N independent paths will be set up simultaneously without any conflict on the use of the edges.

The algorithm is inspired by the algorithms in Refs. [2,7,10,14]. It seems that if some cycles of dimensions can be found off-line, like the cycles of BPC permutations in Ref. [10], then the paths for the N packets can be determined before run time, and the routing itself can be done in an oblivious manner and in a simpler way: only one bit-comparison is needed in any step. The algorithms in Refs. [2,14] do produce a sequence for the use of the dimensions, but the difficulty is that after a routing step two packets may reside on one node, and the transformation will then not be nonsingular. This makes it difficult to use only the transformation matrix to decide the sequence for the use of the dimensions. Our idea is to use an imaginary dimension such that, if the packets were transmitted along that dimension (in addition to the other dimension), the transformation between the sources (the current nodes where the packets reside) and the destinations would always be expressed by a nonsingular matrix. Thus the sequence for the use of the n dimensions can be decided using only the transformation matrix, without resorting to the permutation itself (so the sequence can be decided off-line in advance).

This paper is organized as follows. Some definitions are given in Section 2. An off-line algorithm to find an ordered sequence for the use of the n dimensions, as well as the algorithm used at communication time to route the packets, is developed in Section 3. Section 4 proves the correctness of the routing algorithm. Concluding remarks are given in Section 5.


2. Definitions

An n-dimensional hypercube has $N = 2^n$ nodes. There is a link (edge) between node i and node j if and only if there is exactly one k such that $i_k \neq j_k$, where $0 \le i, j \le N-1$, $0 \le k \le n-1$, and $i_{n-1} i_{n-2} \cdots i_0$ and $j_{n-1} j_{n-2} \cdots j_0$ are the binary representations of the integers i and j. We also call this link the kth dimensional link.
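The adjacency rule can be illustrated with a small Python sketch. The function names `neighbors` and `link_dimension` are illustrative assumptions and do not appear in the paper; node addresses are stored as plain integers whose bit k plays the role of $i_k$.

```python
from typing import List, Optional


def neighbors(i: int, n: int) -> List[int]:
    """Return the n neighbors of node i; entry k is reached over dimension k."""
    return [i ^ (1 << k) for k in range(n)]


def link_dimension(i: int, j: int) -> Optional[int]:
    """Return k if nodes i and j are joined by a kth dimensional link, else None."""
    diff = i ^ j
    # Two nodes are adjacent only if their addresses differ in exactly one bit.
    if diff == 0 or diff & (diff - 1):
        return None
    return diff.bit_length() - 1


if __name__ == "__main__":
    print(neighbors(0b0101, 4))
    print(link_dimension(0b0101, 0b0111))
```

Flipping one address bit with an exclusive-or is the only operation needed to move along a dimensional link, which is why the routing discussed later reduces to single-bit comparisons.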

We will express a permutation as $P = \{(S, D) \mid 0 \le S, D \le N-1\}$, where S is a source address and D is its destination address, and we express S and D in their binary forms $s_{n-1} s_{n-2} \cdots s_0$ and $d_{n-1} d_{n-2} \cdots d_0$. In a permutation, each S and each D appears only once.

We say a permutation P is a linear permutation if and only if any source–destination pair $(S, D)$ in P can be expressed as $D^{\tau} = T \times S^{\tau}$, where T is a nonsingular $n \times n$ binary matrix with $t_{n-1,n-1}$ being the top-left element and $t_{0,0}$ being the bottom-right one, $(a_{n-1} a_{n-2} \cdots a_0)^{\tau}$ is the transpose of an n-vector $(a_{n-1} a_{n-2} \cdots a_0)$, and all the operations are carried out in GF(2). That is to say that in a linear permutation, any bit in the destination address is a linear combination of the bits in its source address:

$$d_i = \sum_{j=0}^{n-1} t_{i,j} \times s_j \quad \text{for } 0 \le i \le n-1.$$

Let $\oplus$ be the bitwise exclusive-or operation and C be an integer less than or equal to $N-1$.

Definition 1. A permutation is an LC permutation if and only if it can be expressed as $P' = \{(S, D \oplus C) \mid 0 \le S, D, C \le N-1\}$, where $P = \{(S, D) \mid 0 \le S, D \le N-1\}$ is a linear permutation.
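Definition 1 can be made concrete with a minimal Python sketch. The row-bitmask representation (bit j of `T[k]` stands for $t_{k,j}$) and the names `parity` and `lc_destination` are assumptions made for illustration; the matrix is the one used in the example of Section 3.

```python
from typing import List


def parity(x: int) -> int:
    """Sum over GF(2) of the bits of x."""
    return bin(x).count("1") & 1


def lc_destination(T: List[int], C: int, S: int) -> int:
    """Return D = (T x S) xor C, where T[k] is row k of T as a bitmask."""
    D = 0
    for k, row in enumerate(T):
        # d_k is the GF(2) inner product of row k with the source bits.
        D |= parity(row & S) << k
    return D ^ C


# Rows t_0, t_1, t_2, t_3 of the example matrix (displayed there from row 3 down).
T_EXAMPLE = [0b0100, 0b1000, 0b0001, 0b0011]

if __name__ == "__main__":
    for C in (0b0000, 0b1010):
        perm = [lc_destination(T_EXAMPLE, C, S) for S in range(16)]
        # A nonsingular T yields a bijection for every constant modifier C.
        print(C, sorted(perm) == list(range(16)))
```

Changing C only XORs every destination with the same constant, so all permutations of one type share the matrix T.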

Since matrix T is nonsingular, it has a nonsingular inverse matrix R. We will call matrix T and its inverse R the transformation matrices, since they transform source (destination) addresses into destination (source) addresses. We will write $D^{\tau} = T \times S^{\tau}$ as $D = T \times S$ when no confusion will arise. We use $t_{k,*}$ ($r_{k,*}$) to denote the kth row of T (R), and $t_{*,k}$ to denote the kth column of T. We use E to denote the identity matrix, and $e_{k,*}$ is the kth row of E.
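As a hedged illustration of how R can be obtained from T, the following Python sketch performs Gauss-Jordan elimination over GF(2). The function name `gf2_inverse` and the row-bitmask representation (bit j of row k stands for $t_{k,j}$) are assumptions for this sketch, not part of the paper.

```python
from typing import List


def gf2_inverse(rows: List[int]) -> List[int]:
    """Invert an n x n GF(2) matrix given as row bitmasks; raise if singular."""
    n = len(rows)
    a = list(rows)
    inv = [1 << k for k in range(n)]  # start from the identity matrix E
    for col in range(n):
        # Find a row at or below position col that has a 1 in this column.
        pivot = next((r for r in range(col, n) if (a[r] >> col) & 1), None)
        if pivot is None:
            raise ValueError("matrix is singular over GF(2)")
        a[col], a[pivot] = a[pivot], a[col]
        inv[col], inv[pivot] = inv[pivot], inv[col]
        # Clear the column everywhere else; row addition over GF(2) is XOR.
        for r in range(n):
            if r != col and (a[r] >> col) & 1:
                a[r] ^= a[col]
                inv[r] ^= inv[col]
    return inv


if __name__ == "__main__":
    # Rows t_0..t_3 of the matrix used in the example of Section 3.
    T = [0b0100, 0b1000, 0b0001, 0b0011]
    print([format(r, "04b") for r in gf2_inverse(T)])
```

For this matrix the result encodes $s_0 = d_2$, $s_1 = d_3 \oplus d_2$, $s_2 = d_0$ and $s_3 = d_1$, which can be checked by substituting into $D = T \times S$.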

We call all the permutations which have the same transformation matrix T (but different constant modifiers C) a type of LC permutations. We will see that the constant modifier C does not affect the routing sequence for the use of the n dimensions (it only affects the transmission paths of the individual messages).

3. Routing LC permutations through predefined paths

We refer to the destination address of a packet on node S as D at any step. Let $D = (T \times S) \oplus C$ and let R be the inverse of T. Thus the relation can also be expressed as $S = (R \times D) \oplus C'$, where $C'$ is also a constant.

The following algorithm, algorithm sequencing, is used to find an order for the use of the n dimensions before run time. At communication time, any packet matches one bit of its destination address against the node address in each step. The only difference between the routing here and naive hypercube routing is that the routing algorithm in this paper uses the order generated by the following algorithm, rather than the simple descending or ascending order used in naive routing algorithms for hypercubes.

Algorithm Sequencing
/* T, R: transformation matrices, where R = T^{-1}.
   Selected: the set of dimensions selected so far.
   Seq[1:n]: an array for the ordered sequence.
   aux: a variable used to record the "imaginary" dimension.
        This is only used for modifying the transformation matrices and for the proof
        of the correctness of the routing algorithm; it has nothing to do with the real
        routing process. */

0   BEGIN
1     k := n - 1;
2     Selected := {};
3     aux := n;
4     FOR j := 1 TO n DO
5     BEGIN
6       Seq[j] := k;
7       Selected := Selected + {k};
8       IF t_{k,k} = 1
9       THEN
10      BEGIN
11        IF k = aux
12        THEN aux := n;
13        r_{k,*} := e_{k,*};
14        /* Modify R according to Eq. (3). */
15      END
16      ELSE
17      BEGIN
18        IF aux = n
19        THEN
20          Find a such that t_{k,a} = 1 and aux := a;
21        r_{aux,*} := r_{aux,*} ⊕ r_{k,*} ⊕ e_{k,*};
22        r_{k,*} := e_{k,*};
23        /* Modify R according to Eq. (5). */
24      END;
25      T := R^{-1}; /* Modify T according to R. */
26      IF aux = n
27      THEN
28        Pick up an unselected dimension b
29      ELSE
30        Find b such that t_{b,a} = 1;
31      k := b;
32    END;
33  END;


The above algorithm takes $n-1$ as the first dimension to be used. Actually, any dimension can be taken as the first one without affecting the validity of the algorithm, because of the symmetry of the problem at hand. The time complexity (bit-level) of algorithm sequencing is $O(n^3)$, owing to the inversion of the transformation matrix in each step. The algorithm generates an ordered sequence for the use of the n dimensions; any packet in a permutation will travel the n dimensions (if necessary) in that order.
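The following Python sketch is one possible transcription of algorithm sequencing, given only as an illustration. Several points are assumptions rather than the paper's prescription: matrices are stored as row bitmasks (bit j of row k is $t_{k,j}$), the helper `gf2_inverse` recomputes the full inverse in every iteration as in line 25, candidates for a are restricted to unselected dimensions (as argued in the proof of Theorem 3), and ties for a and b are broken by taking the largest index, a choice the pseudocode leaves open.

```python
from typing import List


def gf2_inverse(rows: List[int]) -> List[int]:
    """Gauss-Jordan inversion over GF(2) on row bitmasks."""
    n, a = len(rows), list(rows)
    inv = [1 << k for k in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if (a[r] >> col) & 1)
        a[col], a[pivot] = a[pivot], a[col]
        inv[col], inv[pivot] = inv[pivot], inv[col]
        for r in range(n):
            if r != col and (a[r] >> col) & 1:
                a[r] ^= a[col]
                inv[r] ^= inv[col]
    return inv


def sequencing(T: List[int]) -> List[int]:
    """Return the ordered sequence Seq of dimensions for the type given by T."""
    n = len(T)
    T, R = list(T), gf2_inverse(T)
    k, aux = n - 1, n            # aux == n means "no imaginary dimension"
    selected, seq = set(), []
    for step in range(n):
        seq.append(k)
        selected.add(k)
        e_k = 1 << k
        if (T[k] >> k) & 1:      # t_{k,k} = 1: lines 10-15
            if k == aux:
                aux = n
            R[k] = e_k           # Eq. (3)
        else:                    # t_{k,k} = 0: lines 17-24
            if aux == n:
                aux = max(a for a in range(n)
                          if a not in selected and (T[k] >> a) & 1)
            R[aux] ^= R[k] ^ e_k  # Eq. (5), uses the old row r_k
            R[k] = e_k
        T = gf2_inverse(R)       # line 25
        if step == n - 1:
            break                # no further dimension to choose
        unselected = [b for b in range(n) if b not in selected]
        if aux == n:
            k = max(unselected)
        else:
            k = max(b for b in unselected if (T[b] >> aux) & 1)
    return seq


if __name__ == "__main__":
    # Rows t_0..t_3 of the matrix used in the example of this section.
    print(sequencing([0b0100, 0b1000, 0b0001, 0b0011]))
```

With the tie-breaking rule assumed here, the sketch reproduces the sequence (3, 1, 2, 0) reported for the example of this section; other valid tie-breaking choices may produce a different but equally usable order.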

The routing algorithm itself at communication time is extremely simple: only one bit-comparison is needed in each step: "FOR k := 1 TO n DO IF $d_{Seq[k]} \neq s_{Seq[k]}$ THEN send the packet along dimension Seq[k];".

We give an example in Fig. 1. The permutation is $D = (T \times S) \oplus (0000)$. The transformation matrix T is

$$T = \begin{pmatrix} 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}.$$

The ordered sequence found by algorithm sequencing is (3, 1, 2, 0). Note that because there is no nth dimension in an n-dimensional hypercube, whenever we say that n is the imaginary dimension aux we mean that no such dimension is needed. In Fig. 1, the first number on each node is the node number, and the second number is the destination address of the packet issued by the node. There is a number associated with each link: it identifies the packet which uses the link as an edge of its path in the whole transmission process. It is interesting that in this example every link is used (in one direction). This permutation is used for the data alignment required to access the first column of the matrix stored by the XOR-scheme in Ref. [3]. Every packet matches its destination address with the nodes in the order (3, 1, 2, 0). From the figure we can see that no edge is shared between any two packets, thus N independent paths are set up automatically while each packet is routed obliviously. Also, all the packets are transmitted along shortest paths.

Fig. 1. No edge is shared by two paths.
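The example can be replayed with a short Python sketch that routes every packet by single-bit comparisons and counts how many paths use each link. The names `lc_destination`, `route` and `congestion`, and the row-bitmask representation of T, are assumptions for illustration; links are counted as undirected, which is a stricter test than counting each direction separately.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def lc_destination(T: List[int], C: int, S: int) -> int:
    """D = (T x S) xor C over GF(2); T[k] is row k of T as a bitmask."""
    D = 0
    for k, row in enumerate(T):
        D |= (bin(row & S).count("1") & 1) << k
    return D ^ C


def route(S: int, D: int, seq: Sequence[int]) -> List[Tuple[int, int]]:
    """Return the directed links (u, v) used when bits are matched in order seq."""
    node, path = S, []
    for k in seq:
        if ((node ^ D) >> k) & 1:        # the single bit comparison of a step
            nxt = node ^ (1 << k)
            path.append((node, nxt))
            node = nxt
    return path


def congestion(T: List[int], C: int, seq: Sequence[int]) -> int:
    """Maximum number of paths sharing one (undirected) link."""
    load = Counter()
    for S in range(1 << len(T)):
        for u, v in route(S, lc_destination(T, C, S), seq):
            load[(min(u, v), max(u, v))] += 1
    return max(load.values(), default=0)


T_EXAMPLE = [0b0100, 0b1000, 0b0001, 0b0011]   # rows t_0..t_3 of the example

if __name__ == "__main__":
    print(route(12, lc_destination(T_EXAMPLE, 0, 12), (3, 1, 2, 0)))
    print(congestion(T_EXAMPLE, 0, (3, 1, 2, 0)))   # generated order
    print(congestion(T_EXAMPLE, 0, (3, 2, 1, 0)))   # naive descending order
```

Under the naive descending order the packets from nodes 4 and 12 both cross the link from node 4 to node 0, so that order is not conflict-free for this permutation, whereas the generated order keeps every link to a single path.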

4. Validity of the algorithm

It is easy to see that any packet arrives at its destination correctly after the routing, because a packet simply matches one bit of its destination address with the node address in each step of the routing process. We will prove that no edge is shared by any two packets in the routing process. To prove this, it is enough to prove that whenever two packets meet on a node during their transmission, they are transmitted from the node along different dimensions; equivalently, only one packet can arrive at any node through the link on any one dimension. In the following, if the destination of a packet is D, $0 \le D \le N-1$, we simply call it packet D.

In algorithm sequencing, the matrices T and R are modified in each iteration. Let $T^{(k)}$ and $R^{(k)}$ be the matrices after the kth modification. Thus $T^{(0)}$ and $R^{(0)}$ are the original matrices. We are discussing routing in the circuit switched or wormhole model, but we describe the process as if it were in a packet switched model, so that we can take n snapshots of the routing process, each of which corresponds to a step in the packet switched routing model. We say that dimension k is the active dimension in a step if packets can actually be transmitted only along this dimension in the step. We first prove that each modification of the matrices T and R represents a snapshot of the distribution of the packets on the nodes.

In the first step, dimension $n-1$ is used as the active dimension.

Case 1. $t^{(0)}_{n-1,n-1} = 1$. Let us consider two packets D and D' on two nodes S and S', respectively, where S and S' are neighbors on dimension $n-1$:

$$d_{n-1} = t^{(0)}_{n-1,*} \times S \oplus c_{n-1} = t^{(0)}_{n-1,n-1} \times s_{n-1} \oplus \sum_{j=0}^{n-2} t^{(0)}_{n-1,j} \times s_j \oplus c_{n-1},$$

$$d'_{n-1} = t^{(0)}_{n-1,*} \times S' \oplus c_{n-1} = t^{(0)}_{n-1,n-1} \times s'_{n-1} \oplus \sum_{j=0}^{n-2} t^{(0)}_{n-1,j} \times s'_j \oplus c_{n-1}.$$

We also know that $s_{n-1} \neq s'_{n-1}$ and $s_j = s'_j$ for $j \neq n-1$, thus

$$d_{n-1} \oplus d'_{n-1} = t^{(0)}_{n-1,n-1} \times (s_{n-1} \oplus s'_{n-1}) = 1.$$

This means that if S sends D to S', then S' also sends D' to S. After the transmission in this step, each node holds only one packet.

Thus the relation between the destination D of any packet and its new node address S can be expressed as

$$s_{n-1} = d_{n-1} \quad \text{and} \quad s_j = r^{(0)}_{j,*} \times D \oplus c'_j \quad \text{when } j \neq n-1.$$

That is,

$$S = R^{(1)} \times D \oplus C'^{(1)}, \qquad (1)$$

where $r^{(1)}_{n-1,*} = e_{n-1,*}$, $r^{(1)}_{j,*} = r^{(0)}_{j,*}$, $c'^{(1)}_{n-1} = 0$, and $c'^{(1)}_j = c'_j$ for $j \neq n-1$. Also, we are sure that $R^{(1)}$ is nonsingular because the mapping is one to one, and thus there is also a $T^{(1)}$ which is the inverse of $R^{(1)}$.

Case 2. $t^{(0)}_{n-1,n-1} = 0$. In this case, we have

$$d_{n-1} \oplus d'_{n-1} = t^{(0)}_{n-1,n-1} \times (s_{n-1} \oplus s'_{n-1}) = 0.$$

Thus both D and D' will be on one of the two nodes (S or S') after this step. The mapping between the destination address and the current node could then also be written with the same expression using $R^{(1)}$ as in Eq. (1). But this time $R^{(1)}$ would be singular, since two packets would be mapped to one node number. Thus an inverse of $R^{(1)}$ would not exist, and it would be difficult to proceed to select the following active dimensions by manipulating only the transformation matrix, without knowing the real permutation before run time. This is why the algorithms in Refs. [2,14] need communication time to compare the two words residing on one node to decide which dimension can be used as the next active dimension.

The key idea here is to find a nonsingular transformation matrix after each step, such that we can always predefine the next active dimension before run time by using only the transformation matrix.

Imagine that the packets were routed along two dimensions, as explained in the following; then we would get a distribution in which each node holds only one packet. This is the effect of the use of the dimension aux. (Unlike in the routing algorithm of Ref. [7], here this dimension is not actually used for packet transmission in our circuit switched or wormhole routing model; it is used only for the manipulation of the transformation matrices, and we thus call it the "imaginary" dimension.)

Consider the two packets D and D' on the two nodes S and S', where S and S' are connected by a link on dimension aux. Suppose aux = a and $t^{(0)}_{n-1,a} = 1$. Since

$$d_{n-1} = t^{(0)}_{n-1,*} \times S \oplus c_{n-1} = t^{(0)}_{n-1,a} \times s_a \oplus \sum_{j=0,\, j \neq a}^{n-1} t^{(0)}_{n-1,j} \times s_j \oplus c_{n-1}$$

and

$$d'_{n-1} = t^{(0)}_{n-1,*} \times S' \oplus c_{n-1} = t^{(0)}_{n-1,a} \times s'_a \oplus \sum_{j=0,\, j \neq a}^{n-1} t^{(0)}_{n-1,j} \times s'_j \oplus c_{n-1},$$

we have

$$d_{n-1} \oplus d'_{n-1} = t^{(0)}_{n-1,a} \times (s_a \oplus s'_a) = 1.$$

Since $s_{n-1} = s'_{n-1}$, if we routed the packets according to the rule "if $d_{n-1} \neq s_{n-1}$ then send the packet along dimension $n-1$, else send the packet along dimension a", then one of the two packets on the two nodes connected by dimension a would go along dimension $n-1$ while the other would go along dimension a. It is easy to see that this is also true for the two packets on the nodes connected on dimension $n-1$. Thus any four packets on the four nodes connected by dimension $n-1$ and dimension a would be sent along a cycle of the four links on dimension $n-1$ and dimension a, as shown in Fig. 2.


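The relation between D and the new node S would be expressed by

$$s_{n-1} = d_{n-1},$$

$$s_a = (e_{n-1,*} \oplus r^{(0)}_{n-1,*} \oplus r^{(0)}_{a,*}) \times D \oplus c'^{(0)}_{n-1} \oplus c'^{(0)}_a \oplus 1, \quad \text{and}$$

$$s_j = r^{(0)}_{j,*} \times D \oplus c'^{(0)}_j \quad \text{for } j \neq a \text{ and } j \neq n-1.$$

That is, $S = R^{(1)} \times D \oplus C'^{(1)}$, where

$$r^{(1)}_{n-1,*} = e_{n-1,*},$$

$$r^{(1)}_{a,*} = e_{n-1,*} \oplus r^{(0)}_{n-1,*} \oplus r^{(0)}_{a,*}, \quad \text{and}$$

$$r^{(1)}_{j,*} = r^{(0)}_{j,*} \quad \text{for } j \neq a \text{ and } j \neq n-1.$$

Since the mapping can be expressed as above and would be one to one if the packets were routed circularly, we are sure that $R^{(1)}$ is nonsingular and has an inverse $T^{(1)}$, and the reasoning process can proceed.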
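As a worked illustration of this update, consider the 4 x 4 matrix T of the example in Section 3, with rows and columns listed from index 3 down to index 0. The choice a = 1 below is an assumption: a = 0 would also satisfy $t^{(0)}_{3,a} = 1$, and the pseudocode does not say which candidate to take.

```latex
R^{(0)} = \bigl(T^{(0)}\bigr)^{-1} =
\begin{pmatrix} 0&0&1&0\\ 0&0&0&1\\ 1&1&0&0\\ 0&1&0&0 \end{pmatrix},
\qquad t^{(0)}_{3,3} = 0, \quad t^{(0)}_{3,1} = 1 \;\Rightarrow\; a = 1.

r^{(1)}_{3,*} = e_{3,*} = (1\,0\,0\,0), \qquad
r^{(1)}_{1,*} = e_{3,*} \oplus r^{(0)}_{3,*} \oplus r^{(0)}_{1,*}
              = (1\,0\,0\,0) \oplus (0\,0\,1\,0) \oplus (1\,1\,0\,0) = (0\,1\,1\,0).

R^{(1)} = \begin{pmatrix} 1&0&0&0\\ 0&0&0&1\\ 0&1&1&0\\ 0&1&0&0 \end{pmatrix},
\qquad
T^{(1)} = \bigl(R^{(1)}\bigr)^{-1} =
\begin{pmatrix} 1&0&0&0\\ 0&0&0&1\\ 0&0&1&1\\ 0&1&0&0 \end{pmatrix}.
```

Here $R^{(1)}$ is indeed nonsingular, and column 1 of $T^{(1)}$ has its only 1 in row 1, so the next active dimension is b = 1, which agrees with the second entry of the sequence (3, 1, 2, 0) given for that example.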

But we do not actually transmit packets along dimension a, so dimension a is used only as an imaginary dimension to help find a matrix expression of the relation between the packets and the nodes. However, from the above analysis we are sure of the following:
· Claim 1. A node can have two packets or zero packets, the same as in Refs. [2,14].
· Claim 2. The packet D on a node S can be expressed by $D = T^{(1)} \times S \oplus C^{(1)}$, or $D = T^{(1)} \times S'' \oplus C^{(1)}$, where S'' is the node connected to S on dimension aux.

Theorem 1. A packet D is on node S or S'' after the jth dimension in the ordered sequence is used for routing, where $D = T^{(j)} \times S \oplus C^{(j)}$ and S'' is a neighbor of S on dimension aux.

Proof. We already know from the analysis above that it is true when $j = 1$. Suppose that it is true when $j = u$; we prove that it is true for $j = u + 1$, where $0 \le u \le n-1$ and we call the first step "step 1". Suppose that the active dimension in step u is k, and that algorithm sequencing finds that the next active dimension is b.

Case 1. $t^{(u)}_{b,b} = 1$. Consider the two packets D and D' on the two nodes S and S', respectively, where S and S' are connected by a link on dimension b. It is easy to see that $d_b \neq d'_b$, since $d_b \oplus d'_b = t^{(u)}_{b,b} \times (s_b \oplus s'_b) \oplus 0 = 1$. Thus in step $u + 1$, if D is sent to S' then D' is sent to S, or neither of them is sent. Therefore, after step $u + 1$ the only change in the relation between a packet D and its node number S is that $s_b$ and $d_b$ become the same; the other bits remain as before. This is expressed by

$$s_b = d_b, \qquad s_j = r^{(u)}_{j,*} \times D \oplus c'^{(u)}_j, \quad j \neq b. \qquad (2)$$

This means that after step $u + 1$ we have $S = (R^{(u+1)} \times D) \oplus C'^{(u+1)}$, where $c'^{(u+1)}_b = 0$, the other bits of $C'^{(u+1)}$ are the same as those of $C'^{(u)}$, and

$$r^{(u+1)}_{b,*} = e_{b,*}, \qquad r^{(u+1)}_{j,*} = r^{(u)}_{j,*}, \quad j \neq b. \qquad (3)$$

The algorithm modifies the transformation matrix R in accordance with Eq. (3) in this case (see line 13).

Fig. 2. The imagined transmission forms a loop.

Case 2. $t^{(u)}_{b,b} = 0$. It is easy to see that $d_b = d'_b$.

Let us consider the two packets D and D'' "mapped" by node S and node S'', where S and S'' are two nodes connected by a link on dimension aux, and D and D'' may really be on S and S'', respectively, or they may actually both be on one of S and S''. Let aux = a.

Subcase 1. D and D'' are really on nodes S and S'', i.e. the mapping is really one to one, similar to the original distribution. Since $d_b \oplus d''_b = t^{(u)}_{b,a} \times (s_a \oplus s''_a) \oplus 0 = 1$, we know that only one of them is to be transmitted on dimension b. Similarly to the argument for the first step, imagine that a packet whose bit b differs from bit b of its node is transmitted on dimension b, and that otherwise it is transmitted on dimension a. Then the transmission of the four packets on the four nodes connected by the links on dimension b and dimension a would form a cycle. Consequently, each node would hold only one packet, every node would match bit b of its packet, and those packets, and only those packets, whose bit b had already been matched would be sent on dimension a (i.e. only $s_a$ would change).

If we denote by $S^{(old)}$ the node of a packet D before this circular transmission and by $S^{(new)}$ its node after this transmission, then we have

$$s^{(new)}_a = s^{(old)}_a \oplus s^{(old)}_b \oplus d_b \oplus 1 = (r^{(u)}_{a,*} \times D \oplus c'^{(u)}_a) \oplus (r^{(u)}_{b,*} \times D \oplus c'^{(u)}_b) \oplus (e_{b,*} \times D) \oplus 1.$$

That is, after step $u + 1$ we have packet D on node S where

$$s_b = d_b,$$

$$s_a = (r^{(u)}_{a,*} \oplus r^{(u)}_{b,*} \oplus e_{b,*}) \times D \oplus c'^{(u)}_a \oplus c'^{(u)}_b \oplus 1,$$

$$s_j = r^{(u)}_{j,*} \times D \oplus c'^{(u)}_j, \quad j \notin \{b, a\}. \qquad (4)$$

The relation between the destination addresses and the nodes can be expressed as $S = (R^{(u+1)} \times D) \oplus C'^{(u+1)}$, where $c'^{(u+1)}_b = 0$ and $c'^{(u+1)}_a = c'^{(u)}_a \oplus c'^{(u)}_b \oplus 1$, the other bits of $C'^{(u+1)}$ are the same as those of $C'^{(u)}$, and

$$r^{(u+1)}_{b,*} = e_{b,*},$$

$$r^{(u+1)}_{a,*} = r^{(u)}_{a,*} \oplus r^{(u)}_{b,*} \oplus e_{b,*},$$

$$r^{(u+1)}_{j,*} = r^{(u)}_{j,*}, \quad j \notin \{b, a\}. \qquad (5)$$

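Subcase 2. D and D'' are actually on one node, S or S'', similar to the distribution that arises when $t^{(0)}_{n-1,n-1} = 0$ after the packets have been transmitted on dimension $n-1$.

$$d_b \oplus d''_b = \Bigl(\sum_{j=0}^{n-1} t^{(u)}_{b,j} \times s_j \oplus c^{(u)}_b\Bigr) \oplus \Bigl(\sum_{j=0}^{n-1} t^{(u)}_{b,j} \times s''_j \oplus c^{(u)}_b\Bigr) = t^{(u)}_{b,a} \times (s_a \oplus s''_a) \oplus 0 = 1 \times 1 = 1.$$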

Thus we are sure that only one of them will be sent on dimension b, and that no conflict will occur on the use of the links on this dimension. However, the key problem here is how to modify the transformation matrix such that our reasoning process can proceed. Notice that it would not be correct to use Eq. (3) to modify matrix R, although packets are transmitted only on dimension b. This is because $R^{(u)}$ is not the real mapping from the packets to the nodes, but an "imaginary" mapping subject to the condition in Claim 2 mentioned before.

Let us consider another two nodes X and X'' in addition to the two nodes S and S'', where X and X'' are also connected on dimension a, and X and S, and X'' and S'', are connected on dimension b, as shown in Fig. 3. Suppose that packets Y, Y'', D and D'' are on nodes X, X'', S and S'', respectively, in the imaginary mapping. Actually we may have the following two real situations:

Fig. 3. Correspondence between different cases.


Situation 1. D and D'' are on node S, while Y and Y'' are on node X.

From the above we already know that only one of the two packets D and D'' will be transmitted on dimension b; similarly, only one of Y and Y'' will be transmitted on dimension b.

Since

$$d_b \oplus y_b = \Bigl(\sum_{j=0}^{n-1} t^{(u)}_{b,j} \times s_j \oplus c^{(u)}_b\Bigr) \oplus \Bigl(\sum_{j=0}^{n-1} t^{(u)}_{b,j} \times x_j \oplus c^{(u)}_b\Bigr) = t^{(u)}_{b,b} \times (s_b \oplus x_b) \oplus 0 = 0 \times 1 = 0,$$

we know that only one of D and Y will be transmitted on dimension b. Similarly, only one of D'' and Y'' will be transmitted on dimension b. Combining the above analysis for D, D'', Y and Y'', suppose without loss of generality that D and Y'' are transmitted. The resulting distribution after step $u + 1$ is that D and Y are on node X while D'' and Y'' are on node S.

This is shown in Fig. 3(a), and it is the same as the imaginary transmission in which D and Y'' are transmitted on dimension b while D'' and Y are transmitted on dimension a, as shown in Fig. 3(c).

Situation 2. D and D'' are on node S, while Y and Y'' are on node X''.

As in situation 1, only one of D and D'' can be transmitted on dimension b and only one of Y and Y'' can be transmitted on dimension b; also, only one of D and Y can be transmitted on dimension b and only one of D'' and Y'' can be transmitted on dimension b. Again, we suppose that D and Y'' are transmitted on dimension b, in accordance with situation 1 above.

After D and Y'' are transmitted on dimension b, the resulting distribution after step $u + 1$ is that D'', D, Y, and Y'' are on nodes S, X, X'', and S'', respectively. This is shown in Fig. 3(b), and it is also the same as the imaginary transmission shown in Fig. 3.

Thus, as in subcase 1, the mapping between the destinations and the nodes is expressed by Eq. (5).

As we can see from lines 21 and 22, the algorithm modifies the transformation matrix R in accordance with Eq. (5).

The above reasoning shows that algorithm sequencing modifies the transformation matrices correctly, such that each modification represents a snapshot of the mapping between the destination and the node addresses. □

Notice that in the above proof we did not give a detailed transformation between $C^{(u)}$ and $C'^{(u)}$ in each step, for two reasons. First, it is simple. Secondly, it does not affect the routing sequence for the use of the n dimensions, no matter what the particular constant modifier is in each step. But we are sure of the following:
· If $r^{(u)}_{b,*} = e_{b,*}$, then we have $t^{(u)}_{b,*} = e_{b,*}$.
· When $r^{(u)}_{b,*} = t^{(u)}_{b,*} = e_{b,*}$ and $c'^{(u)}_b = 0$, we have $c^{(u)}_b = 0$.

Theorem 2. After the transmission through the sequence of the dimensions, any packet arrives at its destination.

Proof. After the routing, the mapping is expressed by $D = E \times S \oplus C^{(n)}$ with $c^{(n)}_j = 0$ for $0 \le j \le n-1$. Also, there is no unused dimension n, so D cannot be mapped by an nth neighbor of S. Thus we are sure that each node holds only one packet and that the node is exactly the destination of the packet. □

Theorem 3. Algorithm sequencing is valid in that aux and b can always be found in any step.

Proof. Suppose that u dimensions have been selected (and used for routing), where $0 \le u \le n-1$. From the above we can see that $T^{(u)}$ is nonsingular.

Suppose that at this step the active dimension is k. Referring to algorithm sequencing, the only case in which we need to select a dimension a from the unselected dimensions to be used as the imaginary dimension is when $t^{(u)}_{k,k} = 0$. We are sure that there is a nonzero element in row k and some column a such that a is an unselected dimension, for if not, then the kth row would be a linear combination of some rows of the selected dimensions and $T^{(u)}$ would be singular, a contradiction. Thus we can always find such an imaginary dimension a whenever we need to.

For the next active dimension b, when aux = n, the next active dimension can be selected arbitrarily from the unselected dimensions. When aux = a ≠ n, the algorithm just needs to find a nonzero element in column a. Since $t^{(u)}_{j,a} = 0$ for all $j \in$ Selected, we are sure that there is a b among the unselected dimensions such that $t^{(u)}_{b,a} = 1$, for if not, then the ath column of $T^{(u)}$ would be a zero column and $T^{(u)}$ would be singular, a contradiction. Thus we can always find an unselected dimension to be used as the next active dimension for routing. □

In fact, in the sequence finding process, whenever an active dimension is found which is the same as the last imaginary dimension (i.e. k = aux in line 11 of algorithm sequencing), we find a "cycle" of dimensions. If the number of dimensions in the cycle is m and the dimensions are $i_1, i_2, \ldots, i_m$, then all the numbers $d_{i_1} d_{i_2} \cdots d_{i_m}$ constitute a complete residue system modulo $2^m$. This cycle is similar in form to a cycle in Bit-Permute for BPC permutations [10]. Although this cycle does not behave like a cycle in the Bit-Permute function for BPC permutations, where the cycle maps a source address exactly to a destination, we can use this kind of cycle as an order for the purpose of routing.

The above theorem guarantees that the modification of the transformation matrices in algorithm sequencing represents "exactly to some extent" the mapping between the packets and the nodes. Thus a path is predetermined for any packet. In the circuit switched model, a path is set up for a packet and is kept for the whole transmission process of the packet. In the following, we prove that the paths set up for the N packets are independent of each other when the paths are chosen according to the order for the use of the dimensions generated by algorithm sequencing. The proof is simple and straightforward when Theorem 1 is used.

Theorem 4. No edge will be shared by any two paths while the packets are routed independently using the n dimensions in the order given in the array Seq generated by algorithm sequencing.


Proof. Let us call the path for a packet D path D.

We prove that when two paths D and D' meet at a node S, they never meet on any node S' where S' is a neighbor of S on any dimension.

Suppose that packets D and D' arrive at node S from the neighbors of S on dimensions k and k', and suppose Seq[x] = k and Seq[x'] = k'. Then we have

Case 1.

$$D = T^{(x)} \times S \oplus C^{(x)}, \qquad D' = T^{(x')} \times S \oplus C^{(x')}. \tag{6}$$

If $S \neq 0$, Eq. (6) can be true only when $x \neq x'$. Thus we know that $k \neq k'$. This means that $D$ and $D'$ arrive at this node from different dimensional links (or arrive at this node in different steps when routed in the packet switched model).

But if $S = 0$ and $x = x'$, then $D = D'$.

The above means that only one packet can arrive at a node from any dimensional link; thus only one path can use an edge on any one dimension.

Case 2.

$$D = T^{(x)} \times S \oplus C^{(x)}, \qquad D' = T^{(x')} \times S' \oplus C^{(x')}. \tag{7}$$

In this equation $S'$ is the neighbor of $S$ on some dimension $a$. If $x = x'$, then we know that $D'$ had already been on node $S$ before the time $D$ was coming from dimension $k$ ($k = Seq[x]$). It is only our imagination that $D'$ was sent to $S'$ through dimension $a$ while $D$ was sent to $S$ through dimension $k$ in step $x$.

This also means that only one packet can arrive at a node from a link on any dimension.

The above means that no two paths can share any edge, and N independent paths are set up for the N packets when the packets are routed independently using the n dimensions in the order generated by algorithm sequencing. □
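The property stated in Theorem 4 can be checked by brute force on small hypercubes. The Python sketch below is not algorithm sequencing: it takes a candidate order of dimensions as input and only tests whether any link is used twice. Several things are assumptions of the example: an address is an integer whose bit $i$ is coordinate $i$, the LC permutation is $D = T \times S \oplus C$ over GF(2), a link is a directed pair (from-node, to-node), and all function names are hypothetical.

```python
from itertools import permutations


def lc_destination(T, C, s, n):
    """Destination of source s under D = T x S xor C over GF(2)."""
    d = 0
    for i in range(n):
        bit = (C >> i) & 1
        for j in range(n):
            bit ^= T[i][j] & (s >> j) & 1
        d |= bit << i
    return d


def route_edges(s, d, seq):
    """Directed links crossed when dimensions are used in the order seq."""
    edges, cur = [], s
    for k in seq:
        # One comparison of two bits: bit k of the destination address
        # against bit k of the current node address.
        if ((cur >> k) & 1) != ((d >> k) & 1):
            nxt = cur ^ (1 << k)
            edges.append((cur, nxt))
            cur = nxt
    return edges


def is_contention_free(T, C, seq):
    """True if no directed link is used by two of the N paths."""
    n = len(T)
    used = set()
    for s in range(1 << n):
        for e in route_edges(s, lc_destination(T, C, s, n), seq):
            if e in used:
                return False
            used.add(e)
    return True


def contention_free_orders(T, C):
    """All dimension orders (by exhaustive search) that avoid contention."""
    n = len(T)
    return [list(p) for p in permutations(range(n))
            if is_contention_free(T, C, list(p))]


swap = [[0, 1],
        [1, 0]]          # exchanges the two address bits
print(is_contention_free(swap, 0b00, [0, 1]))
```

The exhaustive search in `contention_free_orders` costs $O(n!)$ checks and is meant only for verifying small cases; the point of algorithm sequencing is to obtain a suitable order in $O(n^3)$ time without such a search.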

5. Conclusions

We have developed an algorithm to find an order for the use of the $\log_2 N$ dimensions of a hypercube with N nodes such that when this order is used N independent paths will be established for LC permutations, while only one comparison of two bits is needed in each routing step. The order is decided without knowing a particular permutation. The N independent paths are all shortest paths. In the circuit switched and wormhole routing models, this guarantees that no contention can occur for the use of any link. The routing itself is oblivious and distributed in that in the routing process only a comparison of two bits from the destination address of the message packet and the address of the current node is needed, without requiring the knowledge of any other packet passing this node or any other nodes in the network.
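The local routing decision summarized above can be illustrated with a short Python sketch. It assumes that the two bits compared at a step for dimension $k$ are bit $k$ of the destination address and bit $k$ of the current node address; the names `forward` and `route` and the example order are hypothetical. Because each dimension is used at most once and only when the two bits differ, the number of hops equals the Hamming distance between source and destination, so the path is a shortest path.

```python
def forward(current, dest, k):
    """Decision at one node for dimension k, using only two address bits."""
    if ((current >> k) & 1) != ((dest >> k) & 1):
        return current ^ (1 << k)   # cross the dimension-k link
    return current                  # bit k already matches: stay


def route(src, dest, seq):
    """Nodes visited when the dimensions are used in the order seq."""
    path = [src]
    for k in seq:
        nxt = forward(path[-1], dest, k)
        if nxt != path[-1]:
            path.append(nxt)
    return path


path = route(0b001, 0b100, [2, 0, 1])
hops = len(path) - 1
# The hop count matches the Hamming distance of the two addresses.
assert hops == bin(0b001 ^ 0b100).count("1")
print(path)
```

Note that `forward` needs no information about other packets or other nodes, which is the sense in which the routing is oblivious and distributed.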


The main advantage of this method is that it needs no arbitration computation at routing time to decide which packet should go through which link, and any LC permutation can be realized in one pass. Since no contention will occur in the routing process, no policy is needed to resolve contention and no buffer is needed to hold any packet temporarily, so the hardware design of the router can be simplified. Thus this method is optimal in terms of both communication speed and hardware requirement.

LC permutations are used in various parallel computation jobs, and such permutation requirements are very frequent, so an efficient realization of them will speed up the whole computation process.

Acknowledgements

We thank Prof. F. Thomson Leighton and Prof. Antonio Fernández for the helpful discussions.

References

[1] R. Boppana, C.S. Raghavendra, On self routing in Benes and shuffle-exchange networks, Proceedings of the 1988 International Conference on Parallel Processing, August 1988, vol. I, pp. 196–200.

[2] R. Boppana, C.S. Raghavendra, Optimal self-routing of linear-complement permutations in hypercubes, Proceedings of the Fifth Distributed Memory Computing Conference, April 1990, pp. 800–808.

[3] J.M. Frailong, W. Jalby, J. Lenfant, XOR-schemes: a flexible data organization in parallel memories, Proceedings of the 1985 International Conference on Parallel Processing, August 1985, pp. 276–283.

[4] C. Ho, S. Johnsson, Optimal algorithms for stable dimension permutations on boolean cubes, Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications, January 1988, pp. 725–736.

[5] Q. Hu, X. Shen, W. Liang, Optimally routing LC permutations on k-extra-stage cube-type networks, IEEE Trans. Computers 45 (1) (1996) 97–103.

[6] F.T. Leighton, Introduction to Parallel Algorithms and Architectures: Arrays · Trees · Hypercubes, Morgan Kaufmann, San Mateo, CA, 1992.

[7] Z. Liu, J. You, X. Li, Conflict-free routing on hypercubes, Proceedings of the 1992 International Conference on Computer and Information, May 1992, pp. 153–158 (IEEE Version).

[8] Z. Liu, J. You, Conflict-free routing for BPC-permutations on synchronous hypercubes, Parallel Computing 19 (3) (1993) 323–342.

[9] D. Nassimi, S. Sahni, A self routing Benes network and parallel permutation algorithms, IEEE Trans. Computers C-30 (5) (1981) 332–340.

[10] D. Nassimi, S. Sahni, Optimal BPC permutations on a cube connected SIMD computer, IEEE Trans. Computers C-31 (4) (1982) 338–341.

[11] C.S. Raghavendra, M.A. Sridhar, Optimal routing of bit-permutes on hypercube machines, Proceedings of the 1990 International Conference on Parallel Processing, August 1990, vol. I, pp. 286–290.

[12] X. Shen, Optimal realization of any BPC permutation on k-extra-stage omega networks, IEEE Trans. Computers 44 (5) (1995) 714–719.

[13] A. Sengupta, K. Zemoudeh, Self-routing algorithms for strongly regular multistage interconnection networks, J. Parallel Distributed Computing 14 (2) (1992) 187–192.

[14] K. Zemoudeh, A. Sengupta, Routing frequently used bijections on hypercube, Proceedings of the Fifth Distributed Memory Computing Conference, April 1990, pp. 824–832.

