path switching and grading algorithms for advance channel reservation architectures

1684 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 17, NO. 5, OCTOBER 2009

Path Switching and Grading Algorithms for AdvanceChannel Reservation Architectures

Reuven Cohen, Niloofar Fazlollahi, and David Starobinski, Senior Member, IEEE

Abstract—As a result of perceived limitations of TCP/IP insupporting high-throughput applications, significant efforts haverecently been devoted to develop alternative architectures basedon the concept of advance channel reservation. In this paper, wedevelop a polynomial-time algorithmic framework, called gradedchannel reservation ��, to support the implementation of sucharchitectures. This framework enables users to specify minimumbandwidth and duration requirements for their connections. Uponreceiving a request, �� returns the highest graded path, selectedaccording to a general, multicriteria optimization objective. Inparticular, if the optimization criterion is delay, we prove that�� returns the earliest time available to establish the connection.Thereafter, we present a generalization of ��, called ��,that is capable of supporting path switching throughout a con-nection. We present practical methods for minimizing or limitingthe number of path switches. Through extensive simulations, weevaluate the performance of �� and its variants under varioustopological settings and applications workload. Our results showthat, for certain traffic parameters, optimized path selectioncombined with path switching can reduce the average delay ofrequests by an order of magnitude and increase the maximumsustainable load by as much as 50%.

Index Terms—Algorithms, performance evaluation, reservation,routing, scheduling, switching.

I. INTRODUCTION

N ETWORK backbones are often presumed to be overpro-visioned. Yet, the emergence of new applications with un-

precedented bandwidth requirements is likely to quickly changethe current state of affairs.

For instance, future grid applications will require transfer ofextremely large files between different national labs and re-search centers [1]. As a simple illustration, experiments run onthe new Large Hadron Collider (LHC) accelerator at CERN,Switzerland, are expected to generate a prodigious volume ofdata, reaching the order of ExaBytes ( bytes).This data will have to be transferred from CERN to various sites

Manuscript received May 25, 2007; revised June 05, 2008; approved byIEEE/ACM TRANSACTIONS ON NETWORKING Editor C. Qiao. First publishedAugust 07, 2009; current version published October 14, 2009. This workwas supported in part by the U.S. Department of Energy under ECPI GrantDE-FG02-04ER25605 and the U.S. National Science Foundation under GrantsANI-0132802 and CCF-0729158. A preliminary version of this paper appearedin the Proceedings of the IEEE Gridnets Workshop, San Jose, CA, October2006.

R. Cohen is with the Department of Mathematics, Bar-Ilan University,Ramat-Gan 52900, Israel.

N. Fazlollahi and D. Starobinski are with Boston University, Boston, MA02215 USA.

Digital Object Identifier 10.1109/TNET.2008.2010520

around the world for the purpose of storage, processing, andanalysis [2].

Needs for large file transfers are not confined to grid ap-plications. For instance, many corporations rely on distributedstorage area networks (SANs) to seamlessly perform variousinformation management functions such as data backup, mir-roring, and recovery [3]. To implement the above functions, dis-tributed SANs must support quick and reliable transfer of verylarge files between remote sites.

In order to meet the throughput and delay requirement of theabove applications, one must be able to make full use of the net-work backbone resources. Yet, recent experiments have shownthat the standard TCP/IP protocol stack may be inadequate forthis purpose. Indeed, it has been observed that in ultra high-speed networks, there is a large gap between the capacity of net-work links and the maximum end-to-end throughput achievedby TCP [4]. The major cause of this discrepancy is the sharednature of TCP/IP where e-mails and WWW traffic interfere withlarge file transfers [5].

As a result of the existing limitations of TCP/IP, significantefforts have recently been devoted to develop an alternative pro-tocol stack based on the concept of advance channel reservation[5]–[11] that is specifically tailored for large file transfers andother high-throughput applications. This protocol stack is notintended to replace TCP/IP, but rather complement it. The mostimportant property of advance channel reservation is to offerhosts and users the ability to reserve in advance dedicated chan-nels (paths) to connect their resources.

Advance channel reservation protocols are run directly on topof Layer 2 (SONET or Gigabit Ethernet) and thus bypass IP.They make it possible for users to send requests and specifyminimum bandwidth and duration requirements for their con-nection. Several testbeds, such as UltraScience Net [5] and OS-CARS [11], are testing possible implementations of such proto-cols.

At a first glance, advance channel reservation architecturesmay appear similar to standard circuit switching architectures.One of the major contributions in this paper is to show that thesetwo types of architecture do actually differ in some fundamentalways and that advance channel reservation architectures offera great deal of flexibility that can be exploited to significantlyimprove performance.

First, advance reservation architectures allow to schedule thestarting time of a connection so that general, multicriteria opti-mization objectives can be satisfied. We can thus perform pathgrading—that is, assign grades to paths according to a certainoptimization objective and then select the highest graded path.Examples of possible grading, as explained in the sequel, in-clude selecting the earliest path, shortest path, widest path, or a

1063-6692/$26.00 © 2009 IEEE

COHEN et al.: PATH SWITCHING AND GRADING ALGORITHMS FOR ADVANCE CHANNEL RESERVATION ARCHITECTURES 1685

combination of those, satisfying certain bandwidth and durationrequirements.

Second, advance reservation architectures do not restrict aconnection to use the same path over its entire duration. Thus,they make it possible to implement path switching, where a con-nection can switch between different physical paths throughoutits lifetime. As shown in the sequel, path switching can sub-stantially reduce the average delay until connection establish-ment as well as the blocking probability of requests. We notehere that all the existing advance reservation protocols and al-gorithms proposed so far in the literature (cf. Section II) allocatethe same path for the entire connection duration, similar to cir-cuit switching.

Guided by the above observations, we introduce in this papera new algorithmic framework for advance channel reservationcalled graded channel reservation that implements thepath-grading and path-switching design principles. enablesgrading paths so that the path with the highest grade is selected.For general optimization criteria, we prove that has a com-putational complexity that is polynomial in the size of the graphand the maximum number of pending requests at any time.

We extend so as to support path switching. Our gener-alized framework, called , satisfies the same prop-erties as , but also allows a connection to switch betweendifferent paths throughout its duration. Furthermore, we pro-pose a variant called that provably performs theminimum number of path switches needed while returning thehighest graded paths.

It should be mentioned that is assumed to be run in acentralized environment. This is a reasonable scenario in smallnetworks, which reflect the operations of current testbeds. Thealgorithms presented here can also be applied in a distributedmanner using a link-state mechanism in conjunction with anappropriate signalling protocol (cf. Section II).

Our simulations, run for various topologies and traffic param-eters, demonstrate the importance of judicious path grading. Forinstance, when the main grading criterion is finding the earliestpath available, then the use of a secondary criterion based onthe selection of the shortest path (when several earliest pathsare available) significantly improves performance.

Our simulations also reveal that path switching leads to majorperformance gain. In some cases, it reduces the average delay byup to an order of magnitude and increases maximum sustainableload by as much as 50%. Significant performance improvementis observed even if (for practical reasons) a limit is imposed onthe number of switching permitted throughout the duration of aconnection.

The rest of this paper is organized as follows. In Section II,we review related work on advance channel reservation. InSection III, we introduce the algorithmic framework, proveits main properties, and explain how traffic engineering mech-anisms, such as trunk reservation, can be used in conjunction.In Section IV, we introduce the concept of path switchingand analytically illustrate its benefit for a small network. Wethen show how path switching can be integrated into andintroduce the variant that minimizes the numberof path switches. In Section V, we present simulation resultsevaluating the performance of our algorithms under various

network topologies and traffic parameters. We conclude thepaper in Section VI.

II. RELATED WORK

The topic of advance resource reservation has received con-siderable attention in the literature, a great portion of whichconcentrates on the design of distributed signalling protocols[12]–[14]. For instance, [13] discusses possible modification toRSVP to support advance channel reservation.

Several papers have considered the problem of joint routingand scheduling of file transfers. In [16], resource reservationstrategies are analyzed for specific topologies (stars, trees, andtrees of rings). In [6], a scheduling algorithm is introduced forlarge file transfers over LambdaGrids for paths with varyingbandwidth. In [17], the problem of offline scheduling androuting of file transfers from several users, each storing mul-tiple files, to a single receiver node is analyzed. [7] considersa similar model, but also proposes algorithms to address theproblem of rescheduling connections that have not completed.[18] and [19] propose various load balancing approaches toallocate lightpaths.

As in our paper, [20] investigates formal approaches toadvance reservation and provides theoretical analysis of thecomplexity of path selection. The proposed algorithms in [20]for advance channel reservation resemble our framework inthe sense that they segment time and keep track of future linkresidual bandwidths. However, their segmentation assumesthat the time axis is discretized, thus leading to performanceloss. Our model does not have this limitation. In addition, noperformance analysis or simulation results are provided in[20], and path switching and multicriteria optimizations are notconsidered.

The most relevant work on advance resource reservation isthe algorithm currently implemented on the UltraScience Net,referred to as ALL-SLOTS [5]. To find a path, ALL-SLOTS im-plements a variant of the Floyd-Warshall based on a union/inter-section algebra instead of the standard min/ algebra. Becauseof the union operation, ALL-SLOTS needs sometimes to dis-card overlapping intervals. As a consequence, there is no guar-antee of finding a path with a desired property, e.g., the shortest,earliest, or widest. Moreover, the returned paths are kept fixedduring the entire connection; i.e., there is no path switching.

III. MODEL AND ALGORITHMS

A. Notation and Model

Our model consists of a general, directed network topologydenoted by , where is a set of nodes and isa set of links. Each request can be expressed by a tuple

, where is the source node,is the destination node, is the requested bandwidth that isheld fixed during the connection, is the requested commu-nication duration, and specifies a time window duringwhich the user wants the transmission to start. The parameters

and are optional, and if omitted, they can be interpretedas the arrival time of the request, denoted by , and as ,accordingly.


Fig. 1. Segmentation of time axis into slots delineated by events. During eachslot, the state of each link in the network is fixed.

The reply to a request is a tuple (or, as we will ex-plain later, a vector of tuples when path switching isallowed), where is the transmission starting time satisfying

, and is a path from to containing only linkswith residual bandwidth of at least . In the case that a requestcannot be served (which may happen if is greater than theminimum link capacity of every possible path between andor if no path can be established during the interval ), thenthe algorithm returns .

B. Basic Algorithm

We now introduce our algorithmic framework, called gradedchannel reservation . returns a time slot that can ac-commodate a connection path according to a certain optimiza-tion objective in response to a request. The path can be gradedaccording to any property of interest, such as connection starttime, path length,1 path width,2 or a combination of these. Inthis section, we illustrate the operation of this framework forthe case where the earliest completion time is desired. Thus, wefocus on finding the path allowing earliest task completion, inconjunction with other criteria, when more than one such pathexists. We also discuss how the algorithm can be easily modi-fied to handle different grading criteria.

To simplify exposition, we present the operation of forthe case where the graph is undirected. However, ourresults (as well as simulations in Section V) apply to directedgraphs. uses the following procedure: It divides the timeaxis into slots delineated by events, as shown in Fig. 1. Eachevent corresponds to a set up or tear down instance of a connec-tion. Therefore, during each time slot, the state of all links inthe network remains unchanged. In general, the time axis willconsist of time slots, where is a variable and slotcorresponds to time interval . Note that and

. We denote by the ordered listof events. Every time a request arrives, we update by setting

and discarding all elements .Let be the vector of available

bandwidths of all links at time slot , where ,and denotes the available bandwidth of link duringslot , with . We then define the bandwidth list

to be the ordered list of vectors of avail-able bandwidths at each time slot . Upon the arrivalof a request, we compute the largest value of , denoted by ,such that , i.e., .We then update by removing all the terms for whichis less than .

1Path length refers to the number of hops between source and destination orpossibly the total length of a weighted path.

2Path width is defined as the minimum available link bandwidth among alllinks in the path.

Fig. 2. With path switching, a connection lasting � � � h between nodesB and C can be established starting from slot 1. Without path switching, theconnection can be established starting from slot 3 only.

Suppose a user sends a request tuple . Wedefine as the remainder of after we omit all terms , suchthat or . If or are not already included in ,they should be appended to the beginning and to the end of listcorrespondingly. Similarly, we derive from by removingthe terms for which or , where is the largestvalue of such that .

This notation is illustrated in Fig. 2 for a four-node clique,where for each time slot, only the links with sufficient residualbandwidth, i.e., bandwidth greater than or equal to , areshown. The figure shows the duration of each slot; e.g., slot1 lasts from time 1:00 p.m. to time 2:30 p.m., slot 2 lastsfrom time 2:30 p.m. to time 3:00 p.m., etc. In this case,1:00, 2:30, 3:00, 5:00, 6:00, . Now, suppose 1:30 and

7:00; then, 1:30, 2:30, 3:00, 5:00, 6:00, 7:00 . Notethat in this example.

We now describe and its functions. The pseudocode ofthe main routine is as follows.

Algorithm :

If

.

.

.

Else,

.

The function returns the time, denoted by ,of the highest graded time slot in that can accommodate therequest . If no such slot is available it returns

, which means that no slot was found, and the requestis rejected. We will present an implementation ofwhere the highest graded time slot is defined as the earliest slot


in which a connection can be established. However, beforehand,we describe the other functions of .

If result is not , then calls thefunction . This function returns a path betweensource and destination starting at time . The pathis selected according to secondary optimization criteria, e.g.,shortest path, widest path, narrowest path, or a combination ofthese. Combination of path properties, such as shortest-widestpath, means priority is given to the shortest paths, but if mul-tiple shortest paths are found, we return the widest among those,whereas in the simple shortest path search, if multiple shortestpaths exist, then one is just picked at random.

The function is used to update and after a re-quest is allocated as follows. If the end time of the connection,

, is not already included in , then adds theevent to in the right position so as to maintain the increasingorder of elements of . After updating , it also updates bysubtracting the allocated bandwidth from the available band-width of all links included in path for all the time slots con-tained in the interval . If or were not included inpreviously, new elements should be added to accordingly.

We now describe the function. The pseudocodecan be presented as follows.

Function :

For in do:

.

.

If ,

.

Else,

.

can be explained by the following steps:1) For each slot in , calculate a by calling

a function . Specifically, the functionis used to give a grade to a

route starting at time .2) Find the maximum grade, call it ; i.e., the of the

route starting at time is maximum.3) If , it is possible to establish a path starting from

some slot in . Return the starting time .4) Else, no path is found for the connection duration, and no

connection can be started in between and . There-fore, the request will be rejected by returning .

Function grades slots. The grade of a slotmay be either a scalar or a vector, in which case comparison isconducted in a lexicographic order. When the grade is a scalar,only slots in which a connection can start have a grade greaterthan 1. The implementation of depends on thespecific optimization criterion. We next consider the case wherethe goal is to find the earliest time slot at which a connection canbe set up, using a scalar grade. The pseudocode is as follows.

Function :

while do

.

.

.

The operator stands for intersection between graphs, theresult of which is a subgraph that contains only links belongingto all the graphs. can be detailed as follows:

1) For each slot , construct a graph by removingfrom all the links with residual bandwidth less than .

2) Find an intersection of graphs for the smallest, such that the requested duration is satisfied—that is,

—and denote it by . Thus, each link inhas residual bandwidth greater than or equal to for all

the time slots from slot to slot .3) Perform a breadth first search (BFS) path discovery from

to on the graph using function .4) If one or more paths exist, function returns 1.5) Else, function returns 0.6) Grade for each slot is defined as

. Adding the exponential term results in a betterscore for earlier time slots when the score is a single realnumber. The exponential function can be replaced by anypositive, strictly decreasing function smaller than 1.

When is implemented as above, then theprocedure satisfies the following property.

Theorem 1: SlotSearch always returns the earliest time atwhich a path satisfying the requested bandwidth and connec-tion length can be established between nodes and .

Proof: We prove by contradiction, let denote the startingtime slot returned by . Suppose the intersection be-tween graphs , with , for and

contains a path between the source and destination. Since, the grade will be smaller than

corresponding to , which contradicts our assumption that isreturned by . Also, if the intersection between therelevant graphs, , contains no path between thesource and destination, then necessarily there is no possibilityto find a path satisfying the constraints starting at any time inthe interval . Therefore, we are sure that no path existsstarting at a time earlier than .

We note that if , then it is guaranteed thatwill always find a path (assuming that the

requested bandwidth does not exceed the link capacities).This is because all the links in the network are available in fullcapacity in the last time slot (slot ) and its length is infinite.

The following theorem states that has polynomial-timecomplexity. Specifically, denote by the maximum number ofpending requests at any time and the worst-case computa-tional complexity of the path search; then, we have the followingtheorem.


Theorem 2: has a computational complexity of.

Proof: Every new job starts with an existing event (or at). Therefore, it only adds at most one new future event

(at its end, unless it coincides with an existing event). Conse-quently, the number of future events is bounded by the numberof pending (or unfinished) jobs, .

Every execution of requires finding theintersection of at most different graphs, each having edgesrequiring at most operations, and then performs a BFSsearch, requiring another operations.is called at most times by , and then

is called, requiring another operations,leading to the result in the theorem statement.

As an example, consider the case where the search criterionof is the shortest path. Using a BFS procedure,computing the shortest path requires at most opera-tions. Thus, the computational complexity of in this case isjust . In fact, if the criterion is only based on findingthe earliest time slot available, and then performing a search forthe optimal path, the optimization presented in [20] can lead toa time complexity of by storing for each edge thenext time its capacity drops below the required bandwidth.

C. Other Grading Criteria

The proposed algorithmic framework can accommodateother grading policies. For instance, consider the selection ofthe shortest-earliest path from to . To this end, we denote thelength of the shortest path between nodes and by . Ifno path is found between and , then . We canthen define the grade at slot , as ,where is a constant used for assigning a higher weight topath length than connection start time. We note that this gradeis maximum for the shortest path between and , and if mul-tiple such paths are available, then the earliest one is returnedbecause of the exponential term. In that case, willreturn the slot such that . If ,then it will return .

D. Trunk Reservation

The literature is rich of mathematical theories to handlerouting and capacity allocation in circuit-switched networks(sometimes referred to also as loss networks) [21], [22]. A majorinsight from this literature is that greedy (myopic) policies maybe detrimental from a network’s point of view. Thus, one maywish to prioritize use of “efficient” routes, such as shortest-pathroutes. Trunk reservation is a well-known control mechanismto achieve such prioritization in a distributed fashion [23]. Foreach link with capacity , trunk reservation dedicates acertain fraction of the link capacity to the exclusive ofhigher priority paths, where .

Although advance channel reservation and circuit-switchednetworks differ in several aspects, it is reasonable to expect thatgreedy policies returning the earliest available path (which maynot be the shortest) could lead to inefficiencies. By tuning thethreshold parameter , trunk reservation allows to balancebetween the two extremes of earliest-shortest and shortest-ear-liest path optimizations.

Trunk reservation can easily be integrated into . Foreach source–destination pair , we first determine all theshortest-path links (i.e., links belonging to a shortest pathbetween and ). Then, whenever a connection reservationrequest between nodes and arrives, the available bandwidthon a shortest-path link during slot corresponds to the unusedbandwidth of that link (as in ). However, if link is anon-shortest-path link, then the available bandwidth on that linkis restricted to . Note that a link may bea shortest-path link with respect to a certain source–destinationpair but not to another.

This modified procedure, referred to as , possesses thesame properties as . Hence, it returns the earliest slot avail-able to establish a connection between nodes and under theconstraints of trunk reservation. Similarly, all the results pre-sented in the next section for path switching extend to the casewhere trunk reservation is implemented.

IV. PATH SWITCHING

A. Motivation

The algorithm described in the previous section returns asingle path for the entire connection duration. We observe,however, that it is possible to satisfy a request even if differentpaths are used at different times. We refer to this approach aspath switching. By relaxing the constraint of using the samepath over all time slots, significant performance improvementin terms of network utilization can be achieved.

Fig. 2 illustrates the benefits of path switching for a clique offour nodes. Suppose we are interested in setting up a connec-tion for h between nodes and . If we do not usepath switching, the earliest time slot to establish a path is slot 3,where the same path (e.g., the direct link between nodes and

) is available during time slots 3, 4, and 5. On the other hand,if path switching is allowed, then the connection can be set ear-lier, namely during slots 1, 2, and 3. In this case, a different pathwould be used at each of the time slots.

B. Algorithm

In this section, we explain how to integrate path switchinginto . The new framework, called , has similarstructure to , so we just describe the main modifications.

Function should be replaced with, which returns the earliest slot at

which a connection between a source and a destinationcan be set up. Note that the returned connection does notnecessarily use the same path over its entire duration. The mainmodification in is in calling a function

, which is detailed below.If does not return , then

calls , which finds aset of paths between the source and the destinationfor the connection duration starting at time . Specif-ically, returns a vector of tuples

, where is the totalnumber of path switches, are path switch in-stances within the interval , and denotes the selectedpath starting at time for . At each time slot, a


path is selected according to the desired secondary optimizationcriteria.

Finally, calls the function thatupdates and after a request is allocated. The list is up-dated the same way as done by the function . Regarding

, for each intervalsubtracts the allocated bandwidth from the residual band-width of all links included in the path .

We now explain . As before, onlyslots at which a connection can be initiated are given gradesgreater than 1, and a negative exponential function is added toassign higher grades to earlier slots.can be represented with the following pseudocode.

Function :

while do

.

.

.

The operator is a logical AND between outcomes of thefunctions The following steps explain how this function works:

1) For each slot , construct a graph by removingfrom all the links with residual bandwidth less than .

2) Find a minimum number such that the requested durationis satisfied, i.e., .

3) Perform a BFS path discovery from to on each graphusing function , where .

4) If there is a path (not necessarily the same) between thesource and destination in each time slot from slot to slot, then the variable is set to 1.

5) Else, the variable is set to 0.6) An exponential term is added to to assign

higher score to earlier slots.For the above implementation of , the following

property can be proved.Theorem 3: When path switching is permitted,

always returns the earliest available timeslot that can accommodate a connection between nodes and ,satisfying the requested bandwidth and connection length .

Proof: For any starting timewill fail (return a result less than 1) only if there is a time slotwithin where no appropriate path exists. Otherwise, itwill return a true result with grade decreasing with the startingtime.

The computational complexity of as presented is, but it can be further decreased to

by keeping in storage the next failure time similar to thenonswitching case.

C. Minimum Path Switching

From a practical perspective, a possible drawback of pathswitching is the need to perform many path establishments and

releases throughout the life of a connection. To address thisissue, we introduce a strategy, called minimum switching, thatguarantees that the number of switches performed during a con-nection is minimized while returning the highest graded timeslot to start the connection.

Our implementation, called , uses exactly thesame functions as , except for a new function called

that replaces . Thisfunction intersects the graphs for consecutive time slots (donein step 4b below) and checks whether there continuously existsa path over all these time slots (done at the beginning of step4). The algorithm selects a path that is available during themaximum possible number of consecutive time slots (if mul-tiple paths are available, then it selects one according to sec-ondary optimization criteria). At the moment a path used pre-viously becomes unavailable, the connection switches to a newpath (step 5).

The function uses a list, called Paths,to keep track of paths used during the connection. Every timethe algorithm decides that the connection must switch paths,the previous path together with its starting time are appendedto the list. A detailed description of is asfollows:

1) Initialize and Paths with andset as the slot corresponding to the start of the connection,i.e., .

2) Set , where is the graph obtained by removingall links with insufficient residual bandwidth for the requestat slot .

3) If , append to Paths and exit.4) If ,

a) set to be the highest graded path between andin the graph (the existence of at least one path inis guaranteed by the success of ).

b) Set , and return to step 3.5) Else, , append new path to

Paths, set , and return to step 3.We illustrate the behavior of for the configu-

ration shown in Fig. 2. As before, let’s consider a request ar-riving at 1:00 p.m. demanding a connection of h be-tween nodes and . As in , the connection starttime is 1:00 p.m. (slot 1). The intersection between theresidual graphs corresponding to slot 1, , and to slot 2, ,does not contain any path between to . Therefore, the path

is selected during slot 1 and is switched at the begin-ning of slot 2. Intersecting graphs and results into a singlepath during time slots 2 and 3, which the connec-tion can use until its completion. We note here thatwould have switched to the shorter path during slot 3.This example illustrates a tradeoff between optimizing path se-lection and minimizing the number of path switches.

The next theorem proves that indeed minimizesthe number of path switches.

Theorem 4: returns the earliest available timeslot to establish a connection. Furthermore, the returned con-nection makes a minimum number of path switches.

Proof: The fact that returns the earliest avail-able time slot follows immediately from Theorem 3 since it


uses the same function to find the first slotavailable.

Next, we prove by contradiction the minimality of pathswitches. We use the notation to denote theswitching times of the connections, except for andthat respectively represent the start and termination time ofthe connection. Assume that there exists another sequence oftimes , where , and , i.e., fewerpath switches. Since , at least one of the switches in theprimed sequence is conducted later than its equivalent in thereturned sequence. Therefore, there must be some first slot,

, such that and . Since a single pathexists between times and , it must also exist betweenthe times and (since ). However, the function

should have used this path for as longas it exists and should have returned time instead of , thusleading to a contradiction.

We conclude this section, by describing another heuristic fordecreasing the number of switches, called limited switching. Ac-cording to this heuristic, a connection is allowed to switch to abetter path as long as the number of switches does not exceeda certain predefined threshold. To implement this approach, wehave developed an algorithm, called , that limits thenumber of switches per connection to at most .can be considered as a mixture of and . It op-erates as follows. It starts with a slot in that contains at leastone path between the source and the destination and selects apath according to the desired optimization criterion. In the nextslot, it switches paths if the current path is no longer availableor a better path is found. This procedure continues as long asthe number of path switches does not exceed the limit . After

path switches, sticks to the last path found forthe rest of the connection. In the case that no path is available atone of the time slots, or if after switches the connection cannotcontinue with the current path, then the algorithm starts anothersearch for this connection, starting from the next time slot in thewindow specified by the user.

V. SIMULATION AND PERFORMANCE EVALUATION

A. Performance Measures

We have developed a simulation tool in C code to evaluate theperformance of the proposed algorithms. The main performancemetrics of interest are:

1) Average delay, which corresponds to the average timeelapsing from the point a request is placed until the con-nection actually starts.

2) Blocking probability, which is the probability of rejectionin the case users define a finite-length time window to setup the connection.

3) Maximum sustainable load, which corresponds to themaximum offered load (in terms of requests per unit oftime) that the network can sustain. When the offered loadexceeds the maximum sustainable load, then the averagedelay of requests becomes unbounded.

Fig. 3. Topologies for simulations. (a) 8 node, (b) 14 node LambdaRail.

In terms of network performance, algorithms with lower av-erage delay or blocking probability and higher maximum sus-tainable load are more desirable.

B. Simulation Parameters

The simulator allows evaluating our algorithms under varioustopological settings and traffic conditions. The main simulationparameters are as follows.

• Topology: Our simulator supports arbitrary topologies. Inmost of the simulations, we use the two topologies depictedin Fig. 3, namely, a fully connected graph of eight nodesand a topology that represents a superposition of the DoEUltraScience Net and the National LambdaRail testbeds[7], [24], [26]. Each link on these graphs is full-duplex andassumed to have a capacity of 20 Gb/s.We also report results obtained for random, directedgraphs, each with nodes and edge probability

, where represents the probability that a directededge exists between any given ordered pair of nodes. Thecapacity of each link is 20 Gb/s. Disconnected graphs arediscarded from the simulation results.

• Arrival process: We assume that the aggregated arrival ofrequests to the network forms a Poisson process. Our simu-lations are repeated for different arrival rates, also referredto as network load.

• Connection length: We assume that the requested connec-tion length , with mean , is distributed according to ei-ther of the following distributions:1) Exponential:

where and .2) Pareto:

where and .Without limitation of generality, for both of the abovemodels, the mean connection length is set to one time unitthat is defined as 1 h in our simulations. For the exponen-tial model, we set , and for the Pareto model, we set

and .


• Bandwidth: This parameter corresponds to the requestedbandwidth . We consider two models:1) Uniform: The bandwidth is uniformly distributed

among the integers 1 to 10 Gb/s.2) 80/20: Where 80% of the requests are for 1-Gb/s con-

nections, the remaining 20% are for 10-Gb/s connec-tions. This models the scenarios where most of theusers have access to 1-Gb/s links and some have ac-cess to 10-Gb/s links.

• Source: We again consider two models:1) Uniform: The source is chosen uniformly at random

among all the nodes.2) Hot-spot: One of the nodes (e.g., a host with a super-

computer) is more likely to be a source node than othernodes in the network. In our simulations, we assumethat the hot-spot node has a probability 50% to be se-lected. Otherwise, one of the other nodes is selecteduniformly at random.

• Destination: This is denoted by and is selected uniformlyat random among all the nodes (excluding the source).

• Blocking window: Users may specify a time windowduring which the connection should start. Therefore, if aconnection cannot be started during this window, it willbe rejected. A window is specified by the time interval

, where is the window start time that is set to(the arrival time of a request), and is the

window end time. We use , where repre-sents the window length.

All of the simulations are run for a total of at least requestsfor different values of network load. We note that it is difficultto determine the exact value of the maximum sustainable loadusing simulation. Thus, we define maximum sustainable loadas the network load at which the average delay starts exceeding24 h. Since the average delay curve increases very sharply withnetwork load around that value, we conjecture that the real valueof the maximum sustainable load is very close.

C. Simulation Results

We present simulation results illustrating the benefits of pathswitching, path grading, and trunk reservation.

1) Path Switching: In this section and the following, weassume that ; i.e., there is no blocking. Fig. 4 de-picts the average delay versus network load for (no switch)and (unlimited switching) for the eight-node cliquetopology. In both cases, path selection is based on the earliest-shortest optimization criterion.

It is apparent from the figure that switching improves boththe delay and the maximum sustainable load significantly. Forinstance, for the exponential model, the maximum sustainableload exceeds 150 requests/h with path switching, while it isslightly above 100 requests/h without path switching. Thus, pathswitching leads roughly to a 50% increase in the maximum net-work utilization achievable.

For the same topology and traffic parameters, Fig. 5 shows theperformance of various heuristics aimed at limiting the numberof path switches. The figure indicates that even if the number ofswitches is limited to a maximum of three, two, or one per con-nection, significant performance improvement can be achieved.

Fig. 4. Performance of �� with and without path switching for the eight-nodeclique topology. Distributions of source, destination, and requested bandwidthare uniform. Pareto and exponential distributions for connection length are con-sidered.

Fig. 5. Performance of �� with several path switching alternatives for theeight-node clique topology, i.e., ��

�� , and �� . Connection length is exponentially distributedwith mean 1 h. Source, destination, and requested bandwidth are distributeduniformly.

In the latter case, the maximum sustainable load exceeds 120 re-quests/h, about a 20% improvement compared to the case whereswitching is disabled. On the other hand, the minimum switchheuristic does not perform better than no switching at all. Theprobable reason is that minimum switching uses nonoptimalpaths that end up degrading performance (as shown by the ex-ample depicted in Fig. 2).

Fig. 6 compares the performance of with and withoutpath switching, for the LambdaRail topology. The results showthat the gain in terms of maximum sustainable load is not as sig-nificant as for the eight-node clique topology. The main reasonis that the LambdaRail topology is less dense (i.e., the graphhas fewer links). Thus, there are fewer alternative paths betweeneach source and destination that can be used for path switching.


Fig. 6. Performance of �� with and without path switching for the Lamb-daRail topology. Pareto and exponentially distributed connection lengths withmean 1 h are compared. Source, destination, and requested bandwidth are dis-tributed uniformly.

Fig. 7. Performance of ��

�� , and �� for the LambdaRail topology, with uniformsource and destination and 80/20 requested bandwidth distribution. Averagedelay decreases in the same order as the algorithms listed here. Connectionlength is exponentially distributed.

That being said, path switching still improves performance in anonnegligible fashion.

We have evaluated the performance of and its variationsfor other parameter settings as well. For instance, Fig. 7 showsthe performance of the different path switching heuristics whenthe requested bandwidth is distributed according to the 80/20model for the LambdaRail topology. Fig. 8 shows results for thehot-spot model. The results obtained are qualitatively similarto those earlier, wherein and alwaysimprove performance.

2) Multicriteria Path Optimization: The implementationof (and its variants) we used for the simulationsreturns the earliest available path. In addition, when severalearliest paths are available, allows performing optimization

Fig. 8. Performance of ��

��, and �� for the LambdaRail topology, for the hot-spotmodel (node 6 is the hot-spot), and uniform requested bandwidth distribution.Connection length is exponentially distributed.

of the path selection. Figs. 9 and 10 illustrate the importanceof such optimization for the eight-node and LambdaRailtopologies, respectively. Source, destination, and bandwidthare distributed uniformly. The figures show the performanceof using four types of path optimization. In thefirst three, if multiple earliest paths are found, the shortestone is selected. If several shortest paths are available, then theshortest-narrowest heuristic chooses the narrowest path amongthose, the shortest-random (or simply shortest) chooses one ofthe paths at random, and shortest-widest chooses the widest.3

Widest-shortest heuristic first selects the widest path among allthe earliest paths available. If multiple earliest-widest paths arefound, the shortest among those is selected.

From both figures, it is clear that the first three heuristics sig-nificantly outperform the fourth one; that is, selecting one ofthe shortest among all the earliest paths is a better strategy thanselecting one of the widest. The figures also show that a fur-ther optimization is not as essential; that is, choosing an ear-liest-shortest path at random is approximately as good as theother heuristics. Our results are consistent with those reported inthe QoS routing literature (see, e.g., [26]), where shortest-widestrouting is shown to outperform other routing strategies. Note,however, that our setting is different since standard QoS routingalgorithms do not support advance reservation of network re-sources.

3) Blocking Probability: We evaluate the blocking proba-bility for the case where users specify a finite-length window

, which ranges from 0 to 24 h. We estimate blocking prob-ability using simulation by computing the fraction of rejectedrequests.

Fig. 11 shows results for the LambdaRail topology at a fixedinput load and different choices of blocking window . Weobserve that switching reduces the blocking probability for allvalues of . Fig. 12 compares the blocking probability of

3Widest and narrowest refer to the path with the largest or smallest path band-width, respectively.


Fig. 9. Performance of �� with various multicriteria path op-timization in the eight-node clique topology, namely: widest-shortest,shortest-random, shortest-widest, and shortest-narrowest path optimizations.Source, destination, and bandwidth are uniformly distributed. Connectionlength is exponentially distributed.

Fig. 10. Performance of �� with various multicriteria path optimiza-tion in the LambdaRail topology, namely: widest-shortest, shortest-random,shortest-widest, and shortest-narrowest path optimizations. Source, destination,and bandwidth have uniform distribution. Connection length is exponentiallydistributed.

and for different loads and a fixed window lengthh. Again, switching improves performance at all ex-

amined values of network load. Similar results are reported inFig. 13 for random graphs, showing that our findings extend toother topologies.

4) Trunk Reservation: We next examine the benefits oftrunk reservation on the performance of , and

. We consider the eight-node topology wheresource, destination, and bandwidth demand are distributeduniformly. We set for each link , i.e., 4 Gb/s ofeach link’s capacity is reserved for shortest-path routes. In ourcase, there is only one shortest path route traversing each link,

Fig. 11. Performance of �� compared to �� in terms of blockingprobability for the LambdaRail topology. Connection window start time is sameas request arrival time. Blocking probability is measured for several differentwindow lengths 0, 1, 2, 3, 6, 12, and 24 h. Average network load is held fixed at35 requests/h, and distribution of source, destination, and bandwidth is uniform.Connection length follows the Pareto distribution.

Fig. 12. Performance of �� compared to �� in terms of blockingprobability for the LambdaRail topology with fixed connection window of2 h. Distribution of source, destination, and bandwidth is uniform. Connectionlength follows the Pareto distribution.

namely the one-hop route between the two nodes connected toeach side of the link.

The results in Fig. 14 show that in this scenario, trunk reser-vation leads to significant increase in the maximum sustainableload, on the order of 30% to 40% for and .The performance improvement for is also signifi-cant but less marked, probably because already per-forms well without trunk reservation.

It should be emphasized that trunk reservation requires propertuning of the threshold parameter . Consider, for instance,the same eight-node clique topology as before, but with onlyone source–destination pair generating traffic. In that case, it iseasy to see that one must set for all links (i.e., notrunk reservation) to achieve best performance.


Fig. 13. Performance of �� compared to �� in terms of blockingprobability for 15-node random graph topologies with edge probability � � ��.Distribution of source, destination, and bandwidth is uniform and fixed connec-tion window � � � h. Connection length follows the exponential distribution.Each point on the curves represents an average over 10 random graphs withidentical parameters.

Fig. 14. Average delay performance of �� withand without trunk reservation in the eight-node clique topology. Source, desti-nation, and bandwidth are uniformly distributed. Connection length is exponen-tially distributed. For trunk reservation, � �� for all links � .

Fig. 15 displays the blocking probabilities of andwith and without trunk reservation for random

graphs, under the same settings as Fig. 13. We setfor each link . Interestingly, the performance of im-proves when using trunk reservation, while that ofdegrades. This result shows that the choice of the threshold pa-rameter should not only be based on the topology and the trafficdemand, but also on the specific advance channel reservationalgorithm being used.

VI. CONCLUSION AND FUTURE WORK

In this work, we developed a novel algorithmic frameworkfor advance reservation based on the principles of path gradingand path switching, called . We explained how this frame-work can be used to find and grade paths according to a desiredoptimization criterion. If the optimization criterion is delay, weproved that returns the earliest time available to start the

Fig. 15. Comparison of blocking probabilities of ��,�� and�� for random graphs under the same settings as Fig. 13. For trunkreservation, � �� for all links � .

requested connection. Furthermore, we proved that the com-plexity of is polynomial in the size of the graph and thenumber of pending requests.

Thereafter, we presented a generalization of , called, which is capable of supporting path switching

throughout a connection. We showed that this algorithm re-tains the same properties as ; that is, it returns the earliestavailable time slot and its complexity is polynomial. Con-sidering practical issues of switching, we designed a variantcalled that provably minimizes the number ofpath switches needed during a connection and another variant,called , that limits the number of path switches toat most switches per connection.

Our simulation results, run for various topologies and trafficparameters, showed that can significantly improveperformance with respect to in terms of maximum sus-tainable load, average delay, and blocking probability. We ob-served that the greatest gain is achieved when the topology isdense, that is, when there are many alternatives for switchingpaths. The heuristic, while appealing from a the-oretical perspective, did not perform much better than . Thelikely reason is that paths returned by are subop-timal (i.e., not necessarily the shortest ones) and, thus, may con-sume considerable network resources. On the other hand, the

class of algorithms performed better than , evenwhen a single switch between paths is allowed during a connec-tion.

A profound insight of our results is that greedy routing poli-cies returning the earliest available time slot may be inefficient,especially when path switching is disabled or minimized, asin and . As such, we showed that nongreedypolicies based on trunk reservation, as , or limited pathswitching, as , may yield significant performanceimprovement. The main advantage of overis that it is not as sensitive to parameter tuning and appears tooutperform for any topology and traffic pattern, irrespectiveof the value of .

Another important and novel aspect of our algorithmicframework is to enable multicriteria path optimization. Our


simulations of showed that a secondary opti-mization in conjunction with the earliest path selection isbeneficial—i.e., choosing earliest-shortest paths is better thanother heuristics—but we observed that further optimizations,like earliest-shortest-widest path, is not essential.

We expect the findings of this work to open several new inter-esting lines of research. For instance, it would be interesting toassess and integrate the cost of path switching into the gradingfunction of and investigate the possible resulting tradeoffs.Initial experiments reported for the Lambda Station testbed [27]show that the impact of path switching on throughput is rela-tively limited. Statistical analysis of the gains achievable withpath switching is another area open for further work. Initial re-sults for a simple topology can be found in [28].

ACKNOWLEDGMENT

The authors would like to thank Dr. N. Rao for fruitful dis-cussions.

REFERENCES

[1] “GlobalGridForum” [Online]. Available: http://www.gridforum.org/[2] “Large Hadron Collider (LHC) project,” [Online]. Available: http://

lhc.web.cern.ch/lhc/[3] T. Jepsen, “The basics of reliable distributed storage networks,” IT Pro-

fessional, vol. 6, no. 3, pp. 18–24, May/Jun. 2004.[4] , N. S. Rao and W. R. Wing, Eds., “Network provisioning and protocols

for DOE large-science applications,” in Provisioning for Large-ScaleScience Applications. New-York, Argonne, IL: Springer, Apr. 2003.

[5] N. Rao, W. Wing, S. Carter, and Q. Wu, “UltraScience net: Networktestbed for large-scale science applications,” IEEE Commun. Mag., vol.43, no. 11, pp. S12–S17, Nov. 2005.

[6] H. Lee, M. Veeraraghavan, H. Li, and E. K. P. Chong, “Lambda sched-uling algorithm for file transfers on high-speed optical circuit,” in Proc.IEEE Int. Symp. CCGrid, Chicago, IL, Apr. 2004, pp. 617–624.

[7] A. Banerjee, W.-C. Feng, B. Mukherjee, and D. Ghosal, “Routing andscheduling large file transfers over Lambda grids,” presented at the 3rdInt. Workshop on PFLDnet, Lyon, France, Feb. 2005.

[8] “OptIPuter” [Online]. Available: http://www.optiputer.net/[9] R. Cohen, N. Fazlollahi, and D. Starobinski, “Graded channel reserva-

tion with path switching in ultra high capacity networks,” presented atthe ICST/IEEE Gridnets San Jose, CA, Oct. 2006.

[10] N. Fazlollahi, R. Cohen, and D. Starobinski, “On the capacity limits ofadvanced channel reservation architectures,” in Proc. IEEE INFOCOMHSN Workshop, Anchorage, AK, May 2007, pp. 60–62.

[11] C. Guok, D. Robertson, M. Thompson, J. Lee, B. Tierney, and W. John-ston, “Intra and interdomain circuit provisioning using the OSCARSreservation system,” presented at the ICST/IEEE Gridnets, San Jose,CA, Oct. 2006.

[12] S. Norden and J. Turner, “DRES: Network resource management usingdeferred reservations,” in Proc. IEEE Globecom, Nov. 2001, vol. 4, pp.2299–2303.

[13] A. Schill, S. Kuhn, and F. Breiter, “Resource reservation in advancein heterogeneous networks with partial ATM infrastructures,” in Proc.IEEE INFOCOM, Kobe, Japan, Apr. 1997, vol. 2, pp. 611–618.

[14] W. Reinhardt, “Advance reservation of network resources for multi-media applications,” in Proc. 2nd IWACACA, Heidelberg, Germany,Sep. 1994, pp. 23–33.

[15] W. Reinhardt, “Advance resource reservation and its impact on reser-vation protocols,” presented at the Broadband Islands, Dublin, Ireland,Sep. 1995.

[16] T. Erlebach, “Call admission control for advance reservation requestswith alternatives,” ETH, Zurich, Switzerland, Tech. Rep. 142, 2002.

[17] A. Banerjee, N. Singhal, J. Zhang, D. Ghosal, C.-N. Chuah, andB. Mukherjee, “A time-path scheduling problem (TPSP) for aggre-gating large data files from distributed databases using an opticalburst-switched network,” in Proc. ICC, Paris, France, 2004, vol. 3, pp.1569–1573.

[18] S. Figueira, N. Kaushik, S. Naiksatam, S. Chiappari, and N. Bhatnagar,“Advance reservation of lightpaths in optical-network based grids,”presented at the ICST/IEEE Gridnets, San Jose, CA, Oct. 2004.

[19] N. Kaushik and S. Figueira, “A dynamically adaptive hybrid algorithmfor scheduling lightpaths in lambda-grids,” in Proc. IEEE/ACM CC-GRID/GAN’05-Workshop on Grid and Adv. Netw., Cardiff, U.K., May2005, vol. 1, pp. 418–425.

[20] R. Guerin and A. Orda, “Networks with advance reservations: Therouting perspective,” in Proc. IEEE INFOCOM, Tel-Aviv, Israel, Mar.2000, pp. 118–127.

[21] F. Kelly, “Loss networks,” Ann. Appl. Probability, vol. 1, no. 3, pp.319–378, 1991.

[22] K. Ross, Multiservice Loss Models for Broadband TelecommunicationNetworks. New York: Springer-Verlag, 1995.

[23] F. P. Kelly, “Routing and capacity allocation in networks with trunkreservation,” Math. Oper. Res., vol. 15, pp. 771–793, 1990.

[24] “National LambdaRail Inc.” [Online]. Available: http://www.nlr.net/[25] “UltraScience Net” [Online]. Available: http://www.csm.ornl.gov/ul-

tranet/topology.html[26] Q. Ma and P. Steenkiste, “On path selection for traffic with bandwidth

guarantees,” in Proc. ICNP’97, Washington, DC, 1997, p. 191.[27] “Lambda Station Path Switching Experiment” [Online]. Available:

http://www.lambdastation.org/path-switching.html[28] R. Cohen, N. Fazlollahi, and D. Starobinski, “Path switching and

grading algorithms for advance channel reservation architectures,”Center for Inf. and Syst. Eng., Boston Univ., Tech. Rep. 2008-IR-0070,2008.

Reuven Cohen received the B.Sc. degree in physicsand computer science and the Ph.D. degree in physicsfrom Bar-Ilan University, Ramat-Gan, Israel.

He was a Post-Doctoral Fellow in the Departmentof Mathematics and Computer Science at the Weiz-mann Institute, Rehovot, Israel; the ECE Departmentat Boston University, Boston, MA; and the Depart-ment of Physics at Massachusetts Institute of Tech-nology, Cambridge. Since October 2007, he has beenat the Department of Mathematics at Bar-Ilan Univer-sity, where he is an Assisant Professor. His research

interests are random graphs, distributed algorithms, and network stability.

Niloofar Fazlollahi received the B.S. degree in elec-trical engineering from the Sharif University of Tech-nology, Tehran, Iran, in 2005.

Since 2005, she has been a Ph.D. student atBoston University, Boston, MA, under the supervi-sion of Prof. David Starobinski. Her current researchinterests are in the modeling and analysis of jointscheduling and routing schemes in ultra high-speednetworks.

David Starobinski (SM’07) received the Ph.D. de-gree in electrical engineering from the Technion—Is-rael Institute of Technology, Haifa, in 1999.

From 1999 to 2000, he was a Post-DoctoralResearcher in the EECS Department at University ofCalifornia, Berkeley. From 2007 to 2008, he was aninvited Professor at EPFL, Lausanne, Switzerland.Since September 2000, he has been at Boston Uni-versity, Boston, MA, where he is now an AssociateProfessor. His research interests are in the modelingand performance evaluation of high-speed, wireless,

and sensor networks.Dr. Starobinski received a CAREER Award from the U.S. National Science

Foundation and an Early Career Principal Investigator (ECPI) award from theU.S. Department of Energy.

path switching and grading algorithms for advance channel reservation architectures

Documents