

Identifying High Throughput Paths in 802.11 Mesh Networks: a Model-based Approach

Theodoros Salonidis, Thomson Paris Research Lab
Michele Garetto, Università di Torino, Italy
Amit Saha, Tropos Networks, CA
Edward Knightly, Rice University, Houston, TX

Abstract- We address the problem of identifying high throughput paths in 802.11 wireless mesh networks. We introduce an analytical model that accurately captures the 802.11 MAC protocol operation and predicts both throughput and delay of multi-hop flows under changing traffic load or routing decisions. The main idea is to characterize each link by the packet loss probability and by the fraction of busy time sensed by the link transmitter, and to capture both intra-flow and inter-flow interference. Our model reveals that the busy time fraction experienced by a node, a locally measurable quantity, is essential in finding maximum throughput paths. Furthermore, metrics that do not take this quantity into account can yield low throughput by routing over congested paths or by filtering out non-congested paths. Based on our analytical model, we propose a novel routing metric that can be used to discover high throughput paths in a congested network. Using city-wide mesh network topologies we demonstrate that our model-based metric can achieve significant performance gains with respect to existing metrics.

I. INTRODUCTION

Mesh networks offer inexpensive wireless coverage over large areas via use of wireless multi-hopping to wireline gateway nodes. Recently, cities are expanding use of mesh networks from public service and public safety to also include large-scale public broadband wireless access, potentially serving millions of users.¹ Such deployments will carry high amounts of traffic that will stress the 802.11 mesh backbone, thus causing unfairness and starvation, well known problems of the 802.11 CSMA protocol. Given this limitation, modeling and understanding 802.11 in conjunction with congestion control, traffic engineering and routing schemes is of paramount importance.

¹See Houston's RFP for example: www.houstontx.gov/it/wirelessrfp.html.

In this paper, we address the problem of identifying high throughput paths in an 802.11 mesh network by introducing a model that accurately predicts throughput and delay of multi-hop flows under fixed or changing traffic conditions. Existing models for 802.11 mesh networks focus on predicting throughput of a set of single-hop, single-receiver flows [6], [11], [13], [14]. A recent paper [10] proposes an analytical approach to estimate the end-to-end throughput over a single path, which is limited to the case of nodes having a single receiver and requires a rather complex, centralized computation that cannot be translated into a routing protocol. In contrast to [10], our model applies to arbitrary traffic matrices and yields a routing metric that can be easily incorporated into an efficient routing protocol to discover the optimal path. In [15] the authors propose an admission control scheme for flows in a single-channel, multi-hop network based on knowledge of both local


resources at a node and the effect of admitting the new flow on neighboring nodes. In contrast to [15], our approach is based on a more precise mathematical model of the behavior of 802.11, and can efficiently discover the path providing the largest bandwidth to the new flow.

Our model expresses the throughput and delay of each link of a multi-hop flow as a function of (i) average input rate, (ii) the fraction of busy time carrier-sensed by the link transmitter and (iii) the packet loss probability experienced on the link, all locally measurable with zero or minimal communication overhead. To model changing traffic conditions, we introduce a two-step technique to estimate available path bandwidth, defined as the maximum additional rate a flow can push before saturating its path. The first step computes the capacity of each link in the path using the busy time fraction and packet loss probability as a summary of the interference caused by other links outside the path. In the second step, the link capacities are coupled with a clique-based computation that captures interference of links within the path. We use the topology of a city-wide mesh network deployed in Chaska, MN to demonstrate the effectiveness of our model in accurately predicting throughput and delay under existing traffic conditions and in estimating available path bandwidth.

Armed with the ability to estimate available bandwidth over a single path, we use our model to study the ability of routing protocols' link-cost metrics to discover high throughput paths. Several mesh routing metrics have been proposed that take into account packet loss probability [7], [8], data transmission rates [4], and multi-channel, multi-radio capabilities [9], [16]. These metrics have been demonstrated to find higher throughput paths than minimum-hop metrics. However, their performance has never been investigated under congested conditions that naturally arise in gateway-centric mesh networks.

To address this question, we use our model and a series of experiments that gradually evolve from single-link to full-scale city-wide topologies. We find that all existing routing metrics are highly sensitive to traffic load and detect congestion through the packet loss probability. However, we show that packet loss does not always provide accurate information and can result in low-throughput routing decisions, either by selecting congested paths or by filtering out non-congested paths. Instead, the busy time fraction is an additional factor essential to discover high throughput paths, especially under congested conditions.

We introduce a new available bandwidth metric that can be combined with a source-route link-state routing protocol to directly compute the path providing the highest throughput. Our metric takes into account intra-flow and inter-flow


interference using the fraction of busy time as well as packet loss. We compare its performance with existing loss-based metrics in both the Chaska topology and a regular Manhattan network topology, under various traffic scenarios. Our experiments show that our proposed metric based on direct estimation of available bandwidth can yield high gains, being most effective in planned mesh networks of regular topology that provide several paths through spatial reuse.

The paper is organized as follows: We introduce the analytical model in Section II. In Section III, we evaluate the model's accuracy and introduce and evaluate the available bandwidth estimation technique. In Section IV, we investigate the ability of existing routing metrics to discover high throughput paths. In Section V, we compare model-based metrics for available bandwidth estimation to existing loss-based metrics. Section VI concludes this paper.

II. ANALYTICAL MODEL

In [11], [12] we introduced a general decoupling technique to analyze the behavior of each node in an 802.11 network with arbitrary topology. One important limitation of our previous work is that we limited ourselves to the case in which each source sends traffic to a single neighboring node. Since this case allows us to analyze only particular traffic patterns, we now extend the analysis to the general case in which a node transmits to multiple neighbors. Notice that we use the term 'node' to refer to one network interface, i.e., one instance of the 802.11 MAC protocol which is shared by all flows passing through the node. We first review in Section II-A the modeling framework introduced in [11], [12]. The extension of the analysis to the case of multiple receivers is described in Section II-B.

A. Review of modeling framework

In 802.11, the behavior of a node is determined by what it senses on the channel, i.e., by the occupation of the 'air' around it in the frequency spectrum used. In a wireless mesh network, the state of the channel can be perceived differently by different nodes, because not all of them are in radio range of each other. As a consequence, existing techniques developed in the so-called 'single cell' case (usually based on the classic analysis of [5]) are not applicable, and one has to consider the private channel view of each node.

Fig. 1. Example of the evolution of the channel state perceived by a node

The evolution of the channel state experienced by a node can be described as a renewal process with four different states, as illustrated in the example of Fig. 1. The four states are: (i) idle channel; (ii) channel occupied by a successful transmission of the node; (iii) channel occupied by a collision of the node; (iv) busy channel due to activity of neighboring nodes, detected by means of either physical or virtual carrier sensing (the NAV). The time intervals during which the station remains in each of the four states above are denoted by σ, T_s, T_c, and T_b, respectively. While σ is constant, equal to one backoff slot, the duration of the other intervals can be variable (with general distribution), depending on the access mechanism (basic access or RTS/CTS), the frame size, and the sending rate of the transmitting station(s). T_s, T_c, and T_b all include a deterministic idle slot at the end (see [12] for details). Let Π_σ, Π_s, Π_c, Π_b be, respectively, the occurrence probabilities of the four states described above. To compute these probabilities, we need to specify the events that can occur after an idle slot has elapsed. Let τ be the probability that the backoff counter of the node reaches zero after an idle slot; let e be the probability that, when the backoff counter reaches zero, the transmission queue is empty; let p be the probability that a transmission of the station is not successful; at last, let b be the probability that, if the station does not transmit after an idle slot, the channel becomes busy because of the activity of other nodes. Then we can express the occurrence probabilities of the four channel states as follows:

Π_s = τ(1−p)(1−e),   Π_c = τ p (1−e),   Π_σ = [(1−τ) + τe](1−b),   Π_b = [(1−τ) + τe] b

Using standard renewal-reward theory, the throughput of the node (expressed in packet/s) is given by

T = τ(1−p)(1−e) / (Π_s T_s + Π_c T_c + Π_σ σ + Π_b T_b)    (1)

Now, the probability τ is a deterministic function of p, which depends only on backoff parameters such as the window size, the number of backoff stages, etc. The complete expression of τ for 802.11, which takes into account the maximum retransmission limit jointly with the maximum window size, is given by

τ = 2q(1−p^(m+1)) / { q(1−p^(m+1)) + W_0 [1 − p − p(2p)^(m′) (1 + p^(m−m′) q)] }    (2)

where q = 1−2p, W_0 is the minimum window size, m is the maximum retry limit, and m′ is the backoff stage at which the window size reaches its maximum value, m′ ≤ m.
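For concreteness, the following Python sketch evaluates equations (1) and (2) numerically. All parameter values below (W0, m, m′ and the interval durations σ, T_s, T_c, T_b) are illustrative placeholders, not values taken from the paper.

def tau(p, W0=32, m=7, m_prime=5):
    """Transmission probability as a function of the conditional loss
    probability p, per equation (2)."""
    q = 1.0 - 2.0 * p
    pm1 = p ** (m + 1)
    num = 2.0 * q * (1.0 - pm1)
    den = q * (1.0 - pm1) + W0 * (
        1.0 - p - p * (2.0 * p) ** m_prime * (1.0 + p ** (m - m_prime) * q))
    return num / den

def node_throughput(p, e, b, sigma=20e-6, Ts=1.2e-3, Tc=1.3e-3, Tb=1.0e-3):
    """Per-node throughput in packet/s, per equation (1); durations in s."""
    t = tau(p)
    Pi_s = t * (1.0 - p) * (1.0 - e)            # successful transmission
    Pi_c = t * p * (1.0 - e)                    # collision
    Pi_idle = ((1.0 - t) + t * e) * (1.0 - b)   # idle slot
    Pi_b = ((1.0 - t) + t * e) * b              # busy period
    cycle = Pi_s * Ts + Pi_c * Tc + Pi_idle * sigma + Pi_b * Tb
    return t * (1.0 - p) * (1.0 - e) / cycle

As a sanity check, with p = 0 the expression for τ reduces to 2/(W_0 + 1), the familiar saturation value for a single backoff stage.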

The average durations T_s and T_c of a successful transmission or of a collision in which the station is involved can be computed a priori (see [5]), depending only on the distributions of packet sizes and data rates. It turns out that the only unknown variables in Equation (1) are: i) the occurrence probability b of a busy period, and its average duration T_b; ii) p, the conditional packet loss probability; iii) e, the conditional probability of empty buffer.

The value of e depends on the traffic load of the node. The values of b, T_b and p depend on the interaction of the node with the rest of the network. In [11] we described an iterative technique to compute these quantities, which allows us to solve for the entire network and thus predict analytically the stationary behavior of each node, including its throughput. In particular, we introduced a methodology to evaluate the fraction of time f_B during which the channel is sensed busy, as well as the average duration T_b of a busy period. The value of f_B is related to our model through the following expression

f_B = Π_b T_b / (Π_s T_s + Π_c T_c + Π_σ σ + Π_b T_b)    (3)
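Continuing the sketch above (and reusing its tau() helper and illustrative durations), equation (3) maps the channel-state probabilities to the busy-time fraction:

def busy_fraction(p, e, b, sigma=20e-6, Ts=1.2e-3, Tc=1.3e-3, Tb=1.0e-3):
    """Busy time fraction f_B per equation (3)."""
    t = tau(p)
    Pi_s = t * (1.0 - p) * (1.0 - e)
    Pi_c = t * p * (1.0 - e)
    Pi_idle = ((1.0 - t) + t * e) * (1.0 - b)
    Pi_b = ((1.0 - t) + t * e) * b
    return Pi_b * Tb / (Pi_s * Ts + Pi_c * Tc + Pi_idle * sigma + Pi_b * Tb)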

Moreover, we proposed a technique to compute the packet loss probability p on the (single) link used by the node. In this paper we are not concerned with the analytical computation of f_B, T_b and p; the interested reader is referred to [11]. Indeed, in this work we assume that f_B, T_b and p are directly measured by each node through a combination of active and


passive measurements. For example, p could be measured by a node either by sending broadcast probes or by keeping track of the number of retries associated with transmitted data packets.


B. Extension to multiple receivers

The extension of the model to the case of a node having multiple receivers has to take into account the fact that each outgoing link can have a different quality, which depends on the specific packet loss probability experienced by packets over that link. Notice that the MAC buffer is shared by all packets passing through the node, according to a FIFO queueing discipline. This can cause a head-of-line (HOL) blocking effect in case of heterogeneous outgoing links. In particular, a single link with poor quality can compromise the performance of many high quality links.

The main idea behind our modeling approach is to reduce the analysis of a node having multiple receivers to the case of a single outgoing link, by defining a properly weighted node average transmission probability τ̄ and node packet loss probability p̄, and basically reuse equations (1) and (3). At the same time, we model the dynamics of the shared MAC as a virtual server with a given average service capacity.

Let n_L be the number of links used by the node to forward its traffic. Each link is associated with a packet loss probability p_i, 1 ≤ i ≤ n_L, which maps into a per-link transmission probability τ_i by (2). Let λ_i be the arrival rate of packets destined to link i, and λ = Σ_i λ_i the total arrival rate of packets to the node. We introduce the weight w_i = λ_i/λ, 1 ≤ i ≤ n_L, which is equal to the probability that a generic packet to be served by the MAC has to go over link i. We also introduce the service rate μ_i of link i, and the aggregate service rate of the node μ = Σ_{i=1}^{n_L} μ_i.

We model the MAC buffer using a simple M/M/1/B model, where B denotes the buffer capacity expressed in packets. The traffic intensity at the queue is ρ = λ/μ. The probability that the queue is empty is π_0 = (1−ρ)/(1−ρ^(B+1)). We chose this queue model because it provides simple closed-form expressions for the finite buffer case, and because we are mostly interested in the performance of links close to saturation, where the exact details of the arrival and departure processes at the queue are not important, and the exponential assumption provides accurate enough predictions.
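The M/M/1/B quantities used here are one-liners; a minimal sketch follows (the explicit handling of ρ = 1 is the standard limit of the formula, added for numerical robustness):

def mm1b_empty_prob(lam, mu, B):
    """pi_0 = (1 - rho) / (1 - rho^(B+1)), the empty-queue probability."""
    rho = lam / mu
    if abs(rho - 1.0) < 1e-12:
        return 1.0 / (B + 1)          # limit of the formula as rho -> 1
    return (1.0 - rho) / (1.0 - rho ** (B + 1))

def mm1b_overflow_prob(lam, mu, B):
    """Probability that an arriving packet finds the buffer full."""
    rho = lam / mu
    return rho ** B * mm1b_empty_prob(lam, mu, B)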

To derive the node service rate μ, we put e = 0, i.e., we assume that the queue is not empty. We then compute the average number c_i of channel intervals (i.e., any of the four channel intervals described in Section II-A) required to serve a packet belonging to link i. We have

c_i = Σ_{j=0}^{m} p_i^j [(W_j − 1)/2 + 1]

where W_j is the window size at backoff stage j. Using the quantities c_i, we can compute the probability s_i that a generic channel slot is part of the service time of a packet belonging to link i: s_i = w_i c_i / Σ_{j=1}^{n_L} w_j c_j. The average transmission probability of the node is τ̄ = Σ_{i=1}^{n_L} s_i τ_i, and the average packet loss probability of the node is p̄ = (Σ_{i=1}^{n_L} s_i τ_i p_i) / τ̄. Let Δ be the average duration of a time slot when the queue is not empty. We have

Δ = τ̄(1−p̄) T_s + τ̄ p̄ T_c + (1−τ̄)(1−b) σ + (1−τ̄) b T_b

The service rate μ_i over link i is the sum of two contributions, μ_i = μ_i^T + μ_i^D. The first is the rate μ_i^T at which packets are transmitted successfully, given by

μ_i^T = s_i τ_i (1−p_i) / Δ

The second is the rate μ_i^D at which packets are discarded by the MAC due to the maximum retry limit, given by

μ_i^D = s_i τ_i (1−p_i) p_i^(m+1) / [Δ (1−p_i^(m+1))]

Let μ^T = Σ_{i=1}^{n_L} μ_i^T and μ^D = Σ_{i=1}^{n_L} μ_i^D. The node aggregate service rate is finally given by μ = μ^T + μ^D. Notice that μ, μ^T and μ^D can be expressed as functions of the only unknown variable b.
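The per-link service-rate computation of this subsection fits in a few lines of Python. The sketch below reuses tau() from the earlier sketch; the expression for c_i (average backoff plus one transmission attempt per stage, weighted by the probability p_i^j of reaching stage j) is our plausible reading of the garbled original, and the doubling window schedule W_j is an illustrative assumption.

def service_rates(p, lam, b, sigma, Ts, Tc, Tb, m=7, W=None):
    """Per-link and aggregate service rates of Section II-B.
    p, lam: per-link loss probabilities and arrival rates (example inputs)."""
    if W is None:                        # assumed doubling window schedule
        W = [min(32 * 2 ** j, 1024) for j in range(m + 1)]
    n = len(p)
    taus = [tau(pi) for pi in p]         # per-link tau_i via equation (2)
    lam_tot = sum(lam)
    w = [l / lam_tot for l in lam]       # weights w_i = lam_i / lam
    # c_i: average number of channel intervals to serve a link-i packet
    c = [sum(pi ** j * ((W[j] - 1) / 2.0 + 1.0) for j in range(m + 1))
         for pi in p]
    norm = sum(w[j] * c[j] for j in range(n))
    s = [w[i] * c[i] / norm for i in range(n)]
    tau_bar = sum(s[i] * taus[i] for i in range(n))
    p_bar = sum(s[i] * taus[i] * p[i] for i in range(n)) / tau_bar
    Delta = (tau_bar * (1 - p_bar) * Ts + tau_bar * p_bar * Tc
             + (1 - tau_bar) * (1 - b) * sigma + (1 - tau_bar) * b * Tb)
    mu_T = [s[i] * taus[i] * (1 - p[i]) / Delta for i in range(n)]
    mu_D = [mu_T[i] * p[i] ** (m + 1) / (1 - p[i] ** (m + 1)) for i in range(n)]
    mu_i = [mu_T[i] + mu_D[i] for i in range(n)]
    return mu_i, sum(mu_T), sum(mu_i)    # per-link mu_i, mu^T, mu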

Now, the aggregate node throughput can be evaluated either as T_p = μ^T (1−π_0), which is a function h(b) of the only unknown b, or from equation (1), where τ and p are substituted by τ̄ and p̄, respectively. This latter expression requires knowledge of both b and e. We can thus form a system of two equations in the two unknowns b and e:

τ̄(1−p̄)(1−e) / (Π_s T_s + Π_c T_c + Π_σ σ + Π_b T_b) = h(b)

Π_b T_b / (Π_s T_s + Π_c T_c + Π_σ σ + Π_b T_b) = f_B

which can be solved numerically. Notice that, in the above two equations, the channel state probabilities Π_σ, Π_s, Π_c, Π_b are analogous to those introduced in Section II-A for the case of a single outgoing link, this time using the node average probabilities τ̄ and p̄.

Once b and e are known, we can evaluate T_p and the individual link throughputs T_{p,i} as

T_{p,i} = T_p · s_i τ_i (1−p_i) / Σ_{j=1}^{n_L} s_j τ_j (1−p_j),   1 ≤ i ≤ n_L    (4)

Standard equations of the M/M/1/B queue model allow us to also compute the buffer overflow probability, the average number of packets in the queue, and the average queuing delay Q. To compute the average queuing delay Q_i experienced by a packet destined to link i, we add the average time spent in the queue before service (this component is common to all packets, and equal to Q − 1/μ) to the specific service time of a packet sent over link i, obtaining Q_i = Q − 1/μ + 1/μ_i.
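To make the bookkeeping concrete, here is a minimal sketch of equation (4) and the delay split; all inputs are example values of the kind the solved model would produce.

def per_link_throughput(Tp, s, taus, p):
    """Split the aggregate throughput Tp across links, per equation (4)."""
    shares = [s[i] * taus[i] * (1.0 - p[i]) for i in range(len(p))]
    total = sum(shares)
    return [Tp * sh / total for sh in shares]

def per_link_delay(Q, mu, mu_i):
    """Q_i = Q - 1/mu + 1/mu_i: common waiting time plus the
    link-specific service time."""
    return Q - 1.0 / mu + 1.0 / mu_i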

III. MODEL VALIDATION

To validate our analytical model and demonstrate how it can be exploited to predict the end-to-end throughput and delay of multi-hop flows, we proceed in two steps. First, we show that the model accurately predicts individual link behavior under given traffic conditions. Second, we show how the model can be used to predict the performance of a new flow to be inserted on a given multi-hop path, i.e., under a hypothetical change in traffic conditions. In both steps of the validation, we directly measure via simulation the fraction of time f_B sensed busy by each node, and the packet loss probability p_i on each link. These quantities are then fed into the model


outlined in Section II. We first describe how the measurement of f_B and p_i can be performed locally by the nodes at low communication overhead, then we validate the model under current traffic conditions, and finally we assess the accuracy of the performance prediction of multi-hop flows under changing traffic conditions.

In all experiments we have used the ns-2 [2] simulator with wireless extensions. These extensions to ns-2 model channel contention, packet collision, channel capture and backoff, based on the specifications of IEEE 802.11. The radio propagation model uses the two-ray ground reflection path loss model for large scale propagation and a Ricean fading model for small scale propagation. Apart from using the standard IEEE 802.11b parameters, we used 11 Mb/s data rate, 30-packet node buffer size, 150 m maximum transmission range, 212 m maximum carrier sense range, and turned off RTS/CTS in all simulations.

A. Estimation of f_B and p

The fraction of busy time f_B provides a measure of the amount of air-time sensed busy by the node due to the activity of other nodes in its sensing range. Therefore, the estimation of f_B has to be done per-node, not per-link. In principle, the computation of f_B can be done through a passive measurement technique, thereby incurring zero communication overhead. The value of f_B can be very well approximated by the fraction of time that the activity of a node is suspended because the NAV timer is pending. Although this computation is simple to implement, current network-card drivers do not allow access to the NAV variable. Regardless, we assume that f_B can be computed locally with a more accessible device driver.

Estimation of the packet loss probability has to be done per-link, and requires active measurements in case a link is not currently used. Active measurement of link loss probability can be performed using either unicast or broadcast probes sent at low rate by the node, so as to minimize communication overhead. Unicast packets yield more accurate prediction with higher overhead in case a node has to monitor the quality of several links. Broadcast probe packets, originally proposed in [7], can be used to simultaneously measure the packet loss probability over all links, resulting in reduced communication overhead at the expense of decreased accuracy (broadcast probes have to be sent at the lowest data rate, thus can be subject to a different packet loss probability than unicast data packets). In our ns-2 implementation, we use broadcast probes inserted at the head of the MAC queue to avoid undesired queuing delays. In practice, this can be accomplished by using the traffic classes provided by the 802.11e standard, as suggested in [9]. We also augment our measurement by counting the number of retries required for data packets being transmitted on each link. Existing device drivers such as MadWifi enable computation of this quantity.
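As one possible realization of the retry-based measurement, the estimator below treats every unacknowledged transmission attempt as a loss event. The counter names are hypothetical stand-ins for the per-packet retry statistics that drivers such as MadWifi expose.

def estimate_loss_prob(tx_attempts, tx_successes):
    """Fraction of MAC transmission attempts that were not acknowledged.
    tx_attempts counts first transmissions plus retries."""
    if tx_attempts == 0:
        return 0.0
    return (tx_attempts - tx_successes) / tx_attempts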

B. Link estimation under current traffic conditions

We now show that our model can predict with a high level of accuracy the performance of individual links under static traffic conditions, provided that the values of f_B and p are known. Notice that a node can estimate the performance of each of its outgoing links locally based on its measurements of f_B and p, i.e., there is no need to exchange information between neighboring nodes.

We consider the topology of the residential high-speed wireless mesh network commercialized by Chaska.net [1], which is reported in Figure 2. The network is comprised of 194 APs and we select 14 of these nodes as gateways.

Fig. 2. Topology of the Chaska wireless mesh network: (a) Chaska coverage map; (b) extracted Chaska topology. The coverage map (left) is taken from [1]. On the extracted topology (right), square nodes denote gateways, while solid (dashed) edges connect nodes which are in transmission (sensing) range.

We first consider the case of downstream traffic alone. We assume that all gateways are saturated, and that the arrival rate of packets destined to each Access Point associated with a given gateway is the same. More specifically, the amount of application data arriving at each gateway is 5.6 Mb/s (large enough to saturate the queues of all gateways), equally divided among the APs. The payload of each data packet is constant, equal to 1000 bytes.

Fig. 3. Comparison between simulation and analysis for the throughput (left) and queuing delay (right) of each link, in case of saturated downstream traffic, with all flows active

Figure 3 depicts results for the case that all downstream flows are active (i.e., all APs receive traffic from the gateway they are connected to). We compare model and simulation results for the throughput and queuing delay on each traversed link in the network. The total amount of traffic successfully delivered to the APs is 31.46 Mb/s. We have also considered the case in which only half of the flows (randomly chosen) are active and found similar results.

We observe a good match between model and simulation results for the vast majority of links. Notice that, in Figure 3, in the case of downstream traffic, only a few links are congested (those directly attached to a gateway), while links that are



a few hops away from the gateway are lightly loaded and correspond to the clouds of dots with delay around 10 ms. Congested links exhibit, instead, a queuing delay in the order of hundreds of milliseconds. We argue that, because of the use of a simple M/M/1/B queue model, the delay prediction is not very accurate for lightly loaded links, whereas the delay over the most critical, congested links is accurately estimated by the model.

Fig. 4. Comparison between simulation and analysis for the throughput (left) and queueing delay (right) of each link, in case of aggregate upstream traffic per gateway of 2.4 Mb/s, with all flows active

Next, we consider the more critical case of upstream traffic. Here, several congested points may arise as traffic is pushed from the APs to the gateways. We assume that the aggregate amount of data sent to each gateway is 2.4 Mb/s, and that an equal amount of traffic is sent by each of the APs connected to a given gateway. Results of this experiment with all flows active are reported in Figure 4. We tried with only half the flows active and found similar results. The amount of traffic that arrives at the gateways is 25.16 Mb/s, which is less than the aggregate amount of traffic sent by all Access Points, 33.6 Mb/s, indicating that the gateways are fully loaded.

We observe again a good match between model and simulation results for most of the links. The number of links experiencing delays in the order of 100 ms is much higher than in the case of downstream traffic, suggesting the presence of many congested points in the network. The delay prediction is more accurate than in the case of downstream traffic, because more links are congested.

Table I summarizes the accuracy of the model's predictions in the four scenarios considered above. It contains, for each case, the mean relative error in the throughput prediction and the mean relative error in the delay prediction, averaged over all active links.

TABLE I
SUMMARY OF RESULTS FOR CURRENT NETWORK LOAD

  scenario                   throughput error   delay error
  downstream, all flows      0.0098             0.674
  downstream, half flows     0.0083             0.782
  upstream, all flows        0.0305             0.215
  upstream, half flows       0.0355             0.282

C. End-to-end estimation under changing traffic conditions

Here, we show how the model can be used to predict end-to-end throughput and delay of a multi-hop flow that is going to be added to the network. First, we describe a technique to compute the available throughput on a path based on the model in Section II. Then, we evaluate the accuracy of the proposed technique considering the Chaska topology.

The available bandwidth of a multi-hop flow over a path is the maximum throughput it can achieve subject to the condition that no queue along the path gets overloaded, i.e., the traffic intensity on each link is kept smaller than or equal to 1. Our estimation technique takes into account inter-flow and intra-flow interference separately and consists of the two steps described below:

Inter-flow step: For each link l in the path, we first find the maximum additional input rate ε_l that can be added under the constraint that the traffic intensity does not exceed 1. The maximum allowable value of ε_l is found by iteratively applying the model according to a binary search on the link input rate λ_l. While doing so, we keep the measured values of f_B and p fixed relative to a node. This is indeed a crucial point of our technique, because when we add traffic on a link, we perturb the network state. Our main approximation is thus to assume that f_B and p do not vary significantly when we add traffic on a single link without saturating the link.

Intra-flow step: The quantities ε_l computed in the first step provide an estimate of the throughput that each link could obtain if operated in isolation. However, the throughput available on a multi-hop path is smaller, because the links contend with each other. To account for self-interference, we proceed as follows. We construct an intra-flow link contention graph that captures interference relationships among the links of the path. Each vertex in the contention graph corresponds to a link. An edge exists between two vertices if they contend with each other. Each clique j of the contention graph provides a constraint on the maximum achievable throughput F over the path, of the form Σ_{l∈j} F/ε_l ≤ 1, i.e., the sum of the normalized time shares required to send the amount of traffic F over each link of the clique cannot exceed 1. It follows that F ≤ F_j for all j, where F_j = (Σ_{l∈j} 1/ε_l)^(−1). Finally, the path available throughput is computed taking the minimum among all F_j. Figure 5 illustrates an example of computation of the available path throughput in a simple case, and a sketch of both steps follows.
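In the sketch below, the predicate link_saturates() is a hypothetical oracle standing in for the model of Section II: it should report whether the link's traffic intensity would exceed 1 at the probed rate. The intra-flow step and the example numbers mirror Figure 5.

def inter_flow_step(link, link_saturates, hi=11e6, tol=1e3):
    """Binary search for the largest additional rate eps_l (b/s) that the
    link sustains without its traffic intensity exceeding 1."""
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if link_saturates(link, mid):   # model predicts saturation
            hi = mid
        else:
            lo = mid
    return lo

def intra_flow_step(eps, cliques):
    """eps[l]: isolated capacity of path link l; cliques: lists of link ids.
    Each clique j bounds the path rate F by sum_{l in j} F / eps[l] <= 1."""
    return min(1.0 / sum(1.0 / eps[l] for l in clique) for clique in cliques)

# The example of Fig. 5: labeled capacities and two overlapping cliques.
eps = [50.0, 100.0, 25.0, 20.0]                      # pkt/s
print(intra_flow_step(eps, [[0, 1, 2], [1, 2, 3]]))  # -> 10.0 pkt/s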

Fig. 5. A 5-hop flow and its intra-flow contention graph. Dotted lines denote interference range and dotted circles denote cliques. With the labeled link capacities (50, 100, 25, and 20 pkt/s), the path available throughput is min{(1/50 + 1/100 + 1/25)^(−1), (1/100 + 1/25 + 1/20)^(−1)} = 10 pkt/s.

D. Accuracy of the throughput estimation technique

To validate our technique for estimating the available throughput over a multi-hop path, we consider the Chaska topology with the network already loaded with 100 flows. We separately consider upstream and downstream traffic and set the total amount of traffic sent to or from each gateway to 2 Mb/s, equally divided among the flows associated with it. Then we insert one additional flow, randomly chosen among those not already present in the network, and we repeat this


experiment 50 times. The rate of the new flow is limited to the value computed by our technique. Results for the case of downstream and upstream traffic are shown in Figures 6 and 7, respectively. Despite several approximations, our technique is able to obtain a good match with simulation, especially for the throughput prediction.

Fig. 6. Download scenario: comparison of simulation and analytical results in case of 2 Mb/s traffic, with 100 flows initially active and 50 additional multi-hop flows. (a) Throughput; (b) Delay.

Fig. 7. Upload scenario: comparison of simulation and analytical results in case of 2 Mb/s uplink traffic, with 100 flows initially active and 50 additional multi-hop flows. (a) Throughput; (b) Delay.

IV. EVALUATION OF ROUTING METRICS

Several link-cost routing metrics have been proposed in the literature with the goal of finding high throughput paths in 802.11 mesh networks. Most are primarily based on the packet loss probability measured on the links. In this section, we use our model to evaluate the performance of existing routing metrics in congested networks. After introducing the metrics, we show through simple scenarios and graphs that they can fail to discover high-throughput paths, essentially because they rely on link quality measures that provide only partial, incomplete information about the throughput achievable on a given path. In contrast, we show that there is significant space for improvement if we directly estimate the end-to-end throughput over a path, which can be done using a model-based approach such as the one described in this paper.

In an effort to understand fundamental properties that govern the correlation of routing metrics and end-to-end throughput in wireless mesh networks, we focus on the baseline case of single channel, single data rate, and UDP traffic. Our model can also be used to study the case of different, fixed per-link data rates through the incorporation of appropriate packet transmission durations, or to study the case of multi-radio systems where each radio executes a separate instance of the 802.11 MAC protocol on a different channel. The baseline case is a necessary first step before considering more heterogeneous network scenarios, dynamic data rate adjustment mechanisms such as Autorate Fallback, as well as the role of transport protocols such as TCP.

A. Routing metrics

The Expected Transmission Count (ETX) metric, originally proposed in [7], computes for each link l the average number of transmission attempts ETX_l required to successfully send a packet over the link:

ETX_l = 1 / (1 − p_l)    (5)

where p_l is the packet loss probability at the MAC layer, which can be estimated locally using, for example, the broadcast probe technique.

The Expected Transmission Time (ETT) metric [9] is an improved version of ETX that also takes into account the packet size S_l and the data rate B_l used on the link:

ETT_l = ETX_l × S_l / B_l    (6)

The Weighted Cumulative ETT (WCETT) metric [9] improves ETT by introducing an additional (weighted) component related to the channel diversity of the path. We do not further consider this metric because we focus on single-channel networks.

The Interference-aware Resource Usage (IRU) metric [16] is defined as:

IRU_l = ETT_l × |N_l|    (7)

where N_l is the set of neighbors that the transmission on link l interferes with. Notice that in all the metrics above the only dynamic variable is the packet loss probability, while all other quantities depend only on network topology and hence are not affected by network load. Therefore, we will refer to them as the class of loss-based metrics. Moreover, with a single rate and constant packet size, all metrics assign to the links a cost that is proportional to the ETX metric. Even the IRU metric, which includes the number of interfering nodes, is proportional to ETX when the node density is homogeneous.
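In code, the three link costs are one-liners; the sketch below uses a 1000-byte packet at 11 Mb/s and a neighbor count of 4 as example inputs.

def etx(p_l):
    """Expected transmission count, equation (5)."""
    return 1.0 / (1.0 - p_l)

def ett(p_l, S_l, B_l):
    """Expected transmission time, equation (6): ETX scaled by airtime."""
    return etx(p_l) * S_l / B_l

def iru(p_l, S_l, B_l, n_interferers):
    """Interference-aware resource usage, equation (7)."""
    return ett(p_l, S_l, B_l) * n_interferers

print(etx(0.2))                    # 1.25 attempts per packet
print(ett(0.2, 8000.0, 11e6))      # ~0.91 ms of airtime per packet
print(iru(0.2, 8000.0, 11e6, 4))   # that airtime charged to 4 neighbors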

Being based on packet loss probability, all the above metrics depend significantly on the network load. Indeed, in wireless mesh networks, packet loss probabilities can be very large (e.g., larger than 0.5) even under medium load, due to a variety of problems related to the MAC protocol itself (such as hidden terminals). Under many circumstances, losses induced by channel contention can be much higher than those due to channel corruption (fading, shadowing, etc.) [11]. Therefore, loss-based metrics are significantly affected by network load, although they are expected to provide more stable link costs (e.g., subject to less random fluctuations) than other load-sensitive metrics such as those based on packet-pair techniques [8] or RTT measurements [3].

We conclude that the performance of loss-based metrics should be assessed within the larger family of load-sensitive metrics. The fundamental question then is the following: within the class of load-sensitive metrics, how effective are loss-based metrics at finding high-throughput paths? We address this question in the following sections, with the help of both our model and simulation experiments.

B. Performance of a single link

We first consider the performance of a single link. According to loss-based metrics, the cost of a link is proportional to


the expected number of retransmissions to successfully send a packet (ETX). The service time on the link is implicitly assumed proportional to the number of retransmission attempts. Hence, the maximum throughput on the link (the inverse of the service time) is inversely proportional to ETX and can be expressed as T_p(p) = T_max (1−p), where T_max is the maximum achievable link throughput (when p = 0).

Instead, according to our model in Section II, the throughput on a link depends on three variables: the packet loss probability p, the fraction of busy time f_B and the average duration of a busy period T_b.

Fig. 8. The throughput of a single link as a function of the fraction of busy time f_B and of the packet loss probability p, according to the model (grid surface) and loss-based metrics (solid surface)

Figure 8 depicts the throughput of a single link as a function of f_B and p, according to our model² and according to loss-based metrics. We observe that they differ considerably. In particular, loss-based metrics neglect the impact of f_B and assume that the throughput decreases linearly with p. Instead, the throughput depends on p in a non-linear way, due to the exponential backoff mechanism of 802.11. More importantly, it decreases linearly with f_B. This can be explained by the fact that 1 − f_B is essentially the air-time available to the link transmitter.

Thus, for the simplest case of a single link we have a first clear indication that the throughput estimation according to loss-based metrics can deviate substantially from the actual behavior of 802.11.

C. Single path performance

We investigate the throughput performance of loss-based metrics for a single multi-hop flow in isolation, i.e., a single path without any interference from other nodes outside the path. It is well known that the throughput along a single path suffers from intra-flow interference. A typical behavior as a function of the number of hops is illustrated in Figure 9, where we have simulated a backlogged source, standard 802.11b parameters, no RTS/CTS, and nodes equally spaced apart so that any node is in range of just its predecessor and successor nodes. As the number of hops increases, the throughput exhibits a quick drop and then stabilizes at a value of approximately 1.2 Mb/s. A similar qualitative behavior is observed under different configurations such as node spacing,

unequal sensing and transmission ranges, use of RTS/CTS, etc.

²It turns out that the throughput as predicted by our model is almost insensitive to T_b, which plays a minor role with respect to f_B. Thus, in Figure 8 we set T_b = T_s, and focus only on the impact of f_B and p.

Fig. 9. Throughput over a single path as a function of the number of hops, according to simulation and loss-based metrics

The right-side y-axis of Figure 9 reports the path cost of loss-based metrics when only broadcast probes exist, without any data traffic injected over the path. In this case, the packet loss probability experienced by the probe packets is close to zero on each link. Hence, all loss-based metrics yield a path cost proportional to the number of hops.

Analogous to the single link case, we also report the throughput prediction according to loss-based metrics, i.e., the inverse of the path cost. We observe that this quantity represents well the throughput available on the chain for fewer than 4 hops. However, longer paths, which would in reality provide a stable throughput of 1.2 Mb/s, are increasingly penalized by loss-based metrics as the number of hops increases.

D. Path selection performance

We now investigate the performance of loss-based metrics in a congested network in which multi-hop flows contend with each other. We focus on the simplest case where there exist only two alternative independent paths and show that loss-based metrics can indeed lead to a sub-optimal decision: selecting a path providing significantly lower throughput than the alternate path.

A typical situation encountered in the large mesh network of Figure 2 is the generic scenario depicted in Figure 10. Here, mesh AP A must select a path either toward gateway G1, or an independent path toward gateway G2. The number of hops between A and G1 or G2 is variable and possibly different. In addition, both nodes G1 and G2 can be loaded with upstream and/or downstream flows.

Fig. 10. Generic path selection problem: mesh AP A must send traffic to either gateway G1 or G2.

We investigate the throughput loss due to sub-optimal routing decisions of loss-based metrics by evaluating the individual impact of two critical parameters: high packet loss and high busy time fraction.

Impact of high packet loss. Figure 11 depicts an instance of the generic path selection problem of Figure 10. With respect to gateway G1, AP A has a two-hop path through


relay node B. The path load is induced by AP C, which has established a single-hop UDP upload flow to gateway G1 and can create high packet loss on link A-B because it is hidden from A. With respect to gateway G2, A has a path with a variable number of hops and no interference induced by other flows. In this case, the available bandwidth and path cost are only a function of the number of hops, as in Figure 9.

Fig. 11. Example topology used to derive results in Figure 12

Fig. 12. Load on gateway G1 and achievable throughput on gateways G1 and G2 as a function of the loss-based path cost to G1, for the example topology of Figure 11

We now perform a simulation experiment focusing on path A-G1. We gradually increase the load on link C-G1, and for each load value we measure: i) the loss-based path cost from A to G1; ii) the maximum throughput A would achieve if it selected the path to gateway G1.

Figure 12 depicts three quantities as a function of the measured path cost A-G1. The first two quantities are the (input) load on link C-G1 and the maximum achievable throughput of path A-G1. The third quantity is the maximum achievable throughput on the alternate path A-G2, given that this path would not be selected due to higher cost; more specifically, for each value c_l of the measured path cost A-G1, we report the throughput of Figure 9 that corresponds to a path A-G2 of cost ⌈c_l⌉ hops.

As expected, the path cost from A to G1 increases as the

load on link C-G1 increases. However, the selection of gateway G1 resulting from loss-based metrics can lead to significantly less throughput than the achievable throughput on the path to gateway G2. Indeed, paths toward G2 with fewer than 4 hops can provide from 0 to 100% more throughput than that obtained toward G1. Longer paths can provide even higher gains but could be more difficult to find.

We also observe that, no matter how high the load on G1 and the corresponding path cost are, there can always be a long enough path toward G2 not selected by loss-based metrics while providing higher throughput, at least equal to 1.2 Mb/s, the stable path throughput in Figure 9.

Impact of high busy time fraction. Figure 13 depicts an instance of the generic path selection problem of Figure 10 where the loss-based path cost and throughput A-G1 are mainly affected by the high fraction of busy time sensed at intermediate node B, due to the activity of two independent links C-G1 and D-G3 within its sensing range.³

Focusing again on path A-G1, we repeat the experiment on the scenario of Figure 13. We equally increase the loads of links C-G1 and D-G3 and measure the loss-based path cost and the maximum achievable throughput to G1. The results are reported in Figure 14, which also depicts the achieved throughput for a higher path cost to gateway G2.

We observe that the throughput loss due to the sub-optimal routing decision is higher than in Figure 12. The reason is that the throughput reduction due to the high fraction of busy time sensed by node B is not captured by loss-based metrics. This additional penalty on the path to G1 is not included in the loss-based path cost. As a result, an alternate non-chosen path to G2 of 4 hops could provide 200% higher throughput, i.e., the throughput loss due to loss-based metrics is approximately twice the throughput loss observed in Figure 12.

Fig. 13. Example topology used to derive results in Figure 14

Fig. 14. Load on gateways G1 and G3 and achievable throughput on gateways G1 and G2 as a function of the corresponding path cost to gateway G1, for the example topology of Figure 13

We conclude that loss-based metrics can fail to discover high-throughput paths, possibly leading to severe throughput losses, in the order of 100% or more. Indeed, the packet loss probability on the links forming a path provides very partial information about the achievable throughput on the path. Thus, a routing strategy based on this information can select highly suboptimal paths.

V. PERFORMANCE EVALUATION

In this section we compare the performance of our model-based metric for available bandwidth estimation with the performance of existing loss-based metrics. As in Section III, we used the ns-2 simulator for this evaluation. We first outline in Section V-A the design of a routing protocol based on our metric. In Section V-B, we compare the performance of the different metrics in various network scenarios.

³The reader is referred to [11] for an explanation of why node B will sense a large fraction of busy time.


A. The Routing Protocol

Our available bandwidth metric is meant to be used in combination with a proactive routing protocol similar to LQSR [9]. We emulated the behavior of a link state protocol using a centralized routing database that is instantaneously updated whenever nodes provide new measurements of f_B and p, link ETX, link IRU, etc. In particular, measurements are updated using the average of the samples collected during the last 20 seconds. When a source node S needs to send a packet to a destination node D, S retrieves a route from the centralized database and puts the entire path in the packet header. Intermediate nodes only need to relay packets based on the source-routing information carried in the packet.

We use a modified version of Dijkstra's shortest path algorithm. For Dijkstra's shortest path algorithm to be applicable, the cost of a link should be strictly positive, so that the cost of a path (route), which is defined to be the sum of the costs of the constituent links, increases monotonically with the number of constituent links. Since the loss-based metrics, i.e., IRU and ETX, increase monotonically with the number of constituent links, the routing protocol can use the standard version of Dijkstra's shortest path algorithm to select routes.

However, when our proposed model-based metric is used, the routing protocol uses a modified version of Dijkstra's shortest path algorithm for route selection. In this modified version, the cost of a link is the available bandwidth from the source to the destination of the link, and the cost of a path is defined as the maximum available bandwidth between the source and destination. The available bandwidth over a path, computed according to the technique described in Section III-C, is guaranteed to monotonically decrease with the hop count. Hence, we can use a modified Dijkstra's algorithm, shown in Algorithm 1, to efficiently compute the available bandwidth from a source node to all other nodes in the network.

Algorithm 1 Modified Dijkstra's shortest path algorithm to search for the path with maximum available bandwidth.

Require: N: total number of nodes.
Require: s: source node.
Require: Adjacent[u]: set of all nodes adjacent to u.

for i = 1 to N do
    available[i] = 0; parent[i] = null
end for
available[s] = ∞                /* available bandwidth from s to s */
S = ∅                           /* initialize set of visited nodes */
Q = {1, 2, ..., N}              /* set of all unvisited nodes */
while Q is not empty do
    u = ExtractMax(Q)
    S = S ∪ {u}
    for all v ∈ Adjacent[u] do
        B = CliqueBandwidth(u, v)
        if available[v] < B then
            available[v] = B
            parent[v] = u
        end if
    end for
end while

The function CliqueBandwidth(u, v) computes the available bandwidth from the source node s to node v using the newly added node u as the previous hop. The available bandwidth computation is done using the intra-flow clique-based procedure of Section III-C over the single path s → v formed by link (u, v) and the path defined by the parent[] entries of u and its parent nodes back to source s.
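The following Python rendering of Algorithm 1 may make the control flow easier to follow. Here clique_bandwidth(u, v, parent) is a stand-in for CliqueBandwidth: it must evaluate the clique-based bound of Section III-C over the path from s to u (read off the parent pointers) extended by link (u, v).

import heapq

def max_bandwidth_routes(adj, s, clique_bandwidth):
    """Widest-path Dijkstra: adj maps each node to its neighbors; returns
    per-node available bandwidth from s and the parent pointers."""
    available = {u: 0.0 for u in adj}
    parent = {u: None for u in adj}
    available[s] = float("inf")          # bandwidth from s to itself
    visited = set()
    heap = [(-available[s], s)]          # max-heap via negated keys
    while heap:
        _, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)                   # u's best path is now final
        for v in adj[u]:
            B = clique_bandwidth(u, v, parent)
            if available[v] < B:
                available[v] = B
                parent[v] = u
                heapq.heappush(heap, (-B, v))
    return available, parent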

B. Results

For our first set of experiments, we consider the Chaska topology in Figure 2. We randomly pick 100 nodes and start, from each, an upstream flow to its nearest gateway according to minimum hop count. The traffic load sent by each source node is chosen as a uniform random value between 0 and 2/m Mb/s, where m is the number of flows sending traffic to the same gateway. By doing so, we randomize the load on the 14 gateways present in the network. After allowing these 100 flows to run for 30 seconds, we randomly choose another node, different from the 100 existing sources, and start an upstream flow from this node with the freedom to choose the destination gateway. For the new flow we compute the best route to a gateway according to ETX, IRU, and our metric, which we call AVAIL; once computed, the flow uses the same route for the rest of the simulation.

Fig. 15. Sorted flow throughput for the Chaska topology. Since the topology does not allow many alternate independent paths from APs to gateways, all the metrics perform similarly.

Having selected the route, irrespective of the routing metric used, we use our model to predict the available bandwidth along the chosen route and rate-limit the newly added upstream flow to the predicted bandwidth value. This is done for the sake of a fair comparison with ETX and IRU, since neither of them can be used to compute the sustainable sending rate of the new UDP stream. For each metric, we repeat the experiment 50 times, each experiment having the same initial 100 flows but different source nodes from which to start the new flow to be added to the network. We measure the throughput achieved by the new flow in each run, and sort the 50 values in increasing order.

Figure 15 indicates that the AVAIL metric provides a gain with respect to the other metrics, especially for lower throughput values, as shown in the inset of Figure 15. However, in all cases the gain is small. The reason is that the simulated Chaska topology is composed of almost disconnected, dense clusters of nodes; hence it provides low spatial reuse and a limited number of independent paths toward distinct gateways.

We next consider a richer and more structured topology providing more spatial reuse and degrees of freedom in the selection of routes. More specifically, we consider a Manhattan network of



196 nodes arranged in a 14 × 14 grid. Each node is within sensing and transmission range of only its North, South, East, and West neighbors. We randomly select 10 of these nodes to act as gateways⁴. The load on each gateway is uniformly distributed between 30% and 100% of a preset maximum, and this load is equally distributed among the flows ending at the gateway. We run two sets of experiments for two values of this preset maximum, namely 3 Mb/s, shown in Figure 16(a), and 4 Mb/s, shown in Figure 16(c). Figure 16(b) reports the length of the routes selected by the different metrics in the 3 Mb/s case.

Fig. 16. Manhattan topology. (a) Sorted flow throughput with 3 Mb/s maximum load: AVAIL achieves 49% (51%) higher throughput than ETX (IRU) on average. (b) Length of routes chosen by the different metrics with 3 Mb/s maximum load. (c) Sorted flow throughput with 4 Mb/s maximum load: AVAIL achieves 108% (127%) higher throughput than ETX (IRU) on average; also, under ETX and IRU half of the flows starve.

Since the topology provides a rich selection of independent paths, the AVAIL metric is able to find routes with considerably higher throughput than the routes chosen by ETX or IRU. In Figure 16(a), AVAIL achieves 49% (51%) higher throughput with respect to ETX (IRU) on the average. As shown in Figure 16(b), AVAIL typically selects longer routes than those chosen by IRU or ETX. This is because AVAIL does a better job at finding longer routes with higher available bandwidth than shorter routes. The throughput gain of AVAIL increases when the offered load in the network is increased. In Figure 16(c), AVAIL achieves 127% (108%) higher throughput with respect to ETX (IRU) on the average. More importantly, under loss-based metrics almost half the flows receive close to zero throughput, while AVAIL is able to identify non-starving paths.

VI. CONCLUSIONS

We proposed a novel technique to discover high throughput paths in a congested 802.11 mesh network. The novelty of our approach lies in the direct computation of the end-to-end available bandwidth through the use of an accurate analytical model of 802.11, which we have extended to the case in which nodes send to multiple receivers. We have also shown how to leverage our technique within a distributed routing protocol which exploits local measurements performed by the nodes to effectively route newly added flows, achieving significant gains with respect to existing routing metrics. We plan to extend our approach to study the coupling of mesh routing protocols with dynamic data rate adjustment mechanisms such as AutoRate Fallback and congestion control protocols such as TCP.


ACKNOWLEDGEMENTS

This research was supported by NSF Grant CNS-0325971 and by a grant from Cisco.

REFERENCES

[1] Chaska.net. Residential High Speed Wireless Internet Access. http://www.chaska.net.
[2] The network simulator, ns-2. http://www.isi.edu/nsnam/ns/.
[3] A. Adya, P. Bahl, J. Padhye, A. Wolman, and L. Zhou. A Multi-Radio Unification Protocol for IEEE 802.11 Wireless Networks. In Proc. IEEE Broadnets, San Jose, CA, USA, October 2004.
[4] B. Awerbuch, D. Holmer, and H. Rubens. The Medium Time Metric: High Throughput Route Selection in Multirate Ad Hoc Wireless Networks. Kluwer Mobile Networks and Applications (MONET) Journal, Special Issue on Internet Wireless Access: 802.11 and Beyond.
[5] G. Bianchi. Performance analysis of the IEEE 802.11 distributed coordination function. IEEE Journal on Selected Areas in Communications, 18(3):535-547, Mar. 2000.
[6] M. Carvalho and J. J. Garcia-Luna-Aceves. A scalable model for channel access protocols in multihop ad hoc networks. In Proc. ACM MobiCom, Philadelphia, PA, USA, Sept. 2004.
[7] D. S. J. De Couto, D. Aguayo, J. Bicket, and R. Morris. A high-throughput path metric for multi-hop wireless routing. In Proc. ACM MobiCom, San Diego, CA, USA, September 2003.
[8] R. Draves, J. Padhye, and B. Zill. Comparison of routing metrics for static multi-hop wireless networks. In Proc. ACM SIGCOMM, Philadelphia, PA, USA, Sept. 2004.
[9] R. Draves, J. Padhye, and B. Zill. Routing in Multi-Radio, Multi-Hop Wireless Mesh Networks. In Proc. ACM MobiCom, Philadelphia, PA, USA, Sept. 2004.
[10] Y. Gao, D. Chiu, and J. Lui. Determining the end-to-end throughput capacity in multi-hop networks: methodology and applications. In Proc. ACM SIGMETRICS, Saint-Malo, France, 2006.
[11] M. Garetto, T. Salonidis, and E. Knightly. Modeling per-flow throughput and capturing starvation in CSMA multi-hop wireless networks. In Proc. IEEE INFOCOM, Barcelona, Spain, April 2006.
[12] M. Garetto, J. Shi, and E. Knightly. Modeling media access in embedded two-flow topologies of multi-hop wireless networks. In Proc. ACM MobiCom, Cologne, Germany, Sept. 2005.
[13] K. Medepalli and F. Tobagi. Towards performance modeling of IEEE 802.11 based wireless networks: A unified framework and its applications. In Proc. IEEE INFOCOM, Barcelona, Spain, April 2006.
[14] X. Wang and K. Kar. Throughput modelling and fairness issues in CSMA/CA based ad-hoc networks. In Proc. IEEE INFOCOM, March 2005.
[15] Y. Yang and R. Kravets. Contention-Aware Admission Control for Ad Hoc Networks. IEEE Trans. on Mobile Computing, 4(4):363-377, 2005.
[16] Y. Yang, J. Wang, and R. Kravets. Designing routing metrics for mesh networks. In Proc. WiMesh, Santa Clara, CA, Sept. 2005.

⁴In a real deployment the flexibility of placing gateways might be limited since gateway nodes typically require a dedicated link to the wired backbone.
