

1026 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 18, NO. 4, AUGUST 2010

A Machine Learning Approach to TCP Throughput Prediction

Mariyam Mirza, Joel Sommers, Member, IEEE, Paul Barford, and Xiaojin Zhu

Abstract—TCP throughput prediction is an important capability for networks where multiple paths exist between data senders and receivers. In this paper, we describe a new lightweight method for TCP throughput prediction. Our predictor uses Support Vector Regression (SVR); prediction is based on both prior file transfer history and measurements of simple path properties. We evaluate our predictor in a laboratory setting where ground truth can be measured with perfect accuracy. We report the performance of our predictor for oracular and practical measurements of path properties over a wide range of traffic conditions and transfer sizes. For bulk transfers in heavy traffic using oracular measurements, TCP throughput is predicted within 10% of the actual value 87% of the time, representing nearly a threefold improvement in accuracy over prior history-based methods. For practical measurements of path properties, predictions can be made within 10% of the actual value nearly 50% of the time, approximately a 60% improvement over history-based methods, and with much lower measurement traffic overhead. We implement our predictor in a tool called PathPerf, test it in the wide area, and show that PathPerf predicts TCP throughput accurately over diverse wide area paths.

Index Terms—Active measurements, machine learning, support vector regression, TCP throughput prediction.

I. INTRODUCTION

THE availability of multiple paths between sources and receivers enabled by content distribution, multihoming, and overlay or virtual networks suggests the need for the ability to select the "best" path for a particular data transfer. A common starting point for this problem is to define "best" in terms of the throughput that can be achieved over a particular path between two end-hosts for a given sized TCP transfer. In this case, the fundamental challenge is to develop a technique that provides an accurate TCP throughput forecast for arbitrary and possibly highly dynamic end-to-end paths.

Prior work on the problem of TCP throughput prediction has largely fallen into two categories: those that investigate formula-based approaches and those that investigate history-based approaches. Formula-based methods, as the name suggests, predict throughput using mathematical expressions that relate a TCP sender's behavior to path and end-host properties such as RTT, packet loss rate, and receive window size. In this case, different measurement tools can be used to gather the input data that is then plugged into the formula to generate a prediction. However, well-known network dynamics and limited instrumentation access complicate the basic task of gathering timely and accurate path information, and the ever-evolving set of TCP implementations means that a corresponding set of formula-based models must be maintained.

Manuscript received December 02, 2008; revised July 15, 2009; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor C. Dovrolis. First published January 12, 2010; current version published August 18, 2010. This work was supported in part by NSF grants CNS-0347252, CNS-0646256, CNS-0627102, and CCR-0325653. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF. An earlier version of this paper appeared in SIGMETRICS 2007.

M. Mirza, P. Barford, and X. Zhu are with the Department of Computer Sciences, University of Wisconsin–Madison, Madison, WI 53706 USA (e-mail: [email protected]; [email protected]; [email protected]).

J. Sommers is with the Department of Computer Science, Colgate University, Hamilton, NY 13346 USA (e-mail: [email protected]).

Digital Object Identifier 10.1109/TNET.2009.2037812

History-based TCP throughput prediction methods are conceptually straightforward. They typically use some kind of standard time series forecasting based on throughput measurements derived from prior file transfers. In recent work, He et al. show convincingly that history-based methods are generally more accurate than formula-based methods. However, the authors carefully outline the conditions under which history-based prediction can be effective [11]. Also, history-based approaches described to date remain relatively inaccurate and potentially heavyweight processes focused on bulk transfer throughput prediction.

Our goal is to develop an accurate, lightweight tool for predicting end-to-end TCP throughput for arbitrary file sizes. We investigate the hypothesis that the accuracy of history-based predictors can be improved and their impact on a path reduced by augmenting the predictor with periodic measurements of simple path properties. The questions addressed in this paper include: 1) which path properties or combination of path properties increase the accuracy of TCP throughput prediction the most? and 2) what is a minimum set of file sizes required to generate history-based throughput predictors for arbitrary file sizes? Additional goals for our TCP throughput prediction tool are: 1) to make it robust to "level shifts" (i.e., when path properties change significantly), which He et al. show to be a challenge in history-based predictors; and 2) to include a confidence value with predictions, a metric with little treatment in prior history-based throughput predictors.

We use Support Vector Regression (SVR), a powerful machine learning technique that has shown good empirical performance in many domains, for throughput prediction. SVR has several properties that make it well suited for our study: 1) It can accept multiple inputs (i.e., multivariate features) and will use all of these to generate the throughput prediction. 2) SVR does not commit to any particular parametric form, unlike formula-based approaches. Instead, SVR models are flexible based on their use of so-called nonlinear kernels. This expressive power is the reason why SVR has potential to be more accurate than prior methods. 3) SVR is computationally efficient, which makes it attractive for inclusion in a tool that can be deployed and used in the wide area. For our application, we extend the basic SVR predictor with a confidence interval estimator based on the assumption that prediction errors are normally distributed, an assumption that we test in our laboratory experiments. Estimation of confidence intervals is critical for online prediction since retraining can be triggered if measured throughput falls outside a confidence interval computed through previous measurements.

1063-6692/$26.00 © 2010 IEEE
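The retraining trigger described above can be sketched in a few lines. This is an illustrative Python sketch, not the paper's implementation; the 95% interval width (z = 1.96) and the function names are our assumptions, built only on the stated premise that prediction errors are normally distributed.

```python
import statistics

def confidence_interval(errors, z=1.96):
    """95% interval for prediction error, assuming errors are normally
    distributed (the assumption tested in the lab experiments)."""
    mu = statistics.mean(errors)
    sigma = statistics.stdev(errors)
    return (mu - z * sigma, mu + z * sigma)

def needs_retraining(predicted, measured, past_errors):
    """Trigger retraining when the newest prediction error falls outside
    the confidence interval computed from previous measurements."""
    lo, hi = confidence_interval(past_errors)
    return not (lo <= measured - predicted <= hi)
```

With historical errors of roughly a few tenths of a Mb/s, a measured throughput several Mb/s away from the prediction falls outside the interval and signals a possible level shift.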

We begin by using laboratory-based experiments to investigate the relationship between TCP throughput and measurements of path properties including available bandwidth (AB), queuing delays (Q), and packet loss (L). The lab environment enables us to gather highly accurate passive measurements of throughput and all path properties and develop and test our SVR-based predictor over a range of realistic traffic conditions. Our initial experiments focus on bulk transfers and compare actual throughput of TCP flows to predictions generated by our SVR-based tool. Our results show that throughput predictions can be improved by as much as a factor of 3 when including path properties in the SVR-based tool versus a history-based predictor. For example, our results show that the SVR-based predictions are within 10% of actual 87% of the time for bulk transfers under heavy traffic conditions (90% average utilization on the bottleneck link). Interestingly, we find that the path properties that provide the most improvement to the SVR-based predictor are Q and L, respectively, and that including AB provides almost no improvement to the predictor.

Next, we expand the core SVR-based tool in three ways. First, the initial tests were based entirely on passive traffic measurements, which are unavailable in the wide area, so we tested our SVR-based approach using measurements of L and Q provided by the BADABING tool [25]. The reduction in accuracy of active versus passive measurements resulted in a corresponding reduction in accuracy of SVR-based throughput predictions for bulk transfers under heavy traffic conditions on the order of about 35%, still a significant improvement on history-based estimates. Also, throughput prediction based on training plus lightweight active measurements results in a dramatically lower network probe load than prior history-based methods using long-lived TCP transfers and heavyweight probe-based estimates of available bandwidth such as described in [11]. We quantify this difference in Section VII. Second, unlike prior work that focuses only on bulk transfers, we enabled SVR to predict throughput accurately for a wide range of file sizes. We found that a training set of only three file sizes results in accurate throughput predictions for a wide range of file sizes. Third, He et al. showed that "level shifts" in path conditions pose difficulties for throughput prediction [11], suggesting the need for adaptivity. To accomplish this, we augmented the basic SVR predictor with a confidence interval estimator as a mechanism for triggering a retraining process. We show in Section VI-C1 that our technique is able to adapt to level shifts quickly and to maintain high accuracy on paths where level shifts occur. We implement these capabilities in a tool called PathPerf1 that we tested in the wide area. We present results from 18 diverse wide area paths in the RON testbed [3] and show that PathPerf predicts throughput accurately under a broad range of conditions in real networks.

II. RELATED WORK

Since seminal work by Jacobson and Karels established the basic mechanisms for modern TCP implementations [13], it has been well known that many factors affect TCP throughput, such as the TCP implementation, the underlying network structure, and the dynamics of the traffic sharing the links on the path between two hosts. A number of studies have taken steps toward understanding TCP behavior, including [1] and [2], which developed stochastic models for TCP based on packet loss characteristics. A series of studies develop increasingly detailed mathematical expressions for TCP throughput based on modeling the details of the TCP congestion control algorithm and measurements of path properties [4], [8], [10], [17], [18]. While our predictor also relies on measurement of path properties, the SVR-based approach is completely distinguished from prior formula-based models.

1PathPerf will be openly available for download at http://www.wail.cs.wisc.edu/waildownload.py.

A large number of empirical studies of TCP file transfer and throughput behavior have provided valuable insight into TCP performance. Paxson conducted one of the most comprehensive studies of TCP behavior [19], [20]. While that work exposed a plethora of issues, it provided some of the first empirical data on the characteristics of packet delay, queuing, and loss within TCP file transfers. Barford and Crovella's application of critical path analysis to TCP file transfers provides an even more detailed perspective on how delay, queuing, and loss relate to TCP performance [6]. In [5], Balakrishnan et al. studied throughput from the perspective of a large Web server and showed how it varied depending on end-host and time-of-day characteristics. Finally, detailed studies of throughput variability over time and correlations between throughput and flow size can be found in [31] and [32], respectively. These studies inform our work in terms of the basic characteristics of throughput that must be considered when building our predictor.

Past studies of history-based methods for TCP throughput prediction are based on standard time series forecasting methods. Vazhkudia et al. compare several different simple forecasting methods to estimate TCP throughput for transfers of large files and find similar performance across predictors [30]. A well-known system for throughput prediction is the Network Weather Service [28]. That system makes bulk transfer forecasts by attempting to correlate measurements of prior large TCP file transfers with periodic small (64 kB) TCP file transfers (referred to as "bandwidth probes"). The DualPats system for TCP throughput prediction is described in [16]. That system makes throughput estimates based on an exponentially weighted moving average of larger size bandwidth probes (1.2 MB total). Similar to our work, Lu et al. found that prediction errors generally followed a normal distribution. As mentioned earlier, He et al. extensively studied history-based predictors using three different time series forecasts [11]. Our SVR-based method includes information from prior transfers for training, but otherwise only requires measurements from lightweight probes and is generalized for all file sizes, not just bulk transfers.

Many techniques have been developed to measure path properties (see CAIDA's excellent summary page for examples [9]). Prior work on path property measurement directs our selection of lightweight probe tools to collect data for our predictor. Recent studies have focused on measuring available bandwidth on a path. AB is defined informally as the minimum unused capacity on an end-to-end path, which is a conceptually appealing property with respect to throughput prediction. A number of studies have described techniques for measuring AB including


[14], [26], and [27]. We investigate the ability of AB measurement as well as other path properties to enhance TCP throughput predictions.

Finally, machine learning techniques have not been widely applied to network measurement. One notable exception is in network intrusion detection (e.g., [12]). The only other application of SVR that we know of is to the problem of using IP address structure to predict round-trip time latency [7].

III. A MULTIVARIATE MACHINE LEARNING TOOL

The main hypothesis of this work is that history-based TCP throughput prediction can be improved by incorporating measurements of end-to-end path properties. The task of throughput prediction can be formulated as a regression problem, i.e., predicting a real-valued number based on multiple real-valued input features. Each file transfer is represented by a feature vector x of dimension d. Each dimension is an observed feature, e.g., the file size, proximal measurements of path properties such as queuing delay, loss, available bandwidth, etc. Given x, we want to predict the throughput y. This is achieved by training a regression function f, and applying f to x. The function f is trained using training data, i.e., historical file transfers with known features and the corresponding measured throughput.

The analytical framework that we apply to this problem is Support Vector Regression (SVR), a state-of-the-art machine learning tool for multivariate regression. SVR is the regression version of the popular Support Vector Machines [29]. It has a solid theoretical foundation and is favored in practice for its good empirical performance. We briefly describe SVR below and refer readers to [21] and [23] for details and to [15] as an example of an SVR software package.

To understand SVR, we start from a linear regression function f(x) = w·x + b. Assume we have a training set of n file transfers {(x_1, y_1), ..., (x_n, y_n)}. Training involves estimating the d-dimensional weight vector w and offset b so that f(x_i) is close to the truth y_i for all training examples i = 1, ..., n. There are many ways to measure "closeness". The traditional measure used in SVR is the ε-insensitive loss, defined as

    L(f(x), y) = |f(x) - y| - ε   if |f(x) - y| ≥ ε
                 0                otherwise.          (1)

This loss function measures the absolute error between prediction and truth, but with a tolerance of ε. The value of ε is application-dependent in general, and in our experiments we set it to zero. Other loss functions (e.g., the squared loss) are possible, too, and often give similar performance. They are not explored in this paper.
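The ε-insensitive loss in (1) can be written directly as a small function. This is an illustrative sketch (the function name and scalar interface are ours); with eps = 0, as in our experiments, it reduces to plain absolute error.

```python
def eps_insensitive_loss(prediction, truth, eps=0.0):
    """Absolute error between prediction and truth, with a tolerance
    band of width eps, per Eq. (1)."""
    err = abs(prediction - truth)
    return err - eps if err >= eps else 0.0
```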

It might seem that the appropriate way to estimate the parameters w, b is to minimize the overall loss on the training set, Σ_{i=1}^{n} L(f(x_i), y_i). However, if d is large compared to the number of training examples n, one can often fit the training data perfectly. This is dangerous because the truth y in training data actually contains random fluctuations, and f is partly fitting the noise. Such an f will generalize poorly, i.e., causing bad predictions on future test data. This phenomenon is known as overfitting. To prevent overfitting, one can reduce the degrees of freedom in f by selecting a subset of features, thus reducing d. An implicit but more convenient alternative is to require f to be smooth,2 defined as having a small parameter norm ||w||. Combining loss and smoothness, we estimate the parameters w, b by solving the following optimization problem:

    min_{w,b}  C · Σ_{i=1}^{n} L(f(x_i), y_i) + ||w||²          (2)

where C is a weight parameter to balance the two terms. The value of C is usually selected by a procedure called cross validation, where the training set is randomly split into two parts, then regression functions with different C are trained on one part and their performance measured on the other part, and finally one selects the C value with the best performance. In our experiments, we selected C using cross validation. The optimization problem can be solved using a quadratic program.
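The cross-validation procedure for choosing C can be sketched with scikit-learn's grid search (our choice of library and grid values for illustration, not the paper's implementation). The synthetic data exists only to make the sketch runnable.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(60, 2))                 # toy feature vectors
y = 3.0 * X[:, 0] - X[:, 1] + rng.normal(0, 0.05, 60)   # toy targets

# Train with several candidate C values on part of the data, score each
# on the held-out part, and keep the best-performing value.
search = GridSearchCV(
    SVR(kernel="rbf", epsilon=0.0),
    param_grid={"C": [0.1, 1.0, 10.0, 100.0]},
    cv=3,
)
search.fit(X, y)
best_C = search.best_params_["C"]
```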

Nonetheless, a linear function f(x) = w·x + b is fairly restrictive and may not be able to describe the true function y. A standard mathematical trick is to augment the feature vector x with nonlinear bases derived from x. For example, if x = (x_1, x_2), one can augment it with the nonlinear bases (x_1², x_2², x_1 x_2). The linear regressor f(x) = w·φ(x) + b in the augmented feature space φ(x) then produces a nonlinear fit in the original feature space. Note φ(x) has more dimensions than x. The more dimensions φ(x) has, the more expressive f becomes.

In the extreme (and often beneficial) case, φ(x) can even have infinite dimensions. It seems computationally impossible to estimate the corresponding infinite-dimensional parameter w. However, if we convert the primal optimization problem (2) into its dual form, one can show that the number of dual parameters is actually n instead of the dimension of φ(x). Furthermore, the dual problem never uses the augmented feature φ(x) explicitly. It only uses the inner product between pairs of augmented features, K(x_i, x_j) = φ(x_i)·φ(x_j). The function K is known as the kernel and can be computed from the original feature vectors. For instance, the Radial Basis Function (RBF) kernel

    K(x_i, x_j) = exp(-γ ||x_i - x_j||²)

implicitly corresponds to an infinite-dimensional feature space. In our experiments, we used an RBF kernel with width γ selected, again, by cross validation. The dual problem can still be efficiently solved using a quadratic program.
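The RBF kernel above can be computed directly from a pair of original feature vectors, without ever forming the augmented features. A minimal sketch (names are ours):

```python
import math

def rbf_kernel(x_i, x_j, gamma=1.0):
    """K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2), computed from the
    original feature vectors only."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * sq_dist)
```

Identical vectors give K = 1, and K decays toward 0 as the vectors move apart.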

SVR therefore works as follows. For training, one collects a training set {(x_1, y_1), ..., (x_n, y_n)} and specifies a kernel K. SVR solves the dual optimization problem, which equivalently finds the potentially very high-dimensional parameters w and b in the augmented feature space defined by K. This produces a potentially highly nonlinear prediction function f(x). The function f(x) can then be applied to arbitrary test cases x and produces a prediction. In our case, test cases are the file size for which a prediction is to be made and current path properties based on active measurements.
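The train-then-predict workflow just described maps onto a few lines with scikit-learn's SVR (our choice of library for illustration; the feature set and synthetic data below are assumptions, not the paper's dataset).

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n = 200
# Illustrative feature vectors: file size (MB), queuing delay (ms),
# and loss frequency, mirroring the kinds of inputs described above.
X = np.column_stack([
    rng.uniform(1, 64, n),       # file size
    rng.uniform(0, 50, n),       # queuing delay
    rng.uniform(0, 0.05, n),     # loss frequency
])
# Synthetic throughput (Mb/s) that degrades with delay and loss.
y = 20.0 / (1.0 + 0.02 * X[:, 1] + 40.0 * X[:, 2]) + rng.normal(0, 0.2, n)

# Train on historical transfers; predict for new transfers and conditions.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=0.0))
model.fit(X[:150], y[:150])
predictions = model.predict(X[150:])
```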

IV. EXPERIMENTAL ENVIRONMENT AND METHODOLOGY

This section describes the laboratory environment and experiments that we used to evaluate our throughput predictor.

2If w are the coefficients of a polynomial function, the function will tend to be smooth (changes slowly) if ||w|| is small, or noisy if ||w|| is large.


Fig. 1. Laboratory testbed. Cross traffic flowed across one of two routers at hop B, while probe traffic flowed through the other. Optical splitters connected Endace DAG 3.5 and 3.8 passive packet capture cards to the testbed between hops B and C and hops C and D. Measurement traffic (file transfers, loss probes, and available bandwidth probes) flowed from left to right. Congestion in the testbed occurred at hop C.

A. Experimental Environment

The laboratory testbed used in our experiments is shown in Fig. 1. It consisted of commodity end-hosts connected to a dumbbell-like topology of Cisco GSR 12000 routers. Both measurement and background traffic was generated and received by the end-hosts. Traffic flowed from the sending hosts on separate paths via Gigabit Ethernet to separate Cisco GSRs (hop B in the figure) where it was forwarded on OC12 (622 Mb/s) links. This configuration was created in order to accommodate a precision passive measurement system. Traffic from the OC12 links was then multiplexed onto a single OC3 (155 Mb/s) link (hop C in the figure), which formed the bottleneck where congestion took place. We used an AdTech SX-14 hardware-based propagation delay emulator on the OC3 link to add 25 ms delay in each direction for all experiments and configured the bottleneck queue to hold approximately 50 ms of packets. Packets exited the OC3 link via another Cisco GSR 12000 (hop D in the figure) and passed to receiving hosts via Gigabit Ethernet.

The measurement hosts and traffic generation hosts were identically configured workstations running FreeBSD 5.4. The workstations had 2-GHz Intel Pentium 4 processors with 2 GB of RAM and Intel Pro/1000 network cards. They were also dual-homed, so all management traffic was on a separate network from the one depicted in Fig. 1. We disabled the TCP throughput history caching feature in FreeBSD 5.4, controlled by the variable net.inet.tcp.inflight.enable, to allow TCP throughput to be determined by current path properties rather than throughput history.

A key aspect of our testbed was the measurement system used to establish the true path properties for our evaluation. Optical splitters were attached to both the ingress and egress links at hop C, and Endace DAG 3.5 and 3.8 passive monitoring cards were used to capture traces of all packets entering and leaving the bottleneck node. By comparing packet headers, we were able to identify which packets were lost at the congested output queue during experiments and accurately measure available bandwidth on the congested link. Furthermore, the fact that the measurements of packets entering and leaving hop C were synchronized at a very fine granularity (i.e., a single microsecond) enabled us to precisely measure queuing delays through the congested router.

B. Experimental Protocol

We generated background traffic by running the Harpoon IP traffic generator [24] between up to four pairs of traffic generation hosts as illustrated in Fig. 1. Harpoon produced open-loop self-similar traffic using a heavy-tailed file size distribution, mimicking a mix of application traffic such as Web and peer-to-peer applications common in today's Internet. Harpoon was configured to produce average offered loads ranging from approximately 60% to 105% on the bottleneck link (the OC3 between hops C and D).

As illustrated in Fig. 2, measurement traffic in the testbed consisted of file transfers and active measurements of AB, L, and Q. We also measured all three metrics passively using packet traces. All measurements are aggregates for all flows on the path. Measurements were separated by 30-s intervals. For the measurement traffic hosts, we increased the TCP receive window size to 128 kB (with the window scaling option turned on). In receive-window-limited transfers, file transfer throughput was approximately 21 Mb/s. That is, if the available bandwidth on the bottleneck link was 21 Mb/s or more, the flow was receive window (rwnd) limited; otherwise, it was congestion window (cwnd) limited. We increased the receive window size because we wanted file transfers to be cwnd-limited so we could evaluate our predictor thoroughly; rwnd-limited transfers have constant throughput for a given size, so any reasonable predictor performs well on rwnd-limited transfers. For this reason, we do not report any results for rwnd-limited flows.
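The rwnd/cwnd distinction above reduces to a simple threshold test. A hypothetical helper for illustration (the 21 Mb/s figure comes from the testbed configuration described in the text):

```python
def limiting_factor(available_bw_mbps, rwnd_rate_mbps=21.0):
    """Return which window limits the transfer: the receive window when
    spare capacity meets the rwnd-limited rate, else the congestion window."""
    return "rwnd" if available_bw_mbps >= rwnd_rate_mbps else "cwnd"
```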

Series of measurements were collected and then split up into mutually exclusive training and test sets for SVR. Notions of separate training and test sets are not required for history-based methods; rather, predictions are made over the continuous notion of distant and recent history. In our evaluation of history-based methods, we use the final prediction of the training set as the starting point for the test set.

Fig. 2 shows that we gather three types of measurements for each experiment: Oracular Passive Measurements (OPM), Practical Passive Measurements (PPM), and Active Measurements (AM). Oracular measurements, taken during the file transfer, give us perfect information about network conditions, so we can establish the best possible accuracy of our prediction mechanism. In practice, this oracular information is not available for making predictions. AM, taken before the file transfer, are available in practice and can thus be used for prediction.


Fig. 2. Measurement traffic protocol and oracular and practical path measurements used for SVR-based throughput prediction.

PPM, passive measurements taken at the same time AM are taken, show the best possible accuracy of our prediction mechanism with practical measurements, or show how much better SVR would perform if the active measurements had perfect accuracy.

AB is defined as the spare capacity on the end-to-end path. YAZ estimates available bandwidth using a relatively low-overhead, iterative method similar to PATHLOAD [14]. The time required to produce a single AB estimate can vary from a few seconds to tens of seconds. The passive measurement value of AB is computed by subtracting the observed traffic rate on the link from the realizable OC3 link bandwidth.

We use BADABING for active measurements of L and Q. BADABING requires the sender and receiver to be time-synchronized. To accommodate our wide area experiments, the BADABING receiver was modified to reflect probes back to the sender, where they were timestamped and logged as on the original receiver. Thus, the sender clock was used for all probe timestamps. BADABING reports loss in terms of loss episode frequency and loss episode duration, defined in [25]. Although we refer to loss frequency and loss duration together as L or loss for expositional ease, these two metrics are two different features for SVR. Q is defined as the difference between the RTT of the current BADABING probe and the minimum probe RTT seen on the path. We set the BADABING probe probability parameter to 0.3. Other parameters were set according to [25]. Passive values of Q and L are computed using the same definitions as above, except that they are applied to all TCP traffic on the bottleneck link instead of BADABING probes.
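The queuing delay estimate just described can be sketched in a few lines: Q for each probe is the probe's RTT minus the minimum RTT seen on the path, which serves as a proxy for the propagation and transmission floor. This is a minimal illustration, not the BADABING implementation, and the probe RTT values below are hypothetical:

```python
def queuing_delay(probe_rtts_ms):
    """Estimate queuing delay Q for each probe as the difference between
    that probe's RTT and the minimum RTT observed so far on the path."""
    min_rtt = float("inf")
    q_values = []
    for rtt in probe_rtts_ms:
        min_rtt = min(min_rtt, rtt)
        q_values.append(rtt - min_rtt)
    return q_values

# Hypothetical probe RTTs (ms): the path floor is 40 ms; congestion
# episodes inflate the RTT of individual probes.
rtts = [42.0, 40.0, 47.5, 55.0, 40.5]
print(queuing_delay(rtts))  # [0.0, 0.0, 7.5, 15.0, 0.5]
```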

For experiments in the wide area, we created a tool, PathPerf. This tool, designed to run between a pair of end-hosts, initiates TCP file transfers and path property measurements (using our modified version of BADABING) and produces throughput estimates using our SVR-based method. It can be configured to generate arbitrary file size transfers for both training and testing and initiates retraining when level shifts are identified, as described in Section VII.

C. Evaluating Prediction Accuracy

We denote the actual throughput by R and the predicted throughput by R̂. We use the metric relative prediction error E introduced in [11] to evaluate the accuracy of an individual throughput prediction. Relative prediction error is defined as

E = (R̂ − R) / min(R̂, R).

In what follows, we use the distribution of the absolute value of E to compare different prediction methods.
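This error metric, together with the "fraction of predictions within a given accuracy level" summary used in the tables that follow, can be sketched directly. The sketch assumes the relative-error definition of [11], which normalizes the prediction error by the smaller of the predicted and actual throughput; the sample values are hypothetical:

```python
def relative_error(predicted, actual):
    """Relative prediction error E from [11]: error normalized by the
    smaller of the two throughputs, so over- and under-prediction of the
    same magnitude are penalized symmetrically."""
    return (predicted - actual) / min(predicted, actual)

def fraction_within(preds, actuals, threshold=0.10):
    """Fraction of predictions whose |E| is within the threshold: the
    summary statistic reported in the accuracy tables."""
    errors = [abs(relative_error(p, a)) for p, a in zip(preds, actuals)]
    return sum(e <= threshold for e in errors) / len(errors)

# Hypothetical predicted and actual throughputs (Mb/s).
preds = [10.5, 12.0, 14.0, 6.0]
actuals = [10.0, 13.0, 14.5, 12.0]
print(fraction_within(preds, actuals))  # 0.75: three of four within 10%
```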

V. BUILDING A ROBUST PREDICTOR

This section describes how we developed, calibrated, and evaluated our prediction mechanism through an extensive set of tests conducted in our lab testbed.

A. Calibration and Evaluation in the High-Traffic Scenario

The first step in developing our SVR-based throughput predictor is to find the combination of training features that leads to the most accurate predictions over a wide variety of path conditions. We trained the predictor using a feature vector for each test that contained a different combination of our set of target path measurements (AB, Q, L) and the measured throughput. The trained SVR model is then used to predict throughput for a feature vector containing the corresponding sets of network measurements, and we compare the prediction accuracy for the different combinations.
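The train-then-predict procedure can be sketched as follows, assuming scikit-learn is available. The synthetic data below merely mimics the qualitative AB/L/Q relationships discussed in this section (lossless transfers track available bandwidth; lossy transfers track the loss rate); it is not the testbed data, and the kernel parameters are illustrative, not the paper's settings:

```python
import numpy as np
from sklearn.svm import SVR  # assumes scikit-learn is installed

rng = np.random.default_rng(0)

# Synthetic stand-in for testbed data: one feature vector [AB, L, Q]
# per training transfer.
n = 200
ab = rng.uniform(0.0, 20.0, n)                             # available bandwidth (Mb/s)
loss = np.where(ab < 2.0, rng.uniform(0.01, 0.1, n), 0.0)  # loss frequency
q = np.where(ab < 2.0, 50.0, rng.uniform(0.0, 5.0, n))     # queuing delay (ms)
tput = np.where(loss > 0, 5.0 / (1.0 + 100.0 * loss), 0.9 * ab)

# Train on [AB, L, Q]; other feature combinations are column subsets.
X = np.column_stack([ab, loss, q])
model = SVR(kernel="rbf", C=100.0, epsilon=0.1).fit(X, tput)

# Predict throughput from path measurements taken before a new transfer.
pred = model.predict([[10.0, 0.0, 1.0]])[0]
print(f"predicted throughput: {pred:.2f} Mb/s")
```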

We also compare the accuracy of SVR to the exponentially weighted moving average (EWMA) History-Based Predictor (HB) described in [11]: HB_{i+1} = α · HB_i + (1 − α) · T_i, where T_i is the most recently observed throughput. We use an α value of 0.3 because it is used in [11].
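A minimal sketch of such an EWMA forecaster, assuming the form HB_{i+1} = α · HB_i + (1 − α) · T_i with α = 0.3 and T_i the most recent throughput sample; the throughput values below are hypothetical:

```python
def hb_predict(throughputs, alpha=0.3):
    """EWMA history-based forecast: HB_{i+1} = alpha * HB_i +
    (1 - alpha) * T_i, seeded with the first observed throughput."""
    hb = throughputs[0]
    for t in throughputs[1:]:
        hb = alpha * hb + (1 - alpha) * t
    return hb

# Hypothetical recent throughput samples (Mb/s). With alpha = 0.3 the
# forecast leans heavily toward the most recent samples.
samples = [12.0, 12.0, 4.0, 4.0]
print(round(hb_predict(samples), 3))  # 4.72
```

Note how quickly the forecast tracks the drop from 12 to 4 Mb/s: this is exactly why HB loses accuracy when throughput changes rapidly between neighboring transfers.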

For most of the results reported, we generated an average of 140 Mb/s of background traffic to create high utilization on our OC3 (155 Mb/s) bottleneck link. We used one set of 100 experiments for training and another set of 100 experiments for testing. An 8-MB file was transferred in each experiment.

Fig. 3(a)–(h) show scatter plots comparing the actual and predicted throughput using different prediction methods, as discussed below. A point on the diagonal represents perfect prediction accuracy; the farther a point is from the diagonal, the greater the prediction error.

1) Using Path Measurements From an Oracle: Fig. 3(a) shows the prediction accuracy scatter plot for the HB method. Fig. 3(b)–(g) show the prediction error with SVR using Oracular Passive Measurements (OPM) for different combinations of path measurements in the feature vector. For example, SVR-OPM-Queue means that only queuing delay measurements were used to train and test, while SVR-OPM-Loss-Queue means that both loss and queuing delay measurements were used to train and test.

Table I shows relative prediction errors for HB forecasting and for SVR-OPM-based predictions. Values in the table indicate the fraction of predictions for a given method within a given accuracy level. For example, the first two columns of the first row in Table I mean that 32% of HB predictions have relative prediction errors of 10% or smaller, while 79% of SVR-OPM-AB predictions have relative prediction errors of 10% or smaller. We present scatter plots in addition to tabular data to provide insight into how different path properties contribute to throughput prediction in the SVR method.

Fig. 3. Comparison of prediction accuracy of HB, SVR-OPM, and SVR-AM in high background traffic conditions. (a) HB. (b) SVR-OPM-AB. (c) SVR-OPM-loss. (d) SVR-OPM-queue. (e) SVR-OPM-AB-loss. (f) SVR-OPM-loss-queue. (g) SVR-OPM-AB-loss-queue. (h) SVR-AM-loss-queue.

TABLE I
RELATIVE ACCURACY OF HISTORY-BASED (HB) THROUGHPUT PREDICTION AND SVR-BASED PREDICTORS USING DIFFERENT TYPES OF ORACULAR PASSIVE PATH MEASUREMENTS (SVR-OPM) IN THE FEATURE VECTOR. TABLE VALUES INDICATE THE FRACTION OF PREDICTIONS WITHIN A GIVEN ACCURACY LEVEL

From Fig. 3(a), we can see that the predictions for the HB predictor are rather diffusely scattered around the diagonal and that predictions in low-throughput conditions tend to have large relative error. Fig. 3(b)–(d) show the behavior of the SVR predictor using a single measure of path properties in the feature vector: AB, L, and Q, respectively. The AB and Q graphs have a similar overall trend: Predictions are accurate (i.e., points are close to the diagonal) for high actual throughput values, but far from the diagonal, almost in a horizontal line, for lower values of actual throughput. The L graph has the opposite trend: Points are close to the diagonal for lower values of actual throughput and form a horizontal line for higher values of actual throughput.

The explanation for these trends lies in the fact that file transfers with low actual throughput experience loss, while file transfers with high actual throughput do not experience any loss. When loss occurs, the values of AB and Q for the path are nearly constant. AB is almost zero, and Q is the maximum possible value (which depends on the amount of buffering available at the bottleneck link in the path). In this case, throughput depends on the value of L. Hence, L appears to be a good predictor when there is loss on the path, and AB and Q, being constants in this case, have no predictive power, resulting in horizontal lines, i.e., a single value of predicted throughput. On the other hand, when there is no loss, L is a constant with value zero, so L has no predictive power, while AB and Q are able to predict throughput quite accurately.

Fig. 3(e) and (f) show improvements in prediction accuracy obtained by using more than one path property in the SVR feature vector. We can see that when L is combined with AB or Q, the horizontal lines on the graphs are replaced by points much closer to the diagonal. Combining AB or Q with L allows SVR to predict accurately in both lossy and lossless conditions. Since both AB and Q help predict throughput in lossless conditions, do we really need both AB and Q, or can we use just one of the two and still achieve the same prediction accuracy? To answer this question, we compared AB-Loss and Loss-Queue predictions with each other and with AB-Loss-Queue predictions [i.e., Figs. 3(e)–(g)]. The general trend in all three cases is the same: The horizontal line of points is reduced or eliminated, suggesting that prediction from nonconstant-value measurements is occurring for both lossy and lossless network conditions. If we compare the AB-Loss and Loss-Queue graphs more closely, we observe two things. First, in the lossless prediction case, the points are closer to the diagonal in the Loss-Queue case than in the AB-Loss case. Second, in the Loss-Queue case, the transition in the prediction from the lossless to the lossy case is smooth, i.e., there is no horizontal line of points, while in the AB-Loss case there is still a horizontal line of points in the actual throughput range of 11–14 Mb/s. This suggests that Q is a more accurate predictor than AB in the lossless case. The relative prediction error data of Table I supports this: SVR with a feature vector containing Loss-Queue information predicts throughput within 10% of actual for 87% of transfers, while a feature vector containing AB-Loss measurements predicts with the same accuracy level for 78% of transfers. Finally, there is no difference in accuracy (either qualitatively or quantitatively) between Loss-Queue and AB-Loss-Queue.

TABLE II
RELATIVE ACCURACY OF HISTORY-BASED (HB) THROUGHPUT PREDICTION AND SVR-BASED PREDICTORS USING TRACE-BASED PASSIVE PATH MEASUREMENTS (PPM) OR ACTIVE PATH MEASUREMENTS (AM). TABLE VALUES INDICATE THE FRACTION OF PREDICTIONS WITHIN A GIVEN ACCURACY LEVEL

The above discussion suggests that AB measurements are not required for highly accurate throughput prediction and that a combination of L and Q is sufficient. This observation is not only surprising, but rather good news. Prior work has shown that accurate measurements of AB require at least moderate amounts of probe traffic [22], [26], and some formula-based TCP throughput estimation schemes take as a given that AB measurements are necessary for accurate throughput prediction [11]. In contrast, measurements of L and Q can be very lightweight probe processes [25]. We discuss measurement overhead further in Section VII.

2) Using Practical Passive and Active Path Measurements: So far, we have considered the prediction accuracy of SVR based only on Oracular Passive Measurements (OPM). This gives us the best-case accuracy achievable with SVR and also provides insight into how SVR uses different path properties for prediction. Table I shows that HB predicts 32% of transfers within 10% of actual, while SVR-OPM-Loss-Queue predicts 87%, an almost threefold improvement. In practice, however, perfect measurements of path properties are not available, so in what follows we assess SVR using measurements that can be obtained in practice in the wide area.

Table II presents relative prediction error data for HB, SVR-PPM, and SVR-AM. Due to space limitations, we present only Loss-Queue and AB-Loss-Queue results for SVR. We choose these because we expect AB-Loss-Queue to have the highest accuracy, as it has the most information about path properties, and Loss-Queue because it is very lightweight and has accuracy equal to AB-Loss-Queue for SVR-OPM. We wish to examine three issues: first, whether our finding that Loss-Queue has the same prediction accuracy as AB-Loss-Queue in the SVR-OPM case holds for the SVR-PPM and SVR-AM cases; second, whether SVR-PPM and SVR-AM have the same accuracy; and third, how SVR-AM accuracy compares with HB prediction accuracy.

All prediction accuracy results in the SVR-PPM and SVR-AM columns in Table II are very similar. AB-Loss-Queue has approximately the same accuracy as Loss-Queue for both SVR-PPM and SVR-AM. This is encouraging because it is consistent with the observation from SVR-OPM, i.e., that we can achieve good prediction accuracy without having to measure AB. SVR-AM has accuracy similar to SVR-PPM, i.e., using active measurement tools to estimate path properties yields predictions almost as accurate as having ground-truth passive measurements. This is important because on real wide-area paths, instrumentation is generally not available for collecting accurate passive measurements.

Finally, we compare HB with SVR-AM. Although Fig. 3(a) and (h) are qualitatively similar, SVR-AM has a tighter cluster of points around the diagonal for high actual throughput than HB. Thus, SVR-AM appears to have higher accuracy than HB. As Table II shows, SVR-AM-Loss-Queue predicts throughput within 10% of actual 49% of the time, while HB does so only 32% of the time. Hence, for high-traffic scenarios, SVR-AM-Loss-Queue, the practically deployable lightweight version of the SVR-based prediction mechanism, significantly outperforms HB prediction.

The above observations raise the question of why SVR performs so much better than HB at high accuracy levels. HB is a weighted average that works on the assumption that transfers close to each other in time will have similar throughputs. HB loses accuracy when there are rapid and large changes in throughput because in that case there is a large difference between the weighted average and individual throughputs. SVR predicts throughput by constructing a function mapping path properties to throughput values. This function is able to capture complex nonlinear relationships between the path properties and throughput. As long as the mapping between path properties and resulting throughput is consistent, SVR can predict throughput accurately, regardless of the variation between the throughput of neighboring transfers. SVR accuracy breaks down only when the mapping between path properties and throughput becomes inconsistent, e.g., when path conditions are so volatile that they change between the active path measurement and the file transfer, or when an aggregate path measure does not hold for the transfer, such as when there is loss on the bottleneck link, but the file transfer experiences more or less loss than the aggregate traffic. Thus, SVR outperforms HB because: 1) it has access to additional information, in the form of path properties such as AB, L, and Q, while HB has access only to past throughputs; and 2) it constructs a much more sophisticated functional mapping of path properties to throughput compared to the simple time-series analysis that HB uses. The fact that SVR's supremacy over HB is limited to high accuracy levels, and HB catches up with SVR at lower accuracy levels (predicted throughput within 40%–90% of actual, Table II), is a feature of the distribution of throughputs in our test set. We can see from the graphs in Fig. 3(a)–(h) that the average throughput of the test set is between 10 and 15 Mb/s. We expect the HB prediction to be around the average throughput. Fig. 3(a) shows that this is indeed the case: All HB predictions are between 10 and 15 Mb/s. All but one of the transfers have actual throughput between 5 and 18 Mb/s, so the relative error is 100% or less for low throughputs (all predictions within a factor of two of actual) and 80% or less for high throughputs. If the range of actual throughputs had been wider but the average had been the same, HB would have been less able to match SVR accuracy for relative error values of 40%–90%.

Fig. 4. Normal Q-Q plots for prediction errors with (a) oracular measurements (SVR-OPM) and (b) active measurements (SVR-AM) of Loss-Queue.
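These worst-case bounds can be checked with a line of arithmetic, assuming the relative-error metric of [11] (error normalized by the smaller of predicted and actual throughput) and the worst-case pairings of the two ranges, i.e., HB predictions in 10–15 Mb/s against actual throughputs in 5–18 Mb/s:

```python
def rel_err(pred, actual):
    # Relative prediction error, normalized by the smaller of the two values.
    return abs(pred - actual) / min(pred, actual)

# Low-throughput worst case: actual 5 Mb/s, prediction as high as 10 Mb/s.
print(rel_err(10.0, 5.0))   # 1.0 -> within a factor of two (100%)

# High-throughput worst case: actual 18 Mb/s, prediction as low as 10 Mb/s.
print(rel_err(10.0, 18.0))  # 0.8 -> 80%
```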

A remaining question is why Q is a better predictor of throughput than AB. When we compare AB and Q prediction behavior in Fig. 3(b) and (d), we see that as throughput decreases, i.e., as the network moves from lossless to lossy conditions, AB loses its predictive power sooner than Q does: around 15 Mb/s instead of 11 Mb/s for Q. As soon as the link experiences loss, AB is zero. While Q eventually reaches a maximum value under sufficiently high congestion, under light loss it retains its resolution because Q will differ depending on the percentage of packets that experience the maximum possible queuing delay versus those that experience less than the maximum. Hence, Q is a better predictor of throughput than AB most likely because it can be measured with better resolution over a wider range of network conditions.

3) The Nature of Prediction Error: Lu et al. [16] observed in their study that throughput prediction errors were approximately normal in distribution. As the authors noted, normality would justify standard computations of confidence intervals. We examined the distribution of errors in our experiments and also found evidence suggestive of normality.

Fig. 4 shows two normal quantile-quantile (Q-Q) plots for SVR-OPM-Loss-Queue [Fig. 4(a)] and SVR-AM-Loss-Queue [Fig. 4(b)]. Samples that are consistent with a normal distribution form approximately a straight line in the graph, particularly toward the center. In Fig. 4(a) and (b), we see that the throughput prediction samples in each case form approximately straight lines. These observations are consistent with normality in the distribution of prediction errors. Error distributions from other experiments were also consistent with the normal distribution but are not shown due to space limitations. We further discuss the issues of retraining and of detecting estimation problems in Section VII.

B. Evaluation of Prediction Accuracy for Different File Sizes

Thus far, we have predicted throughput only for bulk transfers of 8 MB. However, our goal is accurate prediction for a range of file sizes, not just bulk transfers, so in this section we evaluate prediction accuracy for a broad range of file sizes, something not considered in prior HB prediction studies.

TABLE III
RELATIVE ACCURACY OF SVR-BASED PREDICTOR USING ORACULAR PASSIVE MEASUREMENTS (OPM) AND TRAINING SETS CONSISTING OF ONE, TWO, THREE, SIX, OR EIGHT DISTINCT FILE SIZES

We conducted experiments using background traffic at an average offered load of 135 Mb/s. We used nine different training sets, each of size 100, consisting of between one and nine unique file sizes. The file sizes for the training sets were between 32 kB (2^15 bytes) and 8 MB (2^23 bytes). The first training set consisted of 100 8-MB files; the second of 50 32-kB files and 50 8-MB files; the third of 33 each of 32-kB (2^15 bytes), 512-kB (2^19 bytes), and 8-MB (2^23 bytes) files; the fourth of 25 files at each of four sizes; and so on, dividing the range of the exponent, 15 to 23, into more numerous but smaller intervals for each subsequent training set.

Test sets for our experiments consist of 100 file sizes between 2 kB (2^11 bytes) and 8 MB (2^23 bytes). To generate file sizes for the test set, uniformly distributed random numbers x between 11 and 23 were generated. The file size was set to 2^x bytes, rounded to the closest integer; note that 2^11 bytes = 2 kB and 2^23 bytes = 8 MB. This log-based distribution of file sizes contains a much greater number of smaller files than a uniform linear distribution of file sizes from 2 kB to 8 MB would. The bias toward smaller file sizes is important for proper evaluation of predictor performance for small files because it generates a more even distribution between files that spend all their transfer time in slow-start, those that spend some time in slow-start and some in congestion avoidance, and those that spend a negligible amount of time in slow-start. If the uniform linear distribution had been used, most of the file sizes generated would have spent a negligible amount of time in slow-start. We also use a wider range of small files in the test set compared to the training set (the test set starts with 2-kB files, while the training set starts with 32 kB) because we want to see how well our predictor extrapolates for unseen ranges of small file sizes.
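The test-set file size generation just described can be sketched as follows: draw the exponent x uniformly in [11, 23] and use 2^x bytes, which heavily over-represents small sizes relative to a linear-uniform draw over the same byte range. The helper name and seed are illustrative:

```python
import random

def generate_test_sizes(count, lo_exp=11, hi_exp=23, seed=0):
    """Draw file sizes log-uniformly: pick x uniformly in [lo_exp, hi_exp]
    and use 2**x bytes, rounded to the nearest integer."""
    rng = random.Random(seed)
    return [round(2 ** rng.uniform(lo_exp, hi_exp)) for _ in range(count)]

sizes = generate_test_sizes(100)
print(min(sizes) >= 2 ** 11 and max(sizes) <= 2 ** 23)  # True

# Roughly half the draws land below 128 kB (2^17), since exponents below
# 17 cover half of the [11, 23] range; a linear-uniform draw would put
# only about 1.5% of files there.
small = sum(s < 2 ** 17 for s in sizes) / len(sizes)
print(small)
```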

Tables III and IV present prediction accuracy for training sets consisting of one, two, three, six, or eight distinct file sizes for SVR-OPM and SVR-AM. Graphs in Fig. 5 present SVR-AM results for one, two, and three file sizes in the training set. We do not include graphs for the remaining training sets due to space limitations: They are similar to those for two and three file sizes in the training set. The first observation is that for a single file size in training, the prediction error is very high. This inaccuracy is expected because the predictor has been given no information about the relationship between size and throughput for small files. The second observation is that for more than one file size, prediction becomes dramatically more accurate, i.e., the predictor is able to successfully extrapolate from a handful of sizes in training to a large number of sizes in testing. The third observation is that relative error is low for large file sizes (corresponding to high actual throughput), while it is higher for small files (low actual throughput); in other words, predicting throughput accurately for small files is more difficult than for large files. For small files, throughput increases rapidly with file size during TCP's slow-start phase due to the doubling of the window every RTT. Also, packet loss during the transfer of a small file has a larger relative impact on throughput. Due to the greater variability of throughput for small files, predicting throughput accurately is more difficult. The fourth observation is that for small file sizes (i.e., small actual throughput), the error is always one of overprediction. The smallest file in the training set is 32 kB, while the smallest file in the test set is 2 kB. This difference is the cause of the overprediction errors: Given the high variability of throughput for small file sizes due to slow-start effects and losses having a greater relative impact on throughput, without a broader training set, the SVR mechanism is unable to extrapolate accurately for unseen ranges of small file sizes.

Fig. 5. Scatter plots for the SVR-based predictor using one, two, or three distinct file sizes in the training set. All results shown use active measurements to train the predictor. Testing is done using a range of 100 file sizes from 2 kB to 8 MB. (a) SVR-AM with file size of 8 MB in training set. (b) SVR-AM with file sizes of 32 kB and 8 MB in training set. (c) SVR-AM with file sizes of 32 kB, 512 kB, and 8 MB in training set.

TABLE IV
RELATIVE ACCURACY OF SVR-BASED PREDICTOR USING ACTIVE MEASUREMENTS (AM) AND TRAINING SETS CONSISTING OF ONE, TWO, THREE, SIX, OR EIGHT DISTINCT FILE SIZES

A final observation is that prediction accuracy reaches a maximum at three file sizes in the training set, and there is no clear trend for four to nine file sizes in the training set. The number of transfers in our training set is always constant at 100, so for the single training size, there are 100 8-MB transfers; for the two training sizes, there are 50 32-kB transfers and 50 8-MB transfers; and so on. We believe that accuracy is maximized at three training sizes in our experiments because there is a tradeoff between capturing a diversity of file sizes and the number of samples for a single file size. If we had instead kept the number of samples of a single file size constant at 100 and allowed the size of the training set to grow from 100 to 200, 300, etc., as the number of file sizes increased, we would not expect maximum accuracy at three file sizes but rather an increase in accuracy with each additional file size in the training set.

VI. WIDE AREA EXPERIMENTS

To further evaluate our SVR-based TCP throughput prediction method, we created a prototype tool called PathPerf that can generate measurements and make forecasts on wide area paths. We used PathPerf to conduct experiments over a set of paths in the RON testbed [3]. This section presents the results of our wide area experiments.

A. The RON Experimental Environment

The RON wide area experiments were conducted in January 2007 over 18 paths between seven different node locations. Two nodes were in Europe (in Amsterdam, The Netherlands, and London, U.K.), and the remainder were located at universities in the continental United States (Cornell University, Ithaca, NY; University of Maryland, College Park; University of New Mexico, Albuquerque; New York University, New York; and University of Utah, Salt Lake City). Of the 18 paths, two are trans-European, nine are trans-Atlantic, and seven are transcontinental U.S. The RON testbed has a significantly larger number of available nodes and paths, but two considerations limited the number of nodes and paths that we could use. The first consideration was that the nodes should have little or no other CPU or network load while our experiments were running; this is required for BADABING to measure loss accurately. The second issue was that we could not use any nodes running FreeBSD 5.x because the TCP throughput history caching feature in FreeBSD 5.x, controlled by the variable net.inet.tcp.inflight.enable, is on by default and interfered with our experiments. Consequently, we were restricted to using nodes running FreeBSD 4.7, which does not have the throughput history caching feature.

We use the following convention for path names: A–B means that A is the TCP sender and B is the receiver, i.e., the TCP data flow direction is from A to B, and TCP ACK flow is in the reverse direction. A–B and B–A are considered two different paths because routing and background traffic level asymmetry between the forward and reverse directions can lead to major differences in throughput.


Fig. 6. Wide area results for HB and SVR predictions. The HB and SVR columns present the fraction of predictions with relative error of 10% or less. Training and test sets consisted of 100 samples each. Two megabytes of data were transferred. Labels use node initials; see Section VI-A for complete node names.

TABLE V
MINIMUM RTTS FOR THE 18 RON PATHS USED FOR WIDE-AREA EXPERIMENTS

The wide area measurement protocol was the following: 1) run BADABING for 30 s; 2) transfer a 2-MB file; 3) sleep for 30 s; and 4) repeat the above 200 times.

In Section V, we showed that available bandwidth measurements are not required for accurate TCP throughput prediction, so we omit running YAZ in the wide area. Thus, the wide area prediction mechanism is the SVR-AM-Loss-Queue protocol from Section V. We reduced the size of the file transfers from 8 MB in the lab experiments to 2 MB in the wide area experiments because the typical throughput in the wide area was an order of magnitude lower than the typical throughput in the lab (1–2 Mb/s versus 10–15 Mb/s).

The measurements were carried out at different times of the day for different paths, depending upon when the nodes were available, i.e., when the nodes had little or no other CPU and network activity in progress. Depending again on node availability, we ran a single set of experiments on some paths and multiple sets on others. We could conduct only active measurements in the wide area because the infrastructure required for passive measurements is not present.

Table V lists the minimum round-trip times observed on the wide area paths over all BADABING measurements taken on each path. The shortest paths have a minimum RTT of 8 ms, and the longest a minimum RTT of 145 ms. In Table V, we ignore path directions, e.g., we list Amsterdam–London and London–Amsterdam as a single entry because the minimum RTT is the same in both directions.

B. Wide Area Results

Fig. 6 compares the accuracy of the SVR and HB throughput predictions for the 18 wide area paths. The results in Fig. 6 are obtained by dividing the 200 measurements gathered for each path into two consecutive sets of 100, using the first 100 as the training set and the second 100 as the test set. Some paths feature more than once because node availability allowed us to repeat experiments on those paths.

We observe two major trends from Fig. 6. First, for the majority of experiments, prediction accuracy is very high for both HB and SVR: Most paths have greater than 85% of predictions within 10% of actual throughput. Second, for five of the 26 experiments—Cornell–Amsterdam, London–Maryland–1, London–Utah–2, London–Utah–5, and NYU–Amsterdam—the SVR prediction accuracy is quite low, as low as 25% for London–Utah–2. The reasons for this poor accuracy are analyzed in detail in Section VI-C.

Next, we take a more detailed approach to assessing prediction accuracy. We divide the 200 samples into sets of n and 200 − n for values of n from 1 to 199. The first n samples are used for training, and the remaining 200 − n samples are used for testing. This allows us to understand the tradeoff between training set size and prediction accuracy, which is important for a practical online prediction system where it is desirable to start generating predictions as soon as possible. This analysis also allows us to identify the points in a trace where an event that has an impact on prediction accuracy, such as a level shift, occurs, and whether retraining helps maintain prediction accuracy in the face of changing network conditions. We present this data for SVR only; for HB, there is no division of data into training and test sets because retraining occurs after every measurement.

Fig. 7 presents the SVR prediction accuracy for n = 1, 5, and 10 for those experiments in Fig. 6 that had high prediction accuracy for n = 100. For all but three of the experiments in Fig. 7, there is little difference between prediction accuracy for training set sizes of 1, 5, 10, and 100. This is because there is little variation in the throughput observed during the experiments. A path with little or no variation in observed throughput over the course of an experimental run is the easy case for both SVR and HB throughput predictors, so these experiments will not be discussed any further in this paper. For three experiments in Fig. 7, Amsterdam–London, London–Utah–1, and Utah–Cornell, the prediction accuracy for n values of 1, 5, and 10 is significantly lower compared to that for n = 100. The reason for this poor accuracy is discussed in detail in Section VI-C.

C. Detailed Analysis of Wide Area Results

Fig. 7. Effect of training set size on SVR prediction accuracy in the wide area. For paths with high prediction accuracy in Fig. 6, this table shows how large a training set size was needed to achieve high accuracy. Labels use node initials; see Section VI-A for complete node names.

In this section, we analyze the reasons for poor SVR prediction accuracy for five paths from Fig. 6 (Cornell–Amsterdam, London–Maryland–1, London–Utah–2, London–Utah–5, and NYU–Amsterdam) and three paths from Fig. 7 (Amsterdam–London, London–Utah–1, and Utah–Cornell). We find that there are two dominant reasons for poor SVR prediction accuracy: background traffic level shifts and changes in background traffic on small timescales. We find that retraining can improve prediction accuracy for level shifts but not for changes in network conditions on small timescales.

Fig. 8. London–Utah–5: An example of a wide area path with level shifts. Time on the x-axis is represented in terms of file transfer number. (a) Throughput profile. (b) Prediction accuracy.

In the analysis that follows, we use a pair of graphs per experiment to illustrate the details of each experiment. Consider Fig. 8(a) and (b) for London–Utah–5. The first graph, the throughput profile, is a time series representation of the actual throughput observed during the experiment. The second graph, prediction accuracy, is a more detailed version of the bar graphs of Fig. 7, incorporating all possible values of n. At an x value of n, the first n of the 200 samples are used as the training set, and the remaining 200 − n samples as the test set. The y value is the fraction of samples in the test set of 200 − n samples whose predicted throughput is within 10% of the actual throughput. As the value of n increases, the training set becomes larger compared to the test set because the total number of samples is fixed at 200. For values of n close to 200, the test set is very small, and thus the prediction accuracy values, whether they are high or low, are not as meaningful.

1) Level Shifts: Level shifts were responsible for poor prediction accuracy in four experiments: Utah–Cornell, London–Utah–1, London–Utah–2, and London–Utah–5. We discuss London–Utah–5 in detail as a representative example.

Fig. 8(a) and (b) are the throughput profile and prediction accuracy graphs for London–Utah–5. Fig. 8(a) shows that a level shift from a throughput of 1.4 to 0.6 Mb/s occurs after 144 file transfers, and a level shift from 0.6 back to 1.4 Mb/s occurs after 176 transfers. Fig. 8(b) shows that prediction accuracy decreases from 0.84 at one file transfer to a global minimum of 0.40 at 144 file transfers, and then increases very rapidly to 1.00 at 145 file transfers. This result can be explained by the fact that the level shift is not included in the training data. Thus, the prediction function will predict samples at the first throughput level accurately and those at the second throughput level inaccurately. Prediction accuracy decreases from 1 to 144 because, as the value of x increases, there are relatively fewer samples at the first level and more at the second level in the test set, so there are relatively more inaccurate predictions, leading to a decreasing trend in prediction accuracy. If the division into training and test sets is at the level shift boundary, in this case the first 144 file transfers, the training set consists only of measurement samples before the level shift and the test set only of samples after the level shift. All predictions will be inaccurate (assuming the level shift changes the throughput by more than 10%, our threshold) because the prediction function has never encountered the new throughput level. Hence, we observe minimum prediction accuracy at the level shift boundary, i.e., 144 transfers. If the division into training and test sets includes a level shift, some samples with the new network conditions and resultant new throughput are included in the training set. The prediction function is now aware of the new network conditions and is able to make better predictions under them. In our example, including only one sample from after the level shift in the training set, the 145th sample, is sufficient to allow all throughputs at the lower level to be predicted accurately. That is, the SVR predictor needs the minimum possible training set size (a single sample) for the new network conditions before it can generate accurate predictions.
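The one-sample retraining effect can be reproduced on synthetic data. The sketch below does not use the paper's testbed data or tool: the single queuing-delay feature, the throughput mapping, and the SVR hyperparameters are all illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)

# Synthetic level shift: queuing delay (ms) is the lone feature, and
# throughput drops from 1.4 to 0.6 Mb/s when delay jumps after sample 144.
q = np.concatenate([rng.normal(5.0, 0.2, 144),    # pre-shift delay
                    rng.normal(20.0, 0.2, 56)])   # post-shift delay
tput = np.where(q < 12, 1.4, 0.6)

# Train on the first 145 samples: 144 pre-shift plus ONE post-shift sample.
model = SVR(kernel="rbf", C=100.0, gamma=0.1, epsilon=0.01)
model.fit(q[:145].reshape(-1, 1), tput[:145])

# The single post-shift training sample suffices to predict the new level.
pred = model.predict(q[145:].reshape(-1, 1))
within = float(np.mean(np.abs(pred - tput[145:]) <= 0.10 * tput[145:]))
print(within)
```

Because the two delay regimes are well separated in feature space, the RBF kernel only needs one support vector at the new operating point to pin down the new throughput level.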

Fig. 9(a) and (b) compare the behavior of the SVR and HB predictors for a level shift. The training set for SVR consisted of the first 145 samples, i.e., 144 samples at the first throughput level and one sample at the second throughput level. The test set consisted of the remaining 55 samples. For HB, recall that there is no separation into training and test sets, and retraining occurs after every measurement sample. Comparing Fig. 9(a) (SVR) and (b) (HB), we see that the SVR-predicted throughput follows the actual throughput very closely, while the HB-predicted throughput takes some time to catch up with the actual throughput after a level shift. If the SVR predictor has knowledge of the level shift, its prediction accuracy is much better than that of HB. After the second level shift (176 samples), no further training of the SVR predictor is required to predict the remaining 23 transfers correctly. The predicted throughput in Fig. 9(a) follows the actual throughput closely at the level shift after 176 transfers even though the training set consists of only the first 145 samples.

Fig. 9. London–Utah–5: Comparison of SVR and HB prediction accuracy for background traffic with level shifts. The x-axis is the file transfer timeline starting at 145. The y-axis is throughput. (a) SVR prediction accuracy. (b) HB prediction accuracy.

Fig. 10. London–Maryland–1: An example of a wide area path with throughput changes on small timescales. (a) Throughput profile. (b) Prediction accuracy.

The above example shows that the SVR predictor has two advantages over the HB predictor. First, it can adapt instantaneously, i.e., after a single training sample, to a level shift, while the HB predictor takes longer. Second, unlike the HB predictor, the SVR predictor needs to be trained only once for a given set of conditions. The results for the other three wide area experiments that contained level shifts are similar to those of the London–Utah–5 experiment (omitted due to space considerations).
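The HB predictor's lag after a level shift can be illustrated with an exponentially weighted moving average, one representative of the history-based predictors evaluated by He et al. [11]; the smoothing weight of 0.3 and the synthetic level shift are assumptions, not values from that work.

```python
import numpy as np

def hb_predict(history, alpha=0.3):
    """History-based (HB) predictor: exponentially weighted moving
    average of past throughput samples, retrained after every sample."""
    preds, ewma = [], history[0]
    for sample in history:
        preds.append(ewma)                 # predict before seeing the sample
        ewma = alpha * sample + (1 - alpha) * ewma
    return np.array(preds)

# Level shift after sample 144: 1.4 -> 0.6 Mb/s.
tput = np.concatenate([np.full(144, 1.4), np.full(56, 0.6)])
preds = hb_predict(tput)

# Just after the shift, the EWMA still largely reflects the old level...
print(round(preds[145], 2))   # -> 1.16

# ...and takes several samples to converge to within 10% of 0.6 Mb/s.
lag = next(i for i in range(144, 200) if abs(preds[i] - 0.6) <= 0.06) - 144
print(lag)                    # -> 8
```

With these parameters the EWMA needs eight post-shift samples to re-enter the 10% accuracy band, whereas the SVR predictor needed one.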

2) Changes Over Small Timescales: Network condition changes over small timescales reduced SVR prediction accuracy for Amsterdam–London, Cornell–Amsterdam, London–Maryland–1, and NYU–Amsterdam. We consider London–Maryland–1 as a representative example.

Fig. 10(a) and (b) present the throughput profile and prediction accuracy of the London–Maryland–1 experiment. Fig. 10(a) shows that until about the 60th file transfer, throughput is fairly steady at around 2.7 Mb/s, after which it varies widely between approximately 1.2 and 2.7 Mb/s. Fig. 10(b) shows that prediction accuracy is at a maximum of 65% at one file transfer, decreases between one and 60 transfers, and varies between 50% and 60% between 60 and 180 transfers.

Unlike for level shifts, after the throughput profile changes and measurement samples of the new network conditions are included in the training set, prediction accuracy does not improve.

Recall that we measure network conditions using BADABING for 30 s and then transfer a file. In this experiment, after the 60th transfer, the network conditions vary so rapidly that they change between the BADABING measurement and the file transfer. Consequently, there is no consistent mapping between the measured network conditions and the transfer throughput, so a correct SVR predictor cannot be constructed, leading to poor prediction accuracy.

The HB predictor also performs poorly in these conditions. As shown in Fig. 6, for 100 training samples for London–Maryland–1, HB has a prediction accuracy of 59%, while SVR has an accuracy of 55%. When network conditions vary as rapidly as in the above example, it is not possible to predict throughput accurately using either SVR or HB because of the absence of a consistent relationship between network conditions just before and during a file transfer.

One difference we observed in experiments with level shifts versus experiments with throughput changes on small timescales was that level shifts occurred under lossless network conditions, while throughput changes on small timescales occurred under conditions of sustained loss. Thus, if only the average queuing delay on a network path changes, we observe a level shift; if a network goes from a no-loss state to a lossy state, we observe throughput changes on small timescales. A possible explanation is that if the network is in a lossy state, a particular TCP flow may or may not experience loss. Since TCP backs off aggressively after a loss, flows that do experience loss will have significantly lower throughput than those that do not, leading to large differences in the throughput of flows. However, if only the average queuing delay on a path changes, every flow on the path (unless it is extremely short) experiences the change in queuing delay, leading to a change in the throughput of all flows, i.e., a level shift, on the path.

VII. DISCUSSION

This section addresses three key issues related to running PathPerf in operational settings.

Network Load Introduced by PathPerf: Traffic introduced by active measurement tools is a concern because it can skew the network property being measured. Furthermore, network operators and engineers generally wish to minimize any impact measurement traffic may have on customer traffic.

For history-based TCP throughput estimation methods, the amount of traffic introduced depends on the measurement protocol. Two approaches may be taken: the first is to periodically transfer fixed-size files; the second is to allow a TCP connection to transfer data for a fixed time period. The latter approach was taken in the history-based evaluation of He et al. [11]. To estimate the overhead of a history-based predictor using fixed-duration transfers, assume that: 1) the TCP connection is not receive-window-limited; 2) the fixed duration of data transfer is 50 s (as in [11]); 3) throughput measurements are initiated every 5 min; and 4) throughput closely matches available bandwidth. For this example, assume that the average available bandwidth for the duration of the experiment is approximately 50 Mb/s. Over a 30-min period, nearly 2 GB of measurement traffic is produced, resulting in an average bandwidth of about 8.3 Mb/s.
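The arithmetic behind these figures can be checked directly; the constants below are exactly the stated assumptions.

```python
# Back-of-the-envelope check of the history-based overhead estimate.
duration_s = 50        # fixed transfer duration, as in He et al. [11]
interval_s = 300       # one throughput measurement every 5 min
period_s   = 1800      # 30-min observation window
rate_bps   = 50e6      # assumed available bandwidth, matched by TCP

transfers  = period_s // interval_s            # 6 transfers in 30 min
total_bits = transfers * duration_s * rate_bps
print(total_bits / 8 / 1e9)                    # -> 1.875 (nearly 2 GB)
print(total_bits / period_s / 1e6)             # -> ~8.33 Mb/s average load
```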

In the case of PathPerf, measurement overhead during training consists of file transfers and queuing/loss probe measurements.


In testing, overhead consists solely of loss measurements. Assume that we have a 15-min training period followed by a 15-min testing period. Assume that files of 32 kB, 512 kB, and 8 MB are transferred during the training period, using 10 samples of each file size, and that each file transfer is preceded by a 30-s BADABING measurement. With a probe probability of 0.3, BADABING traffic for each measurement is about 1.5 MB. For testing, only BADABING measurements must be ongoing. Assume that a 30-s BADABING measurement is initiated every 3 min. Thus, over the 15-min training period, about 130 MB of measurement traffic is produced, resulting in an average bandwidth of about 1.2 Mb/s for the first 15 min. For the testing period, a total of 7.5 MB of measurement traffic is produced, resulting in a rate of about 66 kb/s. Overall, PathPerf produces 633 kb/s on average over the 30-min measurement period, dramatically less than a standard history-based measurement approach. Even if more conservative assumptions are made for the history-based approach, the differences in overhead are significant. Again, the reason for the dramatic savings is that once the SVR predictor has been trained, only lightweight measurements are required for accurate predictions.
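The corresponding PathPerf numbers can be reproduced the same way. Decimal units (1 MB = 10^6 bytes) are assumed here, which is why the overall figure lands slightly below the 633 kb/s quoted above.

```python
# Training period (first 15 min): 30 file transfers plus 30 BADABING runs.
files_bytes    = 10 * (32e3 + 512e3 + 8e6)   # 10 samples each of 32 kB, 512 kB, 8 MB
badabing_bytes = 30 * 1.5e6                  # 1.5 MB per 30-s BADABING run
train_bytes    = files_bytes + badabing_bytes
print(train_bytes / 1e6)                     # -> 130.44 (about 130 MB)
print(train_bytes * 8 / 900 / 1e6)           # -> ~1.16 Mb/s over 15 min

# Testing period (second 15 min): one BADABING run every 3 min, five in all.
test_bytes = 5 * 1.5e6
print(test_bytes * 8 / 900 / 1e3)            # -> ~66.7 kb/s

# Overall average over the full 30 min.
print((train_bytes + test_bytes) * 8 / 1800 / 1e3)   # -> ~613 kb/s
```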

Detecting Problems in Estimation: An important capability for throughput estimation in live deployments is to detect when there are significant estimation errors. Such errors could be indicative of a change in network routing, causing an abrupt change in delay, loss, and throughput. They could also signal a pathological network condition, such as an ongoing denial-of-service attack leading to endemic network loss along a path. On the other hand, they may simply be measurement outliers with no network-based cause.

As discussed in Section V-A3, normality allows us to use standard statistical machinery to compute confidence intervals (i.e., using the measured variance of the prediction error). We show that prediction errors are consistent with a normal distribution and further propose using confidence intervals as a mechanism for triggering retraining of the SVR as follows. Assume that we have trained the SVR predictor over n measurement periods (i.e., we have n throughput samples and n samples of Q and L). Assume that we then collect m additional throughput samples, making predictions for each sample and recording the error. We therefore have m error samples between what was predicted and what was subsequently measured. Given a confidence level, e.g., 95%, we can calculate confidence intervals on the sample error distribution. We can then, with low frequency, collect additional throughput samples to test whether the prediction error exceeds the interval bounds. (Note that these additional samples may be application traffic for which predictions are used.) If so, we may decide that retraining the SVR predictor is appropriate. A danger in triggering an immediate retraining is that such a policy may be too sensitive to outliers regardless of the confidence interval chosen. More generally, we can consider a threshold of k consecutive prediction errors that exceed the computed confidence interval bounds as a trigger for retraining.
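A minimal version of this retraining trigger, assuming (per Section V-A3) approximately normal prediction errors and using a fixed normal quantile for the interval, might look like the following; the error magnitudes and the choice k = 3 are illustrative.

```python
import numpy as np

Z = {0.95: 1.96, 0.99: 2.576}   # standard normal quantiles

def retrain_trigger(past_errors, new_errors, conf=0.95, k=3):
    """Signal retraining when k CONSECUTIVE new prediction errors fall
    outside a confidence interval built from past errors (assumed to be
    approximately normal); a lone excursion is treated as an outlier."""
    mu = np.mean(past_errors)
    sd = np.std(past_errors, ddof=1)
    lo, hi = mu - Z[conf] * sd, mu + Z[conf] * sd
    run = 0
    for e in new_errors:
        run = run + 1 if (e < lo or e > hi) else 0
        if run >= k:
            return True
    return False

rng = np.random.default_rng(2)
baseline = rng.normal(0.0, 0.05, 100)                     # errors under stable conditions
still_ok = retrain_trigger(baseline, rng.normal(0.0, 0.04, 20))  # still consistent
shifted  = retrain_trigger(baseline, rng.normal(0.5, 0.05, 20))  # level shift in errors
print(still_ok, shifted)
```

Requiring k consecutive exceedances, rather than one, is exactly the guard against outlier sensitivity described above.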

Predictor Portability: In this paper, we have only considered predictors trained and tested on the same path, so a number of features, such as the identity of the path and its MTU, are implicit. This makes SVR simpler to use for path-specific prediction because many path details that would be required by other approaches, such as formula-based prediction, can safely be hidden from the user.

If two paths are similar in all features, implicit and explicit, then a predictor trained on one path using only Q and L (and possibly AB) can be used directly for the other path. If paths differ in implicit features, then those features will have to be added explicitly to the predictor to make it portable. Fortunately, SVR extends naturally to include new features. If a complete set of features affecting TCP throughput can be compiled, and a wide enough set of consistent training data is available, then a universal SVR predictor that works on almost any wireline Internet path might be possible. A potential challenge here is identifying and measuring all factors, such as path properties, TCP flavors, and operating system parameters, that might affect throughput. However, this challenge can be largely surmounted through active path measurements, use of TCP header fields, and parsing of operating system configuration files to include operating parameters as SVR features.

ACKNOWLEDGMENT

The authors would like to thank D. Andersen for providing access to the RON testbed and A. Bizzaro for help with the WAIL lab testbed. They also thank the anonymous reviewers for their helpful suggestions.

REFERENCES

[1] A. Abouzeid, S. Roy, and M. Azizoglu, “Stochastic modeling of TCP over lossy links,” in Proc. IEEE INFOCOM, Tel-Aviv, Israel, Mar. 2000, vol. 3, pp. 1724–1733.

[2] E. Altman, K. Avrachenkov, and C. Barakat, “A stochastic model of TCP/IP with stationary random loss,” in Proc. ACM SIGCOMM, Aug. 2000, pp. 231–242.

[3] D. G. Andersen, H. Balakrishnan, M. F. Kaashoek, and R. Morris, “Resilient overlay networks,” in Proc. 18th ACM SOSP, Banff, AB, Canada, Oct. 2001, pp. 131–145.

[4] M. Arlitt, B. Krishnamurthy, and J. Mogul, “Predicting short-transfer latency from TCP arcana: A trace-based validation,” in Proc. ACM IMC, Berkeley, CA, Oct. 2005, p. 19.

[5] H. Balakrishnan, S. Seshan, M. Stemm, and R. Katz, “Analyzing stability in wide-area network performance,” in Proc. ACM SIGMETRICS, Seattle, WA, Jun. 1997, pp. 2–12.

[6] P. Barford and M. Crovella, “Critical path analysis of TCP transactions,” in Proc. ACM SIGCOMM, Stockholm, Sweden, Aug. 2000, pp. 127–138.

[7] R. Beverly, K. Sollins, and A. Berger, “SVM learning of IP address structure for latency prediction,” in Proc. SIGCOMM Workshop Mining Netw. Data, Pisa, Italy, Sep. 2006, pp. 299–304.

[8] N. Cardwell, S. Savage, and T. Anderson, “Modeling TCP latency,” in Proc. IEEE INFOCOM, Tel-Aviv, Israel, Mar. 2000, vol. 3, pp. 1742–1751.

[9] “Cooperative Association for Internet Data Analysis,” 2006 [Online]. Available: http://www.caida.org/tools

[10] M. Goyal, R. Guerin, and R. Rajan, “Predicting TCP throughput from non-invasive network sampling,” in Proc. IEEE INFOCOM, New York, Jun. 2002, vol. 1, pp. 180–189.

[11] Q. He, C. Dovrolis, and M. Ammar, “On the predictability of large transfer TCP throughput,” in Proc. ACM SIGCOMM, Philadelphia, PA, Aug. 2005, pp. 145–156.

[12] K. Ilgun, R. Kemmerer, and P. Porras, “State transition analysis: A rule-based intrusion detection approach,” IEEE Trans. Softw. Eng., vol. 21, no. 3, pp. 181–199, Mar. 1995.

[13] V. Jacobson and M. Karels, “Congestion avoidance and control,” in Proc. ACM SIGCOMM, Palo Alto, CA, Aug. 1988, pp. 314–329.

[14] M. Jain and C. Dovrolis, “End-to-end available bandwidth: Measurement methodology, dynamics, and relation with TCP throughput,” IEEE/ACM Trans. Netw., vol. 11, no. 4, pp. 537–549, Aug. 2003.

[15] T. Joachims, “Making large-scale SVM learning practical,” in Advances in Kernel Methods—Support Vector Learning, B. Schölkopf, C. Burges, and A. Smola, Eds. Cambridge, MA: MIT Press, 1999.

[16] D. Lu, Y. Qiao, P. Dinda, and F. Bustamante, “Characterizing and predicting TCP throughput on the wide area network,” in Proc. 25th IEEE ICDCS, Columbus, OH, Jun. 2005, pp. 414–424.

[17] M. Mathis, J. Semke, J. Mahdavi, and T. Ott, “The macroscopic behavior of the TCP congestion avoidance algorithm,” Comput. Commun. Rev., vol. 27, no. 3, pp. 67–82, Jul. 1997.

[18] J. Padhye, V. Firoiu, D. Towsley, and J. Kurose, “Modeling TCP throughput: A simple model and its empirical validation,” in Proc. ACM SIGCOMM, Vancouver, BC, Canada, Sep. 1998, pp. 303–314.

[19] V. Paxson, “End-to-end Internet packet dynamics,” in Proc. ACM SIGCOMM, Cannes, France, Sep. 1997, pp. 139–152.

[20] V. Paxson, “Measurements and analysis of end-to-end Internet dynamics,” Ph.D. dissertation, Univ. California, Berkeley, 1997.

[21] B. Schölkopf and A. J. Smola, Learning With Kernels. Cambridge, MA: MIT Press, 2002.

[22] A. Shriram, M. Murray, Y. Hyun, N. Brownlee, A. Broido, M. Fomenkov, and K. Claffy, “Comparison of public end-to-end bandwidth estimation tools on high-speed links,” in Proc. PAM, 2005, pp. 306–320.

[23] A. J. Smola and B. Schölkopf, “A tutorial on support vector regression,” Stat. Comput., vol. 14, pp. 199–222, 2004.

[24] J. Sommers and P. Barford, “Self-configuring network traffic generation,” in Proc. ACM IMC, Taormina, Italy, Oct. 2004, pp. 68–81.

[25] J. Sommers, P. Barford, N. Duffield, and A. Ron, “Improving accuracy in end-to-end packet loss measurements,” in Proc. ACM SIGCOMM, Philadelphia, PA, Aug. 2005, pp. 157–168.

[26] J. Sommers, P. Barford, and W. Willinger, “A proposed framework for calibration of available bandwidth estimation tools,” in Proc. IEEE ISCC, Jun. 2006, pp. 709–718.

[27] J. Strauss, D. Katabi, and F. Kaashoek, “A measurement study of available bandwidth estimation tools,” in Proc. ACM IMC, Miami, FL, Nov. 2003, pp. 39–44.

[28] M. Swany and R. Wolski, “Multivariate resource performance forecasting in the Network Weather Service,” in Proc. Supercomputing, Baltimore, MD, Nov. 2002, pp. 1–10.

[29] V. N. Vapnik, The Nature of Statistical Learning Theory, 2nd ed. New York: Springer, 1995.

[30] S. Vazhkudai, J. Wu, and I. Foster, “Predicting the performance of wide area data transfers,” in Proc. IEEE IPDPS, Fort Lauderdale, FL, Apr. 2002, pp. 34–43.

[31] Y. Zhang, L. Breslau, V. Paxson, and S. Shenker, “On the characteristics and origins of Internet flow rates,” in Proc. ACM SIGCOMM, Pittsburgh, PA, Aug. 2002, pp. 309–322.

[32] Y. Zhang, N. Duffield, V. Paxson, and S. Shenker, “On the constancy of Internet path properties,” in Proc. ACM Internet Meas. Workshop, San Francisco, CA, Oct. 2001, pp. 197–211.

Mariyam Mirza received the B.S.E. degree in computer science from Princeton University, Princeton, NJ, in 2002.

She is a graduate student of computer science at the University of Wisconsin–Madison. Her research interests include network measurement and performance evaluation.

Joel Sommers (M’09) received the Ph.D. degree in computer science from the University of Wisconsin–Madison in 2007.

He is an Assistant Professor of computer science at Colgate University, Hamilton, NY. His current research focus is the measurement and analysis of networked systems and network traffic.

Paul Barford received the B.S. degree in electrical engineering from the University of Illinois at Urbana-Champaign in 1985, and the Ph.D. degree in computer science from Boston University, Boston, MA, in December 2000.

He is an Associate Professor of computer science at the University of Wisconsin–Madison. He is the Founder and Director of the Wisconsin Advanced Internet Laboratory, and his research interests are in the measurement, analysis, and security of wide area networked systems and network protocols.

Xiaojin Zhu received the B.Sc. degree in computer science from Shanghai Jiao Tong University, Shanghai, China, in 1993, and the Ph.D. degree in language technologies from Carnegie Mellon University, Pittsburgh, PA, in 2005.

He is an Assistant Professor of computer science at the University of Wisconsin–Madison. From 1996 to 1998, he was a Research Member with IBM China Research Laboratory, Beijing, China. His research interests are statistical machine learning and its applications to natural language processing.