INTERNATIONAL JOURNAL OF NETWORK MANAGEMENT
Int. J. Network Mgmt 2002; 12: 323–330 (DOI: 10.1002/nem.451)
Traffic behavior analysis and modeling of sub-networks
By Yen-Wen Chen*†
In this paper, the characteristics of sub-network traffic are analyzed from the correlation point of view. The traffic histogram of a sub-network shows a 24-hour seasonal variation due to daily usage behavior. The auto correlation factor (ACF) and partial auto correlation factor (PACF) tests are applied first to examine the correlation of the traffic among consecutive hours and the correlation with a specific hour. The seasonal auto-regressive integrated moving average (ARIMA) model is then applied to characterize these properties of the network traffic. Modeling performance is evaluated by examining how closely the histogram and the moving average of the traffic volume generated by the proposed model coincide with those of the actual traffic collected from the network. The experimental results illustrate that the proposed model can effectively capture the traffic behavior of the sub-network and can therefore be used as a suitable traffic model for the analysis of Internet performance. Copyright 2002 John Wiley & Sons, Ltd.
Introduction
As various applications and services are deployed on the Internet, the available network bandwidth never keeps up with the increasing demand. Several techniques such as IP over ATM (Asynchronous Transfer Mode), IP over SDH (Synchronous Digital Hierarchy), and IP over DWDM (Dense Wavelength Division Multiplexing) have been studied and developed recently for providing broadband transport of the Internet. These techniques also support flexible bandwidth allocation for the provision of QoS (quality of service). It is clear that traffic management will play an important role in QoS provision no matter which technique is adopted by the broadband Internet.1–4 However, the characteristics of Internet traffic have not yet been fully grasped, because so many of the services provided are inherently different from traditional telecommunication services. Therefore, it is not suitable to manage Internet traffic or analyze Internet performance by using the traffic models proposed for traditional voice or data services.
In order to manage the network bandwidth effectively, it is important to capture network traffic behavior so that the network controller can dynamically adjust the network bandwidth accordingly. Recently, several studies on the measurement of Internet traffic have been published.5–8 In reference 5, a measurement infrastructure of Internet traffic with passive and active measurement elements was
Yen-Wen Chen teaches at the Graduate Institute of Communication Engineering, National Central University, Taiwan, ROC.
*Correspondence to: Yen-Wen Chen, Graduate Institute of Communication Engineering, National Central University, Jung-Li, Tao-Yuan, Taiwan 320, ROC.
†E-mail: [email protected]
Contract/grant sponsor: National Science Council; Contract/grant number: NSC 89-2213-E-015 004.
Contract/grant sponsor: Ministry of Education; Contract/grant number: 90-H-FA07-1-4.
Copyright 2002 John Wiley & Sons, Ltd. Published online 22 May 2002
suggested; in reference 6, a tool called ‘ntop’ was proposed for Internet traffic measurement. Since the acquisition of real Internet traffic needs collaboration over a wide geographical spread and must cover a large number of research and commercial networks, two projects were created for collecting traffic statistics continuously.7,8 The National Laboratory for Applied Network Research (NLANR)7 has created a network analysis infrastructure, with passive/active measurements and control monitoring, for the collection and publication of raw traffic data and visualizations. The PingER project8 created 20 monitoring sites around the world in December 1999 to monitor the end-to-end performance of the Internet. However, Internet traffic is known to differ from traditional traffic in its self-similar property.9,10 Basically, the self-similar property demonstrates long-range dependence characteristics with heavy-tailed probability densities.9 Moreover, as the statistics of Internet traffic vary from service to service (e.g. WWW and e-mail), the traffic characteristics of a specific service may also differ from site to site due to different network architectures and user behavior. Therefore, it is not easy to model Internet traffic well in general. It is more meaningful to develop a modeling procedure.
As Internet traffic is aggregated from many sub-networks, in this paper we consider the modeling of Internet traffic in a hierarchical manner. This concept is similar to the routing hierarchy of the Internet, with its interior and exterior routing. The lower-level traffic model considers the traffic characteristics of an individual service domain (e.g. a sub-network), because it is easier to capture the behavior of services and users in a specific domain; a higher-level model deals with the aggregated traffic of these service domains. Analyzing the traffic characteristics of a specific sub-network is important for sub-network management because, compared with a large network, it is more practical and effective to perform bandwidth allocation in a sub-network.
In this paper, we focus on the traffic behavior of a sub-network and propose an ARIMA-based model to characterize it. Normally a sub-network, which can be a private network, a local area network, or a campus network, has a fairly regular daily traffic behavior. Hence, this daily behavior can be treated as a 24-hour seasonal phenomenon in ARIMA modeling. As the traffic characteristics of different Internet services may differ, the total traffic and the Http/Web traffic of a sub-network are modeled separately and compared. The modeling performance is evaluated by comparing the ARIMA-generated traffic with the actual traffic in several ways. Besides comparing the match of the histograms, the moving average of the traffic volume is also examined to show the increasing/decreasing tendency of the traffic.
This paper is organized as follows. In the following section, an overview of the ARIMA modeling technique is given. In the third section the proposed modeling scheme is applied to characterize the total traffic and the Http traffic of a specific sub-network. The ACF and PACF tests are applied to examine the correlation in lag-hours prior to traffic modeling, and techniques for model verification are also described. The performance of the proposed model is evaluated in the fourth section and the conclusion is stated in the last section.
Overview of ARIMA Model
An ARIMA model is one of the time series analysis methods. Basically, an ARIMA model can be applied to describe both the autoregressive (AR) and moving average (MA) characteristics of a non-stationary time series.11,12 A general ARIMA model of a non-stationary time series $Z_t$, denoted as ARIMA (p,d,q), can be stated as follows:

$$\phi_p(B)(1 - B)^d Z_t = C + \theta_q(B) a_t \qquad (1)$$

where $\phi_p(B)$ and $\theta_q(B)$ represent the AR and MA parts, respectively; B is the backshift operator ($B^i Z_t = Z_{t-i}$) and $a_t$ is the white noise process (i.e. i.i.d. innovation process). It is noted that d is the difference order to make a stable trend of the
time series (e.g. $d = 1$ implies a linear trend of the time series). If the time series demonstrates a seasonal phenomenon, then the seasonal ARIMA is applied to model both the spatial and the temporal characteristics. A seasonal ARIMA model is denoted as ARIMA (p,d,q)(P,D,Q)s, where (p,d,q) and (P,D,Q)s represent the spatial (non-seasonal) and temporal (seasonal) parameters, respectively, and s is the season period. ARIMA (p,d,q)(P,D,Q)s can therefore be represented in the following form:

$$\Phi_P(B^s)\,\phi_p(B)(1 - B^s)^D (1 - B)^d Z_t = C + \Theta_Q(B^s)\,\theta_q(B) a_t \qquad (2)$$

The coefficients of the AR part and MA part are $\phi_1, \phi_2, \ldots, \phi_p, \phi_s, \phi_{2s}, \ldots, \phi_{Ps}$ and $\theta_1, \theta_2, \ldots, \theta_q, \theta_s, \theta_{2s}, \ldots, \theta_{Qs}$, respectively.
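The differencing operators in equation (2) can be illustrated numerically. The following is a minimal sketch, assuming an hourly series held in a plain Python list; the helper name `difference` and the synthetic series are hypothetical and only show how $(1 - B^{s})^D (1 - B)^d$ acts on data with $s = 24$ and $D = d = 1$.

```python
def difference(z, lag=1, order=1):
    """Apply (1 - B^lag)^order to the list z (hypothetical helper)."""
    out = list(z)
    for _ in range(order):
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out

# Synthetic hourly series (assumption): linear trend plus a fixed 24-hour pattern.
pattern = [k * k for k in range(24)]
z = [10 * t + pattern[t % 24] for t in range(120)]

first = difference(z, lag=1)       # (1 - B) Z_t removes the linear trend
both = difference(first, lag=24)   # (1 - B^24)(1 - B) Z_t removes the daily cycle
print(len(first), len(both), set(both))  # prints: 119 95 {0}
```

Each differencing pass shortens the series by its lag, so a 120-hour series leaves 95 usable points after both operators. On this idealized series nothing but zeros remains, whereas real traffic would leave the residual structure that the AR and MA parts are meant to describe.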
It is noted that the parameters of the seasonal ARIMA are estimated in an iterative procedure for a suitable model. First, we determine whether the trend of the time series is stable or not. The Dickey–Fuller (DF), Augmented Dickey–Fuller (ADF), Phillips–Perron (PP), and Weighted Symmetric (WS) methods can be applied for the stability test of the series. The default test method used in the software tool SAS is the DF test. Normally, at most $d = 2$ is enough to make the time series stationary. In order to determine a suitable model for the time series, the orders of the AR and MA parts (i.e., p, q, P, Q) are estimated through the ACF (Auto-Correlation Factor) and PACF (Partial ACF) tests. For the ACF with a lag of k time periods, $\rho_k$ is represented as
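
$$\rho_k = \frac{\mathrm{Cov}(Z_t, Z_{t+k})}{\sqrt{\mathrm{Var}(Z_t)}\,\sqrt{\mathrm{Var}(Z_{t+k})}} \qquad (3)$$

and the PACF, $\phi_{kk}$, is

$$\phi_{kk} = \frac{\mathrm{Cov}[(Z_t - \hat{Z}_t), (Z_{t+k} - \hat{Z}_{t+k})]}{\sqrt{\mathrm{Var}(Z_t - \hat{Z}_t)}\,\sqrt{\mathrm{Var}(Z_{t+k} - \hat{Z}_{t+k})}} \qquad (4)$$
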
where $\hat{Z}_i$ is the linear estimation value of $Z_i$. Basically, the values of the ACF and PACF tail off after specific lags, and the parameters p, q, P, Q are selected to be larger than these lags. The coefficients of the AR and MA parts can be calculated by using maximum likelihood estimation (MLE). However, if the order of the AR part is higher than 1 or the MA part is included, then a non-linear estimation method is applied for better modeling performance. The derived model is then tested for acceptance. Whether the derived model is accepted is determined first by the T-test results; the ACF values can then be applied to examine the residuals between the real traffic and the traffic estimated by the ARIMA model (the white noise). The AIC(m) (Akaike’s Information Criterion with m parameters)11 value can also be used for the selection of a suitable model. If the derived model passes the T-test and the ACF of the white noise is smaller than the standard deviation, then the derived model is acceptable; otherwise the parameters and coefficients are refined and the above test procedure is repeated. From our experience, the selection of a suitable ARIMA traffic model is a heuristic task and needs a trial-and-error process. The ACF/PACF examinations, white noise test, and T-test only provide assistance during the modeling procedure. For traffic generation by the derived ARIMA model, it has been shown12 that the minimum mean square error forecast can effectively predict the consecutive traffic recursively.
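As an illustration of equations (3) and (4), the sample ACF and PACF of an hourly series can be computed as sketched below. This is a minimal sketch and not the SAS procedure used in the paper: the function names `sample_acf` and `sample_pacf` are hypothetical, the ACF uses the usual sample estimator for a stationary series, the PACF is obtained with the Durbin–Levinson recursion (one standard way to evaluate $\phi_{kk}$), and the input series is synthetic.

```python
import math
import random

def sample_acf(z, max_lag):
    """Sample autocorrelation rho_k for k = 0..max_lag, cf. equation (3)."""
    n = len(z)
    mean = sum(z) / n
    dev = [x - mean for x in z]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

def sample_pacf(rho):
    """Partial autocorrelations phi_kk from rho_0..rho_K (Durbin-Levinson)."""
    pacf = [1.0]
    prev = []  # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, len(rho)):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

# Synthetic 5-day hourly series (assumption): a 24-hour cycle plus noise.
rng = random.Random(0)
z = [math.sin(2 * math.pi * t / 24) + rng.gauss(0.0, 0.2) for t in range(120)]

rho = sample_acf(z, 24)
pacf = sample_pacf(rho)
print("ACF at lag 24:", round(rho[24], 3))
print("PACF at lags 1-3:", [round(p, 3) for p in pacf[1:4]])
```

A large ACF value at lag 24 is the kind of evidence that motivates the seasonal period $s = 24$, while the lag after which the PACF becomes insignificant suggests a starting value for the AR order.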
Experimental Examples
In this paper, 5-day traffic statistics of a specific sub-network are collected for the experiment. The statistics of the total traffic and the Http traffic are studied separately. The reason for considering the total traffic and the Http traffic individually is to illustrate that the traffic characteristics of both a specific service and the aggregated traffic (the total traffic) can be modeled by the ARIMA model with a seasonal phenomenon, because they are all measured from a sub-network.
The total traffic volume (packet size) was collected by a traffic monitor attached to the outgoing link of the sub-network. The histogram of the 5-day (120-hour) traffic statistics is illustrated in Figure 1. As shown in Figure 1, the traffic statistics demonstrate a typical 24-hour seasonal phenomenon.
In order to determine a suitable model for this series, the autocorrelation and partial autocorrelation tests are performed and the results are shown in Figure 2(a) and (b). According to the ACF values, the series is highly correlated at the 24-hour period, and the PACF values are less significant after a one-hour lag. This temporal
Figure 1. The 5-day total traffic histogram of a sub-network (vertical axis: Packets, 0 to 1200000; horizontal axis: Day1 to Day6)
Figure 2. The traffic autocorrelation and partial autocorrelation of the sub-network (panel (a): Autocorrelations; panel (b): Partial Autocorrelations; each panel shows correlation coefficients between −1 and 1 for lags 0 to 24)
series becomes stable after the first-order difference according to the DF test. Thus, the temporal behavior of this series can be represented as (1,1,0). For the spatial part, this series is also stationary after the first-order difference and, according to the ACF and PACF values, the experimental traffic series can theoretically be modeled by AR(1). However, the white noise values do not converge if the AR(1) model is applied. Therefore, during model fitting, the matching error (or the residuals) can be regarded as the white noise values and the spatial part can be further modeled by using the moving average for compensation. Hence, this series is characterized by ARIMA (0,1,1)(1,1,0)24. Thus,

$$(1 - B^{24})(1 - \phi_{24} B^{24})(1 - B) Z_t = (1 - \theta_1 B) a_t \qquad (5)$$

The values of $\phi_{24}$ and $\theta_1$ are determined by using the non-linear estimation approach as −0.387 and 0.888, respectively. Therefore, we have

$$(1 - B^{24})(1 + 0.387 B^{24})(1 - B) Z_t = (1 - 0.888 B) a_t \qquad (6)$$

The AIC value of the derived model is calculated as 532, and the white noise test shown in Figure 3 illustrates that the model passes this test.
Figure 3. The white noise test of the ARIMA (0,1,1)(1,1,0)24 model (panel title: ARIMA(0,1,1)(1,1,0)s NOINT, White Noise Tests; significance probabilities on a scale of 1, .1, .01, .001 for lags 1 to 49)
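Equation (6) can be turned into an explicit recursion by multiplying out the backshift polynomials, which gives $Z_t = Z_{t-1} + 0.613(Z_{t-24} - Z_{t-25}) + 0.387(Z_{t-48} - Z_{t-49}) + a_t - 0.888a_{t-1}$. The sketch below illustrates this under stated assumptions and is not the procedure used in the paper: the helper names `poly_mul`, `lag_poly` and `forecast` are hypothetical, the history is synthetic, and unknown future innovations are replaced by zero, as in a recursive minimum mean square error forecast.

```python
def poly_mul(a, b):
    """Multiply two polynomials in B given as coefficient lists (index = power)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def lag_poly(coeffs):
    """Build a dense coefficient list from a {power: coefficient} mapping."""
    dense = [0.0] * (max(coeffs) + 1)
    for power, c in coeffs.items():
        dense[power] = c
    return dense

# Left-hand side of equation (6): (1 - B^24)(1 + 0.387 B^24)(1 - B).
ar = poly_mul(poly_mul(lag_poly({0: 1.0, 24: -1.0}),
                       lag_poly({0: 1.0, 24: 0.387})),
              lag_poly({0: 1.0, 1: -1.0}))
THETA_1 = 0.888  # MA coefficient from equation (6)

def forecast(history, steps, last_innovation=0.0):
    """Recursive forecast; future innovations a_t are replaced by zero."""
    if len(history) < len(ar) - 1:
        raise ValueError("need at least 49 past hours")
    z = list(history)
    for h in range(steps):
        value = -sum(c * z[-k] for k, c in enumerate(ar) if k > 0 and c != 0.0)
        if h == 0:
            value -= THETA_1 * last_innovation  # only a_{t-1} can be known
        z.append(value)
    return z[len(history):]

# Synthetic history (assumption): four identical days of hourly packet counts.
day = [4000 + 300 * abs(12 - h) for h in range(24)]
next_day = forecast(day * 4, steps=24)
print([round(v) for v in next_day[:4]])
```

For a history that repeats exactly every 24 hours, the two seasonal weights sum to one and the recursion reproduces the daily pattern. With real traffic, the `last_innovation` argument would carry the most recent one-step residual of the fitted model.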
| Model parameters | Value | Std-err | T | P |
|---|---|---|---|---|
| Moving Average, Lag 1 | 0.888572 | 0.064335 | 13.81158 | 2.55E-12 |
| Seasonal Autoregressive, Lag 24 | -0.38662 | 0.110964 | -3.48424 | 0.002102 |

Table 1. The T-test results of the ARIMA (0,1,1)(1,1,0)24 model with α = 0.05
Figure 4. The 5-day Http traffic histogram of a sub-network (vertical axis: Packets, 0 to 250000; horizontal axis: Day1 to Day6)
The T-test is also performed for the derived ARIMA model. The test results (with significance level α = 0.05) are listed in Table 1. They illustrate that the derived model is acceptable.
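The T column of Table 1 is the ratio of each estimated coefficient to its standard error. As a quick consistency check, the sketch below recomputes that ratio from the Value and Std-err columns. The dictionary layout and variable names are hypothetical, and the small differences from the tabulated T values are presumably due to rounding of the printed inputs.

```python
# Value and Std-err columns copied from Table 1.
table_1 = {
    "Moving Average, Lag 1": (0.888572, 0.064335),
    "Seasonal Autoregressive, Lag 24": (-0.38662, 0.110964),
}

# T statistic of each coefficient: estimate divided by its standard error.
t_ratios = {name: value / std_err for name, (value, std_err) in table_1.items()}

for name, t in t_ratios.items():
    print(f"{name}: T = {t:.3f}")
```

Both ratios are large in magnitude, which agrees with the small P values in the table and with the conclusion that both coefficients are significant at α = 0.05.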
A similar procedure is applied to characterize the Http/Web traffic of the sub-network. The histogram of the Http traffic is shown in Figure 4. We performed the ACF and PACF tests and found that the correlation is less significant after a 2-hour lag. However, the white noise test and the T-test show that AR(2) is not a suitable model. We compared models with more lag-hours of correlation (increasing the order p) and models compensated by the MA part to find an acceptable model. After a tuning process, we found ARIMA (4,1,0)(2,1,0)24 to be an acceptable model of the specific Http traffic. The coefficients of the derived ARIMA model are estimated by using the non-linear approach and we have

$$(1 + 0.37B + 0.33B^2 + 0.26B^3 + 0.38B^4)(1 - B) \times (1 + 0.83B^{24} + 0.6B^{48})(1 - B^{24}) Z_t = a_t \qquad (7)$$

The T-test and the white noise test are also applied to verify whether the derived model is acceptable. The results of these tests are given in Table 2 and Figure 5, respectively.
Modeling Performance
In the experiment, the first 4 days of traffic data are used for the model derivation, while the derived model is applied to generate the traffic of the fifth day. Comparisons of the histograms of the actual traffic and the generated traffic are shown in Figure 6 for the total traffic and in Figure 7 for the Http traffic. Generally, the data generated by the derived model approximately fit the histogram of the actual data. The generated traffic demonstrates the same seasonal (daily) phenomenon as the actual traffic. The figures also illustrate that the match at the peak and bottom values is not very good. The reason is that the peak and bottom values of the traffic series vary considerably (from 4000 packets to 11,000 packets), and the model-generated traffic is not sensitive enough to follow large changes in traffic volume within a very short time.
Additionally, the 3-hour moving average traffic volumes of the actual traffic and the generated
| Model parameters | Value | Std err | T | P |
|---|---|---|---|---|
| Autoregressive, Lag 1 | -0.3661 | 0.094595 | -3.87025 | 0.001121 |
| Autoregressive, Lag 2 | -0.32683 | 0.097024 | -3.36852 | 0.003422 |
| Autoregressive, Lag 3 | -0.26349 | 0.103537 | -2.54484 | 0.020318 |
| Autoregressive, Lag 4 | -0.37971 | 0.095175 | -3.9896 | 0.00086 |
| Seasonal Autoregressive, Lag 24 | -0.82677 | 0.117924 | -7.01108 | 1.52E-06 |
| Seasonal Autoregressive, Lag 48 | -0.60367 | 0.109344 | -5.52088 | 3.05E-05 |
| Model Variance (sigma squared) | 4.9E+08 | | | |

Table 2. The T-test results of the ARIMA (4,1,0)(2,1,0)24 model with α = 0.05
Figure 5. The white noise test of the ARIMA (4,1,0)(2,1,0)24 model (panel title: White Noise Tests; significance probabilities on a scale of 1, .1, .01, .001 for lags 1 to 49)
traffic are compared in this paper. The 3-hour moving average ($AV_3(Z_t)$) is calculated as the average of the traffic volume over 3 consecutive hours:

$$AV_3(Z_t) = (Z_{t-1} + Z_t + Z_{t+1})/3 \qquad (8)$$

The reason for comparing the 3-hour moving average is that the traffic volume shows a high 1-hour lag correlation. Though the derived Http model considers the 4-hour lag correlation, the ACF/PACF tests reveal that the influence is less significant after a 1-hour lag. It is therefore meaningful to consider a traffic volume together with the volumes of its previous and next hour as the burst behavior. The moving-average behavior can also be regarded as the increasing/decreasing trend of the histogram. The 3-hour moving average traffic behaviors of the model-generated traffic and the actual traffic are compared in Figure 8 for the total traffic and in Figure 9 for the Http traffic. The poor match at the peak/bottom values of the histogram is also reflected in the moving-average characteristics. From the experimental results shown in Figures 8 and 9, we find that large matching differences occur when the actual traffic changes significantly, e.g. from the twentieth hour to the twenty-fourth hour. Although the proposed model cannot fit the peak/bottom traffic volumes very well, it still performs well in characterizing the histogram and the increasing/decreasing trend of sub-network traffic.
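The comparison based on equation (8) can be sketched in a few lines. This is a minimal illustration, assuming hourly packet counts in plain lists; the function name `moving_average_3` and the sample numbers are hypothetical, and dropping the first and last hour (which have no neighbor on one side) is an assumption, since the paper does not state how the endpoints are handled.

```python
def moving_average_3(z):
    """AV3(Z_t) = (Z_{t-1} + Z_t + Z_{t+1}) / 3 for the interior hours t."""
    return [(z[t - 1] + z[t] + z[t + 1]) / 3 for t in range(1, len(z) - 1)]

# Hypothetical hourly packet counts, for illustration only.
actual = [4000, 5000, 9000, 11000, 7000, 4000]
generated = [4500, 5500, 8000, 9500, 7500, 5000]

av_actual = moving_average_3(actual)
av_generated = moving_average_3(generated)

# Hour-by-hour gap between the two smoothed curves.
gaps = [abs(a - g) for a, g in zip(av_actual, av_generated)]
print(av_actual)
print(gaps)
```

Smoothing over three hours damps isolated spikes, so the remaining gaps point to hours where the generated traffic misses a sustained rise or fall rather than a single-hour burst.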
Conclusions
The Internet is actually a network of networks, and analyzing the overall Internet traffic is difficult. In this paper we suggest that network traffic be studied in a hierarchical manner so that traffic behavior can be analyzed and modeled in more detail. Characterizing the traffic of sub-networks may be the first step toward Internet traffic modeling. This idea is like the ‘divide-and-conquer’ concept used in many computer algorithms. Generally, the traffic behavior of a
Figure 6. ARIMA (0,1,1)(1,1,0)24 modeling of total traffic in a sub-network (series: Actual Traffic, Generated Traffic; vertical axis: Packets, 0 to 1200000; horizontal axis: Day1 to Day6)
Figure 7. ARIMA (4,1,0)(2,1,0)24 modeling of Http traffic in a sub-network (series: Actual Traffic, Generated Traffic; vertical axis: Packets, 0 to 250000; horizontal axis: Day1 to Day6)
Figure 8. The 3-hour moving average behavior of total traffic (series: Actual Traffic, Generated Traffic; vertical axis: Packets, 0 to 450000; horizontal axis: Time, hours 1 to 23)
Figure 9. The 3-hour moving average behavior of the Http traffic (series: Actual Traffic, Generated Traffic; vertical axis: Packets, 0 to 160000; horizontal axis: Time, hours 1 to 23)
specific sub-network demonstrates a 24-hour cyclic phenomenon due to routine usage by the users and the servers. Hence, the traffic exhibits both temporal and spatial correlation characteristics. In order to model both the temporal and spatial correlation properties, a seasonal ARIMA-based traffic modeling scheme is proposed in this paper. Although the ACF and PACF can be helpful for the selection of correlation parameters, we found that the white noise test and T-test actually play a major role in determining an acceptable model. The experimental results illustrate that the proposed scheme can effectively characterize the traffic behavior of a sub-network. Both the total traffic and the traffic of a specific service (Http) in a sub-network can be successfully modeled by ARIMA. Thus, the proposed modeling procedure can easily be adopted in other sub-networks or intranets.
In this paper we only consider the modeling of the total traffic and a specific service (Http) of a sub-network individually. It is also interesting to consider the correlation among various kinds of Internet services or different servers. For example, Http traffic may be correlated with the e-mail or ftp services; traffic of the domain name server may be correlated with the total traffic of the sub-network. If the correlation property among different services/servers can be studied further, then a traffic usage profile of a sub-network can be modeled; this profile can be used not only for bandwidth management but also as a heuristic for network intrusion detection.13–15 Other issues for further study include adapting the proposed model to improve the modeling of sharply increasing/decreasing traffic, and an aggregated model of the sub-network ARIMA models.
Acknowledgements
This research was supported in part by grants from the National Science Council (NSC 89 2213-E-015 004) and the Ministry of Education (90-H-FA07-1-4).
References
1. Davie B, Lawrence J, McCloghrie K, Rekhter Y, Rosen E, Swallow G, Doolan P. MPLS using LDP and ATM VC Switching. IETF rfc-3035, Jan. 2001.
2. Rosen E, Viswanathan A, Callon R. Multiprotocol label switching architecture. IETF, draft-ietf-mpls-arch-06.txt, Aug. 1999.
3. Rajagopalan B, Pendarakis D, Saha D, Ramamoorthy RS, Bala K. IP Over optical networks: architecture aspects. IEEE Communication Magazine 2000; 38: No. 9, 94–102.
4. Ghani N, Dixit S, Wang TS. On IP-Over-DWDM integration. IEEE Communication Magazine March 2000; 72–84.
5. Caceres R, et al. Measurement and analysis of IP network usage and behavior. IEEE Communication Magazine May 2000; 38: No. 5, 144–151.
6. Deri L, Suin S. Effective traffic measurement using ntop. IEEE Communication Magazine May 2000; 38: No. 5, 138–143.
7. McGregor T, Braun H, Brown J. The NLANR network analysis infrastructure. IEEE Communication Magazine May 2000; 38: No. 5, 122–128.
8. Leland WE, et al. On the self-similar nature of ethernet traffic. IEEE/ACM Trans. Network Jan. 1994; 1–15.
9. Listanti M, Eramo V, Sabella R. Architectural and technological issues for future optical Internet networks. IEEE Communication Magazine Sept. 2000; 82–92.
10. Paxson V, Floyd S. Wide area traffic: the failure of Poisson modeling. IEEE/ACM Trans. on Networking June 1995.
11. Akaike H. A new look at the statistical model identification. IEEE Trans. on Automatic Control 1974; 19: 716–723.
12. Box GEP, Pierce DA. Distribution of residual autocorrelations in autoregressive integrated moving average time series models. Journal of American Statistical Association 1970; 65: No. 332, 1509–1526.
13. Sommer P. Intrusion detection system as evidence. Computer Networks 1999; 31: 2477–2487.
14. Schneier B, Kelsey J. Tamperproof audit logs as a forensic tool for intrusion detection systems. Computer Networks and ISDN Systems 1999.
15. Lim A. Network security for corporate management with a focus on virtual private networks. 1998 International Telecommunication Symposium Proceedings Vol. III, September, 15–17.