wcnc

Upload: phuoc-vu

Post on 11-Oct-2015


  • Effects of Downsampling on Statistics of Discrete-Time Semi-Markov Processes

    Phuoc Vu, Shuangqing Wei and Benjamin Carroll

    Abstract: In this paper, we first present a novel approach to finding the statistics of the downsampled sequence of a discrete-time semi-Markov process in terms of the sojourn time distributions and state transition matrix of the resulting process. Moreover, we show that the statistics of the original semi-Markov process cannot be uniquely determined given the downsampled sequence. This suggests a singularity issue resulting from the downsampling, regardless of the bandwidth of the original process. Numerical results based on the derived theoretical analysis have been further verified using simulations. Our findings provide a more profound understanding of the limitations of using semi-Markov models in characterizing the dynamics of node activities in wireless networks.

    Keywords: Semi-Markov process, down-sampling, discrete-time Markov renewal process, z-transform, truncated distribution, sojourn time distribution, state transition matrix

    I. INTRODUCTION

    There has been growing interest in applying semi-Markov processes to model the on-off duty cycle of different nodes in wireless LANs. In [1], the authors propose that a semi-Markov chain can be used to design a lifetime model for each sensor node by considering the power consumption in different operational modes and the energy overheads incurred during transitions. Semi-Markov chains have also been adopted in measurement-based models for dynamic spectrum access in WLAN channels: [2] introduces the problem of dynamically sharing the spectrum in the time domain by exploiting whitespace between the bursty transmissions of a set of users, represented by an 802.11b-based wireless LAN (WLAN). In [3], Kadiyala et al. propose a new semi-Markov process-based model to compute network parameters, such as saturation throughput, for the IEEE 802.11 Distributed Coordination Function employing Binary Exponential Backoff (BEB).

    In addition, semi-Markov models are used to characterize the dynamics of wireless network nodes. However, to the best of our knowledge, there is a lack of deep understanding of how operations in a practical set-up, such as sampling, superposition, and even mislabelling due to the near-far effect, affect the restoration of the statistics of the original semi-Markov processes. Such issues were exemplified in one of our recent works [4]. In [4], we adopted a Bayesian Hidden Semi-Markov Model (HSMM) for detecting wireless RF devices. Specifically, we employed multiple USRPs to simulate both coordinated and non-coordinated transmissions of wireless nodes in a small-scale network. The generated RF traces were then collected via downsampling by a monitoring USRP node, where an off-line non-parametric learning algorithm was executed to partition and label the collected RF traces. In our experimental study, we noticed that the learning algorithm did a decent job of segmenting RF traces into meaningful states. However, the identified post-sampling state transitions demonstrate some patterns not evident in the original processes, which prompted us to seek answers to such issues. Furthermore, and more importantly, the experimental work prompted us to ask whether the statistics of semi-Markov processes are recoverable given those of the discrete-time sequences resulting from the aforementioned operations, in particular downsampling.

    1 The authors are with the Division of Electrical and Computer Engineering, School of EECS, Louisiana State University, Baton Rouge, LA 70803, USA.

    It should be noted that our objective is quite different from that addressed by traditional sampling theorems, which are about estimating the original band-limited random process given its downsampled sequences. Rather, we are interested only in the original statistics, which for semi-Markov processes are captured by the state transition probabilities and the sojourn time distributions. The findings in this paper could help us understand more profoundly the fundamental limitation in learning node activity patterns in wireless networks under the widely used semi-Markov models when downsampling is necessary due to concerns of computational cost, as we experienced in our experiments. Other operations on semi-Markov processes, such as superposition and false labeling, have also been studied and will be reported in [5].

    The rest of the paper is organized as follows. Section II presents the set-up and specifications of our experiments and the corresponding findings on learning node activity patterns of coordinated transmissions between two nodes. Section III first presents background notation on semi-Markov processes, and then provides analytical solutions for the statistics of downsampled sequences, as well as the justification of the singularity issue in restoring the statistics of the original semi-Markov processes. In Section IV, we compare the numerical results derived under the analytical framework with those obtained from simulations to further demonstrate the validity of our findings. Finally, we conclude in Section V.

    II. EXPERIMENTAL RESULTS FROM IDENTIFICATION OF THE TRANSMISSIONS OF WIRELESS RF DEVICES

    A. Modelling and specification

    In our experimental study, we implemented the non-parametric learning algorithms proposed in [6] to learn the hidden states of wireless RF devices under the framework of Hidden Semi-Markov Models (HSMM). This was accomplished by programming several USRPs to transmit data according to semi-Markovian behavior, implemented through custom Python programs using GNU Radio. The Python programs enable two USRPs to coordinate their activity through the host PC so that no packet collisions are produced as they transmit data over the program's execution. A third USRP was then used to collect wireless RF traces of the generated activity as inputs to the Bayesian HSMM algorithm. The goal of the Bayesian HSMM algorithm is to identify the number of devices present in each collected RF trace by examining the statistical properties of the received signal over time, and also to identify collision instances when two USRPs attempt to transmit data packets simultaneously. Collection of the wireless traces over the ISM band was performed by running the USRP Python program. The inputs to this program specify the center frequency of interest and the sampling rate at which the received signal appearing at the antenna is digitized by the USRP's ADC. Due to the large amount of data stored in the file at the sampling rate of 500 kHz used for these experiments, the data was subsequently down-sampled to a rate of 1 kHz to allow the Bayesian HSMM algorithm to run in a reasonable amount of time. More details of the experimental set-up and configuration can be found in [4]. Here we present the experimental results for a simple case of two coordinated OFDM users.

    B. Result for Two Coordinated OFDM Users

    We considered an experiment to assess the Bayesian HSMM algorithm's ability to discern between two coordinated USRPs in a transmission environment in which no collisions may occur. Both USRPs were chosen to send OFDM symbols with BPSK as the underlying modulation, and the experimental settings are shown in the following table. A Python program was used to generate a realization of a Markov chain state sequence through the specification of an idle/busy state transition matrix P and a second, uniformly distributed random variable $x \sim U(0, 1)$ used to select which USRP transmits during any busy state. If $x < 0.5$, USRP 1 is chosen for transmitting packets; otherwise USRP 2 sends its own packets. The busy state durations corresponding to the transmissions of USRP 1 and USRP 2 are governed by the number of packets, the packet size, and the OFDM symbol bandwidth chosen for each USRP. Whenever USRP 1 is chosen to transmit its packets according to the generated Markov chain, it sends 5 packets with probability 0.5 and 10 packets otherwise. The duration of the idle state $D_{off}$ is generated through draws from an exponential random variable with a specified mean. The idle state durations are further bounded by minimum and maximum values to prevent extremely short or extremely long durations. As an example, suppose $D_{off} \sim \mathrm{Exp}(0, 2)$ with bounds of [0.3, 0.5]. If the drawn value of $D_{off}$ falls within [0.3, 0.5], then the value is kept; otherwise, the idle duration is set to the closest interval bound. Figure 2 depicts the labeled state sequence after the final iteration of the algorithm, along with the sample magnitudes for each point in the RF trace. A legend mapping each state to its corresponding device is given in the following.

    There are 4 states after the final iteration: State 1 (dark blue) is the idle state, whose duration distribution follows Poiss($\lambda_1 = 98.06$); State 2 (cyan) and State 4 (orange) are USRP 1, whose duration distributions follow Poiss($\lambda_2 = 125.038564$) and Poiss($\lambda_3 = 141.050548$), respectively; State 3 (green) is USRP 2 with Poiss($\lambda_3 = 141.050548$). Instances of USRP 2's transmissions were well reflected by the state sequence due to the large difference in transmission power with respect to USRP 1. The inferred model thus seems to indicate that the observed transmissions over the channel indeed correspond to a well-controlled wireless channel for coordinating users, since there is very little probability that the two users' transmissions would immediately succeed each other in time.

    Fig. 1. USRP Generation of Experiment 1

    Fig. 2. Labeled State Sequence for Experiment 1. Dark Blue = Idle State, Orange and Cyan = USRP 1, Green = USRP 2

    \[
    A = \begin{bmatrix}
    0 & 0.610113 & 0.378314 & 0.011573 \\
    0.901059 & 0.000000 & 0.095198 & 0.003742 \\
    0.998246 & 0.000217 & 0.000000 & 0.001537 \\
    0.997800 & 0.001911 & 0.000289 & 0.000000
    \end{bmatrix}
    \]

    From Figure 2, we can also observe many fast switches from USRP 1 to USRP 2. We also observe that busy states are no longer always followed by the idle state: there are transitions from one busy state directly to another busy state, i.e. the coordinated property is not fully preserved after down-sampling. We can thus conclude that, even for two coordinated USRP transmissions, after the downsampling operations (A/D conversion followed by further downsampling to reduce computational cost) the statistics of the original hidden semi-Markov process are distorted, which is somewhat expected. However, how exactly downsampling affects these statistics, and whether the original statistics can be restored, remain open questions. We provide an analytical treatment of this problem in the next section.
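    The trace-generation rules of Section II-B can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GNU Radio programs used in the experiments: the helper names are hypothetical, the mean of 0.2 is an assumed value for the exponential draw, and the packet-count rule is applied to USRP 1 only because the text states it only for that device.

```python
import random

def draw_idle_duration(mean, low, high, rng):
    """Exponential draw with the given mean, clamped to [low, high]."""
    d = rng.expovariate(1.0 / mean)
    # Values outside the bounds are replaced by the closest bound.
    return min(max(d, low), high)

def draw_busy_burst(rng):
    """Pick the transmitting USRP and, for USRP 1, the packet count."""
    x = rng.random()                      # x ~ U(0, 1)
    usrp = 1 if x < 0.5 else 2
    # 5 or 10 packets with equal probability (stated for USRP 1 only).
    packets = rng.choice([5, 10]) if usrp == 1 else None
    return usrp, packets

rng = random.Random(0)
idle = [draw_idle_duration(0.2, 0.3, 0.5, rng) for _ in range(1000)]
bursts = [draw_busy_burst(rng) for _ in range(1000)]
```

    Clamping rather than redrawing places probability mass at the two interval bounds, so the resulting idle durations are not a truncated exponential in the strict sense.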

    III. AN ANALYTICAL APPROACH FOR THE PROBLEM

    We present an analytical approach to elaborate on the issues seen above in the experiments. Some notation and definitions are in order. All vectors and matrices are represented with lower and upper case boldface fonts, respectively. Sets are represented with calligraphic fonts. Random variables are represented with upper case italic fonts, while their realisations are represented by lower case italic fonts. Let $I_n$ be the identity matrix of size $n \times n$, while $1_n$ denotes the column vector of $n$ ones. Throughout the paper, we reserve lower case letters for probability distributions; for example, $h_i(k)$ denotes the sojourn time distribution of state $i$ of the semi-Markov process. We use a letter with an overbar for a cumulative distribution; for example, $\bar{h}_i$ denotes the cumulative sojourn time distribution of state $i$. We keep upper case letters for the z-transform corresponding to a distribution in the time domain; for example, $H_i(z)$ denotes the z-transform of the sojourn time distribution in state $i$. Finally, we use $X$ to denote the original sequence and $Y$ to denote the down-sampled semi-Markov process.

    A. Review of Relevant Background on Semi-Markov Processes

    Define $E$ as the state space of the semi-Markov process: $E = \{1, 2, 3, \ldots, s\}$. Let $\mathbb{N}$ be the set of non-negative integers, i.e. $\mathbb{N} = \{0, 1, 2, 3, \ldots\}$. Define the set of non-negative matrices on $E \times E$ as $\mathcal{M}_E$. Consider a semi-Markov chain with state space $E$ and let $\{i, j, k\}$ be three states from the set $E$, as in the diagram in Figure 3.

    Fig. 3. A sample path of a discrete time semi Markov chain

    Denote by $\{X_n\}$ the state of the chain at the $n$th arrival, by $\{T_{n+1}\}$ the sojourn time of the semi-Markov process at that $n$th arrival, and by $\{S_n\}$ the corresponding jump time. Similar to the approach of [7], we define a discrete-time semi-Markov kernel: a matrix-valued function $q = (q_{ij}(k))$, $k \in \mathbb{N}$, taking values in $\mathcal{M}_E$ is said to be a discrete-time semi-Markov kernel if the following three conditions are met:

    i. $0 \le q_{ij}(k) \le 1$;
    ii. $q_{ij}(0) = 0$ and $\sum_{k=0}^{\infty} q_{ij}(k) \le 1$, with $i, j \in E$;
    iii. $\sum_{k=0}^{\infty} \sum_{j \in E} q_{ij}(k) = 1$, for $i \in E$.

    Here the right-continuous jump at $S_n$ occurs into state $X_n$, whose duration is $T_{n+1}$. We can define the elements of the kernel $q$ as

    \[ q_{ij}(k) = P(X_{n+1} = j, T_{n+1} = k \mid X_n = i). \tag{1} \]

    Intuitively, $q_{ij}(k)$ is the probability that the semi-Markov chain jumps from state $i$ to state $j$ after spending $k$ units of time in state $i$. We remark on the subtle difference between the indices $n$ and $k$. The index $n$ counts the states visited by the semi-Markov process, i.e. it refers to arrivals (jumps), whereas the index $k$ refers to the time epochs of the semi-Markov chain, i.e. to time itself. The transition matrix of the embedded Markov chain $(X_n)$ is defined by

    \[ p_{ij} = P(X_{n+1} = j \mid X_n = i) \tag{2} \]

    with $i, j \in E$ and $n \in \mathbb{N}$. The sojourn time distribution in a given state depends on the current state as well as the next state. For all $i, j \in E$, we denote by $h_{ij}(k)$ the sojourn time distribution in state $i$ when the next state is $j$. We can write

    \[ h_{ij}(k) = P(T_{n+1} = k \mid X_n = i, X_{n+1} = j). \tag{3} \]

    We have the following relationship:

    \[ q_{ij}(k) = p_{ij} h_{ij}(k). \tag{4} \]
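    As a minimal sketch of Equation (4) and of conditions i to iii, the snippet below assembles a kernel from an embedded transition matrix and conditional sojourn distributions. The helper names are hypothetical. The numerical values are borrowed from the case study of Section IV, and the Poisson pmf is truncated to $1 \le k \le K$ and renormalised so that $q_{ij}(0) = 0$ holds exactly; that truncation is an assumption of this sketch.

```python
import math

def truncated_poisson(lam, k_max):
    """Poisson pmf restricted to 1..k_max and renormalised; index 0 holds 0."""
    w = [0.0] + [math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
                 for k in range(1, k_max + 1)]
    total = sum(w)
    return [x / total for x in w]

def build_kernel(P, h):
    """q[i][j][k] = P[i][j] * h[i][j][k], as in Equation (4)."""
    s = len(P)
    return [[[P[i][j] * hk for hk in h[i][j]] for j in range(s)]
            for i in range(s)]

def is_valid_kernel(q, tol=1e-9):
    """Check conditions i-iii of the discrete-time semi-Markov kernel."""
    s = len(q)
    for i in range(s):
        row_total = 0.0
        for j in range(s):
            if any(x < -tol or x > 1 + tol for x in q[i][j]):    # condition i
                return False
            if abs(q[i][j][0]) > tol or sum(q[i][j]) > 1 + tol:  # condition ii
                return False
            row_total += sum(q[i][j])
        if abs(row_total - 1.0) > 1e-6:                          # condition iii
            return False
    return True

P = [[0.0, 0.4, 0.6], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
lams = [15.0, 20.0, 9.0]
K = 80
# Here the sojourn distribution depends only on the current state i.
h = [[truncated_poisson(lams[i], K) for _ in range(3)] for i in range(3)]
q = build_kernel(P, h)
```

    Summing $q_{ij}(k)$ over $k$ returns $p_{ij}$, which is the content of Equation (11).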

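    The sojourn time distribution in a given state $i$ can be written as:

    \[ h_i(k) = \sum_{j \in E} h_{ij}(k). \tag{5} \]

    Define $Z = (Z_k)$, $k \in \mathbb{N}$, to be a semi-Markov chain with $Z_k = X_{N_k}$, $k \in \mathbb{N}$, where $N_k = \max\{n \in \mathbb{N} \mid S_n \le k\}$. Then $N_k$ is the discrete counting process of the number of jumps in $[1, k] \subset \mathbb{N}$, and $Z_k$ gives the system state at time $k$. Define the cumulative sojourn time distribution as:

    \[ \bar{h}_i(k) = \sum_{j \in E} \sum_{l=1}^{k} h_{ij}(l) = \sum_{j \in E} \sum_{l=1}^{k} \frac{q_{ij}(l)}{\sum_{m=0}^{\infty} q_{ij}(m)}. \tag{6} \]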

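    The transition function of the semi-Markov chain $Z$ is the matrix-valued function $p$ with values in $\mathcal{M}_E$ defined by

    \[ p_{ij}(k) = P(Z_k = j \mid Z_0 = i) \tag{7} \]

    with $i, j \in E$. The transition function can be computed as

    \[ p_{ij}(k) = I_{ij}(k) \left(1 - \bar{h}_i(k)\right) + \sum_{r \in E} \sum_{l=0}^{k} q_{ir}(l)\, p_{rj}(k - l) \tag{8} \]

    where $I_{ij}(k)$ is the indicator function, $I_{ij}(k) = 1$ if $i = j$ and $I_{ij}(k) = 0$ otherwise. In matrix form, with $*$ denoting discrete-time convolution:

    \[ \mathbf{p} = \mathbf{I} - \bar{\mathbf{h}} + \mathbf{q} * \mathbf{p}. \tag{9} \]

    The cumulated semi-Markov kernel $\bar{q} = (\bar{q}_{ij})$ is defined by:

    \[ \bar{q}_{ij}(k) = P(X_{n+1} = j, T_{n+1} \le k \mid X_n = i) = \sum_{l=0}^{k} q_{ij}(l). \tag{10} \]

    The elements of the transition matrix of the embedded Markov chain are then:

    \[ p_{ij} = \bar{q}_{ij}(\infty) = \sum_{k=0}^{\infty} q_{ij}(k). \tag{11} \]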
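    The recursion in Equation (8) can be evaluated directly because $q_{ir}(0) = 0$, so that $p(k)$ depends only on $p(0), \ldots, p(k-1)$. The sketch below is an illustration with hypothetical names and a toy two-state kernel. It assumes that the cumulative sojourn distribution of state $i$ is taken in the unconditional form $\sum_{j} \sum_{l \le k} q_{ij}(l)$, which coincides with the cumulative distribution of $h_i$ whenever the sojourn time does not depend on the next state, as in the case study of Section IV.

```python
def transition_function(q, k_max):
    """Return p with p[k][i][j] = P(Z_k = j | Z_0 = i) for k = 0..k_max."""
    s = len(q)
    # cum[i][k]: probability that the sojourn in state i has ended by time k.
    cum = [[0.0] * (k_max + 1) for _ in range(s)]
    for i in range(s):
        running = 0.0
        for k in range(k_max + 1):
            running += sum(q[i][j][k] for j in range(s))
            cum[i][k] = running
    p = [[[0.0] * s for _ in range(s)] for _ in range(k_max + 1)]
    for k in range(k_max + 1):
        for i in range(s):
            for j in range(s):
                val = (1.0 - cum[i][k]) if i == j else 0.0
                # q(0) = 0, so the convolution sum effectively starts at l = 1.
                for r in range(s):
                    for l in range(1, k + 1):
                        val += q[i][r][l] * p[k - l][r][j]
                p[k][i][j] = val
    return p

def pad(x, n):
    """Right-pad a list with zeros to length n."""
    return x + [0.0] * (n - len(x))

K = 6
# Toy kernel: state 0 always jumps to state 1 and vice versa.
q = [[pad([0.0], K + 1), pad([0.0, 0.5, 0.5], K + 1)],
     [pad([0.0, 0.2, 0.3, 0.5], K + 1), pad([0.0], K + 1)]]
p = transition_function(q, K)
```

    Each $p(k)$ is a stochastic matrix and $p(0)$ is the identity, which gives a quick sanity check on any kernel passed in.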

    The stationary distribution of the semi-Markov process can be calculated as follows. Let $v = [v(1)\ v(2)\ \ldots\ v(s)]$ be the stationary distribution of the embedded Markov chain, in other words $v = vP$, where $P$ is the transition matrix. We define the mean sojourn time in any state $i$ as

    \[ m_i = E(S_1 \mid X_0 = i) = \sum_{k \ge 1} k\, h_i(k). \]

    From that, the $j$-th element of the stationary distribution of the semi-Markov chain is given by:

    \[ \pi_j = \frac{v(j)\, m_j}{\sum_{i \in E} v(i)\, m_i}. \]

    Then $\pi = (\pi_j)$, $j \in E$, is the stationary distribution of the semi-Markov chain.
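    The two-step computation above (stationary vector of the embedded chain, then weighting by mean sojourn times) might be sketched as follows. The function names are hypothetical, and the transition matrix and Poisson means are borrowed from the original process X of the Section IV case study. Because that embedded chain is periodic, the sketch iterates the lazy chain $(I + P)/2$, which has the same stationary vector; this is an implementation choice of the sketch, not something taken from the paper.

```python
def embedded_stationary(P, iters=2000):
    """Stationary row vector v = vP, obtained from the lazy chain (I + P) / 2."""
    s = len(P)
    v = [1.0 / s] * s
    for _ in range(iters):
        vP = [sum(v[i] * P[i][j] for i in range(s)) for j in range(s)]
        v = [0.5 * (a + b) for a, b in zip(v, vP)]
    return v

def semi_markov_stationary(P, mean_sojourn):
    """pi_j = v(j) m_j / sum_i v(i) m_i."""
    v = embedded_stationary(P)
    weighted = [vj * mj for vj, mj in zip(v, mean_sojourn)]
    total = sum(weighted)
    return [w / total for w in weighted]

P = [[0.0, 0.4, 0.6], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
v = embedded_stationary(P)
pi = semi_markov_stationary(P, [15.0, 20.0, 9.0])
```

    For this matrix the embedded chain has $v = [0.5, 0.2, 0.3]$, so $\pi$ is proportional to $[7.5, 4, 2.7]$.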

    B. Statistical Properties of Down-sampled Semi-Markov Sequences

    1) Main results for the down-sampling problem: We first propose a result that plays an important role in understanding the behavior of the down-sampled sequence:

    Proposition 1. The resulting process after downsampling a semi-Markov process is also a semi-Markov process.

    Proof: We want to show that the downsampled sequence of the semi-Markov process is also a semi-Markov process. For the finite alphabet of the state space, we define downsampling as follows: for a downsampling factor of $m$, we keep the first letter and delete the next $m - 1$ letters, and so on. One observation is that after downsampling the state space of the resulting sequence is the same as the state space of the original sequence. From Section III-A, since $X$ is a homogeneous semi-Markov chain, $q^X_{ij}(k)$ does not depend on $n$, where from Equation (1) we have $q_{ij}(k) = P(X_{n+1} = j, T_{n+1} = k \mid X_n = i)$. Here the homogeneity property implies that

    \[ P(X_{n+1} = j, T_{n+1} = k \mid X_n = i, X_{n-1}, \ldots, X_0, S_n, S_{n-1}, \ldots, S_0) = P(X_{n+1} = j, T_{n+1} = k \mid X_n = i). \]

    We need to show that for the $Y$ sequence,

    \[ P(Y_{n+1} = j, T^Y_{n+1} = k \mid Y_n = i, Y_{n-1}, \ldots, Y_0, S^Y_n, S^Y_{n-1}, \ldots, S^Y_0) = P(Y_{n+1} = j, T^Y_{n+1} = k \mid Y_n = i). \]

    Equivalently, from our definitions in Section III-A,

    \[ P(Y_{n+1} = j, T^Y_{n+1} = k \mid Y_n = i, Y_{n-1}, \ldots, Y_0, S^Y_n, S^Y_{n-1}, \ldots, S^Y_0) = P(Y_{S^Y_{n+1}} = j, S^Y_{n+1} - S^Y_n = k \mid Y_{S^Y_n} = i, Y_{S^Y_{n-1}}, \ldots, Y_{S^Y_0}, S^Y_n, S^Y_{n-1}, \ldots, S^Y_0). \]

    We take the simple case of a down-sampling factor $m = 2$; the results can be generalized to any $m$. From the down-sampling relation $Y_k = X_{2k}$ we have:

    \[ P(Y_{S^Y_{n+1}} = j, S^Y_{n+1} - S^Y_n = k \mid Y_{S^Y_n} = i, Y_{S^Y_{n-1}}, \ldots, Y_{S^Y_0}, S^Y_n, S^Y_{n-1}, \ldots, S^Y_0) = P(X_{2S^Y_{n+1}} = j, S^Y_{n+1} - S^Y_n = k \mid X_{2S^Y_n} = i, X_{2S^Y_{n-1}}, \ldots, X_{2S^Y_0}, S^Y_n, S^Y_{n-1}, \ldots, S^Y_0). \]

    As $X$ is a homogeneous semi-Markov process and, in steady state, its transition function is $P^X_{ij}(k) = P(X_k = j \mid X_0 = i)$, we can rewrite the above equation as:

    \[ P(Y_{n+1} = j, T^Y_{n+1} = k \mid Y_n = i, Y_{n-1}, \ldots, Y_0, S^Y_n, S^Y_{n-1}, \ldots, S^Y_0) = P(X_{2S^Y_{n+1}} = j, S^Y_{n+1} - S^Y_n = k \mid X_{2S^Y_n} = i) = P(Y_{n+1} = j, T^Y_{n+1} = k \mid Y_n = i). \]

    This concludes the proof of Proposition 1.
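    The downsampling operation defined in the proof (keep one letter, delete the next $m - 1$) and the extraction of visited states and sojourn times from a sequence can be illustrated with a short Python sketch. The function names and the toy sequences are hypothetical.

```python
from itertools import groupby

def downsample(seq, m):
    """Keep the first symbol, drop the next m - 1, and repeat."""
    return seq[::m]

def runs(seq):
    """Run-length encode a state sequence into (state, sojourn time) pairs."""
    return [(state, len(list(group))) for state, group in groupby(seq)]

x = [1, 1, 1, 2, 2, 3, 3, 3, 3, 1, 1, 2]
y = downsample(x, 2)

# A short idle period 'I' between two busy states can vanish entirely.
x_busy = ['A', 'A', 'A', 'I', 'B', 'B', 'B']
y_busy = downsample(x_busy, 2)
```

    In the second example the idle sample falls on a deleted position, so the downsampled sequence shows a direct busy-to-busy transition, which is the kind of pattern reported for the experiment in Section II.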

    Since the down-sampled sequence is also a semi-Markov process, we are interested in finding the statistical properties of the resulting process. More specifically, we first want to find how downsampling is reflected in the statistics of the resulting sequence in terms of its sojourn time distribution and transition probability matrix. We establish the foundations of our work in the following result. Only a simple but non-trivial case with 3-state semi-Markov processes is given; the results can be extended to more general cases in a similar manner. This case also reflects the set-up in our experiments, where we only consider the activity patterns of two nodes together with the interlaced idle states.

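    Proposition 2. For a given 3-state semi-Markov process, the relationships between the transition function $p_{ij}(k)$ and the semi-Markov kernel $q_{ij}(k)$, for $i, j \in \{1, 2, 3\}$, in the transform domain are given by:

    \[ Q_{12}(z) = \begin{vmatrix} P_{12} & P_{13} \\ P_{32} & P_{33} \end{vmatrix} \Big/ \begin{vmatrix} P_{22} & P_{23} \\ P_{32} & P_{33} \end{vmatrix} \tag{12} \]

    \[ Q_{13}(z) = -\begin{vmatrix} P_{12} & P_{13} \\ P_{22} & P_{23} \end{vmatrix} \Big/ \begin{vmatrix} P_{22} & P_{23} \\ P_{32} & P_{33} \end{vmatrix} \tag{13} \]

    where $|A|$ denotes the determinant of a non-singular matrix $A$ and $P_{ij}$ is shorthand for $P_{ij}(z)$. Similar expressions can be written for $Q_{21}(z)$, $Q_{23}(z)$, $Q_{31}(z)$, $Q_{32}(z)$.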

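    Proof: Here we provide a sketch of the proof. We take the z-transform on both sides of Equation (8). In matrix form, we have:

    \[
    \begin{bmatrix} 1 & -Q_{12} & -Q_{13} \\ -Q_{21} & 1 & -Q_{23} \\ -Q_{31} & -Q_{32} & 1 \end{bmatrix}
    \begin{bmatrix} P_{11} & P_{12} & P_{13} \\ P_{21} & P_{22} & P_{23} \\ P_{31} & P_{32} & P_{33} \end{bmatrix}
    =
    \begin{bmatrix} H_1(z) & 0 & 0 \\ 0 & H_2(z) & 0 \\ 0 & 0 & H_3(z) \end{bmatrix}
    \tag{14}
    \]

    where $H_i(z) = \frac{z}{z-1} \left(1 - \sum_{j \ne i} \frac{Q_{ij}(z)}{Q_{ij}(1)}\right)$. From the off-diagonal entries of the first row of the above equation we obtain the system of linear equations

    \[ Q_{12}(z) P_{22}(z) + Q_{13}(z) P_{32}(z) = P_{12}(z), \]
    \[ Q_{12}(z) P_{23}(z) + Q_{13}(z) P_{33}(z) = P_{13}(z), \]

    whose solutions are given in Equations (12) and (13). Similarly, we can apply the same technique and solve for $Q_{21}(z)$, $Q_{23}(z)$, $Q_{31}(z)$, $Q_{32}(z)$. This concludes the proof of Proposition 2.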
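    The 2 x 2 linear systems in the proof can be solved by Cramer's rule at any fixed value of $z$. The sketch below is illustrative only: the function names are hypothetical, states are indexed 0 to 2, and the matrix `P` is an arbitrary complex test matrix standing in for $P(z)$ at one point, not a quantity taken from the paper. It recovers each row of $Q$ and then checks that $(I - Q)P$ is diagonal, as Equation (14) requires.

```python
def det2(a, b, c, d):
    """Determinant of the 2 x 2 matrix [[a, b], [c, d]]."""
    return a * d - b * c

def kernel_row_from_transition(P, i):
    """Solve row i of (I - Q) P = D for the two off-diagonal entries of Q."""
    a, b = [r for r in range(3) if r != i]
    den = det2(P[a][a], P[a][b], P[b][a], P[b][b])
    q_ia = det2(P[i][a], P[i][b], P[b][a], P[b][b]) / den    # cf. Equation (12)
    q_ib = -det2(P[i][a], P[i][b], P[a][a], P[a][b]) / den   # cf. Equation (13)
    return {a: q_ia, b: q_ib}

# Arbitrary test values for P(z) at a single point z.
P = [[0.9 + 0.1j, 0.3, 0.2],
     [0.4, 1.1, 0.1 - 0.2j],
     [0.25, 0.15, 0.8]]

Q = [[0.0] * 3 for _ in range(3)]
for i in range(3):
    for j, value in kernel_row_from_transition(P, i).items():
        Q[i][j] = value

# Residual of (I - Q) P: off-diagonal entries should vanish.
residual = [[P[i][j] - sum(Q[i][r] * P[r][j] for r in range(3))
             for j in range(3)] for i in range(3)]
```

    For row 0 the two returned values correspond to $Q_{12}$ and $Q_{13}$ in the 1-based numbering of Proposition 2.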

    From Proposition 2, we can find the relationships between a given semi-Markov kernel $Q_{ij}(z)$ from state $i$ to state $j$, $i, j \in \{0, 1, 2\}$, and $P_{mn}(z)$, $m, n \in \{0, 1, 2\}$. The sojourn time distribution is $H_{ij}(z) = \frac{Q_{ij}(z)}{p_{ij}}$, where $p_{ij}$ is obtained by letting $p_{ij} = Q_{ij}(1)$. Finally, we take the numerical inverse z-transform to get $h_{ij}(k)$.

    Next, we present a result that helps us understand the connection between sequence $X$ and sequence $Y$ in terms of the relationship between the z-transforms of their respective transition functions.

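    Proposition 3. The relationship between the transition function $p^X_{ij}(k)$ and the transition function $p^Y_{ij}(k)$ in the transform domain is given by:

    \[ P^Y_{ij}(z) = \frac{1}{m} \left( P^X_{ij}\left(z^{1/m}\right) + \sum_{n=1}^{m-1} P^X_{ij}\left(z^{1/m} \exp\left(\frac{i 2 \pi n}{m}\right)\right) \right) \tag{15} \]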

    Proof: The result is quite straightforward and has been discussed in [8], so we only outline it. From the down-sampling we have:

    \[ p^Y_{ij}(k) = p^X_{ij}(mk) \tag{16} \]

    with $i, j \in E$. Taking the z-transform of both sides of the above equation and applying the standard property of down-sampling by a factor $m$, we immediately obtain Equation (15).
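    Equation (15) can be checked numerically for any finite sequence. The sketch below is a sanity check under an assumed convention, namely the unilateral z-transform $\sum_k x(k) z^{-k}$; the function names and the test sequence are hypothetical and merely stand in for one entry $p^X_{ij}(k)$.

```python
import cmath

def ztransform(x, z):
    """Unilateral z-transform sum_k x[k] z^(-k) of a finite sequence."""
    return sum(xk * z ** (-k) for k, xk in enumerate(x))

def downsampled_transform(x, z, m):
    """Right-hand side of Equation (15), evaluated for a finite sequence x."""
    root = z ** (1.0 / m)          # principal m-th root of z
    return sum(ztransform(x, root * cmath.exp(2j * cmath.pi * n / m))
               for n in range(m)) / m

x = [0.1, 0.4, 0.3, 0.7, 0.2, 0.5, 0.9, 0.6, 0.8]
m, z = 4, 1.3 + 0.4j
lhs = ztransform(x[::m], z)              # transform of y(k) = x(mk)
rhs = downsampled_transform(x, z, m)
```

    The agreement follows because the sum of the $m$-th roots of unity raised to the power $k$ vanishes unless $k$ is a multiple of $m$, so only the retained samples survive the averaging.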

    Now, we want to apply the propositions above to find the statistics of the down-sampled sequence $Y$ given the statistics of the original sequence $X$. The method is to write all the quantities in terms of the semi-Markov kernel: $q^X_{ij}(k) = p^X_{ij} h^X_{ij}(k)$, with $i, j \in E$, where $p^X_{ij}$ denotes the $ij$-element of the embedded Markov chain transition matrix of $\{X_k\}$. Then from Proposition 3 we can find the transition function $P^Y(z)$. Having solved for $P^Y_{ij}(z)$, we can find $Q^Y_{ij}(z)$ from relationships such as:

    \[ Q^Y_{01}(z) = \frac{P^Y_{01}(z) P^Y_{22}(z) - P^Y_{02}(z) P^Y_{21}(z)}{P^Y_{11}(z) P^Y_{22}(z) - P^Y_{12}(z) P^Y_{21}(z)}. \]

    The rest of the expressions $Q^Y_{ij}(z)$ can be found explicitly from Proposition 2. From each $Q^Y_{ij}(z)$ we derive two quantities. The transition probability matrix follows from $p^Y_{ij} = Q^Y_{ij}(1)$ in the z-domain, and the sojourn time distribution follows from $H^Y_{ij}(z) = \frac{Q^Y_{ij}(z)}{p^Y_{ij}}$. Finally, we take the numerical inverse z-transform to get $h^Y_{ij}(k)$.

    2) Reverse downsampling problem: In this subsection, we provide an analytical answer to the opening question: given the observed random process after down-sampling, how much of the statistics of the original sequence can we restore?

    Proposition 4. There are infinitely many solutions for the reverse down-sampling problem, i.e. there is a singularity issue and we cannot recover the statistics of the original sequence after downsampling.

    Proof: Suppose the statistical properties of the $Y$ sequence are known, namely the transition probability matrix and the sojourn time distribution of the $Y$ sequence. We want to find the statistics of the $X$ sequence, i.e. its transition probability matrix and $h^X_{ij}(k)$, in terms of the down-sampling factor $m$ and the statistics of the $Y$ sequence. Writing all the quantities in terms of the semi-Markov kernel from Equations (2) through (7), and using the z-transform and the properties of down-sampling with factor $m$ from Proposition 3, we need to solve the functional equation (15). From Proposition 3 we solve for $P^X_{ij}(z)$, and from that we can find $Q^X_{ij}(z)$ using the same method as in the forward problem. The first question to address involves the functional equation (15). Let $G_{ij}(z)$ be a function on the complex domain that satisfies:

    \[ \frac{1}{m} \left( G_{ij}\left(z^{1/m}\right) + \sum_{n=1}^{m-1} G_{ij}\left(z^{1/m} \exp\left(\frac{i 2 \pi n}{m}\right)\right) \right) = 0. \]

    Also, in the complex domain, we always have

    \[ z^{1/m} + \sum_{n=1}^{m-1} z^{1/m} \exp\left(\frac{i 2 \pi n}{m}\right) = 0. \tag{17} \]

    We want to find all the functions $G_{ij}$ that satisfy the above two properties. One possible function $G_{ij}(z)$ is given by:

    \[ G_{ij}(z) = a_p z^p + a_{p-1} z^{p-1} + \ldots + a_1 z \tag{18} \]

    where $p < m$. This is one of the many solutions that we can find for Equation (15). Hence, one of the solutions of the functional equation (15) is $P^X_{ij}(z) = P^Y_{ij}(z^m) + G_{ij}(z)$. Therefore, we can construct infinitely many solutions to the problem of restoring the statistics of the original semi-Markov process based on the downsampled sequence statistics only, thereby demonstrating the singularity issue in restoring the pre-downsampling statistics.
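    The key step of the proof, that a polynomial of the form (18) with $p < m$ is annihilated by the averaging over the $m$-th roots of unity, can be checked numerically. The sketch below uses hypothetical names and arbitrary coefficients; it only verifies this cancellation and the corresponding time-domain statement, and does not check that $P^Y_{ij}(z^m) + G_{ij}(z)$ is itself a valid transition function.

```python
import cmath

def alias_sum(G, z, m):
    """(1/m) * sum over n = 0..m-1 of G(z^(1/m) * exp(i 2 pi n / m))."""
    root = z ** (1.0 / m)
    return sum(G(root * cmath.exp(2j * cmath.pi * n / m))
               for n in range(m)) / m

m = 4
coeffs = {1: 0.7, 2: -0.2, 3: 0.05}      # a_1, a_2, a_3 with p = 3 < m

def G(z):
    """Polynomial of the form in Equation (18)."""
    return sum(a * z ** k for k, a in coeffs.items())

value = alias_sum(G, 0.8 - 0.3j, m)

# Time-domain view: two sequences that differ only at deleted positions
# are indistinguishable after downsampling by m.
x1 = [0.5, 0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
x2 = [0.5, 0.9, 0.0, 0.1, 0.4, 0.2, 0.3, 0.5, 0.9]
```

    Here `value` is zero up to rounding, and `x1[::m]` equals `x2[::m]` even though the two sequences differ at every discarded sample.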

    IV. NUMERICAL RESULTS FROM FORWARD AND BACKWARD PROBLEMS

    A. A case study for the down-sampling problem

    We apply our approach in the z-transform domain to a case study of a 3-state semi-Markov process. Our results can be applied to a general $n$-state semi-Markov process, but we present a simple case here to illustrate our theoretical results. Suppose that a semi-Markov process with 3 states is given by the following transition matrix:

    \[ P = \begin{bmatrix} 0 & 0.4 & 0.6 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix} \]

    There are 3 states in the process: states A and B are busy states and C is an idle state. A, B, and C are states whose duration, or number of symbols, follows a duration distribution. Suppose the duration distribution of each state is given by the following table:

    TABLE I. DURATION PARAMETERS

    State   Duration distribution   Parameter(s)
    A       Poisson                 $\lambda = 15$
    B       Poisson                 $\lambda = 20$
    idle    Poisson                 $\lambda = 9$

    The probability mass function, or sojourn time distribution, of state $i$ is given by

    \[ h_i(k) = \frac{\lambda_i^k \exp(-\lambda_i)}{k!}. \]

    From Equations (1) through (7) we can write the $q$ matrix as:

    \[
    q(k) = \begin{bmatrix}
    0 & 0.4\, \frac{15^k \exp(-15)}{k!} & 0.6\, \frac{15^k \exp(-15)}{k!} \\
    \frac{20^k \exp(-20)}{k!} & 0 & 0 \\
    \frac{9^k \exp(-9)}{k!} & 0 & 0
    \end{bmatrix}
    \]

    For the downsampling factor $m = 4$, the analytical solution for the transition matrix of the embedded Markov chain of the downsampled sequence $Y$ is

    \[
    P^Y_{analytical} = \begin{bmatrix} 0 & 0.7375 & 0.2625 \\ 0.945 & 0 & 0.055 \\ 0.872 & 0.128 & 0 \end{bmatrix}
    \tag{19}
    \]

    The stationary distribution of the semi-Markov chain $Y(k)$ is $\pi = [0.1784\ 0.4161\ 0.4055]$. We set up the simulation using MATLAB. First we generate the $X$ sequence from its transition probability matrix and sojourn time distributions as specified above. Then we perform the down-sampling operation by keeping the first symbol, deleting the next 3 symbols, and so on, to form the $Y$ sequence. By estimation, we find the following transition matrix for the simulated $Y$ sequence:

    \[
    P^Y_{sim} = \begin{bmatrix} 0 & 0.7651 & 0.2349 \\ 0.9101 & 0 & 0.0899 \\ 0.8912 & 0.1188 & 0 \end{bmatrix}
    \]

    We adopt the Frobenius norm as the metric to measure the distance between the two matrices $P^Y_{analytical}$ and $P^Y_{sim}$ [9]. The Frobenius norm of a matrix $A \in \mathbb{R}^{n \times n}$, denoted $\|A\|_F$, is defined through

    \[ \|A\|_F^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}^2 = \mathrm{trace}(A^T A), \]

    where $A^T$ is the transpose of matrix $A$. Our result indicates that the Frobenius norm of $P^Y_{analytical}$ is 1.0197 and the Frobenius norm of $P^Y_{analytical} - P^Y_{sim}$ equals 0.075, i.e. a 7.36% relative error. Next, we compare the analytically derived sojourn time distribution of the down-sampled sequence $Y$ with the simulation results. The following diagrams show the histograms of the analytical and simulation results for the case $m = 4$.
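    The distance metric described above takes only a few lines to compute. In the sketch below the function names are hypothetical and the matrix entries are the rounded values printed in the text, so the computed norms need not coincide with the figures quoted above.

```python
import math

def frobenius(A):
    """Frobenius norm: square root of the sum of squared entries."""
    return math.sqrt(sum(a * a for row in A for a in row))

def relative_error(A, B):
    """||A - B||_F divided by ||A||_F."""
    diff = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
    return frobenius(diff) / frobenius(A)

P_analytical = [[0, 0.7375, 0.2625], [0.945, 0, 0.055], [0.872, 0.128, 0]]
P_sim = [[0, 0.7651, 0.2349], [0.9101, 0, 0.0899], [0.8912, 0.1188, 0]]

norm_analytical = frobenius(P_analytical)
err = relative_error(P_analytical, P_sim)
```

    Dividing by the norm of the analytical matrix is one common normalisation; whether the paper uses exactly this ratio is an assumption of the sketch.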

    Fig. 4. Histogram of the sojourn time distribution $h_{01}$ for simulation vs analytical results of the $Y$ sequence

    The histogram above illustrates the sojourn time distribution only for one particular transition, from state C to state A, as an example; the rest of the sojourn time distributions can be compared similarly. We then perform the comparison using the two-sample Kolmogorov-Smirnov test (KS test). The KS test returns a test decision for the null hypothesis that the data in the vectors from the analytical and simulation results come from the same distribution, while the alternative hypothesis asserts that they come from different distributions [10]. The remaining histograms of the sojourn time distributions $h^Y_{ij}$, $i, j \in \{0, 1, 2\}$, can be compared in a similar manner, and the KS test results accept the null hypothesis that the data in the vectors from the analytical and simulation results come from the same distribution. We can therefore confirm that our analytical results agree with the simulation results at the 5% significance level.
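    The simulation procedure described in this subsection (generate X, keep every fourth symbol, estimate the embedded transition matrix of Y) might be sketched in Python as follows. This is a simplified stand-in for the MATLAB simulation, not a reproduction of it: all function names are hypothetical, sojourn times are forced to be at least one sample, and the estimator simply counts changes between consecutive retained samples. It makes no claim to reproduce the matrices reported above.

```python
import math
import random

def poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (adequate for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate(P, lams, n_jumps, rng):
    """Sample path of the semi-Markov chain, one state per time unit."""
    state, path = 0, []
    for _ in range(n_jumps):
        sojourn = max(1, poisson(lams[state], rng))   # assumption: at least 1
        path.extend([state] * sojourn)
        state = rng.choices(range(len(P)), weights=P[state])[0]
    return path

def estimate_embedded(path, s):
    """Row-normalised counts of jumps between distinct consecutive states."""
    counts = [[0] * s for _ in range(s)]
    for a, b in zip(path, path[1:]):
        if a != b:
            counts[a][b] += 1
    return [[c / max(1, sum(row)) for c in row] for row in counts]

rng = random.Random(1)
P = [[0.0, 0.4, 0.6], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
x = simulate(P, [15.0, 20.0, 9.0], 5000, rng)
y = x[::4]                               # downsampling factor m = 4
P_y = estimate_embedded(y, 3)
```

    Sojourn times of Y for a given transition can be collected with the same loop by recording the run length before each change of state, which yields the samples needed for the histogram comparison.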

    B. A case study for the reverse problem

    Here we set up our study by starting from the down-sampled sequence, namely the $Y$ sequence. We apply our analytical results from Section III to find two $X$ sequences, and then compare their corresponding statistics, i.e. the transition probability matrix and the sojourn time distribution. We want to give an example of two $X$ sequences with different statistics (different transition probability matrices and sojourn time distributions) that yield the same $Y$ sequence after downsampling. The method is as follows: from Equations (1) to (7), we want to show that there are two functions $P^X_{ij}(z)$ that satisfy the functional equation (15), each giving a distinct transition probability matrix and sojourn time distribution for the $X$ sequence. The transition probabilities of the embedded Markov chain of the downsampled sequence $\{Y\}$ are given by Equation (11), and the sojourn time distribution of each of the three states is taken from the forward case of the above example. Following the previous steps, we can compute the transition function matrix $P^Y_{ij}(z)$ for the $Y$ sequence. Following the proof of Proposition 4, we propose two functions $G_{ij}(z)$, namely $G^{(1)}_{ij}(z) = 0$ and $G^{(2)}_{ij}(z) = z^3$. For the first case we have $P^X_{ij}(z) = P^Y_{ij}(z^2)$, and the transition matrix is:

    \[
    P^{X_1}_{analytical} = \begin{bmatrix} 0 & 0.4315 & 0.5685 \\ 0.9768 & 0 & 0.0232 \\ 0.9102 & 0.0898 & 0 \end{bmatrix}
    \]

    For the second case we have $P^X_{ij}(z) = P^Y_{ij}(z^2) + z^3$, and the transition matrix is:

    \[
    P^{X_2}_{analytical} = \begin{bmatrix} 0 & 0.7054 & 0.2946 \\ 0.3426 & 0 & 0.6574 \\ 0.8901 & 0.1099 & 0 \end{bmatrix}
    \]

    Using these analytical results, we run the simulation with a down-sampling factor of 4 and 200 runs, and average the results. The simulated transition probability matrices for the $Y$ sequence are very close to the analytical solution: the Frobenius norm of $P^Y_{analytical} - P^Y_{sim1}$ equals 0.045 (2.36% relative error), and the Frobenius norm of $P^Y_{analytical} - P^Y_{sim2}$ equals 0.045 (1.65% relative error). The KS test confirms, at the 5% significance level, that the sojourn time distributions of the two $Y$ sequences come from the same distribution.

    Next, we perform the KS test for the two $X$ sequences. The returned value $h = 1$ indicates that the KS test rejects the null hypothesis at the default 5% significance level. The rest of the comparisons likewise show that the KS test rejects the null hypothesis that the data in the vectors from the two $X$ sequences come from the same distribution. We also find that the Frobenius norm of $P^{X_1} - P^{X_2}$ equals 1.045, or 39.65% relative error.
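    For readers who want to reproduce this kind of comparison, a two-sample KS decision can be sketched in plain Python. The names below are hypothetical, the samples are synthetic Gaussian draws rather than sojourn times from the paper, and the rejection rule uses the usual large-sample critical value with $c(\alpha) = 1.358$ for $\alpha = 0.05$, which is only approximate for small or heavily tied samples.

```python
import math
import random

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        v = min(a[i], b[j])
        # Advance both samples past every observation equal to v (handles ties).
        while i < len(a) and a[i] <= v:
            i += 1
        while j < len(b) and b[j] <= v:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def ks_reject(a, b, c_alpha=1.358):
    """Asymptotic decision rule; returns True when the null is rejected."""
    n, m = len(a), len(b)
    return ks_statistic(a, b) > c_alpha * math.sqrt((n + m) / (n * m))

rng = random.Random(7)
sample_a = [rng.gauss(0.0, 1.0) for _ in range(500)]
sample_c = [rng.gauss(3.0, 1.0) for _ in range(500)]
h = 1 if ks_reject(sample_a, sample_c) else 0
```

    A returned decision of 1 plays the same role as the value $h = 1$ mentioned above: the two samples are judged to come from different distributions at the 5% level.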

    Remark: We can confirm that there are at least two $X$ sequences for the resulting $Y$ sequence with a downsampling factor of 4. Such a singularity issue persists for any down-sampling factor $m > 1$, which means that the statistics of the original discrete-time semi-Markov process cannot be recovered uniquely from its down-sampled semi-Markov sequence.

    V. CONCLUSION

    In this paper, we present the problem of down-sampling a discrete-time semi-Markov random process and its application to understanding the statistical properties of fast-switching behaviors of wireless radio frequency traces in a dynamic setting. The resulting process, after the down-sampling operation, is also a semi-Markov process, or Markov renewal process. We present a novel approach to finding the statistical properties of a down-sampled discrete-time semi-Markov process. We show in our theoretical and numerical results that the statistics of the original sequence cannot be recovered, which implies that downsampling creates a singularity issue in recovering the statistics of the original semi-Markov process. The findings in this paper could help us understand more profoundly the fundamental limitation in learning node activity patterns in wireless networks under the widely used semi-Markov models when downsampling is necessary due to concerns of computational cost, as we experienced in our experiments. In future work, we will present results on the effects of hidden states, as well as superposition and mislabeling, on the statistics of the resulting process.

    REFERENCES

    [1] D. Jung, T. Teixeira, A. Barton-Sweeney, and A. Savvides, "Model-based design exploration of wireless sensor node lifetimes," in European Conference on Wireless Sensor Networks (EWSN) '07, 2007, pp. 29-31.

    [2] S. Geirhofer, L. Tong, and B. M. Sadler, "Dynamic spectrum access in WLAN channels: Empirical model and its stochastic analysis," in Proceedings of the First International Workshop on Technology and Policy for Accessing Spectrum, ser. TAPAS '06. New York, NY, USA: ACM, 2006. [Online]. Available: http://doi.acm.org/10.1145/1234388.1234402

    [3] M. Kadiyala, D. Shikha, R. Pendse, and N. Jaggi, "Semi-Markov process based model for performance analysis of wireless LANs," in Pervasive Computing and Communications Workshops (PERCOM Workshops), 2011 IEEE International Conference on, 2011, pp. 613-618.

    [4] B. Carroll, "Learning and identification of wireless network internode dynamics using software defined radio," Master's thesis, LSU, May 2013.

    [5] P. D. Vu, "Graphical models in characterizing the dependency relationship in wireless networks and social networks," Master's thesis, LSU, August 2014.

    [6] M. J. Johnson and A. S. Willsky, "Bayesian nonparametric hidden semi-Markov models," J. Mach. Learn. Res., vol. 14, no. 1, pp. 673-701, Feb. 2013. [Online]. Available: http://dl.acm.org/citation.cfm?id=2502581.2502602

    [7] V. Barbu and N. Limnios, Semi-Markov Chains and Hidden Semi-Markov Models toward Applications: Their Use in Reliability and DNA Analysis. New York, NY, USA: Springer, 2008.

    [8] M. Vetterli and J. Kovacevic, Wavelets and Subband Coding. Englewood Cliffs, NJ, USA: Prentice Hall, 1995.

    [9] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed. Baltimore, MD: Johns Hopkins, 1996.

    [10] F. J. Massey, "The Kolmogorov-Smirnov test for goodness of fit," Journal of the American Statistical Association, vol. 46, no. 253, pp. 68-78, 1951.