
Reliable Estimation of Minimum Embedding Dimension Through Statistical Analysis of Nearest Neighbors

David Chelidze
Professor, Member of ASME
Nonlinear Dynamics Laboratory
Department of Mechanical, Industrial and Systems Engineering
University of Rhode Island, Kingston, RI 02881, USA
Email: [email protected]

False nearest neighbors (FNN) is one of the essential methods used in estimating the minimally sufficient embedding dimension in delay-coordinate embedding of deterministic time series. Its use for stochastic and noisy deterministic time series is problematic and erroneously indicates a finite embedding dimension. Various modifications to the original method have been proposed to mitigate this problem, but those are still not reliable for noisy time series. Here, nearest-neighbor statistics are studied for uncorrelated random time series and contrasted with the corresponding deterministic and stochastic statistics. New composite FNN metrics are constructed and their performance is evaluated for deterministic, stochastic, and random time series. In addition, noise-contaminated deterministic data analysis shows that these composite FNN metrics are robust to noise. All FNN results are also contrasted with surrogate data analysis to show their robustness. The new metrics clearly identify random time series as not having a finite embedding dimension and provide information about the deterministic part of stochastic processes. These metrics can also be used to differentiate between chaotic and random time series.

1 Delay-Coordinate Embedding

The embedding theorem [1] has paved the way to phase-space-based nonlinear time series analysis. It requires only two parameters: the embedding dimension and the delay time [2]. Given a deterministic scalar time series {x_i}_{i=1}^{N} ⊂ ℝ sampled uniformly with sampling time t_s, the reconstructed phase-space vectors {x_i}_{i=1}^{N−(d−1)τ} ⊂ ℝ^d are given by

x_i = [x_i, x_{i+τ}, …, x_{i+(d−1)τ}]^T ,   (1)

where d is the embedding dimension, τ t_s is the actual delay time, and τ is a positive integer (the digital delay time). For an appropriate choice of the delay parameters (d and τ), the resulting reconstruction is diffeomorphically equivalent to the original dynamical system's phase space [2]. There are generalizations of delay-coordinate embedding to variable delays [3, 4], and they can yield lower-dimensional embeddings (e.g., in the case of a quasiperiodic signal [4]). Here, we do not consider these generalizations and focus on obtaining optimal constant-delay embeddings.
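For concreteness, the reconstruction of Eq. (1) can be sketched in a few lines of Python/NumPy (the paper itself provides no code; the function name and interface below are ours):

```python
import numpy as np

def delay_embed(x, d, tau):
    """Delay-coordinate embedding, Eq. (1): row i is
    x_i = [x_i, x_{i+tau}, ..., x_{i+(d-1)tau}]."""
    x = np.asarray(x)
    n = len(x) - (d - 1) * tau            # number of reconstructed points
    if n <= 0:
        raise ValueError("time series too short for this (d, tau)")
    return np.column_stack([x[j * tau : j * tau + n] for j in range(d)])
```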

The estimation of the optimal delay time can be approached from a purely geometrical perspective [5] or by considering linear/nonlinear (auto)correlations in the time series [6, 7]. In some cases, the selection of the optimal delay time and embedding dimension cannot be decoupled (e.g., when the mutual information changes with the embedding dimension [8]). Some studies have advocated focusing on the so-called embedding window, τ_e = (d − 1)τ, which describes the total time span covered by all coordinates in each reconstructed phase-space point [9–12]. Others use model-based parameter estimation [12–16] to estimate the optimal delay and embedding dimension simultaneously. More recently, an optimal short-term prediction metric (based on an artificial neural network model) was used to estimate both the appropriate delays and the embedding dimension [17]. A similar idea, in conjunction with the average mutual information for the delay estimation, was used to estimate the embedding dimension in Ref. [18]. Some of these methods are applicable in certain scenarios; others have deficiencies, as discussed in Refs. [8, 19, 20]. However, to focus on estimating the minimum embedding dimension, we either assume that the delay is determined a priori or chosen conservatively so that a spatial decorrelation transform [21] can be used. This restriction allows for a compact comparison between the different methodologies considered in this paper.



Fig. 1. The basic idea of FNN is illustrated by a three-dimensional closed curve (thick blue line) and its two-dimensional projection (thin dashed red line). The thin dotted lines show two-dimensional projections onto other planes. The points that are false neighbors in the projection (A′ and B′) are far apart in higher dimensions (A and B), while the true neighbors (C′ and D′) remain close neighbors (C and D).

This paper focuses on one of the popular methods for estimating the minimally sufficient embedding dimension, called false nearest neighbors (FNN) [22]. There are other methods that utilize convergence in the estimates of some nonlinear measures or metrics (e.g., [23]), but they are either more complex or more cumbersome in application. The idea behind the FNN method [22] is based on the uniqueness property of a dynamical system's phase-space trajectory. The nearest neighbor of a point in a d-dimensional embedding is labeled false if the pair of points is close only because of the projection from a higher-dimensional (e.g., (d+1)-dimensional) phase space. Thus, the FNN will separate if the data are embedded into a (d+1)-dimensional space, while the true neighbors will remain close (see Fig. 1 for a graphical illustration of this concept). If one is able to detect all the FNN, then the minimally sufficient embedding dimension can be identified as the smallest dimension needed to achieve a zero fraction of FNN.

There are three distinct variations of the FNN algorithm. We label them the original or Kennel [22], the improved [24], and the reliable [21] algorithms. Variations of them are frequently used in very diverse fields, such as stock market price forecasting [25], astrophysics [26], engineering [27], and biology [28]. In what follows, we discuss each of the algorithms and their properties, with applications to deterministic, stochastic, and uncorrelated random time series. Particular attention is given to the observed deficiencies of each algorithm, and the proposed solutions are discussed and tested. The effect of additive noise in deterministic time series on the performance of the algorithms is also considered.

The paper begins by discussing the data sets used in the analysis, followed by a description of the three main algorithms and their performance on these test data sets. To explain the observed deficiencies and problems, a detailed statistical analysis of nearest neighbors is presented. Finally, new composite FNN metrics and algorithms are proposed based on this statistical analysis, and their performance and robustness are evaluated on the test data sets.

2 Data Sets Used in the Analysis

For the purely white random time series, we generate 6×10⁴-point samples of normally and uniformly distributed random numbers. Both uncorrelated random data sets are normalized to have unit variance, and an arbitrarily chosen τ = 5 is used in the delay-coordinate embedding (the choice is arbitrary since these time series have no nonzero autocorrelations for τ ≠ 0). In what follows, we label these random data sets Normal and Uniform, respectively.

For the stochastic data containing deterministic components, we use an autoregressive process of order two (a discretized noise-driven damped oscillator, as in [24]):

x_{n+1} = (2 − ω² − ρ) x_n + (ρ − 1) x_{n−1} + η_n ,   (2)

where η_n is white noise and ω = 2π/20 is the frequency. A total of four 6×10⁴-point time series are generated, one for each value of ρ ∈ {0.02, 0.05, 0.2, 0.5}. The first two values of ρ represent a small amount of stochasticity in mainly deterministic data, which can also be viewed as deterministic data contaminated by multiplicative noise. In contrast, the latter two values of ρ reflect dominant randomness. In what follows, these time series are labeled AR(0.02), AR(0.05), AR(0.2), and AR(0.5), respectively. The delay τ = 5 is used for all four time series, as dictated by the natural frequency of the oscillator, the autocorrelation, and the average mutual information (see Fig. 2). The first two time series have very long correlation times, while the last two have much shorter correlation times of approximately 25 and 15, respectively.
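A minimal sketch of how the AR(2) data of Eq. (2) can be generated; the unit-variance Gaussian drive, the seed, and the normalization to unit variance are our assumptions:

```python
import numpy as np

def ar2_series(rho, n=60_000, omega=2 * np.pi / 20, seed=0):
    """Noise-driven damped oscillator, Eq. (2):
    x_{n+1} = (2 - omega^2 - rho) x_n + (rho - 1) x_{n-1} + eta_n."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(n)          # white-noise drive (assumed unit variance)
    x = np.zeros(n)
    for m in range(1, n - 1):
        x[m + 1] = (2 - omega**2 - rho) * x[m] + (rho - 1) * x[m - 1] + eta[m]
    return x / x.std()                    # normalization to unit variance (assumed)

series = {rho: ar2_series(rho) for rho in (0.02, 0.05, 0.2, 0.5)}
```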

In addition to the white random and correlated stochastic time series, we also include data for two chaotic time series. The first is generated from the Lorenz equations [29]:

ẋ = −(8/3) x + y z ,  ẏ = −10 (y − z) ,  ż = −x y + 28 y − z .   (3)

The Lorenz time series is sampled with a t_s = 0.02 time interval, and a total of 6×10⁴ points are recorded starting from the initial condition (x_0, y_0, z_0) = (20, 5, −5). In the following analysis we use only the y coordinate, with delay τ = 8 as indicated by the mutual information (Fig. 2), and label the corresponding results as Lorenz.


Fig. 2. Linear (autocorrelation) and nonlinear (average mutual information, AMI, in bits) correlations versus delay τ for the chaotic and stochastic data sets.

Fig. 3. Reconstructed phase portraits for Lorenz (a), Duffing (b), AR(0.02) (c), AR(0.05) (d), AR(0.2) (e), and AR(0.5) (f).


The second chaotic time series is generated using the double-well Duffing equation [30]:

ẍ + 0.2 ẋ − x + x³ = 0.33 cos t .   (4)

A total of 6×10⁴ steady-state response points are recorded using a t_s = 0.1 sampling time interval. In the analysis, only the x time series is used, with delay τ = 14, also indicated by the mutual information (Fig. 2); the results are labeled as Duffing. The phase portraits of the deterministic and correlated stochastic time series are shown in Fig. 3.
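The two chaotic data sets can be reproduced along the following lines, a sketch using scipy.integrate.solve_ivp; the Duffing initial condition and the length of the discarded transient are not stated in the text and are assumed here:

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, s):                         # Eq. (3)
    x, y, z = s
    return [-8.0 / 3.0 * x + y * z, -10.0 * (y - z), -x * y + 28.0 * y - z]

def duffing(t, s):                        # Eq. (4) as a first-order system
    x, v = s
    return [v, -0.2 * v + x - x**3 + 0.33 * np.cos(t)]

n = 60_000
t_lor = np.arange(n) * 0.02               # t_s = 0.02
lor = solve_ivp(lorenz, (0, t_lor[-1]), [20, 5, -5], t_eval=t_lor,
                rtol=1e-9, atol=1e-9)
lorenz_y = lor.y[1]                       # only the y coordinate is analyzed

t_duf = np.arange(n) * 0.1 + 500.0        # t_s = 0.1; 500 time units of transient (assumed)
duf = solve_ivp(duffing, (0, t_duf[-1]), [1.0, 0.0], t_eval=t_duf,
                rtol=1e-9, atol=1e-9)
duffing_x = duf.y[0]                      # only the x coordinate is analyzed
```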

For the Lorenz and Duffing equations, using a single observed time series, one expects to need three- and four-dimensional embeddings, respectively, to eliminate all the FNNs [19, 20, 31]. Strictly speaking, the embedding theorem [2] guarantees an embedding if the dimension is greater than twice the fractal dimension of the attractor (i.e., for d = 5 for both Lorenz and Duffing). Table 1 shows the standard expected embedding parameters for the considered deterministic time series. The white noise time series are theoretically infinite-dimensional. However, for our finite time series, all the false nearest neighbors are eliminated only when the embedding dimension equals the data size (i.e., when there is only one embedded data point). The same is also true, in a strict sense, for the correlated stochastic time series. However, these correlated random data all possess a deterministic damped-oscillator backbone that is low-dimensional and guaranteed to be embeddable in three dimensions, while a two-dimensional embedding should generically work, too. Thus, one would expect a decrease in FNNs due to the presence of deterministic structure in the data, especially for long correlation times.

Table 1. Expected embedding parameters for the deterministic data

  System    Delay Time, τ    Embedding Dimension, d
  Lorenz          8                    3
  Duffing        14                    4

Additive noise effects are investigated by adding four different noise levels to the Lorenz and Duffing data. In particular, 5 %, 10 %, 20 %, and 40 % noise (measured by the ratio of the noise standard deviation σ_n to the signal standard deviation σ_x) is added to the time series, and the results are labeled accordingly. By the conventional definition of the signal-to-noise ratio (SNR), SNR_dB = 10 log_10(σ_x²/σ_n²), these correspond to 26.02 dB, 20 dB, 13.98 dB, and 7.96 dB SNR, respectively.
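The noise-contamination step is a one-liner once the level is fixed; a sketch (the SNR computation simply restates the definition above):

```python
import numpy as np

def add_noise(x, level, seed=0):
    """Additive white Gaussian noise with sigma_n = level * sigma_x
    (e.g., level = 0.05 for the 5 % case, giving ~26.02 dB SNR)."""
    rng = np.random.default_rng(seed)
    noise = level * np.std(x) * rng.standard_normal(len(x))
    snr_db = 10 * np.log10(np.var(x) / np.var(noise))
    return x + noise, snr_db
```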

All the data sets are successively embedded into d = 1, …, 20 dimensions. For each d-dimensional embedding, we find the nearest-neighbor distance for each point, as well as the distance between the (d+1)-th components of the same pair of points. In addition, the results are contrasted with a surrogate time series analysis similar to the one used in [21]. Instead of going through an exercise of frequency- and time-domain randomization and normalization procedures [32], the authors of [21] used a much simpler strategy: for each d-dimensional nearest-neighbor pair, the (d+1)-th coordinate distance is estimated by randomly selecting one of the coordinates from all the data. This is justifiable if all linear correlations are absent from the components of the reconstructed phase space (the needed coordinate transformation is discussed in Section 3.3.1), where the noise appears effectively white. Other algorithmic details are described in the following sections.

3 Overview of False Nearest Neighbor Algorithms

3.1 The Original Algorithms

The original FNN method was proposed by Kennel et al. in 1992 [22]. Working in each d-dimensional reconstructed phase space, for each point x_i, its nearest neighbor x_{j(i)}


is identified as false if:

|x_{i+dτ} − x_{j(i)+dτ}| > r ‖x_i − x_{j(i)}‖ ,   (5)

where r is an a priori fixed threshold value (typically chosen to be 10 in Ref. [22]).

To simplify notation and streamline the discussion, we define the following variables:

δ_i ≜ |x_{i+dτ} − x_{j(i)+dτ}| , and ε_i ≜ ‖x_i − x_{j(i)}‖ .   (6)

In what follows, we also imply the functional dependence of the index j on the index i (i.e., x_j is always the nearest neighbor of x_i using the appropriate nearest-neighbor metric). Then, the probability that any point x_i and its nearest neighbor x_j are FNN can be written as:

f_nn(d; r) ≜ Prob{δ_i > r ε_i} ,   (7)

which we refer to as the original fraction. The idea is that the smallest dimension d for which f_nn(d; r) reaches zero is the minimally sufficient embedding dimension. The original fraction provides misleading results for white random and correlated stochastic time series, for which it erroneously indicates low embedding dimensions, and the surrogate analysis does not provide differentiation [21]. In fact, the results for uncorrelated random time series and for large-amplitude autoregressive processes are virtually identical and indicate the existence of a low-dimensional attractor. This is a problem for noisy deterministic time series, since we cannot definitively attribute a zero FNN fraction to the actual existence of a low-dimensional attractor.
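A minimal sketch of the original fraction of Eqs. (5)–(7); the k-d tree neighbor search is our implementation choice, not something prescribed by [22]:

```python
import numpy as np
from scipy.spatial import cKDTree

def original_fnn(x, d, tau, r=10.0):
    """Original fraction, Eq. (7): Prob{delta_i > r * eps_i}, with delta_i
    and eps_i as in Eq. (6). No Theiler window is applied, as in [22]."""
    n = len(x) - d * tau                  # points that still have a (d+1)-th coordinate
    emb = np.column_stack([x[q * tau : q * tau + n] for q in range(d)])
    dist, idx = cKDTree(emb).query(emb, k=2)   # k=2: self plus nearest neighbor
    j = idx[:, 1]
    eps = dist[:, 1]                           # ||x_i - x_j|| in d dimensions
    delta = np.abs(x[np.arange(n) + d * tau] - x[j + d * tau])
    return np.mean(delta > r * eps)
```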

Hegger and Kantz [24] were the first to clearly identify the shortcomings of this definition when noise or noisy deterministic data are considered. For uniformly distributed white noise, the original fraction f_nn(d; r) is derived as a function of the threshold r:

f_nn(d; r) = 1 − 2⟨ε_i⟩ r + ⟨ε_i²⟩ r² ,   (8)

where ε_i = ε_i(d) is the distance between the nearest neighbors in d dimensions. For an uncorrelated noise time series of size N, ⟨ε_i⟩ ≃ N^{−1/d}. Therefore, this FNN fraction approaches zero when ⟨ε_i⟩ ≃ r^{−1}, or when N ≃ r^d. To always differentiate noise (which is inherently infinite-dimensional) from deterministic data (finite-dimensional), one needs N ≥ r^d to obtain a nonzero original fraction for an uncorrelated noise time series. Unfortunately, this is not always feasible for experimental data.

To address this problem, in [22] the nearest neighbors were also counted as false if the distance in (d+1) dimensions was greater than a times the standard deviation σ_x of the data (e.g., a = 2 in [22]). Thus, the FNN condition changed to:

f_nn(d; r) ≜ Prob{(δ_i > r ε_i) ∨ (δ_i² + ε_i² > a² σ_x²)} ,   (9)

which we call the Kennel fraction.

Figure 4 shows the results of this algorithm applied to our data sets. This modification improves the performance for the white random time series by increasing the FNN fraction at higher embedding dimensions. However, it exhibits a dip in the FNN fraction at lower dimensions (i.e., it underestimates the FNN fraction for large r and low d). This dip is also present for the correlated stochastic time series, especially for the ones with larger stochastic components. However, the surrogates also show a similar dip and diverge from the actual data after the minimum is reached. Even with this divergence, the correlated stochastic data are hard to interpret and this solution is inadequate [21]. It is also problematic for noisy deterministic time series, since the dip cannot be uniquely attributed to the existence of a low-dimensional attractor. In addition, [22] points out that this solution may introduce false neighbors at large d for an insufficient amount of deterministic data.

3.2 The Improved Algorithm

In [24], it is argued that if the distance between nearest neighbors is larger than σ_x, the points cannot be considered true nearest neighbors and should be discarded from the computation. In this approach, nearest neighbors whose d-dimensional distances are greater than or equal to σ_x/r are disregarded, and the original fraction is used for the FNN. Unfortunately, for large values of d and r, this causes the number of available points to drop drastically to zero for both the white random and the correlated stochastic data sets. The current false-nearest-neighbor algorithm, as described in the TISEAN package [19, 33], has the following form:

f_nn(d; r) ≜ [ Σ_{i=1}^{N−(d−1)τ} Θ(δ_i − r ε_i) Θ(σ_x − r ε_i) ] / [ Σ_{i=1}^{N−(d−1)τ} Θ(σ_x − r ε_i) ] ,   (10)

where Θ is the Heaviside step function; we refer to this f_nn as the TISEAN fraction. The authors claim that for white random data this solution leads to 50–60 % FNN for large N, independent of the choice of d and r. However, in our white random and dominantly stochastic test cases, the number of points suitable for averaging drops precipitously to zero for large d. In addition, it indicates a low-dimensional embedding for stochastic or random data and does not provide any differentiation from the surrogate analysis. Clearly, this issue needs to be resolved for the algorithm to be useful for noisy or stochastic time series containing deterministic components.
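Given per-point arrays of the Eq. (6) distances, the TISEAN fraction of Eq. (10) reduces to a few lines; this sketch returns NaN when the denominator vanishes (our convention), which is exactly the failure mode described above:

```python
import numpy as np

def tisean_fnn(delta, eps, sigma_x, r):
    """TISEAN fraction, Eq. (10): among neighbor pairs with eps_i < sigma_x / r,
    the fraction with delta_i > r * eps_i."""
    valid = eps < sigma_x / r             # the Theta(sigma_x - r * eps_i) selector
    if not valid.any():
        return np.nan                     # no points left to average over
    return np.mean(delta[valid] > r * eps[valid])
```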


Fig. 4. Kennel fraction of FNN using Eq. (9) for Lorenz (a), Duffing (b), Normal (c), Uniform (d), AR(0.02) (e), AR(0.05) (f), AR(0.2) (g), and AR(0.5) (h). The gradual decrease in the FNN fraction is precipitated by incrementing r from zero to 20 by two. Lines with circles reflect the corresponding surrogate data results.


3.3 The Reliable Algorithm

In [21], some of the ideas of [24] were further expanded, and a new FNN algorithm was proposed in conjunction with the new and interesting false nearest strands (FNS) idea. These new FNN and FNS methods were shown to be usable for noisy or stochastic time series. Several important improvements to the FNN algorithm were proposed in [21]: (1) only nearest neighbors that are temporally uncorrelated are considered, to focus exclusively on spatial proximity; (2) instead of FNN, FNS were advocated to ensure that only true nearest neighbors are considered in the calculations, even for noisy data; and (3) a spatial decorrelation transform was introduced to deal with the linear correlation bias due to small delays used in the reconstruction.

3.3.1 Spatial Decorrelation

The spatial decorrelation step is aimed at negating the effect of a too-small delay used in the reconstruction, which makes the FNN fraction small irrespective of the actual dynamics. In addition, this step makes the method insensitive to the particular choice of the time delay, unless it is chosen to be unreasonably large.

The basic idea of the decorrelation is to counteractthe bunching of all reconstructed points near a thin tubealong the hyperdiagonal of a phase space. Given the origi-nal (d+1)-dimensional reconstructed phase-space trajectoryY ∈ R(d+1)×[N−τ(d+1)], where the i-th column is given by thei-th delay vector xi ∈ Rd+1, Y is first transformed into itsproper orthogonal space Y using singular value decomposi-tion [34]:

Y =UCV T , ⇒ Y =UTY , (11)

Then Y is rotated in the phase space so that the first d-coordinates are only a function of the first d coordinates ofthe original phase space Y . To accomplish this we take thelast column of Y , u ∈ Rd+1, and then the needed orthogonaltransformation is given by the following Householder ma-trix [34]:

Q = H (u−|u|ed+1) , (12)

where ed+1 is the unit vector in the (d + 1) direction. Thenthe decorrelated phase space is given by:

Y = QY . (13)

In our test samples, the choice of delay time is close to optimal and the decorrelation does not provide any significant benefit, but in practice it can be a differentiating factor. We have therefore included this step in all the results presented from here on.
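A sketch of Eqs. (11)–(13); the extraction of u as the last row of U (equivalently u = U^T e_{d+1}), which makes the first d rotated coordinates depend only on the first d original ones, is our reading of [21] and an assumption:

```python
import numpy as np

def spatial_decorrelate(Y):
    """Spatial decorrelation, Eqs. (11)-(13). Y is (d+1) x M, one delay
    vector per column; returns the rotated trajectory Q @ U.T @ Y."""
    U, _, _ = np.linalg.svd(Y, full_matrices=False)     # Y = U C V^T, Eq. (11)
    Ybar = U.T @ Y                                      # proper orthogonal coordinates
    u = U[-1, :]                                        # u = U^T e_{d+1} (assumed reading)
    v = u - np.linalg.norm(u) * np.eye(len(u))[-1]      # v = u - |u| e_{d+1}
    vv = v @ v                                          # Householder matrix of Eq. (12)
    Q = np.eye(len(u)) if vv < 1e-12 else np.eye(len(u)) - 2.0 * np.outer(v, v) / vv
    return Q @ Ybar                                     # Eq. (13)
```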

3.3.2 Temporal Decorrelation

As opposed to the TISEAN fraction, Ref. [21] advocates looking at just the distance between the (d+1)-th coordinates of the nearest neighbors. Considering only the FNN-based (and not the FNS-based) metric, we get:

f̄_nn(d; s) ≜ Prob{δ_i > s σ_x} ,   (14)

where, to identify the true nearest neighbor of each point x_i, its nearest neighbor x_j is determined such that:

j = argmin_j ‖x_i − x_j‖ , subject to |i − j| > w ,   (15)

where w is a Theiler window [35, 36] used to remove temporal correlations. We call the metric defined by Eq. (14) the reliable fraction. Ref. [21] discusses the appropriate choice of


the Theiler window and advocates using two to three times the first minimum of the average mutual information as a good rule of thumb. Another alternative is to match the time scale w to the point at which the average mutual information reaches its asymptotic zero value.

The results for Eqs. (14) and (15), using a Theiler window of four times the first minimum of the average mutual information, are shown in Fig. 5. For the deterministic time series, we observe an initial underestimation of the FNN fraction as the value of s increases. The fractions reach zero at the appropriate embedding dimension and then, for small s values, increase gradually with d. This gradual increase is more pronounced for the Duffing data and is absent for larger s values. For the white random data sets, the FNN fraction behaves as expected, decreasing linearly as s is increased while remaining constant for all values of d. For the correlated stochastic data, this metric is more interesting and shows trends intermediate between those for the purely white random and the deterministic data. Specifically, with large s values it can clearly be used to differentiate a small stochastic component from dominantly deterministic data (plots (e,f)). The results for the surrogates are clearly separated from the actual data, especially for larger s values, and remain constant for all d. Thus, the reliable FNS fraction does not show any advantages over the reliable FNN fraction for noise-free data.
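A sketch of Eqs. (14)–(15); querying 2w + 2 neighbors guarantees that at least one candidate survives the Theiler window (this assumes the embedded series has well over 2w + 2 points):

```python
import numpy as np
from scipy.spatial import cKDTree

def reliable_fnn(x, d, tau, s, w):
    """Reliable fraction, Eq. (14), with the Theiler-window constraint of
    Eq. (15): candidates with |i - j| <= w are excluded from the search."""
    n = len(x) - d * tau
    emb = np.column_stack([x[q * tau : q * tau + n] for q in range(d)])
    _, idx = cKDTree(emb).query(emb, k=2 * w + 2)   # enough to clear the window
    delta = np.empty(n)
    for p in range(n):
        j = idx[p][np.abs(idx[p] - p) > w][0]       # nearest temporally uncorrelated NN
        delta[p] = abs(x[p + d * tau] - x[j + d * tau])
    return np.mean(delta > s * np.std(x))
```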

3.3.3 False Nearest Strands

In the same paper [21], the shortcomings of the previous FNN methods for deterministic data contaminated by small random noise are also identified. In this case, the nearest neighbors identified through the analysis may be proximal not because of the deterministic dynamics or the projection, but because of noise. If the proportion of misidentified nearest neighbors is large, the algorithm will overestimate the corresponding FNN fraction. To identify only true nearest neighbors (in terms of deterministic dynamics and projection only), the authors propose to consider nearest neighbor strands (NNS). The following procedure is used to identify the NNS:

1. Identify the nearest neighbor x_{j(i)} of each point x_i as given by Eq. (15).

2. If there is a pair of points (for a minimal k = k*) among the preceding nearest-neighbor pairs {x_{i−k}, x_{j(i−k)}}_{k=1}^{w} such that j(i) = k* + j(i − k*), then [x_i, x_{j(i)}] is assigned to the strand containing [x_{i−k*}, x_{j(i−k*)}]. Otherwise, [x_i, x_{j(i)}] starts a new pair of nearest-neighbor strands.

3. After examining all the data, N_s strands are obtained, each containing a set S_k (k = 1, …, N_s) of nearest-neighbor points. The k-th strand is then designated false if ⟨δ_i⟩_k > s σ_x, where

⟨δ_i⟩_k ≜ ⟨δ_i⟩_{i∈S_k} = (1/‖S_k‖) Σ_{i∈S_k} δ_i .   (16)

Alternatively, we can define the reliable FNS fraction as:

f̄_ns(d; s) ≜ Prob{⟨δ_i⟩_k > s σ_x} .   (17)

Fig. 5. Reliable FNN fraction of Eq. (14) applied to Lorenz (a), Duffing (b), Normal (c), Uniform (d), AR(0.02) (e), AR(0.05) (f), AR(0.2) (g), and AR(0.5) (h). The gradual decrease in the FNN fraction is precipitated by incrementing s from zero to two by 0.1. Lines with circles reflect the corresponding surrogate data results.

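A sketch of the strand bookkeeping of steps 1–3 and of Eqs. (16)–(17); the arrays j and delta are assumed to come from a Theiler-windowed neighbor search like the one sketched in Section 3.3.2:

```python
import numpy as np

def fns_fraction(delta, j, w, s, sigma_x):
    """Reliable FNS fraction, Eq. (17). j[i] is the Eq. (15) nearest neighbor
    of point i; delta[i] is the (d+1)-th coordinate distance of that pair.
    A pair joins the strand of (i - k, j(i - k)) when j(i) = k + j(i - k)."""
    n = len(j)
    strand_id = np.full(n, -1)
    strands = []                            # member indices i of each strand
    for i in range(n):
        sid = -1
        for k in range(1, min(w, i) + 1):   # minimal k wins, as in step 2
            if j[i] == j[i - k] + k:
                sid = strand_id[i - k]
                break
        if sid < 0:                         # otherwise start a new strand
            sid = len(strands)
            strands.append([])
        strand_id[i] = sid
        strands[sid].append(i)
    mean_delta = np.array([delta[m].mean() for m in strands])   # Eq. (16)
    return np.mean(mean_delta > s * sigma_x)                    # Eq. (17)
```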

The results of the FNS calculations for our test time series are shown in Fig. 6. The decrease of the FNS fraction with increasing s is still present, as in Fig. 5. For the deterministic data, some gradual increase of this fraction with d at small values of s is also present. In addition, and in contrast with the reliable FNN fraction, this metric does not converge to zero as quickly and indicates a larger minimal embedding dimension for the deterministic data. The results for the white random data are still indistinguishable from those for the surrogate data. However, the white random trends show a gradual increase in the FNS fraction that was not observed in Fig. 5.


Fig. 6. Reliable FNS fraction of Eq. (17) applied to Lorenz (a), Duffing (b), Normal (c), Uniform (d), AR(0.02) (e), AR(0.05) (f), AR(0.2) (g), and AR(0.5) (h). The gradual decrease in the FNS fraction is precipitated by incrementing s from zero to two by 0.1. Lines with circles reflect the corresponding surrogate data results.

For the correlated stochastic data sets, the FNS trends are clearly delineated from the surrogates for low levels of stochasticity and show results similar to the white random data for larger stochastic components. To accurately estimate the embedding dimension in deterministic cases, one needs to set a threshold value for the FNS fraction, since it does not reach zero as quickly as the reliable FNN fraction. This task is made easier by the clear separation from the surrogate FNS fraction, which stays relatively level as d increases. For larger s values, the Duffing FNS fraction also indicates a lower than expected embedding dimension.

To see the advantage of these new metrics when dealing with noisy data, we apply them to the Duffing data with additive Gaussian noise, as shown in Fig. 7. For the reliable FNN algorithm, the fraction floor rises with the noise level, but the metric is still usable for small values of s and the low 5 % noise level (Fig. 7(a)). At the same time, the surrogate FNN fraction stays constant with respect to d and asymptotes, as s increases, down to approximately a 50 % fraction. The FNS data look better for the low noise levels (Fig. 7(b,d)) and exhibit similar surrogate behavior. Therefore, the FNS algorithm provides better separation between the surrogate and actual data fractions compared to the FNN fractions.

Fig. 7. Reliable FNN (first column: a,c,e,g) and FNS (second column: b,d,f,h) algorithms applied to Duffing data with 5 % (a,b), 10 % (c,d), 20 % (e,f), and 40 % (g,h) noise. The gradual decrease in the fractions is precipitated by incrementing s from zero to two by 0.1. Lines with circles reflect the corresponding surrogate data results.

One might be able to estimate the embedding dimension for the low noise levels in Fig. 7 if an appropriate threshold value is set. However, both reliable fractions fail to identify the embedding dimension at the larger noise levels.

Other solutions for noisy data have also been proposed, such as the minimum description length principle [3] or the unified approach to phase space reconstruction [4]. They have their respective merits, but both are considerably more complicated procedures than the FNN algorithms. Here, we further expand on some of the ideas in [21] and propose an FNN algorithm that is usable for noisy or stochastic time series with deterministic components. We begin by exploring the nearest-neighbor statistics for deterministic, white random, and correlated stochastic time series.

4 Nearest-Neighbor Statistics

We hypothesize that the problem with the classical FNN methods lies in the definition of the fraction of FNNs. To investigate this question, let us examine the nearest-neighbor statistics for our test time series.


Fig. 8. Expected value of nearest-neighbor distances in d dimensions without (a) and with (b) temporally decorrelated nearest neighbors; the legends list the Eq. (18) model and the eight test data sets.

First, consider the expected average nearest-neighbor distance ⟨ε_i(d)⟩ for a white random time series. Given a d-dimensional reconstruction of a normally distributed random time series, the distance between any two points is distributed according to χ_d (the d-dimensional chi distribution [37]). We further posit that the expected distance between nearest-neighbor points is proportional to the mean of the corresponding χ_d distribution and to the average density of points in d dimensions, (√d/N)^{1/d}. This results in the following approximation:

⟨ε_i(d)⟩ ≃ √2 [Γ((d+1)/2)/Γ(d/2)] (√d/N)^{1/d} ,   (18)

where Γ is the gamma function. Figure 8(a) shows the expected values of the nearest-neighbor distances versus the embedding dimension for all eight test time series, without accounting for temporally correlated nearest neighbors. The increase in the mean nearest-neighbor distance is more drastic for the white random data than for the deterministic time series, and Eq. (18) follows this increase for the white random time series very closely.
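Eq. (18) is straightforward to evaluate numerically; computing the χ_d mean through log-gamma differences avoids overflow at larger d (a sketch):

```python
import numpy as np
from scipy.special import gammaln

def expected_nn_dist(d, N):
    """Eq. (18): expected nearest-neighbor distance for normally
    distributed white noise embedded in d dimensions (N points)."""
    chi_mean = np.sqrt(2.0) * np.exp(gammaln((d + 1) / 2) - gammaln(d / 2))
    return chi_mean * (np.sqrt(d) / N) ** (1.0 / d)

print(expected_nn_dist(np.arange(1, 21), 60_000))   # the model curve of Fig. 8
```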

The deterministic time series show a more power-law-type increase after the needed embedding dimension is reached (d ≈ 3–4 in both cases), and at a considerably lower rate than for the random data. After the minimum embedding dimension is reached, the nearest-neighbor distance for deterministic data is expected to grow according to the local divergence rate e^{λ_max d τ t_s}, where λ_max is the maximal Lyapunov exponent. This can be expressed as:

⟨ε_i⟩ ≈ ε̄ (e^{λ_max d τ t_s} − 1) , ∀ d ≥ m ,   (19)

where ε̄ is a constant reflecting the point density in the phase space. The parameters that follow the observed curves closely are ε̄ = 0.0404 ± 0.0008 and λ_max τ t_s = 0.0689 ± 0.0008 for the Lorenz data, and ε̄ = 0.5110 ± 0.0742 and λ_max τ t_s = 0.0167 ± 0.0020 for the Duffing data. The exponential growth in Eq. (19) is expected only until the distance reaches the attractor size, at which point it saturates. Even if it were allowed to grow unimpeded, it would cross Eq. (18) only somewhere around d ≈ 60. Thus, for low-dimensional deterministic systems (e.g., d < 40), the expected nearest-neighbor distance stays well below that expected for white random data in the same dimension.
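The quoted parameters can be recovered by least-squares fitting of Eq. (19) to the measured distance curve; since those arrays are not reproduced here, the sketch below fits a synthetic stand-in generated from the quoted Lorenz values:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(d, eps_bar, lam):               # Eq. (19); lam stands for lambda_max * tau * t_s
    return eps_bar * (np.exp(lam * d) - 1.0)

d = np.arange(3, 21)                      # dimensions past the minimal m = 3
rng = np.random.default_rng(0)
eps_meas = model(d, 0.0404, 0.0689) * (1 + 0.01 * rng.standard_normal(d.size))

(eps_bar, lam), _ = curve_fit(model, d, eps_meas, p0=(0.05, 0.1))
print(eps_bar, lam)                       # recovers roughly (0.0404, 0.0689)
```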

The correlated stochastic time series show behavior intermediate between the white random and deterministic nearest-neighbor distance trends. They start close to the white random trend at low dimensions and then diverge toward smaller distances at larger embedding dimensions.

The same plot was regenerated for the distance between temporally decorrelated nearest neighbors, as shown in Fig. 8(b). The main observed differences are that the data with large stochastic components now closely follow the white random distances, while the data with small stochastic components show the intermediate behavior. In addition, the deterministic data have more pronounced trends, clearly showing the differences in divergence rates. Fitting Eq. (19) to these curves gives the following parameters: ε̄ = 0.0333 ± 0.0007 and λ_max τ t_s = 0.0790 ± 0.0008 for the Lorenz data, and ε̄ = 0.0425 ± 0.0020 and λ_max τ t_s = 0.1212 ± 0.0021 for the Duffing data. Thus, we observe: (1) temporal correlations considerably stunt the divergence rates, especially for the Duffing data; and (2) the divergence exponents after removing temporal correlations for Lorenz (λ_max ≈ 0.5) and Duffing (λ_max ≈ 0.1), while not suitable for estimating maximal Lyapunov exponents, are close to their expected values [31] (the experimentally estimated largest Lyapunov exponents from data are λ_1 ≈ 0.825 for Lorenz and λ_1 ≈ 0.134 for Duffing).

Now, let us consider the (d+1)-th coordinates of the nearest neighbors in d dimensions, which, for a white random time series, are still uncorrelated random numbers. Thus, the distance between these (d+1)-th coordinates for normally distributed variables (x_i ∼ N(0, σ_x²)) follows a chi distribution [37] (if x_i ∼ N(0, σ_x²), then x_i − x_j ∼ N(0, 2σ_x²) and |x_i − x_j|/(√2 σ_x) ∼ χ_1), and the corresponding expected value can be expressed as:

⟨δ_i⟩ = E[δ_i] = [2Γ(1)/Γ(1/2)] σ_x ≈ 1.1284 σ_x .   (20)

It is also easy to show that for uniformly distributed variables (i.e., x_i ∼ U[−a, b]), this expected value becomes:

⟨δ_i⟩ = (2/√3) σ_x ≈ 1.1547 σ_x .   (21)

We conclude that for a general uncorrelated random time series, the average distance between the (d+1)-th coordinates is expected to be slightly greater than the corresponding σ_x (about 1.12–1.16 σ_x).

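The constants in Eqs. (20) and (21) are easy to confirm by Monte Carlo (a sketch; both samples are scaled to unit variance):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6
g = rng.standard_normal((2, n))                    # unit-variance Gaussian pairs
u = rng.uniform(-np.sqrt(3), np.sqrt(3), (2, n))   # unit-variance uniform pairs

print(np.abs(g[0] - g[1]).mean())   # ~ 2/sqrt(pi) = 1.1284..., Eq. (20)
print(np.abs(u[0] - u[1]).mean())   # ~ 2/sqrt(3)  = 1.1547..., Eq. (21)
```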


A similar relationship is expected for colored-noise time series if the delay used in the reconstruction is greater than the correlation time. However, for a deterministic time series we expect this (d+1)-th-component distance to be small for true nearest neighbors. The estimation results for our synthetic time series, without removing temporal correlations, are shown in Fig. 9(a), where we plot the mean value of this distance at each embedding dimension. As can be seen, the analytical estimates are very close to the numerical results for the white random data sets, which are clearly larger than those for the deterministic time series.

For the correlated stochastic data there is an initial rapid drop in the mean (d+1)-component distance, followed by a slower gradual decrease. The initial drop is due to the deterministic correlations in the data caused by the small delay and is more pronounced for the weakly stochastic sets. The subsequent slow decrease is due to the longer temporal correlations in the data and is more drastic for the strongly stochastic sets. All the correlated stochastic trends seem to level out at about 30 % of σ_x. The (d+1)-th distances for the deterministic data start near 0.5–0.6 σ_x, drop rapidly to near zero at the true minimal embedding dimension, and then increase slowly and gradually (Fig. 9(a)). This gradual increase is caused by the maximal local divergence rate of nearby trajectories and is expected to follow e^{λ_max d τ t_s}, similar to Eq. (19).

The same plots were regenerated for the temporally decorrelated nearest-neighbor statistics, as shown in Fig. 9(b). The decorrelation has a big effect on the correlated stochastic data: the trends for the data with larger stochastic components are pushed up to the white random trends. In contrast, the AR(0.02) and AR(0.05) cases show trends more similar to the deterministic data, exhibiting initial drops followed by a very slow increase. All the stochastic data stay above the 0.45 σ_x level and can be clearly differentiated from the deterministic data. The deterministic trends did not change during the initial drop to zero. However, the subsequent gradual increase has a more pronounced character, especially for the Duffing data. This increase is more in line with the expected exponential divergence caused by the maximal Lyapunov exponent; it was not as pronounced before because temporally correlated components stay close irrespective of the embedding dimension.

The combination of the considered distances, ε_i²(d) + δ_i²(d), gives the squared (d+1)-dimensional distance between the nearest neighbors identified in d dimensions. The results of these calculations are shown in Fig. 10. The trends in this figure are similar to those for δ_i, but all show a gradual increase due to ε_i. Thus, the ⟨δ_i⟩ trends in Fig. 9 provide better differentiation between the deterministic and the random or stochastic data, the latter remaining flat as the embedding dimension is increased.

The statistical analysis described in this section can be summarized as follows:

Fig. 9. Expected value of the distances between the (d+1)-th coordinates of the nearest neighbors in d dimensions without (a) and with (b) temporally decorrelated nearest neighbors.

Fig. 10. Expected value of nearest-neighbor distances in d+1 dimensions without (a) and with (b) temporally decorrelated nearest neighbors (the plot legends are not shown, but they are the same as in Figs. 8 and 9).

Fig. 11. Expected nearest-neighbor distance ⟨δ_i⟩ versus the embedding dimension d for the Lorenz (a) and Duffing (b) data at the clean, 5 %, 10 %, 20 %, and 40 % noise levels.

1. Temporal correlations significantly alter the observed results and need to be accounted for in an accurate FNN analysis (removing temporally correlated points from the FNN analysis increases the differentiating power of the nearest-neighbor statistics).

2. Of the considered temporally decorrelated metrics, the (d+1)-th coordinate distance δ_i, shown in Fig. 9(b), provides the best indication of the minimal embedding dimension and superior separation between the deterministic and random/stochastic trends, the latter remaining constant with increasing d. Thus, this metric can be used directly for estimating the embedding dimension for noise-free deterministic data. However, for noise-contaminated deterministic data it loses its ability to clearly identify the minimal embedding dimension, as shown in Fig. 11.

3. Both ⟨ε_i⟩ and ⟨δ_i⟩ for deterministic data show exponential growth after the minimal embedding dimension is reached. Thus, the original FNN ratio δ_i/ε_i (the Kennel ratio shown in Fig. 12, both without (a) and with (b) temporally decorrelated nearest neighbors) is a good differentiator of deterministic data at low embedding dimensions. Setting a threshold value near 5–10, this ratio with temporal decorrelation provides accurate estimates of FNNs for d < 10 and a clear separation of the deterministic trends.


Fig. 12. Expected value of the Kennel ratio in d dimensions without (a) and with (b) temporally decorrelated nearest neighbors.


5 New Composite-Fraction-Based Metrics

Using the results of the statistical analysis described in the previous section, and combining them with the original definition of FNN, the following FNN composite fraction can be advocated:

R_nn(d; r, s) ≜ Prob{(δ_i > r ε_i) ∨ (δ_i > s σ_x)} .   (22)

This new fraction has two parameters: r is the same as for the original fraction, while s is the same as for the reliable fraction. The first part is expected to serve as a good indicator of FNNs for low d, while the second provides a good indication of the minimal embedding dimension and the best separation of deterministic data from random/stochastic data at larger d.

This composite metric still does not address the problem of nearest neighbors being close to each other due to the presence of noise rather than dynamics or projection. Therefore, we propose a procedure similar to the FNS, where instead of looking up just one nearest neighbor (NN) for each point x_i, we look up k NNs. The true, temporally uncorrelated NN (i.e., the one close only due to dynamics or projection) is then identified by evaluating the preceding and future l states of all k NNs. In particular, we evaluate the average distances between the base strand and all k NN strands of length 2l + 1. The strand with the smallest average distance from the base strand contains the needed true NN.

5.1 New Composite-Fraction FNN Algorithm

The composite FNN procedure can be summarized as follows:

1. For each point x_i, identify its temporally uncorrelated k nearest neighbors {x_{j(m)}}_{m=1}^{k} = {x_{j(m)} : |i − j(m)| > w, m = 1, …, k}, just as in Eq. (15), to eliminate temporally correlated NNs.

2. The true nearest neighbor x_j is then identified by:

j = argmin_m [ Σ_{q=−l}^{l} ‖x_{j(m)+q} − x_{i+q}‖ ] / [ (2l + 1) ‖x_{j(m)} − x_i‖ ] ,   (23)

and used in Eq. (22) to determine the FNN fraction.

Equation (23) indicates that the composite fraction is also a function of the parameters l and k:

R_nn = R_nn(d; r, s, l, k) .   (24)

It is reasonable to set l ≤ w/2 in Eq. (23), since we are focusing on temporal correlations. The choice of k can be influenced by many factors, such as the noise level, the density of points in the embedding, and the sampling time. Here we present results obtained with just one nearest-neighbor point (k = 1 and l = 0) for noise-free data, and with k = 10 and l = τ for data with additive noise. We set these values explicitly in what follows and do not study their influence on the quality of the results, which is left for future work. Therefore, R_nn = R_nn(d; r, s, l, k) ≜ R_nn(d; r, s) in what follows.
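A sketch combining steps 1 and 2 with the composite fraction of Eq. (22); the default Theiler window w = 4τ and the skipping of boundary points (which lack the ±l strand states) are our assumptions:

```python
import numpy as np
from scipy.spatial import cKDTree

def composite_fnn(x, d, tau, r, s, k=1, l=0, w=None):
    """Composite fraction R_nn of Eq. (22); the true neighbor among the k
    Theiler-windowed candidates is chosen by the strand ratio of Eq. (23)."""
    w = 4 * tau if w is None else w
    n = len(x) - d * tau
    emb = np.column_stack([x[q * tau : q * tau + n] for q in range(d)])
    _, idx = cKDTree(emb).query(emb, k=2 * w + k + 1)
    sigma, qs = np.std(x), np.arange(-l, l + 1)
    n_false = n_tot = 0
    for i in range(l, n - l):
        cand = idx[i][np.abs(idx[i] - i) > w][:k]    # k temporally uncorrelated NNs
        cand = cand[(cand >= l) & (cand < n - l)]
        if cand.size == 0:
            continue
        ratios = [np.linalg.norm(emb[j + qs] - emb[i + qs], axis=1).mean()
                  / np.linalg.norm(emb[j] - emb[i]) for j in cand]   # Eq. (23)
        j = cand[int(np.argmin(ratios))]
        eps = np.linalg.norm(emb[i] - emb[j])
        delta = abs(x[i + d * tau] - x[j + d * tau])
        n_false += (delta > r * eps) or (delta > s * sigma)          # Eq. (22)
        n_tot += 1
    return n_false / max(n_tot, 1)
```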

The results using Eq. (22) are shown in Fig. 13 for fixed s = 0.5 and in Fig. 14 for fixed r = 4 for all the test data. For the deterministic data, the algorithm works well for r ≥ 4 and all s > 0.2, while for the same r values it correctly indicates high-dimensional data for the white and correlated stochastic time series. In particular, the actual results are indistinguishable from the surrogate analysis for the random data, and the composite fraction never drops below 50 %. For correlated stochastic data with a smaller stochastic component, we can clearly see that there is some deterministic component in the data, while for a large stochastic component the results are very similar to those for white noise. However, there is still a small but clearly identifiable separation between the surrogate and the correlated stochastic data. This is especially evident in Fig. 13, where for larger values of d the FNN composite fraction converges to the same constant value for the surrogates and to different values for the correlated stochastic data sets. The separation of the surrogate FNN fraction from the actual FNN fraction seems to scale with d and with the relative magnitude of the deterministic component. Therefore, this separation between the actual and surrogate composite fractions can be used as an indication of deterministic trends in stochastic data. However, relating the observed differences from the surrogates to the relative strength of the deterministic components requires further investigation before it can be applied in practice.

One may expect the removal of temporal correlations and the new definition of NN points to positively alter the Kennel fraction results.


Fig. 13. Eq. (22) based composite-fraction FNN algorithm with s = 0.5 for Lorenz (a), Duffing (b), Normal (c), Uniform (d), AR(0.02) (e), AR(0.05) (f), AR(0.2) (g), and AR(0.5) (h). The gradual decrease in the fractions is precipitated by incrementing r from zero to 20 by two. Lines with circles reflect the corresponding surrogate data results.

However, the observed improvements were marginal: (1) a marginally better indication of the minimal embedding dimension for the deterministic data; (2) removal of the artificial separation between the surrogates and the actual data for the stochastic time series; and (3) accentuation of the exponential divergence for the deterministic data at larger embedding dimensions. Thus, the main deficiencies identified previously remain.

5.2 New Composite-Fraction FNS Algorithm

We have also modified the FNS algorithm to use the new metric specified by Eq. (22). We use exactly the same procedure as before, except that instead of Eq. (17) we use:

R_ns(d; r, s) ≜ Prob{ ⟨δ_i⟩_k > r ⟨ε_i⟩_k ∨ ⟨δ_i⟩_k > s σ_x } ,   (25)

where

⟨ε_i⟩_k ≜ ⟨ε_i⟩_{i∈S_k} = (1/‖S_k‖) Σ_{i∈S_k} ε_i .   (26)

Fig. 14. Eq. (22) based composite-fraction FNN algorithm with r = 4 for Lorenz (a), Duffing (b), Normal (c), Uniform (d), AR(0.02) (e), AR(0.05) (f), AR(0.2) (g), and AR(0.5) (h). The gradual decrease in the fractions is precipitated by incrementing s from zero to two by 0.1. Lines with circles reflect the corresponding surrogate data results.

The results of the calculations using Eq. (25) are shown in Fig. 15 for fixed s = 0.5 and in Fig. 16 for fixed r = 10 for all the test data. For all these simulations we used w = τ. The differences relative to the new FNN metric are similar to those observed for the reliable algorithm. The decrease of the FNS composite fraction to zero is slower and more gradual than that of the FNN composite fraction for the deterministic data. However, the FNS composite fraction gives a much clearer indication of d = 3 for the Lorenz and d = 4 for the Duffing data; with the FNN fraction, the data have to be plotted on a log scale to clearly see the minima at the same values. This algorithm can clearly be used to differentiate deterministic from stochastic or random data using surrogate analysis. In addition, for the data with smaller stochastic components, the surrogate analysis also provides the needed indication of the deterministic component. However, this indication is absent for the larger stochastic components, for which the data are indistinguishable from the random data sets.


Fig. 15. Eq. (25) based composite-fraction FNS algorithm with s = 0.5 applied to Lorenz (a), Duffing (b), Normal (c), Uniform (d), AR(0.02) (e), AR(0.05) (f), AR(0.2) (g), and AR(0.5) (h). The gradual decrease in the fractions is precipitated by incrementing r from zero to 20 by two. Lines with circles reflect the corresponding surrogate data results.

6 Noise Robustness of Composite Fraction Algorithms

To test the applicability of the new algorithms to noisy deterministic time series, we apply them to both the Lorenz and Duffing data contaminated by additive noise as described before. The composite FNN fraction results for the noisy Lorenz and Duffing data are shown in Figs. 17 and 19 for k = 1 NN, and in Figs. 18 and 20 for k = 10 NNs, respectively. The plots demonstrate that using ten NNs in the algorithm considerably improves on the results using just one NN by clearly indicating where the FNNs level out. Composite FNN fractions with ten NNs have a clear inflection point, or elbow, in their trends at the appropriate minimal embedding dimension. This is especially clear in the fixed-s plots with ten NNs, where, once the embedding dimension is reached, the FNN fractions stay constant for each noise level. For large values of r and s, this algorithm is capable of correctly identifying the deterministic component and its dimensionality, even for 40 % noise.

The new composite FNS plots for the noisy Lorenz and Duffing data are shown in Figs. 21 and 22, respectively. Compared to the composite FNN fraction plots, these do not give as clear an indication of when the FNS composite fraction levels out, nor as clear an inflection point at the appropriate minimal embedding dimension.

Fig. 16. Eq. (25) based composite-fraction FNS algorithm with r = 10 applied to Lorenz (a), Duffing (b), Normal (c), Uniform (d), AR(0.02) (e), AR(0.05) (f), AR(0.2) (g), and AR(0.5) (h). The gradual decrease in the fractions is precipitated by incrementing s from zero to two by 0.1. Lines with circles reflect the corresponding surrogate data results.

Instead, the transition is more gradual and requires careful thresholding to identify the appropriate minimal embedding dimension. It should be noted that these results are similar to the ones obtained for the stochastic time series with small stochastic components. Therefore, one has to be careful when making judgments about the nature of additive noise versus systematic stochasticity. However, these new methods can still be used in both cases to identify the deterministic components in the data and their dimensionality.

Fig. 17. Eq. (24) based FNN algorithm with k = 1 for Lorenz and the surrogate data with 5 % (a,b), 10 % (c,d), 20 % (e,f), and 40 % (g,h) additive noise levels. The gradual decrease in the fractions is precipitated by incrementing r from zero to 20 by two (a,c,e,g) and s from zero to two by 0.1 (b,d,f,h). Lines with circles reflect the corresponding surrogate data results.

Fig. 18. Eq. (24) based FNN algorithm with k = 10 for Lorenz and the surrogate data with 5 % (a,b), 10 % (c,d), 20 % (e,f), and 40 % (g,h) additive noise levels. The gradual decrease in the fractions is precipitated by incrementing r from zero to 20 by two (a,c,e,g) and s from zero to two by 0.1 (b,d,f,h). Lines with circles reflect the corresponding surrogate data results.

7 Summary and Conclusions

We have described the currently used false nearest neighbors (FNN) algorithms and have shown their shortcomings using synthetic deterministic, white random, and colored stochastic time series. The effect of additive noise on the performance of the algorithms for the deterministic data was also identified and discussed.



Fig. 17. Eq. (24) based FNN algorithms with k = 1 for Lorenz and the surrogate data with 5 % (a,b), 10 % (c,d), 20 % (e,f), and 40 % (g,h) additive noise levels. The gradual decrease in the fractions is precipitated by incrementing r from zero to 20 by two (a,c,e,g) and s from zero to two by 0.1 (b,d,f,h). Lines with circles reflect the corresponding surrogate data results.

To understand the reasons behind the observed deficiencies, we have studied the statistical properties of the nearest-neighbor metrics used in the algorithms for all data types. It was observed that the expected (d+1)-dimensional distance between the d-dimensional nearest neighbors was a good indicator of the minimal embedding dimension for clean deterministic time series, but lost its discriminating power when even small additive noise was introduced into the deterministic data.
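This diagnostic is cheap to compute. The sketch below, reusing delay_embed and the Theiler-window neighbor search from the earlier sketch, estimates the expected extra-coordinate distance as a function of d.

import numpy as np
from scipy.spatial import cKDTree

def expected_extra_distance(x, tau, dims, w=None):
    # <|x_{i+d*tau} - x_{j+d*tau}|> averaged over i, where j is the nearest
    # neighbor of point i in the d-dimensional reconstruction (temporal
    # neighbors within the window w excluded); one value per d in dims.
    w = tau if w is None else w
    means = []
    for d in dims:
        X = delay_embed(x, d + 1, tau)   # delay_embed as defined earlier
        Xd = X[:, :d]
        tree = cKDTree(Xd)
        _, idx = tree.query(Xd, k=min(len(Xd), 2 * w + 2))
        deltas = []
        for i in range(len(Xd)):
            j = next((jj for jj in idx[i, 1:] if abs(jj - i) > w), None)
            if j is not None:
                deltas.append(abs(X[i, d] - X[j, d]))
        means.append(float(np.mean(deltas)))
    return means

For clean deterministic data this curve drops sharply once d reaches the minimal embedding dimension; with additive noise it saturates at the noise floor, which is the loss of differentiation noted above.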

Fig. 18. Eq. (24) based FNN algorithms with k = 10 for Lorenz and the surrogate data with 5 % (a,b), 10 % (c,d), 20 % (e,f), and 40 % (g,h) additive noise levels. The gradual decrease in the fractions is precipitated by incrementing r from zero to 20 by two (a,c,e,g) and s from zero to two by 0.1 (b,d,f,h). Lines with circles reflect the corresponding surrogate data results.

The results of the nearest-neighbor analysis were used to derive new algorithms based on the composite fraction that overcome the deficiencies of the earlier methods and are applicable even to noisy deterministic data. The concepts of FNN and false nearest strands (FNS) were also contrasted in this new composite framework. In addition, the FNN framework was enhanced by the ability to identify points that are nearest neighbors only due to dynamics or projections, by estimating the minimal distance to the several nearest strands in d dimensions, which considerably improved the results for the noisy data. While both concepts provide results that can be used for estimating the minimal embedding dimension even in a noisy data set, the composite FNS algorithm provides a clearer indication of the minimal embedding dimension for noise-free data than the composite FNN. However, the FNN algorithm has more differentiating power and provides less ambiguous results for deterministic data contaminated with additive noise. These FNN/FNS composite fractions are also suitable for evaluating the presence and relative magnitude of a deterministic component in a stochastic or noisy data set.
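For completeness, the strand bookkeeping that distinguishes FNS from FNN is compact: as we read the construction of Ref. [21], nearest-neighbor index pairs that follow one another in time, (i, j) after (i - 1, j - 1), belong to the same strand, and falseness is then judged per strand rather than per pair. A minimal sketch of the grouping step (our own helper name):

def build_strands(pairs):
    # Group nearest-neighbor index pairs into strands: (i, j) extends the
    # strand whose most recent pair is (i - 1, j - 1); otherwise it starts
    # a new strand.
    strands = {}                        # keyed by each strand's latest pair
    for i, j in sorted(pairs):
        strand = strands.pop((i - 1, j - 1), [])
        strand.append((i, j))
        strands[(i, j)] = strand
    return list(strands.values())

For example, the pairs {(3, 40), (4, 41), (5, 42), (10, 7)} yield two strands, [(3, 40), (4, 41), (5, 42)] and [(10, 7)].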

Acknowledgments

This paper is based upon work supported by the National Science Foundation under Grant No. 1100031.

References

[1] Takens, F., 1981. “Detecting strange attractors in turbulence”. In Dynamical Systems and Turbulence, Warwick 1980, D. A. Rand and L. S. Young, eds. Springer-Verlag, Berlin, pp. 366–381.

[2] Sauer, T., Yorke, J. A., and Casdagli, M., 1991. “Embedology”. Journal of Statistical Physics, 65(3-4), pp. 579–616.


Fig. 19. Eq. (24) based FNN algorithms with k = 1 for Duffing and the surrogate data with 5 % (a,b), 10 % (c,d), 20 % (e,f), and 40 % (g,h) additive noise levels. The gradual decrease in the fractions is precipitated by incrementing r from zero to 20 by two (a,c,e,g) and s from zero to two by 0.1 (b,d,f,h). Lines with circles reflect the corresponding surrogate data results.

[3] Molkov, Y. I., Mukhin, D., Loskutov, E., Feigin, A., and Fidelin, G., 2009. “Using the minimum description length principle for global reconstruction of dynamic systems from noisy time series”. Physical Review E, 80(4), p. 046207.

[4] Pecora, L. M., Moniz, L., Nichols, J., and Carroll, T. L., 2007. “A unified approach to attractor reconstruction”. Chaos: An Interdisciplinary Journal of Nonlinear Science, 17(1), p. 013110.

[5] Kember, G., and Fowler, A., 1993. “A correlation function for choosing time delays in phase portrait reconstructions”. Physics Letters A, 179(2), pp. 72–80.

[6] Fraser, A. M., 1989. “Reconstructing attractors from scalar time series: a comparison of singular system and redundancy criteria”. Physica D: Nonlinear Phenomena, 34(3), pp. 391–404.

[7] Fraser, A. M., and Swinney, H. L., 1986. “Independent coordinates for strange attractors from mutual information”. Physical Review A, 33(2), p. 1134.


Fig. 20. Eq. (24) based FNN algorithms with k = 10 for Duffing and the surrogate data with 5 % (a,b), 10 % (c,d), 20 % (e,f), and 40 % (g,h) additive noise levels. The gradual decrease in the fractions is precipitated by incrementing r from zero to 20 by two (a,c,e,g) and s from zero to two by 0.1 (b,d,f,h). Lines with circles reflect the corresponding surrogate data results.

[8] Grassberger, P., Schreiber, T., and Schaffrath, C., 1991. “Nonlinear time sequence analysis”. International Journal of Bifurcation and Chaos, 1(3), pp. 521–547.

[9] Broomhead, D. S., and King, G. P., 1986. “Extracting qualitative dynamics from experimental data”. Physica D: Nonlinear Phenomena, 20(2), pp. 217–236.

[10] Gibson, J. F., Doyne Farmer, J., Casdagli, M., and Eubank, S., 1992. “An analytic approach to practical state space reconstruction”. Physica D: Nonlinear Phenomena, 57(1), pp. 1–30.

[11] Rosenstein, M. T., Collins, J. J., and De Luca, C. J., 1994. “Reconstruction expansion as a geometry-based framework for choosing proper delay times”. Physica D: Nonlinear Phenomena, 73(1), pp. 82–98.

[12] Kugiumtzis, D., 1996. “State space reconstruction parameters in the analysis of chaotic time series: the role of the time window length”. Physica D: Nonlinear Phenomena, 95(1), pp. 13–28.

[13] Cao, L., Mees, A., and Judd, K., 1998. “Dynamics from multivariate time series”. Physica D: Nonlinear Phenomena, 121(1), pp. 75–88.


Fig. 21. Eq. (25) based FNS algorithms for Lorenz and the surrogate data with 5 % (a,b), 10 % (c,d), 20 % (e,f), and 40 % (g,h) additive noise levels. The gradual decrease in the fractions is precipitated by incrementing r from zero to 20 by two (a,c,e,g) and s from zero to two by 0.1 (b,d,f,h). Lines with circles reflect the corresponding surrogate data results.

[14] Judd, K., and Mees, A., 1998. “Embedding as a modeling problem”. Physica D: Nonlinear Phenomena, 120(3), pp. 273–286.

[15] Kim, H., Eykholt, R., and Salas, J., 1999. “Nonlinear dynamics, delay times, and embedding windows”. Physica D: Nonlinear Phenomena, 127(1), pp. 48–60.

[16] Small, M., and Tse, C. K., 2004. “Optimal embedding parameters: a modelling paradigm”. Physica D: Nonlinear Phenomena, 194(3), pp. 283–296.

[17] Maus, A., and Sprott, J., 2011. “Neural network method for determining embedding dimension of a time series”. Communications in Nonlinear Science and Numerical Simulation, 16(8), pp. 3294–3302.

[18] Chatzinakos, C., and Tsouros, C., 2015. “Estimation of the dimension of chaotic dynamical systems using neural networks and robust location estimate”. Simulation Modelling Practice and Theory, 51, pp. 149–156.

[19] Kantz, H., and Schreiber, T., 2004. Nonlinear Time Series Analysis, Vol. 7. Cambridge University Press.

[20] Abarbanel, H., 1996. Analysis of Observed Chaotic Data. Springer, New York.


Fig. 22. Eq. (25) based FNS algorithms for Duffing and the surrogate data with 5 % (a,b), 10 % (c,d), 20 % (e,f), and 40 % (g,h) additive noise levels. The gradual decrease in the fractions is precipitated by incrementing r from zero to 20 by two (a,c,e,g) and s from zero to two by 0.1 (b,d,f,h). Lines with circles reflect the corresponding surrogate data results.

[21] Kennel, M. B., and Abarbanel, H. D., 2002. “False neighbors and false strands: A reliable minimum embedding dimension algorithm”. Physical Review E, 66(2), p. 026209.

[22] Kennel, M. B., Brown, R., and Abarbanel, H. D., 1992. “Determining embedding dimension for phase-space reconstruction using a geometrical construction”. Physical Review A, 45(6), p. 3403.

[23] Cao, L., 1997. “Practical method for determining the minimum embedding dimension of a scalar time series”. Physica D: Nonlinear Phenomena, 110(1), pp. 43–50.

[24] Hegger, R., and Kantz, H., 1999. “Improved false nearest neighbor method to detect determinism in time series data”. Physical Review E, 60(4), p. 4970.

[25] Kazem, A., Sharifi, E., Hussain, F. K., Saberi, M., and Hussain, O. K., 2013. “Support vector regression with chaos-based firefly algorithm for stock market price forecasting”. Applied Soft Computing, 13(2), pp. 947–958.

[26] Hanslmeier, A., Brajsa, R., Calogovic, J., Vrsnak, B., Ruzdjak, D., Steinhilber, F., MacLeod, C., Ivezic, Z., and Skokic, I., 2013. “The chaotic solar cycle. II. Analysis of cosmogenic 10Be data”. Astronomy & Astrophysics, 550, p. A6.

[27] Wernitz, B., and Hoffmann, N., 2012. “Recurrence analysis and phase space reconstruction of irregular vibration in friction brakes: Signatures of chaos in steady sliding”. Journal of Sound and Vibration, 331(16), pp. 3887–3896.

[28] Hampson, K. M., and Mallen, E. A., 2012. “Chaos in ocular aberration dynamics of the human eye”. Biomedical Optics Express, 3(5), pp. 863–877.

[29] Lorenz, E. N., 1963. “Deterministic nonperiodic flow”. Journal of the Atmospheric Sciences, 20(2), pp. 130–141.

[30] Duffing, G., 1918. Erzwungene Schwingungen bei veränderlicher Eigenfrequenz und ihre technische Bedeutung. No. 41-42. Vieweg & Sohn.

[31] Sprott, J. C., 2003. Chaos and Time-Series Analysis, Vol. 69. Oxford University Press, Oxford.

[32] Schreiber, T., and Schmitz, A., 1996. “Improved surrogate data for nonlinearity tests”. Physical Review Letters, 77(4), p. 635.

[33] Hegger, R., Kantz, H., and Schreiber, T., 1999. “Practical implementation of nonlinear time series methods: The TISEAN package”. Chaos: An Interdisciplinary Journal of Nonlinear Science, 9(2), pp. 413–435.

[34] Golub, G. H., and Van Loan, C. F., 2012. Matrix Computations, Vol. 3. JHU Press.

[35] Theiler, J., 1991. “Some comments on the correlation dimension of 1/f^α noise”. Physics Letters A, 155(8), pp. 480–493.

[36] Theiler, J., 1990. “Statistical precision of dimension estimators”. Physical Review A, 41(6), p. 3038.

[37] Gut, A., 2009. An Intermediate Course in Probability. Springer.

List of Tables

1 Expected embedding parameters for the deterministic data

List of Figures

1 The basic idea of FNN is illustrated by a three-dimensional closed curve (thick blue line) and its two-dimensional projection (thin dashed red line). The thin dotted lines show two-dimensional projections onto other planes. The points that are false neighbors in the projection (A′ and B′) are far apart in higher dimensions (A and B), while the true neighbors (C′ and D′) are close neighbors (C and D).

2 Linear and nonlinear correlations in chaotic and stochastic data sets

3 Reconstructed phase portraits for Lorenz (a), Duffing (b), AR(0.02) (c), AR(0.05) (d), AR(0.2) (e), and AR(0.5) (f)

4 Kennel fraction of FNN using Eq. (9) for Lorenz (a), Duffing (b), Normal (c), Uniform (d), AR(0.02) (e), AR(0.05) (f), AR(0.2) (g), and AR(0.5) (h). The gradual decrease in the FNN fraction is precipitated by incrementing r from zero to 20 by two. Lines with circles reflect the corresponding surrogate data results.

5 Reliable FNN fraction of Eq. (14) applied to Lorenz (a), Duffing (b), Normal (c), Uniform (d), AR(0.02) (e), AR(0.05) (f), AR(0.2) (g), and AR(0.5) (h). The gradual decrease in the FNN fraction is precipitated by incrementing s from zero to two by 0.1. Lines with circles reflect the corresponding surrogate data results.

6 Reliable FNS fraction of Eq. (17) applied to Lorenz (a), Duffing (b), Normal (c), Uniform (d), AR(0.02) (e), AR(0.05) (f), AR(0.2) (g), and AR(0.5) (h). The gradual decrease in the FNS fraction is precipitated by incrementing s from zero to two by 0.1. Lines with circles reflect the corresponding surrogate data results.

7 Reliable FNN (first column: a,c,e,g) and FNS (second column: b,d,f,h) algorithms applied to Duffing data with 5 % (a,b), 10 % (c,d), 20 % (e,f), and 40 % (g,h) noise. The gradual decrease in the fractions is precipitated by incrementing s from zero to two by 0.1. Lines with circles reflect the corresponding surrogate data results.

8 Expected value of nearest-neighbor distances in d dimensions without (a) and with (b) temporally decorrelated nearest neighbors

9 Expected value of distances between the (d + 1)-th coordinates of the nearest neighbors in d dimensions without (a) and with (b) temporally decorrelated nearest neighbors


10 Expected value of nearest-neighbor distances in d + 1 dimensions without (a) and with (b) temporally decorrelated nearest neighbors (the plot legends are not shown, but they are the same as in Figs. 8 and 9)

11 Expected nearest-neighbor distance 〈δi〉 versus the embedding dimension d for Lorenz (a) and Duffing (b) data

12 Expected value of the Kennel ratio in d dimensions without (a) and with (b) temporally decorrelated nearest neighbors

13 Eq. (22) based composite fraction FNN algorithm with s = 0.5 for Lorenz (a), Duffing (b), Normal (c), Uniform (d), AR(0.02) (e), AR(0.05) (f), AR(0.2) (g), and AR(0.5) (h). The gradual decrease in the fractions is precipitated by incrementing r from zero to 20 by two. Lines with circles reflect the corresponding surrogate data results.

14 Eq. (22) based composite fraction FNN algorithm with r = 4 for Lorenz (a), Duffing (b), Normal (c), Uniform (d), AR(0.02) (e), AR(0.05) (f), AR(0.2) (g), and AR(0.5) (h). The gradual decrease in the fractions is precipitated by incrementing s from zero to two by 0.1. Lines with circles reflect the corresponding surrogate data results.

15 Eq. (25) based composite fraction FNS algorithm with s = 0.5 applied to Lorenz (a), Duffing (b), Normal (c), Uniform (d), AR(0.02) (e), AR(0.05) (f), AR(0.2) (g), and AR(0.5) (h). The gradual decrease in the fractions is precipitated by incrementing r from zero to 20 by two. Lines with circles reflect the corresponding surrogate data results.

16 Eq. (25) based composite fraction FNS algorithm with r = 10 applied to Lorenz (a), Duffing (b), Normal (c), Uniform (d), AR(0.02) (e), AR(0.05) (f), AR(0.2) (g), and AR(0.5) (h). The gradual decrease in the fractions is precipitated by incrementing s from zero to two by 0.1. Lines with circles reflect the corresponding surrogate data results.

17 Eq. (24) based FNN algorithms with k = 1 for Lorenz and the surrogate data with 5 % (a,b), 10 % (c,d), 20 % (e,f), and 40 % (g,h) additive noise levels. The gradual decrease in the fractions is precipitated by incrementing r from zero to 20 by two (a,c,e,g) and s from zero to two by 0.1 (b,d,f,h). Lines with circles reflect the corresponding surrogate data results.

18 Eq. (24) based FNN algorithms with k = 10 for Lorenz and the surrogate data with 5 % (a,b), 10 % (c,d), 20 % (e,f), and 40 % (g,h) additive noise levels. The gradual decrease in the fractions is precipitated by incrementing r from zero to 20 by two (a,c,e,g) and s from zero to two by 0.1 (b,d,f,h). Lines with circles reflect the corresponding surrogate data results.

19 Eq. (24) based FNN algorithms with k = 1 for Duffing and the surrogate data with 5 % (a,b), 10 % (c,d), 20 % (e,f), and 40 % (g,h) additive noise levels. The gradual decrease in the fractions is precipitated by incrementing r from zero to 20 by two (a,c,e,g) and s from zero to two by 0.1 (b,d,f,h). Lines with circles reflect the corresponding surrogate data results.

20 Eq. (24) based FNN algorithms with k = 10 for Duffing and the surrogate data with 5 % (a,b), 10 % (c,d), 20 % (e,f), and 40 % (g,h) additive noise levels. The gradual decrease in the fractions is precipitated by incrementing r from zero to 20 by two (a,c,e,g) and s from zero to two by 0.1 (b,d,f,h). Lines with circles reflect the corresponding surrogate data results.

21 Eq. (25) based FNS algorithms for Lorenz and the surrogate data with 5 % (a,b), 10 % (c,d), 20 % (e,f), and 40 % (g,h) additive noise levels. The gradual decrease in the fractions is precipitated by incrementing r from zero to 20 by two (a,c,e,g) and s from zero to two by 0.1 (b,d,f,h). Lines with circles reflect the corresponding surrogate data results.

22 Eq. (25) based FNS algorithms for Duffing and the surrogate data with 5 % (a,b), 10 % (c,d), 20 % (e,f), and 40 % (g,h) additive noise levels. The gradual decrease in the fractions is precipitated by incrementing r from zero to 20 by two (a,c,e,g) and s from zero to two by 0.1 (b,d,f,h). Lines with circles reflect the corresponding surrogate data results.
