2016 J. Phys. A: Math. Theor. 49 35LT01

(http://iopscience.iop.org/1751-8121/49/35/35LT01)


Letter

Horizontal visibility graphs from integer sequences

Lucas Lacasa

School of Mathematical Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, UK

E-mail: [email protected]

Received 17 May 2016, revised 30 June 2016
Accepted for publication 13 July 2016
Published 29 July 2016

Abstract

The horizontal visibility graph (HVG) is a graph-theoretical representation of a time series and builds a bridge between dynamical systems and graph theory. In recent years this representation has been used to describe and theoretically compare different types of dynamics, and has been applied to characterize empirical signals by extracting topological features from the associated HVGs, which have been shown to be informative on the class of dynamics. Among other measures, it has been shown that the degree distribution of these graphs is a very informative feature that encapsulates nontrivial information about the series' generative dynamics. In particular, the degree distribution of the HVG associated to a bi-infinite real-valued series of independent and identically distributed random variables is a universal exponential law $P(k) = (1/3)(2/3)^{k-2}$, independent of the series marginal distribution. Most of the current applications have however only addressed real-valued time series, as no exact results are known for the topological properties of HVGs associated to integer-valued series. In this paper we explore this latter situation and address univariate time series where each variable can only take a finite number $n$ of consecutive integer values. We are able to construct an explicit formula for the parametric degree distribution $P_n(k)$, which we prove to converge to the continuous case for large $n$ and to deviate otherwise. A few applications are then considered.

Keywords: visibility graphs, integer sequences, time series analysis

(Some figures may appear in colour only in the online journal)

Journal of Physics A: Mathematical and Theoretical
J. Phys. A: Math. Theor. 49 (2016) 35LT01 (13pp), doi:10.1088/1751-8113/49/35/35LT01
© 2016 IOP Publishing Ltd

1. Introduction

In recent years methods of network science have been applied to describe the structure of time series and signals, proposing mappings and transformations from series to graphs with the aim of making some sort of graph-theoretical time series analysis. Among other approaches [1–5], the family of visibility algorithms [6, 7] is a collection of recipes which map an ordered sequence of $N$ numbers to a graph of $N$ vertices, where an edge between two vertices exists if a certain geometric criterion is fulfilled in the sequence. These methods have been shown to be very fruitful in giving a topological characterization of time series and dynamics. In particular, it has been shown that (i) both the structure of complex, irregular time series and nontrivial ingredients of their underlying dynamics are inherited in the topology of the visibility graphs, and therefore (ii) simple topological properties of the graphs can be used as time series features for description and automatic classification purposes. Examples include a topological characterization of chaotic series (and routes to chaos) [12–14] or stochastic series [8, 20, 22, 28], and the method has been used for the description and classification of empirical time series appearing in physics [15–19, 23–25], physiology [26, 27], neuroscience [29] or finance [21, 30], to cite only a few examples.

In most practical applications, the time series under study are real-valued. As a matter of fact, the rigorous results that have appeared in recent years assume this. But how does the scenario change when the time series under study can only take a finite set of integer values, or in other words, when the dynamics run over a finite field? From a combinatoric and number theoretic viewpoint (where integer sequences abound) this is an interesting question in itself. It is also relevant from a dynamical viewpoint, as in areas such as Markov chain theory, symbolic dynamics or arithmetic dynamics the integer-valued setting is indeed the correct one. Finally, while in practical applications empirical time series are assumed to be real-valued, there are nonetheless circumstances where the empirical time series are inherently integer-valued.

In this work we partially fill this gap and, as a first study, explore the properties of horizontal visibility graphs (HVGs) associated to random and uncorrelated integer-valued series. We focus on the degree distribution of these graphs as this is a metric which has been shown to be highly informative in the real-valued (continuous) case [31]. In this continuous case it was proved [7] that uncorrelated random processes have a universal exponential degree distribution, independent of the marginal distribution. Here we show that when the series takes values from $\{1, 2, \ldots, n\}$ the exponential law is only reached asymptotically (for $n \to \infty$), and for finite $n$ the distribution deviates. The rest of the paper goes as follows: in section 2 we describe the HVG method along with a few relevant properties and outline the benchmark result obtained for real-valued series. In section 3 we describe our findings for the integer-valued case, which culminate in a closed expression for the parametric degree distribution. In section 4 we conclude.

Figure 1. Sample time series of 20 (real-valued) data and its associated horizontal visibility graph (HVG).

2. Preliminaries

Let $\{x_1, \ldots, x_N\}$, $x_i \in \mathbb{F}$ (finite or infinite field), be a sequence of $N$ data. Its horizontal visibility graph (HVG) is defined as an undirected graph of $N$ vertices, where each vertex is labelled in correspondence with the ordered datum $x_i$, so that $x_1$ is related to vertex $i = 1$, $x_2$ to vertex $i = 2$, and so on. Then, two vertices $i$, $j$ (assume $i < j$ without loss of generality) share an edge if and only if $x_k < \inf(x_i, x_j)$, $\forall k : i < k < j$. This is an ordering criterion which can be visualized in figure 1.
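As a minimal illustration of the visibility criterion, the following Python sketch builds the edge set of the HVG of a short sequence by applying the definition directly. The function name `horizontal_visibility_edges` and the 0-based vertex labels are illustrative choices, not part of the original formulation.

```python
def horizontal_visibility_edges(series):
    """Return the HVG edge set of `series` as pairs (i, j) with i < j (0-based)."""
    edges = set()
    for i in range(len(series) - 1):
        highest_between = float("-inf")  # max of x_k over i < k < j
        for j in range(i + 1, len(series)):
            # Edge iff every intermediate datum is strictly below inf(x_i, x_j).
            if highest_between < min(series[i], series[j]):
                edges.add((i, j))
            highest_between = max(highest_between, series[j])
            if highest_between >= series[i]:
                break  # x_i cannot see past a datum at least as large as itself
    return edges


if __name__ == "__main__":
    print(sorted(horizontal_visibility_edges([3, 1, 2, 3])))
```

Note that the strict inequality means that equal values block each other, which is precisely what makes the integer-valued case differ from the continuous one, where ties have zero probability.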

HVG is a non-crossing graph as described in algebraic combinatorics [9, 10] and can be proved to be invariant under monotonic transformations in the series [11]. It is therefore an order statistic of the associated process, which among other things means that the structure of this graph is not dependent upon the marginal distribution $f(x)$. In particular, the following result was found [7] for the degree distribution of the HVG associated to white noise:

Theorem 1. (Continuous case) Consider a (bi-infinite) time series $\{\ldots, x_{-1}, x_0, x_1, \ldots\}$ of independent and identically distributed random variables extracted from a continuous marginal distribution $f(x)$. Then, $\forall f(x)$, the associated HVG has a universal degree distribution $P(k) = (1/3)(2/3)^{k-2}$.

Our aim in this paper is to study the analogous statement in the case where, instead of having $x \in \mathbb{R}$, the data take values over a finite field $\mathbb{F}$, which for simplicity we consider to be a subset of the integers. Let us thus consider a (bi-infinite) time series $\{\ldots, x_{-1}, x_0, x_1, \ldots\}$ of independent and identically distributed random variables sampled from a uniform discrete distribution of $n$ integers: $\forall t$, $x_t = \xi$, $\xi \sim \{1, 2, \ldots, n\}$, and let us call $P_n(k)$ the degree distribution of the associated HVG. In figure 2 we plot in semi-log scales the numerical estimate of $P_n(k)$ for $n = 2, 3, 4, 8, 16, 32, 64, 128, 256, 512$ and $1024$ (in every case we have generated a time series of $10^5$ data). As $n$ increases, we can see how $P_n(k)$ approaches the universal exponential shape found in the continuous case (red dashed line), this convergence being from above for $k = 2, 3$ and from below for $k \geq 4$. According to theorem 1, one should indeed expect $\lim_{n \to \infty} P_n(k) = P(k)$, as for large $n$ the marginal distribution of the time series approaches a continuous form; however, numerical evidence suggests that for small values of $n$ the deviations from this law are large. In the rest of the paper we develop a combinatorial framework to explicitly compute $P_n(k)$ for finite $n$.

Figure 2. Semilog plot of the numerical values of $P_n(k)$ for $n = 2, 3, 2^2, 2^3, \ldots, 2^{10}$ extracted from series of $N = 10^5$ i.i.d. uncorrelated random variables $\xi \in \{1, \ldots, n\}$.
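A numerical experiment of this kind can be sketched as follows. This is only an illustrative reconstruction, not the code used to produce the figures: the series length, the seed and all function names are assumptions, and the shorter default length (20000 instead of $10^5$) is chosen only to keep the run fast.

```python
import random
from collections import Counter


def hvg_degrees(series):
    """Degree of every vertex in the HVG of `series`."""
    degrees = [0] * len(series)
    for i in range(len(series) - 1):
        highest_between = float("-inf")  # max of x_k over i < k < j
        for j in range(i + 1, len(series)):
            if highest_between < min(series[i], series[j]):
                degrees[i] += 1
                degrees[j] += 1
            highest_between = max(highest_between, series[j])
            if highest_between >= series[i]:
                break  # nothing beyond j is visible from i
    return degrees


def estimate_degree_distribution(n, length=20000, seed=0):
    """Empirical degree frequencies for i.i.d. uniform integers in {1, ..., n}."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    series = [rng.randint(1, n) for _ in range(length)]
    counts = Counter(hvg_degrees(series))
    return {k: counts[k] / length for k in sorted(counts)}


def continuous_law(k):
    """Universal law of theorem 1, P(k) = (1/3)(2/3)^(k-2)."""
    return (1.0 / 3.0) * (2.0 / 3.0) ** (k - 2)


if __name__ == "__main__":
    for n in (2, 8, 64):
        estimate = estimate_degree_distribution(n)
        for k in sorted(estimate):
            print(n, k, round(estimate[k], 4), round(continuous_law(k), 4))
```

Because the series is finite, the two end vertices can have degree below 2; their weight in the estimate vanishes as the length grows.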

3. Integer-valued series: theoretical derivation of $P_n(k)$

We start by noting that $P_n(k)$ is equivalent to the probability that an arbitrary node of the graph (whose associated datum is for convenience denoted $x_0$) has degree $k$, that is, $x_0$ has horizontal visibility of exactly $k$ other data. Among these $k$ neighbours, there always exist two bounding data (at right and left-hand side respectively), as indeed the minimum degree is $k = 2$. The remaining $k - 2$ data are located among the bounding data. These inner data can be ordered (sorted by size) in only $k - 1$ different ways, and it is easy to prove that one can label each of the $k - 1$ configurations as $C_i$, $i = 0, 1, \ldots, k - 2$, where the index $i$ determines the number of inner data placed at the left-hand side of $x_0$ (that is, the number of inner data taking place 'before' in the time series). In other words, $C_i$ is the configuration for which, out of the free $k - 2$ visible inner data, $i$ of them are placed before $x_0$ and $k - i - 2$ are placed after $x_0$. On top of this, note that an arbitrary number of hidden data can take place after each inner datum. These hidden data don't contribute to the degree but play an important role in the computation of the degree probabilities. We therefore split $P_n(k)$ accordingly

$$P_n(k) = \sum_{i=0}^{k-2} P_{nk}[C_i]. \qquad (1)$$

So far, the construction follows the one elaborated in the continuous case. Note however that in the discrete case $n < \infty$, by construction not all $C_i$ are admissible given a concrete value of $n$. For instance, it is easy to see that for $n = 2$, $P_{2k}(C_{i>1}) = 0$. Actually, it is easy to prove that given $n$, the largest admissible degree is $k_{\max}(n) = 2n$, and thus $P_n(k > 2n) = 0$. These restrictions were not present in the continuous case and it can be proved that in the general case we have the following:

Lemma 1. Given $n$ and $k$, after only counting admissible configurations equation (1) is effectively reduced into

$$P_n(k) = \sum_{i=\max(k-n-1,\,0)}^{\min(n-1,\,k-2)} P_{nk}[C_i].$$

Proof. The proof is based on counting the minimal number of distinct symbols in the distribution for a given configuration to be feasible. To be able to allocate $i$ inner data and a bounding datum at the left-hand side of $x_0$, one needs for those to be visible that $n \geq i + 1$, therefore $i \leq n - 1$ (this proves the upper limit). Respectively, for the right-hand side one needs $n \geq k - i - 1$, therefore $i \geq k - n - 1$. □

Notice that the discrete nature of each random variable $x$ precludes using integrals to calculate probabilities, so the approach followed to prove theorem 1 is no longer valid here. We split again the computation by conditioning each configuration to the value of $x_0$. Applying Bayes theorem, we have

$$P_{nk}[C_i] = \sum_{m=1}^{n} P_{nk}[C_i \mid x_0 = m]\, S_n(x_0 = m), \qquad (2)$$

where $P_{nk}[C_i \mid x_0 = m]$ is the probability of $C_i$ taking place, conditioned to $x_0$ taking a particular value $x_0 = m$ (where $m \in [1, n]$), and $S_n(x_0 = m)$ is the probability that $x_0$ indeed takes the value $m$. As we assume uniformly distributed random variables we have $\forall m$, $S_n(x_0 = m) = 1/n$, but this condition can be removed if needed. We define $\mathcal{P}_{nki}(m) := P_{nk}[C_i \mid x_0 = m]\, S_n(x_0 = m)$. Now, again the fact that $n$ is finite necessarily forbids some events, effectively reducing the number of terms in equation (2). This is summarized in the following lemma.

Lemma 2. Given $n$ and $k$, after only counting admissible events equation (2) is effectively reduced into

$$P_{nk}[C_i] = \sum_{m=\max(i+1,\,k-i-1)}^{n} \mathcal{P}_{nki}(m).$$

Proof. To be able to allocate $i$ inner data and a bounding datum at the left-hand side of $x_0 = m$, one needs for those to be visible that $m \geq i + 1$. Respectively, for the right-hand side one needs $m \geq k - i - 1$. □
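The summation ranges of lemmas 1 and 2 are easy to tabulate. The sketch below is only an illustration of those index bounds; the function names are hypothetical.

```python
def admissible_configurations(n, k):
    """Indices i of the configurations C_i kept by lemma 1."""
    lower = max(k - n - 1, 0)
    upper = min(n - 1, k - 2)
    return list(range(lower, upper + 1))  # empty when k > 2n


def admissible_seed_values(n, k, i):
    """Values m of the seed x_0 kept by lemma 2 for configuration C_i."""
    return list(range(max(i + 1, k - i - 1), n + 1))


if __name__ == "__main__":
    n = 2
    for k in range(2, 2 * n + 2):
        for i in admissible_configurations(n, k):
            print(k, i, admissible_seed_values(n, k, i))
```

For $k = 2n$ only the symmetric configuration $i = n - 1$ survives, and for $k = 2n + 1$ the range is empty, in agreement with $k_{\max}(n) = 2n$.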

Once we have formally split the computation of $P_n(k)$ into configurations and conditioned on the different values of $x_0$, we are ready to derive this parametrized distribution rigorously. Before attempting to find a general expression for $P_n(k)$, for illustrative purposes we need to study some particular cases first. Throughout these examples we make use of lemmas 1 and 2.

Figure 3. Sample diagrams for $P_n(2)$, $P_n(3)$ and $P_n(4)$.

3.1. Illustrative examples: $P_2(2)$, $P_n(2)$, $P_3(3)$, $P_n(3)$ and $P_4(4)$

$P_2(2)$. For $n = 2$, the time series only takes values from $\{1, 2\}$, and $S_2(m) = 1/2$. The probability for each degree can be easily computed in explicit form. There is only one configuration with neither inner nor hidden data, just the seed and two bounding data (see figure 3). We can condition $x_0$ to be both 1 and 2, so trivially

$$P_2(2) = \mathcal{P}_{220}(1) + \mathcal{P}_{220}(2) = 1 \cdot \frac{1}{2} \cdot 1 + \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{5}{8},$$

where each contribution follows the structure of the diagram $\mathcal{P}_{220}(\cdot) \equiv [B][S][B]$, where $[B]$ and $[S]$ denote the probability of the bounding and seed data respectively. Note that when $m = 1$ the bounding data will indeed 'bound' $x_0$ regardless of their value.

$P_n(2)$. The result for $P_2(2)$ can be readily generalized for an arbitrary $n$. In this case, $S_n(m) = 1/n$ and we can easily see that the probability of a bounding datum is not simply $1/n$ but will depend on the conditioning of $x_0$, in such a way that for $x_0 = m$ one has

$$[B] \to [B(m)] = \frac{n + 1 - m}{n}.$$

It is therefore easy to prove by induction that

$$P_n(2) = \sum_{m=1}^{n} [B(m)][S_n(m)][B(m)] = \frac{1}{n^3} \sum_{m=1}^{n} (n + 1 - m)^2 = \frac{2n^2 + 3n + 1}{6n^2}. \qquad (3)$$

Note that $\lim_{n \to \infty} P_n(2) = 1/3$ and thus the discrete case converges to the continuous case asymptotically (see figure 4 for a comparison with numerical values).
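The closed form in equation (3) can be checked against the explicit sum over the seed value using exact rational arithmetic. The following sketch does so; the function names are illustrative.

```python
from fractions import Fraction


def p_n_2_sum(n):
    """Sum over m of [B(m)][S_n(m)][B(m)] with B(m) = (n+1-m)/n and S_n(m) = 1/n."""
    return sum(Fraction(n + 1 - m, n) ** 2 * Fraction(1, n) for m in range(1, n + 1))


def p_n_2_closed(n):
    """Closed form (2n^2 + 3n + 1) / (6n^2) of equation (3)."""
    return Fraction(2 * n * n + 3 * n + 1, 6 * n * n)


if __name__ == "__main__":
    for n in (2, 3, 4, 1024):
        print(n, p_n_2_sum(n), float(p_n_2_closed(n)))
```

The excess over the continuous value is $P_n(2) - 1/3 = (3n + 1)/(6n^2)$, which is positive for every $n$, consistent with the convergence from above observed for $k = 2$.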

Figure 4. Log-linear plot of the numerical values of $P_n(k)$ for $k < 6$ and $n = 2, 3, 2^2, 2^3, \ldots, 2^{10}$ extracted from series of $N = 10^5$ i.i.d. uncorrelated random variables $\xi \in \{1, \ldots, n\}$. The dashed lines correspond to the values for the continuous case and solid points are the predictions of the theory. The solid lines for the cases $k = 4, 5$ are fittings to rational approximations $P_n(k) \approx \alpha + \beta/n + \gamma/n^2$.

$P_3(3)$. According to the previous formula we find $P_3(2) = 14/27$. Now $P_3(3)$ is the result of two configurations $C_0$ and $C_1$ which are actually symmetric, therefore $P_3(3) = P_{33}(C_0) + P_{33}(C_1) = 2P_{33}(C_0)$, so we focus on $C_0$ without loss of generality. As $k = 3$, in $C_0$ we have an inner datum (and associated hidden structure) at the right-hand side of $x_0$ (see figure 3) and by virtue of lemma 2 $m \geq 2$, thus

$$P_{33}(C_0) = \mathcal{P}_{330}(2) + \mathcal{P}_{330}(3).$$

By construction

$$\mathcal{P}_{330}(2) = [B(2)][S(2)][IH(2)][B(2)] = \frac{2}{3} \cdot \frac{1}{3} \cdot \left[ \frac{1}{3} \sum_{k=0}^{\infty} \left( \frac{1}{3} \right)^k \right] \cdot \frac{2}{3} = \frac{4}{54},$$

where: (i) the bounding datum can be either 2 or 3, thus has probability 2/3, (ii) we have put together the inner datum (which could only be equal to 1) and its hidden structure (an arbitrary number of data hidden by the inner datum). As this inner datum needs to be equal to 1, the hidden data can only take the value 1, and thus contribute a geometric series with common ratio 1/3. Conversely, for $m = 3$ the bounding data can only take one value, the inner datum is free to take the values 1 or 2, and for this latter case its hidden structure can be formed by 1s and 2s. Accordingly we find

$$\mathcal{P}_{330}(3) = [B(3)][S(3)][IH(3)][B(3)] = \frac{1}{3} \cdot \frac{1}{3} \cdot \left[ \frac{1}{3} \sum_{k=0}^{\infty} \left( \frac{1}{3} \right)^k + \frac{1}{3} \sum_{k=0}^{\infty} \left( \frac{2}{3} \right)^k \right] \cdot \frac{1}{3} = \frac{3}{54}.$$

Altogether, $P_3(3) = 14/54$.

$P_n(3)$. We can again generalize the latter result for an arbitrary $n$, as $P_n(3) = 2P_{n3}(C_0)$, where for an arbitrary $n$ according to lemma 2 we have $P_{n3}(C_0) = \sum_{m=2}^{n} \mathcal{P}_{nki}(m)$ and $\mathcal{P}_{nki}(m) = [B(m)][S_n(m)][IH(n, m)][B(m)]$, where it is easy to prove by induction that

$$[IH(n, m)] = S_n(m) \sum_{p=1}^{m-1} \left[ \sum_{k=0}^{\infty} \left( \frac{p}{n} \right)^k \right] = \sum_{p=1}^{m-1} \frac{1}{n - p}.$$

Now this last expression can be summed up in terms of harmonic numbers. Altogether

$$P_n(3) = \frac{2}{n} \sum_{m=2}^{n} \left( \frac{n + 1 - m}{n} \right)^2 \sum_{p=1}^{m-1} \frac{1}{n - p} \approx \frac{4n^2 + 3n - 1}{18n^2}, \qquad (4)$$

where the last approximation is valid for $n > 4$ (for $n = 2, 3, 4$, $P_n(3)$ takes the values 1/4, 14/54 and 49/192 respectively). Comparison with numerics is reported in figure 4. Note that again we have $\lim_{n \to \infty} P_n(3) = 2/9 = P(3)$, and therefore the discrete case again converges to the continuous case for large $n$.
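The double sum for $P_n(3)$ and its rational approximation can be compared exactly with rational arithmetic. The sketch below assumes the reading of equation (4) in which the sum is $\frac{2}{n}\sum_{m=2}^{n}\left(\frac{n+1-m}{n}\right)^2\sum_{p=1}^{m-1}\frac{1}{n-p}$ and the approximation is $(4n^2+3n-1)/(18n^2)$; the function names are illustrative. Exchanging the order of summation gives the exact value $(4n^3+3n^2-n-6)/(18n^3)$; this closed form is our own evaluation of the sum, not a formula quoted from the text, and the code checks it.

```python
from fractions import Fraction


def inner_hidden(n, m):
    """[IH(n, m)]: sum over inner values p < m of 1/(n - p)."""
    return sum(Fraction(1, n - p) for p in range(1, m))


def p_n_3_exact(n):
    """Double sum of equation (4), evaluated exactly."""
    total = sum(Fraction(n + 1 - m, n) ** 2 * inner_hidden(n, m) for m in range(2, n + 1))
    return Fraction(2, n) * total


def p_n_3_approx(n):
    """Rational approximation (4n^2 + 3n - 1) / (18n^2)."""
    return Fraction(4 * n * n + 3 * n - 1, 18 * n * n)


if __name__ == "__main__":
    for n in (2, 3, 4, 8, 64):
        print(n, p_n_3_exact(n), float(p_n_3_approx(n) - p_n_3_exact(n)))
```

Under these assumptions the approximation exceeds the exact sum by exactly $1/(3n^3)$, so the two agree to leading orders in $1/n$ and share the limit $2/9$.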

From the last particular case $P_3(3)$ we have also learned that the probability contribution of an inner-hidden data structure indeed also depends on the conditioning of $x_0$, as a summation dependent on $m$ emerges due to the fact that both the inner and the hidden data are allowed to take different values depending on the conditioning on $x_0$. As this is in relation to both $n$ and the number of inner variables, it is sensible to write formally $[IH] \equiv [IH(m, n, i)]$, i.e. the probability of this structure is a function of the conditioning of $x_0$, $n$ and the configuration $C_i$. The concrete dependence will be evident only after the next particular case, but for now we can give a formal expression for a general $\mathcal{P}_{nki}(m)$ as

$$\mathcal{P}_{nki}(m) = [B(m)][IH(m, n, i)][S_n(m)][IH(m, n, k - i - 2)][B(m)]. \qquad (5)$$

$P_n(4)$. This is the next nontrivial case. By symmetry we have $P_n(4) = 2P_{n4}(C_0) + P_{n4}(C_1)$, but now $C_0$ and $C_1$ are qualitatively different configurations. Whereas for $C_1$ there is exactly one inner datum at each side of the seed, in $C_0$ we will find two concatenated inner data at the right-hand side of the seed. In what follows we will see that this has a dramatic effect, as $IH(m, n, 2) \neq IH(m, n, 1)^2$. Let us consider the easier case $C_1$ first. Formally, each side of the seed is independent and therefore

$$P_{n4}[C_1] = \sum_{m=2}^{n} [S_n(m)] \left( [IH(m, n, 1)][B(m)] \right)^2,$$

where

$$[IH(m, n, 1)] \equiv [IH(m, n)] = S_n(m) \sum_{p=1}^{m-1} \left[ \sum_{k=0}^{\infty} \left( \frac{p}{n} \right)^k \right] = \sum_{p=1}^{m-1} \frac{1}{n - p}.$$

Therefore

$$P_{n4}[C_1] = \sum_{m=2}^{n} \frac{(n + 1 - m)^2}{n^3} \left[ \sum_{p=1}^{m-1} \frac{1}{n - p} \right]^2.$$

On the other hand, for $C_0$ we have

$$P_{n4}[C_0] = \sum_{m=3}^{n} [S_n(m)][IH(m, n, 0)][IH(m, n, 2)][B(m)]^2,$$

where $[IH(m, n, 0)] = 1$ and $m$ starts at $m = 3$ according to lemma 2. The key ingredient of this configuration is of course $[IH(m, n, 2)]$. To understand its structure, let us consider $n = 4$ for simplicity. Then after a bit of algebra

$$IH(3, 4, 2) = \left[ \frac{1}{4} \sum_{k=0}^{\infty} \left( \frac{1}{4} \right)^k \right] \cdot \left[ \frac{1}{4} \sum_{k=0}^{\infty} \left( \frac{2}{4} \right)^k \right],$$

$$IH(4, 4, 2) = \left[ \frac{1}{4} \sum_{k=0}^{\infty} \left( \frac{1}{4} \right)^k \cdot \frac{1}{4} \sum_{k=0}^{\infty} \left( \frac{2}{4} \right)^k + \frac{1}{4} \sum_{k=0}^{\infty} \left( \frac{1}{4} \right)^k \cdot \frac{1}{4} \sum_{k=0}^{\infty} \left( \frac{3}{4} \right)^k + \frac{1}{4} \sum_{k=0}^{\infty} \left( \frac{2}{4} \right)^k \cdot \frac{1}{4} \sum_{k=0}^{\infty} \left( \frac{3}{4} \right)^k \right].$$

The general structure of $IH(m, n, i)$ can be proved by induction. Intuitively, it is a sum of terms where each term combines the product of $i$ contributions, where in each case the hidden variables can take a different number of possible values. Essentially, we are enumerating the different possible arrangements of $i$ inner data (and an arbitrary number of hidden data among each inner datum), where the inner data take values from $\{1, \ldots, m - 1\}$ ($m - 1$ is the upper bound as one needs to leave room for the bounding datum). With the extra condition that the seed takes the value $m$ and that all inner data are visible by construction, for $i$ inner data there are $\binom{m-1}{i}$ different ways of giving values to the inner data.
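This enumeration can be sketched directly: each choice of $i$ distinct inner values from $\{1, \ldots, m-1\}$ contributes the product of the geometric-series weights $\frac{1}{n}\sum_{k \geq 0}(j/n)^k = 1/(n-j)$ of its inner data, as in the $n = 4$ example. The function names below are illustrative, and the code assumes this reading of the structure.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, prod


def geometric_weight(n, j):
    """(1/n) * sum_{k>=0} (j/n)^k = 1/(n - j): an inner datum j plus its hidden data."""
    return Fraction(1, n - j)


def inner_hidden(m, n, i):
    """IH(m, n, i): sum over the ways of assigning i distinct inner values below m."""
    terms = (prod(geometric_weight(n, j) for j in values)
             for values in combinations(range(1, m), i))
    return sum(terms, Fraction(0))  # i = 0 gives the empty product, i.e. 1


def count_value_assignments(m, i):
    """Number of terms in the sum, equal to binomial(m - 1, i)."""
    return sum(1 for _ in combinations(range(1, m), i))


if __name__ == "__main__":
    print(inner_hidden(3, 4, 2), inner_hidden(4, 4, 2), inner_hidden(4, 4, 1) ** 2)
```

For $n = 4$ and $m = 4$ the two-datum structure is not the square of the one-datum structure, which illustrates why concatenated inner data have to be treated jointly.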


3.2. An exact formula for $P_n(k)$

Let

$$f(z) = \frac{1}{n} \sum_{k=0}^{\infty} \left( \frac{z}{n} \right)^k = \frac{1}{n - z}. \qquad (6)$$

Then one can show that

$$[IH(m, n, i > 0)] = \sum_{j_1=1}^{m-i} f(j_1) \cdot \sum_{j_2=j_1+1}^{m-i+1} f(j_2) \cdots \sum_{j_i=j_{i-1}+1}^{m-1} f(j_i) = \prod_{l=1}^{i} \sum_{j_l=j_{l-1}+1}^{m-i+l-1} f(j_l), \qquad (7)$$

where we need to define $j_0 := 0$ and $[IH(m, n, i = 0)] := 1$. The general solution to the parametric degree distribution $P_n(k)$ is then provided by lemmas 1 and 2 together with equations (5)–(7). We are thus ready to put this altogether:

$$P_n(k) = \sum_{i=\max(k-n-1,\,0)}^{\min(n-1,\,k-2)} \; \sum_{m=\max(i+1,\,k-i-1)}^{n} \frac{(n + 1 - m)^2}{n^3} \times \left[ \prod_{l=1}^{i} \sum_{j_l=j_{l-1}+1}^{m-i+l-1} f(j_l) \right] \left[ \prod_{l=1}^{k-i-2} \sum_{j_l=j_{l-1}+1}^{m-(k-i-2)+l-1} f(j_l) \right], \qquad (8)$$

where $j_0 := 0$. This formula gives a recipe to compute $P_n(k)$ for arbitrary values of $n$ and $k$. Unfortunately we have not been able to find an algebraic enumeration and an associated generic algebraic closed form for equation (8). On the other hand, for a fixed $k$ we have seen that one finds suitable rational functions such as equation (3) or (4). We conjecture

$$f(n) := P_n(k)\big|_{k\ \mathrm{fixed}} = \frac{An^2 + Bn + C}{Dn^2} \equiv \alpha + \frac{\beta}{n} + \frac{\gamma}{n^2},$$

where $A, B, C, D \in \mathbb{Z}^{+}$, $\alpha = A/D = (1/3)(2/3)^{k-2}$, $\beta = B/D$, $\gamma = C/D$. In figure 4 we plot this approximation (which is exact for $k = 2, 3$) for $k < 6$, showing a perfect agreement with the numerics. We also find in that figure that the convergence to the continuous case $P(k)$ is faster as $k$ increases.
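For small $n$, equation (8) can be evaluated exactly with rational arithmetic. The sketch below is one possible transcription under stated assumptions: the nested sums of equation (7) are read as a sum over strictly increasing inner values in $\{1, \ldots, m-1\}$, the second factor is taken to be $IH(m, n, k-i-2)$ as in equation (5), and all function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def inner_hidden(m, n, i):
    """IH(m, n, i) with f(j) = 1/(n - j), summed over increasing inner values below m."""
    terms = (prod(Fraction(1, n - j) for j in values)
             for values in combinations(range(1, m), i))
    return sum(terms, Fraction(0))


def degree_probability(n, k):
    """P_n(k) following equation (8), with the ranges of lemmas 1 and 2."""
    total = Fraction(0)
    for i in range(max(k - n - 1, 0), min(n - 1, k - 2) + 1):
        for m in range(max(i + 1, k - i - 1), n + 1):
            weight = Fraction((n + 1 - m) ** 2, n ** 3)  # [B(m)]^2 [S_n(m)]
            total += weight * inner_hidden(m, n, i) * inner_hidden(m, n, k - i - 2)
    return total


def degree_distribution(n):
    """All nonzero values P_n(2), ..., P_n(2n)."""
    return {k: degree_probability(n, k) for k in range(2, 2 * n + 1)}


if __name__ == "__main__":
    for k, p in degree_distribution(3).items():
        print(k, p, float(p))
```

With this reading the probabilities for $k = 2, \ldots, 2n$ add up to one and reproduce the particular cases worked out in section 3.1. The enumeration over combinations grows quickly with $n$, so the sketch is only practical for small alphabets.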

3.3. Asymptotic approximation

An elementary asymptotic approximation for $P_n(k)$ can be found using a simple combinatorial argument. For a fixed $n$, $P_n(2)$ can be understood as the probability that an arbitrary datum is bounded, so $1 - P_n(2)$ is the probability that a given datum is not bounded by its first neighbours. In the same line, for a fixed $n$, $P_n(k)$ is the probability that an arbitrary datum has visibility of at least $k - 2$ inner data (whose probability can be approximated by $(1 - P_n(2))^{k-2}$) and is then bounded. This sort of 'Markovian' approximation gives

$$P_n^{\mathrm{app}}(k) = P_n(2)\,(1 - P_n(2))^{k-2} = \frac{(2n^2 + 3n + 1)\left[ 1 - (2n^2 + 3n + 1)/6n^2 \right]^{k-2}}{6n^2}, \qquad (9)$$

where the last step involves taking the limit of large $n$. This is an algebraic closed-form equation; however, this formula is not exact as it does not take into account that inner data are correlated (i.e. the values that each inner datum can take depend on the position of the inner data). Still, this approximation improves when $n$ increases, and as the argument holds exactly in the limit $n \to \infty$ we have $\lim_{n \to \infty} P_n(k) = \lim_{n \to \infty} P_n^{\mathrm{app}}(k)$, hence $P_n^{\mathrm{app}}(k)$ and $P_n(k)$ are asymptotic. On the other hand, taking the limit in equation (9) we also have

$$\lim_{n \to \infty} P_n^{\mathrm{app}}(k) = \frac{1}{3} \left( \frac{2}{3} \right)^{k-2},$$

concluding that $P_n(k)$ indeed converges to $P(k)$. It is easy to see that $P_n^{\mathrm{app}}(k)$ is essentially an exponential approximation and, according to figure 2, gives good estimates of $P_n(k)$ for those values that comply with an exponential shape. In other words, the super-exponential cutoff that develops for large $k$ for each $n$ is badly approximated by $P_n^{\mathrm{app}}(k)$; however, for the range of values of $k$ for which each distribution approaches an exponential decay, $P_n^{\mathrm{app}}(k)$ should give a good match. A confirmation of this is shown for a particular example where $n = 2^6$ in figure 5.
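The 'Markovian' approximation is straightforward to evaluate. The following sketch, with illustrative function names, computes $P_n^{\mathrm{app}}(k) = P_n(2)(1 - P_n(2))^{k-2}$ from equation (3) and compares it with the continuous law.

```python
def p_n_2(n):
    """Equation (3): P_n(2) = (2n^2 + 3n + 1) / (6n^2)."""
    return (2 * n * n + 3 * n + 1) / (6 * n * n)


def p_app(n, k):
    """Approximation of equation (9): P_n(2) * (1 - P_n(2))^(k - 2)."""
    p2 = p_n_2(n)
    return p2 * (1.0 - p2) ** (k - 2)


def continuous_law(k):
    """Continuous-case law P(k) = (1/3)(2/3)^(k-2)."""
    return (1.0 / 3.0) * (2.0 / 3.0) ** (k - 2)


if __name__ == "__main__":
    for k in range(2, 12):
        print(k, p_app(64, k), continuous_law(k))
```

Being a geometric law in $k$, this approximation assigns nonzero probability to every degree and so cannot reproduce the hard cutoff at $k_{\max}(n) = 2n$.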

4. Concluding remarks and discussion

In recent years several rigorous results have been advanced within the theory of HVGs. Yet in all the cases the series under study was assumed to be real-valued. In this work we depart from this assumption and study the properties of the degree distribution $P_n(k)$ associated to a random uncorrelated series $\{x_1, x_2, \ldots\}$ that only takes values in a finite set $\mathbb{F}$. Note at this point that $\mathbb{F}$ does not need to be of the form $(1, 2, \ldots, n)$: as the HVG is invariant under monotonic transformations in the series, it is only required that $\mathbb{F} = (b_1, b_2, \ldots, b_n)$, where $b_i = b_{i-1} + c$ with $c$ a positive constant.

We have observed that for any finite $n$, $P_n(k)$ deviates from the universal shape $P(k)$ obtained in the continuous case, and confirmed analytically that $\lim_{n \to \infty} P_n(k) = P(k)$. As we have seen, moving from infinite to finite fields makes the problem considerably more difficult and involved to address analytically; however, we have been able to show in an explicit way how to analytically compute $P_n(k)$ for arbitrary $n$ and $k$, although unfortunately we have not found a closed algebraic form.

Figure 5. Log-linear plot of the numerical values of $P_{64}(k)$ extracted from series of $N = 10^5$ i.i.d. uncorrelated random variables $\xi \in \{1, \ldots, n\}$. The solid line corresponds to the asymptotic approximation (equation (9) for $n = 64$), which works well for the exponentially decaying part of the distribution.

While the theory has been derived assuming (bi)infinite size series, we also found good convergence properties for finite sizes. As an illustration, we consider the ability of this method to distinguish between a purely random, uncorrelated process (i.i.d.) and a deterministic, chaotic process generated by a fully chaotic logistic map $x_{t+1} = 4x_t(1 - x_t)$, $x \in [0, 1]$. The power spectrum of both processes is flat (and therefore both processes have delta-distributed autocorrelation functions), so this discrimination is nontrivial a priori. As a matter of fact, it is well known that HVG easily distinguishes both processes as their associated degree distributions are clearly different [7, 8]. In other words, in the limit $n \to \infty$, $P_n(k)$ easily discriminates these processes. What happens if we compare symbolic representations of both processes (namely, finite $n$)? To explore this, we construct a homogeneous partition of the interval $[0, 1]$ into $n$ non-overlapping cells of equal size, and accordingly we construct the integer series associated to a chaotic trajectory $\{x_t\}$, and we compare this series with a sequence of 'unbiased coin tosses' where the coin has $n$ faces ($\xi \in \{1, 2, \ldots, n\}$). In figure 6 we compare the theoretical shape of $P_n(k)$ associated to the unbiased i.i.d. process with the numerics obtained in the chaotic case, finding that while $P_2(k)$ seems to be identical in both processes, already $P_3(k)$ shows substantial deviations.

Figure 6. $P_n(k)$ obtained from (empty symbols) the theory for i.i.d. integer random variables $\xi \in \{1, \ldots, n\}$ and (solid symbols) a chaotic trajectory of $10^6$ data points from $x_{t+1} = 4x_t(1 - x_t)$, after coarse-graining into a symbolic sequence with $n$ symbols via a homogeneous partition of the phase space. Circles correspond to the case $n = 2$ while squares correspond to $n = 3$. We observe that while $P_2(k)$ fails to discriminate both processes, $P_3(k)$ is already clearly different.

Incidentally, in practice one can always construct statistical tests to investigate the compliance to $P_n(k)$ for experimental sequences of finite size $N$ (such as the case for $n = 2$ above). For instance, a Pearson $\chi^2$ statistic can be used

$$\chi^2 = N \sum_{k=2}^{k_{\max} = 2n} \frac{[f_n(k) - P_n(k)]^2}{P_n(k)}, \qquad (10)$$

where $f_n(k)$ is the observed (estimated) frequency and $P_n(k)$ is the theoretical frequency. $\chi^2$ is Pearson's cumulative test statistic, which asymptotically approaches a $\chi^2$ distribution with $2n - 1$ degrees of freedom. One could build a hypothesis test where the null hypothesis is that the time series is uncorrelated, and apply this in a variety of empirical series and situations, such as to explore the normality conjecture of numbers such as $\pi$, $\sqrt{2}$ etc. In the example considered above it is not really necessary to use a hypothesis test for $n = 3$ as there is a clear deviation. For $n = 2$ we can apply it: the theoretical values are $P_2(2) = 5/8$, $P_2(3) = 1/4$, $P_2(4) = 1/8$ and for $k > 4$ $P_2(k) = 0$, so for a trajectory of $N = 10^6$ the statistic gives $\chi^2 \approx 0.413$, thus we cannot reject the null hypothesis (i.e. the method cannot discriminate for $n = 2$) as this value is much smaller than the critical ones.
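The ingredients of such a test can be sketched as follows: a logistic-map trajectory, its coarse-graining into $n$ equal cells, and the Pearson statistic of equation (10). This is a minimal sketch rather than the procedure used for the reported value; the initial condition `x0 = 0.123`, the handling of the right edge of the interval and all function names are assumptions.

```python
def logistic_trajectory(length, x0=0.123):
    """Iterate x_{t+1} = 4 x_t (1 - x_t); x0 is an arbitrary illustrative seed."""
    trajectory, x = [], x0
    for _ in range(length):
        trajectory.append(x)
        x = 4.0 * x * (1.0 - x)
    return trajectory


def symbolise(x, n):
    """Map x in [0, 1] to a symbol in {1, ..., n} using n cells of equal size."""
    return min(int(x * n), n - 1) + 1  # x = 1 is assigned to the last cell


def pearson_chi_square(observed_freq, theoretical, sample_size):
    """Equation (10): N * sum over k of [f_n(k) - P_n(k)]^2 / P_n(k)."""
    return sample_size * sum(
        (observed_freq.get(k, 0.0) - p) ** 2 / p
        for k, p in theoretical.items()
        if p > 0
    )


if __name__ == "__main__":
    theory_n2 = {2: 5 / 8, 3: 1 / 4, 4: 1 / 8}
    symbols = [symbolise(x, 2) for x in logistic_trajectory(10)]
    print(symbols)
    print(pearson_chi_square({2: 0.626, 3: 0.249, 4: 0.125}, theory_n2, 10**6))
```

The observed frequencies passed to `pearson_chi_square` would come from the degree sequence of the HVG built on the symbolic series; the frequencies in the usage lines are made-up numbers that only exercise the function.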

The application of HVG to inherently discrete (symbolic) sequences in areas such as text analysis in linguistics or DNA sequencing in bioinformatics, to cite just a couple, are potential avenues for future research.

References

[1] Zhang J and Small M 2006 Complex network from pseudoperiodic time series: topology versus dynamics Phys. Rev. Lett. 96 238701

[2] Kyriakopoulos F and Thurner S 2007 Directed network representations of discrete dynamical maps Lecture Notes Comput. Sci. 4488 625–32

[3] Xu X, Zhang J and Small M 2008 Superfamily phenomena and motifs of networks induced from time series Proc. Natl Acad. Sci. USA 105 19601–5

[4] Donner R V, Zou Y, Donges J F, Marwan N and Kurths J 2010 Recurrence networks: a novel paradigm for nonlinear time series analysis New J. Phys. 12 033025

[5] Donner R V et al 2011 The geometry of chaotic dynamics—a complex network perspective Eur. Phys. J. B 84 653–72

[6] Lacasa L, Luque B, Ballesteros F, Luque J and Nuño J C 2008 From time series to complex networks: the visibility graph Proc. Natl Acad. Sci. USA 105 4972–5

[7] Luque B, Lacasa L, Ballesteros F and Luque J 2009 Horizontal visibility graphs: exact results for random time series Phys. Rev. E 80 046103

[8] Lacasa L 2014 On the degree distribution of horizontal visibility graphs associated to Markov processes and dynamical systems: diagrammatic and variational approaches Nonlinearity 27 2063–93

[9] Severini S, Gutin G and Mansour T 2011 A characterization of horizontal visibility graphs and combinatorics on words Physica A 390 2421–8

[10] Flajolet P and Noy M 1999 Analytic combinatorics of non-crossing configurations Discrete Math. 204 203–29

[11] Lacasa L and Flanagan R 2015 Time reversibility from visibility graphs of nonstationary processes Phys. Rev. E 92 022817

[12] Luque B, Lacasa L, Ballesteros F and Robledo A 2012 Analytical properties of horizontal visibility graphs in the Feigenbaum scenario Chaos 22 013109

[13] Luque B, Núñez A, Ballesteros F and Robledo A 2012 Quasiperiodic graphs: structural design, scaling and entropic properties J. Nonlinear Sci. 23 335–42

[14] Núñez A M, Luque B, Lacasa L, Gómez J P and Robledo A 2013 Horizontal visibility graphs generated by type-I intermittency Phys. Rev. E 87 052801

[15] Aragoneses A, Carpi L, Tarasov N, Churkin D V, Torrent M C, Masoller C and Turitsyn S K 2016 Unveiling temporal correlations characteristic of a phase transition in the output intensity of a fiber laser Phys. Rev. Lett. 116 033902

[16] Murugesan M and Sujith R I 2015 Combustion noise is scale-free: transition from scale-free to order at the onset of thermoacoustic instability J. Fluid Mech. 772 225–45

[17] Charakopoulos A, Karakasidis T E, Papanicolaou P N and Liakopoulos A 2014 The application of complex network time series analysis in turbulent heated jets Chaos 24 024408

[18] Manshour P, Rahimi Tabar M R and Peinke J 2015 Fully developed turbulence in the view of horizontal visibility graphs J. Stat. Mech. P08031

[19] Liu C, Zhou W X and Yuan W K 2010 Statistical properties of visibility graph of energy dissipation rates in three-dimensional fully developed turbulence Physica A 389 13

[20] Xie W J and Zhou W X 2011 Horizontal visibility graphs transformed from fractional Brownian motions: topological properties versus the Hurst index Physica A 390 3592–601

[21] Qian M C, Jiang Z Q and Zhou W X 2010 Universal and nonuniversal allometric scaling behaviors in the visibility graphs of world stock market indices J. Phys. A: Math. Theor. 43 335002


[22] Ni X H, Jiang Z Q and Zhou W X 2009 Degree distributions of the visibility graphs mapped from fractional Brownian motions and multifractal random walks Phys. Lett. A 373 3822–6

[23] Donner R V and Donges J F 2012 Visibility graph analysis of geophysical time series: potentials and possible pitfalls Acta Geophys. 60 3

[24] Suyal V, Prasad A and Singh H P 2014 Visibility-graph analysis of the solar wind velocity Sol. Phys. 289 379–89

[25] Zou Y, Donner R V, Marwan N, Small M and Kurths J 2014 Long-term changes in the north–south asymmetry of solar activity: a nonlinear dynamics characterization using visibility graphs Nonlinear Process. Geophys. 21 1113–26

[26] Donges J F, Donner R V and Kurths J 2013 Testing time series irreversibility using complex network methods Europhys. Lett. 102 10004

[27] Jiang S, Bian C, Ning X and Ma Q D Y 2013 Visibility graph analysis on heartbeat dynamics of meditation training Appl. Phys. Lett. 102 253702

[28] Lacasa L, Luque B, Luque J and Nuño J C 2009 The visibility graph: a new method for estimating the Hurst exponent of fractional Brownian motion Europhys. Lett. 86 30001

[29] Ahmadlou M, Adeli H and Adeli A 2010 New diagnostic EEG markers of the Alzheimerʼs disease using visibility graph J. Neural Transm. 117 9

[30] Flanagan R and Lacasa L 2016 Irreversibility of financial time series: a graph-theoretical approach Phys. Lett. A 380 1689–97

[31] Luque B and Lacasa L Canonical horizontal visibility graphs are uniquely determined by their degree sequence (under review) arXiv:1605.05222
