General Strategy for Hierarchical Decomposition of Multivariate Time Series: Implications for Temporal Lobe Seizures

M. A. REPUCCI, N. D. SCHIFF, and J. D. VICTOR

Department of Neurology and Neuroscience, Weill Graduate School of Medical Sciences of Cornell University, 1300 York Avenue, New York, NY

(Received 19 October 2000; accepted 4 October 2001)

Annals of Biomedical Engineering, Vol. 29, pp. 1135–1149, 2001. Copyright © 2001 Biomedical Engineering Society. Address correspondence to Michael A. Repucci, Department of Neurology and Neuroscience, Weill Graduate School of Medical Sciences of Cornell University, 1300 York Avenue, New York, NY 10021.

Abstract—We describe a novel method for the analysis of multivariate time series that exploits the dynamic relationships among the multiple signals. The approach resolves the multivariate time series into hierarchically dependent underlying sources, each driven by noise input and influencing subordinate sources in the hierarchy. Implementation of this hierarchical decomposition (HD) combines principal components analysis (PCA), autoregressive modeling, and a novel search strategy among orthogonal rotations. For model systems conforming to this hierarchical structure, HD accurately extracts the underlying sources, whereas PCA or independent components analysis does not. The interdependencies of cortical, subcortical, and brainstem networks suggest application of HD to multivariate measures of brain activity. We show first that HD indeed resolves temporal lobe ictal electrocorticographic data into nearly hierarchical form. A previous analysis of these data identified characteristic nonlinearities in the PCA-derived temporal components that resembled those seen in absence (petit mal) seizure electroencephalographic traces. However, the components containing these characteristic nonlinearities accounted for only a small fraction of the power. Analysis of these data with HD reveals furthermore that components containing characteristic nonlinearities, though small, can be at the origin of the hierarchy. This finding supports the link between temporal lobe and absence epilepsy. © 2001 Biomedical Engineering Society. [DOI: 10.1114/1.1424914]

Keywords—Nonlinear dynamics, Autoregression, Epilepsy, Principal components, Independent components, Spatiotemporal, Electroencephalography, Electrocorticography.

INTRODUCTION

Multivariate time series are abundant in biomedical systems, from spatiotemporal signals such as the electroencephalogram (EEG), to temporal patterns of gene expression observed with GeneChip21 technologies. One common approach for the analysis of multivariate data involves decomposition into several independent time series (e.g., by principal components analysis4), followed by analysis of the individual time series (e.g., autoregressive modeling, spectral analysis, and nonlinear techniques20). Typically, the goal of the analysis is to extract the underlying biological sources of the multivariate temporal data, in order to characterize each source more fully and to describe the system more precisely. Unfortunately, the criterion of independence of sources enforced by standard decomposition techniques may not be appropriate for real biological systems, in which the underlying sources may well have dynamic interrelationships. With this in mind, we developed a novel approach to multivariate time series analysis—hierarchical decomposition (HD)—that exploits the causal (i.e., forward in time) dynamic content of multivariate temporal data.

In contrast to HD, standard decomposition techniques, including principal components analysis (PCA), alone or modified by varimax rotations (VR),4 independent components analysis (ICA),2 and non-negative matrix factorization (NMF),10 do not exploit causality. Briefly, PCA seeks sources that are uncorrelated (orthogonal). ICA seeks sources that are independent in an information-theoretic sense. VR seeks a transformation that maximizes the sum of the variances in each extracted source. NMF attempts to find sources whose weights are non-negative (i.e., only additive, not subtractive, combinations of sources are allowed). These assumptions are appropriate for image analysis or for separation of independent time series (e.g., the "cocktail party problem," separating one voice from the cacophony of other conversations and background noise), but may be unsuitable for multivariate systems originating from dynamically interrelated sources. In particular, these methods assume that the order of data points is irrelevant, and thus would produce equivalent results for the original, time-reversed, or randomly shuffled data.

As described below, the HD method takes advantage of the signal dynamics, and thus may achieve a more useful resolution of the sources of a multivariate time series. HD was designed with physiologically derived




multivariate time series in mind. Such time series often result from dynamically interrelated generators, in which some of the generators have a driving role. For example, the EEG exhibits synchronizations, desynchronizations, and rhythmic oscillations resulting from cortical, subcortical, and brainstem network activity: these are characteristics anticipated in a dynamically interrelated system.18 To exploit these observations, we assume that the generators are organized hierarchically. Namely, we assume that there is an autonomous generator whose state depends only on noise input and feedback from itself (i.e., an autoregressive process). Each successive generator is influenced by an independent noise input, its own state, and the states of more dominant generators in the hierarchy, such that the last generator is influenced by noise input and the state of all generators. The HD method, as presented below, attempts to account for the temporal dynamics in a multivariate data set by resolving it into a model of this form.

We show that if a system does conform to this hierarchical structure, and the multivariate autoregressive model includes sufficient terms, then the HD resolution is unique and will recover the hierarchically related sources. Moreover, the HD approach will not be misleading for an independent system; if the underlying sources are truly independent, the HD algorithm will uncover them as well. On the other hand, there is no guarantee that all multivariate time series are reducible to hierarchical form; one such example is a system with reciprocally related generators. For such systems, failure of this approach to identify a hierarchical resolution is positive evidence for the presence of a more complex network structure.

In brief, the HD approach consists of building a multivariate linear autoregressive (MLAR) model6 from the PCA-derived components of the original data set. This is followed by seeking a coordinate transformation that rotates the MLAR model to make it as consistent as possible with a hierarchical interrelationship among components. We demonstrate that for simulated multivariate time series with truly hierarchical structure, HD accurately extracts the underlying generators of the system. We then use the HD method to analyze ictal electrocorticographic (ECoG) records, the application that inspired the HD approach. Analysis of these data reveals that significant hierarchical structure is present in all records; this fact is not obvious by PCA or ICA decomposition alone. Subsequent analysis of these HD-resolved components adds further insight to the dynamic nature of the ictal discharge.

In previous work,15 we identified certain characteristic dynamic nonlinearities in EEG traces obtained during absence seizures (also known as petit mal seizures) and in the PCA-derived temporal components from temporal lobe ictal ECoG data. The similarities, most notably the nonlinear interactions at lags around 90 and 150 ms, suggested a common underlying mechanism. These characteristic dynamics, however, were observed only in components that accounted for a small fraction of the variance of the original data (third, fourth, or fifth ranking components, when ordered by the amount of variance explained), thus leaving their biological significance uncertain. By contrast, the HD method demonstrates that these nonlinearities are present in the more autonomous components (i.e., the components in the system that drive the remaining components). This indicates that the size of the components does not necessarily correlate with their physiological importance, and thus strengthens the evidence that temporal lobe and absence seizures share a common mechanism. Here, the ability of HD to resolve the underlying generators on the basis of dynamical interrelationships, rather than by independence or power, sheds light on a prior puzzle.

METHODS

Multivariate ECoG data were previously collected and are briefly described here (for further details, see Schiff et al.15). Epilepsy patients with medication-resistant temporal lobe seizures were studied with subdural electrode grids and strips in the course of presurgical evaluation for temporal lobectomy. ECoG signals were recorded by a 64-channel Telefactor Beehive (West Conshohocken, PA) telemetry system (sampling rate of 200 Hz, low-pass filter at 0.3 Hz and high-pass filter at 70 Hz), and concurrent video images of each patient were obtained. Thirteen to 16 channels of clear artifact-free recordings of ictal events (occurring during wakefulness) were resampled at 100 Hz (2:1); the mean was subtracted and detrending applied. Typically, 5 s of data recorded from within an ongoing ictal event were analyzed as described in the text.

All data analyses herein were carried out via MATLAB functions and scripts (version 5.3, release 11, Pentium II PC running Windows 2000), which are archived, along with a sample data set, at http://www.med.cornell.edu/research/hds. We consider a multivariate data set consisting of N evenly spaced observations in time, on each of M channels (M ≤ N), represented as a matrix X (M × N). In our application to ECoG data N = 512 (approximately 5 s of data sampled at 100 Hz) and typically M = 14 (epicortical recording sites).

Principal Components Analysis (PCA)

The original data matrix X is processed by PCA (see file "pca.m" archived in "HD–ABME–code.zip" at http://www.med.cornell.edu/research/hds), yielding a second matrix Y (M × N), consisting of linear combinations of P orthogonal principal components,


Y = C^T W T.    (1)

C is a P × M matrix whose rows are the spatial weights, T is a P × N matrix whose rows are the temporal weights, and W is a P × P diagonal matrix of eigenvalues whose elements, squared, are the amount of variance explained by each principal component. For any number of components P [P ≤ min(M, N)], PCA guarantees that the unexplained variance between X and Y,

R_{PCA} = \mathrm{tr}[(X - Y)(X - Y)^T],    (2)

is minimized (i.e., Y is the "best fit" to X).

This decomposition of X into orthogonal (i.e., independent or instantaneously uncorrelated) components C and T, however, is not unique. Replacing the temporal components T by GT (for any nonsingular linear transformation G) and, accordingly, the spatial components C by W^{-1}(G^{-1})^T W C, leaves the matrix Y unchanged: [(W^{-1}(G^{-1})^T W C)^T] W [GT] = C^T W G^{-1} W^{-1} W G T = C^T W T = Y, where we have used basic properties of the transpose, and the fact that W is diagonal. The specific, but arbitrary, solution identified by PCA consists of matrices C and T whose rows are orthonormal, and a matrix W whose diagonal elements are non-negative and ordered from largest to smallest. Resolving the nonuniqueness of the decomposition obtained with Eq. (1) is the focus of the remainder of the HD analysis (i.e., finding a unique, nonarbitrary transformation G).

We use PCA as a first step in the analysis in order to reduce the dimension of the system by removing instrumental noise from the signal (without removing biological noise12). For real data, the P temporal components are selected such that each component accounts for a fraction of variance greater than or equal to M^{-1}, except where noted. Typically this allows us to reduce a 13- to 16-channel system to 3–5 components that together account for greater than 80% of the variance. Selecting a small number of components reduces the computational burden for the following analyses.
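The decomposition of Eq. (1) can be sketched numerically. The following Python/NumPy fragment obtains C, W, and T from the singular value decomposition of X; the paper's own archived code is MATLAB ("pca.m"), so the function name and the min_frac argument here are our illustrative stand-ins, with min_frac mirroring the M^{-1} variance criterion above.

```python
import numpy as np

def pca_decompose(X, min_frac=None):
    """Decompose X (M x N) as Y = C.T @ W @ T, per Eq. (1), via the SVD.

    Rows of C and T are orthonormal; W is diagonal with non-negative
    entries ordered from largest to smallest.  If min_frac is given, keep
    only components whose variance fraction is >= min_frac.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)  # X = U @ diag(s) @ Vt
    frac = s**2 / np.sum(s**2)                        # variance explained
    P = len(s) if min_frac is None else max(1, int(np.sum(frac >= min_frac)))
    C = U[:, :P].T        # P x M spatial weights
    W = np.diag(s[:P])    # P x P diagonal eigenvalue matrix
    T = Vt[:P, :]         # P x N temporal weights
    return C, W, T
```

With all components retained, C.T @ W @ T reconstructs X exactly; with min_frac = 1/M, only components meeting the variance criterion of the text survive.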

Multivariate Linear Autoregression (MLAR)

We next create a MLAR model of the temporal components T in a form equivalent to that proposed by Gersch and Yonemoto6 (see file "mlar.m" archived in "HD–ABME–code.zip" at http://www.med.cornell.edu/research/hds). The P temporal components (i.e., each element T_{p,n}, the nth sample of the pth component) are modeled by an L-term autoregression:

T_{p,n} = R_{p,n} + m_p + \sum_{q=1}^{P} \sum_{l=1}^{L} A_{q,p,l} T_{q,n-l},    (3)


where R is a P × N matrix of residual values, m_p is the mean value of the pth component, and A is a P × P × L three-dimensional array of model coefficients. It is helpful to view the model coefficients A as a set of L planes, each of which is a P × P matrix. The lth plane of A describes the estimated influence of the signals at time n - l on the signal values at time n. Namely, A_{q,p,l} is the influence of the qth component at the lth time lag on the current value of the pth component.

The model coefficients A are determined by minimizing the sum of squared residual values (the innovations, R_{p,n}) in the model,

R_{MLAR} = \sum_{p=1}^{P} \sum_{n=1}^{N} R_{p,n}^2,    (4)

via the Yule–Walker equations.22 We choose the maximum lag L (the order of the MLAR model) by the Akaike criterion (AIC),1 a statistical justification based on whether the amount of reduction in residual variance is sufficient to justify the inclusion of additional linear model terms. In the examples below this typically yields a value of L from 2 to 4 lags.
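As an illustration of Eq. (3), here is a hedged Python/NumPy sketch that estimates the MLAR coefficients by ordinary least squares; the paper itself solves the Yule–Walker equations, which give closely related estimates for stationary data. The function name and interface are ours, not those of the archived "mlar.m".

```python
import numpy as np

def fit_mlar(T, L):
    """Least-squares fit of the MLAR model of Eq. (3).

    T is P x N.  Returns (A, m, R): A has shape (P, P, L), with A[q, p, l]
    the influence of component q at lag l+1 on component p; m is the
    P-vector of intercepts; R is the P x N innovations (zero for n < L).
    """
    P, N = T.shape
    # One design row per time n >= L: lag-1 values of all channels, then
    # lag-2 values, and so on, preceded by a constant term.
    rows = [np.concatenate([T[:, n - l] for l in range(1, L + 1)])
            for n in range(L, N)]
    Z = np.column_stack([np.ones(N - L), np.array(rows)])
    A = np.zeros((P, P, L))
    m = np.zeros(P)
    R = np.zeros((P, N))
    for p in range(P):
        coef, *_ = np.linalg.lstsq(Z, T[p, L:], rcond=None)
        m[p] = coef[0]
        A[:, p, :] = coef[1:].reshape(L, P).T  # unstack lag blocks
        R[p, L:] = T[p, L:] - Z @ coef
    return A, m, R
```

For a truly hierarchical system, the fitted planes A[:, :, l] are close to triangular, with the below-diagonal entries near zero.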

Decorrelation of the MLAR Innovations

The first step in resolving the nonuniqueness of the PCA decomposition relies upon the P × N matrix of innovations R_{p,n}. These quantities are viewed as random terms that drive the P channels of the MLAR model. Thus, if T represents distinct underlying generators, then the corresponding driving terms in Eq. (3) will be uncorrelated. For this reason, we require that the transformed temporal components GT decorrelate R_{p,n}. Additionally, it is analytically convenient to assume that each channel of innovations has equal variance (i.e., no particular source is noisier than another). This amounts to an arbitrary choice for the overall size of each generator, but makes no further assumptions about the dynamical structure of the system.

Under these assumptions, we seek transformations K that orthonormalize the innovations R_{p,n}—that is, (KR)(KR)^T = K R R^T K^T = I, where I is the identity matrix. One such matrix K can be obtained by dividing the rows of the eigenvectors of R R^T by the square root of the corresponding eigenvalues. (The rows of the matrix K are necessarily orthogonal, since they are scalar multiples of the eigenvectors of the positive symmetric matrix R R^T.) The transformation K applied to Eq. (3) yields new temporal components T' = KT, new autoregressive coefficients A_l'^T = K A_l^T K^{-1} (for l = 1, ..., L, where A_l^T denotes the transpose of A_l), and decorrelated, normalized innovations R' = KR. Nevertheless, this does not fully resolve the nonuniqueness problem, since premultiplying K by any orthogonal transformation Q (i.e., Q such that Q Q^T = Q^T Q = I) yields an alternative matrix G = QK, for which GR is likewise decorrelated and normalized: (GR)(GR)^T = (QKR)(QKR)^T = Q K R R^T K^T Q^T = Q I Q^T = I.
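The whitening step described above can be sketched as follows (Python/NumPy; the paper's code is MATLAB, and the function name is ours). K is built from the eigendecomposition of R R^T exactly as in the text: the rows of K are the eigenvectors, each divided by the square root of its eigenvalue.

```python
import numpy as np

def whiten_innovations(R):
    """Return K with (K @ R) @ (K @ R).T = I, assuming R @ R.T is
    nonsingular (true for generic innovations with more samples than
    channels)."""
    S = R @ R.T                           # positive symmetric matrix
    evals, evecs = np.linalg.eigh(S)      # columns of evecs = eigenvectors
    K = evecs.T / np.sqrt(evals)[:, None] # scale each row of eigenvectors
    return K
```

Since K diag-scales an orthogonal matrix, K S K^T collapses to the identity, which is precisely the orthonormality condition on the transformed innovations.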

Novel Search Among Orthogonal Rotations

At this point, we have reduced the nonuniqueness of the PCA decomposition to an arbitrary scale factor for each channel—the normalization performed above will suffice—and any orthogonal transformation Q. We now search for a rotation of the temporal components QT' that is consistent with the hierarchical structure proposed in the introduction. Recall that the autoregressive coefficients A' from the decorrelated MLAR model specify the dynamic interrelationships among the P components at up to L lags. Accordingly, the HD algorithm seeks a P × P transformation Q that simultaneously transforms all L matrices A_l' into upper-triangular form (see file "rotation.m" archived in "HD–ABME–code.zip" at http://www.med.cornell.edu/research/hds). We require that Q be orthogonal, thereby guaranteeing that the residuals R_{MLAR} are unchanged and that the orthonormality of the innovations R' is preserved. The search is further restricted to the set of rotations [i.e., det(Q) = 1] to avoid arbitrary changes in the sign of an odd number of the components.

Q is obtained iteratively via a procedure motivated by Jacobi matrix diagonalization.11,13 We begin with an arbitrary rotation Q_0 (see below), and successively apply a sequence of individual plane rotations J, each about a pair of axes [u, v]. The sequence of plane rotations consists of multiple cycles through each of the possible choices of axis pairs. That is,

Q = J_{u_d,v_d}(\theta_{fd}) \cdots J_{u_1,v_1}(\theta_{d+1}) J_{u_d,v_d}(\theta_d) \cdots J_{u_1,v_1}(\theta_1) Q_0,    (5)

where d = P(P-1)/2 is the number of unique axis pairs (i.e., for a 3 × 3 matrix d = 3, for a 4 × 4 matrix d = 6, and so on), f is the number of cycles through all axis pairs, and each J_{u,v}(\theta) is a rotation by an angle \theta about axes [u, v]. For example,

J_{4,2}(\theta) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & 0 & \sin\theta \\ 0 & 0 & 1 & 0 \\ 0 & -\sin\theta & 0 & \cos\theta \end{pmatrix}.    (6)

At the nth stage in the iteration, \theta_1, ..., \theta_{n-1} are held fixed, and \theta_n is determined by minimizing the sum of squared elements below the diagonal (the residuals, R_{HD}) in all L matrices A_l' (see the Appendix):

R_{HD} = \sum_{l=1}^{L} \sum_{p=2}^{P} \sum_{q=1}^{p-1} [(Q A_l' Q^T)_{p,q}]^2.    (7)

The algorithm terminates when a full cycle of d iterations passes without further reduction of R_{HD}. In other words, termination occurs when the L matrices A_l' are concurrently transformed to be as close as possible to upper-triangular form. If the generators of the original system can be cast in a hierarchical form, this transformation returns a new set of upper-triangular matrices A_{HD}; for other systems, this procedure yields matrices A_{HD} that are as nearly upper triangular as possible, in the sense that R_{HD} has reached the minimum attainable value.

This process of minimization, by successive application of rotations J_{u,v}(\theta), in a manner analogous to the Jacobi matrix diagonalization procedure, falls into the class of "direction set" methods.13 Direction set methods entail sequential one-dimensional minimizations—here, determination of \theta_{fd} about the axis pair [u_d, v_d]—of a multidimensional—here, d-dimensional—space. The sequence of axis pairs is chosen to cycle through all possible pairs of axes (i.e., for a 4 × 4 matrix, we use d = 6 and the cycle [2, 1], [3, 1], [4, 1], [3, 2], [4, 2], [4, 3], [2, 1], ...). Numerical experimentation indicates that there is no particular advantage to any sequence, provided that each axis pair is examined once per cycle. (In general, however, direction set methods tend to be more efficient when the gradient of the function is used to guide the sequence.13) As with any multidimensional minimization, there is a risk of being trapped in a local minimum. While simulated annealing methods could be used to ensure that the global minimum is found, we have found that it is sufficient simply to run several minimizations in parallel, each preceded by a different initializing rotation Q_0 chosen from a collection of well-spaced rotations (see the Appendix and file "rotmesh.m" archived in "HD–ABME–code.zip" at http://www.med.cornell.edu/research/hds).

The rotation Q that attains the global minimum for R_{HD} (see file "globalmin.m" archived in "HD–ABME–code.zip" at http://www.med.cornell.edu/research/hds) is subsequently used to transform the temporal components T', thereby identifying components T_{HD} = QT', such that each component is, as nearly as possible, influenced only by itself and the more dominant components in the hierarchy.
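A minimal sketch of the rotation search (Python/NumPy; all names are ours). Where the paper's Appendix gives a closed-form minimizing angle, this sketch substitutes a grid search over \theta, and the multistart loop draws random rotations rather than using the precomputed mesh of "rotmesh.m"; it is a stand-in under those assumptions, not the archived implementation.

```python
import numpy as np

def plane_rotation(P, u, v, theta):
    """P x P rotation by theta in the (u, v) plane (0-based indices),
    with the sign convention of Eq. (6)."""
    J = np.eye(P)
    c, s = np.cos(theta), np.sin(theta)
    J[u, u] = c;  J[u, v] = s
    J[v, u] = -s; J[v, v] = c
    return J

def r_hd(A_list, Q):
    """R_HD of Eq. (7): total squared below-diagonal mass of Q A_l' Q^T."""
    return sum(np.sum(np.tril(Q @ A @ Q.T, k=-1) ** 2) for A in A_list)

def hd_rotation(A_list, Q0=None, n_cycles=30, n_grid=720):
    """Cycle through the d = P(P-1)/2 axis pairs; at each step apply the
    plane rotation whose angle most reduces R_HD (grid search stands in
    for the closed-form update of the Appendix)."""
    P = A_list[0].shape[0]
    Q = np.eye(P) if Q0 is None else Q0.copy()
    thetas = np.linspace(0.0, np.pi, n_grid, endpoint=False)
    for _ in range(n_cycles):
        improved = False
        for u in range(1, P):
            for v in range(u):
                costs = [r_hd(A_list, plane_rotation(P, u, v, t) @ Q)
                         for t in thetas]
                k = int(np.argmin(costs))
                if costs[k] < r_hd(A_list, Q) - 1e-12:
                    Q = plane_rotation(P, u, v, thetas[k]) @ Q
                    improved = True
        if not improved:   # a full cycle without reduction: terminate
            break
    return Q

def hd_rotation_multistart(A_list, n_starts=4, seed=0):
    """Guard against local minima by restarting from several random
    det(Q) = +1 rotations and keeping the best result."""
    rng = np.random.default_rng(seed)
    P = A_list[0].shape[0]
    starts = [np.eye(P)]
    for _ in range(n_starts):
        M, _ = np.linalg.qr(rng.standard_normal((P, P)))
        if np.linalg.det(M) < 0:
            M[:, 0] *= -1.0           # restrict to proper rotations
        starts.append(M)
    results = [hd_rotation(A_list, Q0=Q0) for Q0 in starts]
    return min(results, key=lambda Q: r_hd(A_list, Q))
```

Because every factor is a plane rotation, the returned Q is automatically orthogonal with det(Q) = 1, so the residuals and the orthonormality of the innovations are preserved as required.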


Nonlinear Analysis of HD Resolved Components

In our application of HD to ictal ECoG, we are interested in the nonlinear features of the underlying time series identified by the HD procedure. We therefore analyzed the time series T_{HD:p} (for p = 1, ..., P) for nonlinear characteristics via nonlinear autoregressive analysis (NLAR), and compared them to the related NLAR analysis of the principal components T_p (for p = 1, ..., P). The NLAR method is discussed in detail elsewhere.16,17

Briefly, a 20-term linear autoregressive model for each time series is augmented by a single nonlinear term that models a joint influence of the signal at two prior times (times n - j and n - k) on the signal at time n. The coefficients B and C of this nonlinear model,

T_{HD:p,n} = R_{NLAR:p,n} - B_{p,0} - \sum_{l=1}^{20} B_{p,l} T_{HD:p,n-l} - C_{j,k} T_{HD:p,n-j} T_{HD:p,n-k}    (j = 1, ..., 20; k = 1, ..., j),    (8)

are then determined separately for each of the 210 possible choices of a single nonlinear term (i.e., each of the 210 unique assignments of j and k to pairs in the set {1, ..., 20}) by minimizing R_{NLAR}. This yields C, a 20 × 20 symmetric matrix of nonlinear autoregressive coefficients (where each entry in C is resolved by separate models).

The addition of each nonlinear term C_{j,k} to the autoregressive model reduces the model variance V by an amount \Delta V_{j,k} in comparison with the variance associated with the best 20-term linear AR model. Via an extension of the AIC,19 we use N \Delta V / V to assess whether this reduction in variance is significant. We also examine N \Delta V_{j,k} / V as a function of j and k; this "NLAR fingerprint" summarizes the nonlinear dynamics of each time series over the lags considered, up to second order.
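A hedged sketch of the NLAR fingerprint computation (Python/NumPy; the function name is ours, and plain least squares stands in for the exact fitting procedure of Refs. 16 and 17). For each of the 210 candidate (j, k) pairs, the linear AR(20) model is refit with the single added bilinear term and the fractional variance reduction is recorded.

```python
import numpy as np

def nlar_fingerprint(x, L=20):
    """For each of the L(L+1)/2 choices of (j, k), augment the linear
    AR(L) model with the single bilinear term x[n-j]*x[n-k] of Eq. (8),
    fit by least squares, and return the L x L symmetric matrix of
    N * dV / V values (V from the best purely linear AR(L) fit)."""
    N = len(x)
    y = x[L:]
    lin = np.column_stack([np.ones(N - L)] +
                          [x[L - l : N - l] for l in range(1, L + 1)])
    resid = y - lin @ np.linalg.lstsq(lin, y, rcond=None)[0]
    V_lin = np.sum(resid ** 2)
    F = np.zeros((L, L))
    for j in range(1, L + 1):
        for k in range(1, j + 1):
            Z = np.column_stack([lin, x[L - j : N - j] * x[L - k : N - k]])
            V = np.sum((y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]) ** 2)
            F[j - 1, k - 1] = F[k - 1, j - 1] = len(y) * (V_lin - V) / V_lin
    return F
```

For a series with a genuine bilinear interaction at lags (j, k), the fingerprint shows a clearly elevated entry at that pair, while the remaining entries stay near the chance level for a single added regressor.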

Definitions of Some Useful Quantities

In assessing the degree of hierarchical structure present in any set of MLAR coefficients, it is useful to define four quantities as follows. For any such set of MLAR model coefficients F (where F may be A, A', or A_{HD} as below), the sum of all squared elements is

R_{TOTAL} = \sum_{l=1}^{L} \sum_{p=1}^{P} \sum_{q=1}^{P} F_{p,q,l}^2.    (9)

The size of the decoupled portion of the MLAR model (i.e., the extent to which the components' dynamics are independent) is summarized by the sum of squared elements on the diagonal in all L matrices F_l:

R_{DIAG} = \sum_{l=1}^{L} \sum_{p=1}^{P} F_{p,p,l}^2.    (10)

The size of the hierarchical dynamic relationships in the MLAR model is summarized by the sum of squared elements above the diagonal in all L matrices F_l:

R_{UPPER} = \sum_{l=1}^{L} \sum_{p=1}^{P-1} \sum_{q=p+1}^{P} F_{p,q,l}^2.    (11)

The size of the dynamic relationships that are "antihierarchical" (i.e., neither hierarchical nor independent) is equal to the sum of squared elements below the diagonal in all L matrices F_l:

R_{LOWER} = \sum_{l=1}^{L} \sum_{p=2}^{P} \sum_{q=1}^{p-1} F_{p,q,l}^2.    (12)

Thus, R_{TOTAL} = R_{DIAG} + R_{UPPER} + R_{LOWER}.

These four quantities will be used to compare the degree of independent and hierarchical drive in the various MLAR models at each stage of analysis described above. R_{TOTAL} before and after the transformation identified by HD must be unchanged, since this transformation is orthogonal. However, the stage of noise decorrelation may change R_{TOTAL} because of the scale factor implicit in the transformation K. For this reason, we will typically focus on the quantities R_{DIAG}, R_{UPPER}, and R_{LOWER}, as fractions of R_{TOTAL}, and refer to these fractions as the independent, hierarchical, and antihierarchical drives, respectively.

We draw the reader's attention to the fact that the arbitrary order chosen for the principal and noise-decorrelated components will define one set of values for R_{UPPER} and R_{LOWER} for each model, and that any reordering of those components potentially yields different sets of values. Thus, the P! permutations of those components may change the apparent percent of hierarchical drive in the system. To distinguish the degree to which any such permutation will lead to a more or less hierarchical model from the degree to which the HD method uncovers the inherent hierarchical structure in the system, we shall use the maximum value of R_{UPPER}—where R_{UPPER} is considered over all permutations of the components—for all models of the principal and noise-decorrelated components. This necessarily results in the minimum possible value for R_{LOWER}, and guarantees that R_{UPPER} ≥ R_{LOWER}.
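Equations (9)-(12) amount to sums of squares over the diagonal, upper triangle, and lower triangle of the coefficient array across all lags. A minimal Python/NumPy sketch (function name ours):

```python
import numpy as np

def drive_fractions(F):
    """Given MLAR coefficients F of shape (P, P, L), with F[p, q, l] as
    in Eqs. (9)-(12), return the independent, hierarchical, and
    antihierarchical drives as fractions of R_TOTAL."""
    P = F.shape[0]
    sq = F ** 2
    r_total = sq.sum()                                     # Eq. (9)
    r_diag = sum(sq[p, p, :].sum() for p in range(P))      # Eq. (10)
    r_upper = sum(sq[p, q, :].sum()                        # Eq. (11)
                  for p in range(P) for q in range(p + 1, P))
    r_lower = r_total - r_diag - r_upper                   # Eq. (12)
    return r_diag / r_total, r_upper / r_total, r_lower / r_total
```

For a purely hierarchical (upper-triangular) coefficient set, the antihierarchical fraction is exactly zero, and the three fractions always sum to one.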


FIGURE 1. HD analysis of a three-generator hierarchical system constructed from a two-lag MLAR model. (A) Time series of the three hierarchical generators, showing one, two, and three dominant frequencies, respectively. (B) Sixteen time series composed of random linear combinations of the three time series in (A). (C) Time series of the first three principal components obtained from (B). (D) Time series of the three resolved hierarchical components obtained by rotation of the principal components. The ability of the HD method to extract hierarchical sources is evidenced by the virtually identical time series (disregarding signal polarity) in (A) and (D), as compared to the scrambled and dissimilar principal components in (C).


RESULTS

HD Analysis of Simulated Multivariate Time Series

To demonstrate proof of principle of the HD method, we present several analyses of simulated multivariate systems. We first consider systems composed of three generators driven by Gaussian noise and linked in a hierarchical relationship. For the first system, the autoregressive model has nonzero coefficients up to two lags in the past (i.e., L = 2); this situation permits unambiguous identification of the underlying generators (this data set and the code to reproduce Fig. 1 are archived in "HD–ABME–code.zip" at http://www.med.cornell.edu/research/hds). We then limit the autoregressive model to a single lag (i.e., L = 1) and show that this leads to degeneracy in the possible minimizing rotations Q, and a consequent lack of uniqueness in the hierarchical decomposition. Finally, we examine a system of three independent generators, with nonzero autoregressive coefficients up to L = 2; this example demonstrates that HD is also capable of extracting dynamically independent components.

To create the first simulated system of three hierarchically related time series, we fix a two-lag hierarchical MLAR model, and drive each generator with independent Gaussian white-noise input, generating three time series of length N = 512. Coefficients above the diagonal in the MLAR matrices, which correspond to hierarchical influences between channels, were randomly chosen from a Gaussian distribution with a mean of zero and unit variance. (It can be shown that for a hierarchical system off-diagonal coefficients do not affect stability. In our simulations, stability was ensured by choosing the MLAR coefficients on the diagonal to correspond to polynomials of order L whose roots have absolute values less than unity.9) Figure 1(A) shows the three resulting time series (the simulated generators). The influence of each generator on the subordinate generators is evident: the first generator exhibits a single dominant frequency, while the second and third generators have two and three dominant frequencies, respectively. (Analysis of the power spectra confirms this.) Estimation of the MLAR coefficients for the simulated generators via the Yule–Walker procedure will approximate (but not equal) the values used to construct the system. That is, the estimated coefficients yield R_LOWER = 0.0003, a good approximation to the intended R_LOWER = 0. By comparison, R_DIAG = 8.0316 and R_UPPER = 4.7208, indicating ~63% independent drive (i.e., R_DIAG/R_TOTAL ≈ 0.63) and ~37% hierarchical drive (i.e., R_UPPER/R_TOTAL ≈ 0.37) in the overall signal.
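The construction just described can be sketched in a few lines. Since the archived MATLAB code is not reproduced in this excerpt, the following is a Python/NumPy illustration of the same recipe; the specific root angles and coupling strengths are arbitrary choices, not the values used for Fig. 1.

```python
import numpy as np

rng = np.random.default_rng(0)
P, L, N = 3, 2, 512                  # generators, lags, samples

# Diagonal (self) dynamics: pick a complex-conjugate pair of
# characteristic roots inside the unit circle for each generator,
# so each generator is stable and has one dominant frequency.
A = np.zeros((L, P, P))
for i in range(P):
    root = 0.95 * np.exp(1j * np.pi * (0.2 + 0.25 * i))   # arbitrary angles
    A[0, i, i] = 2 * root.real                            # a1 = r + conj(r)
    A[1, i, i] = -abs(root) ** 2                          # a2 = -|r|^2

# Strictly above-diagonal entries encode the hierarchical influences;
# drawn from a zero-mean, unit-variance Gaussian, they leave the system
# matrix block triangular and therefore do not affect stability.
iu = np.triu_indices(P, k=1)
for l in range(L):
    A[l][iu] = rng.standard_normal(len(iu[0]))

# Drive the MLAR recursion with independent Gaussian white noise.
x = np.zeros((N, P))
for t in range(L, N):
    x[t] = sum(A[l] @ x[t - 1 - l] for l in range(L)) + rng.standard_normal(P)
```

With this update convention, the generator whose row has no above-diagonal entries is the autonomous source; which label ("first" or "last") it carries is purely a matter of ordering convention.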

To simulate the kind of mixing of sources that might occur in EEG recordings, we take 16 random linear combinations of the three generators [see Fig. 1(B) and the file "lincomb.m" archived in "HD_ABME_code.zip" at http://www.med.cornell.edu/research/hds]. PCA decomposition of these 16 time series, the first step in the HD analysis, yields components that are remarkably dissimilar to the original generators [see Fig. 1(C)]. Each principal component contains a mixture of frequencies, and the MLAR model of them gives no hint of the hierarchical structure (R_TOTAL = 22.9593, ~64% independent and ~24% hierarchical drive). Because these signals are Gaussian, decorrelation by PCA guarantees information-theoretic independence at each time point, and is thus equivalent to ICA. In other words, examination of the higher-order moments at zero lag would not help to reveal the hierarchical structure. By contrast, the explicit use of signal dynamics allows HD to resolve hierarchically related components: the resulting MLAR model is characterized by ~64% independent and ~36% hierarchical drive (R_TOTAL = 12.5284), close to the values obtained from the original generators. Strikingly, the resolved components [see Fig. 1(D)] are virtually identical to the original generators, other than signal polarity. (Amplitudes of the extracted components also match the original generators because we chose equal-variance noises as the driving terms.)

For a three-dimensional system such as this, we can visualize the search for the rotation Q (see the Methods section) by creating a representation of the space of possible three-dimensional rotations. To construct this representation of the rotation space, we recognize that any three-dimensional rotation is specified by an angle of rotation (θ, from 0 to π rad) around a unique axis in space. We can thus represent a three-dimensional rotation as a point r in a solid sphere, where the direction from the center of the sphere to r indicates the axis of the rotation, and the length of this vector represents θ. Figure 2 uses this representation of the rotation space to examine the global behavior of the residual values R_HD for the example outlined in Fig. 1. Notice that there is a local minimum in the neighborhood of the global minimum that provides a potential trap for the HD minimization algorithm.
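This axis-angle representation is easy to make concrete. The Python sketch below (the helper name rotvec_to_matrix is ours, not from the archived code) maps a point r in the solid ball of radius π to the corresponding rotation matrix via the Rodrigues formula.

```python
import numpy as np

def rotvec_to_matrix(r):
    """Map a point r inside the solid ball of radius pi to a 3-D rotation.

    The direction of r gives the rotation axis and its length gives the
    angle theta, matching the representation used for Figs. 2 and 4.
    """
    r = np.asarray(r, dtype=float)
    theta = np.linalg.norm(r)
    if theta == 0.0:
        return np.eye(3)               # zero vector = identity rotation
    k = r / theta                      # unit axis
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]]) # cross-product (skew) matrix
    # Rodrigues formula: R = I + sin(theta) K + (1 - cos(theta)) K^2
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
```

Sampling such points on a grid inside the ball and evaluating R_HD at each is one way to produce plots like Figs. 2 and 4.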

The second example also consists of three hierarchically related generators, but here the predefined MLAR model stipulates only single-lag influence (R_TOTAL = 11.8557, ~7% independent and ~93% hierarchical drive). Figure 3 shows the time series (N = 512) for this example in the same format as Fig. 1. Again, PCA is unable to resolve the hierarchical generators (R_TOTAL = 1.6037, ~73% independent and ~23% hierarchical drive). By contrast, the HD algorithm uncovers the hierarchical components (R_TOTAL = 11.8533, ~8% independent and ~92% hierarchical), as is evidenced by the similarity between Figs. 3(A) and 3(D). However, the hierarchical decomposition is not unique: the global behavior of the residuals, as shown in Fig. 4, demonstrates the existence of multiple global minima.

It can be shown that this nonuniqueness is expected for single-lag hierarchical MLAR models. For a general P-component system, there are P! rotations Q (not limited to permuting the generators or changing their signs) that minimize R_HD. These rotations correspond to the P! possible orderings of eigenvalues in the matrix A_HD. (To see this, we observe that transforming A′ into upper-triangular form by a rotation amounts to choosing an appropriate orthonormal basis to form the columns of Q. When hierarchical generators exist, application of the Gram–Schmidt procedure to the eigenvectors of A′ provides such an orthonormal basis. This is because triangular form for a matrix implies a full set of real eigenvalues and eigenvectors.11 Each of the P! orderings of the P eigenvectors yields a different basis via the Gram–Schmidt procedure, and hence a distinct rotation Q and a distinct set of L triangular matrices A_HD,l = Q A′_l Q^T.) This degeneracy does not occur with L ≥ 2, since rotations Q that make Q A′_1 Q^T triangular do not necessarily make Q A′_2 Q^T triangular.

An extension of this analysis allows us to state the conditions under which a set of matrices A_1, A_2, ..., A_L can be simultaneously transformed to upper-triangular form by an orthogonal matrix Q. First, each matrix must have a full set of real eigenvalues, so that each matrix can be individually transformed into upper-triangular form.11 Second, it must be possible to order the eigenvectors for each matrix in such a way that (a) the first eigenvector of each matrix is identical (other than a scale factor), and (b) for each k ≤ P, the first k eigenvectors of

FIGURE 2. Graphical representation of a portion of the rotation space for the HD analysis of the system displayed in Fig. 1. Each rotation Q is specified by a point r in a solid sphere, where the direction from the center of the sphere to r indicates the axis of the rotation, and the length of this vector represents θ. The residual value R_HD for each rotation is represented in color (see scale to the right). In this portion of space (angle of rotation θ ≈ 1.9478 rad), we can see both a global minimum and a local minimum, which provides a potential trap for the HD algorithm.

FIGURE 3. HD analysis of a three-generator hierarchical system constructed from a one-lag MLAR model. (A) Time series of the hierarchical generators. (B) Sixteen time series composed of random linear combinations of the three time series in (A). (C) Time series of the first three principal components obtained from (B). (D) Time series of the three resolved hierarchical components obtained by rotation of the principal components. Although HD extracts hierarchical sources, the hierarchical decomposition for the single-lag case is not unique (see Fig. 4).

each matrix span the same k-dimensional subspace. These conditions fix the order of the eigenvectors, provided that the number of lags L > 1.
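The Gram–Schmidt construction used in this argument can be checked numerically. The Python sketch below (triangularizing_rotation is our name; it assumes real, distinct eigenvalues) orthonormalizes the ordered eigenvectors via a QR factorization, which is exactly a Gram–Schmidt pass, and the resulting orthogonal Q triangularizes A.

```python
import numpy as np

def triangularizing_rotation(A):
    """Return an orthogonal Q whose basis triangularizes A.

    Assumes A has a full set of real eigenvalues. Each of the P!
    orderings of the eigenvectors yields a different Q, which is the
    source of the single-lag degeneracy discussed in the text.
    """
    w, V = np.linalg.eig(A)
    order = np.argsort(w)              # one admissible ordering
    Q, _ = np.linalg.qr(V[:, order])   # QR = Gram-Schmidt on eigenvectors
    return Q

A = np.array([[1.0, 0.0],
              [2.0, 3.0]])             # real eigenvalues 1 and 3
Q = triangularizing_rotation(A)
T = Q.T @ A @ Q                        # upper triangular in this convention
```

With the paper's convention A_HD,l = Q A′_l Q^T, the roles of Q and Q^T are swapped; either way, the below-diagonal entries vanish and the eigenvalues appear on the diagonal.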

The final simulated example consists of three independent generators, each driven by independent Gaussian noise. To simulate independence, the two-lag MLAR model has no off-diagonal terms. MLAR coefficients estimated from the time series (N = 512) generated by this model [see Fig. 5(A)] indeed indicate essentially 100% independent dynamics: R_DIAG = 7.5851 and R_TOTAL = 7.5875. When PCA is applied to a set of 16 random linear combinations of these generators [see Fig. 5(B)], the extracted components [see Fig. 5(C)] depend in part on the sizes of the contributions of the individual generators to each channel, and thus do not fully reveal the independent dynamics (R_TOTAL = 7.5888, ~94% independent, ~3% hierarchical, and ~3% antihierarchical drive).

FIGURE 4. Graphical representation of three portions of the rotation space for the HD analysis of the system displayed in Fig. 3. The residual value R_HD for each rotation Q is represented as in Fig. 2. The three panels are chosen to contain three of the global minima for R_HD (θ ≈ 0.1257 rad in panel A, θ ≈ 0.3770 rad in panel B, and θ ≈ 0.6283 rad in panel C). This illustrates the degeneracy in the rotation Q for single-lag models. (Small differences in the color of each minimum result from the rough, finite sampling required by the graphical presentation.)

FIGURE 5. HD analysis of a three-generator independent system constructed from a two-lag MLAR model. (A) Time series of the independent generators. (B) Sixteen time series composed of random linear combinations of the three time series in (A). (C) Time series of the first three resolved principal components. (D) Time series of the three HD-resolved independent components. Here, we see that the HD method is also capable of extracting dynamically independent sources.

Following application of HD [see Fig. 5(D)], we find approximately 100% independent drive (R_TOTAL = 7.5862). However, since there is no hierarchical dependence in this example, the order of the components extracted by HD is arbitrary; here, the second and third generators are interchanged in Fig. 5(D) compared with Fig. 5(A).

HD Analysis of Ictal ECoG Data

Here, we use HD to reanalyze the ictal electrocorticographic records examined by Schiff and co-workers.15 In the analysis of electroencephalographic data, neither the number nor the time course of the underlying generators is known in advance. Therefore, we select the P components that constitute most of the original signal variance (see the Methods section), and are interested in both the degree of the resolved hierarchical structure and the dynamics of each component. We use the traditional AIC (Ref. 1) to determine the order of the MLAR model. Application of a stricter AIC that doubles the relative weighting of the number of degrees of freedom19 typically reduces the model order L by 1 but does not otherwise significantly alter the results (data not shown). Since individual seizure generators are commonly thought of as cyclic (i.e., roughly periodic in time), it is important to point out that the hierarchical relationship between generators, which we seek here, is by no means in contradiction to this view. That is, generators that are related to each other in a hierarchical fashion may themselves be cyclic in time.

Strikingly, the components resolved by HD are both proportionally more hierarchically influenced (compare the columns labeled "R_UPPER/R_LOWER" in Table 1) and more independently driven (compare the columns labeled "%R_DIAG" in Table 1) than the components resolved by PCA. For example, the MLAR model coefficients of the principal components [A in Eq. (3)] obtained from the third seizure record of patient 2 are characterized by ~94.3% independent drive (%R_DIAG) and ~4.4% hierarchical influence (%R_UPPER), yielding a hierarchical to antihierarchical ratio (R_UPPER/R_LOWER) of 3.4 (sixth row, first column, Table 1). By contrast, the MLAR model coefficients of the hierarchical components (A_HD) reveal that ~96.2% of the drive is independent (%R_DIAG), ~3.6% is hierarchical (%R_UPPER), and the ratio of hierarchical to antihierarchical drive (R_UPPER/R_LOWER) is 16.3 (sixth row, third column, Table 1). Some, but not all, of this simplification of model structure is due to the reorganization of the model coefficients by the noise decorrelation procedure, which yields model coefficients A′. For this example, noise decorrelation alone yields ~96.0% independent (%R_DIAG) and ~2.5% hierarchical drive (%R_UPPER). But in most instances noise decorrelation fails to identify a proportionally more hierarchical organization (R_UPPER/R_LOWER is only 1.7), whereas HD extracts components that are both more independent (larger R_DIAG/R_TOTAL, all records) and proportionally more hierarchical (greater ratio of R_UPPER to R_LOWER in 7/9 records), as detailed in Table 1.
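Our reading of these indices, which this sketch assumes (the exact normalization is defined in the Methods section, outside this excerpt): R_DIAG, R_UPPER, and R_LOWER total the squared diagonal, above-diagonal, and below-diagonal MLAR coefficients over all L lag matrices, and the percentages normalize each by R_TOTAL.

```python
import numpy as np

def drive_indices(A):
    """Percentage of independent / hierarchical / antihierarchical drive.

    A has shape (L, P, P). We assume each R index is the sum of squared
    coefficients in the corresponding part of the matrices, summed over
    lags; percentages are the R values normalized by R_TOTAL.
    """
    A = np.asarray(A, dtype=float)
    P = A.shape[1]
    iu, il = np.triu_indices(P, k=1), np.tril_indices(P, k=-1)
    r_diag = sum(float((np.diag(Al) ** 2).sum()) for Al in A)
    r_upper = sum(float((Al[iu] ** 2).sum()) for Al in A)
    r_lower = sum(float((Al[il] ** 2).sum()) for Al in A)
    r_total = r_diag + r_upper + r_lower
    return {"%R_DIAG": 100 * r_diag / r_total,
            "%R_UPPER": 100 * r_upper / r_total,
            "%R_LOWER": 100 * r_lower / r_total}

# A perfectly hierarchical (upper-triangular) model has no
# antihierarchical drive at all.
A_tri = np.array([[[0.5, 1.0, 1.0],
                   [0.0, 0.5, 1.0],
                   [0.0, 0.0, 0.5]]])
pct = drive_indices(A_tri)
```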


TABLE 1. Summary of the percentage of independent, hierarchical, and antihierarchical drive for the MLAR model coefficients of the principal components decomposition (A), the noise-decorrelated PCA decomposition (A′), and the hierarchical decomposition (A_HD) for all patient records. The percentages are obtained by normalizing the values for independent (R_DIAG), hierarchical (R_UPPER), and antihierarchical (R_LOWER) drive by their total (R_TOTAL) in each instance; they are labeled %R_DIAG, %R_UPPER, and %R_LOWER, respectively. The unitless ratio R_UPPER/R_LOWER reflects the relative proportion of hierarchical influence in the system. Percentages for R_UPPER and R_LOWER for A and A′ reflect the most hierarchical ordering of the respective components, as outlined in the Methods section, such that R_UPPER ≥ R_LOWER. All values are rounded to the nearest 0.1 for clarity. (Column abbreviations: %Rd = %R_DIAG, %Ru = %R_UPPER, %Rl = %R_LOWER, Ru/Rl = R_UPPER/R_LOWER.)

                        Principal components (A)     Noise-decorrelated (A′)      Hierarchical (A_HD)
                        %Rd   %Ru   %Rl  Ru/Rl       %Rd   %Ru   %Rl  Ru/Rl       %Rd   %Ru   %Rl  Ru/Rl
Patient 1, Seizure 1    89.6   8.7   1.7   5.2       97.2   1.8   1.1   1.6       98.5   1.2   0.4   3.2
Patient 1, Seizure 2    69.8  26.2   4.0   6.5       84.4  10.6   5.0   2.1       97.4   2.3   0.3   8.3
Patient 1, Seizure 3    84.6  12.8   2.7   4.8       88.8   7.7   3.5   2.2       95.9   3.4   0.7   4.7
Patient 2, Seizure 1    82.5  11.9   5.6   2.1       84.1  10.8   5.1   2.1       88.3   9.9   1.8   5.4
Patient 2, Seizure 2    93.2   6.3   0.6  11.3       96.7   2.4   0.9   2.6       98.1   1.8   0.1  14.7
Patient 2, Seizure 3    94.3   4.4   1.3   3.4       96.0   2.5   1.5   1.7       96.2   3.6   0.2  16.3
Patient 3, Seizure 1    93.6   4.9   1.5   3.3       94.4   4.0   1.6   2.5       95.4   4.1   0.5   8.5
Patient 3, Seizure 2    99.7   0.2   0.1   4.0       99.8   0.2   0.0   5.6       99.8   0.2   0.0  10.4
Patient 4, Seizure 1    96.3   2.9   0.8   3.7       98.8   0.8   0.4   1.9       99.7   0.2   0.0  13.3

Moreover, the contributions of the L coefficient matrices to the indices R_UPPER and R_LOWER are approximately equal in all cases (data not shown), perhaps as a consequence of the fact that the HD algorithm treats all L matrices equally.

As alluded to in the Introduction, examination of the nonlinear dynamics present in the extracted components provides another means to interpret the results of the HD method. To assess the nonlinear dynamics in an individual time series we use the NLAR fingerprint.19 This approach identified common nonlinear dynamics between 3/s spike-and-wave seizure traces,16,17 and the principal components derived from the partial complex seizure records of patients 1 and 2.15 Figure 6 compares the NLAR analyses of the principal components (as analyzed in Ref. 15) with that of the hierarchical components. In Fig. 6, the percent of variance of the original signal that each component explains is shown by the height of a bar. (For PCA this necessarily decreases as the component number increases.) The significance of the nonlinear dynamics identified in each component, as summarized by NDV/V, is shown by the corresponding point on a solid line [the dashed line indicates significant nonlinearities, NDV/V = 4 (Ref. 19)]. Whereas the principal components with the largest NDV/V are typically those components that explain only a small fraction of the variance (leaving their significance unclear), HD analysis reveals that the hierarchical components with the greatest nonlinear structure can be the most autonomous components (lowest numbered component). This is seen in seizure 1 of patient 1, and in seizures 1 and 3 of patient 2.

This latter record is particularly striking because HD analysis uncovers a hierarchical component with significant nonlinearities (NDV/V ≈ 8) even though none of the principal components display significant nonlinear characteristics (NDV/V < 4). We hypothesize that this reflects a more effective separation of underlying generators by HD, and that the components extracted by PCA reveal only diluted effects of the nonlinearity due to mixing. Additionally, five components were retained for seizure 3 of patient 1 (following Schiff et al.15); this is an exception to the guidelines mentioned in the Methods section, because the fifth component has an NDV/V ≈ 5.2. Here, HD analysis with five components exposes significant nonlinear characteristics in the third hierarchical component, which accounts for nearly 50% of the original signal variance. Finally, we note that there are differences in the analyses of individual records within and across patients, both in terms of the relative sizes of the hierarchical components and in the HD components that demonstrate the greatest nonlinearities. We suspect that these differences reflect physiologic differences in the events themselves rather than nonrobustness of the analysis method, since (a) the records themselves appeared artifact free, (b) the results of the analysis are not significantly altered by small changes in the model order,

FIGURE 6. Summary of the NLAR analyses of the principal components (see Ref. 15) and hierarchical components derived from each record of patients 1 and 2. The percent variance explained by each component is shown by the height of a bar, while the maximum NDV/V for that component is shown by the corresponding point on a solid line. The dashed line in each graph indicates NDV/V = 4, the criterion we have used for detection of significant dynamical nonlinearities (see Ref. 19). Often HD analysis reveals that the component with the greatest nonlinear structure is one of the more autonomous components. Patient 2, seizures 1 and 3 are particularly striking examples of this.

and (c) for systems in which the internal structure is known (Figs. 1–5), HD provides an accurate decomposition.

DISCUSSION

Hierarchical Decomposition (HD) Method

In summary, the ability of HD to uncover hierarchically related components results from its fundamental use of the dynamics inherent in any multivariate time series. We use a MLAR model to simultaneously enforce two conditions upon the components. The first condition is that the noise (i.e., the residual values between the observed behavior and the MLAR model) in each component is independent and of equal magnitude. This makes concrete the assumption that the stochastic components of the generators are independent, and it arbitrarily sets a scale for the size of each generator. It is implemented in a single step, and can always be achieved, independent of the dynamics of the system. The second condition seeks a hierarchical relationship among components. It is implemented iteratively, via successive orthogonal rotations of the components, to obtain a set of MLAR matrices that are as triangular as possible.
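The first condition amounts to whitening the MLAR residuals. A minimal Python sketch, assuming the residual covariance Sigma has already been estimated (the symmetric-whitener choice here is ours; the archived MATLAB code may implement the step differently):

```python
import numpy as np

def decorrelate_noise(A_list, Sigma):
    """Transform components so MLAR residuals become unit-variance,
    uncorrelated noise. With whitened components y = W x, each
    coefficient matrix maps as A -> W A W^{-1}, and the residual
    covariance maps to W Sigma W^T = I.
    """
    vals, vecs = np.linalg.eigh(Sigma)            # Sigma assumed SPD
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T     # symmetric inverse sqrt
    W_inv = vecs @ np.diag(vals ** 0.5) @ vecs.T
    return [W @ Al @ W_inv for Al in A_list], W

# Demonstration on an arbitrary symmetric positive-definite covariance.
rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4))
Sigma = M @ M.T + 4.0 * np.eye(4)
A_prime, W = decorrelate_noise([np.eye(4)], Sigma)
```

Because this step is a fixed linear change of basis, it can indeed always be carried out in a single step, independent of the dynamics, as stated above.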

Using simulated systems of hierarchically organized generators, we show that HD is capable of extracting the original sources, whereas PCA or ICA cannot (Figs. 1 and 3). The hierarchical minimization procedure is graphically depicted in Figs. 2 and 4, and highlights two important caveats: (a) finding the global minimum for the model (i.e., the most triangular) requires avoiding various local minima, and (b) models with simple dynamics (i.e., where only single-lag influences are significant) are degenerate, having multiple global minima. The latter point underscores the fact that separation of the generators relies intimately on their dynamics. If the MLAR model is trivial (i.e., L = 0, no significant dynamics in the individual components), then the HD procedure is trivial, too, and the continuum of ambiguities intrinsic to PCA remain. If the MLAR model contains one lag only, then the HD procedure can narrow down possible models of the system to a discrete set of size P! (where P is the number of generators). If the MLAR model contains two or more lags, then the HD decomposition is typically unique, and if the original system has a hierarchical structure, the algorithm will identify it.

In these simulations, we know exactly how many generators (i.e., three) contribute to each 16-channel data set we create. Moreover, the first three principal components necessarily account for 100% of the variance of the system. This highlights two potentially important limitations of the HD method. In real-world data, by selecting only the most important principal components (see the Methods section), we make the explicit compromise of not accounting for all of the variance of the signal (~20% in the current application to EEG). Although in principle HD analysis could be performed on the raw data, this compromise is often necessary because identification of a minimum in the extremely high-dimensional space of all possible rotations is not likely to be robust. Additionally, one might wonder how HD would behave if performed on fewer components than the number of generators known to be present. Here, the results of HD analysis of numerical simulations provide reassurance (data not shown), since the main features of the decomposition are preserved. For example, three hierarchical sources extracted by HD from a system built with four generators look more similar to the four original generators than do any of the principal components, in so far as they seem to be less arbitrarily mixed.

For many previous approaches to the analysis of multivariate time series (including methods based exclusively upon PCA and related methods), the underlying rationale is that the observed signals represent linear mixtures of independent sources (or generators). In many systems, such as the brain, the observed time series represent mixtures of sources that are not strictly independent, but rather dynamically influence one another, often in stereotyped patterns.18 This is the rationale for the present approach, in which we separate multivariate time series by assuming instead that the sources influence each other in a hierarchical fashion. More generally, the method can be adapted to seek interrelationships of other canonical types (e.g., cyclic) by appropriate modification of Eq. (7). Thus, the HD method may be considered to be a special case of a large set of canonical decomposition procedures.

The use of causality (i.e., the influence of prior observations on the current observation) and its importance in the HD algorithm deserve emphasis. The MLAR technique used by HD is implicitly dependent upon causal relations among the observations in a multivariate time series data set. That is, the MLAR model description of the data relies upon the order of the observations, as it describes the effect of prior observations upon current observations. This description would be obscured by time-reversing or shuffling the data. HD uses these causal relationships to guide the search for a rotation Q (see the Methods section) that yields a hierarchical interrelationship among components. The hierarchy of components delineates a specific type of causal relationship in a data set, namely, that the current value of each component is affected by prior values of itself and of more dominant components in the hierarchy.

This approach is in direct contrast to other causal techniques for multivariate data analysis, including Granger causality,7 generalized autoregressive conditional heteroskedasticity (GARCH),3 adaptive multivariate autoregressive modeling (AMVAR),5 and directed

coherence.8 These methods assume that the original time series represents the unmixed sources, and then look for and characterize the degree and type of causal relations among the data. In contrast, HD assumes a general form for the causal interaction, and uses this assumption to direct the decomposition into sources. (Granger7 notes that multivariate data sets can have a "spurious causal chain appearance," and examples of this can be readily constructed by linear mixing of independent time series. However, as pointed out above, for multivariate data sets whose MLAR structure includes more than one lag, reduction to hierarchical form is not guaranteed, and if such a reduction is possible it is typically unambiguous.) From this point of view, HD appears to be unique.

Application of HD to Ictal ECoG

The degree of hierarchical and independent structure in all components derived from ictal ECoG records of patients 1–4 is summarized in Table 1. Examination of Table 1 indicates that the components resolved by HD are both proportionally more hierarchical (i.e., see the columns labeled "R_UPPER/R_LOWER" under each type of component) and more independently driven (i.e., see the columns labeled "%R_DIAG" under each type of component) than the components resolved by PCA. Also evident from Table 1 is the fact that reorganization of the components by noise decorrelation is partially responsible for this simplification of model structure. However, noise decorrelation alone typically results in a more even distribution of off-diagonal coefficients (i.e., values for R_UPPER/R_LOWER that are closer to 1). In addition, notice that the proportion of hierarchical drive in these systems revealed by HD is sometimes quite large, particularly for seizures 2 and 3 of patient 2 (R_UPPER/R_LOWER equals 14.7 and 16.3, respectively). Nevertheless, there is also variability, both from patient to patient and within patients, in the degree of hierarchical relationship between the underlying seizure generators.

Previously, we showed that PCA decomposition of temporal lobe seizure records uncovers components whose nonlinear dynamics, as characterized by NLAR fingerprints, resemble those of the EEG during absence seizures.15 Because the principal components exhibiting significant nonlinear characteristics account for relatively small fractions of the original signal variance, the biological significance of the nonlinear signal was unclear. In contrast, HD analysis of these same records uncovers components containing the same nonlinearities, but they can appear relatively early in the hierarchy (see Fig. 6). The position of these nonlinearities in the hierarchy appears to be independent of the amount of original signal variance that these components explain. Furthermore, HD analysis may clarify the biological significance of these nonlinearities: they possibly reflect deep generators that contribute little to the surface EEG, but affect a large portion of the entire neuronal system. More generally, the HD analyses demonstrate that although the nonlinearities are not always a major contribution to the overall signal variance, they nevertheless may exert influence over all or most of the system.

In addition, the NLAR fingerprints of the hierarchical components support the connection between temporal lobe and absence seizures, and strengthen the suggestion that both seizure types may result from similar underlying neuronal mechanisms.14,15 Significant nonlinearities in these fingerprints appear in the vicinity of 90 and 150 ms,14 as is seen in the NLAR fingerprints of absence seizure traces,17 and as reported earlier for the NLAR fingerprints of the principal components.15 Notice, however, that the nonlinearities tend to be more statistically significant in the hierarchical components than in the principal components. Our findings may help explain why even focal temporal lobe seizures produce global alterations in awareness and behavior: the importance of a neuronal circuit in driving global brain activity (i.e., its position in the hierarchy) may be disproportionate to the fraction of the variance of the surface activity that it explains, especially for generators that are deep. The results of HD modeling are thus consistent with the notion that temporal lobe and absence seizures reflect alteration of selective circuit mechanisms that play a key role in forebrain integration.15 The application of the HD technique to ECoG records of temporal lobe epilepsy suggests that there is an interpretive benefit gained by extracting hierarchical, rather than independent, components.

ACKNOWLEDGMENTS

The authors thank The Lederman Family Foundation and the Biomedical Engineering Program of Cornell University for partial support of this research. M.A.R. is supported by EY7138, N.D.S. is supported by NS02172, and J.D.V. is supported by EY9314 and EY7977.

APPENDIX

This Appendix provides additional details for two aspects of the algorithm we have presented above (see the Methods section). The description is keyed to MATLAB code, archived in "HD_ABME_code.zip" at http://www.med.cornell.edu/research/hds, but does not fully describe the code itself (see the embedded documentation within the code for further details). Where the code differs slightly from the calculations presented in this exposition, such differences are noted. We make no claim that this code is optimized, but rather hope that it will provide a useful starting point for other researchers.

Determination of the Rotation Angle (θ_n)

This calculation is carried out by the file "rotation.m" archived in "HD_ABME_code.zip" at http://www.med.cornell.edu/research/hds.

For any square matrix A, transformation via a plane rotation J_{u,v}(θ) [see Eqs. (6) and (7)] produces a new square matrix J_{u,v}(θ) A J_{u,v}^T(θ) = A′. This transformation only alters the elements in the rows and columns u and v of matrix A, whose elements are denoted a_{i,j}. We are interested in minimizing the sum of squared elements below the diagonal in A′, whose elements are denoted a′_{i,j}. Thus we require equations for those elements a′_{i,j} below the diagonal (i.e., such that i > j) that are altered by the rotation. [For historical reasons, the code minimizes the sum of squared elements above the diagonal of A′^T, and each rotation induces the transformation J_{u,v}^T(θ) A^T J_{u,v}(θ) = A′^T. Because J_{u,v}(θ) is antisymmetric in its off-diagonal elements, this is fundamentally equivalent to performing the rotation as described in Eq. (7).]

For convenience we can divide the analysis of matrix entries below the diagonal of A′ (above the diagonal of A′^T) into five groups of elements (depending on the axis values u and v, some of these groups may be empty). In what follows, s ∈ {u, v}, p = min(u, v), r = max(u, v), and n < p < q < r < t: (1) a′_{s,n}, (2) a′_{t,s}, (3) a′_{r,q}, (4) a′_{q,p}, and (5) a′_{r,p}. Elementary matrix algebra used in conjunction with trigonometric identities shows that groups 1 and 2 will never affect the residual value R_HD [see Eq. (7)]. For group 1 elements,

a′_{u,n}^2 + a′_{v,n}^2 = [a_{u,n} cos(θ) + a_{v,n} sin(θ)]^2 + [a_{v,n} cos(θ) − a_{u,n} sin(θ)]^2 = a_{u,n}^2 + a_{v,n}^2,

and similarly for group 2 elements,

a′_{t,u}^2 + a′_{t,v}^2 = [a_{t,u} cos(θ) + a_{t,v} sin(θ)]^2 + [a_{t,v} cos(θ) − a_{t,u} sin(θ)]^2 = a_{t,u}^2 + a_{t,v}^2.

In contrast, elements in groups 3 and 4, when present, typically affect the residual value R_HD, as will the group 5 element, which is always present.

For groups 3 and 4, we use the trigonometric identities cos^2(θ) = 0.5 + 0.5 cos(2θ), sin^2(θ) = 0.5 − 0.5 cos(2θ), and 2 sin(θ)cos(θ) = sin(2θ), and find:

a'_{r,q}^2 + a'_{q,p}^2 = [a_{r,q} cos(θ) − a_{p,q} sin(θ)]^2 + [a_{q,p} cos(θ) + a_{q,r} sin(θ)]^2

  = a_{r,q}^2 cos^2(θ) − 2 a_{r,q} a_{p,q} cos(θ) sin(θ) + a_{p,q}^2 sin^2(θ) + a_{q,p}^2 cos^2(θ) + 2 a_{q,p} a_{q,r} cos(θ) sin(θ) + a_{q,r}^2 sin^2(θ)

  = (a_{r,q}^2 + a_{q,p}^2) cos^2(θ) + 2 (a_{q,p} a_{q,r} − a_{r,q} a_{p,q}) cos(θ) sin(θ) + (a_{p,q}^2 + a_{q,r}^2) sin^2(θ)

  = (a_{q,p} a_{q,r} − a_{r,q} a_{p,q}) sin(2θ) + 0.5 (a_{r,q}^2 + a_{q,p}^2 − a_{p,q}^2 − a_{q,r}^2) cos(2θ) + 0.5 (a_{r,q}^2 + a_{q,p}^2 + a_{p,q}^2 + a_{q,r}^2).

This equation may be further simplified by combining the sine and cosine terms into a single trigonometric function. (The addition of sine and cosine waves of the same frequency yields another sinusoidal wave at the same frequency, whose amplitude is the square root of the sum of the squared amplitudes of the original waves, and whose phase is the arctangent of the ratio of the amplitudes of the original waves.) Accordingly, we make the substitutions E sin(φ) = (a_{q,p} a_{q,r} − a_{r,q} a_{p,q}) and E cos(φ) = 0.5 (a_{r,q}^2 + a_{q,p}^2 − a_{p,q}^2 − a_{q,r}^2), which, used in conjunction with the trigonometric identity sin(a) sin(b) + cos(a) cos(b) = cos(a − b), leads to

a'_{r,q}^2 + a'_{q,p}^2 = E cos(2θ − φ) + 0.5 (a_{r,q}^2 + a_{q,p}^2 + a_{p,q}^2 + a_{q,r}^2).  (A1)
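Eq. (A1) can be verified numerically. In this sketch (illustrative indices of our own choosing; not part of the published code) the rotation acts in the (p, r) plane, and E and φ are recovered from the substitutions above via the amplitude and arctangent rules just described:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
p, q, r = 1, 2, 4                  # indices with p < q < r

# Substitutions from the text: E*sin(phi) and E*cos(phi)
Es = A[q, p] * A[q, r] - A[r, q] * A[p, q]
Ec = 0.5 * (A[r, q]**2 + A[q, p]**2 - A[p, q]**2 - A[q, r]**2)
E, phi = np.hypot(Es, Ec), np.arctan2(Es, Ec)
const = 0.5 * (A[r, q]**2 + A[q, p]**2 + A[p, q]**2 + A[q, r]**2)

for theta in np.linspace(0, np.pi, 7):
    c, s = np.cos(theta), np.sin(theta)
    arq = A[r, q] * c - A[p, q] * s    # group 3 element after rotation
    aqp = A[q, p] * c + A[q, r] * s    # group 4 element after rotation
    assert np.isclose(arq**2 + aqp**2, E * np.cos(2 * theta - phi) + const)
```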

Additionally, the equation for the group 5 element simplifies in an analogous manner, using the substitutions V sin(ψ) = 0.5 (a_{r,r} − a_{p,p}) and V cos(ψ) = 0.5 (a_{r,p} + a_{p,r}), and the trigonometric identities above:

a'_{r,p} = [a_{r,p} cos(θ) + a_{r,r} sin(θ)] cos(θ) − [a_{p,p} cos(θ) + a_{p,r} sin(θ)] sin(θ)

  = a_{r,p} cos^2(θ) + (a_{r,r} − a_{p,p}) cos(θ) sin(θ) − a_{p,r} sin^2(θ)

  = 0.5 (a_{r,r} − a_{p,p}) sin(2θ) + 0.5 (a_{r,p} + a_{p,r}) cos(2θ) + 0.5 (a_{r,p} − a_{p,r})

  = V cos(2θ − ψ) + 0.5 (a_{r,p} − a_{p,r}).

Finally, we calculate the square of the equation for the group 5 element and are left with

a'_{r,p}^2 = 0.5 V^2 cos(4θ − 2ψ) + V (a_{r,p} − a_{p,r}) cos(2θ − ψ) + 0.25 (a_{r,p} − a_{p,r})^2 + 0.5 V^2.  (A2)
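Eq. (A2) admits the same kind of numerical check as Eq. (A1). This sketch (again with illustrative indices, not drawn from the archived code) compares the direct computation of a'_{r,p}^2 against the closed form:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
p, r = 0, 3                        # rotation acts in the (p, r) plane

# Substitutions from the text: V*sin(psi) and V*cos(psi)
Vs = 0.5 * (A[r, r] - A[p, p])
Vc = 0.5 * (A[r, p] + A[p, r])
V, psi = np.hypot(Vs, Vc), np.arctan2(Vs, Vc)
d = A[r, p] - A[p, r]

for theta in np.linspace(0, np.pi, 7):
    c, s = np.cos(theta), np.sin(theta)
    # Group 5 element after rotation, computed directly:
    arp = (A[r, p] * c + A[r, r] * s) * c - (A[p, p] * c + A[p, r] * s) * s
    # Closed form, Eq. (A2):
    rhs = (0.5 * V**2 * np.cos(4 * theta - 2 * psi)
           + V * d * np.cos(2 * theta - psi)
           + 0.25 * d**2 + 0.5 * V**2)
    assert np.isclose(arp**2, rhs)
```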


Thus, we must minimize, as a function of θ, an expression that consists of the above contribution from Eq. (A2) for the group 5 element, and a contribution from Eq. (A1) for each pair of elements in groups 3 and 4 that are present (elements in a pair are always both absent or both present). Elements in groups 1 and 2 may be neglected, since they contribute a constant that does not depend on θ, as may the constant terms present in Eqs. (A1) and (A2).

In our code, matrix entries in A that contribute to the trigonometric functions governing group 5 (see lines 81–89) and, when necessary, groups 3 and 4 (see lines 91–104) are collected and combined into matrix constants, including E and V, φ, and ψ. The trigonometric equation to be minimized is assembled from Eq. (A2) for each group 5 element (see lines 109–112) and, as needed, from Eq. (A1) for pairs of elements in groups 3 and 4. When elements in groups 3 and 4 are present, the function to be minimized may be quite large (see lines 125–127), whereas the minimization is simpler when groups 3 and 4 are empty (see lines 128–131). Finally, we use a customized bisection routine to find the value of θ that minimizes the complete trigonometric function (see subfunction "trigmin," lines 240–266).
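The per-pair minimization can be mimicked in a few lines. This NumPy sketch replaces the paper's assembled closed-form objective and customized bisection routine with a direct grid search over θ (the function names are ours, not those of rotation.m), which gives the same minimizer to grid resolution:

```python
import numpy as np

def lower_tri_ss(A):
    """Below-diagonal sum of squares: the quantity HD drives toward zero."""
    return np.sum(np.tril(A, k=-1) ** 2)

def rotate_pair(A, u, v, theta):
    """Apply the plane rotation J_{u,v}(theta) to A on both sides."""
    J = np.eye(A.shape[0])
    c, s = np.cos(theta), np.sin(theta)
    J[u, u] = J[v, v] = c
    J[u, v], J[v, u] = s, -s
    return J @ A @ J.T

def best_angle(A, u, v, n_grid=2048):
    """Grid-search stand-in for the customized bisection: find theta in
    [-pi/2, pi/2) minimizing the below-diagonal sum of squares."""
    thetas = np.linspace(-np.pi / 2, np.pi / 2, n_grid, endpoint=False)
    resids = [lower_tri_ss(rotate_pair(A, u, v, t)) for t in thetas]
    k = int(np.argmin(resids))
    return thetas[k], resids[k]

A = np.arange(16.0).reshape(4, 4)
theta_best, resid_best = best_angle(A, 0, 2)
```

Since the grid includes θ = 0 (the identity rotation), the returned residual can never exceed the residual of the unrotated matrix.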

Creating Well-Spaced Initializing Rotations (Θ_0)

The initializing rotations Θ_0 are determined as follows (see file "rotmesh.m" archived in "HD_ABME_code.zip" at http://www.med.cornell.edu/research/hds). For each possible choice of a pair of axes [u,v], we build plane rotations at two rotation angles θ = ±2π/3 (see lines 56–74). Thus, for a system consisting of P components there are d = P(P−1)/2 possible pairs of axes, and 2d possible initializing rotations Θ_0. We randomly order all possible initializing rotations Θ_0 (see lines 76–78) and use one for each parallel minimization performed [see Eq. (5) and associated text].
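The construction of Θ_0 can be sketched as follows (a Python analogue of rotmesh.m; the helper names and seed are ours):

```python
import numpy as np
from itertools import combinations

def plane_rotation(P, u, v, theta):
    """Plane rotation J_{u,v}(theta) in R^P."""
    J = np.eye(P)
    c, s = np.cos(theta), np.sin(theta)
    J[u, u] = J[v, v] = c
    J[u, v], J[v, u] = s, -s
    return J

def initial_rotations(P, seed=0):
    """All 2d = P(P-1) initializing rotations, theta = +/- 2*pi/3 for every
    axis pair [u, v], returned in random order."""
    rots = [plane_rotation(P, u, v, sgn * 2 * np.pi / 3)
            for u, v in combinations(range(P), 2)
            for sgn in (+1, -1)]
    rng = np.random.default_rng(seed)
    return [rots[i] for i in rng.permutation(len(rots))]

rots = initial_rotations(4)
assert len(rots) == 4 * 3           # 2d = P(P-1) = 12 for P = 4
```

Each initializing rotation is orthogonal by construction, so any product of them used to seed a parallel minimization is also orthogonal.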

REFERENCES

1. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 19:716–723, 1974.

2. Bell, A. J., and T. J. Sejnowski. An information-maximization approach to blind separation and blind deconvolution. Neural Comput. 7:1129–1159, 1995.

3. Bollerslev, T. Generalized autoregressive conditional heteroskedasticity. J. Econometr. 31:307–327, 1986.

4. Borg, I., and P. Groenen. Modern Multidimensional Scaling: Theory and Applications. New York: Springer, 1997, pp. 132–134 and 399–402.

5. Ding, M., S. L. Bressler, W. Yang, and H. Liang. Short-window spectral analysis of cortical event-related potentials by adaptive multivariate autoregressive modeling: Data processing, model validation, and variability assessment. Biol. Cybern. 83:35–45, 2000.

6. Gersch, W., and J. Yonemoto. Parametric time series models for multivariate EEG analysis. Comput. Biomed. Res. 10:113–125, 1977.

7. Granger, C. W. J. Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37:424–438, 1969.

8. Kaminski, M. J., and K. J. Blinowska. A new method of the description of the information flow in the brain structures. Biol. Cybern. 65:203–210, 1991.

9. Kay, S. M. Modern Spectral Estimation: Theory and Application. Englewood Cliffs, NJ: Prentice-Hall, 1988, pp. 141 and 142.

10. Lee, D. D., and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature (London) 401:788–791, 1999.

11. Mirsky, L. An Introduction to Linear Algebra. New York: Dover, 1990, pp. 236–247 and 306–311.

12. Mitra, P. P., and B. Pesaran. Analysis of dynamic brain imaging data. Biophys. J. 76:691–708, 1999.

13. Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. New York: Cambridge University Press, 1992, pp. 412–413 and 463–466.

14. Repucci, M. A., N. D. Schiff, and J. D. Victor. Implications for the dynamics of temporal lobe seizures derived from multivariate autoregressive analysis of electroencephalograms. Proceedings of the Society for Neuroscience 29th Annual Meeting, Miami, FL (242.9), 1999.

15. Schiff, N. D., D. R. Labar, and J. D. Victor. Common dynamics in temporal lobe seizures and absence seizures. Neuroscience (Oxford) 91:417–428, 1999.

16. Schiff, N. D., J. D. Victor, and A. Canel. Nonlinear autoregressive analysis of the 3/s ictal electroencephalogram: Implications for underlying dynamics. Biol. Cybern. 72:527–532, 1995.

17. Schiff, N. D., J. D. Victor, A. Canel, and D. R. Labar. Characteristic nonlinearities of the 3/s ictal electroencephalogram identified by nonlinear autoregressive analysis. Biol. Cybern. 72:519–526, 1995.

18. Steriade, M., P. Gloor, R. R. Llinás, F. H. Lopes da Silva, and M.-M. Mesulam. Report of IFCN committee on basic mechanisms: Basic mechanisms of cerebral rhythmic activities. Electroencephalogr. Clin. Neurophysiol. 76:481–508, 1990.

19. Victor, J. D., and A. Canel. A relation between the Akaike criterion and reliability of parameter estimates, with application to nonlinear autoregressive modeling of the ictal EEG. Ann. Biomed. Eng. 20:167–180, 1992.

20. Wei, W. W. S. Time Series Analysis: Univariate and Multivariate Methods. New York: Addison-Wesley, 1990, 478 pp.

21. Yoshioka, K., F. Matsuda, K. Takakura, Y. Noda, K. Imakawa, and S. Sakai. Determination of genes involved in the process of implantation: Application of GeneChip to scan 6500 genes. Biochem. Biophys. Res. Commun. 272:531–538, 2000.

22. Yule, G. U. On a method of investigating periodicities in disturbed series with special reference to Wolfer's sunspot numbers. Philos. Trans. R. Soc. London, Ser. A 226:267–298, 1927.