Annals of Biomedical Engineering, Vol. 29, pp. 1135–1149, 2001. Printed in the USA. All rights reserved. Copyright © 2001 Biomedical Engineering Society.
General Strategy for Hierarchical Decomposition of Multivariate Time Series: Implications for Temporal Lobe Seizures
M. A. REPUCCI, N. D. SCHIFF, and J. D. VICTOR
Department of Neurology and Neuroscience, Weill Graduate School of Medical Sciences of Cornell University, 1300 York Avenue, New York, NY
(Received 19 October 2000; accepted 4 October 2001)
Abstract—We describe a novel method for the analysis of multivariate time series that exploits the dynamic relationships among the multiple signals. The approach resolves the multivariate time series into hierarchically dependent underlying sources, each driven by noise input and influencing subordinate sources in the hierarchy. Implementation of this hierarchical decomposition (HD) combines principal components analysis (PCA), autoregressive modeling, and a novel search strategy among orthogonal rotations. For model systems conforming to this hierarchical structure, HD accurately extracts the underlying sources, whereas PCA or independent components analysis does not. The interdependencies of cortical, subcortical, and brainstem networks suggest application of HD to multivariate measures of brain activity. We show first that HD indeed resolves temporal lobe ictal electrocorticographic data into nearly hierarchical form. A previous analysis of these data identified characteristic nonlinearities in the PCA-derived temporal components that resembled those seen in absence (petit mal) seizure electroencephalographic traces. However, the components containing these characteristic nonlinearities accounted for only a small fraction of the power. Analysis of these data with HD reveals furthermore that components containing characteristic nonlinearities, though small, can be at the origin of the hierarchy. This finding supports the link between temporal lobe and absence epilepsy. © 2001 Biomedical Engineering Society. [DOI: 10.1114/1.1424914]
Keywords—Nonlinear dynamics, Autoregression, Epilepsy, Principal components, Independent components, Spatiotemporal, Electroencephalography, Electrocorticography.
INTRODUCTION
Multivariate time series are abundant in biomedical systems, from spatiotemporal signals such as the electroencephalogram (EEG), to temporal patterns of gene expression observed with GeneChip technologies.21 One common approach to the analysis of multivariate data involves decomposition into several independent time series (e.g., by principal components analysis4), followed by analysis of the individual time series (e.g., autoregressive modeling, spectral analysis, and nonlinear techniques20). Typically, the goal of the analysis is to extract the underlying biological sources of the multivariate temporal data, in order to characterize each source more fully and to describe the system more precisely. Unfortunately, the criterion of independence of sources enforced by standard decomposition techniques may not be appropriate for real biological systems, in which the underlying sources may well have dynamic interrelationships. With this in mind, we developed a novel approach to multivariate time series analysis, hierarchical decomposition (HD), that exploits the causal (i.e., forward in time) dynamic content of multivariate temporal data.

Address correspondence to Michael A. Repucci, Department of Neurology and Neuroscience, Weill Graduate School of Medical Sciences of Cornell University, 1300 York Avenue, New York, NY 10021.
In contrast to HD, standard decomposition techniques, including principal components analysis (PCA), alone or modified by varimax rotations (VR),4 independent components analysis (ICA),2 and non-negative matrix factorization (NMF),10 do not exploit causality. Briefly, PCA seeks sources that are uncorrelated (orthogonal). ICA seeks sources that are independent in an information-theoretic sense. VR seeks a transformation that maximizes the sum of the variances in each extracted source. NMF attempts to find sources whose weights are non-negative (i.e., only additive, not subtractive, combinations of sources are allowed). These assumptions are appropriate for image analysis or for separation of independent time series (e.g., the "cocktail party problem," separating one voice from the cacophony of other conversations and background noise), but may be unsuitable for multivariate systems originating from dynamically interrelated sources. In particular, these methods assume that the order of data points is irrelevant, and thus would produce equivalent results for the original, time-reversed, or randomly shuffled data.
As described below, the HD method takes advantage of the signal dynamics, and thus may achieve a more useful resolution of the sources of a multivariate time series. HD was designed with physiologically derived
multivariate time series in mind. Such time series often result from dynamically interrelated generators, in which some of the generators have a driving role. For example, the EEG exhibits synchronizations, desynchronizations, and rhythmic oscillations resulting from cortical, subcortical, and brainstem network activity: these are characteristics anticipated in a dynamically interrelated system.18 To exploit these observations, we assume that the generators are organized hierarchically. Namely, we assume that there is an autonomous generator whose state depends only on noise input and feedback from itself (i.e., an autoregressive process). Each successive generator is influenced by an independent noise input, its own state, and the state of more dominant generators in the hierarchy, such that the last generator is influenced by noise input and the state of all generators. The HD method, as presented below, attempts to account for the temporal dynamics in a multivariate data set by resolving it into a model of this form.
We show that if a system does conform to this hierarchical structure, and the multivariate autoregressive model includes sufficient terms, then the HD resolution is unique and will recover the hierarchically related sources. Moreover, the HD approach will not be misleading for an independent system; if the underlying sources are truly independent, the HD algorithm will uncover them as well. On the other hand, there is no guarantee that all multivariate time series are reducible to hierarchical form; one such example is a system with reciprocally related generators. For such systems, failure of this approach to identify a hierarchical resolution is positive evidence for the presence of a more complex network structure.
In brief, the HD approach consists of building a multivariate linear autoregressive (MLAR) model6 from the PCA-derived components of the original data set. This is followed by seeking a coordinate transformation that rotates the MLAR model to make it as consistent as possible with a hierarchical interrelationship among components. We demonstrate that for simulated multivariate time series with truly hierarchical structure, HD accurately extracts the underlying generators of the system. We then use the HD method to analyze ictal electrocorticographic (ECoG) records, the application that inspired the HD approach. Analysis of these data reveals that significant hierarchical structure is present in all records; this fact is not obvious by PCA or ICA decomposition alone. Subsequent analysis of these HD resolved components adds further insight to the dynamic nature of the ictal discharge.
In previous work,15 we identified certain characteristic dynamic nonlinearities in EEG traces obtained during absence seizures (also known as petit mal seizures) and in the PCA-derived temporal components from temporal lobe ictal ECoG data. The similarities, most notably the nonlinear interactions at lags around 90 and 150 ms, suggested a common underlying mechanism. These characteristic dynamics, however, were observed only in components that accounted for a small fraction of the variance of the original data (third, fourth, or fifth ranking components, when ordered by the amount of variance explained), thus leaving their biological significance uncertain. By contrast, the HD method demonstrates that these nonlinearities are present in the more autonomous components (i.e., the components in the system that drive the remaining components). This indicates that the size of the components does not necessarily correlate with their physiological importance, and thus strengthens the evidence that temporal lobe and absence seizures share a common mechanism. Here, the ability of HD to resolve the underlying generators on the basis of dynamical interrelationships, rather than by independence or power, sheds light on a prior puzzle.
METHODS
Multivariate ECoG data were previously collected and are briefly described here (for further details, see Schiff et al.15). Epilepsy patients with medication-resistant temporal lobe seizures were studied with subdural electrode grids and strips in the course of presurgical evaluation for temporal lobectomy. ECoG signals were recorded on a 64-channel Telefactor Beehive (West Conshohocken, PA) telemetry system (sampling rate of 200 Hz, high-pass filter at 0.3 Hz and low-pass filter at 70 Hz), and concurrent video images of each patient were obtained. Thirteen to 16 channels of clear artifact-free recordings of ictal events (occurring during wakefulness) were resampled at 100 Hz (2:1); the mean was subtracted and detrending applied. Typically, 5 s of data recorded from within an ongoing ictal event were analyzed as described in the text.
All data analyses herein were carried out via MATLAB functions and scripts (version 5.3, release 11, Pentium PC running Windows 2000), which are archived, along with a sample data set, at http://www.med.cornell.edu/research/hds. We consider a multivariate data set consisting of N evenly spaced observations in time, on each of M channels (M ≤ N), represented as a matrix X (M × N). In our application to ECoG data N = 512 (approximately 5 s of data sampled at 100 Hz) and typically M = 14 (epicortical recording sites).
Principal Components Analysis (PCA)
The original data matrix X is processed by PCA (see file "pca.m" archived in "HD-ABME-code.zip" at http://www.med.cornell.edu/research/hds), yielding a second matrix Y (M × N), consisting of linear combinations of P orthogonal principal components,
Y = C^T W T.   (1)
C is a P × M matrix whose rows are the spatial weights, T is a P × N matrix whose rows are the temporal weights, and W is a P × P diagonal matrix of eigenvalues whose elements, squared, are the amount of variance explained by each principal component. For any number of components P [P ≤ min(M,N)], PCA guarantees that the unexplained variance between X and Y,
R_PCA = tr[(X − Y)(X − Y)^T],   (2)
is minimized (i.e., Y is the "best fit" to X).

This decomposition of X into orthogonal (i.e., independent or instantaneously uncorrelated) components C and T, however, is not unique. Replacing the temporal components T by GT (for any nonsingular linear transformation G) and, accordingly, the spatial components C by W^-1 (G^-1)^T W C, leaves the matrix Y unchanged: [(W^-1 (G^-1)^T W C)^T] W [GT] = C^T W G^-1 W^-1 W G T = C^T W T = Y, where we have used basic properties of the transpose, and the fact that W is diagonal. The specific, but arbitrary, solution identified by PCA consists of matrices C and T whose rows are orthonormal, and a matrix W whose diagonal elements are non-negative and ordered from largest to smallest. Resolving the nonuniqueness of the decomposition obtained with Eq. (1) is the focus of the remainder of the HD analysis (i.e., finding a unique, nonarbitrary transformation G).
We use PCA as a first step in the analysis in order to reduce the dimension of the system by removing instrumental noise from the signal (without removing biological noise12). For real data, the P temporal components are selected such that each component accounts for a fraction of variance greater than or equal to M^-1, except where noted. Typically this allows us to reduce a 13- to 16-channel system to 3–5 components that together account for greater than 80% of the variance. Selecting a small number of components reduces the computational burden for the following analyses.
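The PCA step of Eq. (1) can be sketched in a few lines of numpy (the archived implementation is the MATLAB file "pca.m"; the function name, variable names, and the SVD route below are illustrative, not taken from that code). The singular value decomposition directly supplies orthonormal rows for C and T and the ordered diagonal matrix W:

```python
import numpy as np

def pca_decompose(X, frac_threshold=None):
    """Decompose X (M x N) as Y = C^T W T via the singular value
    decomposition: rows of C and T are orthonormal, and W is diagonal
    with non-negative entries ordered largest to smallest, as in Eq. (1).
    If frac_threshold is given, keep only components whose fraction of
    explained variance is at least that threshold."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    var_frac = s**2 / np.sum(s**2)
    P = len(s) if frac_threshold is None else max(1, int(np.sum(var_frac >= frac_threshold)))
    return U[:, :P].T, np.diag(s[:P]), Vt[:P, :]   # C (P x M), W (P x P), T (P x N)

# A rank-3, 16-channel example: thresholding on the fraction of variance
# (the text uses 1/M; a smaller illustrative cutoff is used here) keeps
# exactly the three meaningful components.
rng = np.random.default_rng(0)
X = rng.standard_normal((16, 3)) @ rng.standard_normal((3, 512))
C, W, T = pca_decompose(X)                        # full decomposition
C3, W3, T3 = pca_decompose(X, frac_threshold=0.01)
```

Note that with all components retained the product C^T W T reconstructs X exactly, which is the P = min(M,N) case of Eq. (2) with R_PCA = 0.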
Multivariate Linear Autoregression (MLAR)
We next create a MLAR model of the temporal components T in a form equivalent to that proposed by Gersch and Yonemoto6 (see file "mlar.m" archived in "HD-ABME-code.zip" at http://www.med.cornell.edu/research/hds). The P temporal components (i.e., each element T_{p,n}, the nth sample of the pth component) are modeled by an L-term autoregression:

T_{p,n} = R_{p,n} + m_p + Σ_{q=1}^{P} Σ_{l=1}^{L} A_{q,p,l} T_{q,n−l},   (3)
where R is a P × N matrix of residual values, m_p is the mean value of the pth component, and A is a P × P × L three-dimensional array of model coefficients. It is helpful to view the model coefficients A as a set of L planes, each of which is a P × P matrix. The lth plane of A describes the estimated influence of the signals at time n−l on the signal values at time n. Namely, A_{q,p,l} is the influence of the qth component at the lth time lag on the current value of the pth component.
The model coefficients A are determined by minimizing the sum of squared residual values (the innovations, R_{p,n}) in the model

R_MLAR = Σ_{p=1}^{P} Σ_{n=1}^{N} R_{p,n}^2,   (4)
via the Yule–Walker equations.22 We choose the maximum lag L (the order of the MLAR model) by the Akaike criterion (AIC),1 a statistical justification based on whether the amount of reduction in residual variance is sufficient to justify the inclusion of additional linear model terms. In the examples below this typically yields a value of L from 2 to 4 lags.
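The MLAR fit of Eqs. (3) and (4) can be sketched as follows. This is an illustrative numpy version that minimizes the squared innovations by ordinary least squares rather than solving the Yule–Walker equations as the archived "mlar.m" does; for stationary data the two estimates agree asymptotically. The function name and index conventions are assumptions for this sketch:

```python
import numpy as np

def fit_mlar(T, L):
    """Fit the L-lag MLAR model of Eq. (3) by least squares.
    T is P x N; returns coefficients A (P x P x L, with A[q, p, l-1]
    the influence of component q at lag l on component p), the
    intercepts m, and the innovations R (zero over the first L samples)."""
    P, N = T.shape
    rows = [np.ones(N - L)]
    for l in range(1, L + 1):
        rows.extend(T[:, L - l:N - l])          # T[:, n - l] for n = L..N-1
    Z = np.vstack(rows)                         # (1 + P*L) x (N - L) regressors
    coef, *_ = np.linalg.lstsq(Z.T, T[:, L:].T, rcond=None)
    m = coef[0]
    A = np.stack([coef[1 + l * P:1 + (l + 1) * P, :] for l in range(L)], axis=2)
    R = np.zeros((P, N))
    R[:, L:] = T[:, L:] - (coef.T @ Z)          # innovations, Eq. (3) rearranged
    return A, m, R

# Sanity check on a univariate AR(1) process with known coefficient 0.9.
rng = np.random.default_rng(1)
x = np.zeros(5000)
for n in range(1, 5000):
    x[n] = 0.9 * x[n - 1] + rng.standard_normal()
A, m, R = fit_mlar(x[None, :], 1)
```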
Decorrelation of the MLAR Innovations
The first step in resolving the nonuniqueness of the PCA decomposition relies upon the P × N matrix of innovations R_{p,n}. These quantities are viewed as random terms that drive the P channels of the MLAR model. Thus, if T represents distinct underlying generators, then the corresponding driving terms in Eq. (3) will be uncorrelated. For this reason, we require that the transformed temporal components GT decorrelate R_{p,n}. Additionally, it is analytically convenient to assume that each channel of innovations has equal variance (i.e., no particular source is noisier than another). This amounts to an arbitrary choice for the overall size of each generator, but makes no further assumptions about the dynamical structure of the system.
Under these assumptions, we seek transformations K that orthonormalize the innovations R_{p,n}; that is, (KR)(KR)^T = K R R^T K^T = I, where I is the identity matrix. One such matrix K can be obtained by dividing the eigenvectors of R R^T (taken as rows) by the square roots of the corresponding eigenvalues. (The rows of the matrix K are necessarily orthogonal, since they are scalar multiples of the eigenvectors of the positive symmetric matrix R R^T.) The transformation K applied to Eq. (3) yields new temporal components T' = KT, new autoregressive coefficients A'_l^T = K A_l^T K^-1 (for l = 1...L, where A_l^T denotes the transpose of A_l), and decorrelated, normalized innovations R' = KR. Nevertheless, this does not fully resolve the nonuniqueness problem, since premultiplying K
by any orthogonal transformation Q (i.e., Q such that Q Q^T = Q^T Q = I) yields an alternative matrix G = QK, for which GR is likewise decorrelated and normalized: (GR)(GR)^T = (QKR)(QKR)^T = Q K R R^T K^T Q^T = Q I Q^T = I.
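The decorrelation step follows directly from the recipe above; the helper name in this numpy sketch is illustrative. The last two lines demonstrate the residual nonuniqueness just described: any orthogonal Q leaves the whitened innovations orthonormal.

```python
import numpy as np

def whiten_innovations(R):
    """Build K from the eigenvectors of R R^T (taken as rows), each
    divided by the square root of its eigenvalue, so that
    (K R)(K R)^T = I."""
    evals, evecs = np.linalg.eigh(R @ R.T)     # R R^T is symmetric positive definite
    return evecs.T / np.sqrt(evals)[:, None]

rng = np.random.default_rng(2)
R = rng.standard_normal((3, 512))              # stand-in for the MLAR innovations
K = whiten_innovations(R)

# Any orthogonal Q gives G = QK that also whitens R; this is exactly the
# freedom the rotation search of the next section resolves.
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
G = Q @ K
```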
Novel Search Among Orthogonal Rotations
At this point, we have reduced the nonuniqueness of the PCA decomposition to an arbitrary scale factor for each channel (the normalization performed above will suffice) and any orthogonal transformation Q. We now search for a rotation of the temporal components QT' that is consistent with the hierarchical structure proposed in the introduction. Recall that the autoregressive coefficients A' from the decorrelated MLAR model specify the dynamic interrelationships among the P components at up to L lags. Accordingly, the HD algorithm seeks a P × P transformation Q that simultaneously transforms all L matrices A'_l into upper-triangular form (see file "rotation.m" archived in "HD-ABME-code.zip" at http://www.med.cornell.edu/research/hds). We require that Q be orthogonal, thereby guaranteeing that the residuals R_MLAR are unchanged and that the orthonormality of the innovations R' is preserved. The search is further restricted to the set of rotations [i.e., det(Q) = 1] to avoid arbitrary changes in the sign of an odd number of the components.
Q is obtained iteratively via a procedure motivated by Jacobi matrix diagonalization.11,13 We begin with an arbitrary rotation Q_0 (see below), and successively apply a sequence of individual plane rotations J, each about a pair of axes [u,v]. The sequence of plane rotations consists of multiple cycles through each of the possible choices of axis pairs. That is,

Q = J_{u_d,v_d}(θ_{fd}) ⋯ J_{u_1,v_1}(θ_{d+1}) J_{u_d,v_d}(θ_d) ⋯ J_{u_1,v_1}(θ_1) Q_0,   (5)
where d = P(P−1)/2 is the number of unique axis pairs (i.e., for a 3 × 3 matrix d = 3, for a 4 × 4 matrix d = 6, and so on), f is the number of cycles through all axis pairs, and each J_{u,v}(θ) is a rotation by an angle θ about axes [u,v]. For example,

J_{4,2}(θ) = [ 1      0        0      0
               0    cos(θ)     0    sin(θ)
               0      0        1      0
               0   −sin(θ)     0    cos(θ) ].   (6)
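Eq. (6) generalizes to any axis pair as in the short numpy sketch below (the function name is illustrative; the sign convention and 1-based axis indices match Eq. (6)):

```python
import numpy as np

def plane_rotation(P, u, v, theta):
    """P x P plane rotation J_{u,v}(theta) about axes [u, v] (1-based),
    following the sign convention of Eq. (6)."""
    J = np.eye(P)
    i, j = u - 1, v - 1
    J[i, i] = J[j, j] = np.cos(theta)
    J[j, i] = np.sin(theta)     # e.g., entry (2, 4) of J_{4,2} is +sin(theta)
    J[i, j] = -np.sin(theta)    # e.g., entry (4, 2) of J_{4,2} is -sin(theta)
    return J

# Reproduce Eq. (6): a rotation about axes [4, 2] of a 4 x 4 system.
J = plane_rotation(4, 4, 2, 0.3)
```

Each such J is orthogonal with det(J) = 1, so products of them stay within the set of rotations required of Q.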
At the nth stage in the iteration, θ_1,...,θ_{n−1} are held fixed, and θ_n is determined by minimizing the sum of squared elements below the diagonal (the residuals, R_HD) in all L matrices A'_l (see the Appendix):

R_HD = Σ_{l=1}^{L} Σ_{p=2}^{P} Σ_{q=1}^{p−1} [(Q A'_l Q^T)_{p,q}]^2.   (7)
The algorithm terminates when a full cycle of d iterations passes without further reduction of R_HD. In other words, termination occurs when the L matrices A'_l are concurrently transformed to be as close as possible to upper-triangular form. If the generators of the original system can be cast in a hierarchical form, this transformation returns a new set of upper-triangular matrices A_HD; for other systems, this procedure yields matrices A_HD that are as nearly upper triangular as possible, in the sense that R_HD has reached the minimum attainable value.
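The residual of Eq. (7) is straightforward to evaluate for a candidate rotation. The numpy sketch below (illustrative names, not the archived code) also shows the termination target: for a truly hierarchical system, some rotation drives R_HD to zero.

```python
import numpy as np

def r_hd(Q, A_list):
    """Eq. (7): sum of squared below-diagonal elements of Q A_l' Q^T
    over all lag matrices A_l'."""
    return sum(np.sum(np.tril(Q @ A @ Q.T, k=-1) ** 2) for A in A_list)

# Two upper-triangular lag matrices (a hierarchical system), scrambled
# by a known rotation Q0 in the (1, 2) plane.
A_upper = [np.triu(np.arange(1.0, 10.0).reshape(3, 3)),
           np.triu(np.ones((3, 3)))]
theta = 0.7
c, s = np.cos(theta), np.sin(theta)
Q0 = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
A_mixed = [Q0.T @ A @ Q0 for A in A_upper]
```

Applying the inverse rotation (here, Q0 itself, since Q0 (Q0^T A Q0) Q0^T = A) restores upper-triangular form and hence R_HD = 0, whereas the identity rotation leaves a nonzero residual.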
This process of minimization, by successive application of rotations J_{u,v}(θ), in a manner analogous to the Jacobi matrix diagonalization procedure, falls into the class of "direction set" methods.13 Direction set methods entail sequential one-dimensional minimizations (here, the determination of θ_{fd} about the axis pair [u_d,v_d]) within a multidimensional (here, d-dimensional) space. The sequence of axis pairs is chosen to cycle through all possible pairs of axes (i.e., for a 4 × 4 matrix, we use d = 6 and the cycle [2,1], [3,1], [4,1], [3,2], [4,2], [4,3], [2,1],...). Numerical experimentation indicates that there is no particular advantage to any sequence, provided that each axis pair is examined once per cycle. (In general, however, direction set methods tend to be more efficient when the gradient of the function is used to guide the sequence.13) As with any multidimensional minimization, there is a risk of being trapped in a local minimum. While simulated annealing methods could be used to ensure that the global minimum is found, we have found that it is sufficient simply to run several minimizations in parallel, each preceded by a different initializing rotation Q_0 chosen from a collection of well-spaced rotations (see the Appendix and file "rotmesh.m" archived in "HD-ABME-code.zip" at http://www.med.cornell.edu/research/hds).
The rotation Q that attains the global minimum for R_HD (see file "globalmin.m" archived in "HD-ABME-code.zip" at http://www.med.cornell.edu/research/hds) is subsequently used to transform the temporal components T', thereby identifying components T_HD = QT', such that each component is, as nearly as possible, influenced only by itself and the more dominant components in the hierarchy.
Nonlinear Analysis of HD Resolved Components
In our application of HD to ictal ECoG, we are interested in the nonlinear features of the underlying time series identified by the HD procedure. We therefore analyzed the time series T_HD:p (for p = 1...P) for nonlinear characteristics via nonlinear autoregressive analysis (NLAR), and compared them to the related NLAR analysis of the principal components T_p (for p = 1...P). The NLAR method is discussed in detail elsewhere.16,17
Briefly, a 20-term linear autoregressive model for each time series is augmented by a single nonlinear term that models a joint influence of the signal at two prior times (times n−j and n−k) on the signal at time n. The coefficients B and C of this nonlinear model,

T_{HD:p,n} = R_{NLAR:p,n} − B_{p,0} − Σ_{l=1}^{20} B_{p,l} T_{HD:p,n−l} − C_{j,k} T_{HD:p,n−j} T_{HD:p,n−k},   (j = 1...20, k = 1...j),   (8)
are then determined separately for each of the 210 possible choices of a single nonlinear term (i.e., each of the 210 unique assignments of j and k to pairs in the set {1,...,20}) by minimizing R_NLAR. This yields C, a 20 × 20 symmetric matrix of nonlinear autoregressive coefficients (where each entry in C is resolved by separate models).
The addition of each nonlinear term C_{j,k} to the autoregressive model reduces the model variance V by an amount ΔV_{j,k} in comparison with the variance associated with the best 20-term linear AR model. Via an extension of the AIC,19 we use N ΔV/V to assess whether this reduction in variance is significant. We also examine N ΔV_{j,k}/V as a function of j and k; this "NLAR fingerprint" summarizes the nonlinear dynamics of each time series over the lags considered, up to second order.
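A single-term fit of the kind entering Eq. (8) can be sketched as follows. This illustrative numpy version fits the linear and augmented models by least squares and returns the variance reduction corresponding to ΔV_{j,k}; the function names are assumptions, not taken from the archived code.

```python
import numpy as np

def residual_var(Z, y):
    """Mean squared residual of the least-squares fit of y on columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return np.mean((y - Z @ coef) ** 2)

def nlar_delta_v(x, j, k, L=20):
    """Variance reduction Delta V_{j,k} from adding the single bilinear
    term x[n-j] * x[n-k] to an L-term linear AR model of x.
    Returns (V_linear, delta_V)."""
    N = len(x)
    y = x[L:]
    Z_lin = np.column_stack([np.ones(N - L)] +
                            [x[L - l:N - l] for l in range(1, L + 1)])
    V_lin = residual_var(Z_lin, y)
    Z_nl = np.column_stack([Z_lin, x[L - j:N - j] * x[L - k:N - k]])
    return V_lin, V_lin - residual_var(Z_nl, y)

rng = np.random.default_rng(4)
x = rng.standard_normal(500)
V_lin, dv = nlar_delta_v(x, 3, 5)
```

Since the augmented model nests the linear one, ΔV_{j,k} is never negative; the significance of any particular reduction is then judged via N ΔV/V, as described above.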
Definitions of Some Useful Quantities
In assessing the degree of hierarchical structure present in any set of MLAR coefficients, it is useful to define four quantities as follows. For any such set of MLAR model coefficients F (where F may be A, A', or A_HD as below), the sum of all squared elements is

R_TOTAL = Σ_{l=1}^{L} Σ_{p=1}^{P} Σ_{q=1}^{P} F_{p,q,l}^2.   (9)
The size of the decoupled portion of the MLAR model (i.e., the extent to which the components' dynamics are independent) is summarized by the sum of squared elements on the diagonal in all L matrices F_l:

R_DIAG = Σ_{l=1}^{L} Σ_{p=1}^{P} F_{p,p,l}^2.   (10)
The size of the hierarchical dynamic relationships in the MLAR model is summarized by the sum of squared elements above the diagonal in all L matrices F_l:

R_UPPER = Σ_{l=1}^{L} Σ_{p=1}^{P−1} Σ_{q=p+1}^{P} F_{p,q,l}^2.   (11)
The size of the dynamic relationships that are "antihierarchical" (i.e., neither hierarchical nor independent) is equal to the sum of squared elements below the diagonal in all L matrices F_l:

R_LOWER = Σ_{l=1}^{L} Σ_{p=2}^{P} Σ_{q=1}^{p−1} F_{p,q,l}^2.   (12)
Thus, R_TOTAL = R_DIAG + R_UPPER + R_LOWER.

These four quantities will be used to compare the degree of independent and hierarchical drive in the various MLAR models at each stage of analysis described above. R_TOTAL must be unchanged before and after the transformation identified by HD, since this transformation is orthogonal. However, the stage of noise decorrelation may change R_TOTAL because of the scale factor implicit in the transformation K. For this reason, we will typically focus on the quantities R_DIAG, R_UPPER, and R_LOWER, as fractions of R_TOTAL, and refer to these fractions as the independent, hierarchical, and antihierarchical drives, respectively.
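Given any coefficient array F, the four quantities of Eqs. (9)–(12) and the corresponding drive fractions reduce to a few lines (an illustrative numpy sketch; the function name is an assumption):

```python
import numpy as np

def drive_fractions(F):
    """Return the (independent, hierarchical, antihierarchical) drive
    fractions of a P x P x L coefficient array F, per Eqs. (9)-(12)."""
    L = F.shape[2]
    R_total = np.sum(F ** 2)                                               # Eq. (9)
    R_diag = sum(np.sum(np.diag(F[:, :, l]) ** 2) for l in range(L))       # Eq. (10)
    R_upper = sum(np.sum(np.triu(F[:, :, l], k=1) ** 2) for l in range(L)) # Eq. (11)
    R_lower = sum(np.sum(np.tril(F[:, :, l], k=-1) ** 2) for l in range(L))# Eq. (12)
    assert np.isclose(R_total, R_diag + R_upper + R_lower)
    return R_diag / R_total, R_upper / R_total, R_lower / R_total
```

By construction the three fractions sum to one, and a perfectly hierarchical (upper-triangular) model has zero antihierarchical drive.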
We draw the reader's attention to the fact that the arbitrary order chosen for the principal and noise-decorrelated components will define one set of values for R_UPPER and R_LOWER for each model, and that any reordering of those components potentially yields different sets of values. Thus, the P! permutations of those components may change the apparent percentage of hierarchical drive in the system. To distinguish the degree to which any such permutation will lead to a more or less hierarchical model from the degree to which the HD method uncovers the inherent hierarchical structure in the system, we shall use the maximum value of R_UPPER (considered over all permutations of the components) for all models of the principal and noise-decorrelated components. This necessarily results in the minimum possible value for R_LOWER, and guarantees that R_UPPER ≥ R_LOWER.
FIGURE 1. HD analysis of a three-generator hierarchical system constructed from a two-lag MLAR model. (A) Time series of the three hierarchical generators, showing one, two, and three dominant frequencies, respectively. (B) Sixteen time series composed of random linear combinations of the three time series in (A). (C) Time series of the first three principal components obtained from (B). (D) Time series of the three resolved hierarchical components obtained by rotation of the principal components. The ability of the HD method to extract hierarchical sources is evidenced by the virtually identical time series (disregarding signal polarity) in (A) and (D), as compared to the scrambled and dissimilar principal components in (C).
RESULTS
HD Analysis of Simulated Multivariate Time Series
To demonstrate proof of principle of the HD method, we present several analyses of simulated multivariate systems. We first consider systems composed of three generators driven by Gaussian noise and linked in a hierarchical relationship. For the first system, the autoregressive model has nonzero coefficients up to two lags in the past (i.e., L = 2); this situation permits unambiguous identification of the underlying generators (this data set and the code to reproduce Fig. 1 are archived in "HD-ABME-code.zip" at http://www.med.cornell.edu/research/hds). We then limit the autoregressive model to a single lag (i.e., L = 1) and show that this leads to degeneracy in the possible minimizing rotations Q, and consequent lack of uniqueness in the hierarchical decomposition. Finally, we examine a system of three independent generators, with nonzero autoregressive coefficients up to L = 2; this example demonstrates that HD is also capable of extracting dynamically independent components.
To create the first simulated system of three hierarchically related time series, we fix a two-lag hierarchical MLAR model, and drive each generator with independent Gaussian white-noise input, generating three time series of length N = 512. Coefficients above the diagonal in the MLAR matrices, which correspond to hierarchical influences between channels, were randomly chosen from a Gaussian distribution with a mean of zero and unit variance. (It can be shown that for a hierarchical system off-diagonal coefficients do not affect stability. In our simulations, stability was ensured by choosing the MLAR coefficients on the diagonal to correspond to polynomials of order L whose roots have absolute values less than unity.9) Figure 1(A) shows the three resulting time series (the simulated generators). The influence of each generator on the subordinate generators is evident: the first generator exhibits a single dominant frequency, while the second and third generators have two and three dominant frequencies, respectively. (Analysis of the power spectra confirms this.) Estimation of the MLAR coefficients for the simulated generators via the Yule–Walker procedure will approximate (but not equal) the values used to construct the system. That is, the estimated coefficients yield R_LOWER = 0.0003, a good approximation to the intended R_LOWER = 0. By comparison, R_DIAG = 8.0316 and R_UPPER = 4.7208, indicating ≈63% independent drive (i.e., R_DIAG/R_TOTAL ≈ 0.63) and ≈37% hierarchical drive (i.e., R_UPPER/R_TOTAL ≈ 0.37) in the overall signal.
To simulate the kind of mixing of sources that might occur in EEG recordings, we take 16 random linear combinations of the three generators [see Fig. 1(B) and the file "lincomb.m" archived in "HD-ABME-code.zip" at http://www.med.cornell.edu/research/hds]. PCA decomposition of these 16 time series (the first step in the HD analysis) yields components that are remarkably dissimilar to the original generators [see Fig. 1(C)]. Each principal component contains a mixture of frequencies, and the MLAR model of them gives no hint of the hierarchical structure (R_TOTAL = 22.9593, ≈64% independent and ≈24% hierarchical drive). Because these signals are Gaussian, decorrelation by PCA guarantees information-theoretic independence at each time point, and is thus equivalent to ICA. In other words, examination of the higher-order moments at zero lag would not help to reveal the hierarchical structure. By contrast, the explicit use of signal dynamics allows HD to resolve the hierarchically related components: the resulting MLAR model is characterized by ≈64% independent and ≈36% hierarchical drive (R_TOTAL = 12.5284), close to the values obtained from the original generators. Strikingly, the resolved components [see Fig. 1(D)] are virtually identical to the original generators, other than signal polarity. (Amplitudes of the extracted components also match the original generators because we chose equal-variance noises as the driving terms.)
For a three-dimensional system such as this, we can visualize the search for the rotation Q (see the Methods section) by creating a representation of the space of possible three-dimensional rotations. To construct this representation of the rotation space, we recognize that any three-dimensional rotation is specified by an angle of rotation (θ, from 0 to π rad) around a unique axis in space. We can thus represent a three-dimensional rotation as a point r in a solid sphere, where the direction from the center of the sphere to r indicates the axis of the rotation, and the length of this vector represents θ. Figure 2 uses this representation of the rotation space to examine the global behavior of the residual values R_HD for the example outlined in Fig. 1. Notice that there is a local minimum in the neighborhood of the global minimum that provides a potential trap for the HD minimization algorithm.
The second example consists of three hierarchically related generators, but one in which the predefined MLAR model stipulates only single-lag influences (R_TOTAL = 11.8557, ≈7% independent and ≈93% hierarchical drive). Figure 3 shows the time series (N = 512) for this example in the same format as Fig. 1. Again, PCA is unable to resolve the hierarchical generators (R_TOTAL = 1.6037, ≈73% independent and ≈23% hierarchical drive). By contrast, the HD algorithm uncovers the hierarchical components (R_TOTAL = 11.8533, ≈8% independent and ≈92% hierarchical), as is evidenced by the similarity between Figs. 3(A) and 3(D). However, the hierarchical decomposition is not unique: the global behavior of the residuals, as shown in Fig. 4, demonstrates the existence of multiple global minima.
It can be shown that this nonuniqueness is expected for single-lag hierarchical MLAR models. For a general P-component system, there are P! rotations Q (not limited to permuting the generators or changing their signs) that minimize R_HD. These rotations correspond to the P! possible orderings of eigenvalues in the matrix A_HD. (To see this, we observe that transforming A' into upper-triangular form by a rotation amounts to choosing an appropriate orthonormal basis to form the columns of Q. When hierarchical generators exist, application of the Gram–Schmidt procedure to the eigenvectors of A' provides such an orthonormal basis. This is because triangular form for a matrix implies a full set of real eigenvalues and eigenvectors.11 Each of the P! orderings of the P eigenvectors yields a different basis via the Gram–Schmidt procedure, and hence, a distinct rotation Q and a distinct set of L triangular matrices A_{HD:l} = Q A'_l Q^T.) This degeneracy does not occur with L ≥ 2, since rotations Q that make Q A'_1 Q^T triangular do not necessarily make Q A'_2 Q^T triangular.

An extension of this analysis allows us to state the conditions under which a set of matrices A_1, A_2,...,A_L can be simultaneously transformed to upper-triangular form by an orthogonal matrix Q. First, each matrix must have a full set of real eigenvalues, so that each matrix can be individually transformed into upper-triangular form.11 Second, it must be possible to order the eigenvectors for each matrix in such a way that (a) the first eigenvector of each matrix is identical (other than a scale factor), and (b) for each k ≤ P, the first k eigenvectors of
FIGURE 2. Graphical representation of a portion of the rota-tion space for the HD analysis of the system displayed inFig. 1. Each rotation Q is specified by a point r in a solidsphere, where the direction from the center of the sphere tor indicates the axis of the rotation, and the length of thisvector represents u. The residual value RHD for each rotationis represented in color „see scale to the right …. In this portionof space „angle of rotation uÉ1.9478 rad …, we can see both aglobal minimum and a local minimum, which provides a po-tential trap for the HD algorithm.
1142 REPUCCI, SCHIFF, and VICTOR
FIGURE 3. HD analysis of a three-generator hierarchical system constructed from a one-lag MLAR model. „A… Time series of thehierarchical generators. „B… Sixteen time series composed of random linear combinations of the three time series in „A…. „C…
Time series of the first three principal components obtained from „B…. „D… Time series of the three resolved hierarchicalcomponents obtained by rotation of the principal components. Although HD extracts hierarchical sources, the hierarchicaldecomposition for the single-lag case is not unique „see Fig. 4 ….
.ro-
e-siaRs-
m
n-the
each matrix span the samek-dimensional subspaceThese conditions fix the order of the eigenvectors, pvided that the number of lagsL.1.
The final simulated example consists of three indpendent generators, each driven by independent Gausnoise. To simulate independence, the two-lag MLAmodel has no off-diagonal terms. MLAR coefficients etimated from the time series (N5512) generated by thismodel @see Fig. 5~A!# indeed indicate essentially 100%
n
independent dynamics:RDIAG57.5851 and RTOTAL
57.5875. When PCA is applied to a set of 16 randolinear combinations of these generators@see Fig. 5~B!#,the extracted components see@Fig. 5~C!# depend in parton the sizes of the contributions of the individual geerators to each channel, and thus do not fully revealindependent dynamics~RTOTAL57.5888,;94% indepen-dent,;3% hierarchical, and;3% antihierarchical drive!.
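The Gram-Schmidt construction described above (orthonormalize one ordering of the eigenvectors, then use the result as a rotation) is easy to check numerically. Below is a minimal Python/NumPy sketch under our own naming (the paper's archived code is MATLAB): it recovers an orthogonal Q that returns a randomly rotated triangular coefficient matrix to upper-triangular form.

```python
import numpy as np

def triangularizing_rotation(B):
    """Build an orthogonal Q with Q B Q^T upper triangular by applying
    Gram-Schmidt (here, a QR factorization) to the eigenvectors of B,
    taken in one of the P! admissible orderings.
    Assumes B has a full set of real eigenvalues."""
    w, V = np.linalg.eig(B)
    order = np.argsort(w.real)          # pick one ordering of the eigenvalues
    Q_gs, _ = np.linalg.qr(V[:, order].real)
    return Q_gs.T

# A hierarchical (upper-triangular) coefficient matrix ...
A = np.array([[0.6, 0.3, 0.2],
              [0.0, 0.5, 0.4],
              [0.0, 0.0, 0.7]])

# ... hidden by an arbitrary orthogonal mixing U.
U, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((3, 3)))
B = U @ A @ U.T

Q = triangularizing_rotation(B)
T = Q @ B @ Q.T                          # upper triangular again
```

Choosing a different ordering in `order` yields a different admissible Q, which is exactly the P!-fold degeneracy discussed in the text.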
FIGURE 4. Graphical representation of three portions of the rotation space for the HD analysis of the system displayed in Fig. 3. The residual value R_HD for each rotation Q is represented as in Fig. 2. The three panels are chosen to contain three of the global minima for R_HD (θ ≈ 0.1257 rad in panel A, θ ≈ 0.3770 rad in panel B, and θ ≈ 0.6283 rad in panel C). This illustrates the degeneracy in the rotation Q for single-lag models. (Small differences in the color of each minimum result from the rough, finite sampling required by the graphical presentation.)
1143Hierarchical Decomposition of Multivariate Time Series
FIGURE 5. HD analysis of a three-generator independent system constructed from a two-lag MLAR model. (A) Time series of the independent generators. (B) Sixteen time series composed of random linear combinations of the three time series in (A). (C) Time series of the first three resolved principal components. (D) Time series of the three HD-resolved independent components. Here, we see that the HD method is also capable of extracting dynamically independent sources.
Following application of HD [see Fig. 5(D)], we find approximately 100% independent drive (R_TOTAL = 7.5862). However, since there is no hierarchical dependence in this example, the order of the components extracted by HD is arbitrary; here, the second and third generators are interchanged in Fig. 5(D) compared with Fig. 5(A).
HD Analysis of Ictal ECoG Data
Here, we use HD to reanalyze the ictal electrocorticographic records examined by Schiff and co-workers.15 In the analysis of electroencephalographic data, neither the number nor the time course of the underlying generators is known in advance. Therefore, we select the P components that constitute most of the original signal variance (see the Methods section), and are interested in both the degree of the resolved hierarchical structure and the dynamics of each component. We use the traditional AIC (Ref. 1) to determine the order of the MLAR model. Application of a stricter AIC that doubles the relative weighting of the number of degrees of freedom19 typically reduces the model order L by 1 but does not otherwise significantly alter the results (data not shown). Since individual seizure generators are commonly thought of as cyclic (i.e., roughly periodic in time), it is important to point out that the hierarchical relationship between generators, which we seek here, is by no means in contradiction to this view. That is, generators that are related to each other in a hierarchical fashion may themselves be cyclic in time.
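AIC-based selection of the MLAR order L can be illustrated concretely. The sketch below is not the authors' implementation (which is archived MATLAB); names such as fit_mlar, the penalty form, and the simulated system are ours. It fits an order-L MLAR model by least squares and picks the order minimizing an AIC-style criterion.

```python
import numpy as np

def fit_mlar(x, L):
    """Least-squares fit of x[t] = sum_l x[t-l] @ A_l.T + e[t].
    Returns the stacked coefficients and the residual covariance."""
    n, P = x.shape
    Y = x[L:]                                                  # targets
    X = np.hstack([x[L - l:n - l] for l in range(1, L + 1)])   # lagged regressors
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    return coef, resid.T @ resid / (n - L)

def aic(x, L):
    """AIC-style criterion: Gaussian log-likelihood term plus twice the
    number of coefficient parameters (L * P * P)."""
    n, P = x.shape
    _, sigma = fit_mlar(x, L)
    return (n - L) * np.log(np.linalg.det(sigma)) + 2 * L * P * P

# Simulate a known one-lag system and let the criterion pick the order.
rng = np.random.default_rng(2)
x = np.zeros((1024, 2))
for t in range(1, 1024):
    x[t] = 0.8 * x[t - 1] + rng.standard_normal(2)

coef1, _ = fit_mlar(x, 1)                       # should recover ~0.8 * I
best = min(range(1, 6), key=lambda L: aic(x, L))
```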
Strikingly, the components resolved by HD are both proportionally more hierarchically influenced (compare columns labeled "R_UPPER/R_LOWER" in Table 1) and more independently driven (compare columns labeled "%R_DIAG" in Table 1) than the components resolved by PCA. For example, the MLAR model coefficients of the principal components, A in Eq. (3), obtained from the third seizure record of patient 2 are characterized by ≈94.3% independent drive (%R_DIAG) and ≈4.4% hierarchical influence (%R_UPPER), yielding a hierarchical to antihierarchical ratio (R_UPPER/R_LOWER) of 3.4 (sixth row, first column, Table 1). By contrast, the MLAR model coefficients of the hierarchical components (A_HD) reveal that ≈96.2% of the drive is independent (%R_DIAG), ≈3.6% is hierarchical (%R_UPPER), and the ratio of hierarchical drive (R_UPPER/R_LOWER) is 16.3 (sixth row, third column, Table 1). Some, but not all, of this simplification of model structure is due to the reorganization of the model coefficients by the noise decorrelation procedure, which yields model coefficients A'. For this example, noise decorrelation alone yields ≈96.0% independent (%R_DIAG) and ≈2.5% hierarchical drive (%R_UPPER). But, in most instances, noise decorrelation fails to identify a proportionally more hierarchical organization (R_UPPER/R_LOWER is only 1.7), whereas HD extracts components that are both more independent (larger R_DIAG/R_TOTAL, all records) and proportionally more hierarchical (greater ratio of R_UPPER to R_LOWER in 7/9 records), as detailed in Table 1. Moreover, the contribution of each of the L coefficient matrices to the indices
TABLE 1. Summary of the percentage of independent, hierarchical, and antihierarchical drive for the MLAR model coefficients of the principal components decomposition (A), noise-decorrelated PCA decomposition (A'), and hierarchical decomposition (A_HD) for all patient records. The percentages are obtained by normalizing the values for independent (R_DIAG), hierarchical (R_UPPER), and antihierarchical (R_LOWER) drive by their total (R_TOTAL) in each instance, here labeled %R_DIAG, %R_UPPER, and %R_LOWER, respectively. The unitless ratio R_UPPER/R_LOWER reflects the relative proportion of hierarchical influence in the system. Percentages for R_UPPER and R_LOWER for A and A' reflect the most hierarchical ordering of the respective components, as outlined in the Methods section, such that R_UPPER ≥ R_LOWER. All values are rounded to the nearest 0.1 for clarity.

                        Principal components (A)      Noise-decorrelated (A')       Hierarchical (A_HD)
                        %Rd   %Ru   %Rl   Ru/Rl       %Rd   %Ru   %Rl   Ru/Rl       %Rd   %Ru   %Rl   Ru/Rl
Patient 1  Seizure 1    89.6   8.7   1.7    5.2       97.2   1.8   1.1    1.6       98.5   1.2   0.4    3.2
           Seizure 2    69.8  26.2   4.0    6.5       84.4  10.6   5.0    2.1       97.4   2.3   0.3    8.3
           Seizure 3    84.6  12.8   2.7    4.8       88.8   7.7   3.5    2.2       95.9   3.4   0.7    4.7
Patient 2  Seizure 1    82.5  11.9   5.6    2.1       84.1  10.8   5.1    2.1       88.3   9.9   1.8    5.4
           Seizure 2    93.2   6.3   0.6   11.3       96.7   2.4   0.9    2.6       98.1   1.8   0.1   14.7
           Seizure 3    94.3   4.4   1.3    3.4       96.0   2.5   1.5    1.7       96.2   3.6   0.2   16.3
Patient 3  Seizure 1    93.6   4.9   1.5    3.3       94.4   4.0   1.6    2.5       95.4   4.1   0.5    8.5
           Seizure 2    99.7   0.2   0.1    4.0       99.8   0.2   0.0    5.6       99.8   0.2   0.0   10.4
Patient 4  Seizure 1    96.3   2.9   0.8    3.7       98.8   0.8   0.4    1.9       99.7   0.2   0.0   13.3

(Column abbreviations: %Rd = %R_DIAG, %Ru = %R_UPPER, %Rl = %R_LOWER, Ru/Rl = R_UPPER/R_LOWER.)
R_UPPER and R_LOWER are approximately equal in all cases (data not shown), perhaps as a consequence of the fact that the HD algorithm treats all L matrices equally.
As alluded to in the Introduction, examination of the nonlinear dynamics present in the extracted components provides another means to interpret the results of the HD method. To assess the nonlinear dynamics in an individual time series we use the NLAR fingerprint.19 This approach identified common nonlinear dynamics between 3/s spike-and-wave seizure traces,16,17 and the principal components derived from the partial complex seizure records of patients 1 and 2.15 Figure 6 compares the NLAR analyses of the principal components (as analyzed in Ref. 15) with that of the hierarchical components. In Fig. 6, the percent of variance of the original signal that each component explains is shown by the height of a bar. (For PCA this necessarily decreases as the component number increases.) The significance of the nonlinear dynamics identified in each component, as summarized by NDV/V, is shown by the corresponding point on a solid line [the dashed line indicates significant nonlinearities, NDV/V = 4 (Ref. 19)]. Whereas the principal components with the largest NDV/V are typically those components that explain only a small fraction of the variance (leaving their significance unclear), HD analysis reveals that the hierarchical components with the greatest nonlinear structure can be the most autonomous components (lowest numbered component). This is seen in seizure 1 of patient 1, and in seizures 1 and 3 of patient 2.
This latter record is particularly striking because HD analysis uncovers a hierarchical component with significant nonlinearities (NDV/V ≈ 8) even though none of the principal components display significant nonlinear characteristics (NDV/V < 4). We hypothesize that this reflects a more effective separation of underlying generators by HD, and that the components extracted by PCA reveal only diluted effects of the nonlinearity due to mixing. Additionally, five components were retained for seizure 3 of patient 1 (following Schiff et al.15); this is an exception to the guidelines mentioned in the Methods section, because the fifth component has an NDV/V ≈ 5.2. Here, HD analysis with five components exposes significant nonlinear characteristics in the third hierarchical component, which accounts for nearly 50% of the original signal variance. Finally, we note that there are differences in the analyses of individual records within and across patients, both in terms of the relative sizes of the hierarchical components and in the HD components that demonstrate the greatest nonlinearities. We suspect that these differences reflect physiologic differences in the events themselves rather than nonrobustness of the analysis method, since (a) the records themselves appeared artifact free, (b) the results of the analysis are not significantly altered by small changes in the model order,
FIGURE 6. Summary of the NLAR analyses of the principal components (see Ref. 15) and hierarchical components derived from each record of patients 1 and 2. The percent variance explained by each component is shown by the height of a bar, while the maximum NDV/V for that component is shown by the corresponding point on a solid line. The dashed line in each graph indicates NDV/V = 4, the criterion we have used for detection of significant dynamical nonlinearities (see Ref. 19). Often HD analysis reveals that the component with the greatest nonlinear structure is one of the more autonomous components. Patient 2, seizures 1 and 3 are particularly striking examples of this.
and (c) for systems in which the internal structure is known (Figs. 1–5), HD provides an accurate decomposition.
DISCUSSION
Hierarchical Decomposition (HD) Method
In summary, the ability of HD to uncover hierarchically related components results from its fundamental use of the dynamics inherent in any multivariate time series. We use a MLAR model to simultaneously enforce two conditions upon the components. The first assumption is that the noise (i.e., residual values between the observed behavior and the MLAR model) in each component is independent and of equal magnitude. This makes concrete the assumption that the stochastic components of the generators are independent, and it arbitrarily sets a scale for the size of each generator. It is implemented in a single step, and can always be achieved, independent of the dynamics of the system. The second condition seeks a hierarchical relationship among components. It is implemented iteratively, via successive orthogonal rotations of the components, to obtain a set of MLAR matrices that are as triangular as possible.
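The quantity minimized by the second, iterative condition can be stated in a few lines. In Python/NumPy (our own naming; the archived code is MATLAB), the triangularity residual might be sketched as:

```python
import numpy as np

def r_hd(A_list, Q):
    """Triangularity residual: total squared weight below the diagonal of
    the rotated MLAR coefficient matrices Q A_l Q^T, summed over lags.
    HD searches over orthogonal Q for a minimum of this quantity."""
    return float(sum(np.sum(np.tril(Q @ A @ Q.T, -1) ** 2) for A in A_list))

rng = np.random.default_rng(3)
A1 = np.triu(rng.standard_normal((3, 3)))           # a hierarchical system
U, _ = np.linalg.qr(rng.standard_normal((3, 3)))    # an arbitrary rotation

r_identity = r_hd([A1], np.eye(3))   # already triangular: residual is zero
r_rotated = r_hd([A1], U)            # a generic rotation mixes drive downward
```

The residual is zero exactly when every rotated coefficient matrix is upper triangular, i.e., when the components form a strict hierarchy.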
Using simulated systems of hierarchically organized generators, we show that HD is capable of extracting the original sources, whereas PCA or ICA cannot (Figs. 1 and 3). The hierarchical minimization procedure is graphically depicted in Figs. 2 and 4, and highlights two important caveats: (a) finding the global minimum for the model (i.e., the most triangular) requires avoiding various local minima, and (b) models with simple dynamics (i.e., where only single-lag influences are significant) are degenerate, having multiple global minima. The latter point underscores the fact that separation of the generators relies intimately on their dynamics. If the MLAR model is trivial (i.e., L = 0, no significant dynamics in the individual components), then the HD procedure is trivial, too, and the continuum of ambiguities intrinsic to PCA remains. If the MLAR model contains one lag only, then the HD procedure can narrow down possible models of the system to a discrete set of size P! (where P is the number of generators). If the MLAR model contains two or more lags, then the HD decomposition is typically unique, and if the original system has a hierarchical structure, the algorithm will identify it.
In these simulations, we know exactly how many generators (i.e., three) contribute to each 16-channel data set we create. Moreover, the first three principal components necessarily account for 100% of the variance of the system. This highlights two potentially important limitations of the HD method. In real-world data, by selecting only the most important principal components (see the Methods section), we make the explicit compromise of not accounting for all of the variance of the signal (≈20% in the current application to EEG). Although in principle HD analysis could be performed on the raw data, this compromise is often necessary because identification of a minimum in the extremely high-dimensional space of all possible rotations is not likely to be robust. Additionally, one might wonder how HD would behave if performed on fewer components than the number of generators known to be present. Here, the results of HD analysis on numerical simulations provide reassurance (data not shown), since the main features of the decomposition are preserved. For example, three hierarchical sources extracted by HD from a system built with four generators look more similar to the four original generators than do any of the principal components, in so far as they seem to be less arbitrarily mixed.
For many previous approaches to the analysis of multivariate time series (including methods based exclusively upon PCA and related methods), the underlying rationale is that the observed signals represent linear mixtures of independent sources (or generators). In many systems, such as the brain, the observed time series represent mixtures of sources that are not strictly independent, but rather dynamically influence one another, often in stereotyped patterns.18 This is the rationale for the present approach, in which we separate multivariate time series by assuming instead that the sources influence each other in a hierarchical fashion. More generally, the method can be adapted to seek interrelationships of other canonical types (e.g., cyclic) by appropriate modification of Eq. (7). Thus, the HD method may be considered to be a special case of a large set of canonical decomposition procedures.
The use of causality (i.e., the influence of prior observations on the current observation) and its importance in the HD algorithm deserves emphasis. The MLAR technique used by HD is implicitly dependent upon causal relations among the observations in a multivariate time series data set. That is, the MLAR model description of the data relies upon the order of the observations, as it describes the effect of prior observations upon current observations. This description would be obscured by time-reversing or shuffling the data. HD uses these causal relationships to guide the search for a rotation Q (see the Methods section) that yields a hierarchical interrelationship among components. The hierarchy of components delineates a specific type of causal relationship in a data set, namely, that the current value for each component is affected by prior values of itself and more dominant components in the hierarchy.
This approach is in direct contrast to other causal techniques for multivariate data analysis, including Granger causality,7 generalized autoregressive conditional heteroskedasticity (GARCH),3 adaptive multivariate autoregressive modeling (AMVAR),5 and directed
coherence.8 These methods assume that the original time series represents the unmixed sources, and then look for and characterize the degree and type of causal relations among the data. In contrast, HD assumes a general form for the causal interaction, and uses this assumption to direct the decomposition into sources. (Granger7 notes that multivariate data sets can have a "spurious causal chain appearance," and examples of this can be readily constructed by linear mixing of independent time series. However, as pointed out above, for multivariate data sets whose MLAR structure includes more than one lag, reduction to hierarchical form is not guaranteed, and if such a reduction is possible it is typically unambiguous.) From this point of view, HD appears to be unique.
Application of HD to Ictal ECoG
The degree of hierarchical and independent structure in all components derived from ictal ECoG records of patients 1–4 is summarized in Table 1. Examination of Table 1 indicates that the components resolved by HD are both proportionally more hierarchical (i.e., see the columns labeled "R_UPPER/R_LOWER" under each type of component) and more independently driven (i.e., see the columns labeled "%R_DIAG" under each type of component) than the components resolved by PCA. Also evident from Table 1 is the fact that reorganization of the components by noise decorrelation is partially responsible for this simplification of model structure. However, noise decorrelation alone typically results in a more even distribution of off-diagonal coefficients (i.e., values for R_UPPER/R_LOWER that are closer to 1). In addition, notice that the proportion of hierarchical drive in these systems revealed by HD is sometimes quite large, particularly for seizures 2 and 3 of patient 2 (R_UPPER/R_LOWER equals 14.7 and 16.3, respectively). Nevertheless, there is also variability, both from patient to patient and within patients, in the degree of hierarchical relationship between the underlying seizure generators.
Previously, we showed that PCA decomposition of temporal lobe seizure records uncovers components whose nonlinear dynamics, as characterized by NLAR fingerprints, resemble those of the EEG during absence seizures.15 Because the principal components exhibiting significant nonlinear characteristics account for relatively small fractions of the original signal variance, the biological significance of the nonlinear signal was unclear. In contrast, HD analysis of these same records uncovers components containing the same nonlinearities, but they can appear relatively early in the hierarchy (see Fig. 6). The position of these nonlinearities in the hierarchy appears to be independent of the amount of original signal variance that these components explain. Furthermore, HD analysis may clarify the biological significance of these nonlinearities: they possibly reflect deep generators that contribute little to the surface EEG, but affect a large portion of the entire neuronal system. More generally, the HD analyses demonstrate that although the nonlinearities are not always a major contribution to the overall signal variance, they nevertheless may exert influence over all or most of the system.
In addition, the NLAR fingerprints of the hierarchical components support the connection between temporal lobe and absence seizures, and strengthen the suggestion that both seizure types may result from similar underlying neuronal mechanisms.14,15 Significant nonlinearities in these fingerprints appear in the vicinity of 90 and 150 ms,14 as is seen in the NLAR fingerprints of absence seizure traces,17 and as reported earlier for the NLAR fingerprints of the principal components.15 Notice, however, that the nonlinearities tend to be more statistically significant in the hierarchical components than in the principal components. Our findings may help explain why even focal temporal lobe seizures produce global alterations in awareness and behavior: the importance of a neuronal circuit in driving global brain activity (i.e., its position in the hierarchy) may be disproportionate to the fraction of the variance of the surface activity that it explains, especially for generators that are deep. The results of HD modeling are thus consistent with the notion that temporal lobe and absence seizures reflect alteration of selective circuit mechanisms that play a key role in forebrain integration.15 The application of the HD technique to ECoG records of temporal lobe epilepsy suggests that there is an interpretive benefit gained by extracting hierarchical, rather than independent, components.
ACKNOWLEDGMENTS
The authors thank The Lederman Family Foundation and the Biomedical Engineering Program of Cornell University for partial support of this research. M.A.R. is supported by EY7138, N.D.S. is supported by NS02172, and J.D.V. is supported by EY9314 and EY7977.
APPENDIX
This Appendix provides additional details for two aspects of the algorithm we have presented above (see the Methods section). The description is keyed to MATLAB code, archived in "HD–ABME–code.zip" at http://www.med.cornell.edu/research/hds, but does not fully describe the code itself (see embedded documentation within the code for further details). Where the code differs slightly from the calculations presented in this exposition, such differences are noted. We make no claim that this code is optimized but rather hope that it will provide a useful starting point for other researchers.
Determination of the Rotation Angle(un)
This calculation is carried out by the file ‘‘rotation.marchived in ‘‘HD–ABME–code.zip’’ at http://www.med.cornell.edu/research/hds.
For any square matrixA, transformation via a planerotation Ju,v(u) @see Eqs.~6! and ~7!# produces a newsquare matrixJu,v(u)AJu,v
T (u)5A8. This transformationonly alters the elements in both the rows and columnuandv in matrix A—whose elements are denotedai , j . Weare interested in minimizing the sum of squared elemebelow the diagonal inA8—whose elements are denoteai , j8 . Thus we require equations for those elementsai , j8below the diagonal~i.e., such thati . j ! that are alteredby the rotation.@For historical reasons, the code minmizes the sum of squared elementsabovethe diagonal ofA8T, and each rotation induces the transformatJu,v
T (u)ATJu,v(u)5A8T. BecauseJu,v(u) is antisymmet-ric this is fundamentally equivalent to performing throtation as described in Eq.~7!.#
For convenience we can divide the analysis of matrix entries below the diagonal of A' (above the diagonal of A'^T) into five groups of elements (depending on the axes values u and v, some of these groups may be empty). In what follows, s ∈ {u,v}, p = min(u,v), r = max(u,v), and n < p < q < r < t: (1) a'_{s,n}, (2) a'_{t,s}, (3) a'_{r,q}, (4) a'_{q,p}, and (5) a'_{r,p}. Elementary matrix algebra used in conjunction with trigonometric identities shows that groups 1 and 2 will never affect the residual value R_HD [see Eq. (7)]. For group 1 elements,
a'_{u,n}^2 + a'_{v,n}^2 = [a_{u,n} cos(θ) + a_{v,n} sin(θ)]^2 + [a_{v,n} cos(θ) − a_{u,n} sin(θ)]^2 = a_{u,n}^2 + a_{v,n}^2,

and similarly for group 2 elements,

a'_{t,u}^2 + a'_{t,v}^2 = [a_{t,u} cos(θ) + a_{t,v} sin(θ)]^2 + [a_{t,v} cos(θ) − a_{t,u} sin(θ)]^2 = a_{t,u}^2 + a_{t,v}^2.
In contrast, elements in groups 3 and 4, when present, typically affect the residual value R_HD, as will the group 5 element, which is always present.

For groups 3 and 4, we use the trigonometric identities cos^2(θ) = 0.5 + 0.5 cos(2θ), sin^2(θ) = 0.5 − 0.5 cos(2θ), and 2 sin(θ)cos(θ) = sin(2θ), and find:
a'_{r,q}^2 + a'_{q,p}^2 = [a_{r,q} cos(θ) − a_{p,q} sin(θ)]^2 + [a_{q,p} cos(θ) + a_{q,r} sin(θ)]^2
  = a_{r,q}^2 cos^2(θ) − 2 a_{r,q} a_{p,q} cos(θ)sin(θ) + a_{p,q}^2 sin^2(θ) + a_{q,p}^2 cos^2(θ) + 2 a_{q,p} a_{q,r} cos(θ)sin(θ) + a_{q,r}^2 sin^2(θ)
  = (a_{r,q}^2 + a_{q,p}^2) cos^2(θ) + 2 (a_{q,p} a_{q,r} − a_{r,q} a_{p,q}) cos(θ)sin(θ) + (a_{p,q}^2 + a_{q,r}^2) sin^2(θ)
  = (a_{q,p} a_{q,r} − a_{r,q} a_{p,q}) sin(2θ) + 0.5 (a_{r,q}^2 + a_{q,p}^2 − a_{p,q}^2 − a_{q,r}^2) cos(2θ) + 0.5 (a_{r,q}^2 + a_{q,p}^2 + a_{p,q}^2 + a_{q,r}^2).
This equation may be further simplified by combining the sine and cosine terms into a single trigonometric function. (The addition of sine and cosine waves of the same frequency yields another sinusoidal wave at the same frequency, whose amplitude is the square root of the sum of the squared amplitudes of the original waves, and whose phase is the arctangent of the ratio of the amplitudes of the original waves.) Accordingly, we make the substitutions E sin(φ) = (a_{q,p} a_{q,r} − a_{r,q} a_{p,q}) and E cos(φ) = 0.5 (a_{r,q}^2 + a_{q,p}^2 − a_{p,q}^2 − a_{q,r}^2) which, used in conjunction with the trigonometric identity sin(α)sin(β) + cos(α)cos(β) = cos(α − β), leads to

a'_{r,q}^2 + a'_{q,p}^2 = E cos(2θ − φ) + 0.5 (a_{r,q}^2 + a_{q,p}^2 + a_{p,q}^2 + a_{q,r}^2).   (A1)
Additionally, the equation for the group 5 element simplifies in an analogous manner, using the substitutions V sin(ψ) = 0.5 (a_{r,r} − a_{p,p}) and V cos(ψ) = 0.5 (a_{r,p} + a_{p,r}), and the trigonometric identities above:

a'_{r,p} = [a_{r,p} cos(θ) + a_{r,r} sin(θ)] cos(θ) − [a_{p,p} cos(θ) + a_{p,r} sin(θ)] sin(θ)
  = a_{r,p} cos^2(θ) + (a_{r,r} − a_{p,p}) cos(θ)sin(θ) − a_{p,r} sin^2(θ)
  = 0.5 (a_{r,r} − a_{p,p}) sin(2θ) + 0.5 (a_{r,p} + a_{p,r}) cos(2θ) + 0.5 (a_{r,p} − a_{p,r})
  = V cos(2θ − ψ) + 0.5 (a_{r,p} − a_{p,r}).
Finally, we calculate the square of the equation for the group 5 element and are left with

a'_{r,p}^2 = 0.5 V^2 cos(4θ − 2ψ) + V (a_{r,p} − a_{p,r}) cos(2θ − ψ) + 0.25 (a_{r,p} − a_{p,r})^2 + 0.5 V^2.   (A2)
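These closed forms are straightforward to verify numerically. The sketch below (Python/NumPy, our own naming and sign convention) applies an explicit plane rotation J_{u,v}(θ) to a random 5x5 matrix and checks the group 1 and 2 invariances, Eq. (A1), and the group 5 closed form.

```python
import numpy as np

def plane_rotation(P, u, v, theta):
    """Plane rotation acting in the (u, v) coordinate plane, with the
    sign convention a'_{u,n} = a_{u,n} cos(theta) + a_{v,n} sin(theta)."""
    J = np.eye(P)
    c, s = np.cos(theta), np.sin(theta)
    J[u, u] = c; J[v, v] = c
    J[u, v] = s; J[v, u] = -s
    return J

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 5))
p, r, theta = 1, 3, 0.37                 # rotation axes u = p, v = r
n, q, t = 0, 2, 4                        # so that n < p < q < r < t

J = plane_rotation(5, p, r, theta)
Ap = J @ A @ J.T                         # the rotated matrix A'

# Eq. (A1): E sin(phi) and E cos(phi) built from the group 3/4 entries.
Es = A[q, p] * A[q, r] - A[r, q] * A[p, q]
Ec = 0.5 * (A[r, q]**2 + A[q, p]**2 - A[p, q]**2 - A[q, r]**2)
E, phi = np.hypot(Es, Ec), np.arctan2(Es, Ec)
groups34 = E * np.cos(2 * theta - phi) + 0.5 * (A[r, q]**2 + A[q, p]**2
                                                + A[p, q]**2 + A[q, r]**2)

# Group 5 closed form: a'_{r,p} = V cos(2θ − ψ) + 0.5 (a_{r,p} − a_{p,r}).
Vs = 0.5 * (A[r, r] - A[p, p])
Vc = 0.5 * (A[r, p] + A[p, r])
V, psi = np.hypot(Vs, Vc), np.arctan2(Vs, Vc)
group5 = V * np.cos(2 * theta - psi) + 0.5 * (A[r, p] - A[p, r])
```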
Thus, we must minimize, as a function of θ, an expression that consists of the above contribution from Eq. (A2) for the group 5 element, and a contribution from Eq. (A1) for each pair of elements in groups 3 and 4 that are present (elements in a pair are always both absent or both present). Elements in groups 1 and 2 may be neglected, since they contribute a constant that does not depend on θ, as may the constant terms present in Eqs. (A1) and (A2).
In our code, matrix entries in A that contribute to the trigonometric functions governing group 5 (see lines 81–89) and, when necessary, groups 3 and 4 (see lines 91–104) are collected and combined into matrix constants, including E and V, φ, and ψ. The trigonometric equation to be minimized is assembled from Eq. (A2) for each group 5 element (see lines 109–112) and, as needed, from Eq. (A1) for pairs of elements in groups 3 and 4. When elements in groups 3 and 4 are present, the function to be minimized may be quite large (see lines 125–127), whereas the minimization is simpler when groups 3 and 4 are empty (see lines 128–131). Finally, we use a customized bisection routine to find the value for θ that minimizes the complete trigonometric function (see subfunction "trigmin," lines 240–266).
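The complete objective is a finite sum of cosine terms in 2θ and 4θ, so any robust one-dimensional minimizer over [0, 2π) suffices. As a hedged illustration only (this is not the authors' bisection routine; the function and its parameters are ours), one can coarsely sample the objective and then refine the best bracket:

```python
import numpy as np

def minimize_periodic(f, samples=720, iters=60):
    """Minimize a smooth 2π-periodic function: coarse grid search for a
    bracket around the best sample, then ternary-search refinement.
    A sketch only; the archived MATLAB code uses a custom bisection."""
    grid = np.linspace(0.0, 2 * np.pi, samples, endpoint=False)
    i = int(np.argmin([f(th) for th in grid]))
    step = 2 * np.pi / samples
    lo, hi = grid[i] - step, grid[i] + step
    for _ in range(iters):               # bracket shrinks by 2/3 per pass
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

# Example objective of the same form as Eqs. (A1) and (A2): cosines in
# 2θ and 4θ (the phases 0.4 and 0.8 are arbitrary illustrative values).
f = lambda th: np.cos(2 * th - 0.4) + 0.3 * np.cos(4 * th - 0.8)
theta_min = minimize_periodic(f)
fine = np.linspace(0.0, 2 * np.pi, 100001)
grid_min = float(np.min(f(fine)))        # reference value for comparison
```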
Creating Well-Spaced Initializing Rotations (Q_0)
The initializing rotations Q_0 are determined as follows (see file "rotmesh.m" archived in "HD–ABME–code.zip" at http://www.med.cornell.edu/research/hds). For each possible choice of a pair of axes [u,v] we build plane rotations at two rotation angles θ = ±2π/3 (see lines 56–74). Thus, for a system consisting of P components there are d = P(P − 1)/2 possible pairs of axes, and 2d possible initializing rotations Q_0. We randomly order all possible initializing rotations Q_0 (see lines 76–78) and use one for each parallel minimization performed [see Eq. (5) and associated text].
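This construction is small enough to state directly in code. A Python/NumPy sketch (names ours; the archived routine is rotmesh.m in MATLAB) that enumerates the 2d seed rotations:

```python
import numpy as np

def initial_rotations(P):
    """All plane rotations J_{u,v}(±2π/3): one pair of seeds for each of
    the d = P(P−1)/2 axis pairs, giving 2d well-spaced starting points."""
    seeds = []
    for u in range(P):
        for v in range(u + 1, P):
            for theta in (2 * np.pi / 3, -2 * np.pi / 3):
                J = np.eye(P)
                c, s = np.cos(theta), np.sin(theta)
                J[u, u] = c; J[v, v] = c
                J[u, v] = s; J[v, u] = -s
                seeds.append(J)
    return seeds

Q0 = initial_rotations(4)   # for P = 4: d = 6 axis pairs, 12 seed rotations
```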
REFERENCES
1. Akaike, H. A new look at statistical model identification. IEEE Trans. Autom. Control 19:716–723, 1974.
2. Bell, A. J., and T. J. Sejnowski. An information-maximization approach to blind separation and blind deconvolution. Neural Comput. 7:1129–1159, 1995.
3. Bollerslev, T. Generalized autoregressive conditional heteroskedasticity. J. Econometr. 31:307–327, 1986.
4. Borg, I., and P. Groenen. Modern Multidimensional Scaling: Theory and Applications. New York: Springer, 1997, pp. 132–134 and 399–402.
5. Ding, M., S. L. Bressler, W. Yang, and H. Liang. Short-window spectral analysis of cortical event-related potentials by adaptive multivariate autoregressive modeling: Data processing, model validation, and variability assessment. Biol. Cybern. 83:35–45, 2000.
6. Gersh, W., and J. Yonemoto. Parametric time series models for multivariate EEG analysis. Comput. Biomed. Res. 10:113–125, 1977.
7. Granger, C. W. J. Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37:424–438, 1969.
8. Kaminski, M. J., and K. J. Blinowska. A new method of the description of the information flow in the brain structures. Biol. Cybern. 65:203–210, 1991.
9. Kay, S. M. Modern Spectral Estimation: Theory and Application. Englewood Cliffs, NJ: Prentice-Hall, 1988, pp. 141 and 142.
10. Lee, D. D., and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature (London) 401:788–791, 1999.
11. Mirsky, L. An Introduction to Linear Algebra. New York: Dover, 1990, pp. 236–247 and 306–311.
12. Mitra, P. P., and B. Pesaran. Analysis of dynamic brain imaging data. Biophys. J. 76:691–708, 1999.
13. Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. New York: Cambridge University Press, 1992, pp. 412–413 and 463–466.
14. Repucci, M. A., N. D. Schiff, and J. D. Victor. Implications for the dynamics of temporal lobe seizures derived from multivariate autoregressive analysis of electroencephalograms. Proceedings of the Society for Neuroscience 29th Annual Meeting, Miami, FL (242.9), 1999.
15. Schiff, N. D., D. R. Labar, and J. D. Victor. Common dynamics in temporal lobe seizures and absence seizures. Neuroscience (Oxford) 91:417–428, 1999.
16. Schiff, N. D., J. D. Victor, and A. Canel. Nonlinear autoregressive analysis of the 3/s ictal electroencephalogram: Implications for underlying dynamics. Biol. Cybern. 72:527–532, 1995.
17. Schiff, N. D., J. D. Victor, A. Canel, and D. R. Labar. Characteristic nonlinearities of the 3/s ictal electroencephalogram identified by nonlinear autoregressive analysis. Biol. Cybern. 72:519–526, 1995.
18. Steriade, M., P. Gloor, R. R. Llinás, F. H. Lopes da Silva, and M.-M. Mesulam. Report of IFCN committee on basic mechanisms: Basic mechanisms of cerebral rhythmic activities. Electroencephalogr. Clin. Neurophysiol. 76:481–508, 1990.
19. Victor, J. D., and A. Canel. A relation between the Akaike criterion and reliability of parameter estimates, with application to nonlinear autoregressive modeling of the ictal EEG. Ann. Biomed. Eng. 20:167–180, 1992.
20. Wei, W. W. S. Time Series Analysis: Univariate and Multivariate Methods. New York: Addison-Wesley, 1990, 478 pp.
21. Yoshioka, K., F. Matsuda, K. Takakura, Y. Noda, K. Imakawa, and S. Sakai. Determination of genes involved in the process of implantation: Application of GeneChip to screen 6500 genes. Biochem. Biophys. Res. Commun. 272:531–538, 2000.
22. Yule, G. U. On a method of investigating periodicities in disturbed series with special reference to Wolfer's sunspot numbers. Philos. Trans. R. Soc. London, Ser. A 226:267–298, 1927.