context related artefact detection in prolonged eeg_imp

Upload: amitjust

Post on 10-Apr-2018


  • 8/8/2019 Context Related Artefact Detection in Prolonged EEG_IMP

    1/14

    Computer Methods and Programs in Biomedicine 60 (1999) 183–196

    Context related artefact detection in prolonged EEG recordings

    Maarten van de Velde a,*, I. Robert Ghosh b, Pierre J.M. Cluitmans a

    a Eindhoven University of Technology, Medical Electrical Engineering Group, PO Box 513, 5600 MB Eindhoven, The Netherlands
    b Department of Clinical Neurophysiology, St. Bartholomew's Hospital, London, UK

    Received 30 September 1998; received in revised form 12 February 1999; accepted 15 February 1999

    Abstract

    The need for reliable detection of artefacts in raw and processed EEG is widely acknowledged. Although different EEG analysis systems have been described, only a few generally applicable artefact recognition techniques have emerged. This paper tackles the problem of artefact detection in seven 24 h EEG recordings in the intensive care unit (ICU). ICU recordings have received less attention than, e.g., epilepsy monitoring, although recordings in this environment present an interesting application area. The EEG data used here were recorded under the difficult circumstances of an explorative ICU study. The data set includes a diverse set of EEG patterns, as well as EEG artefacts. The study investigates objective artefact detection methods based on statistical differences between signal parameters, using time-varying autoregressive (AR) modelling and Slope detection. In addition to matching the performance of artefact detection against two human observers, the study focuses on the optimal settings for context incorporation by testing the algorithms for different time windows and epoch lengths. Results indicate that a relatively short period (20–40 s) provides sufficient context information for the methods used. The combined AR and Slope detection parameters yielded good performance, detecting approximately 90% of the artefacts as indicated by the consensus score of the human observers. © 1999 Elsevier Science Ireland Ltd. All rights reserved.

    Keywords: EEG; ICU; Artefact detection; Validation; Amplitude analysis; Autoregressive modelling

    www.elsevier.com/locate/cmp

    1. Introduction

    The occurrence of artefacts in the EEG hinders the reliable use of automatic analysis techniques. Pre-processing by manual screening and marking of artefacts is a time-consuming and tedious task, especially in prolonged recordings, though the viewing of events detected by automation and the subsequent confirmation of artefact is not so onerous for human observers. A major problem in computerised processing of the EEG is the non-stationary behaviour of the non-artefactual signal and the fact that some types of artefacts can resemble EEG activity. In addition, a waveform of identical morphology may be correctly categorised as artefactual in one record and as non-artefactual in another. Such waveforms must, therefore, be assessed in clinical context.

    * Corresponding author. Tel.: +31-40-2473288; fax: +31-40-2466508.

    0169-2607/99/$ - see front matter © 1999 Elsevier Science Ireland Ltd. All rights reserved.

    PII: S0169-2607(99)00013-9

    M. van de Velde et al. / Computer Methods and Programs in Biomedicine 60 (1999) 183–196

    From a practical point of view, the problem of non-stationarity and artefact identification may actually lie in the basic differences between human screening and screening by a computer. Visual evaluation is usually performed on relatively long segments of 10–60 s, where artefacts are observed in relation to the ongoing signal. A computerised screening process, on the other hand, should always be based on EEG features obtained from a stationary signal, which requires the use of short epochs of only 1–2 s [1]. Somewhat longer stationary epochs may be found when using adaptive segmentation of EEGs, but in general the segments are still rather short compared to human screening (e.g. [2–4]).

    The signal's behaviour may be modelled by analysing the behaviour of features during segment transitions, thus incorporating the temporal context of the EEG. Constraints (rules) can then be applied to restrict the permitted sequence of segments and to identify distinct segments accordingly [5,6]. A drawback of these methods is the amount of heuristics involved in feature selection and the difficulty of composing an optimal set of rules [7]. An alternative approach to artefact detection is to compare parameters to thresholds that are derived from the statistics of a preceding EEG period. For instance, Flooh et al. [8] took a short period as referential context, using an amplitude threshold of six times the average amplitude in the preceding 10 s. A relatively long context period was used in a study by Brunner et al. [9], where the median of spectral power was calculated over 3 min for the detection of muscle artefact in sleep recordings.
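    As an illustration of such a context-derived threshold, the sketch below flags samples whose absolute amplitude exceeds six times the mean absolute amplitude of the preceding 10 s, loosely following the description of Flooh et al. [8]. It is not the cited implementation: reading "average amplitude" as the mean absolute value, testing per sample, and the function name are all assumptions made for illustration.

```python
from collections import deque


def flag_amplitude_artefacts(signal, fs, context_s=10.0, factor=6.0):
    """Flag samples whose absolute amplitude exceeds `factor` times the mean
    absolute amplitude of the preceding `context_s` seconds.

    Hypothetical helper: "average amplitude" is assumed to be the mean
    absolute value, and no flag is raised until a full context is available.
    """
    window_len = int(round(context_s * fs))
    window = deque()      # absolute values of the preceding samples
    running_sum = 0.0     # sum of the values currently held in `window`
    flags = []
    for x in signal:
        a = abs(x)
        if len(window) == window_len and running_sum > 0.0:
            threshold = factor * running_sum / window_len
            flags.append(a > threshold)
        else:
            flags.append(False)
        # Slide the context window forward by one sample.
        window.append(a)
        running_sum += a
        if len(window) > window_len:
            running_sum -= window.popleft()
    return flags


if __name__ == "__main__":
    fs = 100  # Hz, the sampling rate used in the study
    toy = [1.0] * (12 * fs) + [8.0] + [1.0] * fs   # toy trace with one spike
    marks = flag_amplitude_artefacts(toy, fs)
    print([i for i, m in enumerate(marks) if m])
```

    The running sum keeps the cost per sample constant, which matters for 24 h recordings; whether flagged samples should be excluded from later context windows is left open here.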

    The present study further explores the concept of temporal context in relation to artefact detection, using two complementary detection methods. A time-varying autoregressive (AR) model targets EEG-like artefacts; in particular, the identification of low-frequency artefacts is expected [10,11]. Detection of artefacts in the higher frequency range is performed by Slope analysis (first derivative), which has been successful, for instance, in the detection of muscle artefacts [12,13]. For both methods, temporal context is modelled by reference to the EEG period immediately preceding the test epochs, where detection of significant changes is based on statistical principles for variability tracking (AR) and outlier detection (Slope). Different lengths of the context period are investigated.

    2. Methods

    2.1. Autoregressive modelling

    Autoregressive (AR) modelling of a discrete time series consists of computing the coefficients that represent the correlation of the series s(n) with its preceding samples at sampling times (n−1) to (n−p):

    $$ s(n) = a_0 + \sum_{k=1}^{p} a_k\, s(n-k) + e(n) \tag{1} $$

    where a_0 represents the DC component of s(n), a_1, …, a_p are the AR model coefficients, and e(n) is the residual error. The order p determines the number of unknown variables in the model.

    An optimal solution can be found for a signal period of length N by minimising the residual errors, which can be performed with the ordinary least squares (OLS) method [14]. The N equations that are used in the calculation are first written in vector notation:

    $$ \mathbf{S} = \mathbf{a}_0 + Z\mathbf{a} + \mathbf{e} \tag{2} $$

    S and e are vectors of N elements, and

    $$ \mathbf{a}_0 = \begin{bmatrix} a_0 \\ \vdots \\ a_0 \end{bmatrix}, \quad Z = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ s(1) & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ s(N-1) & s(N-2) & \cdots & s(N-p) \end{bmatrix}, \quad \mathbf{a} = \begin{bmatrix} a_1 \\ \vdots \\ a_p \end{bmatrix} $$


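    The OLS method consists of minimising the quadratic cost function $J = \mathbf{e}^{T}\mathbf{e} = \sum_{n=1}^{N} e^{2}(n)$ through the following set of equations:

    $$ \mathbf{e} = \mathbf{S} - Z\mathbf{a} - \mathbf{a}_0 \tag{3} $$

    and consequently

    $$ J = [\mathbf{S} - Z\mathbf{a} - \mathbf{a}_0]^{T}[\mathbf{S} - Z\mathbf{a} - \mathbf{a}_0] $$

    and

    $$ \frac{\partial J}{\partial \mathbf{a}} = -2Z^{T}[\mathbf{S} - Z\mathbf{a} - \mathbf{a}_0] = 0 $$

    The minimum of J is found for

    $$ Z^{T}Z\,\mathbf{a} = Z^{T}(\mathbf{S} - \mathbf{a}_0) $$

    resulting in the least squares estimate of the model coefficients [14]:

    $$ \hat{\mathbf{a}} = (Z^{T}Z)^{-1}Z^{T}(\mathbf{S} - \mathbf{a}_0) \tag{4} $$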
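    A minimal sketch of the least-squares AR fit in plain Python follows. It is not the authors' code, and several choices are assumptions: the DC component is estimated as the sample mean and removed before forming S and Z (the text does not say how it is obtained), samples before s(1) are set to zero as in the first rows of Z, and the normal equations are solved by Gaussian elimination instead of forming an explicit inverse.

```python
def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    m = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        acc = sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (aug[r][m] - acc) / aug[r][r]
    return x


def fit_ar_ols(s, p):
    """OLS estimate of an AR(p) model in the spirit of Eqs. (2)-(4).

    Assumptions: the DC level is estimated as the sample mean and removed
    first; samples before s(1) are zero (pre-windowing), as in the first
    rows of Z. Returns (dc_level, [a_1, ..., a_p], residuals e(n)).
    """
    n = len(s)
    dc = sum(s) / n
    x = [v - dc for v in s]
    # Row n of Z holds x(n-1), ..., x(n-p); indices before the start are 0.
    z = [[x[i - k] if i - k >= 0 else 0.0 for k in range(1, p + 1)]
         for i in range(n)]
    # Normal equations Z^T Z a = Z^T (S - a_0): Eq. (4) without an explicit inverse.
    ztz = [[sum(z[i][r] * z[i][c] for i in range(n)) for c in range(p)]
           for r in range(p)]
    zts = [sum(z[i][r] * x[i] for i in range(n)) for r in range(p)]
    a = solve(ztz, zts)
    # Residuals as in Eq. (3).
    e = [x[i] - sum(a[k] * z[i][k] for k in range(p)) for i in range(n)]
    return dc, a, e
```

    With a 1 s epoch at 100 Hz (N = 100) and p = 5, the p × p system is tiny, so solving the normal equations directly is cheap; a QR-based solver would be the more robust choice for ill-conditioned data.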

    2.1.1. Optimal model estimation and order selection

    An adequate fit of the original signal is characterised by a residual error term that has the statistical properties of an independent white-noise process. This can be checked by testing for normality [15] using the Shapiro–Wilk statistic [16]. This test is more powerful than other alternatives and provides a sensitive measure of non-normality [17,18]. Minimum power of the residual error process is guaranteed by the OLS method, and the actual values can be calculated from Eq. (3).

    Higher model orders will generally result in smaller errors, but apart from the expense of increasing computing time, a problem of overfitting exists [19,20]. To find a compromise, Akaike's information criterion (AIC [21]) includes the log-likelihood of the normalised error while penalising increasing orders p:

    $$ \mathrm{AIC}(p) = \ln\frac{\sigma_e^{2}}{R_0} + \frac{2p}{N} \tag{5} $$

    where σ_e² is the error variance, R_0 the autocorrelation (at lag zero), or power, of s(n), and N the length in samples of the EEG period.

    The best model order p minimises the AIC for 1 ≤ p < 3√N, which is a practical test range [19]. Previous investigations in EEG reveal optimal model orders between p = 2 and p = 15 (e.g. [22,23]). Relatively low orders of p = 5 or 6 have been reported consistently in various types of EEG [24–26].
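    The order selection can be sketched in the same way. The helpers below (hypothetical names, plain Python) fit every candidate order by OLS on the mean-removed series, take the mean squared residual as σ_e² and the mean squared sample value as R_0, evaluate Eq. (5) for 1 ≤ p < 3√N, and return the minimising order. Removing the mean before computing R_0 is an assumption.

```python
import math


def _solve(a, b):
    """Gaussian elimination with partial pivoting for a x = b."""
    m = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        acc = sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (aug[r][m] - acc) / aug[r][r]
    return x


def _residual_power(x, p):
    """Mean squared OLS residual of an AR(p) fit to the zero-mean series x,
    using a pre-windowed lag matrix as in Eq. (2)."""
    n = len(x)
    z = [[x[i - k] if i - k >= 0 else 0.0 for k in range(1, p + 1)]
         for i in range(n)]
    ztz = [[sum(row[r] * row[c] for row in z) for c in range(p)]
           for r in range(p)]
    zts = [sum(z[i][r] * x[i] for i in range(n)) for r in range(p)]
    a = _solve(ztz, zts)
    resid = [x[i] - sum(a[k] * z[i][k] for k in range(p)) for i in range(n)]
    return sum(r * r for r in resid) / n


def select_order_aic(s, p_max=None):
    """Return (best_p, {p: AIC(p)}) using Eq. (5) over 1 <= p < 3*sqrt(N)."""
    n = len(s)
    mean = sum(s) / n
    x = [v - mean for v in s]              # mean removal: an assumption
    r0 = sum(v * v for v in x) / n         # R0: lag-zero autocorrelation (power)
    if p_max is None:
        p_max = math.ceil(3 * math.sqrt(n)) - 1   # largest integer p < 3*sqrt(N)
    aic = {p: math.log(_residual_power(x, p) / r0) + 2 * p / n
           for p in range(1, p_max + 1)}
    best = min(aic, key=aic.get)
    return best, aic
```

    For a 1 s epoch at 100 Hz (N = 100) the default range is p = 1 … 29. In practice one would average AIC(p) over many epochs before choosing a single order for the whole analysis.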

    2.1.2. EEG variability detection

    Theoretically, the AR coefficients describe the EEG signal in each epoch. However, the requirements of independence and normality of the error term will most likely not be met during abnormal and artefact periods. The AR model will try to adapt to changes and any disturbance, which will be reflected in both the coefficients and the residuals. Therefore, using coefficients or errors alone is not enough for detection of changes in the EEG [26,27].

    We calculate the multidimensional vector F_i in every epoch of 1 s, which is assumed to be stationary [1]. The vector F_i consists of the AR coefficients a_1, …, a_p and includes the mean µ_err and standard deviation σ_err of the residual error, normalised relative to 1 µV. All vector components are weighted equally in the model. Initial explorations in the current data set confirmed that the model's individual components were approximately of the same order of magnitude (varying below 5 µV for normal EEG; see also e.g. [28]).

    Variability tracking is performed on two consecutive EEG periods that shift forward in time, designated the context window and the test window (see Fig. 1). In both periods, the variance of the Euclidean distances between the F_i and their average is calculated. A high variance is expected when artefacts are encountered; its statistical significance is examined by comparing the test variance to the context variance. For two independent normal processes, the ratio of variances σ_2²/σ_1² follows an F-distribution, having N_2−1 and N_1−1 degrees of freedom for the test and context period, respectively. The one-sided 100(1−α)% upper confidence limit is found from a standard table for the F-distribution [29]:

    $$ \frac{\sigma_2^{2}}{\sigma_1^{2}} \le f_{\alpha,\,N_1-1,\,N_2-1} \tag{6} $$

    The length of the test window is fixed at N_2 = 10 s; the context length is varied over N_1 = 10, 20, 40, 80 and 160 s, corresponding to f_{0.01, N_1−1, N_2−1} = 5.35, 4.81, 4.57, 4.40 and 4.31, respectively. This approach incorporates all parameters of the AR estimation and tests Eq. (6) at significance level α = 0.01.
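    The variability test can be sketched as follows, assuming the per-epoch feature vectors F_i (AR coefficients plus residual mean and standard deviation) have already been computed, e.g. with an OLS fit such as Eq. (4). The critical values are the f_0.01 values quoted above for a 10-epoch test window; the one-epoch step of the sliding windows, the guard for a zero context variance, and all function names are assumptions.

```python
import math

# f_0.01 values quoted in the text for a test window of N2 = 10 epochs,
# keyed by the context length N1 (number of 1 s epochs).
F_CRIT_001 = {10: 5.35, 20: 4.81, 40: 4.57, 80: 4.40, 160: 4.31}
TEST_LEN = 10


def distance_variance(vectors):
    """Sample variance of the Euclidean distances between each F_i and the
    average vector of the window."""
    n, dim = len(vectors), len(vectors[0])
    centre = [sum(v[d] for v in vectors) / n for d in range(dim)]
    dists = [math.dist(v, centre) for v in vectors]
    mean_d = sum(dists) / n
    return sum((d - mean_d) ** 2 for d in dists) / (n - 1)


def variability_change(context, test):
    """True if var(test) / var(context) exceeds the tabulated f_0.01 (Eq. (6))."""
    if len(test) != TEST_LEN:
        raise ValueError("the text fixes the test window at N2 = 10 epochs")
    f_crit = F_CRIT_001[len(context)]   # KeyError for untabulated context lengths
    var_ctx = distance_variance(context)
    var_test = distance_variance(test)
    if var_ctx == 0.0:
        return var_test > 0.0           # any spread after a perfectly flat context
    return var_test / var_ctx > f_crit


def track(features, context_len=20, step=1):
    """Slide adjacent context and test windows over per-epoch feature vectors.

    Yields (index of the first test epoch, decision). The step size is a
    hypothetical choice; the text only says the windows shift forward in time.
    """
    for start in range(context_len, len(features) - TEST_LEN + 1, step):
        context = features[start - context_len:start]
        test = features[start:start + TEST_LEN]
        yield start, variability_change(context, test)
```

    Because the distances are measured to each window's own average, a sustained but steady artefact produces little spread in the test window; this matches the behaviour reported for prolonged 50 Hz interference in Section 3.2.1.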

    2.2. Detection of short transients

    Another statistical approach to EEG validation is based on the assumption that the occurrence of artefacts is reflected in changing statistical properties of amplitude parameters. In the current study, we used the Slope parameter to target short transients. This parameter is simple to implement, yet very sensitive to high-frequency artefact (see, e.g. [30–32]).

    A straightforward statistical implementation has been used here. In each epoch, the maximum Slope (first derivative) between all pairs of successive samples is calculated; over a context window of epochs, these maxima form the Slope histogram. The histogram is expected to follow a normal distribution during normal data conditions [33]. A high-amplitude threshold can then be set at (µ + 3σ), based on the mean µ and standard deviation σ, for outlier detection in the test window (see Fig. 2). The confidence interval (−∞, µ + 3σ] defines the range in which the parameter values are considered normal. For a normal distribution, this range encompasses 99.9% of the probability, which promises high specificity (few false detections).

    The unit epoch length for processing was chosen at 1 s, identical to the autoregressive method. Apart from this epoch length being accepted as stationary [1], 1 s is also optimal for the accuracy of detection, as shown e.g. for muscle artefact [34].

    The Slope detection process was performed analogously to the AR approach: the reference histogram was obtained from the context window, for context lengths of N = 10, 20, 40, 80 and 160 s. For increasing N, the precision of the threshold estimate (µ + 3σ) increases, which should lead to improved hypothesis testing.
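    A sketch of the Slope outlier test, under stated assumptions: the maximum Slope of an epoch is taken as the largest absolute difference between successive samples (amplitude units per sample), epochs do not overlap, and σ is the sample standard deviation of the context values. The function names are hypothetical. The final constant checks the one-sided normal coverage of (−∞, µ + 3σ], which is Φ(3) ≈ 0.99865, i.e. the roughly 99.9% quoted above.

```python
import math
import statistics


def max_slopes(signal, fs, epoch_s=1.0):
    """Maximum absolute difference between successive samples in each
    non-overlapping epoch; a trailing partial epoch is dropped."""
    n = int(round(epoch_s * fs))
    slopes = []
    for start in range(0, len(signal) - n + 1, n):
        seg = signal[start:start + n]
        slopes.append(max(abs(b - a) for a, b in zip(seg, seg[1:])))
    return slopes


def slope_outliers(context_slopes, test_slopes, k=3.0):
    """Flag test epochs whose maximum slope exceeds mu + k * sigma of the
    context window. Returns (flags, threshold)."""
    mu = statistics.mean(context_slopes)
    sigma = statistics.stdev(context_slopes)   # sample standard deviation
    threshold = mu + k * sigma
    return [v > threshold for v in test_slopes], threshold


# One-sided normal coverage of (-inf, mu + 3 sigma]: Phi(3), about 0.99865.
COVERAGE_3SIGMA = 0.5 * (1.0 + math.erf(3.0 / math.sqrt(2.0)))
```

    With a longer context, µ and σ are estimated more precisely, but a long run of interference inside the context also widens σ and raises the threshold; Section 3.2.2 reports this effect for patient 38.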

    2.3. Data set

    The data used here are EEG registrations measured in a feasibility study in the intensive care unit (ICU) of Kuopio University Hospital, Finland. The recordings were approved by the Medical Ethics Committee; informed assent was obtained from the patients' relatives. Five patients (male, age range 19–78 years) were included in this study; two were monitored twice, resulting in seven 24 h recordings. These data are publicly available from the fully annotated data library (DL) that was acquired in an international collaboration, the IMPROVE DL [35]. The EEG data in the DL present a wide range of patterns and may be considered reasonably representative of EEG recordings in the ICU.

    The EEG investigations were restricted to two channels, C3-P3 and C4-P4 (10–20 system), as only globally representative cerebral changes were being assessed. As a minimal set, these parasagittal derivations are also known to show the fewest artefacts in a clinical setting [36]. Standard Ag/AgCl electrodes were used. Electrode impedance was kept low, and electrodes were reapplied when checks on sustained artefacts suggested deterioration. The input amplitude range was ±200 µV at a sampling frequency of 100 Hz, using a 2nd-order low-pass filter with a 25 Hz cut-off frequency. A comprehensive review of procedures and technical details is given by Thomsen et al. [37].

    2.4. Evaluation

    2.4.1. Visual artefact assessment

    Two experienced human observers were involved in the visual screening of all data, which was performed on a high-resolution computer display showing only one channel in pages of 10 s. This means that channels C3 and C4 were each scored without bias from the other channel. The observers were well trained in artefact assessment of clinical EEGs and worked independently through the data, browsing page by page. Artefact pages were classified as moderate artefact or severe artefact and scored accordingly by a button-push. The evaluation included on average 7,500 pages per channel per patient, amounting to a total of more than 100 000 pages.

    Fig. 1. Variability tracking: an autoregressive model of order p is fitted every ith epoch, yielding vectors F_i that consist of AR … and standard deviation σ_err of the residual errors (the arrows depict the (p+2)-dimensional vectors F_i and their average …) … distances between F_i and C(N) is calculated. This procedure is performed in both the context window and the test window … significant changes in signal variability.

    Fig. 2. Detection of slope outliers: in every ith epoch the maximum slope is calculated, resulting in a distribution with mean µ_S and standard deviation σ_S in the context window. The threshold of (µ_S + 3σ_S) is then used in the test window to detect short transients.

    Scoring was performed according to the following guidelines:

    • No score was given to pages showing a distinct EEG signal, allowing for minor artefacts (i.e. of very short duration or low amplitude, e.g. minor 50 Hz/muscle activity).
    • Moderate artefact was scored in signal pages showing artefacts of relatively small amplitude, a total artefact time of less than 1 s, or only one or two short electrode-pop artefacts.
    • Severe artefact was assigned to all other artefact pages: large-amplitude artefacts, 50 Hz interference and muscle activity of larger amplitude (twice the background amplitude).

    In general, artefact scoring is not a sharply defined procedure; indeed, these guidelines were designed to allow for some subjectivity while trying to capture most of the artefacts. In view of the amount of data, the exercise was kept relatively simple, while still obtaining accurate artefact markings.

    2.4.2. Performance measures

    The above procedure and methods allow us to evaluate the performance of both observers and computer in percentages of time. In brief:

    Sensitivity is defined as the percentage of true artefacts (true according to the observer) that are marked correctly by the detection algorithm, indicating the detection power of the method. Positive prediction is the accompanying measure that indicates the percentage of automatic markings that the observer(s) consider to be true artefacts.

    A comparable measure is specificity, which assesses the reliability of leaving unmarked those pages that do not contain any artefacts; i.e. a low false detection rate results in a high specificity.
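    Because every page lasts 10 s, percentages of time reduce to page counts. The sketch below computes the three measures from page-level boolean markings. The consensus reference (pages marked by both observers) and all function names are illustrative assumptions rather than the study's software.

```python
def consensus(obs1, obs2):
    """Pages marked as artefact by both observers."""
    return [a and b for a, b in zip(obs1, obs2)]


def performance(reference, detected):
    """Sensitivity, positive prediction and specificity in percent.

    Both arguments are equal-length sequences of booleans, one per 10 s page
    (True = artefact). All pages have the same duration, so page counts are
    proportional to time.
    """
    if len(reference) != len(detected):
        raise ValueError("markings must cover the same pages")
    pairs = list(zip(reference, detected))
    tp = sum(1 for r, d in pairs if r and d)
    fn = sum(1 for r, d in pairs if r and not d)
    fp = sum(1 for r, d in pairs if d and not r)
    tn = sum(1 for r, d in pairs if not r and not d)

    def pct(num, den):
        return 100.0 * num / den if den else float("nan")

    return {
        "sensitivity": pct(tp, tp + fn),          # true artefacts that were marked
        "positive_prediction": pct(tp, tp + fp),  # markings that are true artefacts
        "specificity": pct(tn, tn + fp),          # artefact-free pages left unmarked
    }
```

    The inter-observer agreement score of Section 3.1.1 is the same sensitivity computation, with one observer's markings passed as `reference` and the other's as `detected`.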

    Subjective differences in the interpretation of lengthy phenomena may be magnified in the performance measures. However, this evaluation is objective with respect to the question: what average length of EEG context is adequate for the detection of artefacts?

    3. Results

    3.1. Evaluation by human observers

    3.1.1. Matching the observers' artefact scoring

    The observers marked approximately 1,000 artefacts (80 min per channel) in each of the patients, on average 7% of total recording time. Observer #1 was the more critical of the two and scored significantly more artefact pages than observer #2, especially in recordings 34, 35 and 36. This is also obvious from the inter-observer comparison depicted in Fig. 3: #2 scored less than 60% of the artefacts of #1 in those recordings.

    The agreement score, or consensus, represents the sensitivity towards the other observer's scoring. Mean consensus was 76%, which includes all artefact markings. The differences in artefact assessment were mostly caused by the subjective interpretation of 50 Hz interference. Some lengthy periods of this type of artefact were marked by observer #1 as severe (recordings 34, 36) or moderate (35) and were not marked by observer #2 because of relatively distinct EEG patterns. When correcting for these periods, the agreement score reached well over 80% (correction not shown in graph). Lengthy periods of serious distortion in patient 38, including 50 Hz interference, were marked by both observers.

    The general agreement of the observers, as well as the subjective interpretation of signals and guidelines, is further illustrated by comparing the scores for severe artefact periods. The resulting higher overlap of the observers' markings is also indicated in Fig. 3. This effect was largest in the scores for patient 35. In this patient, only 2% of the recording was scored as artefact by both observers, whereas an extra 2 h of 50 Hz interference in channel C3 was marked only by observer #1 (adding 4% artefact time).

    The consensus about valid EEG periods was very high: typically, 95–99% of the periods left unmarked by one observer were also left unmarked by the other. In part this is explained by the low occurrence of artefacts relative to the length of the recordings. The number of markings that did not match was rather small compared to the 7,500 pages in an average recording.

    3.1.2. Artefacts and patients

    A previous exploration of the data set had resulted in an initial classification of artefacts. The annotations had been made on a 1 min time scale. Artefact occurrence was found to consist of sustained artefacts (71%), brief electrode artefacts (21%), 50 Hz interference (6%) and scalp muscle potentials (2%). The absence of eye movement artefacts and the relative paucity of scalp EMG potentials reflected the chosen electrode derivations and the medicated or pathologically obtunded state of the patients. Nursing and medical interventions and patient coughing were responsible for 78% of the artefacts. Most artefacts resolved rapidly without attending to electrodes [38]. Although no direct comparison could be performed because of different methodology, the observers of the current study acknowledged those earlier findings. The current study focussed on the time resolution of artefact detection, using a higher resolution for scoring. Therefore, scores and derived measures are necessarily different (see also Refs [37,39]).

    The patients had been admitted to the ICU with a diagnosis of multiple organ failure (definitions in Ref [40]). Recordings 32 and 34 were of the same patient (age 69), showing generally attenuated EEG; the patient eventually died 7 days after the second recording. Recordings 33 and 36 were of a cardiac patient (age 78), without gross abnormalities in the EEG. A presumed drug effect resulted in a burst-suppression (BS) pattern in patient 35 (age 19), who received a loading dose of thiopental before the recording. His ICU diagnosis was status epilepticus with suspected encephalitis, and the EEG generally showed high-amplitude, irregular patterns. Neither of the observers scored the BS pattern as artefact, nor did the automatic methods. Patient 37 (age 39) showed low-amplitude EEG (diagnosis: Escherichia coli meningitis, hydrocephalus, septic shock). He died 10 days after the recording. Patient 38 (age 29) did not show any grossly abnormal EEG features.

    Fig. 3. Inter-observer comparison: the consensus or agreement score for marking of artefacts in the different patients (recordings 32/34 and 33/36 are the same patient). Consensus was high for severe artefacts.

    3.2. Performance of automatic methods

    3.2.1. Autoregressive modelling

    Order selection and model validation. Before starting the evaluation of the variability tracking method, the autoregressive model was examined for optimal order and normality of residual errors. These analyses were performed on the first 3 min of every recording, testing both channels. Fig. 4 shows the grand-averaged data for the AIC, which show a minor but obvious inclination towards AR order 5 as optimal. Therefore, this order was used in all subsequent calculations.

    Fig. 4. Optimal order estimation for the AR model using the Akaike information criterion (AIC).

    The normality test was performed using a C translation of Royston's implementation of the Shapiro–Wilk test [41]. Overall, 80% of the error series were accepted as statistically normal (α = 0.05). Normality was lower in patients 35 and 36, where the recording started with artefact periods. This further validates the inclusion of the error parameters in the AR variability tracking.

    AR detection results. The detection of statistically different EEG pages was performed by testing the F-statistic as described in the Methods section. The average variance of the AR vectors in the artefactual pages was significantly higher than in the unmarked pages, but the method proved to be rather insensitive to artefact detection in general. Two observations were made: (1) the method was most successful for artefacts of higher amplitude, and (2) the detected artefacts were generally in the lower frequency range and were often spread over several pages. Therefore, the performance measures in terms of time were found to underestimate the true detection power, since the variability tracking only detected the start of multiple-page artefacts. For instance, the variability in a prolonged 50 Hz signal reduces to zero. This problem could be solved by an algorithm that halts the context window until the artefact is over. However, this was found to be a rather intricate addition to the current model. Moreover, it would invalidate the investigation of different context lengths: the period between artefacts often did not allow the context model to be reinstated.

    Fig. 5 shows the performance of the AR method for detection of artefact onset versus the consensus of the observers. The consensus incorporated all artefacts that were marked by both observers, regardless of whether they were moderate or severe. The average sensitivity exceeded 50% only for context lengths of 10 and 20 s. The positive prediction for these contexts was significantly different from the neighbouring series (ANOVA, α = 0.01). The increasing overlap for 40, 80 and 160 s context was also evident from the increasingly high non-significance.

    AR detection of only the severe artefacts was characterised by approximately 20–40% higher sensitivity values. However, the corresponding predictive accuracy was below 10%.

    3.2.2. Slope detection

    Slope detection was very successful in the ICU EEG data. The results are indicated in Fig. 6, showing detection performance versus the consensus of the observers.

    The sensitivity showed acceptably high values in all patients except patient 38. In this patient, the relatively long periods of (consensus) interference artefact resulted in a very wide distribution of Slope values, causing an insensitive threshold. Overall, Slope detection performance was no different for artefact onset alone: foremost, the method detected short-duration, transient artefacts. Average sensitivity was highest when using a 20 s context length (76%), rising to 84% when excluding patient 38.

    The variance of positive prediction increased with longer context windows. At the same time, average prediction decreased: the performance did not improve. However, no statistical significance was found.

    3.2.3. Combined AR and Slope detection

    The results of the combined methods are given in Table 1, for a context of 20 s. Selection of the 20 s context was based on the observations above: an AR sensitivity of 51% (at an acceptable 0.3 true artefact prediction rate) and the highest Slope performance. The detection process was generally characterised by Slope detection of high-frequency artefacts and AR detection of lower-frequency artefacts. The average sensitivity increased to 89%, which is 5% higher than the average indicated in Fig. 6 for Slope detection alone. In the individual patients, the AR method contributed a 2–10% improvement to detection power.

    The specificity of detection was generally very high: 93–99% of valid EEG pages were left unmarked by both the Slope and the AR method.
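    The text does not spell out how the markings of the two detectors were merged. A page-level union (logical OR) is consistent with the reported gain in sensitivity over Slope detection alone, so this sketch assumes it; the function names are hypothetical, and the toy markings in the usage lines are invented for illustration, not data from the study.

```python
def combine_markings(ar_marks, slope_marks):
    """Page-level union of AR and Slope detections (assumed combination rule)."""
    if len(ar_marks) != len(slope_marks):
        raise ValueError("markings must cover the same pages")
    return [a or s for a, s in zip(ar_marks, slope_marks)]


def sensitivity(reference, marks):
    """Percentage of reference artefact pages that are marked."""
    total = sum(1 for r in reference if r)
    hits = sum(1 for r, m in zip(reference, marks) if r and m)
    return 100.0 * hits / total if total else float("nan")


if __name__ == "__main__":
    # Toy markings for 20 pages (invented for illustration, not study data).
    reference = [True] * 10 + [False] * 10
    slope = [True] * 8 + [False] * 11 + [True]   # misses two artefact pages
    ar = [False] * 8 + [True] + [False] * 11     # catches one of them
    both = combine_markings(ar, slope)
    print(sensitivity(reference, slope), sensitivity(reference, both))
```

    A union can only raise sensitivity and can only lower specificity, which fits the pattern of AR adding a few percent of detection power while the combined specificity stays high when AR marks few artefact-free pages.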

    4. Discussion

    Signal monitoring in the ICU frequently presents a good mix of biologic, technologic and extrinsic artefacts [42]. Validation of EEG data acquired under such difficult conditions is imperative for automatic analysis and its incorporation into routine practice [43].

    The current study aimed at the detection of all artefacts in the EEG subset of the IMPROVE data, focussing on context resolution. The methods were based on statistical rules designed for objective detection of outlier phenomena in the EEG. Two observers were involved in scrutinising the 24 h recordings at a 10 s time resolution. Observer 1 scored a total artefact time of 7.7%; observer 2 scored 5.7% as artefact.

    Subjective interpretation is a general problem in EEG evaluation studies [44]. For instance, small artefacts in the delta frequency range amid a (normal) background of larger amplitude can be underestimated even by experienced observers [11]. Therefore, the consensus score of the observers was used to test the performance of the automatic algorithms. The performance measures were defined to reflect the percentage of time of correct detection.

    In general, the detection was highly specific, which is partly an effect of defining the performance measures in terms of time, in combination with the low occurrence of artefacts. We acknowledge in retrospect that valid EEG periods were largely left unmarked by the automatic methods, i.e. implying high specificity.

    Fig. 5. Artefact detection using time-varying autoregressive variability tracking. Ellipses indicate the (µ + σ) probability contours (mean + standard deviation) for each series. (*) denotes a significant difference in positive prediction for 10 s, 20 s context length …

    Fig. 6. Performance of detection for the Slope amplitude method: detection versus context lengths. Ellipses indicate the (µ + σ) probability contours (excluding the outlier values of patient 38). No change in performance was observed beyond 80 s context length.

    The results also show that the Slope parameter detected most of the artefacts, and indicate that long context lengths were not needed for the investigated data set. The time-varying autoregressive variability tracking method was only moderately successful. Nevertheless, when both methods were combined, AR contributed up to 10% sensitivity by detecting low frequency artefacts. The overall performance reached 89% sensitivity and 53% positive prediction. The latter figure implies that approximately half of the automatic markings do not indicate artefacts. However, positive prediction is somewhat adversely affected by the use of the observers' consensus as reference, since the consensus may have excluded some true artefacts. In addition, it seems sensible to err towards high sensitivity (at the cost of lower positive prediction): observers could then visually analyse the events detected automatically and categorise them as artefact or non-artefact, knowing that very few artefacts were missed by the automatic methods. If the eventual aim were to develop event detection rather than artefact detection, the positive prediction would be greatly increased.
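    The following sketch shows how the contribution of the AR method to sensitivity can be measured when the two detectors are combined, and how a consensus reference can depress positive prediction. It assumes an OR rule for the combination (a second counts as detected if either method marks it) and a consensus rule in which both observers must agree; both rules, the helper names and all markings are hypothetical toy data, not the study's procedure or results.

```python
import numpy as np

def sensitivity(auto, ref):
    """Fraction of reference artefact time that is marked automatically."""
    auto, ref = np.asarray(auto, dtype=bool), np.asarray(ref, dtype=bool)
    return np.sum(auto & ref) / np.sum(ref)

def positive_prediction(auto, ref):
    """Fraction of automatically marked time that is reference artefact."""
    auto, ref = np.asarray(auto, dtype=bool), np.asarray(ref, dtype=bool)
    return np.sum(auto & ref) / np.sum(auto)

n = 1000  # hypothetical recording length in seconds
obs1 = np.zeros(n, dtype=bool)
obs1[100:200] = True
obs1[500:540] = True        # marked by observer 1 only
obs2 = np.zeros(n, dtype=bool)
obs2[100:200] = True
consensus = obs1 & obs2     # assumed consensus rule: both observers agree

slope = np.zeros(n, dtype=bool)
slope[100:185] = True
slope[500:540] = True
ar = np.zeros(n, dtype=bool)
ar[180:200] = True          # e.g. a low-frequency tail missed by Slope

combined = slope | ar       # assumed combination rule: either method
ar_gain = sensitivity(combined, consensus) - sensitivity(slope, consensus)
print(round(float(ar_gain), 2))                                   # 0.15
print(round(float(positive_prediction(combined, consensus)), 2))  # 0.71
print(round(float(positive_prediction(combined, obs1)), 2))       # 1.0
```

    The 40 s marked by observer 1 alone count as false positives against the consensus, although they are accepted by observer 1; this is one way a consensus reference lowers positive prediction.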

    Based on the current findings, the EEG-like deviations found by AR variance detection in particular may be defined as events rather than artefacts. In clinical recordings, event detection therefore not only includes the artefacts, but may also highlight the most interesting parts of the recording. As a discriminating criterion, higher AR-variability scores will more likely indicate (low frequency) artefacts. Interestingly, AR-based analysis combined with variance testing was also used in an early method by Vachon et al. (1978) [45]. They used an F-ratio of the error variances only, calculated within the residual array of the AR model (1 s epoch, p = 5). At significance levels α = 0.05 and 0.10, they concluded that the detected non-stationary waveforms also needed additional pattern recognition. The current approach incorporated all parameters and residuals of the AR estimation and tested Eq. (6) at significance level α = 0.01, while incorporating longer context periods.
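    A residual-variance F-ratio of the kind used by Vachon et al. can be sketched as follows. The sketch assumes a least-squares AR(p) fit per epoch (via `numpy.linalg.lstsq`), a one-sided F-test of the test epoch's residual variance against that of a reference epoch, degrees of freedom equal to the number of residuals minus one, and a hypothetical 100-sample epoch; the function names are illustrative. It is not the paper's Eq. (6), which according to the text incorporates all AR parameters and residuals.

```python
import numpy as np
from scipy import stats

def ar_residuals(epoch, p=5):
    """Least-squares fit of x[t] = a_1 x[t-1] + ... + a_p x[t-p] + e[t];
    returns the residual array e[p], ..., e[N-1]."""
    x = np.asarray(epoch, dtype=float)
    n = len(x)
    # Column k-1 of Z holds the lagged samples x[t-k] for t = p, ..., n-1.
    Z = np.column_stack([x[p - k:n - k] for k in range(1, p + 1)])
    y = x[p:]
    a, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return y - Z @ a

def residual_variance_test(ref_epoch, test_epoch, p=5, alpha=0.05):
    """One-sided F-test: is the test epoch's residual variance larger?"""
    e_ref = ar_residuals(ref_epoch, p)
    e_test = ar_residuals(test_epoch, p)
    f_ratio = np.var(e_test, ddof=1) / np.var(e_ref, ddof=1)
    f_crit = stats.f.ppf(1 - alpha, len(e_test) - 1, len(e_ref) - 1)
    return f_ratio, f_crit, bool(f_ratio > f_crit)

rng = np.random.default_rng(0)
ref_epoch = rng.standard_normal(100)         # hypothetical 1 s epoch at 100 Hz
test_epoch = 5.0 * rng.standard_normal(100)  # same noise, 5x amplitude
f_ratio, f_crit, flagged = residual_variance_test(ref_epoch, test_epoch)
print(flagged)  # True
```

    Applying the reference epoch's fitted model to the test epoch, instead of refitting, would be an equally plausible reading of "within the residual array"; the sketch refits each epoch for simplicity.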

    Context related detection was implemented here as history-based detection, and therefore still differs from human screening. Human screening often also involves going back in the data, which influences the decision on whether the EEG is artefactual or not. In the current implementation, the automatic methods were designed for objective on-line processing, testing for statistical significance. As an illustration, Figs. 1 and 2 represent true data from the current study. Both figures indicate automatically detected EEG events in the test window that were not marked by the observers, while clearly displaying deviating phenomena in the EEG.

    Table 1

    Artefact detection using a context length of 20 s: slope detection and autoregressive variability tracking combined

    Overall32Patients 383736353433

    9887979479Sensitivity (%) 8989 75

    53Pos. prediction (%) 49 58505761 50 49
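    The history-based (on-line) arrangement of context and test windows can be sketched with the quantities listed in Appendix A: per-epoch feature vectors F_i (the p AR coefficients plus the mean and standard deviation of the residuals), their averages C(N1) and C(N2), the variances σ1² and σ2² expressed through Euclidean distances to those averages, and critical values of the F distribution with (N1 − 1, N2 − 1) degrees of freedom. Because Eq. (6) is not reproduced in this section, the variance definition (summed squared distance divided by N − 1), the two-sided test and the function names are assumptions; the feature data are random stand-ins.

```python
import numpy as np
from scipy import stats

def spread(F):
    """Variance of a set of feature vectors: summed squared Euclidean
    distance to their average C(N), divided by N - 1 (assumed definition)."""
    F = np.asarray(F, dtype=float)
    C = F.mean(axis=0)
    return np.sum((F - C) ** 2) / (len(F) - 1)

def history_based_test(features, n1, n2, alpha=0.01):
    """Slide a test window of n2 epochs over the feature sequence and compare
    it with the n1 epochs that precede it. Only past data are used, so the
    test can run on-line. Returns (start, sigma1^2 / sigma2^2, flagged)."""
    features = np.asarray(features, dtype=float)
    lo = stats.f.ppf(alpha / 2, n1 - 1, n2 - 1)   # two-sided test (assumed)
    hi = stats.f.ppf(1 - alpha / 2, n1 - 1, n2 - 1)
    results = []
    for start in range(n1, len(features) - n2 + 1):
        s1 = spread(features[start - n1:start])   # context window
        s2 = spread(features[start:start + n2])   # test window
        ratio = s1 / s2
        results.append((start, ratio, bool(ratio < lo or ratio > hi)))
    return results

rng = np.random.default_rng(1)
feats = rng.standard_normal((60, 7))   # 60 epochs, p = 5 -> 5 + 2 features
feats[40:44] += rng.normal(scale=8.0, size=(4, 7))  # hypothetical event
res = history_based_test(feats, n1=20, n2=4)
flags = {start: flagged for start, _, flagged in res}
print(flags[40])
```

    In this toy sequence the test window starting at epoch 40 is flagged, because its spread is far larger than that of its 20-epoch history. Note that once the deviating epochs enter the context window, they also inflate σ1², which is one reason why the choice of context length matters.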

    Artefacts often occur in several channels simultaneously; therefore, a detected event (or candidate artefact) is usually checked visually in all channels displayed together. This was also observed in the current data set, but was not incorporated in the algorithms or the evaluation.

    Combining channels has been described by various authors (e.g. [4,46,47]), who implemented such spatial (cross-channel) processing mainly for the identification of eye artefacts using rule-based systems. Another recently described system [48] used artificial neural networks to pre-process EEG features, and discriminated between eye artefacts, muscle artefacts and electrode artefacts in an additional knowledge-based stage. The system correctly identified 90% of artefacts in the initial evaluation. Unfortunately, the system was not evaluated in a large clinical data set, and temporal context was not evaluated systematically.

    The current study provides some starting points for choosing the optimal length of the context periods in automatic analysis: the optimal context was concluded to be as short as 20–40 s.

    Acknowledgements

    This project was supported by the Co-operation

    Centre of the Brabant Universities, project 94CH.

    We are also very grateful for the co-operation with colleagues from the IMPROVE project: Dr P. Prior, Dr C.E. Thomsen and Mr R. Pottinger.

    Appendix A. Nomenclature

    Autoregressive model
    $s(n)$  discrete time signal
    $N$  length in samples of EEG period
    $p$  order of autoregressive (AR) model
    $\mathbf{S}$  signal vector
    $\mathbf{e}$  residual error vector
    $\Sigma$  summation
    $a_1, \ldots, a_p$  AR coefficients
    $\mathbf{a}$  AR vector (coefficients)
    $\mathbf{Z}$  matrix of $p \times N$ elements
    $J$  quadratic cost vector (of error power)
    $\sigma_e^2$  error amplitude variance
    $R_0$  autocorrelation, or power, of $s(n)$
    $F_i$  vector of $a_1, \ldots, a_p$, mean $\mu_{err}$ and standard deviation $\sigma_{err}$ of the residual errors
    $N_1$, $N_2$  number of epochs in context and test window, respectively
    $C(N_1)$, $C(N_2)$  average of $F_i$
    $\sigma_1^2$, $\sigma_2^2$  variance of $F_i$ (Euclidean distance to $C(N_1)$, $C(N_2)$)
    $f_{\alpha, N_1-1, N_2-1}$  significance of the ratio of variances $\sigma_1^2/\sigma_2^2$

    Slope distribution
    $-\infty$  minus infinity
    $\mu_s$  mean
    $\sigma_s$  standard deviation

    References

    [1] J.A. McEwen, G.B. Anderson, Modeling the stationarity and gaussianity of spontaneous electroencephalographic activity, IEEE Trans. Biomed. Eng. 22 (1975) 361–369.

    [2] B.H. Jansen, A. Hasman, R. Lenten, Piecewise EEG analysis: an objective evaluation, Int. J. Biomed. Comput. 12 (1981) 17–27.

    [3] G. Bodenstein, W. Schneider, C.V.D. Malsburg, Computerized EEG pattern classification by adaptive segmentation and probability-density-function classification. Description of the method, Comput. Biol. Med. 15 (1985) 297–313.

    [4] A. Värri, K. Hirvonen, J. Hasan, P. Loula, V. Häkkinen, A computerized analysis system for vigilance studies, Comp. Meth. Progr. Biomed. 39 (1992) 113–124.

    [5] V. Jagannathan, J.R. Bourne, B.H. Jansen, J.W. Ward, Artificial intelligence methods in quantitative electroencephalogram analysis, Comput. Prog. Biomed. 15 (1982) 249–258.

    [6] B.H. Jansen, B.M. Dawant, Knowledge-based approach to sleep EEG analysis: a feasibility study, IEEE Trans. Biomed. Eng. 36 (1989) 510–518.

    [7] B.H. Jansen, Quantitative analysis of electroencephalograms: is there chaos in the future?, Int. J. Biomed. Comput. 27 (1991) 95–123.

    [8] E. Flooh, E. Körner, G. Ladurner, H. Lechner, EEG-Nachtschlafableitungen: auswertung mittels automatischer Datenanalyse (EEG-night-sleep-recordings: automatic analysis. In German), Z. EEG-EMG 13 (1982) 157–160.

    [9] D.P. Brunner, R.C. Vasko, C.S. Detka, J.P. Monahan, C.F. Reynolds III, D.J. Kupfer, Muscle artifacts in the sleep EEG: automated detection and effect on all-night EEG power spectra, J. Sleep Res. 5 (1996) 155–164.

    [10] B.H. Jansen, J.R. Bourne, J.W. Ward, Identification and labelling of EEG graphic elements using autoregressive spectral estimates, Comput. Biol. Med. 12 (1982) 97–106.

    [11] J.S. Barlow, Artifact processing (rejection and minimization) in EEG data processing, in: F.H. Lopes da Silva, W.H. Storm van Leeuwen (Eds.), Handbook of Electroencephalography and Clinical Neurophysiology, Revised edition, Vol. 3B: Applications of Analytical Techniques, Elsevier, Amsterdam, 1986, pp. 15–62.

    [12] J.S. Barlow, Muscle spike artifact minimization in EEGs by time-domain filtering, Electroenceph. Clin. Neurophysiol. 55 (1983) 487–491.

    [13] J.S. Barlow, Automatic elimination of electrode-pop artifacts in EEGs, IEEE Trans. Biomed. Eng. 33 (1986) 517–521.

    [14] V. Strejc, Least squares parameter estimation, Automatica 16 (1980) 535–550.

    [15] D.A. Pierce, Testing normality in autoregressive models, Biometrika 72 (1985) 293–297.

    [16] S.S. Shapiro, M.B. Wilk, An analysis of variance test for normality (complete samples), Biometrika 52 (1965) 591–611.

    [17] S. Shapiro, M.B. Wilk, H.J. Chen, A comparative study of various tests for normality, Am. Stat. Ass. J. 63 (1968) 1343–1372.

    [18] R. Bender, B. Schultz, A. Schultz, I. Pichlmayr, Testing the gaussianity of the human EEG during anaesthesia, Meth. Inf. Med. 31 (1992) 56–59.

    [19] J. Makhoul, Linear prediction: a tutorial review, Proc. IEEE 63 (1975) 561–580.

    [20] G.E.P. Box, G.M. Jenkins, Time series analysis, forecasting and control, Revised edition, Holden-Day, London, 1976.

    [21] H. Akaike, A new look at the statistical model identification, IEEE Trans. Autom. Control 19 (1974) 716–723.

    [22] C.W. Anderson, E.A. Stolz, S. Shamsunder, Multivariate autoregressive models for classification of spontaneous electroencephalographic signals during mental tasks, IEEE Trans. Biomed. Eng. 45 (1998) 277–286.

    [23] S. Cerutti, D. Liberati, G. Avanzini, S. Franceschetti, F. Panzica, Classification of the EEG during neurosurgery. Parametric identification and Kalman filtering compared, J. Biomed. Eng. 8 (1986) 244–254.

    [24] L.H. Zetterberg, Estimation of parameters for a linear difference equation with application to EEG analysis, Math. Biosciences 5 (1969) 205–226.

    [25] B.H. Jansen, J.R. Bourne, J.W. Ward, Autoregressive estimation of short segment spectra for computerized EEG analysis, IEEE Trans. Biomed. Eng. 28 (1981) 630–638.

    [26] J. Pardey, S. Roberts, L. Tarassenko, A review of parametric modeling techniques for EEG-analysis, Med. Eng. Physics 18 (1996) 2–11.

    [27] F.D.J. Dunstan, R.W. Marshall, The detection of artefacts in EEG series, Stat. Med. 10 (1991) 1719–1731.

    [28] S. Cerutti, D. Liberati, P. Mascellani, Parameter extraction in EEG processing during riskful neurosurgical operations, Signal Proc. 9 (1985) 25–35.

    [29] D.C. Montgomery, G.C. Runger, Applied statistics and probability for engineers, Wiley, New York, 1994.

    [30] M. Scherg, Simultaneous recording and separation of early and middle latency auditory evoked potentials, Electroenceph. Clin. Neurophysiol. 54 (1982) 339–341.

    [31] H. Hinrichs, H.J. Heinze, M.R. Gaab, Neurophysiologisches monitoring bei neurochirurgischen gefäßoperationen: spezifische technische anforderungen und deren umsetzung (Neurophysiological monitoring of neurosurgical vessel-operations: technical specification and implementation. In German), Z. EEG-EMG 23 (1992) 195–202.

    [32] H. Hinrichs, H. Feistner, H.J. Heinze, A trend-detection algorithm for intraoperative EEG monitoring, Med. Eng. Physics 18 (1996) 626–631.

    [33] P.J.M. Cluitmans, J.W. Jansen, J.E.W. Beneken, Artefact detection and removal during auditory evoked potential monitoring, J. Clin. Mon. 9 (1993) 112–120.

    [34] M. van de Velde, G. van Erp, P.J.M. Cluitmans, Muscle artefact detection in the normal human awake EEG, Electroenceph. Clin. Neurophysiol. 107 (1998) 149–158.

    [35] I. Korhonen, J. Ojaniemi, K. Nieminen, M. van Gils, A. Heikela, A. Kari, Building the IMPROVE data library, IEEE Eng. Med. Biol. 16 (1997) 25–32.

    [36] B. Schultz, R. Bender, A. Schultz, I. Pichlmayr, Reduktion der anzahl von EEG-ableitungen für ein routinemäßiges monitoring auf der intensivstation (Electroencephalographic monitoring in the ICU: reduction of the number of recorded channels. In German), Biomed. Technik 37 (1992) 194–199.

    [37] C.E. Thomsen, J. Gade, K. Nieminen, R.M. Langford, I.R. Ghosh, K. Jensen, M. van Gils, A. Rosenfalck, P.F. Prior, S. White, Collecting EEG signals in the IMPROVE data library, IEEE Eng. Med. Biol. 16 (1997) 33–40.


    [38] I.R. Ghosh, P.F. Prior, S.R. White, J. Gade, K. Jensen,

    R.M. Langford, A. Rosenfalck, C.E. Thomsen, Artefact

    assessment in prolonged EEG-polygraphic recordings in

    intensive care, Electroenceph. Clin. Neurophysiol. (In

    press).

    [39] M. van Gils, A. Rosenfalck, S. White, P. Prior, J. Gade, L. Senhadji, C.E. Thomsen, I.R. Ghosh, R.M. Langford, K. Jensen, Signal processing in prolonged EEG recordings during intensive care, IEEE Eng. Med. Biol. 16 (1997) 56–63.

    [40] K. Nieminen, R.M. Langford, C.J. Morgan, J. Takala, A. Kari, A clinical description of the IMPROVE data library, IEEE Eng. Med. Biol. 16 (1997) 21–24.

    [41] P. Royston, Shapiro–Wilk W test and its significance level. Algorithm AS R94, Appl. Stat. 44 (1995) 4.

    [42] D.W. Klass, The continuing challenge of artifacts in the EEG, Am. J. EEG Technol. 35 (1995) 239–269.

    [43] P. Prior, The rationale and utility of neurophysiological investigations in clinical monitoring for brain and spinal cord ischaemia during surgery and intensive care, Comp. Meth. Prog. Biomed. 51 (1996) 13–27.

    [44] G.W. Williams, H.O. Lüders, A. Brickner, M. Goormastic, D.W. Klass, Interobserver variability in EEG interpretation, Neurology 35 (1985) 1714–1719.

    [45] B. Vachon, B. Dubuisson, D. Samson-Dollfus, Étude automatique de l'EEG: une méthode de détection des non-stationnarités (Automatic EEG processing: a method for detection of non-stationarities. In French), Int. J. Biomed. Comput. 9 (1978) 147–162.

    [46] T. Pietilä, S. Vapaakoski, U. Nousiainen, A. Värri, H. Frey, V. Häkkinen, Y. Neuvo, Evaluation of a computerized system for recognition of epileptic activity during long-term EEG recording, Electroenceph. Clin. Neurophysiol. 90 (1994) 438–443.

    [47] M. Nakamura, T. Sugi, A. Ikeda, R. Kakigi, H. Shibasaki, Clinical application of automatic integrative interpretation of awake background EEG: quantitative interpretation, report making, and detection of artifacts and reduced vigilance level, Electroenceph. Clin. Neurophysiol. 98 (1996) 103–112.

    [48] J. Wu, E.C. Ifeachor, E.M. Allen, W.K. Wimalaratna, N.R. Hudson, Intelligent artefact identification in electroencephalography signal processing, IEE Proc. Sci. Meas. Technol. 144 (1997) 193–201.
