Predictive Factors for Outcome in Patients Having Surgery for Cervical Spondylotic Myelopathy

By

Alina Karpova, BSc

A thesis submitted in conformity with the requirements for the degree of Master of Science

Department of the Institute of Medical Sciences University of Toronto

©Copyright by Alina Karpova, 2011

Predictive factors for outcome in patients having surgery for cervical spondylotic myelopathy.

Master of Science, 2011

Alina Karpova Institute of Medical Sciences

University of Toronto

ABSTRACT

PURPOSE: The objective was to determine if particular magnetic resonance,

clinical and demographic findings were associated with functional status prior to surgery

and predictive of functional outcomes at follow-up.

RESULTS: The study included 65 consecutive CSM patients. The modified

Japanese Orthopaedic Association Scale (mJOA) was used as the primary outcome

measure. Higher baseline mJOA scores were associated with younger age, shorter

duration of symptoms, fewer compressed segments and less severe cord compression.

Better post-operative mJOA scores were associated with younger age, shorter duration of

symptoms and higher baseline scores. Using multivariate analysis, baseline and follow-up

mJOA scores (the latter adjusted for baseline mJOA score) were best predicted by age.

CONCLUSION: Age and clinical severity scores at admission can both provide

valuable information. However, MR imaging features of the spinal cord before surgery

cannot accurately predict the functional prognosis for patients with CSM and hence

alternative imaging approaches may be required.

ACKNOWLEDGMENTS

I would like to acknowledge and thank my mentor, Dr. Michael Fehlings

(Supervisor), and my Program Advisory Committee members, Dr. Aileen Davis, Dr.

Abhaya Kulkarni, and Dr. David Mikulis. I am deeply grateful for the academic

enrichment they were able to provide, as well as their guidance and unwavering support

throughout this project.

I am thankful to have received funding from the Ontario Neurotrauma

Foundation.

I would also like to thank my friends and colleagues who have helped me shape

this project: Dr. David Cadotte, who screened the articles as a second reviewer for

the systematic review and gave me the opportunity to write a review paper on CSM in the elderly

population; Dr. Yuriy Petrenko for technical support; Amy Lem; Dr. E Massicotte,

Dr. SJ Lewis, Dr. YR Rampersaud, Neurosurgeons at the Toronto Western Hospital,

Ontario, for allowing us to study their patients; Dr. Zvonimir Lubina for the imaging data

he provided for this study; and Branko Kopjar for his statistical advice.

I would like to dedicate the thesis to my ‘family’, Alexandra, Olga, Nataly, Ira

and Yakov. I am extremely grateful to Roman for his patience, love and unwavering

support over the years. It is because of them that I have had the strength to see the project

through to the end.

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGMENTS

CHAPTER 1: Introduction
1.1. Problems of predicting outcomes in CSM patients after surgery
1.2. Importance of investigating predictors of outcomes after surgery
1.3. Magnetic Resonance Imaging (MRI) in CSM population

CHAPTER 2: Background and literature review
Functional outcomes after surgery and their important predictors: current state of knowledge
2.1. Cervical spondylotic myelopathy (CSM): Definition and clinical presentation
2.2. Epidemiology of CSM
2.3. CSM treatment
2.4. Functional outcome assessments
2.5. Predictors of functional outcomes following surgery
    2.5.1. Age
    2.5.2. Gender
    2.5.3. Duration of symptoms
    2.5.4. Baseline severity score
    2.5.5. MR imaging findings
2.6. Theoretical framework and definition of the concept

CHAPTER 3: Systematic review
Currently available MR imaging based measurements for the explanation of variations among CSM patients: review and critical appraisal
3.1. LITERATURE SEARCH
    3.1.1. Objective
    3.1.2. Inclusion criteria
    3.1.3. Identification of studies and assessment of methodological quality
    3.1.4. Data extraction
        3.1.4.1. Severity of myelopathy: functional score and recovery percentage
        3.1.4.2. MRI predictive factors
3.2. RESULTS
    3.2.1. Compression of spinal canal and cord
    3.2.2. T2 signal changes on MRIs of the spinal cord
3.3. OVERALL SUMMARY OF THE SYSTEMATIC LITERATURE REVIEW
3.4. RATIONALE FOR STUDYING CLINICAL AND IMAGING PREDICTORS OF OUTCOME IN CSM
3.5. HYPOTHESIS AND STUDY OBJECTIVES

CHAPTER 4: Material and methods
4.1. STUDY OBJECTIVES
4.2. STUDY DESIGN
4.3. TARGET POPULATION
4.4. DEFINITION OF PRIMARY OUTCOME
4.5. PRIMARY EXPOSURE (INDEPENDENT VARIABLES)
    4.5.1. Strategies to improve accuracy and easy use of exposure variables
    4.5.2. Definition of primary exposure and clinimetric properties (validity) of the independent variables
        4.5.2.1. Age
        4.5.2.2. Gender
        4.5.2.3. Baseline mJOA
        4.5.2.4. Duration of symptoms
        4.5.2.e. Degree of compression (Anteroposterior diameter and Transverse Area)
        4.5.2.f. Signal intensity changes
        4.5.2.g. Number of affected stenotic levels
4.6. CONFOUNDING VARIABLES
4.7. SAMPLE SIZE
4.8. DATA ANALYSIS
    4.8.1. Exploratory analysis
        4.8.1.1. Univariable (unadjusted) analysis
    4.8.2. Model development
    4.8.3. Data sources and management
    4.8.4. Ethics

CHAPTER 5: Results
5.1. DESCRIPTIVE STATISTICS
5.2. MODEL DEVELOPMENT
    5.2.1. Improving the validity of the predictive model
    5.2.2. Univariate (unadjusted) analysis
        5.2.2.1. mJOA Scores at baseline
        5.2.2.2. mJOA Scores at follow up
    5.2.3. Multivariate (adjusted) analysis
        5.2.3.1. mJOA Scores at baseline
        5.2.3.2. mJOA Scores at follow-up

CHAPTER 6: Discussion
6.1. Summary of findings
6.2. Implications of findings
6.3. Limitations
6.4. Future directions
6.5. Conclusion

CHAPTER 7: Reference List

CHAPTER 8: Appendices

LIST OF TABLES

CHAPTER 3

Table 3.1: Presents criteria in a modified version of quality assessment checklist

Table 3.2: Presents the summary of methodological limitations in CSM predictive studies

Table 3.3: Study design, sample size, type of outcome measures and quality rating

Table 3.4: Data extracted included MRI characteristics (signal intensity, spinal cord

compression and spinal canal compromise)

Table 3.5: Data extracted included those predictive factors for which the strength of

association with short and long term outcomes in patients with cervical myelopathy, were

reported

Table 3.6: Results – previous predictive models

Table 3.7: List of MR imaging features as potential predictors of recovery percentage and functional scores after surgery

CHAPTER 4

Table 4.1: The modified Japanese Orthopaedic Association (mJOA) scaling for

functional classification for CSM

Table 4.2: Definition of exposure variables

Table 4.3: Standard parameters for cervical spine T1- and T2-weighted Magnetic

Resonance Image (MRI) used in our study

CHAPTER 5

Table 5.1: Characteristics of Patients with Cervical Spondylotic Myelopathy

Table 5.2: Performance of the mJOA in CSM sample

Table 5.3: Correlation matrix and coefficients between functional outcomes and

independent variables

Table 5.4: Correlation matrix and coefficients between functional outcomes and spinal

cord compression as a potential predictor

Table 5.5: Unadjusted beta value estimates for independent variables (univariate analysis)

Table 5.6: Statistical details of full models (multivariate analysis)

LIST OF FIGURES

CHAPTER 3

Figure 3.1: Flow diagram of inclusion and exclusion criteria of systematic reviews.

CHAPTER 4

Figure 4.1: Flow diagram of the study population.

CHAPTER 5

Figure 5.1: MR imaging measures for spinal cord compression

Figure 5.2: T1-weighted image of the sagittal view revealing hypointensity in the spinal

cord and T2-weighted image of the sagittal view showing hyperintensity in the spinal

cord before surgery (arrow), obtained from the spine clinic at the Toronto Western Hospital.

Figure 5.3: Focal compression and multiple level of compression

Figure 5.4: Distribution of baseline mJOA scores

Figure 5.5: Distribution of post-operative mJOA scores at 12 months

LIST OF APPENDICES

Appendix 1 Search strategy

Appendix 2 Research Ethics Board Approval at University Health Network

Appendix 3 Reliability project

Appendix 4 Grade of recommendation: Levels of Evidence Table (2002)

LIST OF ABBREVIATIONS

CSM Cervical Spondylotic Myelopathy

OPLL Ossification of the Posterior Longitudinal Ligament

HD Herniated Disc

TA Transverse Area

AP Anteroposterior Diameter

CR Compression Ratio

MSCC Maximum Spinal Cord Compression

MCC Maximum Canal Compromise

mJOA Modified Japanese Orthopaedic Association Scale

SCI Spinal Cord Injury

CI Confidence Interval

SE Standard Error

ICC Intraclass correlation coefficient

CHAPTER 1

INTRODUCTION

Predictors of outcome following surgery have been a significant part of cervical

spondylotic myelopathy (CSM) research over the past 20 years. With the development of

new therapies and interventions and varied natural history of CSM, there is a need for

reliable predictors to optimize the timing of surgical intervention in order to maximize

functional recovery.

1.1. Problems of predicting outcomes in CSM patients after surgery

There is no consensus as to the optimal ways to assess the clinical (e.g., advanced

age, prolonged duration of symptoms, etc.) and MR imaging features (e.g., spinal cord

compression, signal intensity changes, and number of compressed spinal cord segments)

used in research and clinical practice. Furthermore, very few studies have attempted to

build a predictive, multidimensional model of functional outcomes after surgery that

would combine age, gender, duration of symptoms, baseline scores, degrees of

compression, signal intensity changes and number of compressed segments together.

1.2. Importance of investigating predictors of outcomes after surgery

A prediction model would be useful in identifying individuals who are most likely

to experience improvement after surgery by determining the expected magnitude of

each predictive factor's effect. This knowledge could allow individualized decisions regarding

the use of different surgical approaches for the treatment of elderly individuals with

CSM. Therefore, a predictive model could guide the application of such strategies in high

risk groups and would potentially optimize functional recovery of CSM patients.

1.3. Magnetic Resonance Imaging (MRI) in CSM population

The application of MR imaging to the spinal cord has become increasingly

attractive due to the ability of MRI to depict the amount of spinal cord compression,

reflect the pathological changes within the cord, measure space within the spinal canal,

detect bony pathology, and show suspected lesions of the soft tissues in and around the

vertebral column in a multiplanar display (e.g., midsagittal, axial, etc.). There is also no

risk of radiation exposure and the procedure is non-invasive [1]. It is also a standard of

care for both diagnosis and preoperative planning of patients with suspected CSM.

However, there is variability in the literature as to the value of MR imaging as a predictor

of functional outcomes after surgery.

The overall objective of this project, therefore, was to develop a predictive model

of functional outcomes incorporating key demographic, clinical and MR imaging

assessments in patients with CSM undergoing surgical treatment. The primary goal of

this model is to help treating physicians and spine surgeons to identify individuals who

are most likely to experience better outcome after surgery. Furthermore, a predictive

model could help to improve the specificity of inclusion criteria for future clinical trials,

detecting the potential benefit of surgical interventions on selected homogeneous groups

of CSM. The study was organized into three stages; each stage being a portion of the

entire study and answering questions that contribute to the overall result, the CSM

predictive model of functional outcomes.

The thesis is organized into six chapters and an appendix. The objectives of

Chapter 2 and Chapter 3 were to establish components of domains for a predictive

model. More specifically, Chapter 2 summarizes current knowledge of available

variables other than MR imaging that are potentially predictive of functional outcomes

following surgery. Chapter 3 consisted primarily of a literature review and the

subsequent critical appraisal and summary of current evidence to determine the pre-

testing power of the MR imaging domain. Chapter 4 includes details of the methods

used to develop an objective predictive model. The results are detailed in Chapter 5.

Finally, Chapter 6 states the conclusions of the study together with a discussion of

the implications, limitations and future research directions. The thesis ends with the

appendices, which include additional exhibits of data that may assist in understanding certain

aspects. Appendix 3 is essential to the body of the thesis because it further validates the spinal

cord compression methods.

CHAPTER 2

BACKGROUND AND LITERATURE REVIEW

Functional outcomes after surgery and their important predictors: current

state of knowledge

This chapter outlines the background related to cervical spondylotic myelopathy

(CSM) as well as outcomes and their measurements. The epidemiology of CSM, surgical

treatments, outcomes after surgery, current measurement approaches related to CSM

severity along with predictors of those outcomes, are reviewed and summarized in this

section.

2.1. Cervical spondylotic myelopathy (CSM): Definition and clinical presentation

Cervical spondylotic myelopathy (CSM) can be broadly defined as symptomatic

dysfunction of the cervical spinal cord caused by degenerative changes of the bony and

ligamentous spine [2]. CSM can occur in all adults due to cord compression resulting

from one of several different factors: degenerative disc disease (or spondylosis); frank

disc protrusion; or OPLL. Symptoms of CSM include: neck stiffness; unilateral or

bilateral deep, aching neck, arm and shoulder pain; stiffness or clumsiness while walking;

hand dysfunction; motor weakness; numbness; and bowel/bladder dysfunction.

Symptoms may range in severity from mildly uncomfortable to completely disabling.

2.2. Epidemiology of CSM

Although the prevalence of CSM is still unknown, it is the most common form of

spinal cord dysfunction in patients and the most common underlying cause of traumatic

SCI in individuals older than 55 years of age [3]. It is a major cause of disability in the

adult population [4-6].

2.3. CSM treatment

For a patient with limited function and MR imaging evidence of cervical

spinal cord compression, decompressive surgery is a practical treatment option. In most

cases, patients are informed that surgery is unlikely to improve their functional outcomes

but rather is aimed at halting the progression of their disease. There is, however, emerging

evidence that most patients make robust and functional improvements following surgery

[7, 8]. There is some evidence that demographic factors, clinical history of CSM and MR

imaging evidence can explain outcomes after surgery, but at present it is difficult to

predict an individual patient’s response to surgery.

2.4. Functional outcome assessments

Determining functional status and independence after surgery in CSM patients has

become a primary area of research because of the impact CSM has on health-related

quality of life as well as the financial burden of this condition on society and individuals.

Gain in function after surgery has been documented for individuals with CSM

[7, 8]. Function in the CSM population is often reported by means of postoperative

functional scores, absolute or relative changes in scores and rate of recovery. It is often

measured using functional measurement tools such as the original Japanese Orthopaedic

Association Scale (JOA), the modified version of JOA (mJOA), Nurick score, 30-meter

timed up walk test, and the Neurosurgical Cervical Spine (NCS) Score. The Japanese

Orthopaedic Association (JOA) scale is a qualitative tool to measure functional disability. The

scale ranges from 0-17 with higher scores indicating better function [9]. The inter- and

intraobserver reliability of the original JOA scale have been shown to be high (Yonenobu K

et al 2001). To establish the percent of recovery, the following formula was proposed by

Hirabayashi et al [9]: recovery rate (%) = ([postoperative JOA score – preoperative JOA

score] / [17 – preoperative JOA score]) × 100%. The term recovery rate does not imply

the actual rate of recovery but rather extent of recovery (percentage). For simplicity, the

term recovery rate will be used to describe the percent of recovery throughout the

manuscript. The modified JOA (mJOA) scale, which is the current so-called standard

functional measure in the CSM population [10], was revised to account for cultural

differences in western populations (Table 4.1). The domains include upper extremity

function (5 points), lower extremity function (7 points), sensory function (3 points), and

urinary bladder function (3 points). The scale ranges from 0-18 with higher scores

indicating better function [11]. Similarly, the recovery after surgery was evaluated using

the formula proposed by Hirabayashi et al [9]: recovery rate (%) = ([postoperative mJOA

score – preoperative mJOA score] / [18 – preoperative mJOA score]) × 100%. The

Neurosurgical Cervical Spine Score (NCS) is also a functional measure, used to quantify gain

in recovery in the following manner: recovery rate (%) = ([postoperative score –

preoperative score] / [14 – preoperative score]) × 100% [12].
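
The three recovery rate formulae above share one form and differ only in the maximum attainable score (17 for JOA, 18 for mJOA, 14 for NCS). The following is a minimal sketch of that calculation, not code used in this study; the names recovery_rate, SCALE_MAX and MJOA_DOMAIN_MAX are hypothetical, and the input checks (including the refusal to divide by zero when the preoperative score is already at the maximum) are assumptions added for illustration.

```python
# Maximum attainable score for each scale, as stated in the text.
SCALE_MAX = {"JOA": 17, "mJOA": 18, "NCS": 14}

# mJOA domain maxima as listed in the text; they sum to the 18-point total.
MJOA_DOMAIN_MAX = {
    "upper_extremity": 5,
    "lower_extremity": 7,
    "sensory": 3,
    "urinary_bladder": 3,
}


def recovery_rate(preoperative, postoperative, scale="mJOA"):
    """Return the percent recovery: (post - pre) / (max - pre) * 100.

    The range checks are illustrative assumptions, not part of the
    published formula.
    """
    max_score = SCALE_MAX[scale]
    for score in (preoperative, postoperative):
        if not 0 <= score <= max_score:
            raise ValueError(f"score {score} is outside 0-{max_score} for {scale}")
    if preoperative == max_score:
        # No room for improvement: the denominator would be zero.
        raise ValueError("recovery rate is undefined at the maximum preoperative score")
    return (postoperative - preoperative) / (max_score - preoperative) * 100.0


if __name__ == "__main__":
    # Hypothetical patient: mJOA 12 before surgery, 15 at follow-up.
    print(recovery_rate(12, 15, "mJOA"))  # prints 50.0
```

A negative value indicates deterioration after surgery; for example, a hypothetical mJOA change from 14 to 12 gives a recovery rate of -50%.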

2.5. Predictors of functional outcomes following surgery

2.5.1. Age

Conflicting results regarding the treatment of cervical myelopathy in geriatric

populations have been reported previously [13-17]. Some studies have found an

association between age and functional score obtained at long term follow-up (greater

than 6 months after surgery) [18, 19], while others did not [15]. Based on univariate

analysis, Nagata et al have found an association between age and functional score

obtained from 12 months to 4.5 years (mean follow up of 1.5 years); Yamazaki et al,

however, showed that age did not affect functional scores obtained from 12 to 90 months

after surgery (mean follow up of 40 months). The inconsistencies could be due to

variable definitions of older and younger groups. After adjustment for other confounding

variables, Morio et al showed that age is a reliable predictor of functional score obtained

from 6 months to 10 years (mean, 3.4 years) after decompression of the spinal cord in

CSM patients.

2.5.2. Gender

Gender is rarely highlighted by predictive studies as a potential predictor of

outcome, and most studies fail to find such an association. Those that do tend to show that women

have a better outcome than men [20].

2.5.3. Duration of symptoms

The literature contains conflicting results with regards to the duration of CSM

symptoms and post-operative functional scores. Based on univariate analysis, several

authors have found an association between duration of symptoms and functional score, as

assessed using the JOA scale and obtained at long term follow up (greater than 6 months)

[15, 19, 21]; Fukushima et al, however, showed that the duration of symptoms did not

affect functional score after surgery [22]. After adjustment for other important variables,

Morio et al found that the duration of symptoms is a significant predictor of functional

score obtained from 6 months to 10 years after surgery (mean follow up, 3.4 years). We

suspect that inconsistencies could have resulted from the manner in which the authors

quantified the duration of symptoms and how ‘long versus short’ was defined.

2.5.4. Baseline severity score

The literature contains conflicting results regarding whether patients with initial

poor functional scores gain less or greater benefit from surgery [19, 23]. Based on results

of univariate analysis, Singh et al reported that patients with lower starting points make

the most gains in function, as assessed by walking tests, 3 months after surgery.

However, Morio et al identified a positive correlation between baseline score and

functional score assessed by JOA scale. After adjustment for other confounding

variables, Morio et al found that severity score at admission is a reliable predictor of

functional score obtained from 6 months to 10 years (mean, 3.4 years) after

decompression of the spinal cord. These inconsistencies could be due to variability in

measures of functional scores and follow up times.

2.5.5. MR imaging findings

A number of authors [8, 10, 14, 18, 19, 21, 22, 24-28] have reported that varying

patterns of signal intensity changes on T1-/T2WI, degrees of spinal cord compression and

multiplicity of spinal cord segments being compressed are good predictors of functional

outcome after surgical decompression. Some authors [8, 14, 15, 19, 27-34], on the other

hand, have reported no clear correlation between the surgical outcome and MRIs. The

lack of statistically significant predictors of functional outcome may be attributed to a

broad spectrum of compressive pathologies and therefore a broad spectrum of spinal cord

recuperative potentials; consequently, the MR imaging variables (signal intensity changes, spinal cord

compression, and number of compressed segments) studied on T1-/T2WI may not be

good predictors of outcome following surgery in patients with CSM. Several authors [14,

24, 25] considered intramedullary high SI on T2-weighted MR images to be a predictor of

good recovery and low SI on T1-weighted MR images to be a predictor of poor recovery; other

authors [8, 14, 15, 28, 29, 31, 32] thought these findings did not affect outcomes after surgery. These

diverse conclusions arose because the histopathological basis of intramedullary

low SI on T1-weighted and high SI on T2-weighted images is unknown. Based on

the literature, many authors have considered that intramedullary high SI might represent a

variety of histological changes, including edema, ischemia, demyelination, gliosis,

microcavities, and cavities [35-40]. It is widely believed that a greater degree of spinal

cord compression increases the chances that the tissue damage is irreversible despite

surgical decompression and therefore leads to poor recovery. However, the value of

morphological plasticity obtained by quantifying surface area of spinal cord on T2W

imaging in the evaluation of spinal cord function and its relationship with outcomes of

surgery has been questioned. More specifically, no consensus has been established on a

critical point beyond which functional loss becomes irreversible [22].

SUMMARY:

In summary, age, duration of symptoms and baseline severity score are

consistently associated with functional scores following surgery. Therefore, it would be

essential to adjust for these variables in the comparison of functional scores across

varying MR imaging features. The MR imaging features are variables of interest which

will be addressed in the subsequent chapter via systematic review. This stage consists of

a literature review for the development of a predictive model, and the subsequent critical

appraisal and summary of current evidence to determine the pre-testing content of the

MR imaging domain, followed by overall hypothesis and specific aims.

2.6. Theoretical framework and definition of the concept

We investigated the combination of these factors and their contribution to

predicting outcome using the all-variable model approach complemented with clinical

judgement and statistical importance (beta estimates were provided) in a well-controlled

prospective cohort. The functional outcomes after surgery were measured through the

following domains: 1) demographic factors, 2) baseline severity score, 3) duration of

symptoms, and 4) MR imaging features.

Demographic factors associated with lower recovery rate and/or lower

performance scores after surgical intervention include advanced age in the male population.

Barriers to optimum recovery are exacerbated by greater neural tissue damage caused

by a greater range of motion in a narrower canal and higher C level involvement [41], as well

as diminished spontaneous plasticity with age [42]. The association with gender may be

explained by the differences in the mechanical loading/muscle compressive forces

promoting new bone growth [43]. These factors may lead to decreased likelihood of

maximising recovery and functional performance after surgery.

Longer duration of symptoms is associated with the functional score and recovery

rate at admission and follow-up [10, 13, 19, 27, 32, 44, 45] because long-standing

mechanical compression causes additional circulatory impairment of the spinal cord

[46]. This factor leads to increased likelihood of poor recovery after surgery.

Higher baseline scores at admission are associated with greater improvement in

scores and recovery rate after surgical intervention [10, 19, 47]. The greater benefit from

surgery observed in groups of patients who are less functionally disabled could be due to

less severe neuropathologic alterations (e.g., edema, ischemia) in the spinal cord that could

reflect greater recuperative potential.

A greater degree of spinal cord compression increases the chances that the tissue

damage is irreversible despite surgical decompression and therefore leads to poor

recovery. Intrinsic signal changes (low on T1- and high on T2-weighted images) together predict a poor

functional recovery following surgery in comparison to the absence of these findings

[19]. These imaging findings likely represent long-standing and ongoing damage to the

neural elements of the spinal cord and the corresponding white matter tracts.

Furthermore, complex injuries that are contiguous over several spinal segments may

interfere with optimum recovery because such an injury is associated with profound

changes in the grey matter and significant changes in the posterior and anterior columns

which may result in more severe functional, electrophysiological and histological

deterioration [48].

CHAPTER 3

Currently available MR imaging based measurements to assess the

spinal cord in the setting of cervical spondylotic myelopathy:

review and critical appraisal

3.1. LITERATURE SEARCH

3.1.1 Objective

The goal of this systematic review was to establish which MR imaging features

can predict outcomes following surgery, including functional disability score and

recovery percentage. Moreover, the level of evidence and quality of methodology were

examined in each study. MR imaging based measurements included transverse area of

spinal cord, compression ratio of spinal cord, anteroposterior diameter of spinal cord and

scoring systems to quantify the degree of spinal canal and cord compression,

absence/presence of changes in T2SI, degrees of signal intensity changes, multisegmental

area of signal intensity changes, signal intensity ratio, and T1WI/T2WI signal intensity

change patterns.

3.1.2 Inclusion criteria

Articles were included if they satisfied the following criteria: a minimum sample

of 25 patients aged 18 and older with symptomatic CSM who underwent surgical treatment and

were followed up post-surgically; detailed description of MRI features; outcomes of interest

were functional scores and recovery percentage; study design was not limited to any

particular methodology. Studies were excluded if they included subjects with spinal cord compression due

to trauma or mass lesions or with acute spinal cord injuries, or if they assessed subjects by kinematic MR imaging,

diffusion-weighted MR imaging, cine-phase contrast MR imaging, perfusion-weighted

MR imaging or phase-contrast MR imaging. No review papers were

included in the study.

Figure 3.1: Flow diagram of inclusion and exclusion criteria of systematic reviews.

Potentially relevant publications identified and screened for retrieval (n=6890)

Papers evaluated against the inclusion and exclusion criteria on the basis of title/abstract/after review of the full article (n=112)

Included studies for prediction (n=22)

3.1.3 Identification of studies and assessment of methodological quality

Papers examining the predictive value of MR imaging features were identified

through searches of Medline, Embase, and PubMed from January 1980 to November 2008. Of

112 publications identified initially, 22 articles fulfilled the inclusion and exclusion

criteria, and constituted the basis for this review. Search terms (and MeSH headings:

“magnetic resonance imaging”, “predict”, “prognosis”, “cervical spondylotic

myelopathy”, “spinal canal”, “spinal cord compression”, “cohort studies”) used to

identify the study population included cervical spondylotic myelopathy, spinal cord

compression, spinal canal compromise, cervical myelopathy and central cord syndrome.

For complete search strategy, please refer to Appendix 1. Articles were initially screened

on the basis of title, abstract and the reference lists from 22 articles; full text copies were

then examined to ensure that studies met all inclusion criteria. All relevant papers were

evaluated for validity of evidence using a checklist for assessment of methodological

quality, specifically designed for the predictive studies. The Cochrane guideline for

assessment of the non-randomized studies was revised, making it more specific to the

predictive nature of reviewed studies and the medical condition of interest (CSM) [34]. It

included items such as sample representation, blinding, baseline comparability, follow-

up, validity and reliability of primary outcome measure and predictors. Table 3.1

presents criteria description for included studies in a modified version of quality

assessment checklist designed by the Cochrane collaboration group et al (2007) [49].

Table 3.2 presents the summary of methodological limitations in CSM predictive studies.

In addition, both reviewers assessed the quality of methodology and determined the level

of evidence according to Sackett et al 2000 (Appendix 4). Differences in rating the

quality of articles were resolved by consensus between the two raters. Definite conclusions

were drawn only when at least two studies provided similar findings

using comparable lengths of follow up and outcome measures. The representativeness of the

sample, blinding and baseline comparability played a significant role in drawing these

final conclusions.

The following criteria were used to assess the included studies, based on a modified version

of the quality assessment checklist designed by the Cochrane Collaboration group

(2007) (Table 3.1)*.

Representative sample

• Source description

• Referral pattern

• Patients’ characteristics

• Sample size

Blinding

• Blinded assessor

Baseline comparability

• Compared baseline performance of clinical status

• Compared baseline performance of other known predictive variables

Follow-up

• Complete

• Comparison of drop outs

• Reasons of drop outs

Psychometric properties of primary outcome measurement

• Validity

• Reliability

Accuracy of MR imaging measurements

• Definition

• Reliability

* Ryan R, Hill S, Broclain D, Horey D, Oliver S, Prictor M; Cochrane Consumers and

Communication Review Group. Study Quality Guide. March 2007.

www.latrobe.edu.au/cochrane/resources.html (July 2009).

3.1.4 Data extraction

3.1.4.1 Severity of myelopathy: functional score and recovery percentage

Table 3.3 includes study design, sample size, type of outcome measures and level

of evidence. These data were selected to provide a description of the cohort and show the

level of evidence. Table 3.4 includes categories of MRI features (signal intensity

changes, spinal cord compression and spinal canal compromise). These data were

selected to provide a description of a spectrum of MRI features used to evaluate severity

of myelopathy in cohorts. The percent recovery and/or post-operative score as outcomes

of interest after surgery were extracted according to the definition used in the individual

articles. This was generally evaluated using the recovery percentage formulae proposed

by Hirabayashi for JOA scoring system [9], its modified version (mJOA) [11], and for the

NCS scoring system [12]. The difference between the initial and final assessment scores

was reported for studies using the Nurick [50] and the walking test [24]. Table 3.5

includes data describing potential predictors of functional scores and recovery percentage

for which the strength of association with short term (at or less than 3 months/ 3-6

months) and long term follow-up (6-12 or greater than 12 months) in patients with

cervical myelopathy was reported. The data extracted in Table 3.6 were preoperative

variables that have been shown to be significantly associated with post-operative scores

and percent of recovery after surgery. The results of the 22 eligible studies were not

pooled because of heterogeneous study designs.
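
As an illustration of the recovery percentage extracted as an outcome above, the following minimal sketch assumes the commonly cited form of the Hirabayashi formula: (post-operative score - pre-operative score) / (maximum score - pre-operative score) x 100, with a maximum of 17 for the JOA scale and 18 for the mJOA scale. The function name and the example scores are hypothetical and are not taken from any reviewed study; individual articles may have applied the formula differently.

```python
def recovery_percentage(pre_score, post_score, max_score=17):
    """Hirabayashi-style recovery percentage (assumed form).

    max_score is assumed to be 17 for the JOA scale and 18 for the mJOA scale.
    """
    if not 0 <= pre_score <= max_score or not 0 <= post_score <= max_score:
        raise ValueError("scores must lie between 0 and max_score")
    if pre_score == max_score:
        # No room for improvement, so the ratio is undefined.
        raise ValueError("undefined when the pre-operative score is already maximal")
    return (post_score - pre_score) / (max_score - pre_score) * 100.0


# Hypothetical patient: JOA 10 before surgery, 14 at follow up.
example_joa = recovery_percentage(10, 14)        # (14 - 10) / (17 - 10) * 100
# Hypothetical patient scored on the mJOA scale (maximum 18).
example_mjoa = recovery_percentage(12, 15, 18)   # (15 - 12) / (18 - 12) * 100
```

Because the denominator depends on the pre-operative score, the same absolute gain yields a larger recovery percentage in patients who start closer to the maximum score, which is one reason baseline severity matters when comparing studies.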

3.2. Results

Multiple studies have identified preoperative MR imaging variables that are

associated with functional scores and percent of recovery after surgery in the CSM

population (N=17) (Table 3.5); however, only a few studies have tested these variables,

adjusting for age, duration of symptoms and baseline severity score (N=5). These studies

are summarized in Table 3.6.

I. Samples

The median sample size was 73. Few studies had a representative sample of the

study population: 5 (23%) studies reported a consecutive sequence of referrals, and 8

(36%) studies adequately described the sources of their selected patients. Few studies

attempted to create homogeneous cohort groups: 11 (48%) studies reported exclusion

and inclusion criteria, 12 (52%) described a well-defined point in the clinical course of

disease, and 11 (48%) included a wide spectrum of patients at various stages,

severities and subtypes of cervical myelopathy.

II. Measurement of MR imaging features and functional outcomes

Although 16 (73%) studies had adequately specified MR imaging features and

outcome criteria (e.g. a detailed MRI protocol was provided, including type of plane and

thickness of slices), the description of the steps in their measurement techniques was limited.

Few studies reported MR imaging features that were measured blind to the presence of

neurologic impairments (6; 27%) or examined the reliability of their measurement

instruments (4; 18%). The JOA and mJOA scales were used to assess functional outcomes in

17 (77%) and 4 (18%) studies, respectively. Among the remaining 2 studies,

1 (2.5%) used the Nurick scale and 1 (2.5%) used the walking test to

assess functional outcomes.

III. Loss of participants to follow up

Three (13%) studies had dropouts which comprised more than 20% of participants,

with no reasons listed for patients being lost to follow-up, and no available demographic

and clinical characteristics of the patients who were lost to compare them to the patients

in whom follow-up was complete. Therefore, it was impossible to investigate the effect

of lost patients on the validity of these studies. Of the 22 studies, 19 reported post-operative

functional scores or recovery percentage (%) from at least one follow-up time point; the

remaining 3 studies did not report the time of follow up [27, 31, 33].

3.1.4.2 MRI predictive factors

For Table 3.5, potential MR imaging predictors associated with post-operative functional

scores collected at follow up, as well as with percent of recovery, were extracted from all

cohort groups.

Table 3.7: List of MR imaging features as potential predictors of recovery percentage

and functional scores after surgery

Predictor/ MR imaging features

Outcomes

COMPRESSION OF SPINAL CANAL AND CORD

Transverse area

Functional score Recovery percentage

Compression ratio

------ Recovery percentage

Anteroposterior diameter

Functional score Recovery percentage

Spinal canal and cord deformities on sagittal T1-/T2-weighted MRI

------ Recovery percentage

Cord compression on sagittal and axial T1-weighted MRI


Cord compression on axial T1-weighted MRI


Spinal canal and cord deformities on axial T1-/T2WI

Functional score ------

ISCHEMIC CHANGES OF SPINAL CORD

Absence/presence of signal changes on T2WI

Functional score Recovery percentage

Degree of intensity signal changes on T1/T2WI


Area of signal intensity changes on T1/T2WI

Functional score Recovery percentage

Intensity ratio of signal changes on T1/T2WI


Patterns of signal intensity changes on T1/T2WI


3.2.1 Compression of spinal canal and cord

Transverse area

Summary

The transverse area at the site of maximum compression was measured to study the

morphological changes of spinal cord [19, 22, 27]. The results of our systematic review

suggest that transverse area of spinal cord is associated with recovery percentage and

functional score at long term follow up (greater than 6 months) before and after

adjustment for other important confounding variables.

Recovery percentage

Fukushima et al (Level I) reported significant differences in recovery percentage

from 6 to 48 months after surgery (mean follow up of 17 months) in groups with a spinal

cord area of less than 0.45 cm² (p<0.01), reflecting the irreversible pathology of spinal

cord [22]. Similarly, Okada et al (Level IIc) showed an association for each individual

etiology group, OPLL (ossification of the posterior longitudinal ligament) (r=0.678,

p<0.01) and CSM (cervical spondylotic myelopathy) (r=0.586, p<0.01). After adjusting

for disease etiology, duration of symptoms, and signal intensity changes, the investigators

showed that the preoperative cross-sectional area of the spinal cord is an independent

predictor of recovery percentage in CSM patients (no follow up time was reported).

Morio et al (Level IIc) reported a weak, statistically non-significant

association between recovery percentage at follow up (6 months to 10 years, mean follow

up of 3.4 years) and preoperative cross-sectional area of spinal cord (r=0.243, p =

0.0517). All three studies evaluated the percentage of recovery using the formula

proposed by Hirabayashi based on the original formulae of the JOA scale.

Functional score

Transverse area was consistently shown to be associated with postoperative JOA

scores in two prior studies before statistical adjustment for other important confounding

variables. Fukushima et al 1991 (Level I) reported a significant association between

transverse area and post-operative JOA scores (r=0.298, p<0.05) [22]. Similarly, Morio

et al 2001 (Level IIc) reported a significant association between post-operative JOA scores and

preoperative spinal cord surface (r=0.398, p = 0.0015) [19]. After adjustment for age,

duration of symptoms and preoperative scores, Morio et al showed that the preoperative

surface area of the spinal cord is not associated with the functional score of CSM

patients.

Compression ratio

Summary

The compression ratio was consistently defined as a ratio of sagittal and transverse

diameters on T1-weighted axial imaging [10, 27, 51]. Our systematic review identified

three original articles where compression ratio was studied as an MR imaging feature to

quantify the severity of spinal cord compression. The results of our systematic review

suggest that compression ratio is not associated with recovery percentage at long term

follow up (greater than 6 months).
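
To make the definition above concrete, the following minimal sketch computes a compression ratio from the two cord diameters measured on an axial image. It assumes the ratio is taken as sagittal diameter divided by transverse diameter and expressed as a percentage, so that a flatter cord gives a smaller value; the reviewed articles may have used a different orientation or scaling. The function name and the example measurements are hypothetical.

```python
def compression_ratio(sagittal_diameter_mm, transverse_diameter_mm):
    """Compression ratio of the spinal cord on an axial image (assumed form).

    Assumption: ratio = sagittal diameter / transverse diameter x 100,
    so smaller values indicate a more flattened cord.
    """
    if sagittal_diameter_mm <= 0 or transverse_diameter_mm <= 0:
        raise ValueError("diameters must be positive")
    return sagittal_diameter_mm / transverse_diameter_mm * 100.0


# Hypothetical measurements at the level of maximum compression.
flattened_cord = compression_ratio(4.0, 12.0)   # markedly flattened cord
rounder_cord = compression_ratio(6.0, 12.0)     # less flattened cord
```

Under this convention the more flattened cord has the lower ratio, which should be kept in mind when reading statements such as a "greater compression ratio" in a better-recovering group.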

Recovery percentage

In studies by Chen et al and Okada et al, the compression ratio measurements

were reported to have no associations with recovery percentage evaluated using the

formula proposed by Hirabayashi based on the mJOA and JOA scores, respectively [10,

27]. Okada et al 1993 (Level IV) reported that the recovery percentage was not

significantly associated with compression ratio irrespective of etiology (OPLL, CSM or

CDH (cervical disc herniation)) (follow up period was not reported) [27]. Chen et al

(Level IV) reported similar findings with a report of insignificant association between

preoperative compression ratio and recovery percentage at 6 months follow up after

surgery (r=0.026, p=0.836) [10]. In contrast, Chung et al (Level IIc) showed that

compression ratio is associated with recovery percentage calculated from the JOA scores

from 24 to 84 months after surgery (mean follow up of 42 months), where patients were

divided into two groups according to the recovery percentage – a ‘good’ group (n=19),

and a ‘fair’ group (n=18). The results showed that patients in the good group had a

greater compression ratio (p<0.05) [28]. In all likelihood, the inconsistent findings

reported by Chung et al are due to variable outcomes.

Anteroposterior diameter (AP diameter)

Summary

Whether AP diameter on the preoperative axial T1 image can be used to predict recovery

percentage and functional disability score after surgery, as evaluated by the JOA scale, remains

inconclusive owing to the limited number of studies available in the literature and the poor

methodology used to support these findings.

Recovery percentage & Functional score

In our systematic review, only one previous study by Yone et al (Level IV) (45

OPLL, 64 CSM, 31 healthy patients) compared morphological spinal cord changes,

functional score and recovery percentage (follow up period was not reported) [31]. No

relationship was found between AP diameter and post-operative functional scores. In

contrast, the authors found a significant association between recovery percentage, evaluated

by Hirabayashi’s formula based on the JOA scores, and preoperative minimum AP

diameter among OPLL but not CSM patients. Although the authors reported

differences in recovery percentages between the two pathologies, neither the statistical analysis nor the

mean JOA scores were documented.

Scoring systems to interpret spinal cord compression

Summary

The results of our systematic review are inconclusive as to whether the

severity of spinal cord and canal deformities (described using

scoring systems) is associated with recovery percentage calculated from JOA scores

in three prior studies. Recovery percentages across the cohorts were highly variable; in all

likelihood these inconsistencies are due to the number and variety of measures used to

interpret severity of spinal cord and canal compression on MRI. Similarly, whether MR

imaging severity scoring systems of CSM can be used to predict functional score in CSM

patients after surgery also remains inconclusive due to variable follow-up periods and

outcome measures used to quantify functional scores.

Recovery percentage

A. Spinal canal and cord deformities on sagittal T1-/T2-weighted MRI

Kasai et al (Level IIc; 128 CSM patients) retrospectively studied a new method of

evaluating the cumulative severity of stenosis captured on preoperative T1- and T2-weighted

sagittal images and recovery percentage from 12 months to 9.7 years after

surgery (mean follow up of 4.8 years) [52]. The authors used a six grade scale to classify

the severity of spinal cord compression, describing severity in terms of anterior/posterior

space and cord compressions (Table 3.4 (II)). The recovery percentage was calculated

based on the JOA scores obtained at long term follow-up and correlated with the

preoperative MRI cumulative score. As a result, the authors found a significant negative

correlation between the defined MR imaging findings and recovery percentage (r= -

0.436, p<0.01).

B. Spinal cord compression on sagittal and axial T1-weighted MRI

Nagata et al (Level IV; 300 CSM patients) prospectively compared preoperative

MRI and recovery percentage calculated based on the JOA scores collected at an average

follow up of 19 months [53]. The morphological changes of the spinal cord were

stratified into four categories of preoperative cord compression on sagittal T1-weighted

MRIs: Class 0, no compression; Class 1, slight cord compression; Class 2, cord width

decreased by less than 1/3; Class 3, cord width decreased by at least 1/3. As a result,

they found that the degree of spinal cord compression on sagittal T1-weighted MRI was

not significantly correlated with the severity of myelopathy (no p or r values were

reported).

C. Spinal cord compression on axial T1-weighted MRI

Matsuyama et al (Level IV; 44 OPLL patients) compared recovery percentage,

calculated based on JOA scores obtained at 1 month follow up, across three categories of

cross-sectional spinal cord configuration: boomerang, teardrop, and triangular, with mean

recovery percentages of 61.8%, 72.1% and 23%, respectively (no p or r values were reported) [54].

Functional score

D. Spinal cord compression on axial T1-/T2WI

One study by Singh et al (Level I; 69 CSM patients) found no significant

association between spinal cord compression and post-operative walking scores obtained

at 3 months follow up (r=0.07, p=0.60)[24]. Based on the pattern of spinal cord

compression on T2-weighted MR images obtained at baseline, the authors classified all

patients into four categories: none (0), mild (1; flattening or concavity of the anterior

surface only), moderate (2; <50% reduction in maximal sagittal diameter), severe (3;

>50% reduction in maximal sagittal diameter).

In addition to studying recovery percentage, Matsuyama et al examined the

relationship between morphological characteristics of spinal cord deformities on the

preoperative MR image and functional score assessed by JOA score obtained at 1 month

following surgery [54]. Although the mean functional disability scores were reported in

three groups (triangular cord configurations, A=31.8 mm², post-JOA = 11.6; teardrop

cord, A=39.0 mm², post-JOA = 15.2; boomerang cord, A=35.4 mm², post-JOA = 14.2),

no direct comparisons were reported.

One study by Nagata et al (Level IIc; 74 CSM, 52 CDH, 49 OPLL patients)

retrospectively studied morphological and functional scores obtained from 12 months to

4.5 years (mean follow up of 1.5 years) after surgery in elderly patients [18]. The

morphological changes of spinal cord were stratified into four categories of preoperative

cord compression on sagittal T1-weighted MRIs: Class 0, no compression; Class 1, cord

compressed slightly; Class 2, cord width decreased by less than 1/3; Class 3, cord width

decreased by at least 1/3. The authors reported that patients with

less cord distortion at baseline performed better. The validity of these findings is difficult to judge due

to poor description of performance (no report of mean, SE, p and r values).

The inconsistencies in observations obtained at short term follow up could be due

to variable measures used to assess post-operative functional scores. The findings

obtained at long term remain inconclusive.

E. Spinal canal compromise and cord compression on axial T1-/T2WI

One study by Uchida et al (Level IIc; 135 CSM/OPLL patients) retrospectively

studied morphological and functional scores assessed by JOA scale obtained from 12

months to 12.8 years (mean follow up of 8.3 years) after surgery [47]. The percentage

rate of flattening and canal narrowing were used to estimate the morphological changes

of spinal canal and cord on sagittal T1-weighted MRI. The authors reported that

better functional scores in OPLL, but not CSM, patients were associated with

preoperative spinal canal narrowing by ossification of <40% and an extent of cervical

cord flattening of ≥50%. The validity of findings is difficult to judge due to poor

description of performance (no report of mean, SE, p and r values).

3.2.2 T2 signal changes on MRIs of the spinal cord

The evaluation of signal intensity changes is intended to assess the secondary damage to

the spinal cord of patients with CSM.

Absence/presence of signal changes on T2WI

Summary

In our systematic review, signal changes on T2WI had the greatest number of

publications among all MR imaging features used for prediction of functional outcomes

following surgery in the setting of CSM. High T2 signal intensity changes on MRI

were not associated with recovery percentage after surgery at long term follow up

(greater than 6 months); findings at short term follow up were less conclusive. In

contrast, high T2 signal intensity changes found on the preoperative mid-sagittal MR image

were associated with functional score after surgery at long term follow up (greater than 6

months); findings at short term follow up were again less conclusive.

Recovery percentage

In a case series, Mizuno et al (Level IV; 82 CSM, 62 OPLL patients) found a

significant difference in recovery percentage, calculated from JOA scores from 3 to 6

months after surgery (mean follow up of 3.7 months), in SEA (snake-eye appearance,

characterized as nearly symmetrical round high signal intensity of the spinal parenchyma

resembling the face of a snake; 32.2 ±15.1%) compared to the NSEA (no snake-eye

appearance) group (47.1 ±12.1%) (p<0.001) [55]. In contrast, Wada et al (Level IIc) [29]

showed the absence of a relationship between signal changes and recovery percentage

obtained at 1.5 months. The inconsistencies in findings by Wada et al 1995 could be due

to variable view dimensions (axial versus sagittal) and approaches used to describe signal

intensity changes (‘yes/no’ versus ‘snake eye/non-snake eye’ appearances).

Inconsistencies in recovery percentages could also be due to selection bias, given that neither

study gave sufficient details on the source and methods of patient enrolment, or a

description of key patient characteristics such as CSM severity groups, presence of

other co-morbidities, inclusion/exclusion criteria, etc. It is important to note that the study

by Wada et al had analyzed recovery percentages in groups with comparable JOA scores

at admission.

Although several authors [8, 15, 28, 31] found that high T2 signal is not associated with

recovery percentage, Yukawa et al (Level IV; 142 CSM/OPLL/CDH/calcification of the

yellow ligaments patients) found that the recovery percentage, calculated from JOA

scores obtained from 12 to 90 months after surgery (mean follow up of 40 months), was

significantly different between groups of patients with and without signal changes on

sagittal T2W MRIs (p=0.033, r-value was not reported) [56]. Given the important role of

age and duration of symptoms in affecting outcomes after surgery, studies by Houten et

al 2003 & Yamazaki et al 2002 ensured similar baseline characteristics of patients in

comparison groups.

Functional score

Wada et al (Level IIc) [29] showed no association between signal changes and

functional score assessed by JOA scale obtained at 1.5 months follow-up. In contrast, a study by

Singh et al (Level I) showed that signal changes were associated with functional score assessed by

the walking test at 3 months after surgery [24]. The authors concluded that CSM patients

with higher severity scores at admission and T2 signal showed more change in functional

score. Because clinical severity at baseline was not statistically adjusted between

comparison groups, it is difficult to conclude that high T2 signal changes alone are

independently associated with functional scores after surgery (p=0.0011). However,

Wada et al compared groups with comparable baseline severity. The findings remain

inconclusive due to variable measures used to measure function in CSM patients.

Yukawa et al (Level IV; p=0.0012) [56], Papadopoulos et al (Level IV;

p<0.001) [57] and Matsuda et al (Level IV; p<0.05) [14] showed a consistent association between

signal intensity changes on sagittal T2-weighted MR images and functional scores

assessed by JOA scale at long term follow up (greater than 6 months after surgery);

however, Houten et al (Level II) [8] found no significant difference across

comparison groups at long term follow-up. Houten et al and Yukawa et al had

comparable patient characteristics at admission except for severity score, whereas Papadopoulos

et al had comparable baseline severity but no control for other important predictors; these

differences could be contributing to the inconsistencies seen in the results.

Degree of signal intensity changes on T1-/T2-WI

Summary

In the CSM population, the impact of altered degrees of signal intensity changes on

recovery percentage was documented in three previous studies that were captured in our

systematic review [10, 47, 56]. The findings suggest that the assessment of CSM severity

based on the degree of signal intensity changes on preoperative T2 WI is useful as a

predictor of recovery percentage at long term follow up after surgery, calculated based

on JOA scores. The findings were consistent before and after adjustment for important

confounding variables.

Recovery percentage

Yukawa et al (Level IV; 142 CSM/OPLL/CDH/calcification of the yellow

ligaments patients) and Chen et al (Level IV; 64 CSM patients) showed an association

between a highly intense signal intensity area with a well-defined border and poor recovery

percentage obtained at long term follow up (p=0.018 and p<0.001, respectively).

After controlling for other important confounding variables such as age, sex, preoperative

JOA score, cervical curvature, and cord compression ratio, Chen et al confirmed these

associations [10].

Although Uchida et al (Level IIc; 135 CSM/OPLL patients) reported no significant

association, the observed differences could be attributed to different MR imaging

classifications used to assess the degree of signal intensity changes [47].

Multisegmental area of signal intensity changes on T1-/T2WI

Summary

The results of our systematic review suggest that a multisegmental area of signal intensity

changes is associated with recovery percentage and functional score irrespective of scale

(original or modified version of JOA) at long term but not at short term of follow up after

surgery. Further research is needed to replicate the findings documented above

controlling for other important confounders including baseline severity score,

demographics and duration of symptoms.

Recovery percentage

Three studies reported consistent findings on the differences in recovery

percentage, measured by the original or modified version of JOA scales at long term

follow up, in groups of patients with focal and multisegmental areas of high signal

changes found on the preoperative MR imaging [32, 57, 58]. Wada et al (Level IIc;

85 CSM patients), Fernandez et al (Level I; 12 CDH, 55 OPLL patients) and

Papadopoulos et al (Level IV; 42 CSM patients) reported a significant relationship

between multisegmental area of high signal intensity on preoperative MRIs and poor

recovery percentage at long term follow up (greater than 6 months) (p<0.05, p<0.001 and

p=0.001, respectively). In contrast, another study by Wada et al (Level IIc; 31 CSM patients)

reported no significant difference in recovery percentage obtained at 1.5 months across

comparison groups. It would seem likely that the inconsistencies in the findings by

Wada et al were due to the differences in follow up times [29].

Functional score

Based on the results of our systematic review, the relationship between area of T2

intensity signal change and functional score after surgery was examined in three studies.

While two studies found a significant association of area of T2 intensity signal and

functional score assessed by mJOA and JOA scales after surgery at long term follow-up

(longer than 6 months), respectively (Mastronardi et al (Level I) & Wada et al (Level

IIc)), Wada et al 1995 (Level IIc) found no statistical difference in post-operative JOA

scores obtained at 1.5 months between patients with multisegmental areas of high MRI

intensity (13.4±1.1) and those with focal areas (13.5±2.0). It would seem likely that the

inconsistencies seen in the reported findings by Wada et al were due to the differences in

follow up at which functional score after surgery were measured.

Intensity ratio of signal changes on T1- and T2WI

Summary

It remains inconclusive whether signal intensity ratio is associated with recovery

percentage due to limited information reported on timing of follow up.

Recovery percentage

Our systematic review identified only one original article where ratio of signal

intensity changes on T1/T2WI was examined for its relationship with recovery

percentage after surgery [27]. Okada et al (Level IV) defined signal-intensity ratio as

sagittal T2-weighted MRI cord signal at maximally compressed levels divided by the

comparable readings at contiguous non-compressed sites. Okada et al (23 OPLL, 34

CSM, 17 CDH patients) showed a significant relationship between

recovery percentage and the mean preoperative intensity ratio, in particular in

the groups with myelopathy due to OPLL and CSM, r=0.537 (p<0.01) and r=0.426 (p<0.01),

respectively. Recovery percentages were reported for all three groups (OPLL,

54.7 ±17.7%; CSM, 52.21 ±5.9%; CDH,

78.3 ±19.1%); the CDH group had a significantly higher recovery percentage

(p<0.01) (no follow up time was reported).
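
The signal-intensity ratio described above can be sketched as a simple quotient. This is only an illustrative reading of the definition: it assumes one signal value at the maximally compressed level and, as a further assumption, averages the readings from the contiguous non-compressed levels to form the reference. The function name and the signal values are hypothetical and are not data from the reviewed article.

```python
from statistics import mean


def signal_intensity_ratio(compressed_level_signal, reference_signals):
    """Cord signal at the maximally compressed level divided by the
    reference signal from contiguous non-compressed levels.

    Assumption: several reference readings are combined by taking their mean.
    """
    if not reference_signals:
        raise ValueError("at least one reference reading is required")
    reference = mean(reference_signals)
    if reference <= 0:
        raise ValueError("reference signal must be positive")
    return compressed_level_signal / reference


# Hypothetical T2-weighted signal readings (arbitrary scanner units).
ratio = signal_intensity_ratio(450.0, [300.0, 300.0])
```

Normalizing to adjacent cord in the same image makes the measure less dependent on scanner-specific signal scaling than a raw intensity value would be.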

Patterns of signal intensity changes on T1-/T2WI

Summary

The assessment of CSM severity based on the combination of signal intensity changes on

both T1WI and T2WI shows promise as a potential predictor of functional scores

obtained at long term follow up after surgery.

Recovery percentage

In our systematic review, there is only one prior study that examined the role of

sagittal T1-/T2WI signal intensity change patterns as an independent predictor of

functional recovery percentage [19]. The study used the following patterns of spinal cord

signal intensity changes on T1-/T2WI to stratify patients into comparable groups: normal/normal

(N/N), normal/high-signal intensity changes (N/Hi), and low signal/high-signal

intensity changes (Lo/Hi). Morio et al (Level IIc; 42 CSM, 31 OPLL, 9 CDH patients)

retrospectively compared recovery percentage obtained between 6 months and 10 years

(mean, 3.4 years) after surgery assessed by JOA across different patterns of spinal cord

signal intensity changes. The authors showed a statistically significant difference between the N/Hi

(48.0 ± 24.9%) and Lo/Hi (19.1 ± 22.8%) groups (p=0.0259). Using stepwise

multiple regression, the best model for prediction of recovery percentage included

preoperative signal pattern combined with clinical features such as age and duration of

symptoms (adjusted r² = 0.297; p = 0.0002). More research is needed to replicate this

finding in a prospective cohort study.

3.3. OVERALL SUMMARY OF THE SYSTEMATIC LITERATURE REVIEW

Our systematic review identified 22 observational studies that examined

9 MR imaging measures as predictors of functional score and recovery

percentage. These included transverse area of spinal cord, compression ratio of spinal

cord, anteroposterior diameter of spinal cord, severity scoring systems to interpret the

degree of spinal cord compression and/or canal compromise, presence of high T2SI,

degrees of signal intensity changes, multisegmental area of signal changes, signal

intensity ratio, and T1WI/T2WI signal intensity change patterns, which were reported in

original articles of level-4, level-2b or level-1 evidence. The associations were studied

based on subgroups of measures of functional outcomes, follow up periods, and

adjustment for age, duration of symptoms and baseline severity score.

MR imaging predictors of functional scores & recovery percentage at short term

follow up

No MR imaging features were found to be associated with functional recovery

percentage at short term follow up. However, the multisegmental (linear) high intensity

areas on T2-weighted MR image were associated with recovery percentage at 1.5 months

of follow up. The relationship between anteroposterior diameter of spinal cord,

classifications of severity of spinal cord and canal deformities, and high T2 signal

intensity changes and recovery percentage at short term remains inconclusive.

MR imaging predictors of functional recovery percentage at long term follow ups

The degree of signal intensity changes and transverse area of the spinal cord were

found to be associated with recovery percentage at long term follow up (greater than 6

months). These data suggest that as the degree of spinal cord compression increases,

tissue damage is more likely to be irreversible despite surgical

decompression, leading to poor recovery. For both MR imaging features, the

findings were consistent before and after adjustment for age, duration of symptoms, and

baseline severity score. In contrast, high T2 signal intensity change and compression ratio

were consistently not associated with recovery percentage at long term follow up. The

relationship of anteroposterior diameter, severity scoring systems, signal intensity ratio

and recovery percentage at long term remains inconclusive.

MR imaging predictors of functional scores at long term follow-up

Using univariate analysis, transverse area of the spinal cord, high T2 signal

changes, multisegmental area of signal change and combined T1WI/T2WI signal

intensity changes were found to be associated with functional scores at long term follow

up (greater than 6 months). After adjustment for age, duration of symptoms and baseline

severity score, transverse area of spinal cord and combined T1WI/T2WI signal intensity

changes patterns remained significantly associated with functional scores. Further

research is needed to evaluate the role of high T2 signal changes and functional scores,

adjusting for other important variables. The relationship of anteroposterior diameter,

severity scoring systems, signal intensity ratio and recovery percentage at long term

remains inconclusive.

Although transverse area of the spinal cord and combined T1WI/T2WI signal

intensity changes are consistently shown to be significantly associated with functional

scores at long term follow up, it must be noted that there are some methodological

limitations to the data that caution against definitive interpretation. First, the studies

provide insufficient information about the sources and methods of patient

recruitment and about key patient characteristics, including degree of

CSM severity, co-morbidity, inclusion/exclusion criteria, age and sex. In this case, the

possibility of selection and measurement bias cannot be ruled out, which may have

distorted the true differences between comparison groups. Second, no reliability testing

has been undertaken regarding the method of using transverse area of the spinal cord to

ensure its consistency. Further studies are required to explore the role of MR imaging

variables in prediction of functional score on the basis of methodological standards

appropriate for good quality observational studies.

3.4. RATIONALE FOR STUDYING CLINICAL AND IMAGING PREDICTORS

OF OUTCOME IN CSM

Predicting the extent of functional gain is important for many reasons: it provides

information to patients about surgery related risks; it can be used among clinicians to

guide therapeutic decisions; it can provide better allocation of services; and it may be

useful in designing clinical trials to test the effect of certain interventions on outcomes.

Age, duration of symptoms and baseline severity score are consistently associated

with functional scores following surgery. Therefore, it would be essential to adjust for

these variables in the comparison of functional scores across varying MR imaging

features. In addition, while it is clear that age, duration of symptoms and baseline severity

score are associated with functional outcomes in patients with CSM, it is less clear

whether they are reliable predictors of functional outcomes following surgery.

It has been suggested previously that MR imaging can be predictive of function

after surgery. However, these assessments fall short of providing clinicians with key

information in CSM because they are either qualitative, or studied in a quantitative way

in less methodologically rigorous studies. The rationale for studying MR imaging

predictors of functional outcomes is supported by several factors. Few studies have

reported an extensive evaluation of predictors of outcomes after surgery, reporting the

beta estimates and therefore establishing the strength of the relationships of individual

variables and outcomes. The majority of studies simply report associations without

controlling for other important confounding variables such as age, duration of symptoms

and baseline score. The reported strength and statistical significance of these associations

are therefore of limited clinical utility. Furthermore, few report using the mJOA scale, the

current standard outcome measure of functional disability in the CSM population [10]. The

mJOA scale was modified from the JOA scale to allow for cultural differences in western

populations. The majority of MR imaging predictors described in these studies were also

associated with recovery rate and not the mean of post-operative functional scores after

surgery. The question still remains whether the above mentioned MR imaging parameters

are predictive of patients’ functional score after surgery. Information on the

inter-rater reliability and stability of MR imaging measurements is also

lacking. The last factor is due to the lack of standardized MR imaging

protocols and clinical assessments collected concurrently in a prospective cohort sample.

3.5. HYPOTHESIS AND STUDY OBJECTIVES

OVERALL STUDY OBJECTIVE:

To develop a predictive model of functional outcome incorporating key demographic,

clinical and MR imaging assessments in patients with cervical spondylotic myelopathy

undergoing surgical treatment.

Hypothesis: Key demographic parameters, clinical factors and MR imaging features of

the site of cervical cord compression are independently associated with baseline scores

and predictive of functional outcomes scores at 12 months follow up in patients with

CSM undergoing surgical treatment.

Each specific aim contributes to the overall objective:

Specific Aim I: Reliability assessment of MR imaging to assess cord compression in

CSM (Appendix 3).

Objective: To investigate the inter-rater reliability of two published methods

(transverse area and anteroposterior diameter) of examining cord stenosis on axial

MR images.

Question: Are the ICC values of transverse area and anteroposterior

diameter of spinal cord methods free of systematic errors (bias)?

Specific Aim II: Development of a predictive model of outcome in patients with CSM

undergoing surgical treatment

Objective: To address the limitations of the current literature by prospectively

evaluating if demographic, clinical and radiological factors in patients with CSM

are predictive of functional outcomes pre- and post- surgery.

Question: After controlling for age, gender and duration of symptoms,

are MRI features independently associated with, and predictive of, functional outcomes at

baseline and 12 months follow-up, respectively?

CHAPTER 4

MATERIAL AND METHODS

4.1. STUDY OBJECTIVES

Chapter 4 provides details of study methodologies designed to answer two research

questions related to Specific Aims I & II.

4.2. STUDY DESIGN

A total of 85 CSM patients who were consecutively referred to the spine clinic at

the Toronto Western Hospital (an academic tertiary care institution affiliated with the

University of Toronto) from February 2006 to November 2007 were prospectively

recruited for this study. This project is based on analysis of a single centre which is part

of a larger multicentre AOSpine North America CSM Trial; n=283 cases.

The proposed research is based on secondary analysis of an existing data set

housed in a research database. The primary aim of the present study was to compare the

clinical and radiological outcomes, functional status, disease specific, and general health

related quality of life between patients managed with anterior vs. posterior approaches

using the Nurick Score, mJOA score, MR and plain radiographs, Neck Disability Index,

30 meter walk test and the SF-36 at baseline, 6, 12 and 24 months following surgery.

Data entry was validated (e.g., logic checks including range checks, missing value

checks) both by visual inspection and built-in database programming during the data

entry process. The subject’s electronic study file was not considered complete until

mandatory data fields were completed. The central study database was monitored by an

external representative and queries were made on a regular basis to ensure the quality and

integrity of the data.

4.3. TARGET POPULATION

This study included all consecutive CSM patients who were referred to a single spine

centre at the Toronto Western Hospital from February 2006 to November 2007. A total

of 99 patients with CSM were surgically treated per standard of care. Surgeons used their

expertise and preferences to determine the method of surgical intervention. Individual

techniques or combinations of techniques were possible, including anterior cervical decompression

and fusion, laminoplasty, and laminoplasty and fusion. Twenty of the 85 eligible patients were

excluded because they were unable to undergo MRI (e.g., pacemaker) and had CT/myelography

instead. After excluding 4 patients who were lost to follow up, 61 subjects (follow-up

percentage: 94%) were analyzed for prediction of functional outcomes (please see

summary characteristics of the study population in Table 5.1).
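
The sample flow described above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the variable names are illustrative and are not taken from the study database.

```python
# Patient counts as reported in the text (variable names are illustrative).
treated = 99            # CSM patients treated surgically in the study period
eligible = 85           # eligible subjects
no_mri = 20             # had CT/myelography instead of MRI
lost_to_follow_up = 4   # no mJOA score at 12 months

with_mri = eligible - no_mri               # patients with MR images
analyzed = with_mri - lost_to_follow_up    # patients used for modelling
follow_up_pct = round(100 * analyzed / with_mri)

print(with_mri, analyzed, follow_up_pct)   # 65 61 94
```

The follow-up percentage of 94% is therefore computed against the 65 patients with MR images, not against the 85 eligible subjects.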

The patients had a clinical diagnosis of cervical myelopathy confirmed with

characteristic findings on MRI consistent with CSM. CSM was defined as a constellation

of symptoms and signs supported by appropriate radiological findings, including

symptoms (numb clumsy hands, impairment of gait, bilateral arm paresthesia,

Lhermitte's phenomenon) and signs (corticospinal distribution motor deficits, atrophy of

hand intrinsic muscles, hyperreflexia, positive Hoffmann sign, upgoing plantar responses,

lower limb spasticity, broad based unstable gait) [1]. Any associated conditions such as

cardiovascular disease, angina/coronary artery disease, congestive heart failure,

arrhythmia and hypertension and diabetes, were not considered to be exclusion criteria.

Eligible patients were identified by the treating spine neurosurgeons during the initial

examination in spine clinics at Toronto Western Hospital. The pathologic conditions were

cervical spondylotic myelopathy, cervical ossification of the posterior longitudinal

ligament, soft disc herniation, hypertrophic ligamentum flavum and subluxation.

A flow diagram of the study population is shown in Figure 4.1.

Figure 4.1 (flow diagram of the study population):

Total CSM patients treated surgically from February 2006 to November 2007: N=99

Eligible subjects*: N=85

Analyzed sample: N=61 (N=4 were lost to follow up, N=20 had CT scans/myelography)

*Not eligible for one or more of the reasons listed below:

- Asymptomatic cervical cord compression

- Previous surgery for CSM

- Active infection

- Neoplastic disease

- Rheumatoid arthritis

- Ankylosing spondylitis

- Trauma

- Concomitant symptomatic lumbar spinal stenosis

- Not referred for surgical consultation

- Pregnant women or women planning to get pregnant during the study period

- History of substance abuse

- Incarceration

- Currently involved in a study with similar purpose

- Has a disease process that would preclude accurate evaluation (e.g. neuromuscular disease, significant psychiatric disease)

- Patients seen by other services

- Age <18 years

- Unable and not willing to give consent to participate in study

- Not willing and not able to participate in the study follow up according to the protocol

- Does not understand and cannot read English at elementary level

4.4. DEFINITION OF THE PRIMARY OUTCOME

Although the most important outcome of decompression surgery for stenosis is

resolution of symptoms, it is the ability to regain normal function in activities of daily

living that has become of great importance. The functional disability scale allows us to

better understand the expectations of surgical treatment for CSM. The modified

Japanese Orthopaedic Association (mJOA) functional disability scale was

used for classification of CSM severity through assessment of upper extremity function

(5 points), lower extremity function (7 points), sensory function (3 points), urinary

bladder function (3 points). The scale ranges from 0-18 with higher scores indicating

better function (Table 4.1) [11]. The mJOA was used as the primary outcome measure to

quantify function pre-surgery and at 12 months follow-up. The 12-month time frame was

chosen because it represents a typical time period of optimum recovery for CSM. A

Cronbach alpha of 0.66 and 0.65 for preoperative and postoperative JOA scores

respectively, has been reported for internal consistency. The preoperative and

postoperative JOA scores (original scale) also correlate with other measures, namely the

Myelopathy Disability Index (MDI), European Myelopathy Score (EMS), Ranawat and

Nurick, with Pearson product-moment correlation coefficients ranging from 0.47 to 0.62

and 0.42 to 0.72 as expected [23].
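
As a worked illustration of the scoring described above, the sketch below sums the four mJOA domain subscores and checks each against its stated maximum. The function name `mjoa_total` and the dictionary keys are hypothetical; the item wording of the scale is given in Table 4.1 and is not reproduced here.

```python
# Maximum points per mJOA domain, as stated in the text (total = 18).
MJOA_MAX = {"upper_extremity": 5, "lower_extremity": 7, "sensory": 3, "bladder": 3}

def mjoa_total(subscores):
    """Return the total mJOA score (0-18) from the four domain subscores."""
    if set(subscores) != set(MJOA_MAX):
        raise ValueError("expected exactly the four mJOA domains")
    for domain, value in subscores.items():
        if not 0 <= value <= MJOA_MAX[domain]:
            raise ValueError(f"{domain} must be between 0 and {MJOA_MAX[domain]}")
    return sum(subscores.values())

# Hypothetical patient, not study data.
example = {"upper_extremity": 4, "lower_extremity": 5, "sensory": 2, "bladder": 3}
print(mjoa_total(example))  # 14
```

A patient with no deficit in any domain would score the maximum of 18.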

We chose the mJOA scale as an outcome measure instead of its original version

because it is currently the standard outcome measure of functional disability in

the CSM population. It is disease specific and it was modified from the JOA scale to

allow for cultural differences in western populations.

4.5. PRIMARY EXPOSURE (INDEPENDENT VARIABLES)

4.5.1 Strategies to improve accuracy and ease of use of exposure variables

Because our aim is to develop a predictive model that can be used in research and

clinical practices, several steps were necessary. First, the model includes demographic,

clinical and MR imaging characteristics (e.g., age, gender, duration of symptoms, baseline

mJOA score, intensity signal changes on T1WI and T2WI, degree of spinal cord

compression and number of compressed segments). These variables have been shown to

be promising in predicting the post-operative functional scores in other studies [18, 19,

22, 24]. Second, continuous variables were dichotomized for practical purposes; the cut-

off values were determined based on earlier investigations. Third, in all patients, MRI

was performed within 8 weeks prior to surgery using a 1.5 Tesla General Electric unit

and a standardized imaging protocol in the majority of cases (please see MRI protocol in

Table 4.3), to minimize measurement errors and increase observers’ reliability. Fourth, a

radiologist with 10 years of experience (Zvonimir Ivan Lubina, M.D., Clinic of

Traumatology in Zagreb) analyzed the MR images obtained from all 65 patients without

knowledge of the patient’s clinical and neurological status, and the clinical assessors

were also blinded to the imaging results, to avoid observer bias. Fifth, as a minimal

requirement for a valid tool, we quantitatively examined the degree of agreement across

different raters for the same patient (inter-rater reliability) for two published methods of

examining spinal cord compression using a systematic approach with magnified

software-based tools, written instructions and consistent interpretations. Based on the results,

we identified a list of reliable measures and matched them to those available in the CSM trial

database. Transverse area (TA) was chosen over anteroposterior (AP) diameter of the spinal

cord as the measure of spinal cord compression owing to its wide applicability to both symmetrical

and asymmetrical cases, despite its lower ICC (intraclass correlation coefficient) value

(please refer to Appendix 3 for further details). We did not examine the reliability index

for the pathological changes within the spinal cord (the classification of combined patterns of

T1-/T2-WI signal intensity changes), since earlier studies reported this method to be moderately

reliable, with a concordance correlation coefficient between two observers on a single

occasion of 0.62 (kappa=0.37; p=0.0063), and predictive of functional outcomes [19].

4.5.2 Definition of primary exposure and psychometric properties (validity and

reliability) of the independent variables

Primary exposure was defined as variables that are known prior to the time of

surgery (preoperative) and may independently predict the primary outcome. These

variables constitute the theoretical framework (see above: Table 3.6). The

characteristics of CSM patients described below were collected at the time of diagnosis

and clinical examinations.

4.5.2.1. Age

Age was defined as the age of the patient at the time of diagnosis and baseline

clinical examinations. Originally, age was collected as a continuous variable. Then, this

continuous variable was dichotomized for practical purposes: 0 = age less than the cut-off

value of 65 years and 1 = age equal to or greater than the cut-off value of 65 years [15, 18, 41, 59,

60]. Age is a variable that is expected to be valid and reliable.

4.5.2.2. Gender

Gender is a variable that is expected to be valid and reliable.

4.5.2.3. Baseline score

Baseline mJOA [11] was defined as the functional score performance of the

patient at the time of diagnosis and clinical examinations at admission, just prior to the

surgery (on average, several months apart). mJOA score is a continuous variable that

ranges from 0-18 with higher scores indicating better function.

4.5.2.4. Duration of symptoms

The duration of symptoms was measured up to the time of assessment.

The duration of symptoms at the first visit was divided into 2 categories: 0= duration of

symptoms less than the cut off value of 12 months and 1= duration of symptoms greater

or equal to the cut off value of 12 months [15, 21, 22].
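
The two cut-offs defined above (65 years for age, 12 months for duration of symptoms) follow the same coding rule, which can be sketched as follows. The helper name `dichotomize` and the constant names are illustrative assumptions, not part of the study's SAS code.

```python
AGE_CUTOFF_YEARS = 65
DURATION_CUTOFF_MONTHS = 12

def dichotomize(value, cutoff):
    """Return 0 if value is below the cut-off, 1 if equal to or above it."""
    return 1 if value >= cutoff else 0

# Hypothetical patient: 65 years old with 8 months of symptoms.
age_group = dichotomize(65, AGE_CUTOFF_YEARS)             # 1 (>= 65 years)
duration_group = dichotomize(8, DURATION_CUTOFF_MONTHS)   # 0 (< 12 months)
print(age_group, duration_group)
```

Note that a value exactly at the cut-off falls in category 1 under both definitions.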

4.5.2.5. Degree of spinal cord compression (AP diameter and Transverse Area)

The level of maximum spinal cord compression was defined as a segment of the

spinal cord that was compressed and deformed with greater or lesser disappearance of

the surrounding subarachnoid space.

Anteroposterior diameter is one of the means of determining spine stenosis with

established intraclass correlation coefficient of 0.86, 0.72, 0.68, and 0.52 (Please see

Appendix 3) on four occasions. The application software used appeared to hold 1-digit

numbers. Potentially, the repeated reduction to 1 digit could cause systematic build-up of

error in estimating the reliability index accurately.

Transverse area (TA) is another measure commonly used by researchers to assess

the degree of spinal cord compression [27] and repeated measurements on four occasions

are reliable in CSM with intraclass correlation coefficient of 0.68, 0.69, 0.73 and 0.76

(please see Appendix 3). It is a continuous variable measured in millimetres squared

(mm²).

4.5.2.6. Signal intensity changes

The appearance of spinal cord signal intensity changes on T1-weighted sequences

and T2-weighted sequences is classified into three categories: Type 0, normal T1WI and

normal T2WI; Type 1, normal T1WI and high T2WI; Type 2, low T1WI and high T2WI.

Increased or decreased signal intensity was defined on T2WI and T1WI,

respectively, as an area of high or low intensity relative to the signal of the normal medulla at the

unaffected level.
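
The three-category scheme above can be written as a small lookup. This is a sketch only: the function name `signal_intensity_type` and its boolean inputs are hypothetical, and the combination of low T1WI with normal T2WI is treated as undefined because the classification does not mention it.

```python
def signal_intensity_type(t1_low, t2_high):
    """Map T1WI/T2WI findings to the three-category code used in the text.

    t1_low  -- True if the cord shows low signal on T1WI
    t2_high -- True if the cord shows high signal on T2WI
    """
    if not t1_low and not t2_high:
        return 0  # normal T1WI, normal T2WI
    if not t1_low and t2_high:
        return 1  # normal T1WI, high T2WI
    if t1_low and t2_high:
        return 2  # low T1WI, high T2WI
    # Low T1WI with normal T2WI is not covered by the classification above.
    raise ValueError("combination not defined in the three-category scheme")

print(signal_intensity_type(t1_low=False, t2_high=True))  # 1
```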

4.5.2.7. Number of affected stenotic levels

This categorical variable is coded as: 1 to 3 (1 = 1 compressed segment), (2 = 2

compressed segments), and (3 = ≥ 3 compressed segments) (Figure 1). This cut-off point

has been used in previous studies [47, 61]. The level of maximum spinal cord

compression was defined as a segment of the spinal cord that was compressed and

deformed with greater or lesser disappearance of the surrounding subarachnoid space.

The number of stenotic levels was determined by a radiologist who was

blinded to patient neurologic status.

Table 4.2: Definition of exposure variables

Domain | Variable definition | Type | Unit

Demographics | Age at the time of admission assessment (0 = <65 years, 1 = ≥65 years) | Binary | 0/1

Demographics | Gender (0 = Male, 1 = Female) | Binary | 0/1

Clinical | Baseline mJOA score | Continuous | (0-18)

Clinical | Duration of symptoms (0 = <12 months, 1 = ≥12 months) | Binary | 0/1

MR imaging | Transverse area | Continuous | mm²

MR imaging | Anteroposterior diameter | Continuous | mm

MR imaging | Signal intensity changes (Type 0 = Normal T1WI/Normal T2WI, Type 1 = Normal T1WI/Hi T2WI, Type 2 = Low T1WI/Hi T2WI) | Categorical | 0/1/2

MR imaging | Number of affected stenotic levels (0 = 1 compressed segment, 1 = 2 compressed segments, 2 = ≥3 compressed segments) | Categorical | 0/1/2

4.6. CONFOUNDING VARIABLES

It has been shown that some baseline characteristics such as pre-existing or

concomitant medical conditions (hypertension, diabetes mellitus, coronary insufficiency,

cardiomyopathy, pulmonary problems, previous cerebral infarction and gastrointestinal

ulcers) may slow the functional recovery in patients with CSM [14]. Given the

established inhibitory effects of smoking on spine fusion [62, 63], smoking may slow the

functional recovery. In addition, functional deterioration in the postoperative period may

also result from aggravation of diabetes mellitus [14]. Since this type of information was

collected at baseline examination, it was statistically tested for its significance. Information

on surgical interventions (anterior cervical decompression and fusion,

laminoplasty, and laminoplasty and laminectomy and fusion) was not included in the

predictive model due to the limited size of the sampled population at one single centre.

4.7. SAMPLE SIZE

General guidelines have been suggested for the minimum number of events per

variable required in multivariate analysis. It is generally suggested that a minimum of

ten subjects per variable analyzed (for a continuous outcome) is required to prevent

over-fitting [64]. Given the total of 61 patients available for analysis, we included no

more than 6 of the 10 preoperative variables in the theoretical framework. This

number ensures an adequate sample size for future predictive models.
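
The ten-subjects-per-variable rule reduces to an integer division. A minimal sketch follows; the function name `max_predictors` is an illustrative assumption.

```python
def max_predictors(n_subjects, subjects_per_variable=10):
    """Largest number of predictors allowed by the subjects-per-variable rule."""
    return n_subjects // subjects_per_variable

print(max_predictors(61))  # 6
```

With 61 analyzed patients the rule allows at most 6 predictors, which matches the limit stated above.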

4.8. DATA ANALYSIS

4.8.1. Exploratory analysis

All data analyses were performed using SAS software, version 9.2. Data

analysis followed standard procedures for a prediction study. Summary descriptive

statistics were computed on all variables. Categorical variables were summarized as

frequencies and percentages, and continuous variables as means and standard deviations.

Categorical variables were compared using the Pearson chi-square test for independent

proportions, and Student's t-test was used to compare continuous variables.

Exploratory correlation coefficient analyses were performed to identify

associations between the ten individual independent variables and final mJOA scores and

associations or multicollinearity between variables. More specifically, Spearman

correlation analysis was used when both variables were continuous, t tests were used

when one variable was continuous and the other dichotomous, and continuity-adjusted

chi squares were calculated when both variables were categorical. The Mann Whitney U

test was used for analysis of the association between dichotomous variables and final

mJOA scores, because these scores did not follow a Gaussian distribution. A criterion

of r > 0.90 was used to identify excessive correlation between variables. For the chi-square

tests, a p-value below 0.05 was considered statistically significant.

To assess the normality of the distributions of the primary outcome measure and other

variables, histograms were plotted. A logarithmic transformation for

normality was applied when the distribution of follow-up mJOA scores was negatively skewed

[65].

4.8.1.1. Univariable (unadjusted) analysis

Univariable data analyses that include unadjusted regression coefficients (beta

values estimates) and p-values were carried out for all variables under evaluation.

Initially, continuous variables (age, duration of symptoms, baseline mJOA scores,

transverse area and anteroposterior diameter of spinal cord) were analyzed individually

for a linear relationship with post-operative functional scores. Then, age and duration of

symptoms variables were dichotomized for convenience in clinical practice and ease of

interpretation of findings. In addition, three MR imaging variables (three patterns of

spinal cord signal intensity changes on T1- and T2-weighted sequences, transverse area

of the spinal cord and number of compressed segments), were analyzed. Table 5.5

summarizes the statistical details of the unadjusted analysis. All candidate variables were

examined using linear regression.
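
To make the unadjusted beta estimates concrete, the sketch below fits a one-predictor least-squares line in plain Python. The data are hypothetical and are not study data; the study itself used SAS, so this is only an illustration of the calculation.

```python
def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    return intercept, beta

# Hypothetical data, not study data: 0 = <65 years, 1 = >=65 years.
age_group = [0, 0, 0, 1, 1, 1]
mjoa = [14, 15, 16, 11, 12, 13]
intercept, beta = simple_linear_regression(age_group, mjoa)
print(intercept, beta)  # 15.0 -3.0
```

For a dichotomized predictor, the intercept is the mean score of the group coded 0 and beta is the difference in mean score between the group coded 1 and the group coded 0.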

4.8.2. Model development

As the outcome of interest is continuous (functional score calculated using mJOA

score from 0-18), multivariable linear regression modeling techniques were used to

determine the relationship between each independent variable and the functional

outcomes.

Unadjusted (univariable) data analyses were carried out initially to estimate the

effect of each potential predictive variable individually, followed by the adjusted

(multivariable) analysis.

Efforts were made to maximize predictive performance using all-variables

regression for model building (no selection methods, e.g., stepwise selection,

were applied), and a variable remained in the final model if it met the following three

criteria: 1) a significance level of p of 0.1 or less; 2) the r² statistic for the model

increased by at least 10%; and 3) the beta coefficient did not change by more than 10%

with the addition of other variables into the model [66, 67]. Baseline scores were

included in the model to adjust for the effect of baseline differences on final scores [68].

This analysis was conducted using the PROC GLM procedure in SAS, version 9.2.
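
The three retention criteria can be expressed as a single check. The sketch below is an interpretation rather than the procedure actually run in SAS: the function name `keep_variable` and its arguments are hypothetical, and both 10% thresholds are assumed to be relative changes, which the text does not specify.

```python
def keep_variable(p_value, r2_without, r2_with, beta_alone, beta_adjusted):
    """Apply the three retention criteria described in the text.

    Assumption: "increased by at least 10%" and "change by more than 10%"
    are read as relative changes; the text does not say which is meant.
    r2_without and beta_alone must be non-zero.
    """
    significant = p_value <= 0.1
    r2_gain = (r2_with - r2_without) / r2_without >= 0.10
    beta_stable = abs(beta_adjusted - beta_alone) / abs(beta_alone) <= 0.10
    return significant and r2_gain and beta_stable

# Hypothetical values, not study results.
print(keep_variable(0.03, 0.40, 0.46, -1.07, -1.00))  # True
print(keep_variable(0.25, 0.40, 0.46, -1.07, -1.00))  # False
```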

4.8.3 Data sources and management

Source of clinical data: Source data included all information in original records,

observations, or other activities necessary for the reconstruction of missing data and

verification of outliers. More specifically, it included surgery, imaging and laboratory

reports, medical history information and demographics. The study database was a secured

electronic database system known as OPVerdi.

Several strategies were implemented for the reconstruction of missing data and

verification of outliers. For continuous data, we plotted each variable and checked for

any outliers beyond 3 standard deviations. The same approach was applied to

categorical data by plotting a boxplot. The identified outliers were checked against data

collection forms and were corrected. For age, gender, and duration of symptoms

variables, the data was 100% complete. After calculating the frequency of missing

values, the following was found: 3 (5%) for transverse area of spinal cord measurements,

0 (0%) for anteroposterior diameter measurements, 2 (3%) for signal intensity changes,

and 4 (7%) for number of compressed segments. The mJOA scores at 12 months for 4 out

of 65 patients (6%) were found to be missing due to loss of follow up. The subjects were

removed and as a result, 61 subjects were analyzed in statistical modelling.
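
The 3-standard-deviation screen for continuous variables described above can be sketched as follows. The function name `flag_outliers` and the example values are hypothetical; the sample standard deviation is assumed, since the text does not say which estimator was used.

```python
from statistics import mean, stdev

def flag_outliers(values, n_sd=3):
    """Return values lying more than n_sd sample standard deviations from the mean."""
    m = mean(values)
    sd = stdev(values)
    return [v for v in values if abs(v - m) > n_sd * sd]

# Hypothetical ages, not study data; 150 stands in for a data-entry error.
ages = [60, 62, 58, 61, 59, 63, 57, 60, 61, 59,
        62, 58, 60, 61, 59, 60, 62, 58, 61, 150]
print(flag_outliers(ages))  # [150]
```

Flagged values would then be checked against the data collection forms, as described in the text.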

MR imaging data: Issa [proprietary name] was used as an integrated system for

archiving patient data and examination data including images.

4.8.4 Ethics

The research protocol was approved by the University Health Network Research

Ethics Board.

CHAPTER 5

RESULTS

Chapter 5 provides findings to two research questions related to Specific Aims I & II.

OVERALL STUDY OBJECTIVE: To develop a predictive model of functional score incorporating key demographic, clinical and MR imaging assessments in patients with cervical spondylotic myelopathy undergoing surgical treatment.

Hypothesis: Key demographic parameters, clinical factors and MR imaging features of the site of cervical cord compression are independently associated with baseline scores and predictive of functional outcomes scores at 12 months follow up in patients with CSM undergoing surgical treatment.

Each specific aim contributes to the overall objective:

Specific Aim I: Reliability assessment of MR imaging to assess cord compression in CSM

Objective: To investigate the inter-rater reliability of two published methods (transverse area and anteroposterior diameter) of examining cord stenosis on axial MR images.

Question: Are the ICC values of TA and AP diameter of spinal cord methods free of systematic errors (bias)?

Findings: The two-way analysis of variance indicated that the inter-rater agreement ICCs for transverse area (TA) and anteroposterior diameter (AP) of the spinal cord were 0.68, 0.69, 0.73 and 0.76, and 0.86, 0.72, 0.68, and 0.52 on the 1st-4th sessions, respectively. These coefficients were calculated using Shrout-Fleiss models for random effects (Model 2). Of note, the TA and AP methods showed wider variability in cases of severe cord compression (presence of systematic error), and the variability of image interpretation was dependent on raters’ individual differences. The TA and AP measurement techniques demonstrated moderate to good inter-rater reliability, with more consistent agreement noted in the assessment of transverse area of the spinal cord. This is the first study to examine the interobserver reliability of quantifiable methods to assess spinal cord stenosis in the setting of CSM. Based on our data, we recommend that the TA method be used to assess the extent of compression on axial T2 images.
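
For readers unfamiliar with the Shrout-Fleiss Model 2 coefficient mentioned above, the sketch below computes it from a two-way analysis of variance in plain Python. It assumes the single-rater form, ICC(2,1); the text does not state whether the single-rater or average-rater form was used. The ratings are illustrative and are not study data, and the function name `icc_2_1` is hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, single rater, absolute agreement.

    ratings -- list of rows, one row per subject, one column per rater.
    """
    n = len(ratings)        # subjects
    k = len(ratings[0])     # raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # between-subject mean square
    ms_cols = ss_cols / (k - 1)                 # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Illustrative ratings (6 subjects x 4 raters), not study data.
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(ratings), 2))  # 0.29
```

Because the between-rater mean square appears in the denominator, systematic differences between raters lower this coefficient, which is why it is sensitive to the rater bias discussed in the findings.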

Specific Aim II: Development of a predictive model of outcome in patients with CSM undergoing surgical treatment

Objective: To address the limitations of the current literature by prospectively evaluating if demographic, clinical and radiological factors in patients with CSM are predictive of functional outcomes pre- and post-surgery.

Question: After controlling for age, gender and duration of symptoms, are MRI features independently associated with, and predictive of, functional outcomes at baseline and 12 months follow-up, respectively?

Findings: Higher baseline mJOA scores were associated with younger age (p=0.0002), shorter duration of symptoms (p=0.03), fewer compressed segments (p=0.04) and less severe cord compression (p=0.02). Moreover, better post-operative mJOA scores were associated with younger age (p<0.0001), shorter duration of symptoms (p=0.09) and higher baseline mJOA score (p<0.0001). Using multivariate analysis, baseline and follow-up mJOA scores were best predicted by age. These data suggest that, first, it is important to diagnose and treat CSM at an early stage and that age is a key predictor of functional improvement on the mJOA scale; and second, ischemic changes, degree of spinal cord deformity and multiplicity of stenosis could not predict post-operative functional status as measured by the mJOA scale, after controlling for age and baseline mJOA score.

5.1. DESCRIPTIVE STATISTICS

The final dataset included information on 61 CSM patients, who underwent spine

surgery at Toronto Western Hospital between February 2006 and November 2007. Missing

data comprised the 6% of the sample population lost to follow up, who were excluded from

model development. All 61 patients had complete data. The general characteristics of patients with

cervical spondylotic myelopathy are summarized in Table 5.1.

5.2. MODEL DEVELOPMENT

5.2.1. Improving the validity of the predictive model

Among the potential predictor variables, two, transverse area

[TA] and anteroposterior diameter [AP] of the spinal cord, provide similar information

about the degree of spinal cord compression. Efforts were therefore made to establish the

reliability (inter-rater and test-retest) of each variable (please refer to

Appendix 3).

Based on three-way ANOVA, the observed differences between AP

measurements consist of true score variance, random error (imprecision) and systematic

error (bias) caused by raters’ specialty training and their interpretations of MRI based on

stage of CSM severity (Table F.5. - F.8).

In addition to the sources of systematic error mentioned above, the TA method

had time (learning or fatigue) as a source of variability. The time effect has been shown

to be statistically significant in the TA method of spinal stenosis assessment, based on

three-way ANOVA with Bonferroni post-hoc analysis [TA, p= 0.01], specifically the

agreement among four raters consistently increased from Session 1 to Session 4 (Table

F. 4). The time differences are illustrated as normal fluctuations by graphical

representation (i.e. random error) (Figure F.3).

The TA and AP measurement techniques demonstrated a moderate level of inter-rater

reliability (0.68, 0.69, 0.73, 0.76 and 0.86, 0.72, 0.68, 0.52, respectively), with more consistent

agreement noted in the assessment of transverse area of spinal cord. As a result,

transverse area was chosen over anteroposterior diameter of spinal cord method. The

variable was selected based on clinical, practical, statistical and reliability criteria

described in Table 4.7.

Transverse area and anteroposterior diameter of spinal cord are statistically collinear

and choosing TA for a predictive model avoids this collinearity. Collinearity is a

statistical phenomenon in which two predictor variables in a multiple regression model

are highly correlated. As a result, the coefficient estimates of individual predictor

variables may change erratically in response to small changes in the model or the data.

The AP diameter of the spinal cord has the disadvantage of being less applicable when the compression

site is off the midline of the spinal cord. TA accounts for asymmetrical compression of the spinal cord;

thus it is a less biased measure.

5.2.2. Univariable (unadjusted) analysis

5.2.2.1. mJOA Scores at baseline

Higher baseline mJOA scores were associated with younger age (p=0.0002, β(r) = -2.83),
shorter duration of symptoms (p=0.03, β(r) = -1.55), less compression of the transverse
area of the spinal cord (p=0.02, β(r) = 0.06) and fewer compressed segments (p=0.04,
β(r) = 2.35 and β(r) = 1.06) (Table 5.4).

Analysis of all variables revealed that the three patterns of spinal cord signal intensity
changes on T1- and T2-weighted sequences and gender were not significantly associated with
the functional score at admission (p > 0.2). Therefore, these non-significant demographic
and MR imaging variables (gender and signal intensity changes) were excluded (Table 5.4).

5.2.2.2. mJOA Scores at follow-up

The mean mJOA score improved from 12.8 ± 2.7 points pre-operatively to 15.8 ±

2.3 points at 12 months post-operatively (p<0.0001), as determined by the Wilcoxon

signed-rank test. Higher post-operative mJOA scores were associated with younger age

(p<0.0001, β(r) = -1.07), shorter duration of symptoms (p=0.09, β(r) = -1.03) and higher

baseline mJOA score (p<0.0001, β(r) = 1.01) (Table 5.4).
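As a minimal sketch of how the Wilcoxon signed-rank statistic for such a paired pre- and post-operative comparison is formed, the function below ranks the absolute differences and sums the ranks by sign. The paired scores are hypothetical (not the thesis data), and the normal approximation without tie or continuity correction is a simplifying assumption; statistical packages typically apply corrections or exact methods.

```python
import math


def wilcoxon_signed_rank(before, after):
    """Return (W_plus, W_minus, z) for paired samples.

    Zero differences are dropped, tied absolute differences receive average
    ranks, and z uses a plain normal approximation (no tie or continuity
    correction).
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean_w = n * (n + 1) / 4.0
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean_w) / sd_w
    return w_plus, w_minus, z


# Hypothetical paired mJOA scores for eight patients (NOT the thesis data).
pre = [12, 10, 14, 13, 11, 15, 12, 16]
post = [15, 13, 16, 17, 12, 15, 16, 17]
w_plus, w_minus, z = wilcoxon_signed_rank(pre, post)
```

In this illustrative sample every non-zero difference is positive, so all of the rank sum falls on the positive side, which is the pattern expected when scores improve after surgery.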

Analysis of all variables revealed that the MR imaging features (three patterns of spinal
cord signal intensity changes on T1- and T2-weighted sequences and number of compressed
segments) and gender were not significantly associated with the functional score at
follow-up (p > 0.2). Therefore, these non-significant variables (signal intensity changes,
number of compressed segments and gender) were excluded (Table 5.4).

5.2.3. Multivariate (adjusted) analysis

5.2.3.1. mJOA Scores at baseline

The final statistical model includes age (Table 5.5), which explains 20% of the total
variability of the baseline mJOA scores. The average baseline score of CSM patients older
than 65 years of age was 13.5. Baseline mJOA scores in younger patients were on average
2.83 points higher.

5.2.3.2. mJOA Scores at follow-up

The final model includes the baseline mJOA score and age (Table 5.5), and explains 36% of
the total variability of the final mJOA scores. This model indicates that, for example, if
baseline scores were identical, a patient less than 65 years of age would score on average
1.04 points higher than an older patient. Moreover, if age was identical, a patient with
moderate myelopathy may benefit from surgical treatment more than a patient with severe
myelopathy (whose score is approximately 1.01 points lower on average).
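One way to read the adjusted model is as a linear equation in which each coefficient is the expected difference between two otherwise identical patients. The sketch below uses the two coefficients quoted in the text. It assumes that the baseline score enters the model per point and that age is dichotomized at 65 years; both coding choices are assumptions of this illustration. The intercept is not given in this section, so only differences between patients are computed, where it cancels.

```python
# Coefficients as quoted in the text; how they are coded is an assumption of this sketch.
BETA_BASELINE = 1.01  # follow-up points per one-point higher baseline mJOA (assumed per-point)
BETA_YOUNGER = 1.04   # additional follow-up points for a patient younger than 65


def expected_followup_difference(baseline_a, age_a, baseline_b, age_b):
    """Expected follow-up mJOA of patient A minus patient B under the linear model.

    The intercept is identical for both patients and cancels, so it is not needed.
    """
    younger_a = 1 if age_a < 65 else 0
    younger_b = 1 if age_b < 65 else 0
    return (BETA_BASELINE * (baseline_a - baseline_b)
            + BETA_YOUNGER * (younger_a - younger_b))


# Same baseline score, different age group.
diff_age = expected_followup_difference(13, 50, 13, 70)
# Same age group, baseline scores one point apart.
diff_baseline = expected_followup_difference(14, 60, 13, 60)
```

Because the model explains 36% of the variability, such differences describe averages across patients and leave considerable uncertainty for any individual.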

CHAPTER 6

DISCUSSION AND CONCLUSION

6.1. Summary of findings

The studies described herein have led to several major conclusions: 1) Age and

baseline severity score are good predictors of functional score after surgery. 2) Duration

of symptoms is not a good predictor of functional scores after surgery. 3) Measurements

of the transverse area and anteroposterior diameter of the spinal cord have shown good to

moderate inter-rater reliability. 4) No definite conclusions can yet be drawn on whether

transverse area of spinal cord, combined patterns of signal intensity changes on T1/T2WI,

and the number of compressed levels are predictors of functional score.

Age & Baseline severity score

Based on in-depth examination of the impact of predictors on outcome using beta

coefficient values and reliability assessments, our study confirms that age and baseline

severity score are two preoperative variables that can predict functional outcomes after

surgery (post-operative mean mJOA score). The most prominent patient information was

the age at the time of admission, which was shown to be associated with baseline

functional score and predictive of follow up functional score in the setting of CSM.

Based on the magnitude of the beta estimate, the data suggest that there may be greater
opportunity for improvement when surgery is performed in a younger population. However,
more research is needed to confirm these findings. In contrast,

Yamazaki et al showed no differences based on age in post-operative functional scores

after surgery in a retrospective study [15]. However, these results must be cautiously

interpreted because the study did not control for baseline severity score, the number of
patients in each subgroup was small, and patient characteristics were too poorly described
to understand the differences between the two samples. Finally, baseline CSM severity score

was a strong independent predictor of functional score following surgery. Patients with

less severe functional disability may benefit from surgical treatment more than those with

a more severe disability. The greater benefit from surgery in patients with less functional

disability could be due to milder neuropathologic alterations in the spinal cord that reflect

greater recuperative potential [19]. These findings suggest the possibility that patients

may experience poorer outcome if surgery is delayed until the patient is more severely

affected. In contrast, Singh et al. reported that patients with a lower starting point in
function make the most gains after surgery [24]. We suspect that the higher functional
scores in the more severe CSM group in that study could be due to other differences in
patient characteristics (age and duration of symptoms), which were not comparable at
admission.

Duration of symptoms

In our study, duration of symptoms was mildly associated with functional scores at
admission and at 12-month follow-up. However, after adjustment for age and baseline
severity score, the association between duration of symptoms and functional score at
admission and follow-up was not significant. Whether CSM patients with indications for
surgery should be offered operative intervention irrespective of duration of symptoms
remains unclear. Our findings are inconsistent with some other studies in the literature
that support the notion of long-standing mechanical compression causing additional
circulatory impairment of the spinal cord [15, 19, 21, 46].

We suspect that these differences may be due to the interpretation of the onset of CSM.

Heterogeneity of samples, non-consecutive methods of recruitment and insufficient

descriptions of patients associated with retrospective design in previous studies could

also have contributed to the observed differences in functional scores. Although

Mastronardi et al prospectively analyzed CSM patients, these results must be cautiously

interpreted because baseline severity score and age were not similar between groups and

the number of patients in each subgroup was small [21].

MR imaging features

Based on the findings of our systematic review, transverse area of spinal cord,

combined patterns of signal intensity changes on T1/T2WI, and number of compressed

segments were found to be associated with functional scores at long term follow up

before and after adjustment for age, duration of symptoms, and baseline severity score.

The data obtained for this thesis did not support the findings of previous studies. We can

speculate that several factors may have contributed to these results. Firstly, the

inconsistencies in findings could be due to heterogeneity of the patients in this sample

population. The findings vary by etiology: ossification of the posterior longitudinal
ligament (OPLL) versus cervical spondylotic myelopathy (CSM) versus herniated disc (HD)
[27, 31]. Differences could also be due to inter-institutional variations in MRI protocols.

For example, previous studies used T1-weighted axial imaging to measure spinal cord

deformity. At our institution (Toronto Western Hospital), MRI protocols for the cervical

spine include axial T2 slices. Differences among clinicians are another source of

variation. Based on our observations from reliability testing, the measurement of

transverse area of spinal cord is subjective; clinicians had different approaches to

interpret the exact location and boundaries of the most compressed site of the spinal cord,

especially in multisegmental CSM. Based on the findings of the intra- and inter-rater
reliability project (Appendix 3), the interpretation of MR images varied depending on
specialty and years of practice. In our study, we found that the percentage of agreement
was 68% to 76% and the overall correlation was moderate to good. We recommend that this
measurement technique be applied in a larger sample.

In our study, we established that the variation in functional outcomes, defined by mJOA
score after surgery, cannot be further explained by MR imaging once age and baseline mJOA
score are taken into account. Our findings suggest that assessments of T1/T2 signal
intensity changes, degree of spinal cord compression and number of levels involved in
compression have no statistically significant effect on post-operative functional status as
measured using the mJOA scale, and provide no additional clinically important information
in predicting function after surgery. We suspect that the study does not support the use of
MR imaging features as predictors because of the ceiling effect present in mJOA
measurements at follow-up, which leads to poor discrimination and thus low responsiveness.
All subjects retained a relatively high level of function: no subject scored below 10 on
the mJOA scoring system, and the majority of patients scored on the mild side of the
spectrum at follow-up. Therefore, one of the limitations of the study was the limited
variability of functional scores among CSM patients at follow-up. A future study will
require a better outcome measure than the mJOA, one with the capacity to differentiate
subjects more precisely across all severity groups at follow-up.

Similar to these findings, Singh et al reported low levels of sensitivity to change in JOA

score (r=0.21) compared to SF-36 (r=0.32), Nurick score (r=0.42) and MDI (r=0.52),

indicating that the scale is possibly less sensitive when differentiating milder levels of

severity [23]. Establishing a strong correlation between clinical scores with poor
sensitivity and the findings seen on MR images of the spinal cord remains a challenge. In
addition, MRI provides a quantitative measure, as opposed to the qualitative, subjective
reports that observers use to differentiate severity of CSM. More research is needed to
investigate in greater detail the psychometric properties (reliability and validity) of
the modified version of the JOA (mJOA) scale.
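The ceiling effect discussed above can be quantified as the share of patients scoring at or near the top of the scale. The sketch below assumes an 18-point maximum for the mJOA; the follow-up scores are hypothetical, and the function name and `margin` parameter are illustrative choices rather than part of the study's analysis.

```python
def ceiling_proportion(scores, max_score=18, margin=0):
    """Fraction of scores within `margin` points of the scale maximum."""
    if not scores:
        raise ValueError("scores must not be empty")
    at_ceiling = sum(1 for s in scores if s >= max_score - margin)
    return at_ceiling / len(scores)


# Hypothetical follow-up mJOA scores (NOT the thesis data).
followup = [18, 17, 18, 16, 18, 15, 18, 17, 14, 18]
share_at_max = ceiling_proportion(followup)             # patients at the maximum score
share_near_max = ceiling_proportion(followup, margin=1) # patients within one point of it
```

When a large share of patients cluster at the maximum, further improvement cannot register on the scale, which limits its ability to discriminate among milder cases.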

The availability of these predictors enables spine surgeons and referring physicians to
provide more information to patients in consultations prior to surgery and to guide their
therapeutic decision making. It also supports better allocation of services and is useful
in designing clinical trials to test the effect of surgical interventions on outcomes.

6.2. Implications of the findings

The most significant finding of this study is that there are now reliable

measures (transverse area and anteroposterior diameter of spinal cord) to assess the

degree of spinal cord compression using digitized/magnified images and a standardized

written protocol. In the past, there was a lack of concordance in the literature on the

optimal techniques to quantitatively assess MRIs in patients with CSM. It has not been

possible to replicate previously published results due to lack of availability of information

on MRI protocol details and its measures.

The findings also offer insight into how MR imaging should be approached and analyzed in
this population. Future studies should perhaps include assessments using T2-weighted rather
than T1-weighted images and take a consistent approach to selecting the most compressed
site, especially in CSM cases with multilevel involvement due to degenerative changes of
the spine.

In summary, the predictive model provides a detailed profile of patient characteristics and
their variability, enabling a clinician to counsel patients on an individual basis. We also
report details about variability, age and baseline severity score. Furthermore, the data
were collected in a prospective fashion, which fills a void currently existing in the
literature. This study provides a detailed exploratory analysis, offering new insights into
the discriminative abilities of the mJOA scale in the area of CSM research.

6.3. Limitations

The present study has several limitations. The first is the absence of confirmed

reliability, validity and responsiveness of the modified version of JOA scoring system

and some MR imaging based predictive variables (number of compressed segments) used

in the baseline examinations. In addition, the modified version of JOA scale, which has

limited usefulness in detecting the precise benefit of surgery for mild CSM patients due

to a ceiling effect, was used in our study. The majority of patients scored on the mild side

of spectrum at follow up. A future study will require a better outcome measure than the

mJOA scale that would have a capacity to differentiate subjects more precisely from all

severity groups at follow-up. Second, a study with one single recruitment centre might

potentially systematically under- or overestimate measurement errors due to particular

characteristics of patients. Multicentre trial data in and outside of Toronto may help to

establish more representative estimates of CSM parameters. Finally, although the findings
were based on the secondary analysis of prospectively collected data, there was a
restriction in the types of MR imaging features collected.

6.4. Future directions

Based on the results from this study it appears that age and baseline severity score

at admission can both provide valuable information and can be part of a new

multidimensional scoring system for clinicians to counsel patients with CSM.

Because the cumulative effect of age, gender, duration of symptoms, baseline severity score
and MR imaging predictors on functional score assessed by the mJOA scale following surgery
was investigated here for the first time and in a single centre, a similar analysis must be
conducted on data collected from larger North American and international CSM clinical trial
databases to determine whether our results can be reproduced in other geographical regions
with similar estimates for the magnitude of all associations (beta values).

In light of the need to establish the predictive value of MR imaging features for
functional outcomes after adjusting for other important predictors, the mJOA scale requires
improvements to its existing measurements and should potentially include some new ones. For
example, some studies have shown that the JOA score underestimates the initial handicap in
the hands, often among the first of patients’ complaints [69]. Similarly,

recovery of manual dexterity is poorly judged by this score. Potentially, the domain

including the functioning of the upper limb needs to be reconsidered to make it more

quantitative, as compared to the qualitative estimate that it currently is. Gait
dysfunction is the most important issue in CSM patients regarding surgical outcome and
clinical deficits. Measurements of ambulation have shown relative advantages over previous
clinical assessment scales in determining clinical severity and, particularly, in the
detection of change following surgery [70]. In the 2001 study by Singh et al.,
walking-related parameters were shown to have good correlation, along with validity, with
other functional and impairment scales such as the myelopathy disability index (MDI), the
Nurick scale and the short form health survey (SF-36) in the CSM setting [24]. Adding a new
domain with a walking component may enable more accurate prediction of patients’
functioning after treatment. Further work on the mJOA scale is necessary to confirm its
psychometric properties, including reliability, construct validity and discriminative
abilities (responsiveness).

Alternatively, MR imaging with T2 weighting has been reported to have a level of

sensitivity ranging from 15% to 65% [71], but low specificity for the visualization of

intramedullary pathology. The development of more advanced spinal imaging techniques, such
as diffusion tensor imaging (DTI) with fractional anisotropy, diffusion-weighted imaging
(DWI), functional magnetic resonance imaging (fMRI) and apparent diffusion coefficient
(ADC) mapping, may enable more accurate correlations between imaging and clinical
presentation.

In addition, given that spin-echo MR imaging has limited pathophysiologic

usefulness in detecting myelopathy, diffusion-tensor imaging (DTI) and diffusion

weighted imaging (DWI) may be more useful in identifying additional shearing injuries

that are not visible on conventional MR images. In general, DTI analyzes the movement

of water in association with white matter fibers, providing three-dimensional

reconstruction of fiber tracts, and has the ability to help quantify the severity of injury to

individual white matter tracts [72]. Budzik et al found that diffusion-tensor MR imaging
correlated better with clinical scores than T2WI in cervical spondylotic myelopathy [73].
Similarly, the results of the Sagiuchi et al study showed that DWI has higher sensitivity
for the detection of acute spinal cord imaging abnormality than standard MRI [74].

fMRI analysis of the spinal cord provides physiological readouts of neuronal

activity and neuronal plasticity, in a non-invasive manner. A number of studies have

demonstrated the utility of advanced MRI techniques in the setting of spinal cord injuries

with reliable results and good sensitivity to changes in neuronal activity.

ADC and fractional anisotropy may be beneficial in assessing a correlation

between imaging and clinical presentation. Demir et al found that diffusion ADC values

were a more sensitive indicator of spinal cord injury than T2-weighted images. The study

demonstrated a higher sensitivity when combined with electrophysiological examination

with sensitivity of 92% and negative predictive value of 75% compared to the T2-

weighted images that had 53% sensitivity and 50% negative predictive value [71]. Facon

et al. performed a similar study in six cervical spondylosis patients and determined that

the fractional anisotropy values had significantly higher sensitivity and specificity in the

detection of spinal cord abnormalities than T2 weighted images [72].
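For readers less familiar with these diagnostic accuracy terms, the sketch below shows how sensitivity, specificity and predictive values follow from a 2x2 table of test results against a reference standard. The counts are hypothetical and chosen only to illustrate the arithmetic; they are not taken from Demir et al. or Facon et al.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from 2x2 confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases the test detects
        "specificity": tn / (tn + fp),  # share of non-cases the test clears
        "ppv": tp / (tp + fp),          # share of positive results that are true cases
        "npv": tn / (tn + fn),          # share of negative results that are true non-cases
    }


# Hypothetical counts (NOT from the cited studies).
metrics = diagnostic_metrics(tp=8, fp=4, fn=2, tn=6)
```

Sensitivity and specificity describe the test itself, whereas the predictive values also depend on how common the abnormality is in the sample, so they are not directly comparable across studies with different case mixes.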

In conclusion, imaging indexes based on pathophysiologic models may enable

more accurate prediction of CSM and thereby facilitate better assessment of the prognosis

and better application of treatment strategies.

Conclusion

A predictive model was developed to predict the functional outcome of patients undergoing
surgery according to their age and baseline severity score, though changes on MR imaging
were not independently predictive of outcome. In addition to validating reports in the
existing literature, our study results suggest that MRI is a reliable tool yielding
reproducible, stable measurements. Some work on the responsiveness of the current mJOA
scale is needed to establish the ability of MRI to predict the functional outcomes of CSM
patients.

This study has shed some light on the need for a more responsive functional scale

than the mJOA that could detect more clinically important changes in functional

outcomes. More specifically, the main issue with the mJOA scale, as explained above, is
the presence of a ceiling effect, with a lack of discrimination of functional deficits in
patients with milder CSM. This is preliminary work which provides a first step in developing a

multidimensional scoring system for prediction of functional outcomes in CSM using

demographic, clinical and MR imaging domains.

Moreover, the proportion of variance in follow-up functional scores explained by age and
baseline score is low, suggesting that this field has a long way to go before achieving
equipoise in refusing someone surgery on the basis of unfavourable baseline
characteristics.

CHAPTER 7

REFERENCE LIST

1. Emery, S., Cervical spondylotic myelopathy: diagnosis and treatment. Journal of the American Academy of Orthopaedic Surgeons, 2001. 9(6): p. 376-385.

2. Cadotte, D.W., Karpova, A.V., Fehlings, M.G., Cervical spondylotic myelopathy: surgical outcomes in the elderly. Int. J. Clin. Rheumatol, 2010. 5(3): p. 327-337.

3. Montgomery, D.M. and R.S. Brower, Cervical spondylotic myelopathy. Clinical syndrome and natural history. [Review] [54 refs]. Orthopedic Clinics of North America. 23(3):487-93, 1992 Jul., 1992.

4. Adams, C.B., Logue, V., Some functional effects of operations for cervical spondylotic myelopathy. Brain, 1971. 94: p. 587-594.

5. Law, M.D., Jr., Bernhardt, M., White, A.A., Evaluation and management of cervical spondylotic myelopathy. Instr Course Lect 1995. 44: p. 99-110.

6. Young, W.F., Cervical spondylotic myelopathy: a common cause of spinal cord dysfunction in older persons. Am Fam Physician, 2000. 62: p. 1064-1070, 1073.

7. Matz, P.G., et al., The natural history of cervical spondylotic myelopathy. J Neurosurg Spine, 2009. 11(2): p. 104-11.

8. Houten, J.K. and P.R. Cooper, Laminectomy and posterior cervical plating for multilevel cervical spondylotic myelopathy and ossification of the posterior longitudinal ligament: effects on cervical alignment, spinal cord compression, and neurological outcome. Neurosurgery. 52(5):1081-7; discussion 1087-8, 2003 May., 2003.

9. Hirabayashi, K., Miyakawa, J., Satomi, K., Maruyama, T., Wakano, K., Operative results and postoperative progression of ossification among patients with ossification of cervical posterior longitudinal ligaments. Spine, 1981. 6(4): p. 354-364.

10. Chen, C.J., et al., Intramedullary high signal intensity on T2-weighted MR images in cervical spondylotic myelopathy: prediction of prognosis with type of intensity. Radiology. 221(3):789-94, 2001 Dec., 2001.

11. Benzel, E.C., et al., Cervical laminectomy and dentate ligament section for cervical spondylotic myelopathy. Journal of Spinal Disorders. 4(3):286-95, 1991 Sep., 1991.

12. Park, Y.S., et al., Predictors of outcome of surgery for cervical compressive myelopathy: retrospective analysis and prospective study. Neurologia Medico-Chirurgica. 46(5):231-8; discussion 238-9, 2006 May., 2006.

13. Handa, Y., et al., Evaluation of prognostic factors and clinical outcome in elderly patients in whom expansive laminoplasty is performed for cervical myelopathy due to multisegmental spondylotic canal stenosis. A retrospective comparison with younger patients. Journal of Neurosurgery. 96(2 Suppl):173-9, 2002 Mar., 2002.

14. Matsuda, Y., et al., Outcomes of surgical treatment for cervical myelopathy in patients more than 75 years of age. Spine. 24(6):529-34, 1999 Mar 15., 1999.

15. Yamazaki, T., et al., Cervical spondylotic myelopathy: surgical results and factors affecting outcome with special reference to age differences. Neurosurgery. 52(1):122-6; discussion 126, 2003 Jan., 2003.

16. Hasegawa K, H.T., Chiba Y, Hirano T, Watanabe K, Yamazaki A. , Effects of surgical treatment for cervical spondylotic myelopathy in patients > or _ 70 years of age: a retrospective comparative study. J Spinal Disord Tech. , 2002. 15: p. 458-460.

17. Kohno K, K.Y., Oka Y, Matsui S, Ohue S, Sakaki S. , Evaluation of prognostic factors following expansive laminoplasty for cervical spinal stenotic myelopathy. . Surg Neurol. , 1997. 48: p. 237–245.

18. Nagata, K., et al., Cervical myelopathy in elderly patients: clinical results and MRI findings before and after decompression surgery. Spinal Cord. 34(4):220-6, 1996 Apr., 1996.

19. Morio, Y., et al., Correlation between operative outcomes of cervical compression myelopathy and mri of the spinal cord. Spine. 26(11):1238-45, 2001 Jun 1., 2001.

20. Yagi M, N.K., Kihara M, Horiuchi Y, Long-term surgical outcome and risk factors in patients with cervical myelopathy and a change in signal intensity of intramedullary spinal cord on magnetic resonance imaging. J Neurosurg Spine, 2010. 12: p. 59–65.

21. Mastronardi, L., et al., Prognostic relevance of the postoperative evolution of intramedullary spinal cord changes in signal intensity on magnetic resonance imaging after anterior decompression for cervical spondylotic myelopathy. Journal of Neurosurgery Spine. 7(6):615-22, 2007 Dec., 2007.

22. Fukushima, T., et al., Magnetic resonance imaging study on spinal cord plasticity in patients with cervical compression myelopathy. Spine. 16(10 Suppl):S534-8, 1991 Oct., 1991.

23. Singh A, C.H., Comparison of seven different scales used to quantify severity of cervical spondylotic myelopathy and post-operative improvement. Journal of Outcome Measures, 2001. 5(1): p. 798-818.

24. Singh, A., et al., Clinical and radiological correlates of severity and surgery-related outcome in cervical spondylosis. Journal of Neurosurgery. 94(2 Suppl):189-98, 2001 Apr., 2001.

25. Yukawa, Y., et al., MR T2 Image Classification in Cervical Compression Myelopathy. Spine, 2007. 32(15): p. 1675–1678.

26. Alafifi, T., Kern, R.,Fehlings, M. , Clinical and MRI Predictors of Outcome After Surgical Intervention for Cervical Spondylotic Myelopathy. Journal of Neuroimaging, 2006. 17(4): p. 315-322.

27. Okada, Y., et al., Magnetic resonance imaging study on the results of surgery for cervical compression myelopathy. Spine. 18(14):2024-9, 1993 Oct 15., 1993.

28. Chung, S., Chung, KH. , Factors affecting the surgical results of expansive laminoplasty for cervical spondylotic myelopathy. . Int Orthop, 2002. 26(6): p. 334-338.

29. Wada, E., M. Ohmura, and K. Yonenobu, Intramedullary changes of the spinal cord in cervical spondylotic myelopathy. Spine. 20(20):2226-32, 1995 Oct 15., 1995.

30. Uchida, K., Nakajima,H., Sato,R., Kokubo, Y., Yayama,T., Kobayashi,S., Baba, H., Multivariate analysis of the neurological outcome of surgery for cervical compressive myelopathy. Journal of Orthopaedic Science 2005. 10: p. 564–573.

31. Yone, K., et al., Preoperative and postoperative magnetic resonance image evaluations of the spinal cord in cervical myelopathy. Spine. 17(10 Suppl):S388-92, 1992 Oct., 1992.

32. Fernandez de Rota, J.J., et al., Cervical spondylotic myelopathy due to chronic compression: the role of signal intensity changes in magnetic resonance images. Journal of Neurosurgery Spine. 6(1):17-22, 2007 Jan., 2007.

33. Nagata, K., Kiyonaga, K., Ohashi, MS., Miyazaki, S., Inoue, A. , Clinical value of magnetic resonance imaging for cervical myelopathy. Spine 1990. 15(11): p. 1089-1096.

34. Matsuyama, Y., N. Kawakami, and K. Mimatsu, Spinal cord expansion after decompression in cervical myelopathy. Investigation by computed tomography myelography and ultrasonography. Spine. 20(15):1657-63, 1995 Aug 1., 1995.

35. Ramanauskas WL, W.H., Metes JJ, Lazo A, Kelly JK., MR imaging of compressive myelomalacia. J Comput Assist Tomogr. , 1989. 13(3): p. 300-404.

36. Takahashi M, S.Y., Miyawaki M, Bussaka H., Increased MR signal intensity secondary to chronic cervical cord compression. Neuroradiology., 1987. 29(6): p. 550-556.

37. Morio Y, Y.K., Kuranobu K, Murata M, Tuda K., Does increased signal intensity of the spinal cord on MR images due to cervical myelopathy predict prognosis? Arch Orthop Trauma Surg. , 1994. 113(5): p. 254-259.

38. Al-Mefty O, H.L., Middleton TH, Smith RR, Fox JL., Myelopathic cervical spondylotic lesions demonstrated by magnetic resonance imaging. J Neurosurg. , 1988. 68(2): p. 217-222.

39. Mehalic TF, P.R., Applebaum BI., Magnetic resonance imaging and cervical spondylotic myelopathy. Neurosurgery, 1990. 26(2): p. 226-227.

40. Serizawa Y, O.K., Tanaka K, Tamaki S, Matsuura K, Uchihara T., Spontaneous resolution of an acute spontaneous spinal epidural hematoma without neurological deficits. Intern Med. , 1995. 34(10): p. 992-994.

41. Mihara, H., et al., Cervical myelopathy caused by C3-C4 spondylosis in elderly patients: a radiographic analysis of pathogenesis. Spine. 25(7):796-800, 2000 Apr 1., 2000.

42. Tuszynski MH, S.J., Fawcett JW, Lammertse D, Kalichman M, Rask C, Curt A, Ditunno JF, Fehlings MG, Guest JD, Ellaway PH, Kleitman N, Bartlett PF, Blight AR, Dietz V, Dobkin BH, Grossman R, Privat A; , Guidelines for the conduct of clinical trials for spinal cord injury as developed by the ICCP Panel: clinical trial inclusion/exclusion criteria and ethics. Spinal Cord, 2007. 45(3): p. 222-231.

43. Peolsson A, H.R., Vavruch L, Prediction of fusion and importance of radiological variables for the outcome of anterior cervical decompression and fusion. Eur Spine J, 2004. 13: p. 229–234.

44. McCormack, B.M. and P.R. Weinstein, Cervical spondylosis. An update. [Review] [116 refs]. Western Journal of Medicine. 165(1-2):43-51, 1996 Jul-Aug., 1996.

45. Lee, T.T., G.R. Manzano, and B.A. Green, Modified open-door cervical expansive laminoplasty for spondylotic myelopathy: operative technique, outcome, and predictors for gait improvement. Journal of Neurosurgery. 86(1):64-8, 1997 Jan., 1997.

46. Hashizume, Y., Iijima, S., Kishimoto, H., Yanagi, T., Pathology of spinal cord lesions caused by ossification of the posterior longitudinal ligament. Acta Neuropathologica, 1984. 63: p. 123-130.

47. Uchida K, N.H., Sato R, Kokubo Y, Yayama T, Kobayashi S, Baba H., Multivariate analysis of the neurological outcome of surgery for cervical compressive myelopathy. J Orthop Sci., 2005. 10(6): p. 564-573.

48. Shinomiya K, M.N., Furuya K., Study of experimental cervical spondylotic myelopathy. Spine (Phila Pa 1976). 1992. 7(10 Suppl): p. S383-387.

49. Ryan R, H.S., Broclain D, Horey D, Oliver S, Prictor M, Cochrane consumers & communication review group: study quality guide. 2007: p. 1-50.

50. Nurick, S., The pathogenesis of the spinal cord disorder associated with cervical spondylosis. Brain. 95(1):87-100, 1972., 1972.

51. Chung SS, L.C., Chung KH., Factors affecting the surgical results of expansive laminoplasty for cervical spondylotic myelopathy. Int Orthop., 2002. 26(6): p. 334-338.

52. Kasai Y, U.A., New evaluation method using preoperative magnetic resonance imaging for cervical spondylotic myelopathy. Arch Orthop Trauma Surg., 2001. 121(9): p. 508-510.

53. Nagata K, K.K., Ohashi T, Sagara M, Miyazaki S, Inoue A., Clinical value of magnetic resonance imaging for cervical myelopathy. Spine (Phila Pa 1976). , 1990. 15(11): p. 1088-1096.

54. Matsuyama Y, K.N., Yanase M, Yoshihara H, Ishiguro N, Kameyama T, Hashizume Y., Cervical myelopathy due to OPLL: clinical evaluation by MRI and intraoperative spinal sonography. J Spinal Disord Tech. , 2004. 17(5): p. 401-404.

55. Mizuno J, N.H., Inoue T, Hashizume Y., Clinicopathological study of "snake-eye appearance" in compressive myelopathy of the cervical spinal cord. J Neurosurg., 2003. 99(2 Suppl)(162-168).

56. Yukawa Y, K.F., Yoshihara H, Yanase M, Ito K., MR T2 image classification in cervical compression myelopathy: predictor of surgical outcomes. Spine (Phila Pa 1976). , 2007. 32(15): p. 1675-1678.

57. Papadopoulos CA, K.P., Papagelopoulos PJ, Karampekios S, Hadjipavlou AG., Surgical decompression for cervical spondylotic myelopathy: correlation between operative outcomes and MRI of the spinal cord. Orthopedics., 2004. 27(10): p. 1087-1091.

58. Wada, E., et al., Can intramedullary signal change on magnetic resonance imaging predict surgical outcome in cervical spondylotic myelopathy? Spine. 24(5):455-61; discussion 462, 1999 Mar 1., 1999.

59. Tanaka J, S.N., Tokimura F, Doi K, Inoue S. , Operative results of canal-expansive laminoplasty for cervical spondylotic myelopathy in elderly patients. . Spine, 1999. 24: p. 2308-2312.

60. Tani T, Y.H., Kimura J. , Cervical spondylotic myelopathy in elderly people: a high incidence of conduction block at C3-4 or C4-5. . J Neurol Neurosurg Psychiatry, 1999. 66: p. 456–464.

61. Suri A, C.R., Mehta VS, Gaikwad S, Pandey RM., Effect of intramedullary signal changes on the surgical outcome of patients with cervical spondylotic myelopathy. Spine J., 2003. 3(1): p. 33-45.

62. Andersen T, C.F., Laursen M, Hoy K, Hansen ES, Bunger C, Smoking as a predictor of negative outcome in lumbar spinal fusion. . Spine, 2001. 26: p. 2623–2628.

63. Glassman SD, A.S., Parker A, Burke D, Johnson JR, Dimar JR The effect of cigarette smoking and smoking cessation on spinal fusion. Spine, 2000. 25: p. 2608–2615.

64. Concato J, F.A., Holford TR. , The risk of determining risk with multivariable models. Annals of Internal Medicine, 1993. 118: p. 201-210.

65. Geoffrey R. Norman, D.L.S., Biostatistics: The Bare Essentials. 2008, People's medical publishing house Shelton.

66. Feinstein, A., Multivariate analysis: an introduction. . 1996, London: Yale Univ Pr.

67. Rothman KJ, G.S., Modern epidemiology. . 1998, Philadelphia: Lippincott-Raven. 68. Vickers AJ, A.D., Analysing controlled trials with baseline and follow up

measurements. . BMJ, 2001. 323: p. 1123-1126. 69. Pascal -Moussellard H, D.L.-R., Olindo S, Rouvillain J-L, Catonné Y

Neurological recovery after cervical cord decompression for canal stenosis myelopathy. Elsevier Masson SAS, 2006. 91: p. 607-614.

70. Singh A, C.H., Quantitative assessment of cervical spondylotic myelopathy by a simple walking test. Lancet 1999. 354: p. 370–373.

71. Demir A, R.M., Moonen CT, Vital JM, Dehais J, Arne P, Caillé JM, Dousset V., Diffusion-weighted MR imaging with apparent diffusion coefficient and apparent diffusion tensor maps in cervical spondylotic myelopathy. Radiology, 2003. 229(1): p. 37-43.

72. Facon D, O.A., Fillard P, Lepeintre JF, Tournoux-Facon C, Ducreux D., MR diffusion tensor imaging and fiber tracking in spinal cord compression. AJNR Am J Neuroradiol, 2005. 26(6): p. 1587-1594.

73. Budzik JF, B.V., Le Thuc V, Duhamel A, Assaker R, Cotten A., Diffusion tensor imaging and fibre tracking in cervical spondylotic myelopathy. Eur Radiol., 2010.

74. Sagiuchi T, T.S., Endo M, Hayakawa K., Diffusion-weighted MRI of the Cervical Cord in Acute Spinal Cord Injury With Type II Odontoid Fracture. J Comput Assist Tomogr, 2002. 26(4): p. 654-656.


TABLES

CHAPTER 3: Systematic review

Table 3.1: Criteria in the modified version of the quality assessment checklist. Each criterion is scored Yes or No, with comments.

Representative sample

Source description: Was the source of participants adequately described?

Referral pattern: Was the recruitment method adequately described? E.g. representative sample: participants were selected as consecutive or random cases.

Patient characteristics: Was the population of interest adequately described for key characteristics: severity, co-morbidity, inclusion/exclusion criteria, age and sex? Yes, if all characteristics are reported. No, if the description is limited to age and sex characteristics, or none.

Sample size: Was the sample size large enough? Rule of thumb: at least 10 cases per independent variable are required at a power of 80% and a 5% significance level (e.g. the author runs a comparison for age, sex, symptom duration, pre-/post-operative neurological scores, etc.).

Blinding

Blinded assessor: Were MRI assessors involved in the study blinded to clinical data? E.g. blinded outcome assessment: assessor was unaware of prognostic factors at the time of outcome assessment.

Baseline comparability

Compared baseline performance of clinical status: Is baseline performance of clinical status measured? If yes, is the absolute difference between the groups less than 10%? If yes, score the quality criterion as YES. If no, did the analysis take into consideration the baseline imbalance (for example, analysis of co-variance or analysis by change scores between groups)? E.g. statistical adjustment: multivariate analyses conducted with adjustment for potentially confounding factors. If yes, score the quality criterion as YES. If no, score the quality criterion as NO. Otherwise, if no comparison is completed, then NA.

Compared baseline performance of other predictive variables: Is baseline performance of age, sex and symptom duration measured? If yes, is the absolute difference between the groups less than 10%? If yes, score the quality criterion as YES. If no, did the analysis take into consideration the baseline imbalance (for example, analysis of co-variance or analysis by change scores between groups)? E.g. statistical adjustment: multivariate analyses conducted with adjustment for potentially confounding factors. If yes, score the quality criterion as YES. If no, score the quality criterion as NO. Otherwise, if no comparison is completed, then NA.

Follow-up

Complete: Was follow up reported? If yes, was follow-up complete? Follow-up >80%: outcome data were available for at least 80% of participants at one follow-up point. If not, then score the quality criterion as NO.

Comparison of drop outs with remained: Were those followed up comparable to those who dropped out?

Reasons of drop outs: Were reasons for loss to follow-up provided?

Validation of outcome measurement

Valid: Were outcome measures adequately valid? Yes, if the prognostic study tested the validity of measurements used or referred to other studies which had established validity. Otherwise, no.

Reliable: Were outcome measures adequately reliable? Yes, if the prognostic study tested the reliability of measurements used or referred to other studies which had established reliability. Otherwise, no.

Validation of predictive factor measurement

Defined: Were definitions or descriptions of MRI predictor adequately provided? Yes, if there is clear indication of measurement method such as detailed description of MRI protocol including planes (axial/sagittal) and thickness of slices. Otherwise, no.

Reliable: Were predictive factor measures adequately reliable? Yes, if inter/intra-observer reliability tests with/without coefficient value are reported (e.g. Cronbach alpha or Kappa coefficients). Otherwise, no.

Table 3.2: Summary of methodological limitations, scored with the modified version of the quality assessment checklist designed by the Cochrane collaboration group et al (2007) [No = 0, Yes = 1].

Columns:
Representative sample: (1) Source description, (2) Referral pattern, (3) Patient characteristics, (4) Sample size
Blinding: (5) Blinded assessor
Baseline comparability: (6) Compared severity score, (7) Compared baseline performance
Follow up: (8) Complete
Validity of clinical scales: (9) Reliability, (10) Validity
Validity of exposure variables: (11) Definition, (12) Reliability

Cohort                      1    2    3    4    5    6    7    8    9    10   11   12

Prospective cohort study
Nagata et al. 1990          0    0    0    0    0    0    0    0    1    0    1    0
Fukushima et al. 1991       0    0    0    1    0    0    0    1    1    0    1    0
Yukawa et al. 2007          0    0    1    1    1    1    0    0    1    0    1    1
Yone et al. 1992            0    0    0    0    0    0    0    0    1    0    0    0
Okada et al. 1993           0    0    0    1    0    0    0    1    1    0    1    0
Chen et al. 2001            1    1    0    1    1    1    0    1    1    0    1    1
Papadopoulos et al. 2004    0    1    0    0    1    0    1    1    1    0    0    0
Singh et al. 2001           1    1    0    0    1    0    0    1    1    0    0    1
Mastronardi et al. 2007     0    1    0    0    1    0    0    1    0    0    1    0
Fernandez et al. 2007       1    0    0    1    0    1    0    1    0    0    1    0

Retrospective cohort study
Nagata et al. 1996          0    0    0    1    0    0    0    1    1    0    1    0
Uchida et al. 2005          1    0    0    0    0    NA   NA   1    1    0    0    0
Kasai et al. 2001           1    0    0    1    0    NA   NA   0    1    0    1    0
Chung et al. 2002           0    0    0    0    0    1    0    0    1    0    0    0
Wada et al. 1999            1    0    0    0    1    1    1    0    1    0    1    0
Morio et al. 2001           0    0    0    1    1    1    1    1    1    0    1    1
Yamazaki et al. 2002        1    0    0    0    0    1    0    1    1    0    0    0
Wada et al. 1995            0    0    0    0    0    1    0    1    1    0    1    0
Houten et al. 2003          0    1    0    0    0    1    0    1    0    0    1    0
Park et al. 2006            0    0    0    1    0    1    0    1    0    0    1    0

Case series
Mizuno et al. 2003          0    0    0    1    0    0    0    1    1    0    0    0
Matsuyama et al., 2004      0    0    0    0    0    1    0    1    1    0    1    0
Matsuda et al. 1991         1    0    0    1    0    0    0    1    1    0    1    0

Table 3.3: Study design, sample size, type of outcome measures and level of evidence

Citation                   Study design           N     Outcome measure   Level of evidence*   Follow up   Inception point

Nagata et al. 1990         Prospective cohort     300   JOA               IV                   No          No
Fukushima et al. 1991      Prospective cohort     55    JOA               I                    Yes         Yes (onset)
Yukawa et al. 2007         Prospective cohort     142   JOA               IV                   No          No
Yone et al. 1992           Prospective cohort     140   JOA               IV                   No          No
Okada et al. 1993          Prospective cohort     74    JOA               IV                   Yes         No (symptom duration?)
Papadopoulos et al. 2004   Prospective cohort     42    JOA               IV                   Yes         No (symptom duration?)
Singh et al. 2001          Prospective cohort     69    Walking Test      I                    Yes         Yes (surgery)
Chen et al. 2001           Prospective cohort     64    mJOA              IV                   Yes         No
Mastronardi et al. 2007    Prospective cohort     42    mJOA              I                    Yes         Yes (onset of symptoms)
Fernandez et al. 2007      Prospective cohort     67    mJOA              I                    Yes         Yes (3 months before surgery)
Nagata et al. 1996         Retrospective cohort   173   JOA               IIc
Uchida et al. 2005         Retrospective cohort   135   JOA               IIc
Kasai et al. 2001          Retrospective cohort   128   JOA               IIc
Chung et al. 2002          Retrospective cohort   113   JOA               IIc
Wada et al. 1999           Retrospective cohort   85    JOA               IIc
Morio et al. 2001          Retrospective cohort   73    JOA               IIc
Yamazaki et al. 2002       Retrospective cohort   64    JOA               IIc
Wada et al. 1995           Retrospective cohort   31    JOA               IIc
Houten et al 2002          Retrospective cohort   38    mJOA              IIc
Park et al 2006            Retrospective cohort   80    NCSS              IIc
Mizuno et al. 2003         Case series study      134   JOA               IV
Matsuyama et al., 2004
Matsuda et al. 1991

* http://www.eboncall.org/content/levels.html: NHS R&D Centre for Evidence-Based Medicine (Bob Phillips, Chris Ball, Dave Sackett, Brian Haynes, Sharon Straus and Finlay McAlister) (2002)


Table 3.4: Data extracted were groups of MRI features (signal intensity, spinal cord compression and spinal canal compromise) Table 3.4 (I): Descriptions of increased signal intensity (ISI) of the spinal cord in T2-/T1-weighted MRI

Predictive variable Author Method assessments:

Matsuda et al. 1991 1.5-tesla superconductive magnet* and a surface coil was used. The slices were from 3 to 5 mm thick.

Papadopolous et al. 2004 No description

Absence/presence of T2 signal intensity changes on sagittal view

Yukawa et al. 2007 1.5-T A surface coil was used. The slice width was 4 mm Absence/presence of T2 signal intensity changes on axial views

Mizuno et al. 2003

Snake-eye appearance was defined as one left- and one right-sided small round or elliptical high signal intensity lesion in the central gray matter near the ventrolateral posterior column

Absence/presence of T2 signal intensity changes (type of plane is not mentioned)

Singh et al. 2001 No description

Yukawa et al. 2007

Grade 0 none Grade 1 light (obscure) Grade 2 intense (bright)

Degree of intensity on sagittal T2WI

Chen et al. 2001

Type 0 no SI on T2 Type 1 (>50%) faint and fuzzy border Type 3 (>50%) intense and well-defined border

Three patterns of axial T1/sagittal T2 –weighted sequences

Morio et al. 2001 Alafifi et al. 2007 Mastronardi et al.2007

(A) normal intensity on both T1- and T2-weighted images (B) normal intensity on T1- weighted and high signal intensity on T2-weighted images (C) low signal intensity on T1-weighted and high signal intensity on T2-weighted images

Signal-intensity ratio on sagittal T2-WI

Okada et al. 1993

The intensity of the intramedullary, sagittal T2-weighted MRI cord signal at maximal compressed levels divided by comparable readings at contiguous noncompressed sites


Table 3.4 (II): Descriptions of degree of spinal cord compression and/or canal compromise for cervical spondylotic myelopathy by magnetic resonance imaging (MRI) finding


Yone et al. 1992

No description Slice thickness: 5 mm

Anteroposterior diameter on sagittal T1WI

Kasai et al. 2001 A 1.5-T MRI device. The slice width was set at 5 mm and the number of slices at 7. Sagittal view of T1-/T2-weighted images. MRI cumulative score: 6 degrees of spinal stenosis captured on T1/T2-weighted sagittal imaging:
Grade 0: normal image
Grade 1: either the anterior or posterior subarachnoid space is not maintained
Grade 2: both the anterior and posterior subarachnoid spaces are not maintained
Grade 3: either anterior or posterior spinal cord deformity, but the posterior or anterior subarachnoid space is maintained
Grade 4: either anterior or posterior spinal cord deformity is observed, and the posterior or anterior subarachnoid space is not maintained
Grade 5: spinal cord deformity is observed both anteriorly and posteriorly

Degrees of spinal cord on sagittal T1WI

Nagata et al.1996 None (0) Mild (1; flattening or concavity of the anterior surface only) Moderate (2; <50% reduction in maximal sagittal diameter) Severe (3; >50% reduction in sagittal diameter)

Okada et al. 1993 The transverse area at the site of maximal cord compression was measured with a digitizer linked to a computer

Transverse area on axial T1WI

Fukushima et al. 1991 MRI axial views perpendicular to the spinal cord were obtained with a 0.5 tesla superconducting MRI system Critical value of transverse area is 0.45 cm2


Chung et al. 2002 Thickness of slices was not reported. Pre-operative T1-weighted axial imaging with a Signa 1.5-tesla. Compression ratio = a/b, where a is the smallest sagittal diameter of the spinal cord and b is the broadest transverse diameter of the cord at the same level

Chen et al. 2001

Cord compression ratio = sagittal diameter/transverse diameter The imagers were superconducting 1.5-T MR systems Section thickness was 4 mm with 1-mm gap on both sagittal and transverse images.

Compression ratio on axial T1-weighted

Okada et al. 1993

(Sagittal diameter/transverse diameter)*100%. MRI examinations were performed with a 0.5 Tesla. Slice thickness = 10 mm

Degree of diameter on sagittal view

Houten et al.2002 Thickness is not reported Grade 0: 360 degree cushion of CSF around SC Grade 1: loss of CSF cushion without indentation of SC. May have slight anterior cord flattening Grade 2: mild cord compression Grade 3: Severe spinal cord compression

Table 3.4 (III): Area of high T2-signal change for cervical spondylotic myelopathy by magnetic resonance imaging (MRI) finding


Wada et al. 1999 1.5-T with surface coil. Slice thickness =3-5 mm Mastronardi et al.2007 1.5-T with surface coil. Slice thickness =5 mm

Focal/ multisegmental high MRI intensity areas Fernandez et al. 2007

No thickness of slices was reported Type 0 no intramedullary high-signal intensity on T2-weighted images Type 1 high-signal intensity involved only one segment Type 2 high signal intensity extended over two segments


Table 3.5: Potential predictors with reported for univariate analyses and strength of association where available short (less than 6 months) and long (greater than 6 months) terms follow –up. Table 3.5 (I): Signal intensity changes as potential predictors

Prognostic factors Author Outcome Length of follow-up

Statistical significance

Strength of association

Yukawa et al 2007 JOA Long term p=0.033 p=0.0012

NA * NA **

Yone et al 1992 JOA Unknown p>0.05 NA * Papadopolous et al 2004 JOA Long term p>0.05

p<0.001 NA * NA **

Absence/presence of T2 signal intensity changes on sagittal view

Matsuda et al 1991 JOA Short term p<0.05 p<0.05

NA * NA **

Wada et al 1995 JOA Short term p>0.05 p>0.05

NA * NA **

Yamazaki et al 2002 JOA Long term p>0.05 NA *

Absence/presence of T2 signal intensity changes on axial/sagittal views

Chung et al 2002 JOA Long term p>0.05 NA * Absence/presence of T2 signal intensity changes on axial views

Mizuno et al 2003

JOA Short term p<0.001 NA *

Yukawa et al 2007 JOA Long term p=0.020 NA * Chen et al 2001 JOA Long term p=0.018 NA *

Degree of intensity on sagittal T2WI

Uchida et al 2005 JOA Long term p>0.05 NA * Three patterns of axial T1/sagittal T2 –weighted sequences

Morio et al 2001 JOA Long term p = 0.0259 NA *

Signal-intensity ratio on sagittal T2-WI Okada et al 1993 JOA Unknown p<0.001 r=0.537 OPLL * r=0.426 CSM *

Fernandez et al 2007 mJOA Long term p>0.05 NA ** Absence/presence of T2 signal intensity changes on sagittal view Houten et al 2003 mJOA Short term p>0.05 NA ** Three patterns of axial T1/sagittal T2 –weighted sequences

Mastronardi et al 2007 mJOA Long term p=0.001 NA **

Absence/presence of T2 signal intensity changes (type of plane is not mentioned)

Singh et al 2001

Nurick Walking

Short term

p=0.03 p=0.0011

r=0.26 ** NA **

Area of signal intensity changes Wada et al 1995 JOA Short term p>0.05 NA *


p>0.05 NA ** Wada et al 1999 JOA Long term p<0.05

p<0.05 NA * NA **

Mastronardi et al 2007 mJOA Long term p=0.001 p<0.05

NA * NA **

Fernandez et al 2007 mJOA Long term p=0.001 NA * Table 3.5 (II): Severity of spinal cord compression as potential prognostic indicator Prognostic factors Author Outcome Length of

follow-up Statistical significance

Strength of association

Yone et al 1992

JOA Unknown p>0.05 p>0.05

NA * NA **

Anteroposterior diameter on sagittal T1WI

Kasai et al 2001 JOA Long term p<0.01 r=-0.436 * Degrees of spinal cord on sagittal T1WI Nagata et al 1996 JOA Long term p<0.05 NA **

Okada et al 1993 JOA Unknown p<0.01 (CSM/OPLL)

r=0.678/0.586 *

Fukushima et al 1991 JOA Long term p<0.05 r=0.295**

Transverse area on axial T1WI

Morio et al 2001 JOA Long term p=0.0517 p=0.0015

r=0.243 * r=0.398 **

Okada et al 1993 JOA Unknown p>0.05 NA * Chen et al 2001 JOA Long term p=0.836 r=0.026 *

Compression ratio on axial T1-weighted

Chung et al 2002 JOA Long term p<0.05 NA * Uchida et al 2005 JOA Long term p<0.05 in OPLL

p>0.05 in CSM NA ** NA **

Rate of flattening of the cord

Nagata et al 1990 JOA Long term p>0.05 NA * Grade 0 360 degree cushion of CSF around SC on…..

Houten et al 2003

mJOA Short term NA NA **

Degree of diameter on sagittal view Singh et al 2001 Nurick Short term p=0.60 r=0.07 ** Cord deformity on axial T1-weighted MRI Matsuyama et al 2004 JOA Short term NA

NA NA * NA **

*- recovery rate ** - post –operative functional score


Table 3.6: RESULTS - PREVIOUS PREDICTIVE MODELS

Study Name Year

Population Number

Fashion of selection

Range of years

Data collection

Statistics Outcome Measure

Recovery percentage

& Mean post-operative

score

Explained variation

(r2)

Variables in final model

Park 2006

80 Non-consecutive CSM cases 2000-2003 3 months after surgery

Patients charts

Stepwise, multivariate regression

NCSS Recovery (%) Maximum score 14

62.2% 25.2% Duration of symptoms Number of high intensity segments

Chen 2001

64 consecutive CSM cases, 1999-2000 6 months after surgery

Clinical database

ANCOVA mJOA Recovery (%) Maximum score 21

79.3%

47.9% Age Degree of intrinsic signal changes

Morio 2001

1998-1999 Non-consecutive CSM cases, Mean 3.4 years, range, 0.5–10 years after surgery

Clinical database

Stepwise, multivariate regression

JOA Recovery (%)

& Mean post score Maximum score 17

180% 14.5

29.7% 70.3%

Recovery percentage: Age Duration of symptoms Signal patterns Post-JOA: Age Duration Signal patterns


Baseline score Okada 1993

74 non-consecutive CSM cases No follow-up time was provided

Clinical database

Multiple regression analysis

JOA Recovery (%) Maximum score 17

(OPLL) 54.7% (CSM) 52.2% (CDH) 12.7%

71.8% 70.2%

Transverse area Signal Intensity ratio Duration of symptoms

Uchida 2005

1988-2001 Non-consecutive OPLLCSM cases Mean 8.3 years, range, 1.0–12.8 years

Medical records

Multiple regression analysis/ PCC(partial correlation coefficient)

JOA Recovery (%) Maximum score 17

Not reported Not reported

CSM group: Anterior Surgery Preoperative JOA score Crandall and Batzdorff’s type Radiographic abnormality Level of compression Rate of flattening of the cord Increased transverse area of the cord Grade of SCEP Laminoplasty Surgery Preoperative JOA score Crandall and Batzdorff’s type


Level of compression Rate of flattening of the cord Increased transverse area of the cord Grade of SCEP OPLL group Anterior Surgery & Laminoplasty Preoperative JOA score Crandall and Batzdorff’s type Spinal canal narrowing (preoperative CT) Type of OPLL Level of compression Rate of flattening of the cord Increased transverse area of the cord Grade of SCEP


CHAPTER 4: Material and Methods

Table 4.1: The mJOA scale for functional assessment of CSM*

Score   Description

I. Motor dysfunction of the upper extremities
0       Inability to move hands
1       Inability to eat with a spoon but able to move hands
2       Inability to button shirt but able to eat with a spoon
3       Able to button shirt with great difficulty
4       Able to button shirt with slight difficulty
5       No dysfunction

II. Motor dysfunction of the lower extremities
0       Complete loss of motor and sensory function
1       Sensory preservation without ability to move legs
2       Able to move legs but unable to walk
3       Able to walk on flat floor with a walking aid (such as a cane or crutch)
4       Able to walk up and/or down stairs with hand rail
5       Moderate to significant lack of stability but able to walk up and/or down stairs without hand rail
6       Mild lack of stability but walks unaided with smooth reciprocation
7       No dysfunction

III. Sensation
0       Complete loss of hand sensation
1       Severe sensory loss or pain
2       Mild sensory loss
3       No sensory loss

IV. Sphincter dysfunction
0       Inability to micturate voluntarily
1       Marked difficulty with micturition
2       Mild to moderate difficulty with micturition
3       Normal micturition

*From Benzel and colleagues, 1991.
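The scoring in Table 4.1 can be sketched in a few lines of Python. This is a minimal illustration rather than part of the study's analysis: the names `MJOA_MAX`, `mjoa_total` and `mjoa_severity` are hypothetical, and the severity cut-offs (mild >=15, moderate 12-14, severe <12) are the ones listed in Table 5.1.

```python
# Maximum sub-score for each mJOA domain, read from Table 4.1.
MJOA_MAX = {
    "upper_motor": 5,   # I. motor dysfunction, upper extremities
    "lower_motor": 7,   # II. motor dysfunction, lower extremities
    "sensation": 3,     # III. sensation
    "sphincter": 3,     # IV. sphincter dysfunction
}


def mjoa_total(upper_motor, lower_motor, sensation, sphincter):
    """Sum the four mJOA sub-scores after checking each is in range."""
    scores = {
        "upper_motor": upper_motor,
        "lower_motor": lower_motor,
        "sensation": sensation,
        "sphincter": sphincter,
    }
    for name, value in scores.items():
        if not 0 <= value <= MJOA_MAX[name]:
            raise ValueError(f"{name} must be between 0 and {MJOA_MAX[name]}")
    return sum(scores.values())


def mjoa_severity(total):
    """Classify a total score using the cut-offs listed in Table 5.1."""
    if total >= 15:
        return "mild"
    if total >= 12:
        return "moderate"
    return "severe"


# Hypothetical patient: some hand clumsiness, walks stairs with a rail.
total = mjoa_total(upper_motor=3, lower_motor=4, sensation=2, sphincter=3)
print(total, mjoa_severity(total))
```

A patient with no dysfunction in any domain scores the maximum of 18 on this version of the scale.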


Table 4.3: Standard parameters for cervical spine T1- and T2-weighted magnetic resonance imaging (MRI) used in our study

PROTOCOL: C-Spine 1.5T - start w/ 3-pl Loc & Asset Cal

Series #                           3             4             5
Scan Pl. / Mode                    Sag T2        Sag T1        Ax 3D T2
Pulse Sequence                     FrFSE         FrFSE         3D FrFSE
PSD File Name & Imaging Options    NPW, EDR      NPW           FC, Fast, FR
TR* / R-R#**                       3200-6887     467-2616      2000-2500
TE1 / TE2*                         110-119       10.1          97-106
ETL (Echo Train Length)            24-33         1-6           39
FOV (Field of View)                24-26         24-26         18-24
Slice Thickness                    3             3             2.5
Spacing***                         3.3-3.5       3.3-3.5       2.5
# of Slices                        13-18         13-18         24-80
Matrix                             512X224       512X224       320X224
Phase FOV (Field of View)
Frequency Direction                A/P           A/P           R/L
Number of excitation               2-4           1-2           1
Shim on
Spatial Sat                        I,S,a,p       I,S,a,p       a
Scan Time                          0:26-17:22    0:55-23:43    4:35-5:44

*TE, echo time; TR, repetition time; ** R-R, rest & relaxation; ***Spacing, gap/space between slices.


CHAPTER 5: Results

Table 5.1: Characteristics of Patients with Cervical Spondylotic Myelopathy

Characteristics (n=61)                        % (No. of Patients)

Mean duration of symptoms ± SD (months)       21.1±18.2
Mean age ± SD (y)                             56.2±11.9
Mean age (years)**
  <=65 years old                              75% (46)
  >65 years old                               25% (15)
Gender
  Female                                      31% (19)
  Male                                        69% (42)
Severity of CSM***
  Mild (mJOA>=15)                             32% (19)
  Moderate (mJOA 12-14)                       34% (21)
  Severe (mJOA<12)                            34% (21)
Anatomical level of stenosis
  C3/C4                                       9% (6)
  C4/C5                                       13% (8)
  C5/C6                                       25% (15)
  C6/C7                                       49% (30)
  Unknown                                     3% (2)
Number of stenotic levels**
  One                                         45% (26)
  Two                                         23% (13)
  Three and more                              32% (18)
  Unknown                                     6% (4)
Signal intensity changes
  Normal T1/Normal T2                         34% (20)
  Normal T1/High T2                           47% (28)
  Low T1/High T2                              19% (11)
Surgical approach
  Anterior approach                           67% (42)
  Posterior approach                          30% (18)
  Anterior & posterior approach               3% (1)
Etiologies of myelopathy
  One etiology
    OPLL                                      6% (4)
    Spondylosis                               37% (24)
    Disk                                      17% (11)
    Hypertrophic ligamentum flavum            2% (1)
    Subluxation                               2% (1)
  Two etiologies                              29% (19)
  Three etiologies                            5% (3)
  Unknown                                     3% (2)

Table 5.2: Values of the mJOA in CSM sample

                          Baseline      12 months     Change score   95% CI for change score
mJOA functional scale     12.9+/-2.7    15.8+/-2.3    2.93+/-2.4     2.32-3.55

NOTE. Values are mean +/- SD. Abbreviation: CI, confidence interval.
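As a rough consistency check on Table 5.2, the 95% confidence interval for the mean change score can be approximated from the reported mean, SD and sample size. The sketch below rests on two assumptions that the table itself does not state: that n=61 (taken from Table 5.1) and that the interval is a two-sided t-interval, for which the critical value at 60 degrees of freedom is about 2.000. The function name `mean_ci` is hypothetical.

```python
import math


def mean_ci(mean, sd, n, t_crit):
    """Return (lower, upper) bounds of a t-based confidence interval for a mean."""
    standard_error = sd / math.sqrt(n)
    half_width = t_crit * standard_error
    return mean - half_width, mean + half_width


# Assumed inputs: change score 2.93 +/- 2.4 (Table 5.2), n = 61 (Table 5.1),
# and the two-sided 95% t critical value for 60 degrees of freedom (~2.000).
T_CRIT_DF60 = 2.000
lower, upper = mean_ci(mean=2.93, sd=2.4, n=61, t_crit=T_CRIT_DF60)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Under these assumptions the computed bounds agree with the reported interval of 2.32-3.55 to within rounding of the published mean and SD.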


Table 5.3: Correlation matrix and coefficients between functional outcomes and independent variables

Columns: (1) Age, (2) Gender, (3) Duration of symptoms, (4) Baseline score, (5) Signal intensity changes, (6) Transverse area, (7) Anteroposterior diameter, (8) Number of compressed segments

                                  1       2       3       4       5       6       7       8
Age                               1.00
Gender                            0.03    1.00
Duration of symptoms              0.27    -0.12   1.00
Baseline score                    0.44    0.05    0.26    1.00
Signal intensity changes          0.24    0.12    0.13    0.13    1.00
Transverse area                   0.27    0.13    0.08    0.29    0.39    1.00
Anteroposterior diameter          0.21    0.12    0.03    0.19    0.41    0.62    1.00
Number of compressed segments     0.20    0.12    0.20    0.32    0.24    0.35    0.23    1.00


Table 5.4: Unadjusted beta value estimates for independent variables (univariable analysis)

Variable                                                        Coefficient   95% CI            P Value   R2

Baseline mJOA
Age as dichotomized* (<=65 years old, >65 years old)            -2.83         -1.42, -4.24      0.0002    0.20
Age as continuous                                               -0.08         -0.13, -0.03      0.0051    0.12
Gender*                                                         0.30          -1.15, 1.75       0.68      0.003
Duration of symptoms as dichotomized* (<=12 months, >12 months) -1.55         -2.97, -0.15      0.03      0.07
Duration of symptoms as continuous                              0.00          -0.04, 0.04       0.97      0.00
TA as dichotomized                                              0.96          -0.4, 2.32        0.17      0.03
TA as continuous*                                               0.06          0.02, 0.10        0.02      0.08
AP diameter                                                     0.43          -0.13, 0.99       0.14      0.03
Intensity signal changes*                                                                       0.61      0.02
  Low T1/High T2 vs. Normal T1/High T2                          0.75          -0.16, 2.64
  Low T1/High T2 vs. Normal T1/Normal T2                        0.99          -0.96, 2.98
Number of compressed segments*                                                                  0.04      0.10
  ≥ 3 vs. 2 compressed segments                                 2.35          0.57, 4.13
  ≥ 3 vs. 1 compressed segment                                  1.06          -1.01, 3.13

Final mJOA
Baseline mJOA*                                                  1.014                           <.0001    0.30
Age as continuous                                               -1.002        -3.005, 1.001     0.01      0.11
Age as dichotomized* (<=65 years old, >65 years old)            -1.072        -3.110, 0.966     <.0001    0.22
Gender*                                                         -1.018        -3.057, 1.022     0.33      0.06
Duration of symptoms as continuous                              1.0           -1.003, 3.003     0.72      0.002
Duration of symptoms as dichotomized* (<=12 months, >12 months) -1.034        -3.075, 1.007     0.09      0.05
TA as continuous*                                               1.0           -1.005, 3.005     0.24      0.02
TA as dichotomized                                              1.01          -1.030, 3.050     0.56      0.01
AP diameter as continuous                                       1.005         -1.012, 3.022     0.49      0.01
Intensity signal changes*                                                                       0.33      0.04
  Low T1/High T2 vs. Normal T1/High T2                          1.016         -1.037, 3.069
  Low T1/High T2 vs. Normal T1/Normal T2                        1.038         -1.017, 3.093
Number of compressed segments*                                                                  0.78      0.01
  ≥ 3 vs. 2 compressed segments                                 -1.00         -3.055, 1.055
  ≥ 3 vs. 1 compressed segment                                  1.03          -1.017, 3.077

* Chosen exposure variables for multivariable analysis


Table 5.5: Statistical details of full models (multivariable analysis)

Dependent variable                                       Independent variable   Coefficient   95% CI            MSE for the model   P value for the model   Adjusted R2 for the model
Baseline mJOA score                                      Age                    -2.83         -1.420, -4.240    2.44                p=0.0002                20%
Follow-up mJOA score adjusted for baseline mJOA score    Age                    -1.04         -3.081, 1.001     0.06                p<0.0001                36%


FIGURES

Figure D.1: Measurements of the antero-posterior diameter (AP) (A) and transverse area (TA) (B) of the spinal cord using T2-weighted MR images.

Figure D.2: T1-weighted image of the sagittal view revealing hypointensity in the spinal cord (C) and T2-weighted image of the sagittal view showing hyperintensity in the spinal cord (D) before surgery (arrow).

Figure D.3: (E) Focal compression. (F) Multiple levels of compression.

Figure D.4: Distribution of baseline mJOA scores.

Figure D.5: Distribution of post-operative mJOA scores at 12 months.

CHAPTER 8

APPENDICES

Appendix 1: Search strategy (results: November 28, 2008)

Database: Ovid MEDLINE(R) (#: 6751)

Searches:
1. Magnetic Resonance Imaging/
2. (functional adj6 MRI).mp.
3. fMRI.mp.
4. (functional adj6 magnetic resonance imag:).mp.
5. magnetic resonanc: imag:.mp.
6. mr tomograph:.mp.
7. nmr imag:.mp.
8. nmr tomograph:.mp.
9. zeugmatograph:.mp.
10. functional mri:.mp.
11. chemical shift imag:.mp.
12. magnetization transfer contrast imag:.mp.
13. (mri adj2 scan:).mp.
14. proton spin tomograph:.mp.
15. or/1-14
16. exp cohort studies/
17. exp prognosis/
18. exp morbidity/
19. exp mortality/
20. exp survival analysis/
21. exp models, statistical/
22. prognos*.tw.
23. predict*.tw.
24. course*.tw.
25. diagnosed.tw.
26. cohort*.tw.
27. death.tw.
28. exp case-control studies/
29. disease-free survival.mp.
30. medical: futil:.mp.
31. treatment outcome:.mp.
32. treatment failure:.mp.
33. exp disease progression/
34. (disease adj1 progress:).mp.
35. fatal outcome:.mp.
36. hospital mortality:.mp.
37. exp survival analysis/
38. natural histor:.mp.
39. or/16-38
40. spinal cord diseases/ or spinal cord compression/
41. cervical spondylotic myelopath:.mp.
42. cervical spond: myelopath:.mp.
43. (cervical adj2 myelopath:).mp.
44. spinal canal.mp. [mp=title, original title, abstract, name of substance word, subject heading word]
45. spinal cord.mp. [mp=title, original title, abstract, name of substance word, subject heading word]
46. spin: cord compress:.mp.
47. exp Cerebrospinal Fluid/
48. cerebrospinal fluid:.tw.
49. central cord syndrome/
50. or/40-49
51. 50 and 39 and 15
52. exp animals/
53. exp human/
54. 52 not (52 and 53)
55. 51 not 54

Database: EMBASE

Searches:
1. exp Magnetic Resonance Imaging/
2. (functional adj6 MRI).mp.
3. fMRI.mp.
4. (functional adj6 magnetic resonance imag:).mp.
5. magnetic resonanc: imag:.mp.
6. mr tomograph:.mp.
7. nmr imag:.mp.
8. nmr tomograph:.mp.
9. zeugmatograph:.mp.
10. functional mri:.mp.
11. chemical shift imag:.mp.
12. magnetization transfer contrast imag:.mp.
13. (mri adj2 scan:).mp.
14. proton spin tomograph:.mp.
15. or/1-14
16. exp cohort studies/
17. exp prognosis/
18. exp morbidity/
19. exp mortality/
20. exp survival analysis/
21. exp models, statistical/
22. prognos*.tw.
23. predict*.tw.
24. course*.tw.
25. diagnosed.tw.
26. cohort*.tw.
27. death.tw.
28. exp case-control studies/
29. disease-free survival.mp.
30. medical: futil:.mp.
31. treatment outcome:.mp.
32. treatment failure:.mp.
33. exp disease progression/
34. (disease adj1 progress:).mp.
35. fatal outcome:.mp.
36. hospital mortality:.mp.
37. exp survival analysis/
38. natural histor:.mp.
39. or/16-38
40. exp Spinal Cord Compression/
41. cervical spondylotic myelopath:.mp.
42. cervical spond: myelopath:.mp.
43. (cervical adj2 myelopath:).mp.
44. spinal canal compromis:.mp.
45. spin: cord compress:.mp.
46. central cord syndrome/
47. medulla: compress:.mp.
48. (spinal cord: adj2 pinch:).mp.
49. conus medullaris syndrome.mp. [mp=title, original title, abstract, name of substance word, subject heading word]
50. conus medullaris syndromes.mp. [mp=title, original title, abstract, name of substance word, subject heading word]
51. or/40-50
52. 39 and 51 and 15
53. exp animals/
54. exp human/
55. 53 not (53 and 54)
56. 52 not 55

Appendix 3: RELIABILITY

A comparison of four quantitative methods to assess spine stenosis and spinal cord compression on magnetic resonance imaging in patients with cervical spine myelopathy

F.1. INTRODUCTION AND OVERVIEW
F.2. STUDY OBJECTIVE
F.3. HYPOTHESIS
F.4. STUDY DESIGN
F.5. TARGET POPULATION
F.6. DEFINITION OF MR IMAGING PARAMETERS
F.6.1. Strategies to improve reliability of MR imaging parameters (variation due to)
F.6.1.a. Clinicians
F.6.1.b. Patients
F.6.1.c. MR imaging protocol
F.6.1.d. Measurement errors
F.7. SAMPLE SIZE
F.8. DATA ANALYSIS

INTRODUCTION AND OVERVIEW

Lack of standardized approaches to assess the severity of cord compression in the setting of CSM may contribute to variability in interpretations of MRI-based features. The process of developing a radiological measure to assess severity of CSM requires selection of the most suitable imaging modality, appraisal of reliability and determination of validity. A valid measurement instrument needs to be reliable or reproducible, although reliability is not a sufficient condition for validity. Reliability measures the degree of consistency across repeated assessments of different patients by the same rater (intrarater reliability) or agreement across different raters for the same patient (interrater reliability). The estimate of reliability is significant because: 1) reliability represents the minimal requirement for a valid clinical measure, and 2) efficiency of clinical trials relies on reliable measurements.
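One common way to quantify the interrater reliability described above is an intraclass correlation coefficient (ICC). The sketch below is illustrative only: this passage does not name the statistic, so the choice of a two-way random-effects, absolute-agreement, single-rater ICC, often written ICC(2,1), is an assumption. The function name `icc_2_1` and the example measurements are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per patient and one column per rater."""
    n = len(ratings)        # number of patients
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: patients, raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical AP diameters (mm) for four patients, each read by three raters.
example = [
    [5.1, 5.3, 5.0],
    [7.8, 7.6, 7.9],
    [4.2, 4.5, 4.1],
    [6.5, 6.4, 6.8],
]
print(round(icc_2_1(example), 3))
```

Values close to 1 indicate that most of the variation in the measurements comes from real differences between patients rather than from disagreement between raters.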


STUDY OBJECTIVE

The objective is to investigate the intra- and inter-rater reliability of four published methods of assessing canal stenosis and cord compression on axial (transverse area [TA], anteroposterior diameter [AP]) and sagittal (maximum spinal cord compression [MSCC], maximum canal compromise [MCC]) MR imaging planes.

HYPOTHESIS

We hypothesize that, when cervical canal stenosis and spinal cord compression are evaluated systematically with magnified, software-based tools, written instructions and consistent interpretation, TA, AP, MSCC and MCC are reproducible imaging assessments of the severity of cord compression in CSM patients, irrespective of the clinician’s background and experience, learning effects and CSM severity.

STUDY DESIGN

Subjects:

The patients were randomly selected from a prospectively accrued database of CSM patients who were referred for surgical treatment in our unit, a large, university-based tertiary care spine center.

Procedures:

Seventeen cervical spine digital MR images were evaluated in a blinded fashion, on four separate occasions, by four spine specialists (two from neurosurgery, two from orthopaedic surgery) from North America (n=2), Europe (n=1) and Asia (n=1).

TARGET POPULATION

The patients had a clinical diagnosis of myelopathy confirmed by evidence of cord compression on MRI. This project is based on the analysis of data from a single centre (n=65) that is part of the larger multicentre AOSpine North America CSM Trial (n=283 cases).


DEFINITION OF MR IMAGING PARAMETERS (EXPOSURE VARIABLES)

Defined Radiological Parameters

Based on three measurements from MRI (Di, Da and Db, defined below), the maximum spinal cord compression on T2-weighted MRI and the maximum canal compromise on T1-weighted MRI were calculated using the following formulas [9].

Maximum spinal cord compression (%):

where Di is the anteroposterior spinal cord diameter at the level of maximum spinal cord

compression, Da is the anteroposterior spinal cord diameter at the normal levels

immediately above, and Db is the anteroposterior spinal cord diameter at the normal

levels immediately below the level of injury (Figure F.1.).
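The expression itself does not appear in this copy of the text. Assuming the definitions of Di, Da and Db given above and the form published by Fehlings et al. [9], it can be written as:

```latex
\[
\mathrm{MSCC}\,(\%) = \left(1 - \frac{D_i}{(D_a + D_b)/2}\right) \times 100
\]
```

Maximum canal compromise is assumed to take the same form, with the canal diameters in place of the cord diameters.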

Maximum canal compromise (%):

where Di is the anteroposterior spinal canal diameter at the level of maximum spinal cord

compression, Da is the anteroposterior spinal canal diameter at the normal levels

immediately above, and Db is the anteroposterior spinal canal diameter at the normal

levels immediately below the level of injury. Measurements of the normal canal

anteroposterior diameter should be taken at midvertebral body level.
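Both percentages can be computed with one small helper. The sketch below assumes the form published by Fehlings et al. [9], in which the diameter at the compressed level is compared with the mean of the normal levels above and below; the function name and the example diameters are illustrative and are not study data.

```python
def percent_compromise(d_i: float, d_a: float, d_b: float) -> float:
    """Percent reduction of the diameter at the compressed level (d_i)
    relative to the mean of the normal levels above (d_a) and below (d_b)."""
    if d_a <= 0 or d_b <= 0:
        raise ValueError("normal-level diameters must be positive")
    reference = (d_a + d_b) / 2.0
    return (1.0 - d_i / reference) * 100.0


# Hypothetical diameters in mm, for illustration only (not study data).
example = percent_compromise(d_i=4.0, d_a=7.0, d_b=8.0)
print(f"{example:.1f}%")  # prints 46.7%
```

Passing cord diameters gives MSCC and passing canal diameters gives MCC; a value of 0 means the compressed level is as wide as the average of its neighbours.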

Transverse area: measured at the site of greatest compression on the axial T2-weighted view of the spinal cord [3] (Figure F.2.).

Anteroposterior diameter: measured as the smallest sagittal diameter of the spinal cord [Yone et al 1992] (Figure F.2.).


Strategies to improve reliability of MR imaging parameters (variation due to)

Clinicians

First, raters were blinded to clinical and neurologic data. Second, raters assessed the same patients on four occasions (rounds), three days apart, to guard against memory recall. Third, the scans were read individually and in random order. Fourth, to keep the comparison valid, the raters were given the same images on all four occasions. Fifth, the teaching session before the first round of measurements was conducted in a single meeting to ensure consistent image interpretation.

Patients

To ensure a range of symptom severity for reliability testing, the modified version of the Japanese Orthopaedic Association Scale (mJOA) (Table C.1) was used to classify CSM into mild (mJOA score >=15), moderate (mJOA score 12-14) and severe (mJOA score <12) degrees of functional disability. Of the seventeen subjects in this study, six had mild CSM, five had moderate CSM and six had severe CSM. As described in Table F.1., the cases had varying numbers of compressed levels due to a variety of pathologies commonly seen in clinical practice, including spondylosis, disc herniation, ossification of the posterior longitudinal ligament, hypertrophy of the ligamentum flavum, degenerative subluxation and congenital stenosis.
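The severity grading can be expressed as a short function. The sketch below applies the thresholds stated above to the mJOA scores listed in Table F.1; the function name is illustrative.

```python
from collections import Counter


def mjoa_grade(score: int) -> str:
    """Map an mJOA score to the severity grade used in this appendix."""
    if score >= 15:
        return "mild"
    if score >= 12:
        return "moderate"
    return "severe"


# mJOA scores of the seventeen cases as listed in Table F.1.
scores = [18, 17, 15, 15, 16, 15, 13, 13, 14, 14, 12, 11, 8, 10, 10, 11, 10]
counts = Counter(mjoa_grade(s) for s in scores)
print(dict(counts))  # prints {'mild': 6, 'moderate': 5, 'severe': 6}
```

The counts reproduce the six mild, five moderate and six severe cases reported for this sample.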

MR Imaging protocol

The preoperative midsagittal T1-weighted, axial T2-weighted and midsagittal T2-weighted MRI series of all patients were provided on a CD-ROM with eFilm Lite (2003) and Mango 2.0 (Multi-Image Analysis GUI) software. Figures F.1. - F.2. show examples of the measurement techniques used in this study. The spine specialists evaluated the patients using operational guidelines that detailed the methodologies of the original studies: Fehlings et al. 2007 [9] for maximum spinal cord compression [MSCC] and maximum canal compromise [MCC], Okada et al. 1993 [3] for transverse area [TA], and Yone et al. 1992 [reference] for anteroposterior diameter [AP]. Raters were asked to magnify the images to 200%, consistently across all patients, using the eFilm and Mango programs before measuring the parameters; this potentially reduces the procedural variability of the measurements of cervical canal stenosis and spinal cord compression in CSM.

SAMPLE SIZE

Given that the primary objective of this study was to assess the reliability of four instruments in the setting of myelopathy, we calculated a sample size of seventeen patients, based on four raters each carrying out four separate ratings of every subject, in order to obtain results with a Type I error of 5%, a minimum power of 80%, and a minimum acceptable intraclass correlation coefficient (ICC) of 0.75 (expected ICC of 0.9) [11] [12].
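As a rough check on this figure, the closed-form approximation of Walter et al. [12] can be evaluated directly. The sketch below assumes a one-sided test, four replicate ratings per subject, a null ICC of 0.75 and an expected ICC of 0.9; whether the original calculation used exactly these inputs is an assumption, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def walter_sample_size(rho0: float, rho1: float, n_ratings: int,
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of subjects needed to show ICC > rho0 when the
    true ICC is rho1 (approximation of Walter, Eliasziw and Donner)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # one-sided test (assumed)
    z_beta = NormalDist().inv_cdf(power)
    theta0 = rho0 / (1.0 - rho0)
    theta1 = rho1 / (1.0 - rho1)
    c0 = (1.0 + n_ratings * theta0) / (1.0 + n_ratings * theta1)
    k = 1.0 + (2.0 * (z_alpha + z_beta) ** 2 * n_ratings) / (
        math.log(c0) ** 2 * (n_ratings - 1))
    return math.ceil(k)


print(walter_sample_size(rho0=0.75, rho1=0.90, n_ratings=4))
```

Under these assumptions the approximation returns 17 subjects, which agrees with the sample size reported above.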

DATA ANALYSIS

Data were entered and all analyses were performed on constructed data sets using SAS version 9.2 and Microsoft Excel 2003. Interrater and intrarater reliability were evaluated using ICCs derived from two-way analysis of variance (ANOVA) [13]. In general, ICCs range from 0 to 1, where 0 indicates no agreement and 1 indicates perfect agreement/consistency [14]. ICC values were interpreted according to the criteria proposed by Burdock et al [15], with a minimum ICC of 0.75 taken as the reference for an excellent level of agreement/consistency. However, it is important to acknowledge that such criteria are somewhat arbitrary.

Intrarater reliability and Interrater reliability.

The ICC is a relative index of variability: an ICC of 0.95 means that an estimated 95% of the observed score variance is due to true variance between subjects. The ICC estimates were calculated according to the Shrout-Fleiss random-effects model (Model 2) [13], that is, 1) a two-way model, 2) random effects with absolute agreement (the raters are assumed to be randomly selected from the population of raters), 3) systematic error included, and 4) mean scores (the scores in the analysis represent the average of all trials for each subject). The intrarater and interrater ICCs therefore establish the reliability of ratings including systematic differences between raters.
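A minimal sketch of how the Shrout-Fleiss Model 2 coefficients follow from the two-way ANOVA mean squares. It is written in plain Python rather than SAS, the function name is illustrative, and the data are the six-subject, four-judge worked example from Shrout and Fleiss [13], not measurements from this study.

```python
def icc_model2(ratings):
    """Return (ICC(2,1), ICC(2,k)) for a subjects x raters table of scores,
    using the two-way random-effects, absolute-agreement definitions."""
    n = len(ratings)       # subjects
    k = len(ratings[0])    # raters (or sessions)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_subjects - ss_raters

    bms = ss_subjects / (n - 1)            # between-subjects mean square
    jms = ss_raters / (k - 1)              # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square

    single = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    average = (bms - ems) / (bms + (jms - ems) / n)
    return single, average


# Example data from Shrout and Fleiss (1979): 6 subjects rated by 4 judges.
data = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc_single, icc_mean = icc_model2(data)
print(f"ICC(2,1) = {icc_single:.2f}, ICC(2,k) = {icc_mean:.2f}")
```

For this example the single-rating coefficient is 0.29 and the mean-score coefficient is 0.62, which illustrates how reporting reliability for averaged ratings raises the coefficient.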

Data are represented in terms of estimates of the true mean, standard deviations,

standard error of the mean (SEM) and confidence intervals [17].


RESULTS

As described in Table F.1, our study population was composed of four females

and thirteen males (age, 37–82 years; mean, 54.5 years) with varying severity of CSM.

Table F.1 Characteristics of the patients with Cervical Spondylotic Myelopathy (CSM)

Gender   Age (yrs)   Etiology of CSM      Number of stenotic segments   mJOA score   Severity of CSM by mJOA grade
Male     50          Spondylosis + CS     1                             18           Mild
Male     53          Spondylosis + CS     2                             17           Mild
Male     52          Spondylosis + CS     3                             15           Mild
Male     43          OPLL + HLF           2                             15           Mild
Male     65          Disc                 4                             16           Mild
Male     60          Spondylosis          2                             15           Mild
Male     38          Spondylosis + CS     1                             13           Moderate
Male     68          Spondylosis + SL     8                             13           Moderate
Male     61          Spondylosis + HLF    3                             14           Moderate
Male     37          Disc herniation      1                             14           Moderate
Female   54          SL                   2                             12           Moderate
Male     82          OPLL                 2                             11           Severe
Male     52          OPLL + CS            4                             8            Severe
Female   58          HLF                  3                             10           Severe
Male     59          SL + CS + HLF        4                             10           Severe
Female   55          Spondylosis          1                             11           Severe
Female   40          Disc herniation      1                             10           Severe

Severity grades: mild (mJOA score >=15), moderate (mJOA score 12-14), severe (mJOA score <12).

mJOA - modified version of the Japanese Orthopaedic Association Scale; CSM - cervical spondylotic myelopathy; SL - subluxation; CS - congenital stenosis; HLF - hypertrophy of the ligamentum flavum; OPLL - ossification of the posterior longitudinal ligament.

Descriptive statistics

The differences among the four raters for all four radiological parameters (MCC, MSCC, TA and AP) were statistically significant based on two-way ANOVA with Bonferroni post-hoc analysis (Table F.2).

The transverse area of the spinal cord ranged from 30.6 to 122.0 mm², with mean values of 74.8±15.67 mm², 80.0±23.17 mm², 59.6±19.89 mm² and 71.4±16.48 mm² for Raters 1-4, respectively; the largest standard deviation was that of Rater 2 (Table F.2). The ratings of Rater 2 and Rater 3 differed consistently from those of Rater 1 and Rater 4 (p<0.05).

The anteroposterior diameter of the spinal cord ranged from 0.2 to 0.6 mm, with mean values of 0.41±0.09 mm, 0.43±0.07 mm, 0.40±0.08 mm and 0.40±0.08 mm for Raters 1-4, respectively; the largest standard deviation was that of Rater 1 (Table F.2). The ratings of Rater 2 differed consistently from those of Rater 3 and Rater 4 (p<0.05).

The maximum canal compromise ranged from 77.2 to 93.6, with mean values of 82.0±2.25, 82.4±3.71, 85.7±3.01 and 82.6±2.53 for Raters 1-4, respectively; the largest standard deviation was that of Rater 2 (Table F.2). The ratings of Rater 3 differed consistently from those of Raters 1, 2 and 4 (p<0.05).

The maximum spinal cord compression ranged from 78.3 to 89.1, with mean values of 82.8±2.71, 82.4±2.71, 84.1±2.43 and 82.1±2.37 for Raters 1-4, respectively; the largest standard deviations were those of Raters 1 and 2 (Table F.2). The ratings of Rater 3 differed consistently from those of Raters 1, 2 and 4 (p<0.05).

Table F.2 presents the results in terms of means, standard deviations, minimum and maximum values of 17 cases.

Measure: Mean±SD (Min, Max)               Rater 1                    Rater 2                     Rater 3                     Rater 4
Transverse Area (TA)                      74.8±15.67 (38.0, 98.0)    80.0±23.17 (40.1, 122.0)    59.6±19.89 (32.8, 103.7)    71.4±16.48 (30.6, 92.2)
Anteroposterior Diameter (AP)             0.41±0.09 (0.2, 0.6)       0.43±0.07 (0.2, 0.6)        0.40±0.08 (0.2, 0.6)        0.40±0.08 (0.2, 0.5)
Maximum Canal Compromise (MCC)            82.0±2.25 (78.8, 88.3)     82.4±3.71 (77.2, 89.8)      85.7±3.01 (82.1, 93.6)      82.6±2.53 (80.0, 90.2)
Maximum Spinal Cord Compression (MSCC)    82.8±2.71 (78.3, 88.5)     82.4±2.71 (79.3, 88.8)      84.1±2.43 (80.9, 89.1)      82.1±2.37 (78.5, 87.5)

Assessment of Intrarater Reliability

Using the Shrout-Fleiss model for random effects, the intrarater ICCs for Raters 1-4, respectively, were 0.82, 0.99, 0.98 and 0.88 for the transverse area of the spinal cord; 0.76, 0.91, 0.88 and 0.84 for the anteroposterior diameter of the spinal cord; 0.76, 0.89, 0.85 and 0.76 for maximum spinal cord compression on T2-weighted MRI; and 0.82, 0.97, 0.80 and 0.72 for maximum canal compromise on T1-weighted MRI. Rater 2 consistently had higher ICCs than the other three raters. According to the general guidelines of Burdock et al. [15], all four measurement methods had acceptable consistency (ICC values higher than 0.75), with the exception of Rater 4's MCC measurements (ICC 0.72) (Table F.3).

Table F.3 outlines intra-observer ICC values using the Shrout-Fleiss model for random effects regarding spinal cord and canal deformities evaluated by TA, AP, MCC and MSCC, respectively.

Intra-rater: ICC, SEM* (95% CI**)

Rater     TA                        AP                        MCC                       MSCC
Rater 1   0.82, 13.3 (0.62-0.93)    0.76, 0.06 (0.73-0.79)    0.82, 1.96 (0.61-0.93)    0.76, 2.56 (0.53-0.91)
Rater 2   0.99, 3.9 (0.94-1.00)     0.91, 0.04 (0.87-0.95)    0.97, 1.34 (0.93-0.99)    0.89, 1.53 (0.77-0.96)
Rater 3   0.98, 6.3 (0.95-0.99)     0.88, 0.05 (0.83-0.93)    0.80, 2.69 (0.59-0.92)    0.85, 1.86 (0.69-0.94)
Rater 4   0.88, 11.4 (0.75-0.95)    0.84, 0.05 (0.79-0.90)    0.72, 2.63 (0.43-0.89)    0.76, 2.36 (0.49-0.90)

*SEM = square root of MSE.
**ICC 95% CI: ICC ± 1.96*SD*square root of [ICC*(1 - ICC)], where SD = square root of (SST/(n-1)) (Weir et al 2005).

Assessment of Interrater Reliability

Using the Shrout-Fleiss model for random effects, the interrater agreement ICCs for the 1st to 4th sessions, respectively, were 0.68, 0.69, 0.73 and 0.76 for the transverse area of the spinal cord; 0.86, 0.72, 0.68 and 0.52 for the anteroposterior diameter of the spinal cord; 0.83, 0.65, 0.62 and 0.65 for maximum spinal cord compression on T2-weighted MRI; and 0.46, 0.64, 0.46 and 0.52 for maximum canal compromise on T1-weighted MRI. Although the ICCs for transverse area measurements improved consistently from session 1 to session 4 (Table F.4), graphical representation showed only normal fluctuations (Figure F. 3).


Table F.4. Reliability Assessment (Using the Shrout-Fleiss model for random effects)

Inter-rater: ICC, SEM* (95% CI**)

Session   TA                        AP                        MCC                        MSCC
Time 1    0.68, 15.6 (0.36-0.87)    0.86, 0.05 (0.84-0.88)    0.46, 3.0 (-0.01-0.76)     0.83, 2.07 (0.65-0.93)
Time 2    0.69, 17.8 (0.37-0.87)    0.72, 0.06 (0.69-0.75)    0.64, 2.79 (0.28-0.85)     0.65, 2.56 (0.30-0.86)
Time 3    0.73, 14.3 (0.42-0.89)    0.68, 0.06 (0.65-0.71)    0.46, 3.44 (-0.05-0.77)    0.62, 2.40 (0.24-0.85)
Time 4    0.76, 13.9 (0.50-0.90)    0.52, 0.06 (0.49-0.55)    0.52, 2.82 (0.10-0.79)     0.65, 2.40 (0.30-0.86)

*SEM = square root of MSE.
**ICC 95% CI: ICC ± 1.96*SD*square root of [ICC*(1 - ICC)], where SD = square root of (SST/(n-1)) (Weir et al 2005).

To explore the sources of systematic error that contribute to the ICCs in Tables F.3-F.4, three-way ANOVA was used to investigate time and rater as facets of interest.

The data in Tables F.5. - F.8 show that the effect of trials (time facet) was statistically non-significant for three of the four methods of spine and canal stenosis assessment, the exception being the transverse area of the spinal cord, based on three-way ANOVA with Bonferroni post-hoc analysis ([MSCC, p=0.28], [MCC, p=0.35], [AP, p=0.12], [TA, p=0.01]). The result for TA is consistent with the steadily increasing level of agreement among the four raters from Session 1 to Session 4 (Table F.4). However, the differences over time appear as normal fluctuations (i.e. random error) (Figure F.3), indicating that there is no systematic error in the data.

The data in Tables F.5. - F.8 also show that the effect of rater was statistically significant for all four methods of spine and canal stenosis assessment, based on three-way ANOVA with Bonferroni post-hoc analysis ([MSCC, p<0.0001], [MCC, p<0.0001], [AP, p=0.0008], [TA, p<0.0001]).


Table F.5. Analysis of Variance summary table for maximum spinal cord compression (MSCC) measurements data set

Source of variation     Df    MS              F        Sig
Between subjects        16    68.58 (BMS)     15.30    <0.0001
Within subjects
  Between raters        3     53.37 (RMS)     11.90    <0.0001
  Between times         3     5.76 (TMS)      1.28     0.2813
  Rater*time            9     3.41 (RTMS)     0.76     0.6520
  Rater*subject         48    9.05 (RSMS)     2.20     0.0004
Error                         4.48 (EMS)

Table F.6. Analysis of Variance summary table for maximum canal compromise (MCC) measurements data set

Source of variation     Df    MS               F        Sig
Between subjects        16    75.21 (BMS)      15.20    <0.0001
Within subjects
  Between raters        3     193.49 (RMS)     39.09    <0.0001
  Between times         3     5.46 (TMS)       1.10     0.3489
  Rater*time            9     5.75 (RTMS)      1.16     0.3212
  Rater*subject         48    20.76 (RSMS)     4.20     <0.0001
Error                         4.95 (EMS)

Table F.7. Analysis of Variance summary table for transverse area of spinal cord (TA) measurements data set

Source of variation     Df    MS                F        Sig
Between subjects        16    3696.59 (BMS)     41.17    <0.0001
Within subjects
  Between raters        3     5094.48 (RMS)     56.74    <0.0001
  Between times         3     343.369 (TMS)     3.82     0.0108
  Rater*time            9     157.22 (RTMS)     1.75     0.08
  Rater*subject         48    700.00 (RSMS)     7.80     <0.0001
Error                         89.78 (EMS)


Table F.8. Analysis of Variance summary table for anteroposterior diameter (AP) of spinal cord measurements data set

Source of variation     Df    MS               F        Sig
Between subjects        16    0.047 (BMS)      18.92    <0.0001
Within subjects
  Between raters        3     0.0148 (RMS)     5.85     0.0008
  Between times         3     0.005 (TMS)      1.99     0.1175
  Rater*time            9     0.0033 (RTMS)    1.33     0.2243
  Rater*subject         48    0.0068 (RSMS)    2.71     <0.0001
Error                         0.0025 (EMS)
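The F statistics in Tables F.5-F.8 appear to be each mean square divided by the error mean square (EMS). The sketch below recomputes these ratios from the values printed in Table F.5. The error degrees of freedom are not printed in the tables; they are derived here under the assumption of a fully crossed design (17 subjects x 4 raters x 4 sessions) with no missing ratings.

```python
# Mean squares as printed in Table F.5 (MSCC); ems is the error mean square.
mean_squares = {
    "Between subjects": 68.58,
    "Between raters": 53.37,
    "Between times": 5.76,
    "Rater*time": 3.41,
    "Rater*subject": 9.05,
}
ems = 4.48

# F ratio for each source: its mean square over the error mean square.
f_ratios = {source: ms / ems for source, ms in mean_squares.items()}
for source, f in f_ratios.items():
    print(f"{source:18s} F = {f:.2f}")

# Error degrees of freedom (assumption: balanced design, no missing data).
n_subjects, n_raters, n_times = 17, 4, 4
df_total = n_subjects * n_raters * n_times - 1
df_effects = 16 + 3 + 3 + 9 + 48
df_error = df_total - df_effects
print("error df =", df_error)
```

Recomputed from the rounded mean squares, the ratios agree with the printed F values to within rounding, except for Rater*subject, where the ratio is about 2.02 against a printed value of 2.20.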

DISCUSSION AND CONCLUSION

This project enhances the understanding of the challenges of MRI interpretation in the CSM population. First, the advantage of T2-weighted (T2W) imaging is that the bright CSF provides visual contrast with the spinal cord. In contrast, T1-weighted (T1W) imaging shows the bony canal and spinal cord indistinctly, as is typical in the CSM population. This is likely why MSCC on T2W imaging provides more reliable measurements than MCC on T1W imaging (Table F.3. - F. 4). However, both methods express the degree of compression relative to the patient's own normal levels. Second, the software applications used to measure the transverse area and anteroposterior diameter of the spinal cord are not yet developed enough to give more accurate estimates of spinal cord deformity. For example, the software used for the anteroposterior diameter measurements appeared to retain only one digit. We suspect that repeated rounding to one digit could cause a systematic build-up of error in the calculation of the ICC. Further research should use more rigorous mathematical procedures.

In contrast to the previously published studies (reference TA and AP), two published MR imaging techniques, the TA and AP diameter methods, were refined here through improved written instructions. Furlan et al. 2007 [18] supported the hypothesis that the interrater and intrarater reliability of MR imaging assessment techniques is enhanced by using magnified digitized images, which reduces the procedural variability of the measurements. In our study, the MR scans were consistently magnified across all cases. The lack of published intrarater and interrater reliability estimates for the measurement methods listed above limits further comparisons.

Based on the findings of our study, variation in the severity of the population, in clinicians’ experience and in individual approaches to reading MR images appears to influence the procedural variability of the measurements. Future studies should therefore report these details in the description of the study design and in the discussion. First, measurements with all four methods varied significantly with the raters’ individual interpretations, depending on CSM severity. Second, specialty training seems to influence the variability of measurements. After reviewing the circumstances of the third rater’s consistently higher ratings, it seems reasonable to speculate that the differences between raters could have been influenced to some extent by specialty training (Table F. 3). While all raters were fellowship-trained spine surgeons, Rater 2 had an orthopaedic rather than a neurosurgery residency training background. Third, some individual approaches employed by raters were not apparent when the protocol was designed but are crucial for future studies. One is that a clinician may have an internal subjective standard as to what they believe to be the anatomical midline of the spine on MR imaging. Another is that this internal standard may fluctuate with the selection of the most compressed site, partly because degenerative changes of the spine in CSM tend to involve multiple levels.

Limitations

One limitation associated with the statistical analysis of reliability is the averaging of ratings. If more than one measurement is performed, the mean of several trials is usually used to estimate reliability. Averaging can increase the reliability coefficient by minimizing the magnitude of differences between measurements. In our study, reliability is reported for the mean of all trials, yet practitioners typically administer a single trial when determining a measure.

Some limitations of our study design are potential sources of increased inter-observer variation and, therefore, reduced reliability. First, a study with a single recruitment centre might systematically under- or overestimate measurement errors because of the particular characteristics of its patients. Second, the position of patients during MR scanning might affect the results. When the position changes slightly from flexion to extension, the cross-sectional area of the dural sac diminishes. Despite careful selection of images, at least one case of abnormal positioning was recognized. Third, variation due to non-standardized features of the imaging protocol, such as different slice thicknesses, might affect the results. Not all MR images had the same slice thickness, which might have introduced some bias: the majority of scans (11/17) had a slice thickness of 2.50 mm and the rest a thickness of 3 mm. Nevertheless, the methods used for the scans in this study reflected the typical protocols available during the study period. Fourth, the raters’ areas of expertise and their training at different institutions are another potential limitation. However, we anticipate that these limitations are relatively minor and reflect real-world issues.

References:

1. Montgomery, D.M. and R.S. Brower, Cervical spondylotic myelopathy. Clinical syndrome and natural history. [Review] [54 refs]. Orthopedic Clinics of North America. 23(3):487-93, 1992 Jul., 1992.

2. Chen, C.J., et al., Intramedullary high signal intensity on T2-weighted MR images in cervical spondylotic myelopathy: prediction of prognosis with type of intensity. Radiology. 221(3):789-94, 2001 Dec., 2001.

3. Okada, Y., et al., Magnetic resonance imaging study on the results of surgery for cervical compression myelopathy. Spine. 18(14):2024-9, 1993 Oct 15., 1993.

4. Morio, Y., et al., Correlation between operative outcomes of cervical compression myelopathy and mri of the spinal cord. Spine. 26(11):1238-45, 2001 Jun 1., 2001.

5. Fukushima, T., et al., Magnetic resonance imaging study on spinal cord plasticity in patients with cervical compression myelopathy. Spine. 16(10 Suppl):S534-8, 1991 Oct., 1991.

6. Feinstein, A., Clinical biostatistics: XLI. Hard science, soft data, and the challenges of choosing clinical variables in research. Clinical Pharmacology & Therapeutics, 1977. 22(0): p. 485–498.

7. de Vet, H.C.W., Terwee, C.B., Knol, D.L., Bouter, L.M., When to use agreement versus reliability measures. Journal of Clinical Epidemiology, 2006. 59: p. 1033–1039.

8. Wright J.G., F.A.R., Improving the reliability of orthopaedic measurements. The Journal of Bone and Joint Surgery, 1992. 74B(2): p. 287-291.

9. Fehlings MG, F.J., Massicotte EM, et al., Interobserver and intraobserver reliability of maximum canal compromise and spinal cord compression for evaluation of acute traumatic cervical spinal cord injury. Spine, 2006. 31: p. 1719–1725.

10. Bednarik, J., Kadanka, Z., Dusek, L., Kerkovsky, M., Vohanka, S., Novotny, O., Urbanek, I., Kratochvilova, D., Presymptomatic spondylotic cervical myelopathy: an updated predictive model. European Spine Journal, 2008. 17: p. 421–431.

11. Kraemer HC, K.A., Statistical alternatives in assessing reliability, consistency and individual differences for quantitative measures: application to behavioral measures of neonates. Psychol Bull, 1976. 83: p. 914–921.

12. Walter S.D., E., M., Donner, A., Sample size and optimal designs for reliability studies. Statistics in Medicine, 1998. 17: p. 101-110.

13. Shrout, P.E., Fleiss, J.L., Intraclass Correlations: Uses in Assessing Rater Reliability. Psychological Bulletin, 1979. 86(2): p. 420-428.

14. Fleiss JL, C.J., The equivalence of weighted kappa and intraclass correlation coefficient as measures of reliability. Educ Psychol Meas, 1973. 2: p. 113–117.

15. Burdock EIF, H.A., A new view of interobserver agreement. Perspect Psychol 1963. 16: p. 373–384.

16. Morris, R., ed. Assessing the reliability of clinical measurement. 1997, ed. , 1st ed. Oxford: Butterworth-Heinemann. 1-18.

17. Weir, J.P., Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. Journal of Strength and Conditioning Research, 2005. 19(1): p. 231–240.

18. Furlan, J.C., Fehlings, M.G., Massicotte, E.M., Aarabi, B., Vaccaro, A.R., Bono, C.R., Madrazo, I., Villanueva, C., Grauer, J.N., Mikulis, M., A quantitative and reproducible method to assess cord compression and canal stenosis after cervical spine trauma. Spine, 2007. 32: p. 2083–2091.

19. Singh, A., et al., Clinical and radiological correlates of severity and surgery-related outcome in cervical spondylosis. Journal of Neurosurgery. 94(2 Suppl):189-98, 2001 Apr., 2001.

20. Boutin RD, S.L., Finnesey K., MR imaging of degenerative diseases in the cervical spine. Magn Reson Imaging Clin N Am, 2000. 8: p. 471-490.

21. Emery, S., Cervical spondylotic myelopathy: diagnosis and treatment. J Am Acad Orthop Surg, 2001. 9: p. 376-88.

Figure F.1: Measurements for the maximum spinal cord compression (MSCC) using T2-weighted MRI [Da,Dx,Db] and maximum canal compromise (MCC) using T1-weighted MRI [da,dx,db].


Figure F. 2: Measurements for the anteroposterior diameter (AP) and drawing of the transverse area (TA) of spinal cord using axial T2-weighted MRI.


Figure F. 3: These graphs illustrate that there was not a time dependency (learning/fatigue) of the MCC, MSCC, AP and TA measurements for spine and canal stenosis assessments.

Appendix 4 Grade of recommendation: Levels of Evidence Table (2002).

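Grade of recommendation A

Level of evidence 1a
  Therapy (whether a treatment is efficacious/effective/harmful): SR (with homogeneity*) of RCTs
  Therapy (whether a drug is superior to another drug in its same class): SR (with homogeneity**) of head-to-head RCTs
  Prognosis: SR (with homogeneity*) of inception cohort studies; CDR† validated in different populations
  Diagnosis: SR (with homogeneity*) of Level 1 diagnostic studies; CDR† with 1b studies from different clinical centres
  Differential diagnosis/symptom prevalence study: SR (with homogeneity*) of prospective cohort studies
  Economic and decision analysis: SR (with homogeneity*) of Level 1 economic studies

Level of evidence 1b
  Therapy (whether a treatment is efficacious/effective/harmful): Individual RCT (with narrow Confidence Interval‡)
  Therapy (whether a drug is superior to another drug in its same class): Within a head-to-head RCT with clinically important outcomes
  Prognosis: Individual inception cohort study with > 80% follow-up; CDR† validated in a single population
  Diagnosis: Validating** cohort study with good††† reference standards; or CDR† tested within one clinical centre
  Differential diagnosis/symptom prevalence study: Prospective cohort study with good follow-up****
  Economic and decision analysis: Analysis based on clinically sensible costs or alternatives; systematic review(s) of the evidence; and including multi-way sensitivity analyses

Level of evidence 1c
  Therapy (whether a treatment is efficacious/effective/harmful): All or none§
  Prognosis: All or none case-series
  Diagnosis: Absolute SpPins and SnNouts††
  Differential diagnosis/symptom prevalence study: All or none case-series
  Economic and decision analysis: Absolute better-value or worse-value analyses‡‡

Grade of recommendation B

Level of evidence 2a
  Therapy (whether a treatment is efficacious/effective/harmful): SR (with homogeneity*) of cohort studies
  Therapy (whether a drug is superior to another drug in its same class): Within a head-to-head RCT with validated surrogate outcomes‡‡‡
  Prognosis: SR (with homogeneity*) of either retrospective cohort studies or untreated control groups in RCTs
  Diagnosis: SR (with homogeneity*) of Level >2 diagnostic studies
  Differential diagnosis/symptom prevalence study: SR (with homogeneity*) of 2b and better studies
  Economic and decision analysis: SR (with homogeneity*) of Level >2 economic studies

Level of evidence 2b
  Therapy (whether a treatment is efficacious/effective/harmful): Individual cohort study (including low quality RCT; e.g., <80% follow-up)
  Therapy (whether a drug is superior to another drug in its same class): Across RCTs of different drugs v. placebo in similar or different patients with clinically important or validated surrogate outcomes
  Prognosis: Retrospective cohort study or follow-up of untreated control patients in an RCT; Derivation of CDR† or validated on split-sample§§§ only
  Diagnosis: Exploratory** cohort study with good††† reference standards; CDR† after derivation, or validated only on split-sample§§§ or databases
  Differential diagnosis/symptom prevalence study: Retrospective cohort study, or poor follow-up
  Economic and decision analysis: Analysis based on clinically sensible costs or alternatives; limited review(s) of the evidence, or single studies; and including multi-way sensitivity analyses

Level of evidence 2c
  Therapy (whether a treatment is efficacious/effective/harmful): "Outcomes" Research; Ecological studies
  Prognosis: "Outcomes" Research
  Differential diagnosis/symptom prevalence study: Ecological studies
  Economic and decision analysis: Audit or outcomes research

Level of evidence 3a
  Therapy (whether a treatment is efficacious/effective/harmful): SR (with homogeneity*) of case-control studies
  Therapy (whether a drug is superior to another drug in its same class): Across subgroup analyses from RCTs of different drugs v. placebo in similar or different patients, with clinically important or validated surrogate outcome

Level of evidence 3b
  Therapy (whether a treatment is efficacious/effective/harmful): Individual Case-Control Study
  Therapy (whether a drug is superior to another drug in its same class): Across RCTs of different drugs v. placebo in similar or different patients but with unvalidated surrogate outcomes
  Diagnosis: Non-consecutive study; or without consistently applied reference standards
  Differential diagnosis/symptom prevalence study: Non-consecutive cohort study, or very limited population
  Economic and decision analysis: Analysis based on limited alternatives or costs, poor quality estimates of data, but including sensitivity analyses incorporating clinically sensible variations.

Grade of recommendation C

Level of evidence 4
  Therapy (whether a treatment is efficacious/effective/harmful): Case-series (and poor quality cohort and case-control studies§§)
  Therapy (whether a drug is superior to another drug in its same class): Between non-randomised studies (observational studies and administrative database research) with clinically important outcomes
  Prognosis: Case-series (and poor quality prognostic studies***)
  Diagnosis: Case-control study, poor or non-independent reference standard
  Differential diagnosis/symptom prevalence study: Case-series or superseded reference standards
  Economic and decision analysis: Analysis with no sensitivity analysis

Grade of recommendation D

Level of evidence 5
  Therapy (whether a treatment is efficacious/effective/harmful): Expert opinion without explicit critical appraisal, or based on physiology, bench research or "first principles"
  Therapy (whether a drug is superior to another drug in its same class): Expert opinion without explicit critical appraisal, or based on physiology, bench research or "first principles"; or non-randomised studies with unvalidated surrogate outcomes
  Prognosis: Expert opinion without explicit critical appraisal, or based on physiology, bench research or "first principles"
  Diagnosis: Expert opinion without explicit critical appraisal, or based on physiology, bench research or "first principles"
  Differential diagnosis/symptom prevalence study: Expert opinion without explicit critical appraisal, or based on physiology, bench research or "first principles"
  Economic and decision analysis: Expert opinion without explicit critical appraisal, or based on economic theory or "first principles"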

Source: Sackett DL, Straus SE, Richardson WS, Rosenberg WM, Haynes RB (2000) Evidence-based medicine: how to practice and teach EBM. Toronto: Churchill Livingstone.

1. These levels were generated in a series of iterations among members of the NHS R&D Centre for Evidence-Based Medicine (Bob Phillips, Chris Ball, Dave Sackett, Brian Haynes, Sharon Straus and Finlay McAlister).

2. Users can add a minus-sign "-" to denote a level of evidence that fails to provide a conclusive answer because of:
   o EITHER a single result with a wide Confidence Interval (such that, for example, an ARR in an RCT is not statistically significant but whose confidence intervals fail to exclude clinically important benefit or harm)
   o OR a Systematic Review with troublesome (and statistically significant) heterogeneity.

3. Grades of recommendation are shown as linked directly to a level of evidence. However levels speak only of the validity of a study not its clinical applicability. Other factors need to be taken into account (such as cost, ease of implementation, importance of the disease) before determining a grade. Grades that are currently in the guides link closely to the validity of the evidence - these will change over time to reflect better concerns that we highlight in the text of the guide or related CATs.

Notes

* By homogeneity we mean a systematic review that is free of worrisome variations (heterogeneity) in the directions and degrees of results between individual studies. Not all systematic reviews with statistically significant heterogeneity need be worrisome, and not all worrisome heterogeneity need be statistically significant. As noted above, studies displaying worrisome heterogeneity should be tagged with a "-" at the end of their designated level.

† Clinical Decision Rule. (These are algorithms or scoring systems which lead to a prognostic estimation or a diagnostic category)

‡ See comment #2 for advice on how to understand, rate and use trials or other studies with wide confidence intervals.

§ Met when all patients died before the Rx became available, but some now survive on it; or when some patients died before the Rx became available, but none now die on it.

§§ By poor quality cohort study we mean one that failed to clearly define comparison groups and/or failed to measure exposures and outcomes in the same (preferably blinded), objective way in both exposed and non-exposed individuals and/or failed to identify or appropriately control known confounders and/or failed to carry out a sufficiently long and complete follow-up of patients. By poor quality case-control study we mean one that failed to clearly define comparison groups and/or failed to measure exposures and outcomes in the same (preferably blinded), objective way in both cases and controls and/or failed to identify or appropriately control known confounders.

§§§ Split-sample validation is achieved by collecting all the information in a single tranche, then artificially dividing this into "derivation" and "validation" samples.

†† An "Absolute SpPin" is a diagnostic finding whose Specificity is so high that a Positive result rules-in the diagnosis. An "Absolute SnNout" is a diagnostic finding whose Sensitivity is so high that a Negative result rules-out the diagnosis.

‡‡ Better-value treatments are clearly as good but cheaper, or better at the same or reduced cost. Worse-value treatments are as good and more expensive, or worse and equally or more expensive.

††† Good reference standards are independent of the test, and applied blindly or objectively to all patients. Poor reference standards are haphazardly applied, but still independent of the test. Use of a non-independent reference standard (where the 'test' is included in the 'reference', or where the 'testing' affects the 'reference') implies a level 4 study.

** Validating studies test the quality of a specific diagnostic test, based on prior evidence. An exploratory study collects information and trawls the data (e.g. using a regression analysis) to find which factors are 'significant'.

*** By poor quality prognostic cohort study we mean one in which sampling was biased in favour of patients who already had the target outcome, or the measurement of outcomes was accomplished in <80% of study patients, or outcomes were determined in an unblinded, non-objective way, or there was no correction for confounding factors.

**** Good follow-up in a differential diagnosis study is >80%, with adequate time for alternative diagnoses to emerge (e.g. 1-6 months acute, 1-5 years chronic).

‡‡‡ Surrogate outcomes are considered validated only when the relationship between the surrogate outcome and the clinically important outcomes has been established in long-term RCTs.
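As an illustration of the split-sample validation described in note §§§, the following is a minimal sketch in Python. It assumes that all records have already been collected in a single tranche and that a random, reproducible division is acceptable. The function name `split_sample`, its parameters, and the patient identifiers are hypothetical and are not taken from the thesis or from the levels-of-evidence guide.

```python
import random


def split_sample(records, validation_fraction=0.5, seed=0):
    """Divide one collected dataset into derivation and validation samples.

    Hypothetical helper for illustration only: the data are gathered once,
    then artificially partitioned, as in split-sample validation.
    """
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be strictly between 0 and 1")
    shuffled = list(records)                 # copy so the caller's data are untouched
    random.Random(seed).shuffle(shuffled)    # seeded shuffle for a reproducible split
    n_validation = round(len(shuffled) * validation_fraction)
    validation = shuffled[:n_validation]     # held back to test the derived rule
    derivation = shuffled[n_validation:]     # used to derive the rule
    return derivation, validation


# Hypothetical patient identifiers, used only to show the call.
patient_ids = list(range(1, 21))
derivation, validation = split_sample(patient_ids, validation_fraction=0.5, seed=42)
```

Because both samples come from the same tranche, a rule that performs well on the validation sample has still only been tested on patients drawn from the original population, which is why the guide distinguishes this approach from validation in a separate population.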
