validation of an algorithm for identifying ms cases in ... · 3/5/2019  · with icd-9-cm or the...

14
ARTICLE OPEN ACCESS Validation of an algorithm for identifying MS cases in administrative health claims datasets William J. Culpepper, PhD, MA, Ruth Ann Marrie, MD, PhD, Annette Langer-Gould, MD, PhD, Mitchell T. Wallin, MD, MPH, Jonathan D. Campbell, PhD, Lorene M. Nelson, PhD, Wendy E. Kaye, PhD, Laurie Wagner, MPH, Helen Tremlett, PhD, Lie H. Chen, DrPH, Stella Leung, MSc, Charity Evans, PhD, Shenzhen Yao, MD, MSc, and Nicholas G. LaRocca, PhD, on behalf of the United States Multiple Sclerosis Prevalence Workgroup (MSPWG) Neurology ® 2019;92:e1016-e1028. doi:10.1212/WNL.0000000000007043 Correspondence Dr. Culpepper [email protected] Abstract Objective To develop a valid algorithm for identifying multiple sclerosis (MS) cases in administrative health claims (AHC) datasets. Methods We used 4 AHC datasets from the Veterans Administration (VA), Kaiser Permanente Southern California (KPSC), Manitoba (Canada), and Saskatchewan (Canada). In the VA, KPSC, and Manitoba, we tested the performance of candidate algorithms based on inpatient, outpatient, and disease-modifying therapy (DMT) claims compared to medical records review using sensitivity, specicity, positive and negative predictive values, and interrater reliability (Youden J statistic) both overall and stratied by sex and age. In Saskatchewan, we tested the algorithms in a cohort randomly selected from the general population. Results The preferred algorithm required 3 MS-related claims from any combination of inpatient, outpatient, or DMT claims within a 1-year time period; a 2-year time period provided little gain in performance. Algorithms including DMT claims performed better than those that did not. Sensitivity (86.6%96.0%), specicity (66.7%99.0%), positive predictive value (95.4%99.0%), and interrater reliability (Youden J = 0.600.92) were generally stable across datasets and across strata. Some variation in performance in the stratied analyses was observed but largely reected changes in the composition of the strata. In Saskatchewan, the preferred algorithm had a sensitivity of 96%, specicity of 99%, positive predictive value of 99%, and negative predictive value of 96%. Conclusions The performance of each algorithm was remarkably consistent across datasets. The preferred algorithm required 3 MS-related claims from any combination of inpatient, outpatient, or DMT use within 1 year. We recommend this algorithm as the standard AHC case denition for MS. RELATED ARTICLES Article The prevalence of MS in the United States: A population-based estimate using health claims data Page e1029 Views and Reviews A new way to estimate neurologic disease prevalence in the United States: Illustrated with MS Page 469 From the Department of Veterans Affairs Post Deployment Health Services (W.J.C., M.T.W.), Multiple Sclerosis Center of Excellence; University of Maryland (W.J.C.), Baltimore; Departments of Internal Medicine and Community Health Sciences (R.A.M., S.L.), Max Rady College of Medicine, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, Canada; Neurology Department (A.L.-G., L.H.C.), Kaiser Permanente Southern California, Los Angeles; Georgetown University School of Medicine (M.T.W.), Washington, DC; University of Colorado (J.C.), Denver; Stanford University School of Medicine (L.M.N.), CA; McKing Consulting Corp (W.E.K., L.W.), Atlanta, GA; Faculty of Medicine (Neurology) and Centre for Brain Health (H.T.), University of British Columbia, Vancouver; College of Pharmacy and Nutrition (C.E.), University of Saskatchewan; Health Quality Council (Saskatch- ewan) (S.Y.), Saskatoon, Canada; and National Multiple Sclerosis Society (N.G.L.), New York, NY. Go to Neurology.org/N for full disclosures. Funding information and disclosures deemed relevant by the authors, if any, are provided at the end of the article. United States Multiple Sclerosis Prevalence Workgroup (MSPWG) coinvestigators are listed in appendix 2 at the end of the article. The Article Processing Charge was funded by the National Multiple Sclerosis Society. This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND), which permits downloading and sharing the work provided it is properly cited. The work cannot be changed in any way or used commercially without permission from the journal. e1016 Copyright © 2019 The Author(s). Published by Wolters Kluwer Health, Inc. on behalf of the American Academy of Neurology.

Upload: others

Post on 11-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Validation of an algorithm for identifying MS cases in ... · 3/5/2019  · with ICD-9-CM or the Canadian version of ICD-10 codes (ICD-10-CA), depending on the year. ... the MS Clinic

ARTICLE OPEN ACCESS

Validation of an algorithm for identifying MScases in administrative health claims datasetsWilliam J Culpepper PhD MA Ruth Ann Marrie MD PhD Annette Langer-Gould MD PhD

Mitchell T Wallin MD MPH Jonathan D Campbell PhD Lorene M Nelson PhD Wendy E Kaye PhD

Laurie Wagner MPH Helen Tremlett PhD Lie H Chen DrPH Stella Leung MSc Charity Evans PhD

Shenzhen Yao MD MSc and Nicholas G LaRocca PhD on behalf of the United States Multiple Sclerosis

Prevalence Workgroup (MSPWG)

Neurologyreg 201992e1016-e1028 doi101212WNL0000000000007043

Correspondence

Dr Culpepper

WilliamCulpeppervagov

AbstractObjectiveTo develop a valid algorithm for identifying multiple sclerosis (MS) cases in administrativehealth claims (AHC) datasets

MethodsWe used 4 AHC datasets from the Veterans Administration (VA) Kaiser Permanente SouthernCalifornia (KPSC) Manitoba (Canada) and Saskatchewan (Canada) In the VA KPSC andManitoba we tested the performance of candidate algorithms based on inpatient outpatientand disease-modifying therapy (DMT) claims compared to medical records review usingsensitivity specificity positive and negative predictive values and interrater reliability (YoudenJ statistic) both overall and stratified by sex and age In Saskatchewan we tested the algorithmsin a cohort randomly selected from the general population

ResultsThe preferred algorithm required ge3 MS-related claims from any combination of inpatientoutpatient or DMT claims within a 1-year time period a 2-year time period provided littlegain in performance Algorithms including DMT claims performed better than those thatdid not Sensitivity (866ndash960) specificity (667ndash990) positive predictive value(954ndash990) and interrater reliability (Youden J = 060ndash092) were generally stable acrossdatasets and across strata Some variation in performance in the stratified analyses was observedbut largely reflected changes in the composition of the strata In Saskatchewan the preferredalgorithm had a sensitivity of 96 specificity of 99 positive predictive value of 99 andnegative predictive value of 96

ConclusionsThe performance of each algorithm was remarkably consistent across datasets The preferredalgorithm required ge3 MS-related claims from any combination of inpatient outpatient orDMTusewithin 1 yearWe recommend this algorithm as the standard AHC case definition forMS

RELATED ARTICLES

ArticleThe prevalence of MS inthe United States Apopulation-based estimateusing health claims data

Page e1029

Views and ReviewsA new way to estimateneurologic diseaseprevalence in the UnitedStates Illustrated with MS

Page 469

From the Department of Veterans Affairs Post Deployment Health Services (WJC MTW) Multiple Sclerosis Center of Excellence University of Maryland (WJC) BaltimoreDepartments of Internal Medicine and Community Health Sciences (RAM SL) Max Rady College of Medicine Rady Faculty of Health Sciences University of Manitoba WinnipegCanada Neurology Department (AL-G LHC) Kaiser Permanente Southern California Los Angeles Georgetown University School of Medicine (MTW) Washington DCUniversity of Colorado (JC) Denver Stanford University School of Medicine (LMN) CA McKing Consulting Corp (WEK LW) Atlanta GA Faculty of Medicine (Neurology) andCentre for Brain Health (HT) University of British Columbia Vancouver College of Pharmacy and Nutrition (CE) University of Saskatchewan Health Quality Council (Saskatch-ewan) (SY) Saskatoon Canada and National Multiple Sclerosis Society (NGL) New York NY

Go to NeurologyorgN for full disclosures Funding information and disclosures deemed relevant by the authors if any are provided at the end of the article

United States Multiple Sclerosis Prevalence Workgroup (MSPWG) coinvestigators are listed in appendix 2 at the end of the article

The Article Processing Charge was funded by the National Multiple Sclerosis Society

This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License 40 (CC BY-NC-ND) which permits downloadingand sharing the work provided it is properly cited The work cannot be changed in any way or used commercially without permission from the journal

e1016 Copyright copy 2019 The Author(s) Published by Wolters Kluwer Health Inc on behalf of the American Academy of Neurology

The United States has historically been considered a regionwith a high prevalence of multiple sclerosis (MS) Howeverthe prevalence of MS remains poorly described1 and rigorousnational estimates are lacking Prevalence estimates are criticalfor understanding the national burden of MS planning forhealth service needs and supporting advocacy efforts

Multiple methods have been used to estimate the prevalence ofMS worldwide including surveys review of medical records orlists from health care institutions and administrative healthclaims (AHC) data1ndash3 AHC data are attractive because of theirrelatively low cost ease of use relative to other methods andpotential for repeated use over time However AHC data arenot collected for research purposes and their validity must beassessed before use4ndash6 Validation studies are possible for someAHC datasets in the United States such as the Department ofVeterans Affairs (VA) but not in others such as Medicare be-cause they cannot be linked back to clinical records

Therefore we aimed to test several case definitions (algo-rithms) for identifying persons with MS using multiple AHCdatasets We reasoned that if we could identify a best-casedefinition that performed well and comparably across diverseAHC datasets this would support their use in other healthclaims datasets in which validation studies were not feasibleFurther details and rationale for this approach are provided inone of our companion articles (Nelson et al7 ldquoANewWay toEstimate Neurologic Disease Prevalence in the United StatesMultiple Sclerosisrdquo) Second we aimed to determine howmuch the duration of the observation period would affect casedetection (ie underestimate prevalence) for appropriateadjustment in our companion prevalence article (Wallinet al8 ldquoThe Prevalence of MS in the United States APopulation-Based Estimate Using Health Care Claims Datardquo)

MethodsThis was a retrospective analysis using linked administrativeand clinical data that was conducted in parallel at 3 primarysites (dataset characteristics in table e-1 available from Dryaddoiorg105061dryad4c7s325) following a common pro-tocol The findings from the initial analysis were then tested ina fourth independent dataset Sources of the AHC datasets aredescribed below Clinical data were drawn from medicalrecords Neurologist-confirmed diagnoses of MS based on the2010 McDonald criteria9 served as the reference standard for

this study At the 3 primary sites the source population forthis study was limited to persons ge18 years of age who werealive during the study period (2000ndash2014) and had ge1 healthcare encounter associated with a diagnostic code for MS(Manitoba used any demyelinating disease to define its sourcepopulation)

Standard protocol approvals registrationsand patient consentsEach site obtained local Institutional Review Board or ethicsapproval University of Maryland Baltimore Baltimore VAResearch and Development Committee Kaiser PermanenteSouthern California (KPSC) University of Manitoba HealthResearch Ethics Board and University of Saskatchewan Bio-medical Research Ethics Board In Manitoba approval foradministrative data access was obtained from the Health In-formation Privacy Committee In Saskatchewan approval fordata access was obtained from the Saskatchewan Ministry ofHealth and the Saskatchewan Health Quality Council

Data sources

Department of Veterans AffairsThe VA provides comprehensive health care to asymp11 millionmilitary veterans at asymp300 hospitals and 1500 outpatientclinics and Veterans Centers distributed throughout theUnited States Puerto Rico andGuam Since 1997 the VA hasprospectively collected and maintained extensive data on allfunded health care services including hospitalizations phy-sician visits and prescriptions dispensed Diagnoses wererecorded using ICD-9 codes

The initial validation cohort was drawn from the VA case-finding algorithm study in 20065 which included all patientsin the mid-Atlantic region who were ge18 years of age with ge1encounter with a diagnosis code for MS (ICD-9 code 340)We added data from the Gulf War Era MS Cohort10 and datafrom screening of patients for a planned longitudinal MSstudy We followed the same standardized medical recordreview procedures (trained abstractors under the directionof MW) to verify the diagnosis of MS9 for all patients Intotal these studies yielded a cohort of 3452 patients (table 1)from 1999 through 2007

Kaiser Permanente Southern CaliforniaKPSC is a prepaid health maintenance organization witha membership during the study period of gt35 million persons

GlossaryAHC = administrative health claims CI = confidence interval DMT = disease-modifying therapy ICD-9 = InternationalClassification of Disease 9th revision ICD-9-CM = International Classification of Disease 9th revision clinical modification ICD-10-CA = International Classification of Disease 10th revision Canadian version ICD-10-CM = International Classification of Disease10th revision clinical modification IMS = Intercontinental Marketing Services IP = inpatient KPSC = Kaiser PermanenteSouthern California MS = multiple sclerosis NPV = negative predictive value OP = outpatient PPV = positive predictivevalue VA = Department of Veterans Affairs

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1017

and provides health care coverage toasymp20 of the population inthe region it serves The KPSC membership is largely repre-sentative of the general population in Southern California withrespect to ethnicity age and sex but underrepresents the lowestand highest ends of the socioeconomic spectrum11 Physicianvisits hospitalizations andmedications are captured in datasetsthat are linkable by a medical record number

The validation cohort consisted of all persons ge18 years of agewith ge1 inpatient (IP) or outpatient (OP) encounter di-agnosis code for MS (ICD-9 code 340) who were KPSChealth plan members between 2008 and 2010 had at least 6months of continuous membership during the study periodand had complete electronic health records (n = 4851) Werandomly selected 2935 (table 1) of these potential cases forin-depth medical records review by an MS specialist (AL-G)including IP and OP records MRI scans and diagnostic testresults to confirm a diagnosis of MS9

ManitobaIn the Canadian province of Manitoba Manitoba HealthSeniors and Active Living is responsible for the delivery ofpublicly funded health care to nearly all residents Severaldatasets are maintained that capture demographic in-formation (sex dates of birth health care coverage anddeath) hospitalizations physician services and prescriptiondrug dispensations Diagnoses for physician services are

recorded with the clinical modification of ICD-9 (ICD-9-CM) codes while diagnoses for hospitalizations are codedwith ICD-9-CM or the Canadian version of ICD-10 codes(ICD-10-CA) depending on the year

The Manitoba validation cohort started with the 400 patientsge18 years of age who were used to develop administrative casedefinitions of MS in 20076 This cohort was augmented byreview of the medical records of all persons participating inthe MS Clinic Registry (trained abstractors under the super-vision of RAM) at the Winnipeg MS Clinic who had con-sented to linkage of their medical records to administrativedata (89 of those approached) as of June 2015 and met theinclusion criteria as described above This expanded the val-idation sample to 1654 patients (table 1)

SaskatchewanThis constituted the fourth independent site In the Canadianprovince of Saskatchewan health care is publicly funded as inManitoba Datasets similar to those in Manitoba capture de-mographic information (sex dates of birth health care cov-erage and death) hospitalizations physician services andprescription drug dispensations Diagnoses for physicianservices are recorded with ICD-9-CM codes while diagnosesfor hospitalizations are coded with ICD-9 or ICD-10-CAcodes depending on the year

As described elsewhere12 the validation cohort included 200randomly selected cases from the Saskatoon MS Clinic withdiagnoses confirmed by an MS specialist and 200 controls(table 1) from the Inpatient Rehabilitation Center databaseThis database specifically captures diagnoses of chronic dis-eases including MS13

MS case definition algorithmsWe developed candidate algorithms using ICD-9ICD-10codes for MS (340G35) and lists of MS-specific disease-modifying therapies (DMTs) To develop these definitionswe reviewed the literature on validated administrative casedefinitions for MS5612ndash15 From this review we consideredthe following (1) 1 encounter with an ICD-9ICD-10 codefor MS is too nonspecific because these may represent rule-out codes (2) increasing the number of health careencounters required generally increases specificity at the ex-pense of sensitivity and (3) prescription claims for MS-specific DMTs may be helpful to improve sensitivity speci-ficity or both when available Therefore the candidate algo-rithms (table 2) varied the number of IP and OP encounterswith MS listed as one of the diagnoses for that encounter thenumber of prescription claims for a DMT required to classifya person as an MS case and the time period over which thecase definition should be met We included candidate algo-rithms that did not use prescription claims because these arenot available in all AHC datasets Because we anticipatedapplying these definitions in AHC datasets in which enroll-ment in a health insurance plan might be limited to short timeperiods we considered only 1- or 2-year time periods over

Table 1 Demographic characteristics of theMS validationcohorts by data source

Primary Validation Cohorts

VA KPSC MB SK

Total cases n 3452 2935 1654 400

Sex n ()

Male 2451 (740) 725 (242) 415 (251) 197 (493)

Female 1001 (260) 2253 (758) 1239 (749) 203 (507)

Age groupn ()

18ndash24 y 118 (34) 105 (35) 123 (74) a

25ndash34 y 928 (269) 400 (135) 434 (262) 25 (625)

35ndash44 y 1093 (316) 662 (225) 554 (335) 46 (115)

45ndash54 y 833 (241) 769 (262) 381 (230) 76 (190)

55ndash64 y 302 (87) 669 (228) 118 (71) 83 (208)

65ndash74 y 131 (38) 243 (85) a 80 (200)

ge75 y 47 (14) 87 (30) a a

Abbreviations KPSC = Kaiser Permanente Southern California MB = Man-itoba Canada MS = multiple sclerosis SK = Saskatchewan Canada VA =Veterans AdministrationSK represents a general population cohorta Cells suppressed due to small numbers in accordancewith data access andprivacy requirements

e1018 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

which to define a case Each component of these case defi-nitions is described further below

IP encounterWe defined an IP encounter as an IP admission (overnightstay) for which MS was recorded as one of the diagnoses Toaccount for transfers between institutions and to avoid doublecounting we collapsed multiple overlapping hospital admis-sion and discharge records into 1 IP encounter In addition anIP encounter with an admission date that occurred within 24hours of a prior IP encounter was counted as a single IPencounter56

PhysicianOP encountersWe defined OP encounters as a visit to an OP clinic with anICD code for MS Multiple OP visits by 1 patient within thesame day were treated as 1 OP encounter In addition OPencounters occurring within an MS IP (admission to dis-charge date) were not counted as unique OP encounters

Disease-modifying therapiesWe defined prescription claims as dispensations for any of theMS-specific DMTs approved before or during the study pe-riod including interferon beta-1a-SC interferon beta-1a-IMinterferon beta-1b-SC glatiramer acetate fingolimod andnatalizumab as identified with National Drug Codes and DrugIdentification numbers (tables e-1 and e-2 available fromDryad doiorg105061dryad4c7s325) To avoid mis-classification claims for natalizumab were not included if theindividual also had an ICD code for inflammatory boweldisease a disorder for which this medication is approvedfor use

A caveat in defining IP and OP encounters was that the VAimposes a hierarchy to the coding such that the primary di-agnosis for a given encounter should be listed first followedin descending order of importance by other diagnoses af-fecting that episode of care Not all AHC datasets impose sucha hierarchy (eg KPSC) Our assessment of the effect of thiscoding hierarchy (see Appendix 2 available from Dryad foradditional methods and results doiorg105061dryad

4c7s325) resulted in using MS as a primary diagnosis in theVA data and MS as any diagnosis in the KPSC and Manitobadatasets when counting MS encounters

AnalysisWe applied the candidate algorithms in each of the 3 primaryvalidation cohorts We compared the performance of thecandidate algorithms with the reference standard diagnoses ofMS frommedical records using sensitivity specificity positivepredictive value (PPV) and negative predictive value (NPV)In addition interrater agreement was compared between thealgorithms derived from AHC and medical records data usingthe crude accuracy score and Youden16 J statistic We chosethe J statistic over the κ statistic1718 because it is designed foruse with dichotomous data and equally weights false-positivesand false-negatives We interpreted the J statistic like κ whenneither the algorithm nor the medical records were consid-ered the reference standard as slight (0ndash020) fair(021ndash040) moderate (041ndash060) substantial (061ndash080)and excellent (081ndash10)19

To determine whether the algorithms performed consistentlyacross sociodemographic subgroups we stratified the refer-ence population by (1) sex and (2) age group (18ndash24 25ndash3435ndash44 45ndash54 55ndash64 65ndash74 ge75 years) and repeated ouranalyses for the 4 algorithms that performed best across the 3AHC datasets The ultimate choice of an algorithm was basedon performance and pragmatic considerations including easeof application and interpretation

Because our 3 primary validation datasets were drawn frompatients with at least 1 encounter for MS (or demyelinatingdisease in Manitoba) not the general population we expectedthat the specificity and predictive values of the case definitionswould be artificially depressed compared to the performance inthe general population Therefore we tested our case definitionalgorithms in a fourth independent dataset in Saskatchewan14

which included individuals with andwithoutMS as noted above

Prevalence with limited durationascertainment periodsWe intended to apply our optimal algorithm to multiple AHCdatasets available in the United States20 to determine preva-lence in 2010 based on a 3-year ascertainment period How-ever limited periods of observation may substantially reducecase detection and underestimate prevalence as is wellrecognized in the literature21ndash23 Capocaccia et al22 describehow they dealt with adjusting data from cancer registries thathad been active for short durations when creating the Europe-wide project to estimate the prevalence of the most importantcancers (EUROPREVAL) They computed what they calleda completeness index for multiple observation periods usingdata from the Connecticut Cancer Registry because it hasbeen in operation for gt50 years In the 4 registries that hadonly 5 observation years the completeness index ranged from042 to 045 The completeness index improved systematicallyas the observation period increased They go on to describe

Table 2 Description of candidate algorithms for MSa

Case definition name Number and type of claims

MS_A ge2 IP or ge3 OP

MS_B ge2 IP or ge4 OP

MS_C ge2 IP or ge5 OP

MS_D ge2 IP or ge3 OP or ge1 DMT

MS_E (IP + OP + DMT) ge 3

Abbreviations DMT = disease modifying therapy IP = inpatient admissionMS = multiple sclerosis OP = outpatient visita The performance of each algorithmwas evaluated on the basis of both a 1-year and a 2-year time period

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1019

their model-based methods to estimate the complete preva-lence based on a limited period using age- and sex-specificincidence and survival rates for cancer22 This information islacking in the US MS population and cannot be accuratelyestimated from data of only 3 yearsrsquo duration23 Therefore toassess the effect of a short ascertainment period we used theapproaches used by Ng et al21 and Capocaccia et al22 andestimated the prevalence of MS using 3 years (2008ndash2010)and 10 years (1999ndash2010) of data in the VA and Manitobadatasets and compared the difference in these estimates in2010 Prevalence in each year was defined as all persons ge18years meeting our preferred algorithm definition of MS ([IP +OP + DMT] ge 3) among those alive during the calendar yearThe 3-year and 10-year prevalence estimates in 2010 werecumulative estimates based on all those who had met thedefinition at any point in the ascertainment period and whowere alive in 2010

To validate the above analyses we repeated these proceduresusing the PharMetrics Plus Health Plan Claims Dataset fromIntercontinental Marketing Services (IMS) which was notused to develop the algorithm Here we compared the esti-mated prevalence derived from 3 years (2013ndash2015) and 9years (the longest period available) ending in 2015 The IMSdataset captures health care claims from asymp120 health plansacross the United States accounting for ge42 million coveredlives as of 2011 mirrors the demographics of the US Censuspopulation and has been used for national and regionalbenchmarking of health care use and cost2024 The datacontained in this dataset are comparable to the data in vali-dation datasets that included ICD-9-CM diagnosis and IPprocedure codes provider codes Current Procedural Ter-minology (OP procedures) codes National Drug Code placeand date of service enrollment date(s) sex year of birth andUS Census region We report 95 confidence intervals (CIs)25 for the ratio of the prevalence estimates of longer vs shorterintervals Although these prevalence estimates are highlycorrelated which would narrow these intervals substantiallywe report uncorrected 95 CIs to be conservative

Statistical analyses were conducted with either SAS version94 (SAS Institute Inc Cary NC) or SPSS version 22 (IBMInc Armonk NY)

Data availabilityThe datasets used in this study constitute secondary analyses ofdata from prior studies Therefore there are no data-sharingagreements in place for these data The exception concerns theIMS data which are a limited-use dataset provided gratis toUniversity of Maryland Baltimore faculty (WJC) and datasharing is not allowed under that agreement

ResultsThe characteristics of the 3 primary validation cohorts arerepresentative of the general MS population (table 1)56101326

As expected there was a higher proportion of males in the VAdata compared with KPSC and Manitoba but the age dis-tributions were similar across cohorts

Initial validation of algorithmGenerally when an algorithm was applied using 1 year of datathe sensitivity was lower than when applied using 2 years ofdata while specificity was higher when applied using 1 year ofdata (table 3) The more stringent the case definition that isthe greater the number of claims required was the lower thesensitivity was and the higher the specificity was The additionof prescription claims for DMT to an algorithm increasedsensitivity without a loss of specificity leading to better per-formance overall

Algorithm MS_E ([IP + OP + DMT] ge 3) had the highestsensitivity for both time periods In contrast the algorithm withthe highest specificity wasMS_C (ge2 IP orge5OP) in the 1-yearand 2-year time periods The PPVs were high for all algorithmswhereas the NPVs were moderately low suggesting that thealgorithms were better at ruling in cases of MS than ruling outnon-MS cases (in a population with at least 1 claim for MS)

The proportion of misclassified cases (false-positives and false-negatives) was relatively low as reflected in the moderatelyhigh to high accuracy scores algorithm MS_E had the highestaccuracy over both time periods The Youden J statistic mir-rored the crude accuracy scores revealing a substantial level ofagreement between each algorithm and the reference standardAlgorithm MS_E had the highest or second highest J statisticfor the 1-year time period with scores of ge060

We performed the stratified analyses on only the algorithmsthat included DMTs because these were the best-performingdefinitions (MS_D and MS_E in tables 3) Because thestratified results were nearly identical across these 4 casedefinitions we present only the results for MS_E-1 ([IP + OP+ DMT] ge 3) for the 1-year time period (figures 1 and 2)

The test characteristics remained stable by sex (men figure 1Bwomen figure 1C) including sensitivity specificity PPV andaccuracy In the VA the NPV and J statistic were lower amongwomen compared to men but were stable across sexes in theKPSC and Manitoba datasets Similarly sensitivity PPV andaccuracy were relatively consistent across age groups andsomewhat stable for specificity TheNPV and J statistic showedthe greatest variation however despite this variability theMS_E-1 algorithm performed well across datasets and age groupsThus given its consistent and superior performance algorithmMS_E-1 was chosen as the preferred algorithm

Validation in the general populationThe characteristics of the Saskatchewan validation cohort werecomparable to those of the other cohorts When we applied thealgorithms to the Saskatchewan general population specific-ities and PPVs were all ge096 and the NPVs were all ge091 Ofthe algorithms using 1 year of data algorithm MS_E-1

e1020 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

performed the best along with algorithm MS_D-1 For pre-ferred algorithmMS_E the sensitivity was 096 specificity was099 PPVwas 099 NPVwas 096 and Youden J was 096 Oneof the 2-year algorithms MS_D-2 also had a specificity of 099but a marginally higher sensitivity of 097

Corrections for limited-durationascertainment periodsFigure 3 compares estimated prevalence based on a 3-year(2008ndash2010) ascertainment period to that from 10 years(2000ndash2010) of data as of 2010 in the VA and Manitoba

Table 3 Summary of test statistics for each algorithm based on years of data for each data source

MS algorithm Data source Sensitivity Specificity PPV NPV Accuracy Youden J

1-y epoch (denoted by 21)

MS_A-1 ge2 IP or ge3 OP VA 861 825 978 396 857 069

KPSC 789 749 955 342 783 054

MB 897 672 953 465 870 057

MS_B-1 ge2 IP or ge4 OP VA 815 895 986 348 823 071

KPSC 669 823 964 269 690 050

MB 790 779 964 332 789 057

MS_C-1 ge2 IP or ge5 OP VA 761 904 986 294 775 067

KPSC 553 874 967 224 595 043

MB 685 856 973 267 705 054

MS_D-1 ge2 IP or ge3 OP or ge1 DMT VA 866 825 978 404 862 069

KPSC 874 730 957 461 856 060

MB 932 667 954 568 901 060

MS_E-1 (IP + OP + DMT) ge 3 VA 872 822 978 414 867 069

KPSC 855 762 961 436 843 062

MB 934 661 954 568 901 060

2-y epoch (denoted by 22)

MS_A-2 ge2 IP or ge3 OP VA 883 787 974 426 873 067

KPSC 834 707 951 385 818 051

MB 932 610 947 546 894 054

MS_B-2 ge2 IP or ge4 OP VA 860 848 981 400 859 071

KPSC 747 798 962 318 754 055

MB 876 697 956 429 855 057

MS_C-2 ge2 IP or ge5 OP VA 835 880 964 371 840 072

KPSC 662 851 968 270 686 051

MB 817 805 969 370 816 062

MS_D-2 ge2 IP or ge3 OP or ge1 DMT VA 885 787 974 430 875 067

KPSC 896 688 951 493 869 058

MB 950 610 948 620 910 056

MS_ E-2 (IP + OP + DMT) ge 3 VA 891 784 974 442 880 068

KPSC 881 709 954 466 859 059

MB 951 600 947 619 909 055

Abbreviations DMT = disease modifying therapy IP = inpatient admission KPSC =Kaiser Permanente Southern California MB = Manitoba Canada MS =multiple sclerosis NPV = negative predictive value OP = outpatient visit PPV = positive predictive value VA = Veterans Administration

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1021

datasets The underestimation of a 3-year ascertainmentperiod ranged from 37 (95 CI 13ndash66) in the VAdataset to 47 (95 CI 23ndash76) in the Manitoba datasetIn the IMS (validation) dataset the underestimation ofa 3-year (2013ndash2015) ascertainment period compared to

a 9-year period (2007ndash2015) as of 2015 was 39 (95 CI13ndash71) The estimated prevalence of MS rose steeply upto about 2010 in all 3 datasets (figures 3 and 4) and thenincreased an average of 23y thereafter (21 in the VAfigure 3A 25 in the IMS figure 4)

Figure 1 Performance of algorithm MS_E-1 [(IP + OP + DMT) ge 3] stratified by sex across the VA KPSC and MB datasets

(A) Men and women (B) men only and (C) women onlyData arepresented as a proportion that can range between 0 and 1 DMT =disease-modifying therapy IP = inpatient KPSC = Kaiser Perma-nente Southern California MB =ManitobaMS =multiple sclerosisNPV = negative predictive value OP = outpatient PPV = positivepredictive value VA = Department of Veterans Affairs

e1022 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

DiscussionWe undertook this study to identify an algorithm that wouldaccurately and consistently identify cases of MS across dis-parate AHC datasets representing different types of health

care systems and geographic regions The consistency of thefindings across these datasets supports the generalizability ofthe algorithms to other AHC datasets in the United Stateswhen such validation studies are not feasible The preferredalgorithm (MS_E-1) required ge3 encounters (IP +OP+DMT)

Figure 2 Performance of algorithm MS_E-1 [(IP + OP + DMT) ge 3] stratified by age group across the VA KPSC and MBdatasets

(A) Sensitivity (B) specificity (C) positive predictive value (D) negative predictive value (E) accuracy and (F) Youden J statisticData fromManitoba (MB) for the55- to 64- and 64- to 74-year age groups are suppressed due to small cell sizes Data are presented as a proportion that can range between 0 and 1 DMT =disease-modifying therapy IP = inpatient KPSC = Kaiser Permanente Southern California MS = multiple sclerosis NPV = negative predictive value OP =outpatient PPV = positive predictive value VA = Department of Veterans Affairs

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1023

for MS in a 1-year period Algorithms that did not includeDMT performed less well than those that did However weincluded the non-DMT algorithms to assess their ability toaccurately identify cases of MS in studies in which pharmacy

claims data might not be available MS_A-1 performed nearlyas well as the preferred algorithm (MS_E-1) and could beused when DMT is not available Thus by testing a variety ofalgorithms our study provides value and the findings can be

Figure 3 Comparison of prevalence based on a 3- vs 10-year ascertainment period as of 2010 in the (A) VA and (B) MBdatasets

CI = confidence interval MB = Manitoba VA =Department of Veterans Affairs

Figure 4 Comparison of prevalence based on a 3- vs 9-year ascertainment period as of 2015 in the IMS (validation) dataset

CI = confidence interval IMS = IntercontinentalMarketing Services

e1024 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

applied in future research into the prevalence of MS acrossjurisdictions

Algorithm MS E-1 ([IP + OP + DMT] ge 3) was preferredbecause of its consistent high performance and ease ofapplication We hereafter refer to it as the MS algorithmThe 1-year algorithm readily allows annually updating ofprevalence estimates in datasets with limited years of dataSensitivity and PPV were high consistent with previouslypublished definitions developed with similar methods inthe United States (VA) and Canada561213 Specificity inour primary validation cohorts was modest reflecting thatour study cohorts were selected on the basis of having atleast 1 diagnosis for MS Tonelli et al27 evaluated algo-rithms to measure 40 comorbid conditions in AHC datafrom Calgary Alberta Canada They defined an algorithmas valid if both sensitivity and PPV were ge070 compared toa reference cohort They did not consider specificity orNPV because they are generally ge090 in the assessment ofchronic disease in the general population Consistent withthese observations and recommendations when the MSalgorithm was applied to the general population in Sas-katchewan sensitivity (096) specificity (099) and PPV(099) markedly improved These findings support ourpreferred algorithm as a valid method for identifying casesof MS in AHC datasets

Minor variation in performance was observed whenstratified by sex and age However accuracy remainedhigh and the Youden J statistic remained substantialThese findings indicate that the MS algorithm was an ac-curate and reliable algorithm for identifying cases of MSin these disparate datasets and outperforms similar algo-rithms for identifying amyotrophic lateral sclerosis28 andParkinson disease29

When using AHC data it is important for investigators todetermine whether there is any hierarchy to the way thatdiagnosis codes are recorded because the performance of analgorithm can be adversely affected In the VA such a codinghierarchy exists and when encounters with MS in any di-agnosis position were used vs in the primary diagnosis posi-tion specificity was compromised as was the Youden Jstatistic Surprisingly accuracy was only slightly reduced butwould likely be more adversely affected when applied to anentire dataset as opposed to the subset of cases with a di-agnosis code for MS as used in this study

Prior reports have defined cohorts in claims data based ona single encounter with MS listed as one of the diagnoses3031

This strategy can result in the inclusion of a substantialnumber of false-positives Prior work on the VA algorithm5

showed that a rigorous algorithm (similar to the MS algo-rithm) ruled out 43 of the cases who had only a single MSencounter This guided our decision to require multipleencounters as part of all of our algorithms We strongly en-courage other investigators to avoid the use of a single MS

encounter to identify an MS cohort because depending onthe proportion of false-positives included results could besubstantially biased

We found that prevalence estimates varied with the dura-tion of the observation period in 3 different datasets (2from the United States and 1 from Manitoba Canada)suggesting that observation periods should be maximizedto optimize case detection Specifically we found that anobservation period of 3 years underestimated the preva-lence of MS by 37 (95 CI 13ndash66) in the VA datasetand 47 (95 CI 23ndash76) in the Manitoba dataset Itshould be noted that the 95 CI for these adjustmentfactors overlapped implying that the growth range isnonsignificantly different for these 2 datasets The findingsin the IMS (validation) dataset corroborated results fromthe VA and suggest that a 3-year ascertainment periodunderestimates the 10-year prevalence in US datasets byasymp38 regardless of the final year of the time period ofinterest (2010 vs 2015) This is consistent with previousfindings in Quebec Canada regarding systemic lupuserythematosus a chronic disease like MS for which healthcare use may vary over time or with disease activity21 Inthat study when a 3-year period was used prevalence wasunderestimated by 66 compared with a 15-year periodwhile a 5-year period underestimated prevalence by 39Similar results have been reported for the assessment of theprevalence of cancer in Europe22 Registries with 5 years ofdata underestimated prevalence by 42 to 45 as theduration of the registry increased the underestimation ofprevalence decreased systematically For investigatorswithout access to many years of data these findings suggestthat an adjustment factor should be applied to obtaina more stable and accurate prevalence estimate The dif-ference between the US (VA and IMS) and Manitobadatasets likely results because Manitoba has a universalhealth system with a single government payer and its AHCdatasets capture gt98 of the population6 whereas the UShealth care system involves multiple insurance payersmaking it more difficult to achieve complete ascertainmentof cases Thus the underestimation value of 37 providesa conservative (ie lower bound) inflation factor whereasan inflation factor of 47 based on the Manitoba data likelybetter reflects what might be expected with nearly fullcapture of the US population over a 10-year period(ie upper bound)

This study included a limited number of AHC datasets thatwere restricted to North America which may limit general-izability to datasets from other countries However thedatasets that were included provided a diverse sampling ofthe MS population The IMS validation dataset did not coverthe same time period as the VA and Manitoba datasetshowever the 3-year underestimation in prevalence was nearlythe same as the VA dataset despite differences in the timeperiods assessed Our algorithms relied on ICD-9-CM coding(except for IP encounters in the Manitoba dataset which

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1025

used ICD-10-CM coding from 2004 onward) and have notbeen validated with ICD-10-CM coding It is unlikely thealgorithms will perform differently when ICD-10-CM codingis used because MS is still represented by a single 3-digit code

We conducted a rigorous assessment of algorithms to identifycases of MS in several different AHC datasets The algorithmof choice required the sum of ge3 MS-related IP admissionsMS-related OP encounters or pharmacy claims for a DMTwithin a calendar year Furthermore this algorithm performedremarkably well across different datasets and demographicgroups We propose that this MS algorithm be adopted forfuture MS-related studies using AHC datasets to enhancecomparability across studies

AcknowledgmentThe authors thank the following members of the US MultipleSclerosis Prevalence Workgroup for their contributions AlbertLo Robert McBurney Oleg Muravov Bari Talente SteveBuka and Leslie Ritter The authors also thank the followingkey staff members at the National MS Society for theircontributions to this project Cathy Carlson Tim CoetzeeSherri Giger Weyman Johnson Eileen Madray GrahamMcReynolds and Cyndi Zagieboylo In addition the authorsare indebted to Shan Jin (Baltimore VA Medical Center) forhis assistance in compiling the VA data The results andconclusions presented are those of the authors No officialendorsement by the Department of Veterans Affairs ManitobaHealth Saskatchewan Ministry of Health SaskatchewanHealth Quality Council or Pharmetrics Inc is intended orshould be inferred

Study fundingThe study was funded by contracts from the National MSSociety to the principal investigators of the US MultipleSclerosis Prevalence Workgroup (HC-1508-05693)

DisclosureW Culpepper has additional research funding from the Na-tional MS Society and receives support from the VA MSCenter of Excellence R Marrie is supported by the WaughFamily Chair in Multiple Sclerosis and receives researchfunding from Canadian Institutes of Health Research Re-search Manitoba Multiple Sclerosis Society of CanadaMultiple Sclerosis Scientific Foundation National MultipleSclerosis Society Crohnrsquos and Colitis Canada and the Con-sortium of Multiple Sclerosis Centers She also serves on theEditorial Board of Neurology A Langer-Gould was site prin-cipal investigator for 2 industry-sponsored phase 3 clinicaltrials (Biogen Idec Hoffman-LaRoche) and was site principalinvestigator for 1 industry-sponsored observation study(Biogen Idec) She receives grant support from the NIHNational Institute of Neurological Disorders and Stroke(NINDS) Patient-Centered Outcomes Research Instituteand the National MS Society M Wallin has served on datasafety monitoring boards for NINDS-funded trials has beenan associate editor for the Encyclopedia of the Neurologic

Sciences and received funding support from the National MSSociety and the Department of Veterans Affairs Merit ReviewResearch Program L Nelson receives grants from the Cen-ters for Disease Control and Prevention NIH and NationalMS Society and contracts from the Agency for Toxic Sub-stances and Diseases Registry She receives compensation forserving as a consultant to Acumen Inc and is on the DataMonitoring Committee of Neuropace J Campbell has con-sultancy or research grants over the past 5 years with theAgency for Healthcare Research and Quality ALSAMFoundation Amgen AstraZeneca Bayer Biogen IdecBoehringer Ingelheim Centers for Disease Control andPrevention Colorado Medicaid Enterprise CommunityPartners Inc Institute for Clinical and Economic ReviewMallinckrodt NIH National MS Society Kaiser PermanentePhRMA Foundation Teva Research in Real Life Ltd Re-spiratory Effectiveness Group and Zogenix Inc H Tremlettis the Canada Research Chair for Neuroepidemiology andMultiple Sclerosis She receives research support from theNational MS Society the Canadian Institutes of Health Re-search the Multiple Sclerosis Society of Canada and theMultiple Sclerosis Scientific Research Foundation In ad-dition in the last 5 years she has received research supportfrom the Multiple Sclerosis Society of Canada the MichaelSmith Foundation for Health Research and the UK MSTrust as well as travel expenses to attend conferences allspeaker honoraria are either declined or donated to an MScharity or to an unrestricted grant for use by her researchgroup W Kaye receives funding from the Agency for ToxicSubstances and Disease Registry the National MS Societyand the Association for the Accreditation of Human Re-search Protection Programs L Wagner receives fundingfrom the Agency for Toxic Substances and Disease Registryand National MS Society L Chen is employed full-time byKPSC S Leung is employed full-time by the University ofManitoba C Evans receives research funding from theCanadian Institutes of Health Research SaskatchewanHealth Research Foundation and the Multiple SclerosisSociety of Canada S Yao is employed full-time by theHealth Quality Council N LaRocca is employed full-timeby the National MS Society Go to NeurologyorgN forfull disclosures

Publication historyReceived by Neurology July 20 2018 Accepted in final form November24 2018

Appendix 1 Authors

Name Location Role Contribution

William JCulpepperPhD MA

Veterans HealthAdministrationWashington DCUniversity of MD

Author Study concept anddesign statisticalanalyses preparationof draft manuscriptcritical review revisionof the manuscript forcontent approval offinal manuscript

e1026 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

References1 Evans C Beland SG Kulaga S et al Incidence and prevalence of multiple sclerosis in

the Americas a systematic review Neuroepidemiology 201340195ndash2102 Kingwell E Marriott JJ Jette N et al Incidence and prevalence of multiple sclerosis in

Europe a systematic review BMC Neurol 2013131283 Makhani N Morrow SA Fisk J et al MS incidence and prevalence in Africa Asia Australia

and New Zealand a systematic review Mult Scler Relat Disord 2014348ndash604 Tricco AC Pham B Rawson NSB Manitoba and Saskatchewan administrative health

care utilization databases are used differently to answer epidemiologic researchquestions J Clin Epidemiol 200861192ndash197e112

5 Culpepper WJ Ehrmantraut M Wallin MT Flannery K Bradham DD VeteransHealth Administration Multiple Sclerosis Surveillance Registry the problem of case-finding from administrative databases J Rehabil Res Dev 20064317

6 Marrie RA YuN Blanchard JF Leung S Elliott L The rising prevalence and changingage distribution of multiple sclerosis in Manitoba Neurology 201074465ndash471

7 Nelson MT Wallin MT Marrie RA et al A new way to estimate neurologic diseaseprevalence in the United States illustrated with MS Neurology201992469ndash480

8 Wallin MT Culpepper WJ Campbell J et al The prevalence of multiple sclerosis inthe United States a population-based estimate using health care claims datasetsNeurology201992e1029ndashe1040

9 Polman CH Reingold SC Banwell B et al Diagnostic criteria for multiple sclerosis2010 revisions to the McDonald criteria Ann Neurol 201169292ndash302

10 Wallin MT Culpepper WJ Coffman P et al The Gulf War era multiple sclerosiscohort age and incidence rates by sex race and service Brain 20121351778ndash1785

11 Koebnick C Langer Gould A Gould MK et al Do the sociodemographic charac-teristics of members of a large integrated health care system represent the populationof interest Permanente J 20121637ndash41

12 Widdifield J Ivers NM Young J et al Development and validation of an adminis-trative data algorithm to estimate the disease burden and epidemiology of multiplesclerosis in Ontario Canada Mult Scler 2015211045ndash1054

13 Al-Sakran LH Marrie RA Blackburn DF Knox KB Evans CD Establishing theincidence and prevalence of multiple sclerosis in Saskatchewan Can J Neurol Sci201845295ndash303

14 Marrie RA Fisk J Stadnyk K et al The incidence and prevalence of multiple sclerosisin Nova Scotia Can J Neurol Sci 201340824ndash831

15 Canadian Institute for Health Information Rehabilitation 2016 Available at cihicaentypes-of-carehospital-carerehabilitation Accessed May 5 2016

16 Youden WJ Index for rating diagnostic tests Cancer 1950332ndash35

Appendix 1 (continued)

Name Location Role Contribution

Ruth AnnMarrieMD PhD

University ofManitoba WinnipegCanada

Author Study concept anddesign supervisionof data collectionand analysescritical reviewrevision ofthe manuscriptfor contentapproval of finalmanuscript

AnnetteLanger-Gould MD

Kaiser PermanenteSouthern CaliforniaLos Angeles

Author Study concept anddesign statisticalanalyses chart reviewconfirmation of MSdiagnosis criticalreview revision of themanuscript forcontent approval offinal manuscript

Mitchell TWallin MDMPH

Veterans HealthAdministrationWashington DCGeorgetownUniversity

Author Study concept anddesign chart reviewconfirmation of MSdiagnosis criticalreview revision of themanuscript forcontent approval offinal manuscript

Lorene MNelsonPhD

Stanford UniversityCA

Author Study concept anddesign critical reviewrevision of themanuscript forcontent

JonathanCampbellPhD

University of DenverCO

Author Statistical analysescritical review of themanuscript forcontent

HelenTremlettPhD

University of BritishColumbia VancouverCanada

Author Statistical analysescritical review of themanuscript forcontent

WendyKaye PhD

McKing ConsultingCorp Atlanta GA

Author Statistical analysescritical review of themanuscript forcontent

LaurieWagnerMPH

McKing ConsultingCorp Atlanta GA

Author Statistical analysescritical review of themanuscript forcontent

Lie HChen DrPH

Kaiser PermanenteSouthern CaliforniaLos Angeles

Author Statistical analysescritical review of themanuscript forcontent

StellaLeung MSc

University ofManitoba WinnipegCanada

Author Statistical analysescritical review of themanuscript forcontent

CharityEvans PhD

University ofSaskatchewanSaskatoon Canada

Author Statistical analysescritical review of themanuscript forcontent

ShenzhenYao MDMSc

University ofSaskatchewanSaskatoon Canada

Author Statistical analysescritical review of themanuscript forcontent

Appendix 1 (continued)

Name Location Role Contribution

Nicholas GLaRoccaPhD

National MultipleSclerosis Society NewYork NY

Author Statistical analysescritical review of themanuscript forcontent approval offinal manuscript

Appendix 2 Coinvestigators and members of the USMultiple Sclerosis Prevalence Workgroup

Name Location Role Contribution

Albert LoMD PhD

Brown UniversityProvidence RI

Coinvestigator Studyconcept anddesign

RobertMcBurneyPhD

Accelerated CureProject Boston MA

Coinvestigator Studyconcept anddesign

OlegMuravovPhD

Agency for ToxicSubstances andDisease RegistryDivision of HealthStudies Atlanta GA

Coinvestigator Studyconcept anddesign

BariTalenteEsq

National MS SocietyWashington DC

Coinvestigator Studyconcept anddesign

LeslieRitter

National MS SocietyWashington DC

Coinvestigator Studyconcept anddesign

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1027

17 Landis JR Koch GG The measurement of observer agreement for categorical dataBiometrics 197733159ndash174

18 Sim J Wright CC The kappa statistic in reliability studies use interpretation andsample size requirements Phys Ther 200585257ndash268

19 Viera AJ Garrett JM Understanding interobserver agreement the kappa statisticFam Med 200537360ndash363

20 Trisolini M Honeycutt A Wiener J Lesesne S Global Economic Impact of MultipleSclerosis Research Triangle Park RTI International 2010

21 Ng R Bernatsky S Rahme E Observation period effects on estimation of systemiclupus erythematosus incidence and prevalence in Quebec J Rheumatol 2013401334ndash1336

22 Capocaccia R Colonna M Corazziari I et al Measuring cancer prevalence in Europethe EUROPREVAL Project Ann Oncol 200213831ndash839

23 Griffiths RI OrsquoMalley CD Herbert RJ Danese MD Misclassification of incidentconditions using claims data impact of varying the period used to exclude pre-existingdisease BMC Med Res Methodol 20131332

24 Dilokthornsakul P Valuck RJ Nair KV Corboy JR Allen RR Campbell JD Multiplesclerosis prevalence in the United States commercially insured population Neurology2016861014ndash1021

25 Tiwari Li Y Zou Z Interval estimation for ratios of correlated age-adjusted ratesJ Data Sci 20108471ndash482

26 Culpepper WJ Wallin MT Magder LS et al VHA Multiple Sclerosis SurveillanceRegistry and its similarities to other multiple sclerosis cohorts J Rehab Res Dev 201552(3)263ndash272

27 Tonelli M Wiebe N Fortin M et al Methods for identifying 30 chronic con-ditions application to administrative data BMC Med Inform Decis Mak 20151531

28 Kaye WE Sanchez M Wu J Feasibility of creating a national registry using admin-istrative data in the United States Amytroph Lateral Scler Frontotemporal Degener201415433ndash439

29 Szumski NR Cheng EM Optimizing algorithms to identify Parkinsonrsquos disease caseswithin an administrative database Mov Disord 20092451ndash56

30 Hoer A Schiffhorst G Zimmermann A et al Multiple sclerosis in Germany dataanalysis of administrative prevalence and healthcare delivery in the statutory healthsystem BMC Health Serv Res 200414381

31 Chastek BJ Oleen-Burkey M Lopez-Bresnahan MV Medical chart validation of analgorithm for identifying multiple sclerosis relapse in healthcare claims J Med Econ201013618ndash625

e1028 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

DOI 101212WNL0000000000007043201992e1016-e1028 Published Online before print February 15 2019Neurology

William J Culpepper Ruth Ann Marrie Annette Langer-Gould et al datasets

Validation of an algorithm for identifying MS cases in administrative health claims

This information is current as of February 15 2019

ServicesUpdated Information amp

httpnneurologyorgcontent9210e1016fullincluding high resolution figures can be found at

References httpnneurologyorgcontent9210e1016fullref-list-1

This article cites 29 articles 6 of which you can access for free at

Citations httpnneurologyorgcontent9210e1016fullotherarticles

This article has been cited by 2 HighWire-hosted articles

Subspecialty Collections

httpnneurologyorgcgicollectionprevalence_studiesPrevalence studies

httpnneurologyorgcgicollectionmultiple_sclerosisMultiple sclerosis

httpnneurologyorgcgicollectiondiagnostic_test_assessment_Diagnostic test assessment

httpnneurologyorgcgicollectioncohort_studiesCohort studies

httpnneurologyorgcgicollectionall_health_services_researchAll Health Services Researchfollowing collection(s) This article along with others on similar topics appears in the

Permissions amp Licensing

httpwwwneurologyorgaboutabout_the_journalpermissionsits entirety can be found online atInformation about reproducing this article in parts (figurestables) or in

Reprints

httpnneurologyorgsubscribersadvertiseInformation about ordering reprints can be found online

ISSN 0028-3878 Online ISSN 1526-632XWolters Kluwer Health Inc on behalf of the American Academy of Neurology All rights reserved Print1951 it is now a weekly with 48 issues per year Copyright Copyright copy 2019 The Author(s) Published by

reg is the official journal of the American Academy of Neurology Published continuously sinceNeurology

Page 2: Validation of an algorithm for identifying MS cases in ... · 3/5/2019  · with ICD-9-CM or the Canadian version of ICD-10 codes (ICD-10-CA), depending on the year. ... the MS Clinic

The United States has historically been considered a regionwith a high prevalence of multiple sclerosis (MS) Howeverthe prevalence of MS remains poorly described1 and rigorousnational estimates are lacking Prevalence estimates are criticalfor understanding the national burden of MS planning forhealth service needs and supporting advocacy efforts

Multiple methods have been used to estimate the prevalence ofMS worldwide including surveys review of medical records orlists from health care institutions and administrative healthclaims (AHC) data1ndash3 AHC data are attractive because of theirrelatively low cost ease of use relative to other methods andpotential for repeated use over time However AHC data arenot collected for research purposes and their validity must beassessed before use4ndash6 Validation studies are possible for someAHC datasets in the United States such as the Department ofVeterans Affairs (VA) but not in others such as Medicare be-cause they cannot be linked back to clinical records

Therefore we aimed to test several case definitions (algo-rithms) for identifying persons with MS using multiple AHCdatasets We reasoned that if we could identify a best-casedefinition that performed well and comparably across diverseAHC datasets this would support their use in other healthclaims datasets in which validation studies were not feasibleFurther details and rationale for this approach are provided inone of our companion articles (Nelson et al7 ldquoANewWay toEstimate Neurologic Disease Prevalence in the United StatesMultiple Sclerosisrdquo) Second we aimed to determine howmuch the duration of the observation period would affect casedetection (ie underestimate prevalence) for appropriateadjustment in our companion prevalence article (Wallinet al8 ldquoThe Prevalence of MS in the United States APopulation-Based Estimate Using Health Care Claims Datardquo)

MethodsThis was a retrospective analysis using linked administrativeand clinical data that was conducted in parallel at 3 primarysites (dataset characteristics in table e-1 available from Dryaddoiorg105061dryad4c7s325) following a common pro-tocol The findings from the initial analysis were then tested ina fourth independent dataset Sources of the AHC datasets aredescribed below Clinical data were drawn from medicalrecords Neurologist-confirmed diagnoses of MS based on the2010 McDonald criteria9 served as the reference standard for

this study At the 3 primary sites the source population forthis study was limited to persons ge18 years of age who werealive during the study period (2000ndash2014) and had ge1 healthcare encounter associated with a diagnostic code for MS(Manitoba used any demyelinating disease to define its sourcepopulation)

Standard protocol approvals registrationsand patient consentsEach site obtained local Institutional Review Board or ethicsapproval University of Maryland Baltimore Baltimore VAResearch and Development Committee Kaiser PermanenteSouthern California (KPSC) University of Manitoba HealthResearch Ethics Board and University of Saskatchewan Bio-medical Research Ethics Board In Manitoba approval foradministrative data access was obtained from the Health In-formation Privacy Committee In Saskatchewan approval fordata access was obtained from the Saskatchewan Ministry ofHealth and the Saskatchewan Health Quality Council

Data sources

Department of Veterans AffairsThe VA provides comprehensive health care to asymp11 millionmilitary veterans at asymp300 hospitals and 1500 outpatientclinics and Veterans Centers distributed throughout theUnited States Puerto Rico andGuam Since 1997 the VA hasprospectively collected and maintained extensive data on allfunded health care services including hospitalizations phy-sician visits and prescriptions dispensed Diagnoses wererecorded using ICD-9 codes

The initial validation cohort was drawn from the VA case-finding algorithm study in 20065 which included all patientsin the mid-Atlantic region who were ge18 years of age with ge1encounter with a diagnosis code for MS (ICD-9 code 340)We added data from the Gulf War Era MS Cohort10 and datafrom screening of patients for a planned longitudinal MSstudy We followed the same standardized medical recordreview procedures (trained abstractors under the directionof MW) to verify the diagnosis of MS9 for all patients Intotal these studies yielded a cohort of 3452 patients (table 1)from 1999 through 2007

Kaiser Permanente Southern CaliforniaKPSC is a prepaid health maintenance organization witha membership during the study period of gt35 million persons

GlossaryAHC = administrative health claims CI = confidence interval DMT = disease-modifying therapy ICD-9 = InternationalClassification of Disease 9th revision ICD-9-CM = International Classification of Disease 9th revision clinical modification ICD-10-CA = International Classification of Disease 10th revision Canadian version ICD-10-CM = International Classification of Disease10th revision clinical modification IMS = Intercontinental Marketing Services IP = inpatient KPSC = Kaiser PermanenteSouthern California MS = multiple sclerosis NPV = negative predictive value OP = outpatient PPV = positive predictivevalue VA = Department of Veterans Affairs

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1017

and provides health care coverage toasymp20 of the population inthe region it serves The KPSC membership is largely repre-sentative of the general population in Southern California withrespect to ethnicity age and sex but underrepresents the lowestand highest ends of the socioeconomic spectrum11 Physicianvisits hospitalizations andmedications are captured in datasetsthat are linkable by a medical record number

The validation cohort consisted of all persons ge18 years of agewith ge1 inpatient (IP) or outpatient (OP) encounter di-agnosis code for MS (ICD-9 code 340) who were KPSChealth plan members between 2008 and 2010 had at least 6months of continuous membership during the study periodand had complete electronic health records (n = 4851) Werandomly selected 2935 (table 1) of these potential cases forin-depth medical records review by an MS specialist (AL-G)including IP and OP records MRI scans and diagnostic testresults to confirm a diagnosis of MS9

ManitobaIn the Canadian province of Manitoba Manitoba HealthSeniors and Active Living is responsible for the delivery ofpublicly funded health care to nearly all residents Severaldatasets are maintained that capture demographic in-formation (sex dates of birth health care coverage anddeath) hospitalizations physician services and prescriptiondrug dispensations Diagnoses for physician services are

recorded with the clinical modification of ICD-9 (ICD-9-CM) codes while diagnoses for hospitalizations are codedwith ICD-9-CM or the Canadian version of ICD-10 codes(ICD-10-CA) depending on the year

The Manitoba validation cohort started with the 400 patientsge18 years of age who were used to develop administrative casedefinitions of MS in 20076 This cohort was augmented byreview of the medical records of all persons participating inthe MS Clinic Registry (trained abstractors under the super-vision of RAM) at the Winnipeg MS Clinic who had con-sented to linkage of their medical records to administrativedata (89 of those approached) as of June 2015 and met theinclusion criteria as described above This expanded the val-idation sample to 1654 patients (table 1)

SaskatchewanThis constituted the fourth independent site In the Canadianprovince of Saskatchewan health care is publicly funded as inManitoba Datasets similar to those in Manitoba capture de-mographic information (sex dates of birth health care cov-erage and death) hospitalizations physician services andprescription drug dispensations Diagnoses for physicianservices are recorded with ICD-9-CM codes while diagnosesfor hospitalizations are coded with ICD-9 or ICD-10-CAcodes depending on the year

As described elsewhere12 the validation cohort included 200randomly selected cases from the Saskatoon MS Clinic withdiagnoses confirmed by an MS specialist and 200 controls(table 1) from the Inpatient Rehabilitation Center databaseThis database specifically captures diagnoses of chronic dis-eases including MS13

MS case definition algorithmsWe developed candidate algorithms using ICD-9ICD-10codes for MS (340G35) and lists of MS-specific disease-modifying therapies (DMTs) To develop these definitionswe reviewed the literature on validated administrative casedefinitions for MS5612ndash15 From this review we consideredthe following (1) 1 encounter with an ICD-9ICD-10 codefor MS is too nonspecific because these may represent rule-out codes (2) increasing the number of health careencounters required generally increases specificity at the ex-pense of sensitivity and (3) prescription claims for MS-specific DMTs may be helpful to improve sensitivity speci-ficity or both when available Therefore the candidate algo-rithms (table 2) varied the number of IP and OP encounterswith MS listed as one of the diagnoses for that encounter thenumber of prescription claims for a DMT required to classifya person as an MS case and the time period over which thecase definition should be met We included candidate algo-rithms that did not use prescription claims because these arenot available in all AHC datasets Because we anticipatedapplying these definitions in AHC datasets in which enroll-ment in a health insurance plan might be limited to short timeperiods we considered only 1- or 2-year time periods over

Table 1 Demographic characteristics of theMS validationcohorts by data source

Primary Validation Cohorts

VA KPSC MB SK

Total cases n 3452 2935 1654 400

Sex n ()

Male 2451 (740) 725 (242) 415 (251) 197 (493)

Female 1001 (260) 2253 (758) 1239 (749) 203 (507)

Age groupn ()

18ndash24 y 118 (34) 105 (35) 123 (74) a

25ndash34 y 928 (269) 400 (135) 434 (262) 25 (625)

35ndash44 y 1093 (316) 662 (225) 554 (335) 46 (115)

45ndash54 y 833 (241) 769 (262) 381 (230) 76 (190)

55ndash64 y 302 (87) 669 (228) 118 (71) 83 (208)

65ndash74 y 131 (38) 243 (85) a 80 (200)

ge75 y 47 (14) 87 (30) a a

Abbreviations KPSC = Kaiser Permanente Southern California MB = Man-itoba Canada MS = multiple sclerosis SK = Saskatchewan Canada VA =Veterans AdministrationSK represents a general population cohorta Cells suppressed due to small numbers in accordancewith data access andprivacy requirements

e1018 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

which to define a case Each component of these case defi-nitions is described further below

IP encounterWe defined an IP encounter as an IP admission (overnightstay) for which MS was recorded as one of the diagnoses Toaccount for transfers between institutions and to avoid doublecounting we collapsed multiple overlapping hospital admis-sion and discharge records into 1 IP encounter In addition anIP encounter with an admission date that occurred within 24hours of a prior IP encounter was counted as a single IPencounter56

PhysicianOP encountersWe defined OP encounters as a visit to an OP clinic with anICD code for MS Multiple OP visits by 1 patient within thesame day were treated as 1 OP encounter In addition OPencounters occurring within an MS IP (admission to dis-charge date) were not counted as unique OP encounters

Disease-modifying therapiesWe defined prescription claims as dispensations for any of theMS-specific DMTs approved before or during the study pe-riod including interferon beta-1a-SC interferon beta-1a-IMinterferon beta-1b-SC glatiramer acetate fingolimod andnatalizumab as identified with National Drug Codes and DrugIdentification numbers (tables e-1 and e-2 available fromDryad doiorg105061dryad4c7s325) To avoid mis-classification claims for natalizumab were not included if theindividual also had an ICD code for inflammatory boweldisease a disorder for which this medication is approvedfor use

A caveat in defining IP and OP encounters was that the VAimposes a hierarchy to the coding such that the primary di-agnosis for a given encounter should be listed first followedin descending order of importance by other diagnoses af-fecting that episode of care Not all AHC datasets impose sucha hierarchy (eg KPSC) Our assessment of the effect of thiscoding hierarchy (see Appendix 2 available from Dryad foradditional methods and results doiorg105061dryad

4c7s325) resulted in using MS as a primary diagnosis in theVA data and MS as any diagnosis in the KPSC and Manitobadatasets when counting MS encounters

AnalysisWe applied the candidate algorithms in each of the 3 primaryvalidation cohorts We compared the performance of thecandidate algorithms with the reference standard diagnoses ofMS frommedical records using sensitivity specificity positivepredictive value (PPV) and negative predictive value (NPV)In addition interrater agreement was compared between thealgorithms derived from AHC and medical records data usingthe crude accuracy score and Youden16 J statistic We chosethe J statistic over the κ statistic1718 because it is designed foruse with dichotomous data and equally weights false-positivesand false-negatives We interpreted the J statistic like κ whenneither the algorithm nor the medical records were consid-ered the reference standard as slight (0ndash020) fair(021ndash040) moderate (041ndash060) substantial (061ndash080)and excellent (081ndash10)19

To determine whether the algorithms performed consistentlyacross sociodemographic subgroups we stratified the refer-ence population by (1) sex and (2) age group (18ndash24 25ndash3435ndash44 45ndash54 55ndash64 65ndash74 ge75 years) and repeated ouranalyses for the 4 algorithms that performed best across the 3AHC datasets The ultimate choice of an algorithm was basedon performance and pragmatic considerations including easeof application and interpretation

Because our 3 primary validation datasets were drawn frompatients with at least 1 encounter for MS (or demyelinatingdisease in Manitoba) not the general population we expectedthat the specificity and predictive values of the case definitionswould be artificially depressed compared to the performance inthe general population Therefore we tested our case definitionalgorithms in a fourth independent dataset in Saskatchewan14

which included individuals with andwithoutMS as noted above

Prevalence with limited durationascertainment periodsWe intended to apply our optimal algorithm to multiple AHCdatasets available in the United States20 to determine preva-lence in 2010 based on a 3-year ascertainment period How-ever limited periods of observation may substantially reducecase detection and underestimate prevalence as is wellrecognized in the literature21ndash23 Capocaccia et al22 describehow they dealt with adjusting data from cancer registries thathad been active for short durations when creating the Europe-wide project to estimate the prevalence of the most importantcancers (EUROPREVAL) They computed what they calleda completeness index for multiple observation periods usingdata from the Connecticut Cancer Registry because it hasbeen in operation for gt50 years In the 4 registries that hadonly 5 observation years the completeness index ranged from042 to 045 The completeness index improved systematicallyas the observation period increased They go on to describe

Table 2 Description of candidate algorithms for MSa

Case definition name Number and type of claims

MS_A ge2 IP or ge3 OP

MS_B ge2 IP or ge4 OP

MS_C ge2 IP or ge5 OP

MS_D ge2 IP or ge3 OP or ge1 DMT

MS_E (IP + OP + DMT) ge 3

Abbreviations DMT = disease modifying therapy IP = inpatient admissionMS = multiple sclerosis OP = outpatient visita The performance of each algorithmwas evaluated on the basis of both a 1-year and a 2-year time period

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1019

their model-based methods to estimate the complete preva-lence based on a limited period using age- and sex-specificincidence and survival rates for cancer22 This information islacking in the US MS population and cannot be accuratelyestimated from data of only 3 yearsrsquo duration23 Therefore toassess the effect of a short ascertainment period we used theapproaches used by Ng et al21 and Capocaccia et al22 andestimated the prevalence of MS using 3 years (2008ndash2010)and 10 years (1999ndash2010) of data in the VA and Manitobadatasets and compared the difference in these estimates in2010 Prevalence in each year was defined as all persons ge18years meeting our preferred algorithm definition of MS ([IP +OP + DMT] ge 3) among those alive during the calendar yearThe 3-year and 10-year prevalence estimates in 2010 werecumulative estimates based on all those who had met thedefinition at any point in the ascertainment period and whowere alive in 2010

To validate the above analyses we repeated these proceduresusing the PharMetrics Plus Health Plan Claims Dataset fromIntercontinental Marketing Services (IMS) which was notused to develop the algorithm Here we compared the esti-mated prevalence derived from 3 years (2013ndash2015) and 9years (the longest period available) ending in 2015 The IMSdataset captures health care claims from asymp120 health plansacross the United States accounting for ge42 million coveredlives as of 2011 mirrors the demographics of the US Censuspopulation and has been used for national and regionalbenchmarking of health care use and cost2024 The datacontained in this dataset are comparable to the data in vali-dation datasets that included ICD-9-CM diagnosis and IPprocedure codes provider codes Current Procedural Ter-minology (OP procedures) codes National Drug Code placeand date of service enrollment date(s) sex year of birth andUS Census region We report 95 confidence intervals (CIs)25 for the ratio of the prevalence estimates of longer vs shorterintervals Although these prevalence estimates are highlycorrelated which would narrow these intervals substantiallywe report uncorrected 95 CIs to be conservative

Statistical analyses were conducted with either SAS version94 (SAS Institute Inc Cary NC) or SPSS version 22 (IBMInc Armonk NY)

Data availabilityThe datasets used in this study constitute secondary analyses ofdata from prior studies Therefore there are no data-sharingagreements in place for these data The exception concerns theIMS data which are a limited-use dataset provided gratis toUniversity of Maryland Baltimore faculty (WJC) and datasharing is not allowed under that agreement

ResultsThe characteristics of the 3 primary validation cohorts arerepresentative of the general MS population (table 1)56101326

As expected there was a higher proportion of males in the VAdata compared with KPSC and Manitoba but the age dis-tributions were similar across cohorts

Initial validation of algorithmGenerally when an algorithm was applied using 1 year of datathe sensitivity was lower than when applied using 2 years ofdata while specificity was higher when applied using 1 year ofdata (table 3) The more stringent the case definition that isthe greater the number of claims required was the lower thesensitivity was and the higher the specificity was The additionof prescription claims for DMT to an algorithm increasedsensitivity without a loss of specificity leading to better per-formance overall

Algorithm MS_E ([IP + OP + DMT] ge 3) had the highestsensitivity for both time periods In contrast the algorithm withthe highest specificity wasMS_C (ge2 IP orge5OP) in the 1-yearand 2-year time periods The PPVs were high for all algorithmswhereas the NPVs were moderately low suggesting that thealgorithms were better at ruling in cases of MS than ruling outnon-MS cases (in a population with at least 1 claim for MS)

The proportion of misclassified cases (false-positives and false-negatives) was relatively low as reflected in the moderatelyhigh to high accuracy scores algorithm MS_E had the highestaccuracy over both time periods The Youden J statistic mir-rored the crude accuracy scores revealing a substantial level ofagreement between each algorithm and the reference standardAlgorithm MS_E had the highest or second highest J statisticfor the 1-year time period with scores of ge060

We performed the stratified analyses on only the algorithmsthat included DMTs because these were the best-performingdefinitions (MS_D and MS_E in tables 3) Because thestratified results were nearly identical across these 4 casedefinitions we present only the results for MS_E-1 ([IP + OP+ DMT] ge 3) for the 1-year time period (figures 1 and 2)

The test characteristics remained stable by sex (men figure 1Bwomen figure 1C) including sensitivity specificity PPV andaccuracy In the VA the NPV and J statistic were lower amongwomen compared to men but were stable across sexes in theKPSC and Manitoba datasets Similarly sensitivity PPV andaccuracy were relatively consistent across age groups andsomewhat stable for specificity TheNPV and J statistic showedthe greatest variation however despite this variability theMS_E-1 algorithm performed well across datasets and age groupsThus given its consistent and superior performance algorithmMS_E-1 was chosen as the preferred algorithm

Validation in the general populationThe characteristics of the Saskatchewan validation cohort werecomparable to those of the other cohorts When we applied thealgorithms to the Saskatchewan general population specific-ities and PPVs were all ge096 and the NPVs were all ge091 Ofthe algorithms using 1 year of data algorithm MS_E-1

e1020 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

performed the best along with algorithm MS_D-1 For pre-ferred algorithmMS_E the sensitivity was 096 specificity was099 PPVwas 099 NPVwas 096 and Youden J was 096 Oneof the 2-year algorithms MS_D-2 also had a specificity of 099but a marginally higher sensitivity of 097

Corrections for limited-durationascertainment periodsFigure 3 compares estimated prevalence based on a 3-year(2008ndash2010) ascertainment period to that from 10 years(2000ndash2010) of data as of 2010 in the VA and Manitoba

Table 3 Summary of test statistics for each algorithm based on years of data for each data source

MS algorithm Data source Sensitivity Specificity PPV NPV Accuracy Youden J

1-y epoch (denoted by 21)

MS_A-1 ge2 IP or ge3 OP VA 861 825 978 396 857 069

KPSC 789 749 955 342 783 054

MB 897 672 953 465 870 057

MS_B-1 ge2 IP or ge4 OP VA 815 895 986 348 823 071

KPSC 669 823 964 269 690 050

MB 790 779 964 332 789 057

MS_C-1 ge2 IP or ge5 OP VA 761 904 986 294 775 067

KPSC 553 874 967 224 595 043

MB 685 856 973 267 705 054

MS_D-1 ge2 IP or ge3 OP or ge1 DMT VA 866 825 978 404 862 069

KPSC 874 730 957 461 856 060

MB 932 667 954 568 901 060

MS_E-1 (IP + OP + DMT) ge 3 VA 872 822 978 414 867 069

KPSC 855 762 961 436 843 062

MB 934 661 954 568 901 060

2-y epoch (denoted by 22)

MS_A-2 ge2 IP or ge3 OP VA 883 787 974 426 873 067

KPSC 834 707 951 385 818 051

MB 932 610 947 546 894 054

MS_B-2 ge2 IP or ge4 OP VA 860 848 981 400 859 071

KPSC 747 798 962 318 754 055

MB 876 697 956 429 855 057

MS_C-2 ge2 IP or ge5 OP VA 835 880 964 371 840 072

KPSC 662 851 968 270 686 051

MB 817 805 969 370 816 062

MS_D-2 ge2 IP or ge3 OP or ge1 DMT VA 885 787 974 430 875 067

KPSC 896 688 951 493 869 058

MB 950 610 948 620 910 056

MS_ E-2 (IP + OP + DMT) ge 3 VA 891 784 974 442 880 068

KPSC 881 709 954 466 859 059

MB 951 600 947 619 909 055

Abbreviations DMT = disease modifying therapy IP = inpatient admission KPSC =Kaiser Permanente Southern California MB = Manitoba Canada MS =multiple sclerosis NPV = negative predictive value OP = outpatient visit PPV = positive predictive value VA = Veterans Administration

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1021

datasets The underestimation of a 3-year ascertainmentperiod ranged from 37 (95 CI 13ndash66) in the VAdataset to 47 (95 CI 23ndash76) in the Manitoba datasetIn the IMS (validation) dataset the underestimation ofa 3-year (2013ndash2015) ascertainment period compared to

a 9-year period (2007ndash2015) as of 2015 was 39 (95 CI13ndash71) The estimated prevalence of MS rose steeply upto about 2010 in all 3 datasets (figures 3 and 4) and thenincreased an average of 23y thereafter (21 in the VAfigure 3A 25 in the IMS figure 4)

Figure 1 Performance of algorithm MS_E-1 [(IP + OP + DMT) ge 3] stratified by sex across the VA KPSC and MB datasets

(A) Men and women (B) men only and (C) women onlyData arepresented as a proportion that can range between 0 and 1 DMT =disease-modifying therapy IP = inpatient KPSC = Kaiser Perma-nente Southern California MB =ManitobaMS =multiple sclerosisNPV = negative predictive value OP = outpatient PPV = positivepredictive value VA = Department of Veterans Affairs

e1022 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

DiscussionWe undertook this study to identify an algorithm that wouldaccurately and consistently identify cases of MS across dis-parate AHC datasets representing different types of health

care systems and geographic regions The consistency of thefindings across these datasets supports the generalizability ofthe algorithms to other AHC datasets in the United Stateswhen such validation studies are not feasible The preferredalgorithm (MS_E-1) required ge3 encounters (IP +OP+DMT)

Figure 2 Performance of algorithm MS_E-1 [(IP + OP + DMT) ge 3] stratified by age group across the VA KPSC and MBdatasets

(A) Sensitivity (B) specificity (C) positive predictive value (D) negative predictive value (E) accuracy and (F) Youden J statisticData fromManitoba (MB) for the55- to 64- and 64- to 74-year age groups are suppressed due to small cell sizes Data are presented as a proportion that can range between 0 and 1 DMT =disease-modifying therapy IP = inpatient KPSC = Kaiser Permanente Southern California MS = multiple sclerosis NPV = negative predictive value OP =outpatient PPV = positive predictive value VA = Department of Veterans Affairs

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1023

for MS in a 1-year period Algorithms that did not includeDMT performed less well than those that did However weincluded the non-DMT algorithms to assess their ability toaccurately identify cases of MS in studies in which pharmacy

claims data might not be available MS_A-1 performed nearlyas well as the preferred algorithm (MS_E-1) and could beused when DMT is not available Thus by testing a variety ofalgorithms our study provides value and the findings can be

Figure 3 Comparison of prevalence based on a 3- vs 10-year ascertainment period as of 2010 in the (A) VA and (B) MBdatasets

CI = confidence interval MB = Manitoba VA =Department of Veterans Affairs

Figure 4 Comparison of prevalence based on a 3- vs 9-year ascertainment period as of 2015 in the IMS (validation) dataset

CI = confidence interval IMS = IntercontinentalMarketing Services

e1024 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

applied in future research into the prevalence of MS acrossjurisdictions

Algorithm MS E-1 ([IP + OP + DMT] ge 3) was preferredbecause of its consistent high performance and ease ofapplication We hereafter refer to it as the MS algorithmThe 1-year algorithm readily allows annually updating ofprevalence estimates in datasets with limited years of dataSensitivity and PPV were high consistent with previouslypublished definitions developed with similar methods inthe United States (VA) and Canada561213 Specificity inour primary validation cohorts was modest reflecting thatour study cohorts were selected on the basis of having atleast 1 diagnosis for MS Tonelli et al27 evaluated algo-rithms to measure 40 comorbid conditions in AHC datafrom Calgary Alberta Canada They defined an algorithmas valid if both sensitivity and PPV were ge070 compared toa reference cohort They did not consider specificity orNPV because they are generally ge090 in the assessment ofchronic disease in the general population Consistent withthese observations and recommendations when the MSalgorithm was applied to the general population in Sas-katchewan sensitivity (096) specificity (099) and PPV(099) markedly improved These findings support ourpreferred algorithm as a valid method for identifying casesof MS in AHC datasets

Minor variation in performance was observed whenstratified by sex and age However accuracy remainedhigh and the Youden J statistic remained substantialThese findings indicate that the MS algorithm was an ac-curate and reliable algorithm for identifying cases of MSin these disparate datasets and outperforms similar algo-rithms for identifying amyotrophic lateral sclerosis28 andParkinson disease29

When using AHC data it is important for investigators todetermine whether there is any hierarchy to the way thatdiagnosis codes are recorded because the performance of analgorithm can be adversely affected In the VA such a codinghierarchy exists and when encounters with MS in any di-agnosis position were used vs in the primary diagnosis posi-tion specificity was compromised as was the Youden Jstatistic Surprisingly accuracy was only slightly reduced butwould likely be more adversely affected when applied to anentire dataset as opposed to the subset of cases with a di-agnosis code for MS as used in this study

Prior reports have defined cohorts in claims data based ona single encounter with MS listed as one of the diagnoses3031

This strategy can result in the inclusion of a substantialnumber of false-positives Prior work on the VA algorithm5

showed that a rigorous algorithm (similar to the MS algo-rithm) ruled out 43 of the cases who had only a single MSencounter This guided our decision to require multipleencounters as part of all of our algorithms We strongly en-courage other investigators to avoid the use of a single MS

encounter to identify an MS cohort because depending onthe proportion of false-positives included results could besubstantially biased

We found that prevalence estimates varied with the dura-tion of the observation period in 3 different datasets (2from the United States and 1 from Manitoba Canada)suggesting that observation periods should be maximizedto optimize case detection Specifically we found that anobservation period of 3 years underestimated the preva-lence of MS by 37 (95 CI 13ndash66) in the VA datasetand 47 (95 CI 23ndash76) in the Manitoba dataset Itshould be noted that the 95 CI for these adjustmentfactors overlapped implying that the growth range isnonsignificantly different for these 2 datasets The findingsin the IMS (validation) dataset corroborated results fromthe VA and suggest that a 3-year ascertainment periodunderestimates the 10-year prevalence in US datasets byasymp38 regardless of the final year of the time period ofinterest (2010 vs 2015) This is consistent with previousfindings in Quebec Canada regarding systemic lupuserythematosus a chronic disease like MS for which healthcare use may vary over time or with disease activity21 Inthat study when a 3-year period was used prevalence wasunderestimated by 66 compared with a 15-year periodwhile a 5-year period underestimated prevalence by 39Similar results have been reported for the assessment of theprevalence of cancer in Europe22 Registries with 5 years ofdata underestimated prevalence by 42 to 45 as theduration of the registry increased the underestimation ofprevalence decreased systematically For investigatorswithout access to many years of data these findings suggestthat an adjustment factor should be applied to obtaina more stable and accurate prevalence estimate The dif-ference between the US (VA and IMS) and Manitobadatasets likely results because Manitoba has a universalhealth system with a single government payer and its AHCdatasets capture gt98 of the population6 whereas the UShealth care system involves multiple insurance payersmaking it more difficult to achieve complete ascertainmentof cases Thus the underestimation value of 37 providesa conservative (ie lower bound) inflation factor whereasan inflation factor of 47 based on the Manitoba data likelybetter reflects what might be expected with nearly fullcapture of the US population over a 10-year period(ie upper bound)

This study included a limited number of AHC datasets thatwere restricted to North America which may limit general-izability to datasets from other countries However thedatasets that were included provided a diverse sampling ofthe MS population The IMS validation dataset did not coverthe same time period as the VA and Manitoba datasetshowever the 3-year underestimation in prevalence was nearlythe same as the VA dataset despite differences in the timeperiods assessed Our algorithms relied on ICD-9-CM coding(except for IP encounters in the Manitoba dataset which

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1025

used ICD-10-CM coding from 2004 onward) and have notbeen validated with ICD-10-CM coding It is unlikely thealgorithms will perform differently when ICD-10-CM codingis used because MS is still represented by a single 3-digit code

We conducted a rigorous assessment of algorithms to identifycases of MS in several different AHC datasets The algorithmof choice required the sum of ge3 MS-related IP admissionsMS-related OP encounters or pharmacy claims for a DMTwithin a calendar year Furthermore this algorithm performedremarkably well across different datasets and demographicgroups We propose that this MS algorithm be adopted forfuture MS-related studies using AHC datasets to enhancecomparability across studies

AcknowledgmentThe authors thank the following members of the US MultipleSclerosis Prevalence Workgroup for their contributions AlbertLo Robert McBurney Oleg Muravov Bari Talente SteveBuka and Leslie Ritter The authors also thank the followingkey staff members at the National MS Society for theircontributions to this project Cathy Carlson Tim CoetzeeSherri Giger Weyman Johnson Eileen Madray GrahamMcReynolds and Cyndi Zagieboylo In addition the authorsare indebted to Shan Jin (Baltimore VA Medical Center) forhis assistance in compiling the VA data The results andconclusions presented are those of the authors No officialendorsement by the Department of Veterans Affairs ManitobaHealth Saskatchewan Ministry of Health SaskatchewanHealth Quality Council or Pharmetrics Inc is intended orshould be inferred

Study fundingThe study was funded by contracts from the National MSSociety to the principal investigators of the US MultipleSclerosis Prevalence Workgroup (HC-1508-05693)

DisclosureW Culpepper has additional research funding from the Na-tional MS Society and receives support from the VA MSCenter of Excellence R Marrie is supported by the WaughFamily Chair in Multiple Sclerosis and receives researchfunding from Canadian Institutes of Health Research Re-search Manitoba Multiple Sclerosis Society of CanadaMultiple Sclerosis Scientific Foundation National MultipleSclerosis Society Crohnrsquos and Colitis Canada and the Con-sortium of Multiple Sclerosis Centers She also serves on theEditorial Board of Neurology A Langer-Gould was site prin-cipal investigator for 2 industry-sponsored phase 3 clinicaltrials (Biogen Idec Hoffman-LaRoche) and was site principalinvestigator for 1 industry-sponsored observation study(Biogen Idec) She receives grant support from the NIHNational Institute of Neurological Disorders and Stroke(NINDS) Patient-Centered Outcomes Research Instituteand the National MS Society M Wallin has served on datasafety monitoring boards for NINDS-funded trials has beenan associate editor for the Encyclopedia of the Neurologic

Sciences and received funding support from the National MSSociety and the Department of Veterans Affairs Merit ReviewResearch Program L Nelson receives grants from the Cen-ters for Disease Control and Prevention NIH and NationalMS Society and contracts from the Agency for Toxic Sub-stances and Diseases Registry She receives compensation forserving as a consultant to Acumen Inc and is on the DataMonitoring Committee of Neuropace J Campbell has con-sultancy or research grants over the past 5 years with theAgency for Healthcare Research and Quality ALSAMFoundation Amgen AstraZeneca Bayer Biogen IdecBoehringer Ingelheim Centers for Disease Control andPrevention Colorado Medicaid Enterprise CommunityPartners Inc Institute for Clinical and Economic ReviewMallinckrodt NIH National MS Society Kaiser PermanentePhRMA Foundation Teva Research in Real Life Ltd Re-spiratory Effectiveness Group and Zogenix Inc H Tremlettis the Canada Research Chair for Neuroepidemiology andMultiple Sclerosis She receives research support from theNational MS Society the Canadian Institutes of Health Re-search the Multiple Sclerosis Society of Canada and theMultiple Sclerosis Scientific Research Foundation In ad-dition in the last 5 years she has received research supportfrom the Multiple Sclerosis Society of Canada the MichaelSmith Foundation for Health Research and the UK MSTrust as well as travel expenses to attend conferences allspeaker honoraria are either declined or donated to an MScharity or to an unrestricted grant for use by her researchgroup W Kaye receives funding from the Agency for ToxicSubstances and Disease Registry the National MS Societyand the Association for the Accreditation of Human Re-search Protection Programs L Wagner receives fundingfrom the Agency for Toxic Substances and Disease Registryand National MS Society L Chen is employed full-time byKPSC S Leung is employed full-time by the University ofManitoba C Evans receives research funding from theCanadian Institutes of Health Research SaskatchewanHealth Research Foundation and the Multiple SclerosisSociety of Canada S Yao is employed full-time by theHealth Quality Council N LaRocca is employed full-timeby the National MS Society Go to NeurologyorgN forfull disclosures

Publication historyReceived by Neurology July 20 2018 Accepted in final form November24 2018

Appendix 1 Authors

Name Location Role Contribution

William JCulpepperPhD MA

Veterans HealthAdministrationWashington DCUniversity of MD

Author Study concept anddesign statisticalanalyses preparationof draft manuscriptcritical review revisionof the manuscript forcontent approval offinal manuscript

e1026 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

References1 Evans C Beland SG Kulaga S et al Incidence and prevalence of multiple sclerosis in

the Americas a systematic review Neuroepidemiology 201340195ndash2102 Kingwell E Marriott JJ Jette N et al Incidence and prevalence of multiple sclerosis in

Europe a systematic review BMC Neurol 2013131283 Makhani N Morrow SA Fisk J et al MS incidence and prevalence in Africa Asia Australia

and New Zealand a systematic review Mult Scler Relat Disord 2014348ndash604 Tricco AC Pham B Rawson NSB Manitoba and Saskatchewan administrative health

care utilization databases are used differently to answer epidemiologic researchquestions J Clin Epidemiol 200861192ndash197e112

5 Culpepper WJ Ehrmantraut M Wallin MT Flannery K Bradham DD VeteransHealth Administration Multiple Sclerosis Surveillance Registry the problem of case-finding from administrative databases J Rehabil Res Dev 20064317

6 Marrie RA YuN Blanchard JF Leung S Elliott L The rising prevalence and changingage distribution of multiple sclerosis in Manitoba Neurology 201074465ndash471

7 Nelson MT Wallin MT Marrie RA et al A new way to estimate neurologic diseaseprevalence in the United States illustrated with MS Neurology201992469ndash480

8 Wallin MT Culpepper WJ Campbell J et al The prevalence of multiple sclerosis inthe United States a population-based estimate using health care claims datasetsNeurology201992e1029ndashe1040

9 Polman CH Reingold SC Banwell B et al Diagnostic criteria for multiple sclerosis2010 revisions to the McDonald criteria Ann Neurol 201169292ndash302

10 Wallin MT Culpepper WJ Coffman P et al The Gulf War era multiple sclerosiscohort age and incidence rates by sex race and service Brain 20121351778ndash1785

11 Koebnick C Langer Gould A Gould MK et al Do the sociodemographic charac-teristics of members of a large integrated health care system represent the populationof interest Permanente J 20121637ndash41

12 Widdifield J Ivers NM Young J et al Development and validation of an adminis-trative data algorithm to estimate the disease burden and epidemiology of multiplesclerosis in Ontario Canada Mult Scler 2015211045ndash1054

13 Al-Sakran LH Marrie RA Blackburn DF Knox KB Evans CD Establishing theincidence and prevalence of multiple sclerosis in Saskatchewan Can J Neurol Sci201845295ndash303

14 Marrie RA Fisk J Stadnyk K et al The incidence and prevalence of multiple sclerosisin Nova Scotia Can J Neurol Sci 201340824ndash831

15 Canadian Institute for Health Information Rehabilitation 2016 Available at cihicaentypes-of-carehospital-carerehabilitation Accessed May 5 2016

16 Youden WJ Index for rating diagnostic tests Cancer 1950332ndash35

Appendix 1 (continued)

Name Location Role Contribution

Ruth AnnMarrieMD PhD

University ofManitoba WinnipegCanada

Author Study concept anddesign supervisionof data collectionand analysescritical reviewrevision ofthe manuscriptfor contentapproval of finalmanuscript

AnnetteLanger-Gould MD

Kaiser PermanenteSouthern CaliforniaLos Angeles

Author Study concept anddesign statisticalanalyses chart reviewconfirmation of MSdiagnosis criticalreview revision of themanuscript forcontent approval offinal manuscript

Mitchell TWallin MDMPH

Veterans HealthAdministrationWashington DCGeorgetownUniversity

Author Study concept anddesign chart reviewconfirmation of MSdiagnosis criticalreview revision of themanuscript forcontent approval offinal manuscript

Lorene MNelsonPhD

Stanford UniversityCA

Author Study concept anddesign critical reviewrevision of themanuscript forcontent

JonathanCampbellPhD

University of DenverCO

Author Statistical analysescritical review of themanuscript forcontent

HelenTremlettPhD

University of BritishColumbia VancouverCanada

Author Statistical analysescritical review of themanuscript forcontent

WendyKaye PhD

McKing ConsultingCorp Atlanta GA

Author Statistical analysescritical review of themanuscript forcontent

LaurieWagnerMPH

McKing ConsultingCorp Atlanta GA

Author Statistical analysescritical review of themanuscript forcontent

Lie HChen DrPH

Kaiser PermanenteSouthern CaliforniaLos Angeles

Author Statistical analysescritical review of themanuscript forcontent

StellaLeung MSc

University ofManitoba WinnipegCanada

Author Statistical analysescritical review of themanuscript forcontent

CharityEvans PhD

University ofSaskatchewanSaskatoon Canada

Author Statistical analysescritical review of themanuscript forcontent

ShenzhenYao MDMSc

University ofSaskatchewanSaskatoon Canada

Author Statistical analysescritical review of themanuscript forcontent

Appendix 1 (continued)

Name Location Role Contribution

Nicholas GLaRoccaPhD

National MultipleSclerosis Society NewYork NY

Author Statistical analysescritical review of themanuscript forcontent approval offinal manuscript

Appendix 2 Coinvestigators and members of the USMultiple Sclerosis Prevalence Workgroup

Name Location Role Contribution

Albert LoMD PhD

Brown UniversityProvidence RI

Coinvestigator Studyconcept anddesign

RobertMcBurneyPhD

Accelerated CureProject Boston MA

Coinvestigator Studyconcept anddesign

OlegMuravovPhD

Agency for ToxicSubstances andDisease RegistryDivision of HealthStudies Atlanta GA

Coinvestigator Studyconcept anddesign

BariTalenteEsq

National MS SocietyWashington DC

Coinvestigator Studyconcept anddesign

LeslieRitter

National MS SocietyWashington DC

Coinvestigator Studyconcept anddesign

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1027

17 Landis JR Koch GG The measurement of observer agreement for categorical dataBiometrics 197733159ndash174

18 Sim J Wright CC The kappa statistic in reliability studies use interpretation andsample size requirements Phys Ther 200585257ndash268

19 Viera AJ Garrett JM Understanding interobserver agreement the kappa statisticFam Med 200537360ndash363

20 Trisolini M Honeycutt A Wiener J Lesesne S Global Economic Impact of MultipleSclerosis Research Triangle Park RTI International 2010

21 Ng R Bernatsky S Rahme E Observation period effects on estimation of systemiclupus erythematosus incidence and prevalence in Quebec J Rheumatol 2013401334ndash1336

22 Capocaccia R Colonna M Corazziari I et al Measuring cancer prevalence in Europethe EUROPREVAL Project Ann Oncol 200213831ndash839

23 Griffiths RI OrsquoMalley CD Herbert RJ Danese MD Misclassification of incidentconditions using claims data impact of varying the period used to exclude pre-existingdisease BMC Med Res Methodol 20131332

24 Dilokthornsakul P Valuck RJ Nair KV Corboy JR Allen RR Campbell JD Multiplesclerosis prevalence in the United States commercially insured population Neurology2016861014ndash1021

25 Tiwari Li Y Zou Z Interval estimation for ratios of correlated age-adjusted ratesJ Data Sci 20108471ndash482

26 Culpepper WJ Wallin MT Magder LS et al VHA Multiple Sclerosis SurveillanceRegistry and its similarities to other multiple sclerosis cohorts J Rehab Res Dev 201552(3)263ndash272

27 Tonelli M Wiebe N Fortin M et al Methods for identifying 30 chronic con-ditions application to administrative data BMC Med Inform Decis Mak 20151531

28 Kaye WE Sanchez M Wu J Feasibility of creating a national registry using admin-istrative data in the United States Amytroph Lateral Scler Frontotemporal Degener201415433ndash439

29 Szumski NR Cheng EM Optimizing algorithms to identify Parkinsonrsquos disease caseswithin an administrative database Mov Disord 20092451ndash56

30 Hoer A Schiffhorst G Zimmermann A et al Multiple sclerosis in Germany dataanalysis of administrative prevalence and healthcare delivery in the statutory healthsystem BMC Health Serv Res 200414381

31 Chastek BJ Oleen-Burkey M Lopez-Bresnahan MV Medical chart validation of analgorithm for identifying multiple sclerosis relapse in healthcare claims J Med Econ201013618ndash625

e1028 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

DOI 101212WNL0000000000007043201992e1016-e1028 Published Online before print February 15 2019Neurology

William J Culpepper Ruth Ann Marrie Annette Langer-Gould et al datasets

Validation of an algorithm for identifying MS cases in administrative health claims

This information is current as of February 15 2019

ServicesUpdated Information amp

httpnneurologyorgcontent9210e1016fullincluding high resolution figures can be found at

References httpnneurologyorgcontent9210e1016fullref-list-1

This article cites 29 articles 6 of which you can access for free at

Citations httpnneurologyorgcontent9210e1016fullotherarticles

This article has been cited by 2 HighWire-hosted articles

Subspecialty Collections

httpnneurologyorgcgicollectionprevalence_studiesPrevalence studies

httpnneurologyorgcgicollectionmultiple_sclerosisMultiple sclerosis

httpnneurologyorgcgicollectiondiagnostic_test_assessment_Diagnostic test assessment

httpnneurologyorgcgicollectioncohort_studiesCohort studies

httpnneurologyorgcgicollectionall_health_services_researchAll Health Services Researchfollowing collection(s) This article along with others on similar topics appears in the

Permissions amp Licensing

httpwwwneurologyorgaboutabout_the_journalpermissionsits entirety can be found online atInformation about reproducing this article in parts (figurestables) or in

Reprints

httpnneurologyorgsubscribersadvertiseInformation about ordering reprints can be found online

ISSN 0028-3878 Online ISSN 1526-632XWolters Kluwer Health Inc on behalf of the American Academy of Neurology All rights reserved Print1951 it is now a weekly with 48 issues per year Copyright Copyright copy 2019 The Author(s) Published by

reg is the official journal of the American Academy of Neurology Published continuously sinceNeurology

Page 3: Validation of an algorithm for identifying MS cases in ... · 3/5/2019  · with ICD-9-CM or the Canadian version of ICD-10 codes (ICD-10-CA), depending on the year. ... the MS Clinic

and provides health care coverage toasymp20 of the population inthe region it serves The KPSC membership is largely repre-sentative of the general population in Southern California withrespect to ethnicity age and sex but underrepresents the lowestand highest ends of the socioeconomic spectrum11 Physicianvisits hospitalizations andmedications are captured in datasetsthat are linkable by a medical record number

The validation cohort consisted of all persons ge18 years of agewith ge1 inpatient (IP) or outpatient (OP) encounter di-agnosis code for MS (ICD-9 code 340) who were KPSChealth plan members between 2008 and 2010 had at least 6months of continuous membership during the study periodand had complete electronic health records (n = 4851) Werandomly selected 2935 (table 1) of these potential cases forin-depth medical records review by an MS specialist (AL-G)including IP and OP records MRI scans and diagnostic testresults to confirm a diagnosis of MS9

ManitobaIn the Canadian province of Manitoba Manitoba HealthSeniors and Active Living is responsible for the delivery ofpublicly funded health care to nearly all residents Severaldatasets are maintained that capture demographic in-formation (sex dates of birth health care coverage anddeath) hospitalizations physician services and prescriptiondrug dispensations Diagnoses for physician services are

recorded with the clinical modification of ICD-9 (ICD-9-CM) codes while diagnoses for hospitalizations are codedwith ICD-9-CM or the Canadian version of ICD-10 codes(ICD-10-CA) depending on the year

The Manitoba validation cohort started with the 400 patientsge18 years of age who were used to develop administrative casedefinitions of MS in 20076 This cohort was augmented byreview of the medical records of all persons participating inthe MS Clinic Registry (trained abstractors under the super-vision of RAM) at the Winnipeg MS Clinic who had con-sented to linkage of their medical records to administrativedata (89 of those approached) as of June 2015 and met theinclusion criteria as described above This expanded the val-idation sample to 1654 patients (table 1)

SaskatchewanThis constituted the fourth independent site In the Canadianprovince of Saskatchewan health care is publicly funded as inManitoba Datasets similar to those in Manitoba capture de-mographic information (sex dates of birth health care cov-erage and death) hospitalizations physician services andprescription drug dispensations Diagnoses for physicianservices are recorded with ICD-9-CM codes while diagnosesfor hospitalizations are coded with ICD-9 or ICD-10-CAcodes depending on the year

As described elsewhere12 the validation cohort included 200randomly selected cases from the Saskatoon MS Clinic withdiagnoses confirmed by an MS specialist and 200 controls(table 1) from the Inpatient Rehabilitation Center databaseThis database specifically captures diagnoses of chronic dis-eases including MS13

MS case definition algorithmsWe developed candidate algorithms using ICD-9ICD-10codes for MS (340G35) and lists of MS-specific disease-modifying therapies (DMTs) To develop these definitionswe reviewed the literature on validated administrative casedefinitions for MS5612ndash15 From this review we consideredthe following (1) 1 encounter with an ICD-9ICD-10 codefor MS is too nonspecific because these may represent rule-out codes (2) increasing the number of health careencounters required generally increases specificity at the ex-pense of sensitivity and (3) prescription claims for MS-specific DMTs may be helpful to improve sensitivity speci-ficity or both when available Therefore the candidate algo-rithms (table 2) varied the number of IP and OP encounterswith MS listed as one of the diagnoses for that encounter thenumber of prescription claims for a DMT required to classifya person as an MS case and the time period over which thecase definition should be met We included candidate algo-rithms that did not use prescription claims because these arenot available in all AHC datasets Because we anticipatedapplying these definitions in AHC datasets in which enroll-ment in a health insurance plan might be limited to short timeperiods we considered only 1- or 2-year time periods over

Table 1 Demographic characteristics of theMS validationcohorts by data source

Primary Validation Cohorts

VA KPSC MB SK

Total cases n 3452 2935 1654 400

Sex n ()

Male 2451 (740) 725 (242) 415 (251) 197 (493)

Female 1001 (260) 2253 (758) 1239 (749) 203 (507)

Age groupn ()

18ndash24 y 118 (34) 105 (35) 123 (74) a

25ndash34 y 928 (269) 400 (135) 434 (262) 25 (625)

35ndash44 y 1093 (316) 662 (225) 554 (335) 46 (115)

45ndash54 y 833 (241) 769 (262) 381 (230) 76 (190)

55ndash64 y 302 (87) 669 (228) 118 (71) 83 (208)

65ndash74 y 131 (38) 243 (85) a 80 (200)

ge75 y 47 (14) 87 (30) a a

Abbreviations KPSC = Kaiser Permanente Southern California MB = Man-itoba Canada MS = multiple sclerosis SK = Saskatchewan Canada VA =Veterans AdministrationSK represents a general population cohorta Cells suppressed due to small numbers in accordancewith data access andprivacy requirements

e1018 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

which to define a case Each component of these case defi-nitions is described further below

IP encounterWe defined an IP encounter as an IP admission (overnightstay) for which MS was recorded as one of the diagnoses Toaccount for transfers between institutions and to avoid doublecounting we collapsed multiple overlapping hospital admis-sion and discharge records into 1 IP encounter In addition anIP encounter with an admission date that occurred within 24hours of a prior IP encounter was counted as a single IPencounter56

PhysicianOP encountersWe defined OP encounters as a visit to an OP clinic with anICD code for MS Multiple OP visits by 1 patient within thesame day were treated as 1 OP encounter In addition OPencounters occurring within an MS IP (admission to dis-charge date) were not counted as unique OP encounters

Disease-modifying therapiesWe defined prescription claims as dispensations for any of theMS-specific DMTs approved before or during the study pe-riod including interferon beta-1a-SC interferon beta-1a-IMinterferon beta-1b-SC glatiramer acetate fingolimod andnatalizumab as identified with National Drug Codes and DrugIdentification numbers (tables e-1 and e-2 available fromDryad doiorg105061dryad4c7s325) To avoid mis-classification claims for natalizumab were not included if theindividual also had an ICD code for inflammatory boweldisease a disorder for which this medication is approvedfor use

A caveat in defining IP and OP encounters was that the VAimposes a hierarchy to the coding such that the primary di-agnosis for a given encounter should be listed first followedin descending order of importance by other diagnoses af-fecting that episode of care Not all AHC datasets impose sucha hierarchy (eg KPSC) Our assessment of the effect of thiscoding hierarchy (see Appendix 2 available from Dryad foradditional methods and results doiorg105061dryad

4c7s325) resulted in using MS as a primary diagnosis in theVA data and MS as any diagnosis in the KPSC and Manitobadatasets when counting MS encounters

AnalysisWe applied the candidate algorithms in each of the 3 primaryvalidation cohorts We compared the performance of thecandidate algorithms with the reference standard diagnoses ofMS frommedical records using sensitivity specificity positivepredictive value (PPV) and negative predictive value (NPV)In addition interrater agreement was compared between thealgorithms derived from AHC and medical records data usingthe crude accuracy score and Youden16 J statistic We chosethe J statistic over the κ statistic1718 because it is designed foruse with dichotomous data and equally weights false-positivesand false-negatives We interpreted the J statistic like κ whenneither the algorithm nor the medical records were consid-ered the reference standard as slight (0ndash020) fair(021ndash040) moderate (041ndash060) substantial (061ndash080)and excellent (081ndash10)19

To determine whether the algorithms performed consistentlyacross sociodemographic subgroups we stratified the refer-ence population by (1) sex and (2) age group (18ndash24 25ndash3435ndash44 45ndash54 55ndash64 65ndash74 ge75 years) and repeated ouranalyses for the 4 algorithms that performed best across the 3AHC datasets The ultimate choice of an algorithm was basedon performance and pragmatic considerations including easeof application and interpretation

Because our 3 primary validation datasets were drawn frompatients with at least 1 encounter for MS (or demyelinatingdisease in Manitoba) not the general population we expectedthat the specificity and predictive values of the case definitionswould be artificially depressed compared to the performance inthe general population Therefore we tested our case definitionalgorithms in a fourth independent dataset in Saskatchewan14

which included individuals with andwithoutMS as noted above

Prevalence with limited durationascertainment periodsWe intended to apply our optimal algorithm to multiple AHCdatasets available in the United States20 to determine preva-lence in 2010 based on a 3-year ascertainment period How-ever limited periods of observation may substantially reducecase detection and underestimate prevalence as is wellrecognized in the literature21ndash23 Capocaccia et al22 describehow they dealt with adjusting data from cancer registries thathad been active for short durations when creating the Europe-wide project to estimate the prevalence of the most importantcancers (EUROPREVAL) They computed what they calleda completeness index for multiple observation periods usingdata from the Connecticut Cancer Registry because it hasbeen in operation for gt50 years In the 4 registries that hadonly 5 observation years the completeness index ranged from042 to 045 The completeness index improved systematicallyas the observation period increased They go on to describe

Table 2 Description of candidate algorithms for MSa

Case definition name Number and type of claims

MS_A ge2 IP or ge3 OP

MS_B ge2 IP or ge4 OP

MS_C ge2 IP or ge5 OP

MS_D ge2 IP or ge3 OP or ge1 DMT

MS_E (IP + OP + DMT) ge 3

Abbreviations DMT = disease modifying therapy IP = inpatient admissionMS = multiple sclerosis OP = outpatient visita The performance of each algorithmwas evaluated on the basis of both a 1-year and a 2-year time period

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1019

their model-based methods to estimate the complete preva-lence based on a limited period using age- and sex-specificincidence and survival rates for cancer22 This information islacking in the US MS population and cannot be accuratelyestimated from data of only 3 yearsrsquo duration23 Therefore toassess the effect of a short ascertainment period we used theapproaches used by Ng et al21 and Capocaccia et al22 andestimated the prevalence of MS using 3 years (2008ndash2010)and 10 years (1999ndash2010) of data in the VA and Manitobadatasets and compared the difference in these estimates in2010 Prevalence in each year was defined as all persons ge18years meeting our preferred algorithm definition of MS ([IP +OP + DMT] ge 3) among those alive during the calendar yearThe 3-year and 10-year prevalence estimates in 2010 werecumulative estimates based on all those who had met thedefinition at any point in the ascertainment period and whowere alive in 2010

To validate the above analyses we repeated these proceduresusing the PharMetrics Plus Health Plan Claims Dataset fromIntercontinental Marketing Services (IMS) which was notused to develop the algorithm Here we compared the esti-mated prevalence derived from 3 years (2013ndash2015) and 9years (the longest period available) ending in 2015 The IMSdataset captures health care claims from asymp120 health plansacross the United States accounting for ge42 million coveredlives as of 2011 mirrors the demographics of the US Censuspopulation and has been used for national and regionalbenchmarking of health care use and cost2024 The datacontained in this dataset are comparable to the data in vali-dation datasets that included ICD-9-CM diagnosis and IPprocedure codes provider codes Current Procedural Ter-minology (OP procedures) codes National Drug Code placeand date of service enrollment date(s) sex year of birth andUS Census region We report 95 confidence intervals (CIs)25 for the ratio of the prevalence estimates of longer vs shorterintervals Although these prevalence estimates are highlycorrelated which would narrow these intervals substantiallywe report uncorrected 95 CIs to be conservative

Statistical analyses were conducted with either SAS version94 (SAS Institute Inc Cary NC) or SPSS version 22 (IBMInc Armonk NY)

Data availabilityThe datasets used in this study constitute secondary analyses ofdata from prior studies Therefore there are no data-sharingagreements in place for these data The exception concerns theIMS data which are a limited-use dataset provided gratis toUniversity of Maryland Baltimore faculty (WJC) and datasharing is not allowed under that agreement

ResultsThe characteristics of the 3 primary validation cohorts arerepresentative of the general MS population (table 1)56101326

As expected there was a higher proportion of males in the VAdata compared with KPSC and Manitoba but the age dis-tributions were similar across cohorts

Initial validation of algorithmGenerally when an algorithm was applied using 1 year of datathe sensitivity was lower than when applied using 2 years ofdata while specificity was higher when applied using 1 year ofdata (table 3) The more stringent the case definition that isthe greater the number of claims required was the lower thesensitivity was and the higher the specificity was The additionof prescription claims for DMT to an algorithm increasedsensitivity without a loss of specificity leading to better per-formance overall

Algorithm MS_E ([IP + OP + DMT] ge 3) had the highestsensitivity for both time periods In contrast the algorithm withthe highest specificity wasMS_C (ge2 IP orge5OP) in the 1-yearand 2-year time periods The PPVs were high for all algorithmswhereas the NPVs were moderately low suggesting that thealgorithms were better at ruling in cases of MS than ruling outnon-MS cases (in a population with at least 1 claim for MS)

The proportion of misclassified cases (false-positives and false-negatives) was relatively low as reflected in the moderatelyhigh to high accuracy scores algorithm MS_E had the highestaccuracy over both time periods The Youden J statistic mir-rored the crude accuracy scores revealing a substantial level ofagreement between each algorithm and the reference standardAlgorithm MS_E had the highest or second highest J statisticfor the 1-year time period with scores of ge060

We performed the stratified analyses on only the algorithmsthat included DMTs because these were the best-performingdefinitions (MS_D and MS_E in tables 3) Because thestratified results were nearly identical across these 4 casedefinitions we present only the results for MS_E-1 ([IP + OP+ DMT] ge 3) for the 1-year time period (figures 1 and 2)

The test characteristics remained stable by sex (men figure 1Bwomen figure 1C) including sensitivity specificity PPV andaccuracy In the VA the NPV and J statistic were lower amongwomen compared to men but were stable across sexes in theKPSC and Manitoba datasets Similarly sensitivity PPV andaccuracy were relatively consistent across age groups andsomewhat stable for specificity TheNPV and J statistic showedthe greatest variation however despite this variability theMS_E-1 algorithm performed well across datasets and age groupsThus given its consistent and superior performance algorithmMS_E-1 was chosen as the preferred algorithm

Validation in the general populationThe characteristics of the Saskatchewan validation cohort werecomparable to those of the other cohorts When we applied thealgorithms to the Saskatchewan general population specific-ities and PPVs were all ge096 and the NPVs were all ge091 Ofthe algorithms using 1 year of data algorithm MS_E-1

e1020 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

performed the best along with algorithm MS_D-1 For pre-ferred algorithmMS_E the sensitivity was 096 specificity was099 PPVwas 099 NPVwas 096 and Youden J was 096 Oneof the 2-year algorithms MS_D-2 also had a specificity of 099but a marginally higher sensitivity of 097

Corrections for limited-durationascertainment periodsFigure 3 compares estimated prevalence based on a 3-year(2008ndash2010) ascertainment period to that from 10 years(2000ndash2010) of data as of 2010 in the VA and Manitoba

Table 3 Summary of test statistics for each algorithm based on years of data for each data source

MS algorithm Data source Sensitivity Specificity PPV NPV Accuracy Youden J

1-y epoch (denoted by 21)

MS_A-1 ge2 IP or ge3 OP VA 861 825 978 396 857 069

KPSC 789 749 955 342 783 054

MB 897 672 953 465 870 057

MS_B-1 ge2 IP or ge4 OP VA 815 895 986 348 823 071

KPSC 669 823 964 269 690 050

MB 790 779 964 332 789 057

MS_C-1 ge2 IP or ge5 OP VA 761 904 986 294 775 067

KPSC 553 874 967 224 595 043

MB 685 856 973 267 705 054

MS_D-1 ge2 IP or ge3 OP or ge1 DMT VA 866 825 978 404 862 069

KPSC 874 730 957 461 856 060

MB 932 667 954 568 901 060

MS_E-1 (IP + OP + DMT) ge 3 VA 872 822 978 414 867 069

KPSC 855 762 961 436 843 062

MB 934 661 954 568 901 060

2-y epoch (denoted by 22)

MS_A-2 ge2 IP or ge3 OP VA 883 787 974 426 873 067

KPSC 834 707 951 385 818 051

MB 932 610 947 546 894 054

MS_B-2 ge2 IP or ge4 OP VA 860 848 981 400 859 071

KPSC 747 798 962 318 754 055

MB 876 697 956 429 855 057

MS_C-2 ge2 IP or ge5 OP VA 835 880 964 371 840 072

KPSC 662 851 968 270 686 051

MB 817 805 969 370 816 062

MS_D-2 ge2 IP or ge3 OP or ge1 DMT VA 885 787 974 430 875 067

KPSC 896 688 951 493 869 058

MB 950 610 948 620 910 056

MS_ E-2 (IP + OP + DMT) ge 3 VA 891 784 974 442 880 068

KPSC 881 709 954 466 859 059

MB 951 600 947 619 909 055

Abbreviations DMT = disease modifying therapy IP = inpatient admission KPSC =Kaiser Permanente Southern California MB = Manitoba Canada MS =multiple sclerosis NPV = negative predictive value OP = outpatient visit PPV = positive predictive value VA = Veterans Administration

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1021

datasets The underestimation of a 3-year ascertainmentperiod ranged from 37 (95 CI 13ndash66) in the VAdataset to 47 (95 CI 23ndash76) in the Manitoba datasetIn the IMS (validation) dataset the underestimation ofa 3-year (2013ndash2015) ascertainment period compared to

a 9-year period (2007ndash2015) as of 2015 was 39 (95 CI13ndash71) The estimated prevalence of MS rose steeply upto about 2010 in all 3 datasets (figures 3 and 4) and thenincreased an average of 23y thereafter (21 in the VAfigure 3A 25 in the IMS figure 4)

Figure 1 Performance of algorithm MS_E-1 [(IP + OP + DMT) ge 3] stratified by sex across the VA KPSC and MB datasets

(A) Men and women (B) men only and (C) women onlyData arepresented as a proportion that can range between 0 and 1 DMT =disease-modifying therapy IP = inpatient KPSC = Kaiser Perma-nente Southern California MB =ManitobaMS =multiple sclerosisNPV = negative predictive value OP = outpatient PPV = positivepredictive value VA = Department of Veterans Affairs

e1022 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

DiscussionWe undertook this study to identify an algorithm that wouldaccurately and consistently identify cases of MS across dis-parate AHC datasets representing different types of health

care systems and geographic regions The consistency of thefindings across these datasets supports the generalizability ofthe algorithms to other AHC datasets in the United Stateswhen such validation studies are not feasible The preferredalgorithm (MS_E-1) required ge3 encounters (IP +OP+DMT)

Figure 2 Performance of algorithm MS_E-1 [(IP + OP + DMT) ge 3] stratified by age group across the VA KPSC and MBdatasets

(A) Sensitivity (B) specificity (C) positive predictive value (D) negative predictive value (E) accuracy and (F) Youden J statisticData fromManitoba (MB) for the55- to 64- and 64- to 74-year age groups are suppressed due to small cell sizes Data are presented as a proportion that can range between 0 and 1 DMT =disease-modifying therapy IP = inpatient KPSC = Kaiser Permanente Southern California MS = multiple sclerosis NPV = negative predictive value OP =outpatient PPV = positive predictive value VA = Department of Veterans Affairs

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1023

for MS in a 1-year period Algorithms that did not includeDMT performed less well than those that did However weincluded the non-DMT algorithms to assess their ability toaccurately identify cases of MS in studies in which pharmacy

claims data might not be available MS_A-1 performed nearlyas well as the preferred algorithm (MS_E-1) and could beused when DMT is not available Thus by testing a variety ofalgorithms our study provides value and the findings can be

Figure 3 Comparison of prevalence based on a 3- vs 10-year ascertainment period as of 2010 in the (A) VA and (B) MBdatasets

CI = confidence interval MB = Manitoba VA =Department of Veterans Affairs

Figure 4 Comparison of prevalence based on a 3- vs 9-year ascertainment period as of 2015 in the IMS (validation) dataset

CI = confidence interval IMS = IntercontinentalMarketing Services

e1024 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

applied in future research into the prevalence of MS acrossjurisdictions

Algorithm MS E-1 ([IP + OP + DMT] ge 3) was preferredbecause of its consistent high performance and ease ofapplication We hereafter refer to it as the MS algorithmThe 1-year algorithm readily allows annually updating ofprevalence estimates in datasets with limited years of dataSensitivity and PPV were high consistent with previouslypublished definitions developed with similar methods inthe United States (VA) and Canada561213 Specificity inour primary validation cohorts was modest reflecting thatour study cohorts were selected on the basis of having atleast 1 diagnosis for MS Tonelli et al27 evaluated algo-rithms to measure 40 comorbid conditions in AHC datafrom Calgary Alberta Canada They defined an algorithmas valid if both sensitivity and PPV were ge070 compared toa reference cohort They did not consider specificity orNPV because they are generally ge090 in the assessment ofchronic disease in the general population Consistent withthese observations and recommendations when the MSalgorithm was applied to the general population in Sas-katchewan sensitivity (096) specificity (099) and PPV(099) markedly improved These findings support ourpreferred algorithm as a valid method for identifying casesof MS in AHC datasets

Minor variation in performance was observed whenstratified by sex and age However accuracy remainedhigh and the Youden J statistic remained substantialThese findings indicate that the MS algorithm was an ac-curate and reliable algorithm for identifying cases of MSin these disparate datasets and outperforms similar algo-rithms for identifying amyotrophic lateral sclerosis28 andParkinson disease29

When using AHC data it is important for investigators todetermine whether there is any hierarchy to the way thatdiagnosis codes are recorded because the performance of analgorithm can be adversely affected In the VA such a codinghierarchy exists and when encounters with MS in any di-agnosis position were used vs in the primary diagnosis posi-tion specificity was compromised as was the Youden Jstatistic Surprisingly accuracy was only slightly reduced butwould likely be more adversely affected when applied to anentire dataset as opposed to the subset of cases with a di-agnosis code for MS as used in this study

Prior reports have defined cohorts in claims data based ona single encounter with MS listed as one of the diagnoses3031

This strategy can result in the inclusion of a substantialnumber of false-positives Prior work on the VA algorithm5

showed that a rigorous algorithm (similar to the MS algo-rithm) ruled out 43 of the cases who had only a single MSencounter This guided our decision to require multipleencounters as part of all of our algorithms We strongly en-courage other investigators to avoid the use of a single MS

encounter to identify an MS cohort because depending onthe proportion of false-positives included results could besubstantially biased

We found that prevalence estimates varied with the dura-tion of the observation period in 3 different datasets (2from the United States and 1 from Manitoba Canada)suggesting that observation periods should be maximizedto optimize case detection Specifically we found that anobservation period of 3 years underestimated the preva-lence of MS by 37 (95 CI 13ndash66) in the VA datasetand 47 (95 CI 23ndash76) in the Manitoba dataset Itshould be noted that the 95 CI for these adjustmentfactors overlapped implying that the growth range isnonsignificantly different for these 2 datasets The findingsin the IMS (validation) dataset corroborated results fromthe VA and suggest that a 3-year ascertainment periodunderestimates the 10-year prevalence in US datasets byasymp38 regardless of the final year of the time period ofinterest (2010 vs 2015) This is consistent with previousfindings in Quebec Canada regarding systemic lupuserythematosus a chronic disease like MS for which healthcare use may vary over time or with disease activity21 Inthat study when a 3-year period was used prevalence wasunderestimated by 66 compared with a 15-year periodwhile a 5-year period underestimated prevalence by 39Similar results have been reported for the assessment of theprevalence of cancer in Europe22 Registries with 5 years ofdata underestimated prevalence by 42 to 45 as theduration of the registry increased the underestimation ofprevalence decreased systematically For investigatorswithout access to many years of data these findings suggestthat an adjustment factor should be applied to obtaina more stable and accurate prevalence estimate The dif-ference between the US (VA and IMS) and Manitobadatasets likely results because Manitoba has a universalhealth system with a single government payer and its AHCdatasets capture gt98 of the population6 whereas the UShealth care system involves multiple insurance payersmaking it more difficult to achieve complete ascertainmentof cases Thus the underestimation value of 37 providesa conservative (ie lower bound) inflation factor whereasan inflation factor of 47 based on the Manitoba data likelybetter reflects what might be expected with nearly fullcapture of the US population over a 10-year period(ie upper bound)

This study included a limited number of AHC datasets thatwere restricted to North America which may limit general-izability to datasets from other countries However thedatasets that were included provided a diverse sampling ofthe MS population The IMS validation dataset did not coverthe same time period as the VA and Manitoba datasetshowever the 3-year underestimation in prevalence was nearlythe same as the VA dataset despite differences in the timeperiods assessed Our algorithms relied on ICD-9-CM coding(except for IP encounters in the Manitoba dataset which

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1025

used ICD-10-CM coding from 2004 onward) and have notbeen validated with ICD-10-CM coding It is unlikely thealgorithms will perform differently when ICD-10-CM codingis used because MS is still represented by a single 3-digit code

We conducted a rigorous assessment of algorithms to identifycases of MS in several different AHC datasets The algorithmof choice required the sum of ge3 MS-related IP admissionsMS-related OP encounters or pharmacy claims for a DMTwithin a calendar year Furthermore this algorithm performedremarkably well across different datasets and demographicgroups We propose that this MS algorithm be adopted forfuture MS-related studies using AHC datasets to enhancecomparability across studies

AcknowledgmentThe authors thank the following members of the US MultipleSclerosis Prevalence Workgroup for their contributions AlbertLo Robert McBurney Oleg Muravov Bari Talente SteveBuka and Leslie Ritter The authors also thank the followingkey staff members at the National MS Society for theircontributions to this project Cathy Carlson Tim CoetzeeSherri Giger Weyman Johnson Eileen Madray GrahamMcReynolds and Cyndi Zagieboylo In addition the authorsare indebted to Shan Jin (Baltimore VA Medical Center) forhis assistance in compiling the VA data The results andconclusions presented are those of the authors No officialendorsement by the Department of Veterans Affairs ManitobaHealth Saskatchewan Ministry of Health SaskatchewanHealth Quality Council or Pharmetrics Inc is intended orshould be inferred

Study fundingThe study was funded by contracts from the National MSSociety to the principal investigators of the US MultipleSclerosis Prevalence Workgroup (HC-1508-05693)

DisclosureW Culpepper has additional research funding from the Na-tional MS Society and receives support from the VA MSCenter of Excellence R Marrie is supported by the WaughFamily Chair in Multiple Sclerosis and receives researchfunding from Canadian Institutes of Health Research Re-search Manitoba Multiple Sclerosis Society of CanadaMultiple Sclerosis Scientific Foundation National MultipleSclerosis Society Crohnrsquos and Colitis Canada and the Con-sortium of Multiple Sclerosis Centers She also serves on theEditorial Board of Neurology A Langer-Gould was site prin-cipal investigator for 2 industry-sponsored phase 3 clinicaltrials (Biogen Idec Hoffman-LaRoche) and was site principalinvestigator for 1 industry-sponsored observation study(Biogen Idec) She receives grant support from the NIHNational Institute of Neurological Disorders and Stroke(NINDS) Patient-Centered Outcomes Research Instituteand the National MS Society M Wallin has served on datasafety monitoring boards for NINDS-funded trials has beenan associate editor for the Encyclopedia of the Neurologic

Sciences and received funding support from the National MSSociety and the Department of Veterans Affairs Merit ReviewResearch Program L Nelson receives grants from the Cen-ters for Disease Control and Prevention NIH and NationalMS Society and contracts from the Agency for Toxic Sub-stances and Diseases Registry She receives compensation forserving as a consultant to Acumen Inc and is on the DataMonitoring Committee of Neuropace J Campbell has con-sultancy or research grants over the past 5 years with theAgency for Healthcare Research and Quality ALSAMFoundation Amgen AstraZeneca Bayer Biogen IdecBoehringer Ingelheim Centers for Disease Control andPrevention Colorado Medicaid Enterprise CommunityPartners Inc Institute for Clinical and Economic ReviewMallinckrodt NIH National MS Society Kaiser PermanentePhRMA Foundation Teva Research in Real Life Ltd Re-spiratory Effectiveness Group and Zogenix Inc H Tremlettis the Canada Research Chair for Neuroepidemiology andMultiple Sclerosis She receives research support from theNational MS Society the Canadian Institutes of Health Re-search the Multiple Sclerosis Society of Canada and theMultiple Sclerosis Scientific Research Foundation In ad-dition in the last 5 years she has received research supportfrom the Multiple Sclerosis Society of Canada the MichaelSmith Foundation for Health Research and the UK MSTrust as well as travel expenses to attend conferences allspeaker honoraria are either declined or donated to an MScharity or to an unrestricted grant for use by her researchgroup W Kaye receives funding from the Agency for ToxicSubstances and Disease Registry the National MS Societyand the Association for the Accreditation of Human Re-search Protection Programs L Wagner receives fundingfrom the Agency for Toxic Substances and Disease Registryand National MS Society L Chen is employed full-time byKPSC S Leung is employed full-time by the University ofManitoba C Evans receives research funding from theCanadian Institutes of Health Research SaskatchewanHealth Research Foundation and the Multiple SclerosisSociety of Canada S Yao is employed full-time by theHealth Quality Council N LaRocca is employed full-timeby the National MS Society Go to NeurologyorgN forfull disclosures

Publication historyReceived by Neurology July 20 2018 Accepted in final form November24 2018

Appendix 1 Authors

Name Location Role Contribution

William JCulpepperPhD MA

Veterans HealthAdministrationWashington DCUniversity of MD

Author Study concept anddesign statisticalanalyses preparationof draft manuscriptcritical review revisionof the manuscript forcontent approval offinal manuscript

e1026 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

References1 Evans C Beland SG Kulaga S et al Incidence and prevalence of multiple sclerosis in

the Americas a systematic review Neuroepidemiology 201340195ndash2102 Kingwell E Marriott JJ Jette N et al Incidence and prevalence of multiple sclerosis in

Europe a systematic review BMC Neurol 2013131283 Makhani N Morrow SA Fisk J et al MS incidence and prevalence in Africa Asia Australia

and New Zealand a systematic review Mult Scler Relat Disord 2014348ndash604 Tricco AC Pham B Rawson NSB Manitoba and Saskatchewan administrative health

care utilization databases are used differently to answer epidemiologic researchquestions J Clin Epidemiol 200861192ndash197e112

5 Culpepper WJ Ehrmantraut M Wallin MT Flannery K Bradham DD VeteransHealth Administration Multiple Sclerosis Surveillance Registry the problem of case-finding from administrative databases J Rehabil Res Dev 20064317

6 Marrie RA YuN Blanchard JF Leung S Elliott L The rising prevalence and changingage distribution of multiple sclerosis in Manitoba Neurology 201074465ndash471

7 Nelson MT Wallin MT Marrie RA et al A new way to estimate neurologic diseaseprevalence in the United States illustrated with MS Neurology201992469ndash480

8 Wallin MT Culpepper WJ Campbell J et al The prevalence of multiple sclerosis inthe United States a population-based estimate using health care claims datasetsNeurology201992e1029ndashe1040

9 Polman CH Reingold SC Banwell B et al Diagnostic criteria for multiple sclerosis2010 revisions to the McDonald criteria Ann Neurol 201169292ndash302

10 Wallin MT Culpepper WJ Coffman P et al The Gulf War era multiple sclerosiscohort age and incidence rates by sex race and service Brain 20121351778ndash1785

11 Koebnick C Langer Gould A Gould MK et al Do the sociodemographic charac-teristics of members of a large integrated health care system represent the populationof interest Permanente J 20121637ndash41

12 Widdifield J Ivers NM Young J et al Development and validation of an adminis-trative data algorithm to estimate the disease burden and epidemiology of multiplesclerosis in Ontario Canada Mult Scler 2015211045ndash1054

13 Al-Sakran LH Marrie RA Blackburn DF Knox KB Evans CD Establishing theincidence and prevalence of multiple sclerosis in Saskatchewan Can J Neurol Sci201845295ndash303

14 Marrie RA Fisk J Stadnyk K et al The incidence and prevalence of multiple sclerosisin Nova Scotia Can J Neurol Sci 201340824ndash831

15 Canadian Institute for Health Information Rehabilitation 2016 Available at cihicaentypes-of-carehospital-carerehabilitation Accessed May 5 2016

16 Youden WJ Index for rating diagnostic tests Cancer 1950332ndash35

Appendix 1 (continued)

Name Location Role Contribution

Ruth AnnMarrieMD PhD

University ofManitoba WinnipegCanada

Author Study concept anddesign supervisionof data collectionand analysescritical reviewrevision ofthe manuscriptfor contentapproval of finalmanuscript

AnnetteLanger-Gould MD

Kaiser PermanenteSouthern CaliforniaLos Angeles

Author Study concept anddesign statisticalanalyses chart reviewconfirmation of MSdiagnosis criticalreview revision of themanuscript forcontent approval offinal manuscript

Mitchell TWallin MDMPH

Veterans HealthAdministrationWashington DCGeorgetownUniversity

Author Study concept anddesign chart reviewconfirmation of MSdiagnosis criticalreview revision of themanuscript forcontent approval offinal manuscript

Lorene MNelsonPhD

Stanford UniversityCA

Author Study concept anddesign critical reviewrevision of themanuscript forcontent

JonathanCampbellPhD

University of DenverCO

Author Statistical analysescritical review of themanuscript forcontent

HelenTremlettPhD

University of BritishColumbia VancouverCanada

Author Statistical analysescritical review of themanuscript forcontent

WendyKaye PhD

McKing ConsultingCorp Atlanta GA

Author Statistical analysescritical review of themanuscript forcontent

LaurieWagnerMPH

McKing ConsultingCorp Atlanta GA

Author Statistical analysescritical review of themanuscript forcontent

Lie HChen DrPH

Kaiser PermanenteSouthern CaliforniaLos Angeles

Author Statistical analysescritical review of themanuscript forcontent

StellaLeung MSc

University ofManitoba WinnipegCanada

Author Statistical analysescritical review of themanuscript forcontent

CharityEvans PhD

University ofSaskatchewanSaskatoon Canada

Author Statistical analysescritical review of themanuscript forcontent

ShenzhenYao MDMSc

University ofSaskatchewanSaskatoon Canada

Author Statistical analysescritical review of themanuscript forcontent

Appendix 1 (continued)

Name Location Role Contribution

Nicholas GLaRoccaPhD

National MultipleSclerosis Society NewYork NY

Author Statistical analysescritical review of themanuscript forcontent approval offinal manuscript

Appendix 2 Coinvestigators and members of the USMultiple Sclerosis Prevalence Workgroup

Name Location Role Contribution

Albert LoMD PhD

Brown UniversityProvidence RI

Coinvestigator Studyconcept anddesign

RobertMcBurneyPhD

Accelerated CureProject Boston MA

Coinvestigator Studyconcept anddesign

OlegMuravovPhD

Agency for ToxicSubstances andDisease RegistryDivision of HealthStudies Atlanta GA

Coinvestigator Studyconcept anddesign

BariTalenteEsq

National MS SocietyWashington DC

Coinvestigator Studyconcept anddesign

LeslieRitter

National MS SocietyWashington DC

Coinvestigator Studyconcept anddesign

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1027

17 Landis JR Koch GG The measurement of observer agreement for categorical dataBiometrics 197733159ndash174

18 Sim J Wright CC The kappa statistic in reliability studies use interpretation andsample size requirements Phys Ther 200585257ndash268

19 Viera AJ Garrett JM Understanding interobserver agreement the kappa statisticFam Med 200537360ndash363

20 Trisolini M Honeycutt A Wiener J Lesesne S Global Economic Impact of MultipleSclerosis Research Triangle Park RTI International 2010

21 Ng R Bernatsky S Rahme E Observation period effects on estimation of systemiclupus erythematosus incidence and prevalence in Quebec J Rheumatol 2013401334ndash1336

22 Capocaccia R Colonna M Corazziari I et al Measuring cancer prevalence in Europethe EUROPREVAL Project Ann Oncol 200213831ndash839

23 Griffiths RI OrsquoMalley CD Herbert RJ Danese MD Misclassification of incidentconditions using claims data impact of varying the period used to exclude pre-existingdisease BMC Med Res Methodol 20131332

24 Dilokthornsakul P Valuck RJ Nair KV Corboy JR Allen RR Campbell JD Multiplesclerosis prevalence in the United States commercially insured population Neurology2016861014ndash1021

25 Tiwari Li Y Zou Z Interval estimation for ratios of correlated age-adjusted ratesJ Data Sci 20108471ndash482

26 Culpepper WJ Wallin MT Magder LS et al VHA Multiple Sclerosis SurveillanceRegistry and its similarities to other multiple sclerosis cohorts J Rehab Res Dev 201552(3)263ndash272

27 Tonelli M Wiebe N Fortin M et al Methods for identifying 30 chronic con-ditions application to administrative data BMC Med Inform Decis Mak 20151531

28 Kaye WE Sanchez M Wu J Feasibility of creating a national registry using admin-istrative data in the United States Amytroph Lateral Scler Frontotemporal Degener201415433ndash439

29 Szumski NR Cheng EM Optimizing algorithms to identify Parkinsonrsquos disease caseswithin an administrative database Mov Disord 20092451ndash56

30 Hoer A Schiffhorst G Zimmermann A et al Multiple sclerosis in Germany dataanalysis of administrative prevalence and healthcare delivery in the statutory healthsystem BMC Health Serv Res 200414381

31 Chastek BJ Oleen-Burkey M Lopez-Bresnahan MV Medical chart validation of analgorithm for identifying multiple sclerosis relapse in healthcare claims J Med Econ201013618ndash625

e1028 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

DOI 101212WNL0000000000007043201992e1016-e1028 Published Online before print February 15 2019Neurology

William J Culpepper Ruth Ann Marrie Annette Langer-Gould et al datasets

Validation of an algorithm for identifying MS cases in administrative health claims

This information is current as of February 15 2019

ServicesUpdated Information amp

httpnneurologyorgcontent9210e1016fullincluding high resolution figures can be found at

References httpnneurologyorgcontent9210e1016fullref-list-1

This article cites 29 articles 6 of which you can access for free at

Citations httpnneurologyorgcontent9210e1016fullotherarticles

This article has been cited by 2 HighWire-hosted articles

Subspecialty Collections

httpnneurologyorgcgicollectionprevalence_studiesPrevalence studies

httpnneurologyorgcgicollectionmultiple_sclerosisMultiple sclerosis

httpnneurologyorgcgicollectiondiagnostic_test_assessment_Diagnostic test assessment

httpnneurologyorgcgicollectioncohort_studiesCohort studies

httpnneurologyorgcgicollectionall_health_services_researchAll Health Services Researchfollowing collection(s) This article along with others on similar topics appears in the

Permissions amp Licensing

httpwwwneurologyorgaboutabout_the_journalpermissionsits entirety can be found online atInformation about reproducing this article in parts (figurestables) or in

Reprints

httpnneurologyorgsubscribersadvertiseInformation about ordering reprints can be found online

ISSN 0028-3878 Online ISSN 1526-632XWolters Kluwer Health Inc on behalf of the American Academy of Neurology All rights reserved Print1951 it is now a weekly with 48 issues per year Copyright Copyright copy 2019 The Author(s) Published by

reg is the official journal of the American Academy of Neurology Published continuously sinceNeurology

Page 4: Validation of an algorithm for identifying MS cases in ... · 3/5/2019  · with ICD-9-CM or the Canadian version of ICD-10 codes (ICD-10-CA), depending on the year. ... the MS Clinic

which to define a case Each component of these case defi-nitions is described further below

IP encounterWe defined an IP encounter as an IP admission (overnightstay) for which MS was recorded as one of the diagnoses Toaccount for transfers between institutions and to avoid doublecounting we collapsed multiple overlapping hospital admis-sion and discharge records into 1 IP encounter In addition anIP encounter with an admission date that occurred within 24hours of a prior IP encounter was counted as a single IPencounter56

PhysicianOP encountersWe defined OP encounters as a visit to an OP clinic with anICD code for MS Multiple OP visits by 1 patient within thesame day were treated as 1 OP encounter In addition OPencounters occurring within an MS IP (admission to dis-charge date) were not counted as unique OP encounters

Disease-modifying therapiesWe defined prescription claims as dispensations for any of theMS-specific DMTs approved before or during the study pe-riod including interferon beta-1a-SC interferon beta-1a-IMinterferon beta-1b-SC glatiramer acetate fingolimod andnatalizumab as identified with National Drug Codes and DrugIdentification numbers (tables e-1 and e-2 available fromDryad doiorg105061dryad4c7s325) To avoid mis-classification claims for natalizumab were not included if theindividual also had an ICD code for inflammatory boweldisease a disorder for which this medication is approvedfor use

A caveat in defining IP and OP encounters was that the VAimposes a hierarchy to the coding such that the primary di-agnosis for a given encounter should be listed first followedin descending order of importance by other diagnoses af-fecting that episode of care Not all AHC datasets impose sucha hierarchy (eg KPSC) Our assessment of the effect of thiscoding hierarchy (see Appendix 2 available from Dryad foradditional methods and results doiorg105061dryad

4c7s325) resulted in using MS as a primary diagnosis in theVA data and MS as any diagnosis in the KPSC and Manitobadatasets when counting MS encounters

AnalysisWe applied the candidate algorithms in each of the 3 primaryvalidation cohorts We compared the performance of thecandidate algorithms with the reference standard diagnoses ofMS frommedical records using sensitivity specificity positivepredictive value (PPV) and negative predictive value (NPV)In addition interrater agreement was compared between thealgorithms derived from AHC and medical records data usingthe crude accuracy score and Youden16 J statistic We chosethe J statistic over the κ statistic1718 because it is designed foruse with dichotomous data and equally weights false-positivesand false-negatives We interpreted the J statistic like κ whenneither the algorithm nor the medical records were consid-ered the reference standard as slight (0ndash020) fair(021ndash040) moderate (041ndash060) substantial (061ndash080)and excellent (081ndash10)19

To determine whether the algorithms performed consistentlyacross sociodemographic subgroups we stratified the refer-ence population by (1) sex and (2) age group (18ndash24 25ndash3435ndash44 45ndash54 55ndash64 65ndash74 ge75 years) and repeated ouranalyses for the 4 algorithms that performed best across the 3AHC datasets The ultimate choice of an algorithm was basedon performance and pragmatic considerations including easeof application and interpretation

Because our 3 primary validation datasets were drawn frompatients with at least 1 encounter for MS (or demyelinatingdisease in Manitoba) not the general population we expectedthat the specificity and predictive values of the case definitionswould be artificially depressed compared to the performance inthe general population Therefore we tested our case definitionalgorithms in a fourth independent dataset in Saskatchewan14

which included individuals with andwithoutMS as noted above

Prevalence with limited durationascertainment periodsWe intended to apply our optimal algorithm to multiple AHCdatasets available in the United States20 to determine preva-lence in 2010 based on a 3-year ascertainment period How-ever limited periods of observation may substantially reducecase detection and underestimate prevalence as is wellrecognized in the literature21ndash23 Capocaccia et al22 describehow they dealt with adjusting data from cancer registries thathad been active for short durations when creating the Europe-wide project to estimate the prevalence of the most importantcancers (EUROPREVAL) They computed what they calleda completeness index for multiple observation periods usingdata from the Connecticut Cancer Registry because it hasbeen in operation for gt50 years In the 4 registries that hadonly 5 observation years the completeness index ranged from042 to 045 The completeness index improved systematicallyas the observation period increased They go on to describe

Table 2 Description of candidate algorithms for MSa

Case definition name Number and type of claims

MS_A ge2 IP or ge3 OP

MS_B ge2 IP or ge4 OP

MS_C ge2 IP or ge5 OP

MS_D ge2 IP or ge3 OP or ge1 DMT

MS_E (IP + OP + DMT) ge 3

Abbreviations DMT = disease modifying therapy IP = inpatient admissionMS = multiple sclerosis OP = outpatient visita The performance of each algorithmwas evaluated on the basis of both a 1-year and a 2-year time period

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1019

their model-based methods to estimate the complete preva-lence based on a limited period using age- and sex-specificincidence and survival rates for cancer22 This information islacking in the US MS population and cannot be accuratelyestimated from data of only 3 yearsrsquo duration23 Therefore toassess the effect of a short ascertainment period we used theapproaches used by Ng et al21 and Capocaccia et al22 andestimated the prevalence of MS using 3 years (2008ndash2010)and 10 years (1999ndash2010) of data in the VA and Manitobadatasets and compared the difference in these estimates in2010 Prevalence in each year was defined as all persons ge18years meeting our preferred algorithm definition of MS ([IP +OP + DMT] ge 3) among those alive during the calendar yearThe 3-year and 10-year prevalence estimates in 2010 werecumulative estimates based on all those who had met thedefinition at any point in the ascertainment period and whowere alive in 2010

To validate the above analyses we repeated these proceduresusing the PharMetrics Plus Health Plan Claims Dataset fromIntercontinental Marketing Services (IMS) which was notused to develop the algorithm Here we compared the esti-mated prevalence derived from 3 years (2013ndash2015) and 9years (the longest period available) ending in 2015 The IMSdataset captures health care claims from asymp120 health plansacross the United States accounting for ge42 million coveredlives as of 2011 mirrors the demographics of the US Censuspopulation and has been used for national and regionalbenchmarking of health care use and cost2024 The datacontained in this dataset are comparable to the data in vali-dation datasets that included ICD-9-CM diagnosis and IPprocedure codes provider codes Current Procedural Ter-minology (OP procedures) codes National Drug Code placeand date of service enrollment date(s) sex year of birth andUS Census region We report 95 confidence intervals (CIs)25 for the ratio of the prevalence estimates of longer vs shorterintervals Although these prevalence estimates are highlycorrelated which would narrow these intervals substantiallywe report uncorrected 95 CIs to be conservative

Statistical analyses were conducted with either SAS version94 (SAS Institute Inc Cary NC) or SPSS version 22 (IBMInc Armonk NY)

Data availabilityThe datasets used in this study constitute secondary analyses ofdata from prior studies Therefore there are no data-sharingagreements in place for these data The exception concerns theIMS data which are a limited-use dataset provided gratis toUniversity of Maryland Baltimore faculty (WJC) and datasharing is not allowed under that agreement

ResultsThe characteristics of the 3 primary validation cohorts arerepresentative of the general MS population (table 1)56101326

As expected there was a higher proportion of males in the VAdata compared with KPSC and Manitoba but the age dis-tributions were similar across cohorts

Initial validation of algorithmGenerally when an algorithm was applied using 1 year of datathe sensitivity was lower than when applied using 2 years ofdata while specificity was higher when applied using 1 year ofdata (table 3) The more stringent the case definition that isthe greater the number of claims required was the lower thesensitivity was and the higher the specificity was The additionof prescription claims for DMT to an algorithm increasedsensitivity without a loss of specificity leading to better per-formance overall

Algorithm MS_E ([IP + OP + DMT] ge 3) had the highestsensitivity for both time periods In contrast the algorithm withthe highest specificity wasMS_C (ge2 IP orge5OP) in the 1-yearand 2-year time periods The PPVs were high for all algorithmswhereas the NPVs were moderately low suggesting that thealgorithms were better at ruling in cases of MS than ruling outnon-MS cases (in a population with at least 1 claim for MS)

The proportion of misclassified cases (false-positives and false-negatives) was relatively low as reflected in the moderatelyhigh to high accuracy scores algorithm MS_E had the highestaccuracy over both time periods The Youden J statistic mir-rored the crude accuracy scores revealing a substantial level ofagreement between each algorithm and the reference standardAlgorithm MS_E had the highest or second highest J statisticfor the 1-year time period with scores of ge060

We performed the stratified analyses on only the algorithmsthat included DMTs because these were the best-performingdefinitions (MS_D and MS_E in tables 3) Because thestratified results were nearly identical across these 4 casedefinitions we present only the results for MS_E-1 ([IP + OP+ DMT] ge 3) for the 1-year time period (figures 1 and 2)

The test characteristics remained stable by sex (men figure 1Bwomen figure 1C) including sensitivity specificity PPV andaccuracy In the VA the NPV and J statistic were lower amongwomen compared to men but were stable across sexes in theKPSC and Manitoba datasets Similarly sensitivity PPV andaccuracy were relatively consistent across age groups andsomewhat stable for specificity TheNPV and J statistic showedthe greatest variation however despite this variability theMS_E-1 algorithm performed well across datasets and age groupsThus given its consistent and superior performance algorithmMS_E-1 was chosen as the preferred algorithm

Validation in the general populationThe characteristics of the Saskatchewan validation cohort werecomparable to those of the other cohorts When we applied thealgorithms to the Saskatchewan general population specific-ities and PPVs were all ge096 and the NPVs were all ge091 Ofthe algorithms using 1 year of data algorithm MS_E-1

e1020 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

performed the best along with algorithm MS_D-1 For pre-ferred algorithmMS_E the sensitivity was 096 specificity was099 PPVwas 099 NPVwas 096 and Youden J was 096 Oneof the 2-year algorithms MS_D-2 also had a specificity of 099but a marginally higher sensitivity of 097

Corrections for limited-durationascertainment periodsFigure 3 compares estimated prevalence based on a 3-year(2008ndash2010) ascertainment period to that from 10 years(2000ndash2010) of data as of 2010 in the VA and Manitoba

Table 3 Summary of test statistics for each algorithm based on years of data for each data source

MS algorithm Data source Sensitivity Specificity PPV NPV Accuracy Youden J

1-y epoch (denoted by 21)

MS_A-1 ge2 IP or ge3 OP VA 861 825 978 396 857 069

KPSC 789 749 955 342 783 054

MB 897 672 953 465 870 057

MS_B-1 ge2 IP or ge4 OP VA 815 895 986 348 823 071

KPSC 669 823 964 269 690 050

MB 790 779 964 332 789 057

MS_C-1 ge2 IP or ge5 OP VA 761 904 986 294 775 067

KPSC 553 874 967 224 595 043

MB 685 856 973 267 705 054

MS_D-1 ge2 IP or ge3 OP or ge1 DMT VA 866 825 978 404 862 069

KPSC 874 730 957 461 856 060

MB 932 667 954 568 901 060

MS_E-1 (IP + OP + DMT) ge 3 VA 872 822 978 414 867 069

KPSC 855 762 961 436 843 062

MB 934 661 954 568 901 060

2-y epoch (denoted by 22)

MS_A-2 ge2 IP or ge3 OP VA 883 787 974 426 873 067

KPSC 834 707 951 385 818 051

MB 932 610 947 546 894 054

MS_B-2 ge2 IP or ge4 OP VA 860 848 981 400 859 071

KPSC 747 798 962 318 754 055

MB 876 697 956 429 855 057

MS_C-2 ge2 IP or ge5 OP VA 835 880 964 371 840 072

KPSC 662 851 968 270 686 051

MB 817 805 969 370 816 062

MS_D-2 ge2 IP or ge3 OP or ge1 DMT VA 885 787 974 430 875 067

KPSC 896 688 951 493 869 058

MB 950 610 948 620 910 056

MS_ E-2 (IP + OP + DMT) ge 3 VA 891 784 974 442 880 068

KPSC 881 709 954 466 859 059

MB 951 600 947 619 909 055

Abbreviations DMT = disease modifying therapy IP = inpatient admission KPSC =Kaiser Permanente Southern California MB = Manitoba Canada MS =multiple sclerosis NPV = negative predictive value OP = outpatient visit PPV = positive predictive value VA = Veterans Administration

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1021

datasets The underestimation of a 3-year ascertainmentperiod ranged from 37 (95 CI 13ndash66) in the VAdataset to 47 (95 CI 23ndash76) in the Manitoba datasetIn the IMS (validation) dataset the underestimation ofa 3-year (2013ndash2015) ascertainment period compared to

a 9-year period (2007ndash2015) as of 2015 was 39 (95 CI13ndash71) The estimated prevalence of MS rose steeply upto about 2010 in all 3 datasets (figures 3 and 4) and thenincreased an average of 23y thereafter (21 in the VAfigure 3A 25 in the IMS figure 4)

Figure 1 Performance of algorithm MS_E-1 [(IP + OP + DMT) ge 3] stratified by sex across the VA KPSC and MB datasets

(A) Men and women (B) men only and (C) women onlyData arepresented as a proportion that can range between 0 and 1 DMT =disease-modifying therapy IP = inpatient KPSC = Kaiser Perma-nente Southern California MB =ManitobaMS =multiple sclerosisNPV = negative predictive value OP = outpatient PPV = positivepredictive value VA = Department of Veterans Affairs

e1022 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

DiscussionWe undertook this study to identify an algorithm that wouldaccurately and consistently identify cases of MS across dis-parate AHC datasets representing different types of health

care systems and geographic regions The consistency of thefindings across these datasets supports the generalizability ofthe algorithms to other AHC datasets in the United Stateswhen such validation studies are not feasible The preferredalgorithm (MS_E-1) required ge3 encounters (IP +OP+DMT)

Figure 2 Performance of algorithm MS_E-1 [(IP + OP + DMT) ge 3] stratified by age group across the VA KPSC and MBdatasets

(A) Sensitivity (B) specificity (C) positive predictive value (D) negative predictive value (E) accuracy and (F) Youden J statisticData fromManitoba (MB) for the55- to 64- and 64- to 74-year age groups are suppressed due to small cell sizes Data are presented as a proportion that can range between 0 and 1 DMT =disease-modifying therapy IP = inpatient KPSC = Kaiser Permanente Southern California MS = multiple sclerosis NPV = negative predictive value OP =outpatient PPV = positive predictive value VA = Department of Veterans Affairs

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1023

for MS in a 1-year period Algorithms that did not includeDMT performed less well than those that did However weincluded the non-DMT algorithms to assess their ability toaccurately identify cases of MS in studies in which pharmacy

claims data might not be available MS_A-1 performed nearlyas well as the preferred algorithm (MS_E-1) and could beused when DMT is not available Thus by testing a variety ofalgorithms our study provides value and the findings can be

Figure 3 Comparison of prevalence based on a 3- vs 10-year ascertainment period as of 2010 in the (A) VA and (B) MBdatasets

CI = confidence interval MB = Manitoba VA =Department of Veterans Affairs

Figure 4 Comparison of prevalence based on a 3- vs 9-year ascertainment period as of 2015 in the IMS (validation) dataset

CI = confidence interval IMS = IntercontinentalMarketing Services

e1024 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

applied in future research into the prevalence of MS acrossjurisdictions

Algorithm MS E-1 ([IP + OP + DMT] ge 3) was preferredbecause of its consistent high performance and ease ofapplication We hereafter refer to it as the MS algorithmThe 1-year algorithm readily allows annually updating ofprevalence estimates in datasets with limited years of dataSensitivity and PPV were high consistent with previouslypublished definitions developed with similar methods inthe United States (VA) and Canada561213 Specificity inour primary validation cohorts was modest reflecting thatour study cohorts were selected on the basis of having atleast 1 diagnosis for MS Tonelli et al27 evaluated algo-rithms to measure 40 comorbid conditions in AHC datafrom Calgary Alberta Canada They defined an algorithmas valid if both sensitivity and PPV were ge070 compared toa reference cohort They did not consider specificity orNPV because they are generally ge090 in the assessment ofchronic disease in the general population Consistent withthese observations and recommendations when the MSalgorithm was applied to the general population in Sas-katchewan sensitivity (096) specificity (099) and PPV(099) markedly improved These findings support ourpreferred algorithm as a valid method for identifying casesof MS in AHC datasets

Minor variation in performance was observed whenstratified by sex and age However accuracy remainedhigh and the Youden J statistic remained substantialThese findings indicate that the MS algorithm was an ac-curate and reliable algorithm for identifying cases of MSin these disparate datasets and outperforms similar algo-rithms for identifying amyotrophic lateral sclerosis28 andParkinson disease29

When using AHC data it is important for investigators todetermine whether there is any hierarchy to the way thatdiagnosis codes are recorded because the performance of analgorithm can be adversely affected In the VA such a codinghierarchy exists and when encounters with MS in any di-agnosis position were used vs in the primary diagnosis posi-tion specificity was compromised as was the Youden Jstatistic Surprisingly accuracy was only slightly reduced butwould likely be more adversely affected when applied to anentire dataset as opposed to the subset of cases with a di-agnosis code for MS as used in this study

Prior reports have defined cohorts in claims data based ona single encounter with MS listed as one of the diagnoses3031

This strategy can result in the inclusion of a substantialnumber of false-positives Prior work on the VA algorithm5

showed that a rigorous algorithm (similar to the MS algo-rithm) ruled out 43 of the cases who had only a single MSencounter This guided our decision to require multipleencounters as part of all of our algorithms We strongly en-courage other investigators to avoid the use of a single MS

encounter to identify an MS cohort because depending onthe proportion of false-positives included results could besubstantially biased

We found that prevalence estimates varied with the dura-tion of the observation period in 3 different datasets (2from the United States and 1 from Manitoba Canada)suggesting that observation periods should be maximizedto optimize case detection Specifically we found that anobservation period of 3 years underestimated the preva-lence of MS by 37 (95 CI 13ndash66) in the VA datasetand 47 (95 CI 23ndash76) in the Manitoba dataset Itshould be noted that the 95 CI for these adjustmentfactors overlapped implying that the growth range isnonsignificantly different for these 2 datasets The findingsin the IMS (validation) dataset corroborated results fromthe VA and suggest that a 3-year ascertainment periodunderestimates the 10-year prevalence in US datasets byasymp38 regardless of the final year of the time period ofinterest (2010 vs 2015) This is consistent with previousfindings in Quebec Canada regarding systemic lupuserythematosus a chronic disease like MS for which healthcare use may vary over time or with disease activity21 Inthat study when a 3-year period was used prevalence wasunderestimated by 66 compared with a 15-year periodwhile a 5-year period underestimated prevalence by 39Similar results have been reported for the assessment of theprevalence of cancer in Europe22 Registries with 5 years ofdata underestimated prevalence by 42 to 45 as theduration of the registry increased the underestimation ofprevalence decreased systematically For investigatorswithout access to many years of data these findings suggestthat an adjustment factor should be applied to obtaina more stable and accurate prevalence estimate The dif-ference between the US (VA and IMS) and Manitobadatasets likely results because Manitoba has a universalhealth system with a single government payer and its AHCdatasets capture gt98 of the population6 whereas the UShealth care system involves multiple insurance payersmaking it more difficult to achieve complete ascertainmentof cases Thus the underestimation value of 37 providesa conservative (ie lower bound) inflation factor whereasan inflation factor of 47 based on the Manitoba data likelybetter reflects what might be expected with nearly fullcapture of the US population over a 10-year period(ie upper bound)

This study included a limited number of AHC datasets thatwere restricted to North America which may limit general-izability to datasets from other countries However thedatasets that were included provided a diverse sampling ofthe MS population The IMS validation dataset did not coverthe same time period as the VA and Manitoba datasetshowever the 3-year underestimation in prevalence was nearlythe same as the VA dataset despite differences in the timeperiods assessed Our algorithms relied on ICD-9-CM coding(except for IP encounters in the Manitoba dataset which

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1025

used ICD-10-CM coding from 2004 onward) and have notbeen validated with ICD-10-CM coding It is unlikely thealgorithms will perform differently when ICD-10-CM codingis used because MS is still represented by a single 3-digit code

We conducted a rigorous assessment of algorithms to identifycases of MS in several different AHC datasets The algorithmof choice required the sum of ge3 MS-related IP admissionsMS-related OP encounters or pharmacy claims for a DMTwithin a calendar year Furthermore this algorithm performedremarkably well across different datasets and demographicgroups We propose that this MS algorithm be adopted forfuture MS-related studies using AHC datasets to enhancecomparability across studies

AcknowledgmentThe authors thank the following members of the US MultipleSclerosis Prevalence Workgroup for their contributions AlbertLo Robert McBurney Oleg Muravov Bari Talente SteveBuka and Leslie Ritter The authors also thank the followingkey staff members at the National MS Society for theircontributions to this project Cathy Carlson Tim CoetzeeSherri Giger Weyman Johnson Eileen Madray GrahamMcReynolds and Cyndi Zagieboylo In addition the authorsare indebted to Shan Jin (Baltimore VA Medical Center) forhis assistance in compiling the VA data The results andconclusions presented are those of the authors No officialendorsement by the Department of Veterans Affairs ManitobaHealth Saskatchewan Ministry of Health SaskatchewanHealth Quality Council or Pharmetrics Inc is intended orshould be inferred

Study fundingThe study was funded by contracts from the National MSSociety to the principal investigators of the US MultipleSclerosis Prevalence Workgroup (HC-1508-05693)

DisclosureW Culpepper has additional research funding from the Na-tional MS Society and receives support from the VA MSCenter of Excellence R Marrie is supported by the WaughFamily Chair in Multiple Sclerosis and receives researchfunding from Canadian Institutes of Health Research Re-search Manitoba Multiple Sclerosis Society of CanadaMultiple Sclerosis Scientific Foundation National MultipleSclerosis Society Crohnrsquos and Colitis Canada and the Con-sortium of Multiple Sclerosis Centers She also serves on theEditorial Board of Neurology A Langer-Gould was site prin-cipal investigator for 2 industry-sponsored phase 3 clinicaltrials (Biogen Idec Hoffman-LaRoche) and was site principalinvestigator for 1 industry-sponsored observation study(Biogen Idec) She receives grant support from the NIHNational Institute of Neurological Disorders and Stroke(NINDS) Patient-Centered Outcomes Research Instituteand the National MS Society M Wallin has served on datasafety monitoring boards for NINDS-funded trials has beenan associate editor for the Encyclopedia of the Neurologic

Sciences and received funding support from the National MSSociety and the Department of Veterans Affairs Merit ReviewResearch Program L Nelson receives grants from the Cen-ters for Disease Control and Prevention NIH and NationalMS Society and contracts from the Agency for Toxic Sub-stances and Diseases Registry She receives compensation forserving as a consultant to Acumen Inc and is on the DataMonitoring Committee of Neuropace J Campbell has con-sultancy or research grants over the past 5 years with theAgency for Healthcare Research and Quality ALSAMFoundation Amgen AstraZeneca Bayer Biogen IdecBoehringer Ingelheim Centers for Disease Control andPrevention Colorado Medicaid Enterprise CommunityPartners Inc Institute for Clinical and Economic ReviewMallinckrodt NIH National MS Society Kaiser PermanentePhRMA Foundation Teva Research in Real Life Ltd Re-spiratory Effectiveness Group and Zogenix Inc H Tremlettis the Canada Research Chair for Neuroepidemiology andMultiple Sclerosis She receives research support from theNational MS Society the Canadian Institutes of Health Re-search the Multiple Sclerosis Society of Canada and theMultiple Sclerosis Scientific Research Foundation In ad-dition in the last 5 years she has received research supportfrom the Multiple Sclerosis Society of Canada the MichaelSmith Foundation for Health Research and the UK MSTrust as well as travel expenses to attend conferences allspeaker honoraria are either declined or donated to an MScharity or to an unrestricted grant for use by her researchgroup W Kaye receives funding from the Agency for ToxicSubstances and Disease Registry the National MS Societyand the Association for the Accreditation of Human Re-search Protection Programs L Wagner receives fundingfrom the Agency for Toxic Substances and Disease Registryand National MS Society L Chen is employed full-time byKPSC S Leung is employed full-time by the University ofManitoba C Evans receives research funding from theCanadian Institutes of Health Research SaskatchewanHealth Research Foundation and the Multiple SclerosisSociety of Canada S Yao is employed full-time by theHealth Quality Council N LaRocca is employed full-timeby the National MS Society Go to NeurologyorgN forfull disclosures

Publication historyReceived by Neurology July 20 2018 Accepted in final form November24 2018

Appendix 1 Authors

Name Location Role Contribution

William JCulpepperPhD MA

Veterans HealthAdministrationWashington DCUniversity of MD

Author Study concept anddesign statisticalanalyses preparationof draft manuscriptcritical review revisionof the manuscript forcontent approval offinal manuscript

e1026 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

References1 Evans C Beland SG Kulaga S et al Incidence and prevalence of multiple sclerosis in

the Americas a systematic review Neuroepidemiology 201340195ndash2102 Kingwell E Marriott JJ Jette N et al Incidence and prevalence of multiple sclerosis in

Europe a systematic review BMC Neurol 2013131283 Makhani N Morrow SA Fisk J et al MS incidence and prevalence in Africa Asia Australia

and New Zealand a systematic review Mult Scler Relat Disord 2014348ndash604 Tricco AC Pham B Rawson NSB Manitoba and Saskatchewan administrative health

care utilization databases are used differently to answer epidemiologic researchquestions J Clin Epidemiol 200861192ndash197e112

5 Culpepper WJ Ehrmantraut M Wallin MT Flannery K Bradham DD VeteransHealth Administration Multiple Sclerosis Surveillance Registry the problem of case-finding from administrative databases J Rehabil Res Dev 20064317

6 Marrie RA YuN Blanchard JF Leung S Elliott L The rising prevalence and changingage distribution of multiple sclerosis in Manitoba Neurology 201074465ndash471

7 Nelson MT Wallin MT Marrie RA et al A new way to estimate neurologic diseaseprevalence in the United States illustrated with MS Neurology201992469ndash480

8 Wallin MT Culpepper WJ Campbell J et al The prevalence of multiple sclerosis inthe United States a population-based estimate using health care claims datasetsNeurology201992e1029ndashe1040

9 Polman CH Reingold SC Banwell B et al Diagnostic criteria for multiple sclerosis2010 revisions to the McDonald criteria Ann Neurol 201169292ndash302

10 Wallin MT Culpepper WJ Coffman P et al The Gulf War era multiple sclerosiscohort age and incidence rates by sex race and service Brain 20121351778ndash1785

11 Koebnick C Langer Gould A Gould MK et al Do the sociodemographic charac-teristics of members of a large integrated health care system represent the populationof interest Permanente J 20121637ndash41

12 Widdifield J Ivers NM Young J et al Development and validation of an adminis-trative data algorithm to estimate the disease burden and epidemiology of multiplesclerosis in Ontario Canada Mult Scler 2015211045ndash1054

13 Al-Sakran LH Marrie RA Blackburn DF Knox KB Evans CD Establishing theincidence and prevalence of multiple sclerosis in Saskatchewan Can J Neurol Sci201845295ndash303

14 Marrie RA Fisk J Stadnyk K et al The incidence and prevalence of multiple sclerosisin Nova Scotia Can J Neurol Sci 201340824ndash831

15 Canadian Institute for Health Information Rehabilitation 2016 Available at cihicaentypes-of-carehospital-carerehabilitation Accessed May 5 2016

16 Youden WJ Index for rating diagnostic tests Cancer 1950332ndash35

Appendix 1 (continued)

Name Location Role Contribution

Ruth AnnMarrieMD PhD

University ofManitoba WinnipegCanada

Author Study concept anddesign supervisionof data collectionand analysescritical reviewrevision ofthe manuscriptfor contentapproval of finalmanuscript

AnnetteLanger-Gould MD

Kaiser PermanenteSouthern CaliforniaLos Angeles

Author Study concept anddesign statisticalanalyses chart reviewconfirmation of MSdiagnosis criticalreview revision of themanuscript forcontent approval offinal manuscript

Mitchell TWallin MDMPH

Veterans HealthAdministrationWashington DCGeorgetownUniversity

Author Study concept anddesign chart reviewconfirmation of MSdiagnosis criticalreview revision of themanuscript forcontent approval offinal manuscript

Lorene MNelsonPhD

Stanford UniversityCA

Author Study concept anddesign critical reviewrevision of themanuscript forcontent

JonathanCampbellPhD

University of DenverCO

Author Statistical analysescritical review of themanuscript forcontent

HelenTremlettPhD

University of BritishColumbia VancouverCanada

Author Statistical analysescritical review of themanuscript forcontent

WendyKaye PhD

McKing ConsultingCorp Atlanta GA

Author Statistical analysescritical review of themanuscript forcontent

LaurieWagnerMPH

McKing ConsultingCorp Atlanta GA

Author Statistical analysescritical review of themanuscript forcontent

Lie HChen DrPH

Kaiser PermanenteSouthern CaliforniaLos Angeles

Author Statistical analysescritical review of themanuscript forcontent

StellaLeung MSc

University ofManitoba WinnipegCanada

Author Statistical analysescritical review of themanuscript forcontent

CharityEvans PhD

University ofSaskatchewanSaskatoon Canada

Author Statistical analysescritical review of themanuscript forcontent

ShenzhenYao MDMSc

University ofSaskatchewanSaskatoon Canada

Author Statistical analysescritical review of themanuscript forcontent

Appendix 1 (continued)

Name Location Role Contribution

Nicholas GLaRoccaPhD

National MultipleSclerosis Society NewYork NY

Author Statistical analysescritical review of themanuscript forcontent approval offinal manuscript

Appendix 2 Coinvestigators and members of the USMultiple Sclerosis Prevalence Workgroup

Name Location Role Contribution

Albert LoMD PhD

Brown UniversityProvidence RI

Coinvestigator Studyconcept anddesign

RobertMcBurneyPhD

Accelerated CureProject Boston MA

Coinvestigator Studyconcept anddesign

OlegMuravovPhD

Agency for ToxicSubstances andDisease RegistryDivision of HealthStudies Atlanta GA

Coinvestigator Studyconcept anddesign

BariTalenteEsq

National MS SocietyWashington DC

Coinvestigator Studyconcept anddesign

LeslieRitter

National MS SocietyWashington DC

Coinvestigator Studyconcept anddesign

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1027

17 Landis JR Koch GG The measurement of observer agreement for categorical dataBiometrics 197733159ndash174

18 Sim J Wright CC The kappa statistic in reliability studies use interpretation andsample size requirements Phys Ther 200585257ndash268

19 Viera AJ Garrett JM Understanding interobserver agreement the kappa statisticFam Med 200537360ndash363

20 Trisolini M Honeycutt A Wiener J Lesesne S Global Economic Impact of MultipleSclerosis Research Triangle Park RTI International 2010

21 Ng R Bernatsky S Rahme E Observation period effects on estimation of systemiclupus erythematosus incidence and prevalence in Quebec J Rheumatol 2013401334ndash1336

22 Capocaccia R Colonna M Corazziari I et al Measuring cancer prevalence in Europethe EUROPREVAL Project Ann Oncol 200213831ndash839

23 Griffiths RI OrsquoMalley CD Herbert RJ Danese MD Misclassification of incidentconditions using claims data impact of varying the period used to exclude pre-existingdisease BMC Med Res Methodol 20131332

24 Dilokthornsakul P Valuck RJ Nair KV Corboy JR Allen RR Campbell JD Multiplesclerosis prevalence in the United States commercially insured population Neurology2016861014ndash1021

25 Tiwari Li Y Zou Z Interval estimation for ratios of correlated age-adjusted ratesJ Data Sci 20108471ndash482

26 Culpepper WJ Wallin MT Magder LS et al VHA Multiple Sclerosis SurveillanceRegistry and its similarities to other multiple sclerosis cohorts J Rehab Res Dev 201552(3)263ndash272

27 Tonelli M Wiebe N Fortin M et al Methods for identifying 30 chronic con-ditions application to administrative data BMC Med Inform Decis Mak 20151531

28 Kaye WE Sanchez M Wu J Feasibility of creating a national registry using admin-istrative data in the United States Amytroph Lateral Scler Frontotemporal Degener201415433ndash439

29 Szumski NR Cheng EM Optimizing algorithms to identify Parkinsonrsquos disease caseswithin an administrative database Mov Disord 20092451ndash56

30 Hoer A Schiffhorst G Zimmermann A et al Multiple sclerosis in Germany dataanalysis of administrative prevalence and healthcare delivery in the statutory healthsystem BMC Health Serv Res 200414381

31 Chastek BJ Oleen-Burkey M Lopez-Bresnahan MV Medical chart validation of analgorithm for identifying multiple sclerosis relapse in healthcare claims J Med Econ201013618ndash625

e1028 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

DOI 101212WNL0000000000007043201992e1016-e1028 Published Online before print February 15 2019Neurology

William J Culpepper Ruth Ann Marrie Annette Langer-Gould et al datasets

Validation of an algorithm for identifying MS cases in administrative health claims

This information is current as of February 15 2019

ServicesUpdated Information amp

httpnneurologyorgcontent9210e1016fullincluding high resolution figures can be found at

References httpnneurologyorgcontent9210e1016fullref-list-1

This article cites 29 articles 6 of which you can access for free at

Citations httpnneurologyorgcontent9210e1016fullotherarticles

This article has been cited by 2 HighWire-hosted articles

Subspecialty Collections

httpnneurologyorgcgicollectionprevalence_studiesPrevalence studies

httpnneurologyorgcgicollectionmultiple_sclerosisMultiple sclerosis

httpnneurologyorgcgicollectiondiagnostic_test_assessment_Diagnostic test assessment

httpnneurologyorgcgicollectioncohort_studiesCohort studies

httpnneurologyorgcgicollectionall_health_services_researchAll Health Services Researchfollowing collection(s) This article along with others on similar topics appears in the

Permissions amp Licensing

httpwwwneurologyorgaboutabout_the_journalpermissionsits entirety can be found online atInformation about reproducing this article in parts (figurestables) or in

Reprints

httpnneurologyorgsubscribersadvertiseInformation about ordering reprints can be found online

ISSN 0028-3878 Online ISSN 1526-632XWolters Kluwer Health Inc on behalf of the American Academy of Neurology All rights reserved Print1951 it is now a weekly with 48 issues per year Copyright Copyright copy 2019 The Author(s) Published by

reg is the official journal of the American Academy of Neurology Published continuously sinceNeurology

Page 5: Validation of an algorithm for identifying MS cases in ... · 3/5/2019  · with ICD-9-CM or the Canadian version of ICD-10 codes (ICD-10-CA), depending on the year. ... the MS Clinic

their model-based methods to estimate the complete preva-lence based on a limited period using age- and sex-specificincidence and survival rates for cancer22 This information islacking in the US MS population and cannot be accuratelyestimated from data of only 3 yearsrsquo duration23 Therefore toassess the effect of a short ascertainment period we used theapproaches used by Ng et al21 and Capocaccia et al22 andestimated the prevalence of MS using 3 years (2008ndash2010)and 10 years (1999ndash2010) of data in the VA and Manitobadatasets and compared the difference in these estimates in2010 Prevalence in each year was defined as all persons ge18years meeting our preferred algorithm definition of MS ([IP +OP + DMT] ge 3) among those alive during the calendar yearThe 3-year and 10-year prevalence estimates in 2010 werecumulative estimates based on all those who had met thedefinition at any point in the ascertainment period and whowere alive in 2010

To validate the above analyses we repeated these proceduresusing the PharMetrics Plus Health Plan Claims Dataset fromIntercontinental Marketing Services (IMS) which was notused to develop the algorithm Here we compared the esti-mated prevalence derived from 3 years (2013ndash2015) and 9years (the longest period available) ending in 2015 The IMSdataset captures health care claims from asymp120 health plansacross the United States accounting for ge42 million coveredlives as of 2011 mirrors the demographics of the US Censuspopulation and has been used for national and regionalbenchmarking of health care use and cost2024 The datacontained in this dataset are comparable to the data in vali-dation datasets that included ICD-9-CM diagnosis and IPprocedure codes provider codes Current Procedural Ter-minology (OP procedures) codes National Drug Code placeand date of service enrollment date(s) sex year of birth andUS Census region We report 95 confidence intervals (CIs)25 for the ratio of the prevalence estimates of longer vs shorterintervals Although these prevalence estimates are highlycorrelated which would narrow these intervals substantiallywe report uncorrected 95 CIs to be conservative

Statistical analyses were conducted with either SAS version94 (SAS Institute Inc Cary NC) or SPSS version 22 (IBMInc Armonk NY)

Data availabilityThe datasets used in this study constitute secondary analyses ofdata from prior studies Therefore there are no data-sharingagreements in place for these data The exception concerns theIMS data which are a limited-use dataset provided gratis toUniversity of Maryland Baltimore faculty (WJC) and datasharing is not allowed under that agreement

ResultsThe characteristics of the 3 primary validation cohorts arerepresentative of the general MS population (table 1)56101326

As expected there was a higher proportion of males in the VAdata compared with KPSC and Manitoba but the age dis-tributions were similar across cohorts

Initial validation of algorithmGenerally when an algorithm was applied using 1 year of datathe sensitivity was lower than when applied using 2 years ofdata while specificity was higher when applied using 1 year ofdata (table 3) The more stringent the case definition that isthe greater the number of claims required was the lower thesensitivity was and the higher the specificity was The additionof prescription claims for DMT to an algorithm increasedsensitivity without a loss of specificity leading to better per-formance overall

Algorithm MS_E ([IP + OP + DMT] ge 3) had the highestsensitivity for both time periods In contrast the algorithm withthe highest specificity wasMS_C (ge2 IP orge5OP) in the 1-yearand 2-year time periods The PPVs were high for all algorithmswhereas the NPVs were moderately low suggesting that thealgorithms were better at ruling in cases of MS than ruling outnon-MS cases (in a population with at least 1 claim for MS)

The proportion of misclassified cases (false-positives and false-negatives) was relatively low as reflected in the moderatelyhigh to high accuracy scores algorithm MS_E had the highestaccuracy over both time periods The Youden J statistic mir-rored the crude accuracy scores revealing a substantial level ofagreement between each algorithm and the reference standardAlgorithm MS_E had the highest or second highest J statisticfor the 1-year time period with scores of ge060

We performed the stratified analyses on only the algorithmsthat included DMTs because these were the best-performingdefinitions (MS_D and MS_E in tables 3) Because thestratified results were nearly identical across these 4 casedefinitions we present only the results for MS_E-1 ([IP + OP+ DMT] ge 3) for the 1-year time period (figures 1 and 2)

The test characteristics remained stable by sex (men figure 1Bwomen figure 1C) including sensitivity specificity PPV andaccuracy In the VA the NPV and J statistic were lower amongwomen compared to men but were stable across sexes in theKPSC and Manitoba datasets Similarly sensitivity PPV andaccuracy were relatively consistent across age groups andsomewhat stable for specificity TheNPV and J statistic showedthe greatest variation however despite this variability theMS_E-1 algorithm performed well across datasets and age groupsThus given its consistent and superior performance algorithmMS_E-1 was chosen as the preferred algorithm

Validation in the general populationThe characteristics of the Saskatchewan validation cohort werecomparable to those of the other cohorts When we applied thealgorithms to the Saskatchewan general population specific-ities and PPVs were all ge096 and the NPVs were all ge091 Ofthe algorithms using 1 year of data algorithm MS_E-1

e1020 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

performed the best along with algorithm MS_D-1 For pre-ferred algorithmMS_E the sensitivity was 096 specificity was099 PPVwas 099 NPVwas 096 and Youden J was 096 Oneof the 2-year algorithms MS_D-2 also had a specificity of 099but a marginally higher sensitivity of 097

Corrections for limited-durationascertainment periodsFigure 3 compares estimated prevalence based on a 3-year(2008ndash2010) ascertainment period to that from 10 years(2000ndash2010) of data as of 2010 in the VA and Manitoba

Table 3 Summary of test statistics for each algorithm based on years of data for each data source

MS algorithm Data source Sensitivity Specificity PPV NPV Accuracy Youden J

1-y epoch (denoted by 21)

MS_A-1 ge2 IP or ge3 OP VA 861 825 978 396 857 069

KPSC 789 749 955 342 783 054

MB 897 672 953 465 870 057

MS_B-1 ge2 IP or ge4 OP VA 815 895 986 348 823 071

KPSC 669 823 964 269 690 050

MB 790 779 964 332 789 057

MS_C-1 ge2 IP or ge5 OP VA 761 904 986 294 775 067

KPSC 553 874 967 224 595 043

MB 685 856 973 267 705 054

MS_D-1 ge2 IP or ge3 OP or ge1 DMT VA 866 825 978 404 862 069

KPSC 874 730 957 461 856 060

MB 932 667 954 568 901 060

MS_E-1 (IP + OP + DMT) ge 3 VA 872 822 978 414 867 069

KPSC 855 762 961 436 843 062

MB 934 661 954 568 901 060

2-y epoch (denoted by 22)

MS_A-2 ge2 IP or ge3 OP VA 883 787 974 426 873 067

KPSC 834 707 951 385 818 051

MB 932 610 947 546 894 054

MS_B-2 ge2 IP or ge4 OP VA 860 848 981 400 859 071

KPSC 747 798 962 318 754 055

MB 876 697 956 429 855 057

MS_C-2 ge2 IP or ge5 OP VA 835 880 964 371 840 072

KPSC 662 851 968 270 686 051

MB 817 805 969 370 816 062

MS_D-2 ge2 IP or ge3 OP or ge1 DMT VA 885 787 974 430 875 067

KPSC 896 688 951 493 869 058

MB 950 610 948 620 910 056

MS_ E-2 (IP + OP + DMT) ge 3 VA 891 784 974 442 880 068

KPSC 881 709 954 466 859 059

MB 951 600 947 619 909 055

Abbreviations DMT = disease modifying therapy IP = inpatient admission KPSC =Kaiser Permanente Southern California MB = Manitoba Canada MS =multiple sclerosis NPV = negative predictive value OP = outpatient visit PPV = positive predictive value VA = Veterans Administration

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1021

datasets The underestimation of a 3-year ascertainmentperiod ranged from 37 (95 CI 13ndash66) in the VAdataset to 47 (95 CI 23ndash76) in the Manitoba datasetIn the IMS (validation) dataset the underestimation ofa 3-year (2013ndash2015) ascertainment period compared to

a 9-year period (2007ndash2015) as of 2015 was 39 (95 CI13ndash71) The estimated prevalence of MS rose steeply upto about 2010 in all 3 datasets (figures 3 and 4) and thenincreased an average of 23y thereafter (21 in the VAfigure 3A 25 in the IMS figure 4)

Figure 1 Performance of algorithm MS_E-1 [(IP + OP + DMT) ge 3] stratified by sex across the VA KPSC and MB datasets

(A) Men and women (B) men only and (C) women onlyData arepresented as a proportion that can range between 0 and 1 DMT =disease-modifying therapy IP = inpatient KPSC = Kaiser Perma-nente Southern California MB =ManitobaMS =multiple sclerosisNPV = negative predictive value OP = outpatient PPV = positivepredictive value VA = Department of Veterans Affairs

e1022 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

DiscussionWe undertook this study to identify an algorithm that wouldaccurately and consistently identify cases of MS across dis-parate AHC datasets representing different types of health

care systems and geographic regions The consistency of thefindings across these datasets supports the generalizability ofthe algorithms to other AHC datasets in the United Stateswhen such validation studies are not feasible The preferredalgorithm (MS_E-1) required ge3 encounters (IP +OP+DMT)

Figure 2 Performance of algorithm MS_E-1 [(IP + OP + DMT) ge 3] stratified by age group across the VA KPSC and MBdatasets

(A) Sensitivity (B) specificity (C) positive predictive value (D) negative predictive value (E) accuracy and (F) Youden J statisticData fromManitoba (MB) for the55- to 64- and 64- to 74-year age groups are suppressed due to small cell sizes Data are presented as a proportion that can range between 0 and 1 DMT =disease-modifying therapy IP = inpatient KPSC = Kaiser Permanente Southern California MS = multiple sclerosis NPV = negative predictive value OP =outpatient PPV = positive predictive value VA = Department of Veterans Affairs

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1023

for MS in a 1-year period Algorithms that did not includeDMT performed less well than those that did However weincluded the non-DMT algorithms to assess their ability toaccurately identify cases of MS in studies in which pharmacy

claims data might not be available MS_A-1 performed nearlyas well as the preferred algorithm (MS_E-1) and could beused when DMT is not available Thus by testing a variety ofalgorithms our study provides value and the findings can be

Figure 3 Comparison of prevalence based on a 3- vs 10-year ascertainment period as of 2010 in the (A) VA and (B) MBdatasets

CI = confidence interval MB = Manitoba VA =Department of Veterans Affairs

Figure 4 Comparison of prevalence based on a 3- vs 9-year ascertainment period as of 2015 in the IMS (validation) dataset

CI = confidence interval IMS = IntercontinentalMarketing Services

e1024 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

applied in future research into the prevalence of MS acrossjurisdictions

Algorithm MS E-1 ([IP + OP + DMT] ge 3) was preferredbecause of its consistent high performance and ease ofapplication We hereafter refer to it as the MS algorithmThe 1-year algorithm readily allows annually updating ofprevalence estimates in datasets with limited years of dataSensitivity and PPV were high consistent with previouslypublished definitions developed with similar methods inthe United States (VA) and Canada561213 Specificity inour primary validation cohorts was modest reflecting thatour study cohorts were selected on the basis of having atleast 1 diagnosis for MS Tonelli et al27 evaluated algo-rithms to measure 40 comorbid conditions in AHC datafrom Calgary Alberta Canada They defined an algorithmas valid if both sensitivity and PPV were ge070 compared toa reference cohort They did not consider specificity orNPV because they are generally ge090 in the assessment ofchronic disease in the general population Consistent withthese observations and recommendations when the MSalgorithm was applied to the general population in Sas-katchewan sensitivity (096) specificity (099) and PPV(099) markedly improved These findings support ourpreferred algorithm as a valid method for identifying casesof MS in AHC datasets

Minor variation in performance was observed whenstratified by sex and age However accuracy remainedhigh and the Youden J statistic remained substantialThese findings indicate that the MS algorithm was an ac-curate and reliable algorithm for identifying cases of MSin these disparate datasets and outperforms similar algo-rithms for identifying amyotrophic lateral sclerosis28 andParkinson disease29

When using AHC data it is important for investigators todetermine whether there is any hierarchy to the way thatdiagnosis codes are recorded because the performance of analgorithm can be adversely affected In the VA such a codinghierarchy exists and when encounters with MS in any di-agnosis position were used vs in the primary diagnosis posi-tion specificity was compromised as was the Youden Jstatistic Surprisingly accuracy was only slightly reduced butwould likely be more adversely affected when applied to anentire dataset as opposed to the subset of cases with a di-agnosis code for MS as used in this study

Prior reports have defined cohorts in claims data based ona single encounter with MS listed as one of the diagnoses3031

This strategy can result in the inclusion of a substantialnumber of false-positives Prior work on the VA algorithm5

showed that a rigorous algorithm (similar to the MS algo-rithm) ruled out 43 of the cases who had only a single MSencounter This guided our decision to require multipleencounters as part of all of our algorithms We strongly en-courage other investigators to avoid the use of a single MS

encounter to identify an MS cohort because depending onthe proportion of false-positives included results could besubstantially biased

We found that prevalence estimates varied with the dura-tion of the observation period in 3 different datasets (2from the United States and 1 from Manitoba Canada)suggesting that observation periods should be maximizedto optimize case detection Specifically we found that anobservation period of 3 years underestimated the preva-lence of MS by 37 (95 CI 13ndash66) in the VA datasetand 47 (95 CI 23ndash76) in the Manitoba dataset Itshould be noted that the 95 CI for these adjustmentfactors overlapped implying that the growth range isnonsignificantly different for these 2 datasets The findingsin the IMS (validation) dataset corroborated results fromthe VA and suggest that a 3-year ascertainment periodunderestimates the 10-year prevalence in US datasets byasymp38 regardless of the final year of the time period ofinterest (2010 vs 2015) This is consistent with previousfindings in Quebec Canada regarding systemic lupuserythematosus a chronic disease like MS for which healthcare use may vary over time or with disease activity21 Inthat study when a 3-year period was used prevalence wasunderestimated by 66 compared with a 15-year periodwhile a 5-year period underestimated prevalence by 39Similar results have been reported for the assessment of theprevalence of cancer in Europe22 Registries with 5 years ofdata underestimated prevalence by 42 to 45 as theduration of the registry increased the underestimation ofprevalence decreased systematically For investigatorswithout access to many years of data these findings suggestthat an adjustment factor should be applied to obtaina more stable and accurate prevalence estimate The dif-ference between the US (VA and IMS) and Manitobadatasets likely results because Manitoba has a universalhealth system with a single government payer and its AHCdatasets capture gt98 of the population6 whereas the UShealth care system involves multiple insurance payersmaking it more difficult to achieve complete ascertainmentof cases Thus the underestimation value of 37 providesa conservative (ie lower bound) inflation factor whereasan inflation factor of 47 based on the Manitoba data likelybetter reflects what might be expected with nearly fullcapture of the US population over a 10-year period(ie upper bound)

This study included a limited number of AHC datasets thatwere restricted to North America which may limit general-izability to datasets from other countries However thedatasets that were included provided a diverse sampling ofthe MS population The IMS validation dataset did not coverthe same time period as the VA and Manitoba datasetshowever the 3-year underestimation in prevalence was nearlythe same as the VA dataset despite differences in the timeperiods assessed Our algorithms relied on ICD-9-CM coding(except for IP encounters in the Manitoba dataset which

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1025

used ICD-10-CM coding from 2004 onward) and have notbeen validated with ICD-10-CM coding It is unlikely thealgorithms will perform differently when ICD-10-CM codingis used because MS is still represented by a single 3-digit code

We conducted a rigorous assessment of algorithms to identifycases of MS in several different AHC datasets The algorithmof choice required the sum of ge3 MS-related IP admissionsMS-related OP encounters or pharmacy claims for a DMTwithin a calendar year Furthermore this algorithm performedremarkably well across different datasets and demographicgroups We propose that this MS algorithm be adopted forfuture MS-related studies using AHC datasets to enhancecomparability across studies

AcknowledgmentThe authors thank the following members of the US MultipleSclerosis Prevalence Workgroup for their contributions AlbertLo Robert McBurney Oleg Muravov Bari Talente SteveBuka and Leslie Ritter The authors also thank the followingkey staff members at the National MS Society for theircontributions to this project Cathy Carlson Tim CoetzeeSherri Giger Weyman Johnson Eileen Madray GrahamMcReynolds and Cyndi Zagieboylo In addition the authorsare indebted to Shan Jin (Baltimore VA Medical Center) forhis assistance in compiling the VA data The results andconclusions presented are those of the authors No officialendorsement by the Department of Veterans Affairs ManitobaHealth Saskatchewan Ministry of Health SaskatchewanHealth Quality Council or Pharmetrics Inc is intended orshould be inferred

Study fundingThe study was funded by contracts from the National MSSociety to the principal investigators of the US MultipleSclerosis Prevalence Workgroup (HC-1508-05693)

DisclosureW Culpepper has additional research funding from the Na-tional MS Society and receives support from the VA MSCenter of Excellence R Marrie is supported by the WaughFamily Chair in Multiple Sclerosis and receives researchfunding from Canadian Institutes of Health Research Re-search Manitoba Multiple Sclerosis Society of CanadaMultiple Sclerosis Scientific Foundation National MultipleSclerosis Society Crohnrsquos and Colitis Canada and the Con-sortium of Multiple Sclerosis Centers She also serves on theEditorial Board of Neurology A Langer-Gould was site prin-cipal investigator for 2 industry-sponsored phase 3 clinicaltrials (Biogen Idec Hoffman-LaRoche) and was site principalinvestigator for 1 industry-sponsored observation study(Biogen Idec) She receives grant support from the NIHNational Institute of Neurological Disorders and Stroke(NINDS) Patient-Centered Outcomes Research Instituteand the National MS Society M Wallin has served on datasafety monitoring boards for NINDS-funded trials has beenan associate editor for the Encyclopedia of the Neurologic

Sciences and received funding support from the National MSSociety and the Department of Veterans Affairs Merit ReviewResearch Program L Nelson receives grants from the Cen-ters for Disease Control and Prevention NIH and NationalMS Society and contracts from the Agency for Toxic Sub-stances and Diseases Registry She receives compensation forserving as a consultant to Acumen Inc and is on the DataMonitoring Committee of Neuropace J Campbell has con-sultancy or research grants over the past 5 years with theAgency for Healthcare Research and Quality ALSAMFoundation Amgen AstraZeneca Bayer Biogen IdecBoehringer Ingelheim Centers for Disease Control andPrevention Colorado Medicaid Enterprise CommunityPartners Inc Institute for Clinical and Economic ReviewMallinckrodt NIH National MS Society Kaiser PermanentePhRMA Foundation Teva Research in Real Life Ltd Re-spiratory Effectiveness Group and Zogenix Inc H Tremlettis the Canada Research Chair for Neuroepidemiology andMultiple Sclerosis She receives research support from theNational MS Society the Canadian Institutes of Health Re-search the Multiple Sclerosis Society of Canada and theMultiple Sclerosis Scientific Research Foundation In ad-dition in the last 5 years she has received research supportfrom the Multiple Sclerosis Society of Canada the MichaelSmith Foundation for Health Research and the UK MSTrust as well as travel expenses to attend conferences allspeaker honoraria are either declined or donated to an MScharity or to an unrestricted grant for use by her researchgroup W Kaye receives funding from the Agency for ToxicSubstances and Disease Registry the National MS Societyand the Association for the Accreditation of Human Re-search Protection Programs L Wagner receives fundingfrom the Agency for Toxic Substances and Disease Registryand National MS Society L Chen is employed full-time byKPSC S Leung is employed full-time by the University ofManitoba C Evans receives research funding from theCanadian Institutes of Health Research SaskatchewanHealth Research Foundation and the Multiple SclerosisSociety of Canada S Yao is employed full-time by theHealth Quality Council N LaRocca is employed full-timeby the National MS Society Go to NeurologyorgN forfull disclosures

Publication historyReceived by Neurology July 20 2018 Accepted in final form November24 2018

Appendix 1 Authors

Name Location Role Contribution

William JCulpepperPhD MA

Veterans HealthAdministrationWashington DCUniversity of MD

Author Study concept anddesign statisticalanalyses preparationof draft manuscriptcritical review revisionof the manuscript forcontent approval offinal manuscript

e1026 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

References1 Evans C Beland SG Kulaga S et al Incidence and prevalence of multiple sclerosis in

the Americas a systematic review Neuroepidemiology 201340195ndash2102 Kingwell E Marriott JJ Jette N et al Incidence and prevalence of multiple sclerosis in

Europe a systematic review BMC Neurol 2013131283 Makhani N Morrow SA Fisk J et al MS incidence and prevalence in Africa Asia Australia

and New Zealand a systematic review Mult Scler Relat Disord 2014348ndash604 Tricco AC Pham B Rawson NSB Manitoba and Saskatchewan administrative health

care utilization databases are used differently to answer epidemiologic researchquestions J Clin Epidemiol 200861192ndash197e112

5 Culpepper WJ Ehrmantraut M Wallin MT Flannery K Bradham DD VeteransHealth Administration Multiple Sclerosis Surveillance Registry the problem of case-finding from administrative databases J Rehabil Res Dev 20064317

6 Marrie RA YuN Blanchard JF Leung S Elliott L The rising prevalence and changingage distribution of multiple sclerosis in Manitoba Neurology 201074465ndash471

7 Nelson MT Wallin MT Marrie RA et al A new way to estimate neurologic diseaseprevalence in the United States illustrated with MS Neurology201992469ndash480

8 Wallin MT Culpepper WJ Campbell J et al The prevalence of multiple sclerosis inthe United States a population-based estimate using health care claims datasetsNeurology201992e1029ndashe1040

9 Polman CH Reingold SC Banwell B et al Diagnostic criteria for multiple sclerosis2010 revisions to the McDonald criteria Ann Neurol 201169292ndash302

10 Wallin MT Culpepper WJ Coffman P et al The Gulf War era multiple sclerosiscohort age and incidence rates by sex race and service Brain 20121351778ndash1785

11 Koebnick C Langer Gould A Gould MK et al Do the sociodemographic charac-teristics of members of a large integrated health care system represent the populationof interest Permanente J 20121637ndash41

12 Widdifield J Ivers NM Young J et al Development and validation of an adminis-trative data algorithm to estimate the disease burden and epidemiology of multiplesclerosis in Ontario Canada Mult Scler 2015211045ndash1054

13 Al-Sakran LH Marrie RA Blackburn DF Knox KB Evans CD Establishing theincidence and prevalence of multiple sclerosis in Saskatchewan Can J Neurol Sci201845295ndash303

14 Marrie RA Fisk J Stadnyk K et al The incidence and prevalence of multiple sclerosisin Nova Scotia Can J Neurol Sci 201340824ndash831

15 Canadian Institute for Health Information Rehabilitation 2016 Available at cihicaentypes-of-carehospital-carerehabilitation Accessed May 5 2016

16 Youden WJ Index for rating diagnostic tests Cancer 1950332ndash35

Appendix 1 (continued)

Name Location Role Contribution

Ruth AnnMarrieMD PhD

University ofManitoba WinnipegCanada

Author Study concept anddesign supervisionof data collectionand analysescritical reviewrevision ofthe manuscriptfor contentapproval of finalmanuscript

AnnetteLanger-Gould MD

Kaiser PermanenteSouthern CaliforniaLos Angeles

Author Study concept anddesign statisticalanalyses chart reviewconfirmation of MSdiagnosis criticalreview revision of themanuscript forcontent approval offinal manuscript

Mitchell TWallin MDMPH

Veterans HealthAdministrationWashington DCGeorgetownUniversity

Author Study concept anddesign chart reviewconfirmation of MSdiagnosis criticalreview revision of themanuscript forcontent approval offinal manuscript

Lorene MNelsonPhD

Stanford UniversityCA

Author Study concept anddesign critical reviewrevision of themanuscript forcontent

JonathanCampbellPhD

University of DenverCO

Author Statistical analysescritical review of themanuscript forcontent

HelenTremlettPhD

University of BritishColumbia VancouverCanada

Author Statistical analysescritical review of themanuscript forcontent

WendyKaye PhD

McKing ConsultingCorp Atlanta GA

Author Statistical analysescritical review of themanuscript forcontent

LaurieWagnerMPH

McKing ConsultingCorp Atlanta GA

Author Statistical analysescritical review of themanuscript forcontent

Lie HChen DrPH

Kaiser PermanenteSouthern CaliforniaLos Angeles

Author Statistical analysescritical review of themanuscript forcontent

StellaLeung MSc

University ofManitoba WinnipegCanada

Author Statistical analysescritical review of themanuscript forcontent

CharityEvans PhD

University ofSaskatchewanSaskatoon Canada

Author Statistical analysescritical review of themanuscript forcontent

ShenzhenYao MDMSc

University ofSaskatchewanSaskatoon Canada

Author Statistical analysescritical review of themanuscript forcontent

Appendix 1 (continued)

Name Location Role Contribution

Nicholas GLaRoccaPhD

National MultipleSclerosis Society NewYork NY

Author Statistical analysescritical review of themanuscript forcontent approval offinal manuscript

Appendix 2 Coinvestigators and members of the USMultiple Sclerosis Prevalence Workgroup

Name Location Role Contribution

Albert LoMD PhD

Brown UniversityProvidence RI

Coinvestigator Studyconcept anddesign

RobertMcBurneyPhD

Accelerated CureProject Boston MA

Coinvestigator Studyconcept anddesign

OlegMuravovPhD

Agency for ToxicSubstances andDisease RegistryDivision of HealthStudies Atlanta GA

Coinvestigator Studyconcept anddesign

BariTalenteEsq

National MS SocietyWashington DC

Coinvestigator Studyconcept anddesign

LeslieRitter

National MS SocietyWashington DC

Coinvestigator Studyconcept anddesign

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1027

17 Landis JR Koch GG The measurement of observer agreement for categorical dataBiometrics 197733159ndash174

18 Sim J Wright CC The kappa statistic in reliability studies use interpretation andsample size requirements Phys Ther 200585257ndash268

19 Viera AJ Garrett JM Understanding interobserver agreement the kappa statisticFam Med 200537360ndash363

20 Trisolini M Honeycutt A Wiener J Lesesne S Global Economic Impact of MultipleSclerosis Research Triangle Park RTI International 2010

21 Ng R Bernatsky S Rahme E Observation period effects on estimation of systemiclupus erythematosus incidence and prevalence in Quebec J Rheumatol 2013401334ndash1336

22 Capocaccia R Colonna M Corazziari I et al Measuring cancer prevalence in Europethe EUROPREVAL Project Ann Oncol 200213831ndash839

23 Griffiths RI OrsquoMalley CD Herbert RJ Danese MD Misclassification of incidentconditions using claims data impact of varying the period used to exclude pre-existingdisease BMC Med Res Methodol 20131332

24 Dilokthornsakul P Valuck RJ Nair KV Corboy JR Allen RR Campbell JD Multiplesclerosis prevalence in the United States commercially insured population Neurology2016861014ndash1021

25 Tiwari Li Y Zou Z Interval estimation for ratios of correlated age-adjusted ratesJ Data Sci 20108471ndash482

26 Culpepper WJ Wallin MT Magder LS et al VHA Multiple Sclerosis SurveillanceRegistry and its similarities to other multiple sclerosis cohorts J Rehab Res Dev 201552(3)263ndash272

27 Tonelli M Wiebe N Fortin M et al Methods for identifying 30 chronic con-ditions application to administrative data BMC Med Inform Decis Mak 20151531

28 Kaye WE Sanchez M Wu J Feasibility of creating a national registry using admin-istrative data in the United States Amytroph Lateral Scler Frontotemporal Degener201415433ndash439

29 Szumski NR Cheng EM Optimizing algorithms to identify Parkinsonrsquos disease caseswithin an administrative database Mov Disord 20092451ndash56

30 Hoer A Schiffhorst G Zimmermann A et al Multiple sclerosis in Germany dataanalysis of administrative prevalence and healthcare delivery in the statutory healthsystem BMC Health Serv Res 200414381

31 Chastek BJ Oleen-Burkey M Lopez-Bresnahan MV Medical chart validation of analgorithm for identifying multiple sclerosis relapse in healthcare claims J Med Econ201013618ndash625

e1028 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

DOI 101212WNL0000000000007043201992e1016-e1028 Published Online before print February 15 2019Neurology

William J Culpepper Ruth Ann Marrie Annette Langer-Gould et al datasets

Validation of an algorithm for identifying MS cases in administrative health claims

This information is current as of February 15 2019

ServicesUpdated Information amp

httpnneurologyorgcontent9210e1016fullincluding high resolution figures can be found at

References httpnneurologyorgcontent9210e1016fullref-list-1

This article cites 29 articles 6 of which you can access for free at

Citations httpnneurologyorgcontent9210e1016fullotherarticles

This article has been cited by 2 HighWire-hosted articles

Subspecialty Collections

httpnneurologyorgcgicollectionprevalence_studiesPrevalence studies

httpnneurologyorgcgicollectionmultiple_sclerosisMultiple sclerosis

httpnneurologyorgcgicollectiondiagnostic_test_assessment_Diagnostic test assessment

httpnneurologyorgcgicollectioncohort_studiesCohort studies

httpnneurologyorgcgicollectionall_health_services_researchAll Health Services Researchfollowing collection(s) This article along with others on similar topics appears in the

Permissions amp Licensing

httpwwwneurologyorgaboutabout_the_journalpermissionsits entirety can be found online atInformation about reproducing this article in parts (figurestables) or in

Reprints

httpnneurologyorgsubscribersadvertiseInformation about ordering reprints can be found online

ISSN 0028-3878 Online ISSN 1526-632XWolters Kluwer Health Inc on behalf of the American Academy of Neurology All rights reserved Print1951 it is now a weekly with 48 issues per year Copyright Copyright copy 2019 The Author(s) Published by

reg is the official journal of the American Academy of Neurology Published continuously sinceNeurology

Page 6: Validation of an algorithm for identifying MS cases in ... · 3/5/2019  · with ICD-9-CM or the Canadian version of ICD-10 codes (ICD-10-CA), depending on the year. ... the MS Clinic

performed the best along with algorithm MS_D-1 For pre-ferred algorithmMS_E the sensitivity was 096 specificity was099 PPVwas 099 NPVwas 096 and Youden J was 096 Oneof the 2-year algorithms MS_D-2 also had a specificity of 099but a marginally higher sensitivity of 097

Corrections for limited-durationascertainment periodsFigure 3 compares estimated prevalence based on a 3-year(2008ndash2010) ascertainment period to that from 10 years(2000ndash2010) of data as of 2010 in the VA and Manitoba

Table 3 Summary of test statistics for each algorithm based on years of data for each data source

MS algorithm Data source Sensitivity Specificity PPV NPV Accuracy Youden J

1-y epoch (denoted by 21)

MS_A-1 ge2 IP or ge3 OP VA 861 825 978 396 857 069

KPSC 789 749 955 342 783 054

MB 897 672 953 465 870 057

MS_B-1 ge2 IP or ge4 OP VA 815 895 986 348 823 071

KPSC 669 823 964 269 690 050

MB 790 779 964 332 789 057

MS_C-1 ge2 IP or ge5 OP VA 761 904 986 294 775 067

KPSC 553 874 967 224 595 043

MB 685 856 973 267 705 054

MS_D-1 ge2 IP or ge3 OP or ge1 DMT VA 866 825 978 404 862 069

KPSC 874 730 957 461 856 060

MB 932 667 954 568 901 060

MS_E-1 (IP + OP + DMT) ge 3 VA 872 822 978 414 867 069

KPSC 855 762 961 436 843 062

MB 934 661 954 568 901 060

2-y epoch (denoted by 22)

MS_A-2 ge2 IP or ge3 OP VA 883 787 974 426 873 067

KPSC 834 707 951 385 818 051

MB 932 610 947 546 894 054

MS_B-2 ge2 IP or ge4 OP VA 860 848 981 400 859 071

KPSC 747 798 962 318 754 055

MB 876 697 956 429 855 057

MS_C-2 ge2 IP or ge5 OP VA 835 880 964 371 840 072

KPSC 662 851 968 270 686 051

MB 817 805 969 370 816 062

MS_D-2 ge2 IP or ge3 OP or ge1 DMT VA 885 787 974 430 875 067

KPSC 896 688 951 493 869 058

MB 950 610 948 620 910 056

MS_ E-2 (IP + OP + DMT) ge 3 VA 891 784 974 442 880 068

KPSC 881 709 954 466 859 059

MB 951 600 947 619 909 055

Abbreviations DMT = disease modifying therapy IP = inpatient admission KPSC =Kaiser Permanente Southern California MB = Manitoba Canada MS =multiple sclerosis NPV = negative predictive value OP = outpatient visit PPV = positive predictive value VA = Veterans Administration

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1021

datasets The underestimation of a 3-year ascertainmentperiod ranged from 37 (95 CI 13ndash66) in the VAdataset to 47 (95 CI 23ndash76) in the Manitoba datasetIn the IMS (validation) dataset the underestimation ofa 3-year (2013ndash2015) ascertainment period compared to

a 9-year period (2007ndash2015) as of 2015 was 39 (95 CI13ndash71) The estimated prevalence of MS rose steeply upto about 2010 in all 3 datasets (figures 3 and 4) and thenincreased an average of 23y thereafter (21 in the VAfigure 3A 25 in the IMS figure 4)

Figure 1 Performance of algorithm MS_E-1 [(IP + OP + DMT) ge 3] stratified by sex across the VA KPSC and MB datasets

(A) Men and women (B) men only and (C) women onlyData arepresented as a proportion that can range between 0 and 1 DMT =disease-modifying therapy IP = inpatient KPSC = Kaiser Perma-nente Southern California MB =ManitobaMS =multiple sclerosisNPV = negative predictive value OP = outpatient PPV = positivepredictive value VA = Department of Veterans Affairs

e1022 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

DiscussionWe undertook this study to identify an algorithm that wouldaccurately and consistently identify cases of MS across dis-parate AHC datasets representing different types of health

care systems and geographic regions The consistency of thefindings across these datasets supports the generalizability ofthe algorithms to other AHC datasets in the United Stateswhen such validation studies are not feasible The preferredalgorithm (MS_E-1) required ge3 encounters (IP +OP+DMT)

Figure 2 Performance of algorithm MS_E-1 [(IP + OP + DMT) ge 3] stratified by age group across the VA KPSC and MBdatasets

(A) Sensitivity (B) specificity (C) positive predictive value (D) negative predictive value (E) accuracy and (F) Youden J statisticData fromManitoba (MB) for the55- to 64- and 64- to 74-year age groups are suppressed due to small cell sizes Data are presented as a proportion that can range between 0 and 1 DMT =disease-modifying therapy IP = inpatient KPSC = Kaiser Permanente Southern California MS = multiple sclerosis NPV = negative predictive value OP =outpatient PPV = positive predictive value VA = Department of Veterans Affairs

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1023

for MS in a 1-year period Algorithms that did not includeDMT performed less well than those that did However weincluded the non-DMT algorithms to assess their ability toaccurately identify cases of MS in studies in which pharmacy

claims data might not be available MS_A-1 performed nearlyas well as the preferred algorithm (MS_E-1) and could beused when DMT is not available Thus by testing a variety ofalgorithms our study provides value and the findings can be

Figure 3 Comparison of prevalence based on a 3- vs 10-year ascertainment period as of 2010 in the (A) VA and (B) MBdatasets

CI = confidence interval MB = Manitoba VA =Department of Veterans Affairs

Figure 4 Comparison of prevalence based on a 3- vs 9-year ascertainment period as of 2015 in the IMS (validation) dataset

CI = confidence interval IMS = IntercontinentalMarketing Services

e1024 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

applied in future research into the prevalence of MS acrossjurisdictions

Algorithm MS E-1 ([IP + OP + DMT] ge 3) was preferredbecause of its consistent high performance and ease ofapplication We hereafter refer to it as the MS algorithmThe 1-year algorithm readily allows annually updating ofprevalence estimates in datasets with limited years of dataSensitivity and PPV were high consistent with previouslypublished definitions developed with similar methods inthe United States (VA) and Canada561213 Specificity inour primary validation cohorts was modest reflecting thatour study cohorts were selected on the basis of having atleast 1 diagnosis for MS Tonelli et al27 evaluated algo-rithms to measure 40 comorbid conditions in AHC datafrom Calgary Alberta Canada They defined an algorithmas valid if both sensitivity and PPV were ge070 compared toa reference cohort They did not consider specificity orNPV because they are generally ge090 in the assessment ofchronic disease in the general population Consistent withthese observations and recommendations when the MSalgorithm was applied to the general population in Sas-katchewan sensitivity (096) specificity (099) and PPV(099) markedly improved These findings support ourpreferred algorithm as a valid method for identifying casesof MS in AHC datasets

Minor variation in performance was observed whenstratified by sex and age However accuracy remainedhigh and the Youden J statistic remained substantialThese findings indicate that the MS algorithm was an ac-curate and reliable algorithm for identifying cases of MSin these disparate datasets and outperforms similar algo-rithms for identifying amyotrophic lateral sclerosis28 andParkinson disease29

When using AHC data it is important for investigators todetermine whether there is any hierarchy to the way thatdiagnosis codes are recorded because the performance of analgorithm can be adversely affected In the VA such a codinghierarchy exists and when encounters with MS in any di-agnosis position were used vs in the primary diagnosis posi-tion specificity was compromised as was the Youden Jstatistic Surprisingly accuracy was only slightly reduced butwould likely be more adversely affected when applied to anentire dataset as opposed to the subset of cases with a di-agnosis code for MS as used in this study

Prior reports have defined cohorts in claims data based ona single encounter with MS listed as one of the diagnoses3031

This strategy can result in the inclusion of a substantialnumber of false-positives Prior work on the VA algorithm5

showed that a rigorous algorithm (similar to the MS algo-rithm) ruled out 43 of the cases who had only a single MSencounter This guided our decision to require multipleencounters as part of all of our algorithms We strongly en-courage other investigators to avoid the use of a single MS

encounter to identify an MS cohort because depending onthe proportion of false-positives included results could besubstantially biased

We found that prevalence estimates varied with the dura-tion of the observation period in 3 different datasets (2from the United States and 1 from Manitoba Canada)suggesting that observation periods should be maximizedto optimize case detection Specifically we found that anobservation period of 3 years underestimated the preva-lence of MS by 37 (95 CI 13ndash66) in the VA datasetand 47 (95 CI 23ndash76) in the Manitoba dataset Itshould be noted that the 95 CI for these adjustmentfactors overlapped implying that the growth range isnonsignificantly different for these 2 datasets The findingsin the IMS (validation) dataset corroborated results fromthe VA and suggest that a 3-year ascertainment periodunderestimates the 10-year prevalence in US datasets byasymp38 regardless of the final year of the time period ofinterest (2010 vs 2015) This is consistent with previousfindings in Quebec Canada regarding systemic lupuserythematosus a chronic disease like MS for which healthcare use may vary over time or with disease activity21 Inthat study when a 3-year period was used prevalence wasunderestimated by 66 compared with a 15-year periodwhile a 5-year period underestimated prevalence by 39Similar results have been reported for the assessment of theprevalence of cancer in Europe22 Registries with 5 years ofdata underestimated prevalence by 42 to 45 as theduration of the registry increased the underestimation ofprevalence decreased systematically For investigatorswithout access to many years of data these findings suggestthat an adjustment factor should be applied to obtaina more stable and accurate prevalence estimate The dif-ference between the US (VA and IMS) and Manitobadatasets likely results because Manitoba has a universalhealth system with a single government payer and its AHCdatasets capture gt98 of the population6 whereas the UShealth care system involves multiple insurance payersmaking it more difficult to achieve complete ascertainmentof cases Thus the underestimation value of 37 providesa conservative (ie lower bound) inflation factor whereasan inflation factor of 47 based on the Manitoba data likelybetter reflects what might be expected with nearly fullcapture of the US population over a 10-year period(ie upper bound)

This study included a limited number of AHC datasets thatwere restricted to North America which may limit general-izability to datasets from other countries However thedatasets that were included provided a diverse sampling ofthe MS population The IMS validation dataset did not coverthe same time period as the VA and Manitoba datasetshowever the 3-year underestimation in prevalence was nearlythe same as the VA dataset despite differences in the timeperiods assessed Our algorithms relied on ICD-9-CM coding(except for IP encounters in the Manitoba dataset which

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1025

used ICD-10-CM coding from 2004 onward) and have notbeen validated with ICD-10-CM coding It is unlikely thealgorithms will perform differently when ICD-10-CM codingis used because MS is still represented by a single 3-digit code

We conducted a rigorous assessment of algorithms to identifycases of MS in several different AHC datasets The algorithmof choice required the sum of ge3 MS-related IP admissionsMS-related OP encounters or pharmacy claims for a DMTwithin a calendar year Furthermore this algorithm performedremarkably well across different datasets and demographicgroups We propose that this MS algorithm be adopted forfuture MS-related studies using AHC datasets to enhancecomparability across studies

AcknowledgmentThe authors thank the following members of the US MultipleSclerosis Prevalence Workgroup for their contributions AlbertLo Robert McBurney Oleg Muravov Bari Talente SteveBuka and Leslie Ritter The authors also thank the followingkey staff members at the National MS Society for theircontributions to this project Cathy Carlson Tim CoetzeeSherri Giger Weyman Johnson Eileen Madray GrahamMcReynolds and Cyndi Zagieboylo In addition the authorsare indebted to Shan Jin (Baltimore VA Medical Center) forhis assistance in compiling the VA data The results andconclusions presented are those of the authors No officialendorsement by the Department of Veterans Affairs ManitobaHealth Saskatchewan Ministry of Health SaskatchewanHealth Quality Council or Pharmetrics Inc is intended orshould be inferred

Study fundingThe study was funded by contracts from the National MSSociety to the principal investigators of the US MultipleSclerosis Prevalence Workgroup (HC-1508-05693)

DisclosureW Culpepper has additional research funding from the Na-tional MS Society and receives support from the VA MSCenter of Excellence R Marrie is supported by the WaughFamily Chair in Multiple Sclerosis and receives researchfunding from Canadian Institutes of Health Research Re-search Manitoba Multiple Sclerosis Society of CanadaMultiple Sclerosis Scientific Foundation National MultipleSclerosis Society Crohnrsquos and Colitis Canada and the Con-sortium of Multiple Sclerosis Centers She also serves on theEditorial Board of Neurology A Langer-Gould was site prin-cipal investigator for 2 industry-sponsored phase 3 clinicaltrials (Biogen Idec Hoffman-LaRoche) and was site principalinvestigator for 1 industry-sponsored observation study(Biogen Idec) She receives grant support from the NIHNational Institute of Neurological Disorders and Stroke(NINDS) Patient-Centered Outcomes Research Instituteand the National MS Society M Wallin has served on datasafety monitoring boards for NINDS-funded trials has beenan associate editor for the Encyclopedia of the Neurologic

Sciences and received funding support from the National MSSociety and the Department of Veterans Affairs Merit ReviewResearch Program L Nelson receives grants from the Cen-ters for Disease Control and Prevention NIH and NationalMS Society and contracts from the Agency for Toxic Sub-stances and Diseases Registry She receives compensation forserving as a consultant to Acumen Inc and is on the DataMonitoring Committee of Neuropace J Campbell has con-sultancy or research grants over the past 5 years with theAgency for Healthcare Research and Quality ALSAMFoundation Amgen AstraZeneca Bayer Biogen IdecBoehringer Ingelheim Centers for Disease Control andPrevention Colorado Medicaid Enterprise CommunityPartners Inc Institute for Clinical and Economic ReviewMallinckrodt NIH National MS Society Kaiser PermanentePhRMA Foundation Teva Research in Real Life Ltd Re-spiratory Effectiveness Group and Zogenix Inc H Tremlettis the Canada Research Chair for Neuroepidemiology andMultiple Sclerosis She receives research support from theNational MS Society the Canadian Institutes of Health Re-search the Multiple Sclerosis Society of Canada and theMultiple Sclerosis Scientific Research Foundation In ad-dition in the last 5 years she has received research supportfrom the Multiple Sclerosis Society of Canada the MichaelSmith Foundation for Health Research and the UK MSTrust as well as travel expenses to attend conferences allspeaker honoraria are either declined or donated to an MScharity or to an unrestricted grant for use by her researchgroup W Kaye receives funding from the Agency for ToxicSubstances and Disease Registry the National MS Societyand the Association for the Accreditation of Human Re-search Protection Programs L Wagner receives fundingfrom the Agency for Toxic Substances and Disease Registryand National MS Society L Chen is employed full-time byKPSC S Leung is employed full-time by the University ofManitoba C Evans receives research funding from theCanadian Institutes of Health Research SaskatchewanHealth Research Foundation and the Multiple SclerosisSociety of Canada S Yao is employed full-time by theHealth Quality Council N LaRocca is employed full-timeby the National MS Society Go to NeurologyorgN forfull disclosures

Publication historyReceived by Neurology July 20 2018 Accepted in final form November24 2018

Appendix 1 Authors

Name Location Role Contribution

William JCulpepperPhD MA

Veterans HealthAdministrationWashington DCUniversity of MD

Author Study concept anddesign statisticalanalyses preparationof draft manuscriptcritical review revisionof the manuscript forcontent approval offinal manuscript

e1026 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

References1 Evans C Beland SG Kulaga S et al Incidence and prevalence of multiple sclerosis in

the Americas a systematic review Neuroepidemiology 201340195ndash2102 Kingwell E Marriott JJ Jette N et al Incidence and prevalence of multiple sclerosis in

Europe a systematic review BMC Neurol 2013131283 Makhani N Morrow SA Fisk J et al MS incidence and prevalence in Africa Asia Australia

and New Zealand a systematic review Mult Scler Relat Disord 2014348ndash604 Tricco AC Pham B Rawson NSB Manitoba and Saskatchewan administrative health

care utilization databases are used differently to answer epidemiologic researchquestions J Clin Epidemiol 200861192ndash197e112

5 Culpepper WJ Ehrmantraut M Wallin MT Flannery K Bradham DD VeteransHealth Administration Multiple Sclerosis Surveillance Registry the problem of case-finding from administrative databases J Rehabil Res Dev 20064317

6 Marrie RA YuN Blanchard JF Leung S Elliott L The rising prevalence and changingage distribution of multiple sclerosis in Manitoba Neurology 201074465ndash471

7 Nelson MT Wallin MT Marrie RA et al A new way to estimate neurologic diseaseprevalence in the United States illustrated with MS Neurology201992469ndash480

8 Wallin MT Culpepper WJ Campbell J et al The prevalence of multiple sclerosis inthe United States a population-based estimate using health care claims datasetsNeurology201992e1029ndashe1040

9 Polman CH Reingold SC Banwell B et al Diagnostic criteria for multiple sclerosis2010 revisions to the McDonald criteria Ann Neurol 201169292ndash302

10 Wallin MT Culpepper WJ Coffman P et al The Gulf War era multiple sclerosiscohort age and incidence rates by sex race and service Brain 20121351778ndash1785

11 Koebnick C Langer Gould A Gould MK et al Do the sociodemographic charac-teristics of members of a large integrated health care system represent the populationof interest Permanente J 20121637ndash41

12 Widdifield J Ivers NM Young J et al Development and validation of an adminis-trative data algorithm to estimate the disease burden and epidemiology of multiplesclerosis in Ontario Canada Mult Scler 2015211045ndash1054

13 Al-Sakran LH Marrie RA Blackburn DF Knox KB Evans CD Establishing theincidence and prevalence of multiple sclerosis in Saskatchewan Can J Neurol Sci201845295ndash303

14 Marrie RA Fisk J Stadnyk K et al The incidence and prevalence of multiple sclerosisin Nova Scotia Can J Neurol Sci 201340824ndash831

15 Canadian Institute for Health Information Rehabilitation 2016 Available at cihicaentypes-of-carehospital-carerehabilitation Accessed May 5 2016

16 Youden WJ Index for rating diagnostic tests Cancer 1950332ndash35

Appendix 1 (continued)

Name Location Role Contribution

Ruth AnnMarrieMD PhD

University ofManitoba WinnipegCanada

Author Study concept anddesign supervisionof data collectionand analysescritical reviewrevision ofthe manuscriptfor contentapproval of finalmanuscript

AnnetteLanger-Gould MD

Kaiser PermanenteSouthern CaliforniaLos Angeles

Author Study concept anddesign statisticalanalyses chart reviewconfirmation of MSdiagnosis criticalreview revision of themanuscript forcontent approval offinal manuscript

Mitchell TWallin MDMPH

Veterans HealthAdministrationWashington DCGeorgetownUniversity

Author Study concept anddesign chart reviewconfirmation of MSdiagnosis criticalreview revision of themanuscript forcontent approval offinal manuscript

Lorene MNelsonPhD

Stanford UniversityCA

Author Study concept anddesign critical reviewrevision of themanuscript forcontent

JonathanCampbellPhD

University of DenverCO

Author Statistical analysescritical review of themanuscript forcontent

HelenTremlettPhD

University of BritishColumbia VancouverCanada

Author Statistical analysescritical review of themanuscript forcontent

WendyKaye PhD

McKing ConsultingCorp Atlanta GA

Author Statistical analysescritical review of themanuscript forcontent

LaurieWagnerMPH

McKing ConsultingCorp Atlanta GA

Author Statistical analysescritical review of themanuscript forcontent

Lie HChen DrPH

Kaiser PermanenteSouthern CaliforniaLos Angeles

Author Statistical analysescritical review of themanuscript forcontent

StellaLeung MSc

University ofManitoba WinnipegCanada

Author Statistical analysescritical review of themanuscript forcontent

CharityEvans PhD

University ofSaskatchewanSaskatoon Canada

Author Statistical analysescritical review of themanuscript forcontent

ShenzhenYao MDMSc

University ofSaskatchewanSaskatoon Canada

Author Statistical analysescritical review of themanuscript forcontent

Appendix 1 (continued)

Name Location Role Contribution

Nicholas GLaRoccaPhD

National MultipleSclerosis Society NewYork NY

Author Statistical analysescritical review of themanuscript forcontent approval offinal manuscript

Appendix 2 Coinvestigators and members of the USMultiple Sclerosis Prevalence Workgroup

Name Location Role Contribution

Albert LoMD PhD

Brown UniversityProvidence RI

Coinvestigator Studyconcept anddesign

RobertMcBurneyPhD

Accelerated CureProject Boston MA

Coinvestigator Studyconcept anddesign

OlegMuravovPhD

Agency for ToxicSubstances andDisease RegistryDivision of HealthStudies Atlanta GA

Coinvestigator Studyconcept anddesign

BariTalenteEsq

National MS SocietyWashington DC

Coinvestigator Studyconcept anddesign

LeslieRitter

National MS SocietyWashington DC

Coinvestigator Studyconcept anddesign

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1027

17 Landis JR Koch GG The measurement of observer agreement for categorical dataBiometrics 197733159ndash174

18 Sim J Wright CC The kappa statistic in reliability studies use interpretation andsample size requirements Phys Ther 200585257ndash268

19 Viera AJ Garrett JM Understanding interobserver agreement the kappa statisticFam Med 200537360ndash363

20 Trisolini M Honeycutt A Wiener J Lesesne S Global Economic Impact of MultipleSclerosis Research Triangle Park RTI International 2010

21 Ng R Bernatsky S Rahme E Observation period effects on estimation of systemiclupus erythematosus incidence and prevalence in Quebec J Rheumatol 2013401334ndash1336

22 Capocaccia R Colonna M Corazziari I et al Measuring cancer prevalence in Europethe EUROPREVAL Project Ann Oncol 200213831ndash839

23 Griffiths RI OrsquoMalley CD Herbert RJ Danese MD Misclassification of incidentconditions using claims data impact of varying the period used to exclude pre-existingdisease BMC Med Res Methodol 20131332

24 Dilokthornsakul P Valuck RJ Nair KV Corboy JR Allen RR Campbell JD Multiplesclerosis prevalence in the United States commercially insured population Neurology2016861014ndash1021

25 Tiwari Li Y Zou Z Interval estimation for ratios of correlated age-adjusted ratesJ Data Sci 20108471ndash482

26 Culpepper WJ Wallin MT Magder LS et al VHA Multiple Sclerosis SurveillanceRegistry and its similarities to other multiple sclerosis cohorts J Rehab Res Dev 201552(3)263ndash272

27 Tonelli M Wiebe N Fortin M et al Methods for identifying 30 chronic con-ditions application to administrative data BMC Med Inform Decis Mak 20151531

28 Kaye WE Sanchez M Wu J Feasibility of creating a national registry using admin-istrative data in the United States Amytroph Lateral Scler Frontotemporal Degener201415433ndash439

29 Szumski NR Cheng EM Optimizing algorithms to identify Parkinsonrsquos disease caseswithin an administrative database Mov Disord 20092451ndash56

30 Hoer A Schiffhorst G Zimmermann A et al Multiple sclerosis in Germany dataanalysis of administrative prevalence and healthcare delivery in the statutory healthsystem BMC Health Serv Res 200414381

31 Chastek BJ Oleen-Burkey M Lopez-Bresnahan MV Medical chart validation of analgorithm for identifying multiple sclerosis relapse in healthcare claims J Med Econ201013618ndash625

e1028 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

DOI 101212WNL0000000000007043201992e1016-e1028 Published Online before print February 15 2019Neurology

William J Culpepper Ruth Ann Marrie Annette Langer-Gould et al datasets

Validation of an algorithm for identifying MS cases in administrative health claims

This information is current as of February 15 2019

ServicesUpdated Information amp

httpnneurologyorgcontent9210e1016fullincluding high resolution figures can be found at

References httpnneurologyorgcontent9210e1016fullref-list-1

This article cites 29 articles 6 of which you can access for free at

Citations httpnneurologyorgcontent9210e1016fullotherarticles

This article has been cited by 2 HighWire-hosted articles

Subspecialty Collections

httpnneurologyorgcgicollectionprevalence_studiesPrevalence studies

httpnneurologyorgcgicollectionmultiple_sclerosisMultiple sclerosis

httpnneurologyorgcgicollectiondiagnostic_test_assessment_Diagnostic test assessment

httpnneurologyorgcgicollectioncohort_studiesCohort studies

httpnneurologyorgcgicollectionall_health_services_researchAll Health Services Researchfollowing collection(s) This article along with others on similar topics appears in the

Permissions amp Licensing

httpwwwneurologyorgaboutabout_the_journalpermissionsits entirety can be found online atInformation about reproducing this article in parts (figurestables) or in

Reprints

httpnneurologyorgsubscribersadvertiseInformation about ordering reprints can be found online

ISSN 0028-3878 Online ISSN 1526-632XWolters Kluwer Health Inc on behalf of the American Academy of Neurology All rights reserved Print1951 it is now a weekly with 48 issues per year Copyright Copyright copy 2019 The Author(s) Published by

reg is the official journal of the American Academy of Neurology Published continuously sinceNeurology

Page 7: Validation of an algorithm for identifying MS cases in ... · 3/5/2019  · with ICD-9-CM or the Canadian version of ICD-10 codes (ICD-10-CA), depending on the year. ... the MS Clinic

datasets The underestimation of a 3-year ascertainmentperiod ranged from 37 (95 CI 13ndash66) in the VAdataset to 47 (95 CI 23ndash76) in the Manitoba datasetIn the IMS (validation) dataset the underestimation ofa 3-year (2013ndash2015) ascertainment period compared to

a 9-year period (2007ndash2015) as of 2015 was 39 (95 CI13ndash71) The estimated prevalence of MS rose steeply upto about 2010 in all 3 datasets (figures 3 and 4) and thenincreased an average of 23y thereafter (21 in the VAfigure 3A 25 in the IMS figure 4)

Figure 1 Performance of algorithm MS_E-1 [(IP + OP + DMT) ge 3] stratified by sex across the VA KPSC and MB datasets

(A) Men and women (B) men only and (C) women onlyData arepresented as a proportion that can range between 0 and 1 DMT =disease-modifying therapy IP = inpatient KPSC = Kaiser Perma-nente Southern California MB =ManitobaMS =multiple sclerosisNPV = negative predictive value OP = outpatient PPV = positivepredictive value VA = Department of Veterans Affairs

e1022 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

DiscussionWe undertook this study to identify an algorithm that wouldaccurately and consistently identify cases of MS across dis-parate AHC datasets representing different types of health

care systems and geographic regions The consistency of thefindings across these datasets supports the generalizability ofthe algorithms to other AHC datasets in the United Stateswhen such validation studies are not feasible The preferredalgorithm (MS_E-1) required ge3 encounters (IP +OP+DMT)

Figure 2 Performance of algorithm MS_E-1 [(IP + OP + DMT) ge 3] stratified by age group across the VA KPSC and MBdatasets

(A) Sensitivity (B) specificity (C) positive predictive value (D) negative predictive value (E) accuracy and (F) Youden J statisticData fromManitoba (MB) for the55- to 64- and 64- to 74-year age groups are suppressed due to small cell sizes Data are presented as a proportion that can range between 0 and 1 DMT =disease-modifying therapy IP = inpatient KPSC = Kaiser Permanente Southern California MS = multiple sclerosis NPV = negative predictive value OP =outpatient PPV = positive predictive value VA = Department of Veterans Affairs

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1023

for MS in a 1-year period Algorithms that did not includeDMT performed less well than those that did However weincluded the non-DMT algorithms to assess their ability toaccurately identify cases of MS in studies in which pharmacy

claims data might not be available MS_A-1 performed nearlyas well as the preferred algorithm (MS_E-1) and could beused when DMT is not available Thus by testing a variety ofalgorithms our study provides value and the findings can be

Figure 3 Comparison of prevalence based on a 3- vs 10-year ascertainment period as of 2010 in the (A) VA and (B) MBdatasets

CI = confidence interval MB = Manitoba VA =Department of Veterans Affairs

Figure 4 Comparison of prevalence based on a 3- vs 9-year ascertainment period as of 2015 in the IMS (validation) dataset

CI = confidence interval IMS = IntercontinentalMarketing Services

e1024 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

applied in future research into the prevalence of MS acrossjurisdictions

Algorithm MS E-1 ([IP + OP + DMT] ge 3) was preferredbecause of its consistent high performance and ease ofapplication We hereafter refer to it as the MS algorithmThe 1-year algorithm readily allows annually updating ofprevalence estimates in datasets with limited years of dataSensitivity and PPV were high consistent with previouslypublished definitions developed with similar methods inthe United States (VA) and Canada561213 Specificity inour primary validation cohorts was modest reflecting thatour study cohorts were selected on the basis of having atleast 1 diagnosis for MS Tonelli et al27 evaluated algo-rithms to measure 40 comorbid conditions in AHC datafrom Calgary Alberta Canada They defined an algorithmas valid if both sensitivity and PPV were ge070 compared toa reference cohort They did not consider specificity orNPV because they are generally ge090 in the assessment ofchronic disease in the general population Consistent withthese observations and recommendations when the MSalgorithm was applied to the general population in Sas-katchewan sensitivity (096) specificity (099) and PPV(099) markedly improved These findings support ourpreferred algorithm as a valid method for identifying casesof MS in AHC datasets

Minor variation in performance was observed whenstratified by sex and age However accuracy remainedhigh and the Youden J statistic remained substantialThese findings indicate that the MS algorithm was an ac-curate and reliable algorithm for identifying cases of MSin these disparate datasets and outperforms similar algo-rithms for identifying amyotrophic lateral sclerosis28 andParkinson disease29

When using AHC data it is important for investigators todetermine whether there is any hierarchy to the way thatdiagnosis codes are recorded because the performance of analgorithm can be adversely affected In the VA such a codinghierarchy exists and when encounters with MS in any di-agnosis position were used vs in the primary diagnosis posi-tion specificity was compromised as was the Youden Jstatistic Surprisingly accuracy was only slightly reduced butwould likely be more adversely affected when applied to anentire dataset as opposed to the subset of cases with a di-agnosis code for MS as used in this study

Prior reports have defined cohorts in claims data based ona single encounter with MS listed as one of the diagnoses3031

This strategy can result in the inclusion of a substantialnumber of false-positives Prior work on the VA algorithm5

showed that a rigorous algorithm (similar to the MS algo-rithm) ruled out 43 of the cases who had only a single MSencounter This guided our decision to require multipleencounters as part of all of our algorithms We strongly en-courage other investigators to avoid the use of a single MS

encounter to identify an MS cohort because depending onthe proportion of false-positives included results could besubstantially biased

We found that prevalence estimates varied with the dura-tion of the observation period in 3 different datasets (2from the United States and 1 from Manitoba Canada)suggesting that observation periods should be maximizedto optimize case detection Specifically we found that anobservation period of 3 years underestimated the preva-lence of MS by 37 (95 CI 13ndash66) in the VA datasetand 47 (95 CI 23ndash76) in the Manitoba dataset Itshould be noted that the 95 CI for these adjustmentfactors overlapped implying that the growth range isnonsignificantly different for these 2 datasets The findingsin the IMS (validation) dataset corroborated results fromthe VA and suggest that a 3-year ascertainment periodunderestimates the 10-year prevalence in US datasets byasymp38 regardless of the final year of the time period ofinterest (2010 vs 2015) This is consistent with previousfindings in Quebec Canada regarding systemic lupuserythematosus a chronic disease like MS for which healthcare use may vary over time or with disease activity21 Inthat study when a 3-year period was used prevalence wasunderestimated by 66 compared with a 15-year periodwhile a 5-year period underestimated prevalence by 39Similar results have been reported for the assessment of theprevalence of cancer in Europe22 Registries with 5 years ofdata underestimated prevalence by 42 to 45 as theduration of the registry increased the underestimation ofprevalence decreased systematically For investigatorswithout access to many years of data these findings suggestthat an adjustment factor should be applied to obtaina more stable and accurate prevalence estimate The dif-ference between the US (VA and IMS) and Manitobadatasets likely results because Manitoba has a universalhealth system with a single government payer and its AHCdatasets capture gt98 of the population6 whereas the UShealth care system involves multiple insurance payersmaking it more difficult to achieve complete ascertainmentof cases Thus the underestimation value of 37 providesa conservative (ie lower bound) inflation factor whereasan inflation factor of 47 based on the Manitoba data likelybetter reflects what might be expected with nearly fullcapture of the US population over a 10-year period(ie upper bound)

This study included a limited number of AHC datasets thatwere restricted to North America which may limit general-izability to datasets from other countries However thedatasets that were included provided a diverse sampling ofthe MS population The IMS validation dataset did not coverthe same time period as the VA and Manitoba datasetshowever the 3-year underestimation in prevalence was nearlythe same as the VA dataset despite differences in the timeperiods assessed Our algorithms relied on ICD-9-CM coding(except for IP encounters in the Manitoba dataset which

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1025

used ICD-10-CM coding from 2004 onward) and have notbeen validated with ICD-10-CM coding It is unlikely thealgorithms will perform differently when ICD-10-CM codingis used because MS is still represented by a single 3-digit code

We conducted a rigorous assessment of algorithms to identifycases of MS in several different AHC datasets The algorithmof choice required the sum of ge3 MS-related IP admissionsMS-related OP encounters or pharmacy claims for a DMTwithin a calendar year Furthermore this algorithm performedremarkably well across different datasets and demographicgroups We propose that this MS algorithm be adopted forfuture MS-related studies using AHC datasets to enhancecomparability across studies

AcknowledgmentThe authors thank the following members of the US MultipleSclerosis Prevalence Workgroup for their contributions AlbertLo Robert McBurney Oleg Muravov Bari Talente SteveBuka and Leslie Ritter The authors also thank the followingkey staff members at the National MS Society for theircontributions to this project Cathy Carlson Tim CoetzeeSherri Giger Weyman Johnson Eileen Madray GrahamMcReynolds and Cyndi Zagieboylo In addition the authorsare indebted to Shan Jin (Baltimore VA Medical Center) forhis assistance in compiling the VA data The results andconclusions presented are those of the authors No officialendorsement by the Department of Veterans Affairs ManitobaHealth Saskatchewan Ministry of Health SaskatchewanHealth Quality Council or Pharmetrics Inc is intended orshould be inferred

Study fundingThe study was funded by contracts from the National MSSociety to the principal investigators of the US MultipleSclerosis Prevalence Workgroup (HC-1508-05693)

DisclosureW Culpepper has additional research funding from the Na-tional MS Society and receives support from the VA MSCenter of Excellence R Marrie is supported by the WaughFamily Chair in Multiple Sclerosis and receives researchfunding from Canadian Institutes of Health Research Re-search Manitoba Multiple Sclerosis Society of CanadaMultiple Sclerosis Scientific Foundation National MultipleSclerosis Society Crohnrsquos and Colitis Canada and the Con-sortium of Multiple Sclerosis Centers She also serves on theEditorial Board of Neurology A Langer-Gould was site prin-cipal investigator for 2 industry-sponsored phase 3 clinicaltrials (Biogen Idec Hoffman-LaRoche) and was site principalinvestigator for 1 industry-sponsored observation study(Biogen Idec) She receives grant support from the NIHNational Institute of Neurological Disorders and Stroke(NINDS) Patient-Centered Outcomes Research Instituteand the National MS Society M Wallin has served on datasafety monitoring boards for NINDS-funded trials has beenan associate editor for the Encyclopedia of the Neurologic

Sciences and received funding support from the National MSSociety and the Department of Veterans Affairs Merit ReviewResearch Program L Nelson receives grants from the Cen-ters for Disease Control and Prevention NIH and NationalMS Society and contracts from the Agency for Toxic Sub-stances and Diseases Registry She receives compensation forserving as a consultant to Acumen Inc and is on the DataMonitoring Committee of Neuropace J Campbell has con-sultancy or research grants over the past 5 years with theAgency for Healthcare Research and Quality ALSAMFoundation Amgen AstraZeneca Bayer Biogen IdecBoehringer Ingelheim Centers for Disease Control andPrevention Colorado Medicaid Enterprise CommunityPartners Inc Institute for Clinical and Economic ReviewMallinckrodt NIH National MS Society Kaiser PermanentePhRMA Foundation Teva Research in Real Life Ltd Re-spiratory Effectiveness Group and Zogenix Inc H Tremlettis the Canada Research Chair for Neuroepidemiology andMultiple Sclerosis She receives research support from theNational MS Society the Canadian Institutes of Health Re-search the Multiple Sclerosis Society of Canada and theMultiple Sclerosis Scientific Research Foundation In ad-dition in the last 5 years she has received research supportfrom the Multiple Sclerosis Society of Canada the MichaelSmith Foundation for Health Research and the UK MSTrust as well as travel expenses to attend conferences allspeaker honoraria are either declined or donated to an MScharity or to an unrestricted grant for use by her researchgroup W Kaye receives funding from the Agency for ToxicSubstances and Disease Registry the National MS Societyand the Association for the Accreditation of Human Re-search Protection Programs L Wagner receives fundingfrom the Agency for Toxic Substances and Disease Registryand National MS Society L Chen is employed full-time byKPSC S Leung is employed full-time by the University ofManitoba C Evans receives research funding from theCanadian Institutes of Health Research SaskatchewanHealth Research Foundation and the Multiple SclerosisSociety of Canada S Yao is employed full-time by theHealth Quality Council N LaRocca is employed full-timeby the National MS Society Go to NeurologyorgN forfull disclosures

Publication historyReceived by Neurology July 20 2018 Accepted in final form November24 2018

Appendix 1 Authors

Name Location Role Contribution

William JCulpepperPhD MA

Veterans HealthAdministrationWashington DCUniversity of MD

Author Study concept anddesign statisticalanalyses preparationof draft manuscriptcritical review revisionof the manuscript forcontent approval offinal manuscript

e1026 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

References1 Evans C Beland SG Kulaga S et al Incidence and prevalence of multiple sclerosis in

the Americas a systematic review Neuroepidemiology 201340195ndash2102 Kingwell E Marriott JJ Jette N et al Incidence and prevalence of multiple sclerosis in

Europe a systematic review BMC Neurol 2013131283 Makhani N Morrow SA Fisk J et al MS incidence and prevalence in Africa Asia Australia

and New Zealand a systematic review Mult Scler Relat Disord 2014348ndash604 Tricco AC Pham B Rawson NSB Manitoba and Saskatchewan administrative health

care utilization databases are used differently to answer epidemiologic researchquestions J Clin Epidemiol 200861192ndash197e112

5 Culpepper WJ Ehrmantraut M Wallin MT Flannery K Bradham DD VeteransHealth Administration Multiple Sclerosis Surveillance Registry the problem of case-finding from administrative databases J Rehabil Res Dev 20064317

6 Marrie RA YuN Blanchard JF Leung S Elliott L The rising prevalence and changingage distribution of multiple sclerosis in Manitoba Neurology 201074465ndash471

7 Nelson MT Wallin MT Marrie RA et al A new way to estimate neurologic diseaseprevalence in the United States illustrated with MS Neurology201992469ndash480

8 Wallin MT Culpepper WJ Campbell J et al The prevalence of multiple sclerosis inthe United States a population-based estimate using health care claims datasetsNeurology201992e1029ndashe1040

9 Polman CH Reingold SC Banwell B et al Diagnostic criteria for multiple sclerosis2010 revisions to the McDonald criteria Ann Neurol 201169292ndash302

10 Wallin MT Culpepper WJ Coffman P et al The Gulf War era multiple sclerosiscohort age and incidence rates by sex race and service Brain 20121351778ndash1785

11 Koebnick C Langer Gould A Gould MK et al Do the sociodemographic charac-teristics of members of a large integrated health care system represent the populationof interest Permanente J 20121637ndash41

12 Widdifield J Ivers NM Young J et al Development and validation of an adminis-trative data algorithm to estimate the disease burden and epidemiology of multiplesclerosis in Ontario Canada Mult Scler 2015211045ndash1054

13 Al-Sakran LH Marrie RA Blackburn DF Knox KB Evans CD Establishing theincidence and prevalence of multiple sclerosis in Saskatchewan Can J Neurol Sci201845295ndash303

14 Marrie RA Fisk J Stadnyk K et al The incidence and prevalence of multiple sclerosisin Nova Scotia Can J Neurol Sci 201340824ndash831

15 Canadian Institute for Health Information Rehabilitation 2016 Available at cihicaentypes-of-carehospital-carerehabilitation Accessed May 5 2016

16 Youden WJ Index for rating diagnostic tests Cancer 1950332ndash35

Appendix 1 (continued)

Name Location Role Contribution

Ruth AnnMarrieMD PhD

University ofManitoba WinnipegCanada

Author Study concept anddesign supervisionof data collectionand analysescritical reviewrevision ofthe manuscriptfor contentapproval of finalmanuscript

AnnetteLanger-Gould MD

Kaiser PermanenteSouthern CaliforniaLos Angeles

Author Study concept anddesign statisticalanalyses chart reviewconfirmation of MSdiagnosis criticalreview revision of themanuscript forcontent approval offinal manuscript

Mitchell TWallin MDMPH

Veterans HealthAdministrationWashington DCGeorgetownUniversity

Author Study concept anddesign chart reviewconfirmation of MSdiagnosis criticalreview revision of themanuscript forcontent approval offinal manuscript

Lorene MNelsonPhD

Stanford UniversityCA

Author Study concept anddesign critical reviewrevision of themanuscript forcontent

JonathanCampbellPhD

University of DenverCO

Author Statistical analysescritical review of themanuscript forcontent

HelenTremlettPhD

University of BritishColumbia VancouverCanada

Author Statistical analysescritical review of themanuscript forcontent

WendyKaye PhD

McKing ConsultingCorp Atlanta GA

Author Statistical analysescritical review of themanuscript forcontent

LaurieWagnerMPH

McKing ConsultingCorp Atlanta GA

Author Statistical analysescritical review of themanuscript forcontent

Lie HChen DrPH

Kaiser PermanenteSouthern CaliforniaLos Angeles

Author Statistical analysescritical review of themanuscript forcontent

StellaLeung MSc

University ofManitoba WinnipegCanada

Author Statistical analysescritical review of themanuscript forcontent

CharityEvans PhD

University ofSaskatchewanSaskatoon Canada

Author Statistical analysescritical review of themanuscript forcontent

ShenzhenYao MDMSc

University ofSaskatchewanSaskatoon Canada

Author Statistical analysescritical review of themanuscript forcontent

Appendix 1 (continued)

Name Location Role Contribution

Nicholas GLaRoccaPhD

National MultipleSclerosis Society NewYork NY

Author Statistical analysescritical review of themanuscript forcontent approval offinal manuscript

Appendix 2 Coinvestigators and members of the USMultiple Sclerosis Prevalence Workgroup

Name Location Role Contribution

Albert LoMD PhD

Brown UniversityProvidence RI

Coinvestigator Studyconcept anddesign

RobertMcBurneyPhD

Accelerated CureProject Boston MA

Coinvestigator Studyconcept anddesign

OlegMuravovPhD

Agency for ToxicSubstances andDisease RegistryDivision of HealthStudies Atlanta GA

Coinvestigator Studyconcept anddesign

BariTalenteEsq

National MS SocietyWashington DC

Coinvestigator Studyconcept anddesign

LeslieRitter

National MS SocietyWashington DC

Coinvestigator Studyconcept anddesign

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1027

17 Landis JR Koch GG The measurement of observer agreement for categorical dataBiometrics 197733159ndash174

18 Sim J Wright CC The kappa statistic in reliability studies use interpretation andsample size requirements Phys Ther 200585257ndash268

19 Viera AJ Garrett JM Understanding interobserver agreement the kappa statisticFam Med 200537360ndash363

20 Trisolini M Honeycutt A Wiener J Lesesne S Global Economic Impact of MultipleSclerosis Research Triangle Park RTI International 2010

21 Ng R Bernatsky S Rahme E Observation period effects on estimation of systemiclupus erythematosus incidence and prevalence in Quebec J Rheumatol 2013401334ndash1336

22 Capocaccia R Colonna M Corazziari I et al Measuring cancer prevalence in Europethe EUROPREVAL Project Ann Oncol 200213831ndash839

23 Griffiths RI OrsquoMalley CD Herbert RJ Danese MD Misclassification of incidentconditions using claims data impact of varying the period used to exclude pre-existingdisease BMC Med Res Methodol 20131332

24 Dilokthornsakul P Valuck RJ Nair KV Corboy JR Allen RR Campbell JD Multiplesclerosis prevalence in the United States commercially insured population Neurology2016861014ndash1021

25 Tiwari Li Y Zou Z Interval estimation for ratios of correlated age-adjusted ratesJ Data Sci 20108471ndash482

26 Culpepper WJ Wallin MT Magder LS et al VHA Multiple Sclerosis SurveillanceRegistry and its similarities to other multiple sclerosis cohorts J Rehab Res Dev 201552(3)263ndash272

27 Tonelli M Wiebe N Fortin M et al Methods for identifying 30 chronic con-ditions application to administrative data BMC Med Inform Decis Mak 20151531

28 Kaye WE Sanchez M Wu J Feasibility of creating a national registry using admin-istrative data in the United States Amytroph Lateral Scler Frontotemporal Degener201415433ndash439

29 Szumski NR Cheng EM Optimizing algorithms to identify Parkinsonrsquos disease caseswithin an administrative database Mov Disord 20092451ndash56

30 Hoer A Schiffhorst G Zimmermann A et al Multiple sclerosis in Germany dataanalysis of administrative prevalence and healthcare delivery in the statutory healthsystem BMC Health Serv Res 200414381

31 Chastek BJ Oleen-Burkey M Lopez-Bresnahan MV Medical chart validation of analgorithm for identifying multiple sclerosis relapse in healthcare claims J Med Econ201013618ndash625

e1028 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

DOI 101212WNL0000000000007043201992e1016-e1028 Published Online before print February 15 2019Neurology

William J Culpepper Ruth Ann Marrie Annette Langer-Gould et al datasets

Validation of an algorithm for identifying MS cases in administrative health claims

This information is current as of February 15 2019

ServicesUpdated Information amp

httpnneurologyorgcontent9210e1016fullincluding high resolution figures can be found at

References httpnneurologyorgcontent9210e1016fullref-list-1

This article cites 29 articles 6 of which you can access for free at

Citations httpnneurologyorgcontent9210e1016fullotherarticles

This article has been cited by 2 HighWire-hosted articles

Subspecialty Collections

httpnneurologyorgcgicollectionprevalence_studiesPrevalence studies

httpnneurologyorgcgicollectionmultiple_sclerosisMultiple sclerosis

httpnneurologyorgcgicollectiondiagnostic_test_assessment_Diagnostic test assessment

httpnneurologyorgcgicollectioncohort_studiesCohort studies

httpnneurologyorgcgicollectionall_health_services_researchAll Health Services Researchfollowing collection(s) This article along with others on similar topics appears in the

Permissions amp Licensing

httpwwwneurologyorgaboutabout_the_journalpermissionsits entirety can be found online atInformation about reproducing this article in parts (figurestables) or in

Reprints

httpnneurologyorgsubscribersadvertiseInformation about ordering reprints can be found online

ISSN 0028-3878 Online ISSN 1526-632XWolters Kluwer Health Inc on behalf of the American Academy of Neurology All rights reserved Print1951 it is now a weekly with 48 issues per year Copyright Copyright copy 2019 The Author(s) Published by

reg is the official journal of the American Academy of Neurology Published continuously sinceNeurology

Page 8: Validation of an algorithm for identifying MS cases in ... · 3/5/2019  · with ICD-9-CM or the Canadian version of ICD-10 codes (ICD-10-CA), depending on the year. ... the MS Clinic

DiscussionWe undertook this study to identify an algorithm that wouldaccurately and consistently identify cases of MS across dis-parate AHC datasets representing different types of health

care systems and geographic regions The consistency of thefindings across these datasets supports the generalizability ofthe algorithms to other AHC datasets in the United Stateswhen such validation studies are not feasible The preferredalgorithm (MS_E-1) required ge3 encounters (IP +OP+DMT)

Figure 2 Performance of algorithm MS_E-1 [(IP + OP + DMT) ge 3] stratified by age group across the VA KPSC and MBdatasets

(A) Sensitivity (B) specificity (C) positive predictive value (D) negative predictive value (E) accuracy and (F) Youden J statisticData fromManitoba (MB) for the55- to 64- and 64- to 74-year age groups are suppressed due to small cell sizes Data are presented as a proportion that can range between 0 and 1 DMT =disease-modifying therapy IP = inpatient KPSC = Kaiser Permanente Southern California MS = multiple sclerosis NPV = negative predictive value OP =outpatient PPV = positive predictive value VA = Department of Veterans Affairs

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1023

for MS in a 1-year period Algorithms that did not includeDMT performed less well than those that did However weincluded the non-DMT algorithms to assess their ability toaccurately identify cases of MS in studies in which pharmacy

claims data might not be available MS_A-1 performed nearlyas well as the preferred algorithm (MS_E-1) and could beused when DMT is not available Thus by testing a variety ofalgorithms our study provides value and the findings can be

Figure 3 Comparison of prevalence based on a 3- vs 10-year ascertainment period as of 2010 in the (A) VA and (B) MBdatasets

CI = confidence interval MB = Manitoba VA =Department of Veterans Affairs

Figure 4 Comparison of prevalence based on a 3- vs 9-year ascertainment period as of 2015 in the IMS (validation) dataset

CI = confidence interval IMS = IntercontinentalMarketing Services

e1024 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

applied in future research into the prevalence of MS acrossjurisdictions

Algorithm MS E-1 ([IP + OP + DMT] ge 3) was preferredbecause of its consistent high performance and ease ofapplication We hereafter refer to it as the MS algorithmThe 1-year algorithm readily allows annually updating ofprevalence estimates in datasets with limited years of dataSensitivity and PPV were high consistent with previouslypublished definitions developed with similar methods inthe United States (VA) and Canada561213 Specificity inour primary validation cohorts was modest reflecting thatour study cohorts were selected on the basis of having atleast 1 diagnosis for MS Tonelli et al27 evaluated algo-rithms to measure 40 comorbid conditions in AHC datafrom Calgary Alberta Canada They defined an algorithmas valid if both sensitivity and PPV were ge070 compared toa reference cohort They did not consider specificity orNPV because they are generally ge090 in the assessment ofchronic disease in the general population Consistent withthese observations and recommendations when the MSalgorithm was applied to the general population in Sas-katchewan sensitivity (096) specificity (099) and PPV(099) markedly improved These findings support ourpreferred algorithm as a valid method for identifying casesof MS in AHC datasets

Minor variation in performance was observed whenstratified by sex and age However accuracy remainedhigh and the Youden J statistic remained substantialThese findings indicate that the MS algorithm was an ac-curate and reliable algorithm for identifying cases of MSin these disparate datasets and outperforms similar algo-rithms for identifying amyotrophic lateral sclerosis28 andParkinson disease29

When using AHC data it is important for investigators todetermine whether there is any hierarchy to the way thatdiagnosis codes are recorded because the performance of analgorithm can be adversely affected In the VA such a codinghierarchy exists and when encounters with MS in any di-agnosis position were used vs in the primary diagnosis posi-tion specificity was compromised as was the Youden Jstatistic Surprisingly accuracy was only slightly reduced butwould likely be more adversely affected when applied to anentire dataset as opposed to the subset of cases with a di-agnosis code for MS as used in this study

Prior reports have defined cohorts in claims data based ona single encounter with MS listed as one of the diagnoses3031

This strategy can result in the inclusion of a substantialnumber of false-positives Prior work on the VA algorithm5

showed that a rigorous algorithm (similar to the MS algo-rithm) ruled out 43 of the cases who had only a single MSencounter This guided our decision to require multipleencounters as part of all of our algorithms We strongly en-courage other investigators to avoid the use of a single MS

encounter to identify an MS cohort because depending onthe proportion of false-positives included results could besubstantially biased

We found that prevalence estimates varied with the dura-tion of the observation period in 3 different datasets (2from the United States and 1 from Manitoba Canada)suggesting that observation periods should be maximizedto optimize case detection Specifically we found that anobservation period of 3 years underestimated the preva-lence of MS by 37 (95 CI 13ndash66) in the VA datasetand 47 (95 CI 23ndash76) in the Manitoba dataset Itshould be noted that the 95 CI for these adjustmentfactors overlapped implying that the growth range isnonsignificantly different for these 2 datasets The findingsin the IMS (validation) dataset corroborated results fromthe VA and suggest that a 3-year ascertainment periodunderestimates the 10-year prevalence in US datasets byasymp38 regardless of the final year of the time period ofinterest (2010 vs 2015) This is consistent with previousfindings in Quebec Canada regarding systemic lupuserythematosus a chronic disease like MS for which healthcare use may vary over time or with disease activity21 Inthat study when a 3-year period was used prevalence wasunderestimated by 66 compared with a 15-year periodwhile a 5-year period underestimated prevalence by 39Similar results have been reported for the assessment of theprevalence of cancer in Europe22 Registries with 5 years ofdata underestimated prevalence by 42 to 45 as theduration of the registry increased the underestimation ofprevalence decreased systematically For investigatorswithout access to many years of data these findings suggestthat an adjustment factor should be applied to obtaina more stable and accurate prevalence estimate The dif-ference between the US (VA and IMS) and Manitobadatasets likely results because Manitoba has a universalhealth system with a single government payer and its AHCdatasets capture gt98 of the population6 whereas the UShealth care system involves multiple insurance payersmaking it more difficult to achieve complete ascertainmentof cases Thus the underestimation value of 37 providesa conservative (ie lower bound) inflation factor whereasan inflation factor of 47 based on the Manitoba data likelybetter reflects what might be expected with nearly fullcapture of the US population over a 10-year period(ie upper bound)

This study included a limited number of AHC datasets thatwere restricted to North America which may limit general-izability to datasets from other countries However thedatasets that were included provided a diverse sampling ofthe MS population The IMS validation dataset did not coverthe same time period as the VA and Manitoba datasetshowever the 3-year underestimation in prevalence was nearlythe same as the VA dataset despite differences in the timeperiods assessed Our algorithms relied on ICD-9-CM coding(except for IP encounters in the Manitoba dataset which

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1025

used ICD-10-CM coding from 2004 onward) and have notbeen validated with ICD-10-CM coding It is unlikely thealgorithms will perform differently when ICD-10-CM codingis used because MS is still represented by a single 3-digit code

We conducted a rigorous assessment of algorithms to identifycases of MS in several different AHC datasets The algorithmof choice required the sum of ge3 MS-related IP admissionsMS-related OP encounters or pharmacy claims for a DMTwithin a calendar year Furthermore this algorithm performedremarkably well across different datasets and demographicgroups We propose that this MS algorithm be adopted forfuture MS-related studies using AHC datasets to enhancecomparability across studies

AcknowledgmentThe authors thank the following members of the US MultipleSclerosis Prevalence Workgroup for their contributions AlbertLo Robert McBurney Oleg Muravov Bari Talente SteveBuka and Leslie Ritter The authors also thank the followingkey staff members at the National MS Society for theircontributions to this project Cathy Carlson Tim CoetzeeSherri Giger Weyman Johnson Eileen Madray GrahamMcReynolds and Cyndi Zagieboylo In addition the authorsare indebted to Shan Jin (Baltimore VA Medical Center) forhis assistance in compiling the VA data The results andconclusions presented are those of the authors No officialendorsement by the Department of Veterans Affairs ManitobaHealth Saskatchewan Ministry of Health SaskatchewanHealth Quality Council or Pharmetrics Inc is intended orshould be inferred

Study fundingThe study was funded by contracts from the National MSSociety to the principal investigators of the US MultipleSclerosis Prevalence Workgroup (HC-1508-05693)

DisclosureW Culpepper has additional research funding from the Na-tional MS Society and receives support from the VA MSCenter of Excellence R Marrie is supported by the WaughFamily Chair in Multiple Sclerosis and receives researchfunding from Canadian Institutes of Health Research Re-search Manitoba Multiple Sclerosis Society of CanadaMultiple Sclerosis Scientific Foundation National MultipleSclerosis Society Crohnrsquos and Colitis Canada and the Con-sortium of Multiple Sclerosis Centers She also serves on theEditorial Board of Neurology A Langer-Gould was site prin-cipal investigator for 2 industry-sponsored phase 3 clinicaltrials (Biogen Idec Hoffman-LaRoche) and was site principalinvestigator for 1 industry-sponsored observation study(Biogen Idec) She receives grant support from the NIHNational Institute of Neurological Disorders and Stroke(NINDS) Patient-Centered Outcomes Research Instituteand the National MS Society M Wallin has served on datasafety monitoring boards for NINDS-funded trials has beenan associate editor for the Encyclopedia of the Neurologic

Sciences and received funding support from the National MSSociety and the Department of Veterans Affairs Merit ReviewResearch Program L Nelson receives grants from the Cen-ters for Disease Control and Prevention NIH and NationalMS Society and contracts from the Agency for Toxic Sub-stances and Diseases Registry She receives compensation forserving as a consultant to Acumen Inc and is on the DataMonitoring Committee of Neuropace J Campbell has con-sultancy or research grants over the past 5 years with theAgency for Healthcare Research and Quality ALSAMFoundation Amgen AstraZeneca Bayer Biogen IdecBoehringer Ingelheim Centers for Disease Control andPrevention Colorado Medicaid Enterprise CommunityPartners Inc Institute for Clinical and Economic ReviewMallinckrodt NIH National MS Society Kaiser PermanentePhRMA Foundation Teva Research in Real Life Ltd Re-spiratory Effectiveness Group and Zogenix Inc H Tremlettis the Canada Research Chair for Neuroepidemiology andMultiple Sclerosis She receives research support from theNational MS Society the Canadian Institutes of Health Re-search the Multiple Sclerosis Society of Canada and theMultiple Sclerosis Scientific Research Foundation In ad-dition in the last 5 years she has received research supportfrom the Multiple Sclerosis Society of Canada the MichaelSmith Foundation for Health Research and the UK MSTrust as well as travel expenses to attend conferences allspeaker honoraria are either declined or donated to an MScharity or to an unrestricted grant for use by her researchgroup W Kaye receives funding from the Agency for ToxicSubstances and Disease Registry the National MS Societyand the Association for the Accreditation of Human Re-search Protection Programs L Wagner receives fundingfrom the Agency for Toxic Substances and Disease Registryand National MS Society L Chen is employed full-time byKPSC S Leung is employed full-time by the University ofManitoba C Evans receives research funding from theCanadian Institutes of Health Research SaskatchewanHealth Research Foundation and the Multiple SclerosisSociety of Canada S Yao is employed full-time by theHealth Quality Council N LaRocca is employed full-timeby the National MS Society Go to NeurologyorgN forfull disclosures

Publication historyReceived by Neurology July 20 2018 Accepted in final form November24 2018

Appendix 1 Authors

Name Location Role Contribution

William JCulpepperPhD MA

Veterans HealthAdministrationWashington DCUniversity of MD

Author Study concept anddesign statisticalanalyses preparationof draft manuscriptcritical review revisionof the manuscript forcontent approval offinal manuscript

e1026 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

References1 Evans C Beland SG Kulaga S et al Incidence and prevalence of multiple sclerosis in

the Americas a systematic review Neuroepidemiology 201340195ndash2102 Kingwell E Marriott JJ Jette N et al Incidence and prevalence of multiple sclerosis in

Europe a systematic review BMC Neurol 2013131283 Makhani N Morrow SA Fisk J et al MS incidence and prevalence in Africa Asia Australia

and New Zealand a systematic review Mult Scler Relat Disord 2014348ndash604 Tricco AC Pham B Rawson NSB Manitoba and Saskatchewan administrative health

care utilization databases are used differently to answer epidemiologic researchquestions J Clin Epidemiol 200861192ndash197e112

5 Culpepper WJ Ehrmantraut M Wallin MT Flannery K Bradham DD VeteransHealth Administration Multiple Sclerosis Surveillance Registry the problem of case-finding from administrative databases J Rehabil Res Dev 20064317

6 Marrie RA YuN Blanchard JF Leung S Elliott L The rising prevalence and changingage distribution of multiple sclerosis in Manitoba Neurology 201074465ndash471

7 Nelson MT Wallin MT Marrie RA et al A new way to estimate neurologic diseaseprevalence in the United States illustrated with MS Neurology201992469ndash480

8 Wallin MT Culpepper WJ Campbell J et al The prevalence of multiple sclerosis inthe United States a population-based estimate using health care claims datasetsNeurology201992e1029ndashe1040

9 Polman CH Reingold SC Banwell B et al Diagnostic criteria for multiple sclerosis2010 revisions to the McDonald criteria Ann Neurol 201169292ndash302

10 Wallin MT Culpepper WJ Coffman P et al The Gulf War era multiple sclerosiscohort age and incidence rates by sex race and service Brain 20121351778ndash1785

11 Koebnick C Langer Gould A Gould MK et al Do the sociodemographic charac-teristics of members of a large integrated health care system represent the populationof interest Permanente J 20121637ndash41

12 Widdifield J Ivers NM Young J et al Development and validation of an adminis-trative data algorithm to estimate the disease burden and epidemiology of multiplesclerosis in Ontario Canada Mult Scler 2015211045ndash1054

13 Al-Sakran LH Marrie RA Blackburn DF Knox KB Evans CD Establishing theincidence and prevalence of multiple sclerosis in Saskatchewan Can J Neurol Sci201845295ndash303

14 Marrie RA Fisk J Stadnyk K et al The incidence and prevalence of multiple sclerosisin Nova Scotia Can J Neurol Sci 201340824ndash831

15 Canadian Institute for Health Information Rehabilitation 2016 Available at cihicaentypes-of-carehospital-carerehabilitation Accessed May 5 2016

16 Youden WJ Index for rating diagnostic tests Cancer 1950332ndash35

Appendix 1 (continued)

Name Location Role Contribution

Ruth AnnMarrieMD PhD

University ofManitoba WinnipegCanada

Author Study concept anddesign supervisionof data collectionand analysescritical reviewrevision ofthe manuscriptfor contentapproval of finalmanuscript

AnnetteLanger-Gould MD

Kaiser PermanenteSouthern CaliforniaLos Angeles

Author Study concept anddesign statisticalanalyses chart reviewconfirmation of MSdiagnosis criticalreview revision of themanuscript forcontent approval offinal manuscript

Mitchell TWallin MDMPH

Veterans HealthAdministrationWashington DCGeorgetownUniversity

Author Study concept anddesign chart reviewconfirmation of MSdiagnosis criticalreview revision of themanuscript forcontent approval offinal manuscript

Lorene MNelsonPhD

Stanford UniversityCA

Author Study concept anddesign critical reviewrevision of themanuscript forcontent

JonathanCampbellPhD

University of DenverCO

Author Statistical analysescritical review of themanuscript forcontent

HelenTremlettPhD

University of BritishColumbia VancouverCanada

Author Statistical analysescritical review of themanuscript forcontent

WendyKaye PhD

McKing ConsultingCorp Atlanta GA

Author Statistical analysescritical review of themanuscript forcontent

LaurieWagnerMPH

McKing ConsultingCorp Atlanta GA

Author Statistical analysescritical review of themanuscript forcontent

Lie HChen DrPH

Kaiser PermanenteSouthern CaliforniaLos Angeles

Author Statistical analysescritical review of themanuscript forcontent

StellaLeung MSc

University ofManitoba WinnipegCanada

Author Statistical analysescritical review of themanuscript forcontent

CharityEvans PhD

University ofSaskatchewanSaskatoon Canada

Author Statistical analysescritical review of themanuscript forcontent

ShenzhenYao MDMSc

University ofSaskatchewanSaskatoon Canada

Author Statistical analysescritical review of themanuscript forcontent

Appendix 1 (continued)

Name Location Role Contribution

Nicholas GLaRoccaPhD

National MultipleSclerosis Society NewYork NY

Author Statistical analysescritical review of themanuscript forcontent approval offinal manuscript

Appendix 2 Coinvestigators and members of the USMultiple Sclerosis Prevalence Workgroup

Name Location Role Contribution

Albert LoMD PhD

Brown UniversityProvidence RI

Coinvestigator Studyconcept anddesign

RobertMcBurneyPhD

Accelerated CureProject Boston MA

Coinvestigator Studyconcept anddesign

OlegMuravovPhD

Agency for ToxicSubstances andDisease RegistryDivision of HealthStudies Atlanta GA

Coinvestigator Studyconcept anddesign

BariTalenteEsq

National MS SocietyWashington DC

Coinvestigator Studyconcept anddesign

LeslieRitter

National MS SocietyWashington DC

Coinvestigator Studyconcept anddesign

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1027

17 Landis JR Koch GG The measurement of observer agreement for categorical dataBiometrics 197733159ndash174

18 Sim J Wright CC The kappa statistic in reliability studies use interpretation andsample size requirements Phys Ther 200585257ndash268

19 Viera AJ Garrett JM Understanding interobserver agreement the kappa statisticFam Med 200537360ndash363

20 Trisolini M Honeycutt A Wiener J Lesesne S Global Economic Impact of MultipleSclerosis Research Triangle Park RTI International 2010

21 Ng R Bernatsky S Rahme E Observation period effects on estimation of systemiclupus erythematosus incidence and prevalence in Quebec J Rheumatol 2013401334ndash1336

22 Capocaccia R Colonna M Corazziari I et al Measuring cancer prevalence in Europethe EUROPREVAL Project Ann Oncol 200213831ndash839

23 Griffiths RI OrsquoMalley CD Herbert RJ Danese MD Misclassification of incidentconditions using claims data impact of varying the period used to exclude pre-existingdisease BMC Med Res Methodol 20131332

24 Dilokthornsakul P Valuck RJ Nair KV Corboy JR Allen RR Campbell JD Multiplesclerosis prevalence in the United States commercially insured population Neurology2016861014ndash1021

25 Tiwari Li Y Zou Z Interval estimation for ratios of correlated age-adjusted ratesJ Data Sci 20108471ndash482

26 Culpepper WJ Wallin MT Magder LS et al VHA Multiple Sclerosis SurveillanceRegistry and its similarities to other multiple sclerosis cohorts J Rehab Res Dev 201552(3)263ndash272

27 Tonelli M Wiebe N Fortin M et al Methods for identifying 30 chronic con-ditions application to administrative data BMC Med Inform Decis Mak 20151531

28 Kaye WE Sanchez M Wu J Feasibility of creating a national registry using admin-istrative data in the United States Amytroph Lateral Scler Frontotemporal Degener201415433ndash439

29 Szumski NR Cheng EM Optimizing algorithms to identify Parkinsonrsquos disease caseswithin an administrative database Mov Disord 20092451ndash56

30 Hoer A Schiffhorst G Zimmermann A et al Multiple sclerosis in Germany dataanalysis of administrative prevalence and healthcare delivery in the statutory healthsystem BMC Health Serv Res 200414381

31 Chastek BJ Oleen-Burkey M Lopez-Bresnahan MV Medical chart validation of analgorithm for identifying multiple sclerosis relapse in healthcare claims J Med Econ201013618ndash625

e1028 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

DOI 101212WNL0000000000007043201992e1016-e1028 Published Online before print February 15 2019Neurology

William J Culpepper Ruth Ann Marrie Annette Langer-Gould et al datasets

Validation of an algorithm for identifying MS cases in administrative health claims

This information is current as of February 15 2019

ServicesUpdated Information amp

httpnneurologyorgcontent9210e1016fullincluding high resolution figures can be found at

References httpnneurologyorgcontent9210e1016fullref-list-1

This article cites 29 articles 6 of which you can access for free at

Citations httpnneurologyorgcontent9210e1016fullotherarticles

This article has been cited by 2 HighWire-hosted articles

Subspecialty Collections

httpnneurologyorgcgicollectionprevalence_studiesPrevalence studies

httpnneurologyorgcgicollectionmultiple_sclerosisMultiple sclerosis

httpnneurologyorgcgicollectiondiagnostic_test_assessment_Diagnostic test assessment

httpnneurologyorgcgicollectioncohort_studiesCohort studies

httpnneurologyorgcgicollectionall_health_services_researchAll Health Services Researchfollowing collection(s) This article along with others on similar topics appears in the

Permissions amp Licensing

httpwwwneurologyorgaboutabout_the_journalpermissionsits entirety can be found online atInformation about reproducing this article in parts (figurestables) or in

Reprints

httpnneurologyorgsubscribersadvertiseInformation about ordering reprints can be found online

ISSN 0028-3878 Online ISSN 1526-632XWolters Kluwer Health Inc on behalf of the American Academy of Neurology All rights reserved Print1951 it is now a weekly with 48 issues per year Copyright Copyright copy 2019 The Author(s) Published by

reg is the official journal of the American Academy of Neurology Published continuously sinceNeurology

Page 9: Validation of an algorithm for identifying MS cases in ... · 3/5/2019  · with ICD-9-CM or the Canadian version of ICD-10 codes (ICD-10-CA), depending on the year. ... the MS Clinic

for MS in a 1-year period Algorithms that did not includeDMT performed less well than those that did However weincluded the non-DMT algorithms to assess their ability toaccurately identify cases of MS in studies in which pharmacy

claims data might not be available MS_A-1 performed nearlyas well as the preferred algorithm (MS_E-1) and could beused when DMT is not available Thus by testing a variety ofalgorithms our study provides value and the findings can be

Figure 3 Comparison of prevalence based on a 3- vs 10-year ascertainment period as of 2010 in the (A) VA and (B) MBdatasets

CI = confidence interval MB = Manitoba VA =Department of Veterans Affairs

Figure 4 Comparison of prevalence based on a 3- vs 9-year ascertainment period as of 2015 in the IMS (validation) dataset

CI = confidence interval IMS = IntercontinentalMarketing Services

e1024 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

applied in future research into the prevalence of MS acrossjurisdictions

Algorithm MS E-1 ([IP + OP + DMT] ge 3) was preferredbecause of its consistent high performance and ease ofapplication We hereafter refer to it as the MS algorithmThe 1-year algorithm readily allows annually updating ofprevalence estimates in datasets with limited years of dataSensitivity and PPV were high consistent with previouslypublished definitions developed with similar methods inthe United States (VA) and Canada561213 Specificity inour primary validation cohorts was modest reflecting thatour study cohorts were selected on the basis of having atleast 1 diagnosis for MS Tonelli et al27 evaluated algo-rithms to measure 40 comorbid conditions in AHC datafrom Calgary Alberta Canada They defined an algorithmas valid if both sensitivity and PPV were ge070 compared toa reference cohort They did not consider specificity orNPV because they are generally ge090 in the assessment ofchronic disease in the general population Consistent withthese observations and recommendations when the MSalgorithm was applied to the general population in Sas-katchewan sensitivity (096) specificity (099) and PPV(099) markedly improved These findings support ourpreferred algorithm as a valid method for identifying casesof MS in AHC datasets

Minor variation in performance was observed whenstratified by sex and age However accuracy remainedhigh and the Youden J statistic remained substantialThese findings indicate that the MS algorithm was an ac-curate and reliable algorithm for identifying cases of MSin these disparate datasets and outperforms similar algo-rithms for identifying amyotrophic lateral sclerosis28 andParkinson disease29

When using AHC data it is important for investigators todetermine whether there is any hierarchy to the way thatdiagnosis codes are recorded because the performance of analgorithm can be adversely affected In the VA such a codinghierarchy exists and when encounters with MS in any di-agnosis position were used vs in the primary diagnosis posi-tion specificity was compromised as was the Youden Jstatistic Surprisingly accuracy was only slightly reduced butwould likely be more adversely affected when applied to anentire dataset as opposed to the subset of cases with a di-agnosis code for MS as used in this study

Prior reports have defined cohorts in claims data based ona single encounter with MS listed as one of the diagnoses3031

This strategy can result in the inclusion of a substantialnumber of false-positives Prior work on the VA algorithm5

showed that a rigorous algorithm (similar to the MS algo-rithm) ruled out 43 of the cases who had only a single MSencounter This guided our decision to require multipleencounters as part of all of our algorithms We strongly en-courage other investigators to avoid the use of a single MS

encounter to identify an MS cohort because depending onthe proportion of false-positives included results could besubstantially biased

We found that prevalence estimates varied with the dura-tion of the observation period in 3 different datasets (2from the United States and 1 from Manitoba Canada)suggesting that observation periods should be maximizedto optimize case detection Specifically we found that anobservation period of 3 years underestimated the preva-lence of MS by 37 (95 CI 13ndash66) in the VA datasetand 47 (95 CI 23ndash76) in the Manitoba dataset Itshould be noted that the 95 CI for these adjustmentfactors overlapped implying that the growth range isnonsignificantly different for these 2 datasets The findingsin the IMS (validation) dataset corroborated results fromthe VA and suggest that a 3-year ascertainment periodunderestimates the 10-year prevalence in US datasets byasymp38 regardless of the final year of the time period ofinterest (2010 vs 2015) This is consistent with previousfindings in Quebec Canada regarding systemic lupuserythematosus a chronic disease like MS for which healthcare use may vary over time or with disease activity21 Inthat study when a 3-year period was used prevalence wasunderestimated by 66 compared with a 15-year periodwhile a 5-year period underestimated prevalence by 39Similar results have been reported for the assessment of theprevalence of cancer in Europe22 Registries with 5 years ofdata underestimated prevalence by 42 to 45 as theduration of the registry increased the underestimation ofprevalence decreased systematically For investigatorswithout access to many years of data these findings suggestthat an adjustment factor should be applied to obtaina more stable and accurate prevalence estimate The dif-ference between the US (VA and IMS) and Manitobadatasets likely results because Manitoba has a universalhealth system with a single government payer and its AHCdatasets capture gt98 of the population6 whereas the UShealth care system involves multiple insurance payersmaking it more difficult to achieve complete ascertainmentof cases Thus the underestimation value of 37 providesa conservative (ie lower bound) inflation factor whereasan inflation factor of 47 based on the Manitoba data likelybetter reflects what might be expected with nearly fullcapture of the US population over a 10-year period(ie upper bound)

This study included a limited number of AHC datasets thatwere restricted to North America which may limit general-izability to datasets from other countries However thedatasets that were included provided a diverse sampling ofthe MS population The IMS validation dataset did not coverthe same time period as the VA and Manitoba datasetshowever the 3-year underestimation in prevalence was nearlythe same as the VA dataset despite differences in the timeperiods assessed Our algorithms relied on ICD-9-CM coding(except for IP encounters in the Manitoba dataset which

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1025

used ICD-10-CM coding from 2004 onward) and have notbeen validated with ICD-10-CM coding It is unlikely thealgorithms will perform differently when ICD-10-CM codingis used because MS is still represented by a single 3-digit code

We conducted a rigorous assessment of algorithms to identifycases of MS in several different AHC datasets The algorithmof choice required the sum of ge3 MS-related IP admissionsMS-related OP encounters or pharmacy claims for a DMTwithin a calendar year Furthermore this algorithm performedremarkably well across different datasets and demographicgroups We propose that this MS algorithm be adopted forfuture MS-related studies using AHC datasets to enhancecomparability across studies

AcknowledgmentThe authors thank the following members of the US MultipleSclerosis Prevalence Workgroup for their contributions AlbertLo Robert McBurney Oleg Muravov Bari Talente SteveBuka and Leslie Ritter The authors also thank the followingkey staff members at the National MS Society for theircontributions to this project Cathy Carlson Tim CoetzeeSherri Giger Weyman Johnson Eileen Madray GrahamMcReynolds and Cyndi Zagieboylo In addition the authorsare indebted to Shan Jin (Baltimore VA Medical Center) forhis assistance in compiling the VA data The results andconclusions presented are those of the authors No officialendorsement by the Department of Veterans Affairs ManitobaHealth Saskatchewan Ministry of Health SaskatchewanHealth Quality Council or Pharmetrics Inc is intended orshould be inferred

Study fundingThe study was funded by contracts from the National MSSociety to the principal investigators of the US MultipleSclerosis Prevalence Workgroup (HC-1508-05693)

DisclosureW Culpepper has additional research funding from the Na-tional MS Society and receives support from the VA MSCenter of Excellence R Marrie is supported by the WaughFamily Chair in Multiple Sclerosis and receives researchfunding from Canadian Institutes of Health Research Re-search Manitoba Multiple Sclerosis Society of CanadaMultiple Sclerosis Scientific Foundation National MultipleSclerosis Society Crohnrsquos and Colitis Canada and the Con-sortium of Multiple Sclerosis Centers She also serves on theEditorial Board of Neurology A Langer-Gould was site prin-cipal investigator for 2 industry-sponsored phase 3 clinicaltrials (Biogen Idec Hoffman-LaRoche) and was site principalinvestigator for 1 industry-sponsored observation study(Biogen Idec) She receives grant support from the NIHNational Institute of Neurological Disorders and Stroke(NINDS) Patient-Centered Outcomes Research Instituteand the National MS Society M Wallin has served on datasafety monitoring boards for NINDS-funded trials has beenan associate editor for the Encyclopedia of the Neurologic

Sciences and received funding support from the National MSSociety and the Department of Veterans Affairs Merit ReviewResearch Program L Nelson receives grants from the Cen-ters for Disease Control and Prevention NIH and NationalMS Society and contracts from the Agency for Toxic Sub-stances and Diseases Registry She receives compensation forserving as a consultant to Acumen Inc and is on the DataMonitoring Committee of Neuropace J Campbell has con-sultancy or research grants over the past 5 years with theAgency for Healthcare Research and Quality ALSAMFoundation Amgen AstraZeneca Bayer Biogen IdecBoehringer Ingelheim Centers for Disease Control andPrevention Colorado Medicaid Enterprise CommunityPartners Inc Institute for Clinical and Economic ReviewMallinckrodt NIH National MS Society Kaiser PermanentePhRMA Foundation Teva Research in Real Life Ltd Re-spiratory Effectiveness Group and Zogenix Inc H Tremlettis the Canada Research Chair for Neuroepidemiology andMultiple Sclerosis She receives research support from theNational MS Society the Canadian Institutes of Health Re-search the Multiple Sclerosis Society of Canada and theMultiple Sclerosis Scientific Research Foundation In ad-dition in the last 5 years she has received research supportfrom the Multiple Sclerosis Society of Canada the MichaelSmith Foundation for Health Research and the UK MSTrust as well as travel expenses to attend conferences allspeaker honoraria are either declined or donated to an MScharity or to an unrestricted grant for use by her researchgroup W Kaye receives funding from the Agency for ToxicSubstances and Disease Registry the National MS Societyand the Association for the Accreditation of Human Re-search Protection Programs L Wagner receives fundingfrom the Agency for Toxic Substances and Disease Registryand National MS Society L Chen is employed full-time byKPSC S Leung is employed full-time by the University ofManitoba C Evans receives research funding from theCanadian Institutes of Health Research SaskatchewanHealth Research Foundation and the Multiple SclerosisSociety of Canada S Yao is employed full-time by theHealth Quality Council N LaRocca is employed full-timeby the National MS Society Go to NeurologyorgN forfull disclosures

Publication historyReceived by Neurology July 20 2018 Accepted in final form November24 2018

Appendix 1 Authors

Name Location Role Contribution

William JCulpepperPhD MA

Veterans HealthAdministrationWashington DCUniversity of MD

Author Study concept anddesign statisticalanalyses preparationof draft manuscriptcritical review revisionof the manuscript forcontent approval offinal manuscript

e1026 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

References1 Evans C Beland SG Kulaga S et al Incidence and prevalence of multiple sclerosis in

the Americas a systematic review Neuroepidemiology 201340195ndash2102 Kingwell E Marriott JJ Jette N et al Incidence and prevalence of multiple sclerosis in

Europe a systematic review BMC Neurol 2013131283 Makhani N Morrow SA Fisk J et al MS incidence and prevalence in Africa Asia Australia

and New Zealand a systematic review Mult Scler Relat Disord 2014348ndash604 Tricco AC Pham B Rawson NSB Manitoba and Saskatchewan administrative health

care utilization databases are used differently to answer epidemiologic researchquestions J Clin Epidemiol 200861192ndash197e112

5 Culpepper WJ Ehrmantraut M Wallin MT Flannery K Bradham DD VeteransHealth Administration Multiple Sclerosis Surveillance Registry the problem of case-finding from administrative databases J Rehabil Res Dev 20064317

6 Marrie RA YuN Blanchard JF Leung S Elliott L The rising prevalence and changingage distribution of multiple sclerosis in Manitoba Neurology 201074465ndash471

7 Nelson MT Wallin MT Marrie RA et al A new way to estimate neurologic diseaseprevalence in the United States illustrated with MS Neurology201992469ndash480

8 Wallin MT Culpepper WJ Campbell J et al The prevalence of multiple sclerosis inthe United States a population-based estimate using health care claims datasetsNeurology201992e1029ndashe1040

9 Polman CH Reingold SC Banwell B et al Diagnostic criteria for multiple sclerosis2010 revisions to the McDonald criteria Ann Neurol 201169292ndash302

10 Wallin MT Culpepper WJ Coffman P et al The Gulf War era multiple sclerosiscohort age and incidence rates by sex race and service Brain 20121351778ndash1785

11 Koebnick C Langer Gould A Gould MK et al Do the sociodemographic charac-teristics of members of a large integrated health care system represent the populationof interest Permanente J 20121637ndash41

12 Widdifield J Ivers NM Young J et al Development and validation of an adminis-trative data algorithm to estimate the disease burden and epidemiology of multiplesclerosis in Ontario Canada Mult Scler 2015211045ndash1054

13 Al-Sakran LH Marrie RA Blackburn DF Knox KB Evans CD Establishing theincidence and prevalence of multiple sclerosis in Saskatchewan Can J Neurol Sci201845295ndash303

14 Marrie RA Fisk J Stadnyk K et al The incidence and prevalence of multiple sclerosisin Nova Scotia Can J Neurol Sci 201340824ndash831

15 Canadian Institute for Health Information Rehabilitation 2016 Available at cihicaentypes-of-carehospital-carerehabilitation Accessed May 5 2016

16 Youden WJ Index for rating diagnostic tests Cancer 1950332ndash35

Appendix 1 (continued)

Name Location Role Contribution

Ruth AnnMarrieMD PhD

University ofManitoba WinnipegCanada

Author Study concept anddesign supervisionof data collectionand analysescritical reviewrevision ofthe manuscriptfor contentapproval of finalmanuscript

AnnetteLanger-Gould MD

Kaiser PermanenteSouthern CaliforniaLos Angeles

Author Study concept anddesign statisticalanalyses chart reviewconfirmation of MSdiagnosis criticalreview revision of themanuscript forcontent approval offinal manuscript

Mitchell TWallin MDMPH

Veterans HealthAdministrationWashington DCGeorgetownUniversity

Author Study concept anddesign chart reviewconfirmation of MSdiagnosis criticalreview revision of themanuscript forcontent approval offinal manuscript

Lorene MNelsonPhD

Stanford UniversityCA

Author Study concept anddesign critical reviewrevision of themanuscript forcontent

JonathanCampbellPhD

University of DenverCO

Author Statistical analysescritical review of themanuscript forcontent

HelenTremlettPhD

University of BritishColumbia VancouverCanada

Author Statistical analysescritical review of themanuscript forcontent

WendyKaye PhD

McKing ConsultingCorp Atlanta GA

Author Statistical analysescritical review of themanuscript forcontent

LaurieWagnerMPH

McKing ConsultingCorp Atlanta GA

Author Statistical analysescritical review of themanuscript forcontent

Lie HChen DrPH

Kaiser PermanenteSouthern CaliforniaLos Angeles

Author Statistical analysescritical review of themanuscript forcontent

StellaLeung MSc

University ofManitoba WinnipegCanada

Author Statistical analysescritical review of themanuscript forcontent

CharityEvans PhD

University ofSaskatchewanSaskatoon Canada

Author Statistical analysescritical review of themanuscript forcontent

ShenzhenYao MDMSc

University ofSaskatchewanSaskatoon Canada

Author Statistical analysescritical review of themanuscript forcontent

Appendix 1 (continued)

Name Location Role Contribution

Nicholas GLaRoccaPhD

National MultipleSclerosis Society NewYork NY

Author Statistical analysescritical review of themanuscript forcontent approval offinal manuscript

Appendix 2 Coinvestigators and members of the USMultiple Sclerosis Prevalence Workgroup

Name Location Role Contribution

Albert LoMD PhD

Brown UniversityProvidence RI

Coinvestigator Studyconcept anddesign

RobertMcBurneyPhD

Accelerated CureProject Boston MA

Coinvestigator Studyconcept anddesign

OlegMuravovPhD

Agency for ToxicSubstances andDisease RegistryDivision of HealthStudies Atlanta GA

Coinvestigator Studyconcept anddesign

BariTalenteEsq

National MS SocietyWashington DC

Coinvestigator Studyconcept anddesign

LeslieRitter

National MS SocietyWashington DC

Coinvestigator Studyconcept anddesign

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1027

17 Landis JR Koch GG The measurement of observer agreement for categorical dataBiometrics 197733159ndash174

18 Sim J Wright CC The kappa statistic in reliability studies use interpretation andsample size requirements Phys Ther 200585257ndash268

19 Viera AJ Garrett JM Understanding interobserver agreement the kappa statisticFam Med 200537360ndash363

20 Trisolini M Honeycutt A Wiener J Lesesne S Global Economic Impact of MultipleSclerosis Research Triangle Park RTI International 2010

21 Ng R Bernatsky S Rahme E Observation period effects on estimation of systemiclupus erythematosus incidence and prevalence in Quebec J Rheumatol 2013401334ndash1336

22 Capocaccia R Colonna M Corazziari I et al Measuring cancer prevalence in Europethe EUROPREVAL Project Ann Oncol 200213831ndash839

23 Griffiths RI OrsquoMalley CD Herbert RJ Danese MD Misclassification of incidentconditions using claims data impact of varying the period used to exclude pre-existingdisease BMC Med Res Methodol 20131332

24 Dilokthornsakul P Valuck RJ Nair KV Corboy JR Allen RR Campbell JD Multiplesclerosis prevalence in the United States commercially insured population Neurology2016861014ndash1021

25 Tiwari Li Y Zou Z Interval estimation for ratios of correlated age-adjusted ratesJ Data Sci 20108471ndash482

26 Culpepper WJ Wallin MT Magder LS et al VHA Multiple Sclerosis SurveillanceRegistry and its similarities to other multiple sclerosis cohorts J Rehab Res Dev 201552(3)263ndash272

27 Tonelli M Wiebe N Fortin M et al Methods for identifying 30 chronic con-ditions application to administrative data BMC Med Inform Decis Mak 20151531

28 Kaye WE Sanchez M Wu J Feasibility of creating a national registry using admin-istrative data in the United States Amytroph Lateral Scler Frontotemporal Degener201415433ndash439

29 Szumski NR Cheng EM Optimizing algorithms to identify Parkinsonrsquos disease caseswithin an administrative database Mov Disord 20092451ndash56

30 Hoer A Schiffhorst G Zimmermann A et al Multiple sclerosis in Germany dataanalysis of administrative prevalence and healthcare delivery in the statutory healthsystem BMC Health Serv Res 200414381

31 Chastek BJ Oleen-Burkey M Lopez-Bresnahan MV Medical chart validation of analgorithm for identifying multiple sclerosis relapse in healthcare claims J Med Econ201013618ndash625

e1028 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

DOI 101212WNL0000000000007043201992e1016-e1028 Published Online before print February 15 2019Neurology

William J Culpepper Ruth Ann Marrie Annette Langer-Gould et al datasets

Validation of an algorithm for identifying MS cases in administrative health claims

This information is current as of February 15 2019

ServicesUpdated Information amp

httpnneurologyorgcontent9210e1016fullincluding high resolution figures can be found at

References httpnneurologyorgcontent9210e1016fullref-list-1

This article cites 29 articles 6 of which you can access for free at

Citations httpnneurologyorgcontent9210e1016fullotherarticles

This article has been cited by 2 HighWire-hosted articles

Subspecialty Collections

httpnneurologyorgcgicollectionprevalence_studiesPrevalence studies

httpnneurologyorgcgicollectionmultiple_sclerosisMultiple sclerosis

httpnneurologyorgcgicollectiondiagnostic_test_assessment_Diagnostic test assessment

httpnneurologyorgcgicollectioncohort_studiesCohort studies

httpnneurologyorgcgicollectionall_health_services_researchAll Health Services Researchfollowing collection(s) This article along with others on similar topics appears in the

Permissions amp Licensing

httpwwwneurologyorgaboutabout_the_journalpermissionsits entirety can be found online atInformation about reproducing this article in parts (figurestables) or in

Reprints

httpnneurologyorgsubscribersadvertiseInformation about ordering reprints can be found online

ISSN 0028-3878 Online ISSN 1526-632XWolters Kluwer Health Inc on behalf of the American Academy of Neurology All rights reserved Print1951 it is now a weekly with 48 issues per year Copyright Copyright copy 2019 The Author(s) Published by

reg is the official journal of the American Academy of Neurology Published continuously sinceNeurology

Page 10: Validation of an algorithm for identifying MS cases in ... · 3/5/2019  · with ICD-9-CM or the Canadian version of ICD-10 codes (ICD-10-CA), depending on the year. ... the MS Clinic

applied in future research into the prevalence of MS acrossjurisdictions

Algorithm MS E-1 ([IP + OP + DMT] ge 3) was preferredbecause of its consistent high performance and ease ofapplication We hereafter refer to it as the MS algorithmThe 1-year algorithm readily allows annually updating ofprevalence estimates in datasets with limited years of dataSensitivity and PPV were high consistent with previouslypublished definitions developed with similar methods inthe United States (VA) and Canada561213 Specificity inour primary validation cohorts was modest reflecting thatour study cohorts were selected on the basis of having atleast 1 diagnosis for MS Tonelli et al27 evaluated algo-rithms to measure 40 comorbid conditions in AHC datafrom Calgary Alberta Canada They defined an algorithmas valid if both sensitivity and PPV were ge070 compared toa reference cohort They did not consider specificity orNPV because they are generally ge090 in the assessment ofchronic disease in the general population Consistent withthese observations and recommendations when the MSalgorithm was applied to the general population in Sas-katchewan sensitivity (096) specificity (099) and PPV(099) markedly improved These findings support ourpreferred algorithm as a valid method for identifying casesof MS in AHC datasets

Minor variation in performance was observed whenstratified by sex and age However accuracy remainedhigh and the Youden J statistic remained substantialThese findings indicate that the MS algorithm was an ac-curate and reliable algorithm for identifying cases of MSin these disparate datasets and outperforms similar algo-rithms for identifying amyotrophic lateral sclerosis28 andParkinson disease29

When using AHC data it is important for investigators todetermine whether there is any hierarchy to the way thatdiagnosis codes are recorded because the performance of analgorithm can be adversely affected In the VA such a codinghierarchy exists and when encounters with MS in any di-agnosis position were used vs in the primary diagnosis posi-tion specificity was compromised as was the Youden Jstatistic Surprisingly accuracy was only slightly reduced butwould likely be more adversely affected when applied to anentire dataset as opposed to the subset of cases with a di-agnosis code for MS as used in this study

Prior reports have defined cohorts in claims data based ona single encounter with MS listed as one of the diagnoses3031

This strategy can result in the inclusion of a substantialnumber of false-positives Prior work on the VA algorithm5

showed that a rigorous algorithm (similar to the MS algo-rithm) ruled out 43 of the cases who had only a single MSencounter This guided our decision to require multipleencounters as part of all of our algorithms We strongly en-courage other investigators to avoid the use of a single MS

encounter to identify an MS cohort because depending onthe proportion of false-positives included results could besubstantially biased

We found that prevalence estimates varied with the dura-tion of the observation period in 3 different datasets (2from the United States and 1 from Manitoba Canada)suggesting that observation periods should be maximizedto optimize case detection Specifically we found that anobservation period of 3 years underestimated the preva-lence of MS by 37 (95 CI 13ndash66) in the VA datasetand 47 (95 CI 23ndash76) in the Manitoba dataset Itshould be noted that the 95 CI for these adjustmentfactors overlapped implying that the growth range isnonsignificantly different for these 2 datasets The findingsin the IMS (validation) dataset corroborated results fromthe VA and suggest that a 3-year ascertainment periodunderestimates the 10-year prevalence in US datasets byasymp38 regardless of the final year of the time period ofinterest (2010 vs 2015) This is consistent with previousfindings in Quebec Canada regarding systemic lupuserythematosus a chronic disease like MS for which healthcare use may vary over time or with disease activity21 Inthat study when a 3-year period was used prevalence wasunderestimated by 66 compared with a 15-year periodwhile a 5-year period underestimated prevalence by 39Similar results have been reported for the assessment of theprevalence of cancer in Europe22 Registries with 5 years ofdata underestimated prevalence by 42 to 45 as theduration of the registry increased the underestimation ofprevalence decreased systematically For investigatorswithout access to many years of data these findings suggestthat an adjustment factor should be applied to obtaina more stable and accurate prevalence estimate The dif-ference between the US (VA and IMS) and Manitobadatasets likely results because Manitoba has a universalhealth system with a single government payer and its AHCdatasets capture gt98 of the population6 whereas the UShealth care system involves multiple insurance payersmaking it more difficult to achieve complete ascertainmentof cases Thus the underestimation value of 37 providesa conservative (ie lower bound) inflation factor whereasan inflation factor of 47 based on the Manitoba data likelybetter reflects what might be expected with nearly fullcapture of the US population over a 10-year period(ie upper bound)

This study included a limited number of AHC datasets thatwere restricted to North America which may limit general-izability to datasets from other countries However thedatasets that were included provided a diverse sampling ofthe MS population The IMS validation dataset did not coverthe same time period as the VA and Manitoba datasetshowever the 3-year underestimation in prevalence was nearlythe same as the VA dataset despite differences in the timeperiods assessed Our algorithms relied on ICD-9-CM coding(except for IP encounters in the Manitoba dataset which

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1025

used ICD-10-CM coding from 2004 onward) and have notbeen validated with ICD-10-CM coding It is unlikely thealgorithms will perform differently when ICD-10-CM codingis used because MS is still represented by a single 3-digit code

We conducted a rigorous assessment of algorithms to identifycases of MS in several different AHC datasets The algorithmof choice required the sum of ge3 MS-related IP admissionsMS-related OP encounters or pharmacy claims for a DMTwithin a calendar year Furthermore this algorithm performedremarkably well across different datasets and demographicgroups We propose that this MS algorithm be adopted forfuture MS-related studies using AHC datasets to enhancecomparability across studies

AcknowledgmentThe authors thank the following members of the US MultipleSclerosis Prevalence Workgroup for their contributions AlbertLo Robert McBurney Oleg Muravov Bari Talente SteveBuka and Leslie Ritter The authors also thank the followingkey staff members at the National MS Society for theircontributions to this project Cathy Carlson Tim CoetzeeSherri Giger Weyman Johnson Eileen Madray GrahamMcReynolds and Cyndi Zagieboylo In addition the authorsare indebted to Shan Jin (Baltimore VA Medical Center) forhis assistance in compiling the VA data The results andconclusions presented are those of the authors No officialendorsement by the Department of Veterans Affairs ManitobaHealth Saskatchewan Ministry of Health SaskatchewanHealth Quality Council or Pharmetrics Inc is intended orshould be inferred

Study fundingThe study was funded by contracts from the National MSSociety to the principal investigators of the US MultipleSclerosis Prevalence Workgroup (HC-1508-05693)

DisclosureW Culpepper has additional research funding from the Na-tional MS Society and receives support from the VA MSCenter of Excellence R Marrie is supported by the WaughFamily Chair in Multiple Sclerosis and receives researchfunding from Canadian Institutes of Health Research Re-search Manitoba Multiple Sclerosis Society of CanadaMultiple Sclerosis Scientific Foundation National MultipleSclerosis Society Crohnrsquos and Colitis Canada and the Con-sortium of Multiple Sclerosis Centers She also serves on theEditorial Board of Neurology A Langer-Gould was site prin-cipal investigator for 2 industry-sponsored phase 3 clinicaltrials (Biogen Idec Hoffman-LaRoche) and was site principalinvestigator for 1 industry-sponsored observation study(Biogen Idec) She receives grant support from the NIHNational Institute of Neurological Disorders and Stroke(NINDS) Patient-Centered Outcomes Research Instituteand the National MS Society M Wallin has served on datasafety monitoring boards for NINDS-funded trials has beenan associate editor for the Encyclopedia of the Neurologic

Sciences and received funding support from the National MSSociety and the Department of Veterans Affairs Merit ReviewResearch Program L Nelson receives grants from the Cen-ters for Disease Control and Prevention NIH and NationalMS Society and contracts from the Agency for Toxic Sub-stances and Diseases Registry She receives compensation forserving as a consultant to Acumen Inc and is on the DataMonitoring Committee of Neuropace J Campbell has con-sultancy or research grants over the past 5 years with theAgency for Healthcare Research and Quality ALSAMFoundation Amgen AstraZeneca Bayer Biogen IdecBoehringer Ingelheim Centers for Disease Control andPrevention Colorado Medicaid Enterprise CommunityPartners Inc Institute for Clinical and Economic ReviewMallinckrodt NIH National MS Society Kaiser PermanentePhRMA Foundation Teva Research in Real Life Ltd Re-spiratory Effectiveness Group and Zogenix Inc H Tremlettis the Canada Research Chair for Neuroepidemiology andMultiple Sclerosis She receives research support from theNational MS Society the Canadian Institutes of Health Re-search the Multiple Sclerosis Society of Canada and theMultiple Sclerosis Scientific Research Foundation In ad-dition in the last 5 years she has received research supportfrom the Multiple Sclerosis Society of Canada the MichaelSmith Foundation for Health Research and the UK MSTrust as well as travel expenses to attend conferences allspeaker honoraria are either declined or donated to an MScharity or to an unrestricted grant for use by her researchgroup W Kaye receives funding from the Agency for ToxicSubstances and Disease Registry the National MS Societyand the Association for the Accreditation of Human Re-search Protection Programs L Wagner receives fundingfrom the Agency for Toxic Substances and Disease Registryand National MS Society L Chen is employed full-time byKPSC S Leung is employed full-time by the University ofManitoba C Evans receives research funding from theCanadian Institutes of Health Research SaskatchewanHealth Research Foundation and the Multiple SclerosisSociety of Canada S Yao is employed full-time by theHealth Quality Council N LaRocca is employed full-timeby the National MS Society Go to NeurologyorgN forfull disclosures

Publication historyReceived by Neurology July 20 2018 Accepted in final form November24 2018

Appendix 1 Authors

Name Location Role Contribution

William JCulpepperPhD MA

Veterans HealthAdministrationWashington DCUniversity of MD

Author Study concept anddesign statisticalanalyses preparationof draft manuscriptcritical review revisionof the manuscript forcontent approval offinal manuscript

e1026 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

References1 Evans C Beland SG Kulaga S et al Incidence and prevalence of multiple sclerosis in

the Americas a systematic review Neuroepidemiology 201340195ndash2102 Kingwell E Marriott JJ Jette N et al Incidence and prevalence of multiple sclerosis in

Europe a systematic review BMC Neurol 2013131283 Makhani N Morrow SA Fisk J et al MS incidence and prevalence in Africa Asia Australia

and New Zealand a systematic review Mult Scler Relat Disord 2014348ndash604 Tricco AC Pham B Rawson NSB Manitoba and Saskatchewan administrative health

care utilization databases are used differently to answer epidemiologic researchquestions J Clin Epidemiol 200861192ndash197e112

5 Culpepper WJ Ehrmantraut M Wallin MT Flannery K Bradham DD VeteransHealth Administration Multiple Sclerosis Surveillance Registry the problem of case-finding from administrative databases J Rehabil Res Dev 20064317

6 Marrie RA YuN Blanchard JF Leung S Elliott L The rising prevalence and changingage distribution of multiple sclerosis in Manitoba Neurology 201074465ndash471

7 Nelson MT Wallin MT Marrie RA et al A new way to estimate neurologic diseaseprevalence in the United States illustrated with MS Neurology201992469ndash480

8 Wallin MT Culpepper WJ Campbell J et al The prevalence of multiple sclerosis inthe United States a population-based estimate using health care claims datasetsNeurology201992e1029ndashe1040

9 Polman CH Reingold SC Banwell B et al Diagnostic criteria for multiple sclerosis2010 revisions to the McDonald criteria Ann Neurol 201169292ndash302

10 Wallin MT Culpepper WJ Coffman P et al The Gulf War era multiple sclerosiscohort age and incidence rates by sex race and service Brain 20121351778ndash1785

11 Koebnick C Langer Gould A Gould MK et al Do the sociodemographic charac-teristics of members of a large integrated health care system represent the populationof interest Permanente J 20121637ndash41

12 Widdifield J Ivers NM Young J et al Development and validation of an adminis-trative data algorithm to estimate the disease burden and epidemiology of multiplesclerosis in Ontario Canada Mult Scler 2015211045ndash1054

13 Al-Sakran LH Marrie RA Blackburn DF Knox KB Evans CD Establishing theincidence and prevalence of multiple sclerosis in Saskatchewan Can J Neurol Sci201845295ndash303

14 Marrie RA Fisk J Stadnyk K et al The incidence and prevalence of multiple sclerosisin Nova Scotia Can J Neurol Sci 201340824ndash831

15 Canadian Institute for Health Information Rehabilitation 2016 Available at cihicaentypes-of-carehospital-carerehabilitation Accessed May 5 2016

16 Youden WJ Index for rating diagnostic tests Cancer 1950332ndash35

Appendix 1 (continued)

Name Location Role Contribution

Ruth AnnMarrieMD PhD

University ofManitoba WinnipegCanada

Author Study concept anddesign supervisionof data collectionand analysescritical reviewrevision ofthe manuscriptfor contentapproval of finalmanuscript

AnnetteLanger-Gould MD

Kaiser PermanenteSouthern CaliforniaLos Angeles

Author Study concept anddesign statisticalanalyses chart reviewconfirmation of MSdiagnosis criticalreview revision of themanuscript forcontent approval offinal manuscript

Mitchell TWallin MDMPH

Veterans HealthAdministrationWashington DCGeorgetownUniversity

Author Study concept anddesign chart reviewconfirmation of MSdiagnosis criticalreview revision of themanuscript forcontent approval offinal manuscript

Lorene MNelsonPhD

Stanford UniversityCA

Author Study concept anddesign critical reviewrevision of themanuscript forcontent

JonathanCampbellPhD

University of DenverCO

Author Statistical analysescritical review of themanuscript forcontent

HelenTremlettPhD

University of BritishColumbia VancouverCanada

Author Statistical analysescritical review of themanuscript forcontent

WendyKaye PhD

McKing ConsultingCorp Atlanta GA

Author Statistical analysescritical review of themanuscript forcontent

LaurieWagnerMPH

McKing ConsultingCorp Atlanta GA

Author Statistical analysescritical review of themanuscript forcontent

Lie HChen DrPH

Kaiser PermanenteSouthern CaliforniaLos Angeles

Author Statistical analysescritical review of themanuscript forcontent

StellaLeung MSc

University ofManitoba WinnipegCanada

Author Statistical analysescritical review of themanuscript forcontent

CharityEvans PhD

University ofSaskatchewanSaskatoon Canada

Author Statistical analysescritical review of themanuscript forcontent

ShenzhenYao MDMSc

University ofSaskatchewanSaskatoon Canada

Author Statistical analysescritical review of themanuscript forcontent

Appendix 1 (continued)

Name Location Role Contribution

Nicholas GLaRoccaPhD

National MultipleSclerosis Society NewYork NY

Author Statistical analysescritical review of themanuscript forcontent approval offinal manuscript

Appendix 2 Coinvestigators and members of the USMultiple Sclerosis Prevalence Workgroup

Name Location Role Contribution

Albert LoMD PhD

Brown UniversityProvidence RI

Coinvestigator Studyconcept anddesign

RobertMcBurneyPhD

Accelerated CureProject Boston MA

Coinvestigator Studyconcept anddesign

OlegMuravovPhD

Agency for ToxicSubstances andDisease RegistryDivision of HealthStudies Atlanta GA

Coinvestigator Studyconcept anddesign

BariTalenteEsq

National MS SocietyWashington DC

Coinvestigator Studyconcept anddesign

LeslieRitter

National MS SocietyWashington DC

Coinvestigator Studyconcept anddesign

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1027

17 Landis JR Koch GG The measurement of observer agreement for categorical dataBiometrics 197733159ndash174

18 Sim J Wright CC The kappa statistic in reliability studies use interpretation andsample size requirements Phys Ther 200585257ndash268

19 Viera AJ Garrett JM Understanding interobserver agreement the kappa statisticFam Med 200537360ndash363

20 Trisolini M Honeycutt A Wiener J Lesesne S Global Economic Impact of MultipleSclerosis Research Triangle Park RTI International 2010

21 Ng R Bernatsky S Rahme E Observation period effects on estimation of systemiclupus erythematosus incidence and prevalence in Quebec J Rheumatol 2013401334ndash1336

22 Capocaccia R Colonna M Corazziari I et al Measuring cancer prevalence in Europethe EUROPREVAL Project Ann Oncol 200213831ndash839

23 Griffiths RI OrsquoMalley CD Herbert RJ Danese MD Misclassification of incidentconditions using claims data impact of varying the period used to exclude pre-existingdisease BMC Med Res Methodol 20131332

24 Dilokthornsakul P Valuck RJ Nair KV Corboy JR Allen RR Campbell JD Multiplesclerosis prevalence in the United States commercially insured population Neurology2016861014ndash1021

25 Tiwari Li Y Zou Z Interval estimation for ratios of correlated age-adjusted ratesJ Data Sci 20108471ndash482

26 Culpepper WJ Wallin MT Magder LS et al VHA Multiple Sclerosis SurveillanceRegistry and its similarities to other multiple sclerosis cohorts J Rehab Res Dev 201552(3)263ndash272

27 Tonelli M Wiebe N Fortin M et al Methods for identifying 30 chronic con-ditions application to administrative data BMC Med Inform Decis Mak 20151531

28 Kaye WE Sanchez M Wu J Feasibility of creating a national registry using admin-istrative data in the United States Amytroph Lateral Scler Frontotemporal Degener201415433ndash439

29 Szumski NR Cheng EM Optimizing algorithms to identify Parkinsonrsquos disease caseswithin an administrative database Mov Disord 20092451ndash56

30 Hoer A Schiffhorst G Zimmermann A et al Multiple sclerosis in Germany dataanalysis of administrative prevalence and healthcare delivery in the statutory healthsystem BMC Health Serv Res 200414381

31 Chastek BJ Oleen-Burkey M Lopez-Bresnahan MV Medical chart validation of analgorithm for identifying multiple sclerosis relapse in healthcare claims J Med Econ201013618ndash625

e1028 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

DOI 101212WNL0000000000007043201992e1016-e1028 Published Online before print February 15 2019Neurology

William J Culpepper Ruth Ann Marrie Annette Langer-Gould et al datasets

Validation of an algorithm for identifying MS cases in administrative health claims

This information is current as of February 15 2019

ServicesUpdated Information amp

httpnneurologyorgcontent9210e1016fullincluding high resolution figures can be found at

References httpnneurologyorgcontent9210e1016fullref-list-1

This article cites 29 articles 6 of which you can access for free at

Citations httpnneurologyorgcontent9210e1016fullotherarticles

This article has been cited by 2 HighWire-hosted articles

Subspecialty Collections

httpnneurologyorgcgicollectionprevalence_studiesPrevalence studies

httpnneurologyorgcgicollectionmultiple_sclerosisMultiple sclerosis

httpnneurologyorgcgicollectiondiagnostic_test_assessment_Diagnostic test assessment

httpnneurologyorgcgicollectioncohort_studiesCohort studies

httpnneurologyorgcgicollectionall_health_services_researchAll Health Services Researchfollowing collection(s) This article along with others on similar topics appears in the

Permissions amp Licensing

httpwwwneurologyorgaboutabout_the_journalpermissionsits entirety can be found online atInformation about reproducing this article in parts (figurestables) or in

Reprints

httpnneurologyorgsubscribersadvertiseInformation about ordering reprints can be found online

ISSN 0028-3878 Online ISSN 1526-632XWolters Kluwer Health Inc on behalf of the American Academy of Neurology All rights reserved Print1951 it is now a weekly with 48 issues per year Copyright Copyright copy 2019 The Author(s) Published by

reg is the official journal of the American Academy of Neurology Published continuously sinceNeurology

Page 11: Validation of an algorithm for identifying MS cases in ... · 3/5/2019  · with ICD-9-CM or the Canadian version of ICD-10 codes (ICD-10-CA), depending on the year. ... the MS Clinic

used ICD-10-CM coding from 2004 onward) and have notbeen validated with ICD-10-CM coding It is unlikely thealgorithms will perform differently when ICD-10-CM codingis used because MS is still represented by a single 3-digit code

We conducted a rigorous assessment of algorithms to identifycases of MS in several different AHC datasets The algorithmof choice required the sum of ge3 MS-related IP admissionsMS-related OP encounters or pharmacy claims for a DMTwithin a calendar year Furthermore this algorithm performedremarkably well across different datasets and demographicgroups We propose that this MS algorithm be adopted forfuture MS-related studies using AHC datasets to enhancecomparability across studies

AcknowledgmentThe authors thank the following members of the US MultipleSclerosis Prevalence Workgroup for their contributions AlbertLo Robert McBurney Oleg Muravov Bari Talente SteveBuka and Leslie Ritter The authors also thank the followingkey staff members at the National MS Society for theircontributions to this project Cathy Carlson Tim CoetzeeSherri Giger Weyman Johnson Eileen Madray GrahamMcReynolds and Cyndi Zagieboylo In addition the authorsare indebted to Shan Jin (Baltimore VA Medical Center) forhis assistance in compiling the VA data The results andconclusions presented are those of the authors No officialendorsement by the Department of Veterans Affairs ManitobaHealth Saskatchewan Ministry of Health SaskatchewanHealth Quality Council or Pharmetrics Inc is intended orshould be inferred

Study fundingThe study was funded by contracts from the National MSSociety to the principal investigators of the US MultipleSclerosis Prevalence Workgroup (HC-1508-05693)

DisclosureW Culpepper has additional research funding from the Na-tional MS Society and receives support from the VA MSCenter of Excellence R Marrie is supported by the WaughFamily Chair in Multiple Sclerosis and receives researchfunding from Canadian Institutes of Health Research Re-search Manitoba Multiple Sclerosis Society of CanadaMultiple Sclerosis Scientific Foundation National MultipleSclerosis Society Crohnrsquos and Colitis Canada and the Con-sortium of Multiple Sclerosis Centers She also serves on theEditorial Board of Neurology A Langer-Gould was site prin-cipal investigator for 2 industry-sponsored phase 3 clinicaltrials (Biogen Idec Hoffman-LaRoche) and was site principalinvestigator for 1 industry-sponsored observation study(Biogen Idec) She receives grant support from the NIHNational Institute of Neurological Disorders and Stroke(NINDS) Patient-Centered Outcomes Research Instituteand the National MS Society M Wallin has served on datasafety monitoring boards for NINDS-funded trials has beenan associate editor for the Encyclopedia of the Neurologic

Sciences and received funding support from the National MSSociety and the Department of Veterans Affairs Merit ReviewResearch Program L Nelson receives grants from the Cen-ters for Disease Control and Prevention NIH and NationalMS Society and contracts from the Agency for Toxic Sub-stances and Diseases Registry She receives compensation forserving as a consultant to Acumen Inc and is on the DataMonitoring Committee of Neuropace J Campbell has con-sultancy or research grants over the past 5 years with theAgency for Healthcare Research and Quality ALSAMFoundation Amgen AstraZeneca Bayer Biogen IdecBoehringer Ingelheim Centers for Disease Control andPrevention Colorado Medicaid Enterprise CommunityPartners Inc Institute for Clinical and Economic ReviewMallinckrodt NIH National MS Society Kaiser PermanentePhRMA Foundation Teva Research in Real Life Ltd Re-spiratory Effectiveness Group and Zogenix Inc H Tremlettis the Canada Research Chair for Neuroepidemiology andMultiple Sclerosis She receives research support from theNational MS Society the Canadian Institutes of Health Re-search the Multiple Sclerosis Society of Canada and theMultiple Sclerosis Scientific Research Foundation In ad-dition in the last 5 years she has received research supportfrom the Multiple Sclerosis Society of Canada the MichaelSmith Foundation for Health Research and the UK MSTrust as well as travel expenses to attend conferences allspeaker honoraria are either declined or donated to an MScharity or to an unrestricted grant for use by her researchgroup W Kaye receives funding from the Agency for ToxicSubstances and Disease Registry the National MS Societyand the Association for the Accreditation of Human Re-search Protection Programs L Wagner receives fundingfrom the Agency for Toxic Substances and Disease Registryand National MS Society L Chen is employed full-time byKPSC S Leung is employed full-time by the University ofManitoba C Evans receives research funding from theCanadian Institutes of Health Research SaskatchewanHealth Research Foundation and the Multiple SclerosisSociety of Canada S Yao is employed full-time by theHealth Quality Council N LaRocca is employed full-timeby the National MS Society Go to NeurologyorgN forfull disclosures

Publication historyReceived by Neurology July 20 2018 Accepted in final form November24 2018

Appendix 1 Authors

Name Location Role Contribution

William JCulpepperPhD MA

Veterans HealthAdministrationWashington DCUniversity of MD

Author Study concept anddesign statisticalanalyses preparationof draft manuscriptcritical review revisionof the manuscript forcontent approval offinal manuscript

e1026 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

References1 Evans C Beland SG Kulaga S et al Incidence and prevalence of multiple sclerosis in

the Americas a systematic review Neuroepidemiology 201340195ndash2102 Kingwell E Marriott JJ Jette N et al Incidence and prevalence of multiple sclerosis in

Europe a systematic review BMC Neurol 2013131283 Makhani N Morrow SA Fisk J et al MS incidence and prevalence in Africa Asia Australia

and New Zealand a systematic review Mult Scler Relat Disord 2014348ndash604 Tricco AC Pham B Rawson NSB Manitoba and Saskatchewan administrative health

care utilization databases are used differently to answer epidemiologic researchquestions J Clin Epidemiol 200861192ndash197e112

5 Culpepper WJ Ehrmantraut M Wallin MT Flannery K Bradham DD VeteransHealth Administration Multiple Sclerosis Surveillance Registry the problem of case-finding from administrative databases J Rehabil Res Dev 20064317

6 Marrie RA YuN Blanchard JF Leung S Elliott L The rising prevalence and changingage distribution of multiple sclerosis in Manitoba Neurology 201074465ndash471

7 Nelson MT Wallin MT Marrie RA et al A new way to estimate neurologic diseaseprevalence in the United States illustrated with MS Neurology201992469ndash480

8 Wallin MT Culpepper WJ Campbell J et al The prevalence of multiple sclerosis inthe United States a population-based estimate using health care claims datasetsNeurology201992e1029ndashe1040

9 Polman CH Reingold SC Banwell B et al Diagnostic criteria for multiple sclerosis2010 revisions to the McDonald criteria Ann Neurol 201169292ndash302

10 Wallin MT Culpepper WJ Coffman P et al The Gulf War era multiple sclerosiscohort age and incidence rates by sex race and service Brain 20121351778ndash1785

11 Koebnick C Langer Gould A Gould MK et al Do the sociodemographic charac-teristics of members of a large integrated health care system represent the populationof interest Permanente J 20121637ndash41

12 Widdifield J Ivers NM Young J et al Development and validation of an adminis-trative data algorithm to estimate the disease burden and epidemiology of multiplesclerosis in Ontario Canada Mult Scler 2015211045ndash1054

13 Al-Sakran LH Marrie RA Blackburn DF Knox KB Evans CD Establishing theincidence and prevalence of multiple sclerosis in Saskatchewan Can J Neurol Sci201845295ndash303

14 Marrie RA Fisk J Stadnyk K et al The incidence and prevalence of multiple sclerosisin Nova Scotia Can J Neurol Sci 201340824ndash831

15 Canadian Institute for Health Information Rehabilitation 2016 Available at cihicaentypes-of-carehospital-carerehabilitation Accessed May 5 2016

16 Youden WJ Index for rating diagnostic tests Cancer 1950332ndash35

Appendix 1 (continued)

Name Location Role Contribution

Ruth AnnMarrieMD PhD

University ofManitoba WinnipegCanada

Author Study concept anddesign supervisionof data collectionand analysescritical reviewrevision ofthe manuscriptfor contentapproval of finalmanuscript

AnnetteLanger-Gould MD

Kaiser PermanenteSouthern CaliforniaLos Angeles

Author Study concept anddesign statisticalanalyses chart reviewconfirmation of MSdiagnosis criticalreview revision of themanuscript forcontent approval offinal manuscript

Mitchell TWallin MDMPH

Veterans HealthAdministrationWashington DCGeorgetownUniversity

Author Study concept anddesign chart reviewconfirmation of MSdiagnosis criticalreview revision of themanuscript forcontent approval offinal manuscript

Lorene MNelsonPhD

Stanford UniversityCA

Author Study concept anddesign critical reviewrevision of themanuscript forcontent

JonathanCampbellPhD

University of DenverCO

Author Statistical analysescritical review of themanuscript forcontent

HelenTremlettPhD

University of BritishColumbia VancouverCanada

Author Statistical analysescritical review of themanuscript forcontent

WendyKaye PhD

McKing ConsultingCorp Atlanta GA

Author Statistical analysescritical review of themanuscript forcontent

LaurieWagnerMPH

McKing ConsultingCorp Atlanta GA

Author Statistical analysescritical review of themanuscript forcontent

Lie HChen DrPH

Kaiser PermanenteSouthern CaliforniaLos Angeles

Author Statistical analysescritical review of themanuscript forcontent

StellaLeung MSc

University ofManitoba WinnipegCanada

Author Statistical analysescritical review of themanuscript forcontent

CharityEvans PhD

University ofSaskatchewanSaskatoon Canada

Author Statistical analysescritical review of themanuscript forcontent

ShenzhenYao MDMSc

University ofSaskatchewanSaskatoon Canada

Author Statistical analysescritical review of themanuscript forcontent

Appendix 1 (continued)

Name Location Role Contribution

Nicholas GLaRoccaPhD

National MultipleSclerosis Society NewYork NY

Author Statistical analysescritical review of themanuscript forcontent approval offinal manuscript

Appendix 2 Coinvestigators and members of the USMultiple Sclerosis Prevalence Workgroup

Name Location Role Contribution

Albert LoMD PhD

Brown UniversityProvidence RI

Coinvestigator Studyconcept anddesign

RobertMcBurneyPhD

Accelerated CureProject Boston MA

Coinvestigator Studyconcept anddesign

OlegMuravovPhD

Agency for ToxicSubstances andDisease RegistryDivision of HealthStudies Atlanta GA

Coinvestigator Studyconcept anddesign

BariTalenteEsq

National MS SocietyWashington DC

Coinvestigator Studyconcept anddesign

LeslieRitter

National MS SocietyWashington DC

Coinvestigator Studyconcept anddesign

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1027

17 Landis JR Koch GG The measurement of observer agreement for categorical dataBiometrics 197733159ndash174

18 Sim J Wright CC The kappa statistic in reliability studies use interpretation andsample size requirements Phys Ther 200585257ndash268

19 Viera AJ Garrett JM Understanding interobserver agreement the kappa statisticFam Med 200537360ndash363

20 Trisolini M Honeycutt A Wiener J Lesesne S Global Economic Impact of MultipleSclerosis Research Triangle Park RTI International 2010

21 Ng R Bernatsky S Rahme E Observation period effects on estimation of systemiclupus erythematosus incidence and prevalence in Quebec J Rheumatol 2013401334ndash1336

22 Capocaccia R Colonna M Corazziari I et al Measuring cancer prevalence in Europethe EUROPREVAL Project Ann Oncol 200213831ndash839

23 Griffiths RI OrsquoMalley CD Herbert RJ Danese MD Misclassification of incidentconditions using claims data impact of varying the period used to exclude pre-existingdisease BMC Med Res Methodol 20131332

24 Dilokthornsakul P Valuck RJ Nair KV Corboy JR Allen RR Campbell JD Multiplesclerosis prevalence in the United States commercially insured population Neurology2016861014ndash1021

25 Tiwari Li Y Zou Z Interval estimation for ratios of correlated age-adjusted ratesJ Data Sci 20108471ndash482

26 Culpepper WJ Wallin MT Magder LS et al VHA Multiple Sclerosis SurveillanceRegistry and its similarities to other multiple sclerosis cohorts J Rehab Res Dev 201552(3)263ndash272

27 Tonelli M Wiebe N Fortin M et al Methods for identifying 30 chronic con-ditions application to administrative data BMC Med Inform Decis Mak 20151531

28 Kaye WE Sanchez M Wu J Feasibility of creating a national registry using admin-istrative data in the United States Amytroph Lateral Scler Frontotemporal Degener201415433ndash439

29 Szumski NR Cheng EM Optimizing algorithms to identify Parkinsonrsquos disease caseswithin an administrative database Mov Disord 20092451ndash56

30 Hoer A Schiffhorst G Zimmermann A et al Multiple sclerosis in Germany dataanalysis of administrative prevalence and healthcare delivery in the statutory healthsystem BMC Health Serv Res 200414381

31 Chastek BJ Oleen-Burkey M Lopez-Bresnahan MV Medical chart validation of analgorithm for identifying multiple sclerosis relapse in healthcare claims J Med Econ201013618ndash625

e1028 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

DOI 101212WNL0000000000007043201992e1016-e1028 Published Online before print February 15 2019Neurology

William J Culpepper Ruth Ann Marrie Annette Langer-Gould et al datasets

Validation of an algorithm for identifying MS cases in administrative health claims

This information is current as of February 15 2019

ServicesUpdated Information amp

httpnneurologyorgcontent9210e1016fullincluding high resolution figures can be found at

References httpnneurologyorgcontent9210e1016fullref-list-1

This article cites 29 articles 6 of which you can access for free at

Citations httpnneurologyorgcontent9210e1016fullotherarticles

This article has been cited by 2 HighWire-hosted articles

Subspecialty Collections

httpnneurologyorgcgicollectionprevalence_studiesPrevalence studies

httpnneurologyorgcgicollectionmultiple_sclerosisMultiple sclerosis

httpnneurologyorgcgicollectiondiagnostic_test_assessment_Diagnostic test assessment

httpnneurologyorgcgicollectioncohort_studiesCohort studies

httpnneurologyorgcgicollectionall_health_services_researchAll Health Services Researchfollowing collection(s) This article along with others on similar topics appears in the

Permissions amp Licensing

httpwwwneurologyorgaboutabout_the_journalpermissionsits entirety can be found online atInformation about reproducing this article in parts (figurestables) or in

Reprints

httpnneurologyorgsubscribersadvertiseInformation about ordering reprints can be found online

ISSN 0028-3878 Online ISSN 1526-632XWolters Kluwer Health Inc on behalf of the American Academy of Neurology All rights reserved Print1951 it is now a weekly with 48 issues per year Copyright Copyright copy 2019 The Author(s) Published by

reg is the official journal of the American Academy of Neurology Published continuously sinceNeurology

Page 12: Validation of an algorithm for identifying MS cases in ... · 3/5/2019  · with ICD-9-CM or the Canadian version of ICD-10 codes (ICD-10-CA), depending on the year. ... the MS Clinic

References1 Evans C Beland SG Kulaga S et al Incidence and prevalence of multiple sclerosis in

the Americas a systematic review Neuroepidemiology 201340195ndash2102 Kingwell E Marriott JJ Jette N et al Incidence and prevalence of multiple sclerosis in

Europe a systematic review BMC Neurol 2013131283 Makhani N Morrow SA Fisk J et al MS incidence and prevalence in Africa Asia Australia

and New Zealand a systematic review Mult Scler Relat Disord 2014348ndash604 Tricco AC Pham B Rawson NSB Manitoba and Saskatchewan administrative health

care utilization databases are used differently to answer epidemiologic researchquestions J Clin Epidemiol 200861192ndash197e112

5 Culpepper WJ Ehrmantraut M Wallin MT Flannery K Bradham DD VeteransHealth Administration Multiple Sclerosis Surveillance Registry the problem of case-finding from administrative databases J Rehabil Res Dev 20064317

6 Marrie RA YuN Blanchard JF Leung S Elliott L The rising prevalence and changingage distribution of multiple sclerosis in Manitoba Neurology 201074465ndash471

7 Nelson MT Wallin MT Marrie RA et al A new way to estimate neurologic diseaseprevalence in the United States illustrated with MS Neurology201992469ndash480

8 Wallin MT Culpepper WJ Campbell J et al The prevalence of multiple sclerosis inthe United States a population-based estimate using health care claims datasetsNeurology201992e1029ndashe1040

9 Polman CH Reingold SC Banwell B et al Diagnostic criteria for multiple sclerosis2010 revisions to the McDonald criteria Ann Neurol 201169292ndash302

10 Wallin MT Culpepper WJ Coffman P et al The Gulf War era multiple sclerosiscohort age and incidence rates by sex race and service Brain 20121351778ndash1785

11 Koebnick C Langer Gould A Gould MK et al Do the sociodemographic charac-teristics of members of a large integrated health care system represent the populationof interest Permanente J 20121637ndash41

12 Widdifield J Ivers NM Young J et al Development and validation of an adminis-trative data algorithm to estimate the disease burden and epidemiology of multiplesclerosis in Ontario Canada Mult Scler 2015211045ndash1054

13 Al-Sakran LH Marrie RA Blackburn DF Knox KB Evans CD Establishing theincidence and prevalence of multiple sclerosis in Saskatchewan Can J Neurol Sci201845295ndash303

14 Marrie RA Fisk J Stadnyk K et al The incidence and prevalence of multiple sclerosisin Nova Scotia Can J Neurol Sci 201340824ndash831

15 Canadian Institute for Health Information Rehabilitation 2016 Available at cihicaentypes-of-carehospital-carerehabilitation Accessed May 5 2016

16 Youden WJ Index for rating diagnostic tests Cancer 1950332ndash35

Appendix 1 (continued)

Name Location Role Contribution

Ruth AnnMarrieMD PhD

University ofManitoba WinnipegCanada

Author Study concept anddesign supervisionof data collectionand analysescritical reviewrevision ofthe manuscriptfor contentapproval of finalmanuscript

AnnetteLanger-Gould MD

Kaiser PermanenteSouthern CaliforniaLos Angeles

Author Study concept anddesign statisticalanalyses chart reviewconfirmation of MSdiagnosis criticalreview revision of themanuscript forcontent approval offinal manuscript

Mitchell TWallin MDMPH

Veterans HealthAdministrationWashington DCGeorgetownUniversity

Author Study concept anddesign chart reviewconfirmation of MSdiagnosis criticalreview revision of themanuscript forcontent approval offinal manuscript

Lorene MNelsonPhD

Stanford UniversityCA

Author Study concept anddesign critical reviewrevision of themanuscript forcontent

JonathanCampbellPhD

University of DenverCO

Author Statistical analysescritical review of themanuscript forcontent

HelenTremlettPhD

University of BritishColumbia VancouverCanada

Author Statistical analysescritical review of themanuscript forcontent

WendyKaye PhD

McKing ConsultingCorp Atlanta GA

Author Statistical analysescritical review of themanuscript forcontent

LaurieWagnerMPH

McKing ConsultingCorp Atlanta GA

Author Statistical analysescritical review of themanuscript forcontent

Lie HChen DrPH

Kaiser PermanenteSouthern CaliforniaLos Angeles

Author Statistical analysescritical review of themanuscript forcontent

StellaLeung MSc

University ofManitoba WinnipegCanada

Author Statistical analysescritical review of themanuscript forcontent

CharityEvans PhD

University ofSaskatchewanSaskatoon Canada

Author Statistical analysescritical review of themanuscript forcontent

ShenzhenYao MDMSc

University ofSaskatchewanSaskatoon Canada

Author Statistical analysescritical review of themanuscript forcontent

Appendix 1 (continued)

Name Location Role Contribution

Nicholas GLaRoccaPhD

National MultipleSclerosis Society NewYork NY

Author Statistical analysescritical review of themanuscript forcontent approval offinal manuscript

Appendix 2 Coinvestigators and members of the USMultiple Sclerosis Prevalence Workgroup

Name Location Role Contribution

Albert LoMD PhD

Brown UniversityProvidence RI

Coinvestigator Studyconcept anddesign

RobertMcBurneyPhD

Accelerated CureProject Boston MA

Coinvestigator Studyconcept anddesign

OlegMuravovPhD

Agency for ToxicSubstances andDisease RegistryDivision of HealthStudies Atlanta GA

Coinvestigator Studyconcept anddesign

BariTalenteEsq

National MS SocietyWashington DC

Coinvestigator Studyconcept anddesign

LeslieRitter

National MS SocietyWashington DC

Coinvestigator Studyconcept anddesign

NeurologyorgN Neurology | Volume 92 Number 10 | March 5 2019 e1027

17 Landis JR Koch GG The measurement of observer agreement for categorical dataBiometrics 197733159ndash174

18 Sim J Wright CC The kappa statistic in reliability studies use interpretation andsample size requirements Phys Ther 200585257ndash268

19 Viera AJ Garrett JM Understanding interobserver agreement the kappa statisticFam Med 200537360ndash363

20 Trisolini M Honeycutt A Wiener J Lesesne S Global Economic Impact of MultipleSclerosis Research Triangle Park RTI International 2010

21 Ng R Bernatsky S Rahme E Observation period effects on estimation of systemiclupus erythematosus incidence and prevalence in Quebec J Rheumatol 2013401334ndash1336

22 Capocaccia R Colonna M Corazziari I et al Measuring cancer prevalence in Europethe EUROPREVAL Project Ann Oncol 200213831ndash839

23 Griffiths RI OrsquoMalley CD Herbert RJ Danese MD Misclassification of incidentconditions using claims data impact of varying the period used to exclude pre-existingdisease BMC Med Res Methodol 20131332

24 Dilokthornsakul P Valuck RJ Nair KV Corboy JR Allen RR Campbell JD Multiplesclerosis prevalence in the United States commercially insured population Neurology2016861014ndash1021

25 Tiwari Li Y Zou Z Interval estimation for ratios of correlated age-adjusted ratesJ Data Sci 20108471ndash482

26 Culpepper WJ Wallin MT Magder LS et al VHA Multiple Sclerosis SurveillanceRegistry and its similarities to other multiple sclerosis cohorts J Rehab Res Dev 201552(3)263ndash272

27 Tonelli M Wiebe N Fortin M et al Methods for identifying 30 chronic con-ditions application to administrative data BMC Med Inform Decis Mak 20151531

28 Kaye WE Sanchez M Wu J Feasibility of creating a national registry using admin-istrative data in the United States Amytroph Lateral Scler Frontotemporal Degener201415433ndash439

29 Szumski NR Cheng EM Optimizing algorithms to identify Parkinsonrsquos disease caseswithin an administrative database Mov Disord 20092451ndash56

30 Hoer A Schiffhorst G Zimmermann A et al Multiple sclerosis in Germany dataanalysis of administrative prevalence and healthcare delivery in the statutory healthsystem BMC Health Serv Res 200414381

31 Chastek BJ Oleen-Burkey M Lopez-Bresnahan MV Medical chart validation of analgorithm for identifying multiple sclerosis relapse in healthcare claims J Med Econ201013618ndash625

e1028 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

DOI 101212WNL0000000000007043201992e1016-e1028 Published Online before print February 15 2019Neurology

William J Culpepper Ruth Ann Marrie Annette Langer-Gould et al datasets

Validation of an algorithm for identifying MS cases in administrative health claims

This information is current as of February 15 2019

ServicesUpdated Information amp

httpnneurologyorgcontent9210e1016fullincluding high resolution figures can be found at

References httpnneurologyorgcontent9210e1016fullref-list-1

This article cites 29 articles 6 of which you can access for free at

Citations httpnneurologyorgcontent9210e1016fullotherarticles

This article has been cited by 2 HighWire-hosted articles

Subspecialty Collections

httpnneurologyorgcgicollectionprevalence_studiesPrevalence studies

httpnneurologyorgcgicollectionmultiple_sclerosisMultiple sclerosis

httpnneurologyorgcgicollectiondiagnostic_test_assessment_Diagnostic test assessment

httpnneurologyorgcgicollectioncohort_studiesCohort studies

httpnneurologyorgcgicollectionall_health_services_researchAll Health Services Researchfollowing collection(s) This article along with others on similar topics appears in the

Permissions amp Licensing

httpwwwneurologyorgaboutabout_the_journalpermissionsits entirety can be found online atInformation about reproducing this article in parts (figurestables) or in

Reprints

httpnneurologyorgsubscribersadvertiseInformation about ordering reprints can be found online

ISSN 0028-3878 Online ISSN 1526-632XWolters Kluwer Health Inc on behalf of the American Academy of Neurology All rights reserved Print1951 it is now a weekly with 48 issues per year Copyright Copyright copy 2019 The Author(s) Published by

reg is the official journal of the American Academy of Neurology Published continuously sinceNeurology

Page 13: Validation of an algorithm for identifying MS cases in ... · 3/5/2019  · with ICD-9-CM or the Canadian version of ICD-10 codes (ICD-10-CA), depending on the year. ... the MS Clinic

17 Landis JR Koch GG The measurement of observer agreement for categorical dataBiometrics 197733159ndash174

18 Sim J Wright CC The kappa statistic in reliability studies use interpretation andsample size requirements Phys Ther 200585257ndash268

19 Viera AJ Garrett JM Understanding interobserver agreement the kappa statisticFam Med 200537360ndash363

20 Trisolini M Honeycutt A Wiener J Lesesne S Global Economic Impact of MultipleSclerosis Research Triangle Park RTI International 2010

21 Ng R Bernatsky S Rahme E Observation period effects on estimation of systemiclupus erythematosus incidence and prevalence in Quebec J Rheumatol 2013401334ndash1336

22 Capocaccia R Colonna M Corazziari I et al Measuring cancer prevalence in Europethe EUROPREVAL Project Ann Oncol 200213831ndash839

23 Griffiths RI OrsquoMalley CD Herbert RJ Danese MD Misclassification of incidentconditions using claims data impact of varying the period used to exclude pre-existingdisease BMC Med Res Methodol 20131332

24 Dilokthornsakul P Valuck RJ Nair KV Corboy JR Allen RR Campbell JD Multiplesclerosis prevalence in the United States commercially insured population Neurology2016861014ndash1021

25 Tiwari Li Y Zou Z Interval estimation for ratios of correlated age-adjusted ratesJ Data Sci 20108471ndash482

26 Culpepper WJ Wallin MT Magder LS et al VHA Multiple Sclerosis SurveillanceRegistry and its similarities to other multiple sclerosis cohorts J Rehab Res Dev 201552(3)263ndash272

27 Tonelli M Wiebe N Fortin M et al Methods for identifying 30 chronic con-ditions application to administrative data BMC Med Inform Decis Mak 20151531

28 Kaye WE Sanchez M Wu J Feasibility of creating a national registry using admin-istrative data in the United States Amytroph Lateral Scler Frontotemporal Degener201415433ndash439

29 Szumski NR Cheng EM Optimizing algorithms to identify Parkinsonrsquos disease caseswithin an administrative database Mov Disord 20092451ndash56

30 Hoer A Schiffhorst G Zimmermann A et al Multiple sclerosis in Germany dataanalysis of administrative prevalence and healthcare delivery in the statutory healthsystem BMC Health Serv Res 200414381

31 Chastek BJ Oleen-Burkey M Lopez-Bresnahan MV Medical chart validation of analgorithm for identifying multiple sclerosis relapse in healthcare claims J Med Econ201013618ndash625

e1028 Neurology | Volume 92 Number 10 | March 5 2019 NeurologyorgN

DOI 101212WNL0000000000007043201992e1016-e1028 Published Online before print February 15 2019Neurology

William J Culpepper Ruth Ann Marrie Annette Langer-Gould et al datasets

Validation of an algorithm for identifying MS cases in administrative health claims

This information is current as of February 15 2019

ServicesUpdated Information amp

httpnneurologyorgcontent9210e1016fullincluding high resolution figures can be found at

References httpnneurologyorgcontent9210e1016fullref-list-1

This article cites 29 articles 6 of which you can access for free at

Citations httpnneurologyorgcontent9210e1016fullotherarticles

This article has been cited by 2 HighWire-hosted articles

Subspecialty Collections

httpnneurologyorgcgicollectionprevalence_studiesPrevalence studies

httpnneurologyorgcgicollectionmultiple_sclerosisMultiple sclerosis

httpnneurologyorgcgicollectiondiagnostic_test_assessment_Diagnostic test assessment

httpnneurologyorgcgicollectioncohort_studiesCohort studies

httpnneurologyorgcgicollectionall_health_services_researchAll Health Services Researchfollowing collection(s) This article along with others on similar topics appears in the

Permissions amp Licensing

httpwwwneurologyorgaboutabout_the_journalpermissionsits entirety can be found online atInformation about reproducing this article in parts (figurestables) or in

Reprints

httpnneurologyorgsubscribersadvertiseInformation about ordering reprints can be found online

ISSN 0028-3878 Online ISSN 1526-632XWolters Kluwer Health Inc on behalf of the American Academy of Neurology All rights reserved Print1951 it is now a weekly with 48 issues per year Copyright Copyright copy 2019 The Author(s) Published by

reg is the official journal of the American Academy of Neurology Published continuously sinceNeurology

Page 14: Validation of an algorithm for identifying MS cases in ... · 3/5/2019  · with ICD-9-CM or the Canadian version of ICD-10 codes (ICD-10-CA), depending on the year. ... the MS Clinic

DOI 101212WNL0000000000007043201992e1016-e1028 Published Online before print February 15 2019Neurology

William J Culpepper Ruth Ann Marrie Annette Langer-Gould et al datasets

Validation of an algorithm for identifying MS cases in administrative health claims

This information is current as of February 15 2019

ServicesUpdated Information amp

httpnneurologyorgcontent9210e1016fullincluding high resolution figures can be found at

References httpnneurologyorgcontent9210e1016fullref-list-1

This article cites 29 articles 6 of which you can access for free at

Citations httpnneurologyorgcontent9210e1016fullotherarticles

This article has been cited by 2 HighWire-hosted articles

Subspecialty Collections

httpnneurologyorgcgicollectionprevalence_studiesPrevalence studies

httpnneurologyorgcgicollectionmultiple_sclerosisMultiple sclerosis

httpnneurologyorgcgicollectiondiagnostic_test_assessment_Diagnostic test assessment

httpnneurologyorgcgicollectioncohort_studiesCohort studies

httpnneurologyorgcgicollectionall_health_services_researchAll Health Services Researchfollowing collection(s) This article along with others on similar topics appears in the

Permissions amp Licensing

httpwwwneurologyorgaboutabout_the_journalpermissionsits entirety can be found online atInformation about reproducing this article in parts (figurestables) or in

Reprints

httpnneurologyorgsubscribersadvertiseInformation about ordering reprints can be found online

ISSN 0028-3878 Online ISSN 1526-632XWolters Kluwer Health Inc on behalf of the American Academy of Neurology All rights reserved Print1951 it is now a weekly with 48 issues per year Copyright Copyright copy 2019 The Author(s) Published by

reg is the official journal of the American Academy of Neurology Published continuously sinceNeurology