Assessing Data Quality for Healthcare Systems Data Used in Clinical Research
Table of Contents

Objective
The NIH Health Care Systems Research Collaboratory
Data Quality Assessment Background
Data Quality Assessment Dimensions
    Completeness
    Accuracy
    Consistency
Data Quality Assessment Recommendations for Collaboratory Projects
    Recommendation 1 - Key data quality dimensions
    Recommendation 2 - Description of formal assessments
    Recommendation 3 - Reporting data quality assessment with research results
    Use of workflow and data flow diagrams to inform data quality assessment
Concluding Remarks
References
Appendix I
    Defining data quality
    Defining the quality of research data
    Data quality–related review criteria
        Criterion 1: Are data collection methods adequately validated?
        Criterion 2: Validated methods for the electronic health record information?
        Criterion 3: Demonstrated quality assurance and harmonization of data elements across healthcare systems/sites?
        Criterion 4: Are plans adequate for data quality control during the UH3 phase?
    References
Appendix II: Data Quality Assessment Plan Inventory
Appendix III: Initial Data Quality Assessment Recommendations for Collaboratory Projects
    Testing the recommendations with the STOP CRC project
    Summary of findings from testing with the STOP CRC project
    References
Assessing Data Quality

Prepared by: Meredith N. Zozus, PhD; W. Ed Hammond, PhD; Beverly B. Green, MD, MPH; Michael G. Kahn, MD, PhD; Rachel L. Richesson, PhD, MPH; Shelley A. Rusincovitch; Gregory E. Simon, MD, MPH; Michelle M. Smerek
Collaboratory Phenotypes, Data Standards, and Data Quality Core
Reviewed by: NIH
Version: 1.0, last updated July 28, 2014
Objective

Quality assessment of healthcare data used in clinical research is a developing area of inquiry. The methods used to assess healthcare data quality in practice are varied, and evidence-based or consensus “best practices” have yet to emerge.1 Further, healthcare data have long been criticized for a plethora of quality problems. To establish credibility, studies that use healthcare data are increasingly expected to demonstrate that the quality of the data is adequate to support research conclusions.
Pragmatic clinical trials (PCTs) in healthcare settings rely upon data generated during routine patient care to support the identification of individual research subjects or cohorts as well as outcomes. Knowing whether data are accurate depends on some comparison, e.g., comparison to a source of “truth” or to an independent source of data. Estimating an error or discrepancy rate, of course, requires a representative sample for the comparison. Assessing variability in the error or discrepancy rates between multiple clinical research sites likewise requires a sufficient sample from each site. In cases where the data used for the comparison are available electronically, the cost of data quality assessment is largely based on the time required for programming and statistical analysis. However, when labor-intensive methods such as manual review of patient charts are used, the cost is considerably higher. The cost of rigorous data quality assessment may in some cases present a barrier to conducting PCTs. For this reason, we seek to highlight the need for more cost-effective methods for assessing data quality.
Because of the potential cost implications and the fear of taking the “pragmatic” out of PCTs, we find it difficult to make these recommendations. However, the principles underlying recommendations for applying data quality assessment to research that uses healthcare data are irrefutable. The credibility and reproducibility of research depend on the investigator’s demonstration that the data on which conclusions are based are of sufficient quality to support them. Thus, the objective of this document is to provide guidance, based on the best available evidence and practice, for assessing data quality in PCTs conducted through the National Institutes of Health (NIH) Health Care Systems Research Collaboratory.
PRAGMATIC CLINICAL TRIAL (PCT): We use the definition articulated by the Clinical and Translational Science Awards pragmatic clinical trials infrastructure (PCTi) workshop: “A prospective comparison of a community, clinical, or system-level intervention and a relevant comparator in participants who are similar to those affected by the condition(s) under study and in settings that are similar to those in which the condition is typically treated.”2
The NIH Health Care Systems Research Collaboratory

The NIH Health Care Systems Research Collaboratory (http://www.nihcollaboratory.org) or “Collaboratory” is intended to improve the way clinical trials are conducted by creating new approaches, infrastructure, and methods for collaborative research. To develop and demonstrate these methods, the Collaboratory also supports the design and rapid execution of high-impact PCT Demonstration Projects that 1) address questions of major public health importance and 2) engage healthcare delivery systems in research partnership. Organizationally, the Collaboratory comprises a series of these Demonstration Projects funded for 1 planning year, with competitive renewal to allow transition into actual trial conduct, and a Coordinating Center to provide support for these efforts. Within the Coordinating Center, seven Working Groups/Cores serve to identify, develop, and promote solutions for issues central to conducting PCTs: 1) electronic health record use in research; 2) phenotypes, data standards, and data quality; 3) patient-reported outcomes; 4) healthcare system interactions; 5) regulatory and ethical issues; 6) biostatistics and study design; and 7) stakeholder engagement. The Cores have the bidirectional objectives of promoting the exchange of information on methods and approaches among Demonstration Projects and the Coordinating Center, as well as synthesizing and disseminating best practices derived from Demonstration Project experiences to the larger research community. Supported by the NIH Common Fund, the Collaboratory’s ultimate goal is to ensure that healthcare providers and patients can make decisions based on the best available clinical evidence.
The Collaboratory provides an opportunity to observe data quality assessment plans and practices for PCTs conducted in healthcare settings. The Collaboratory’s Phenotypes, Data Standards, and Data Quality (PDSDQ) Core3 includes representatives from the Collaboratory Coordinating Center and Demonstration Projects, researchers with related interests, and NIH staff. In keeping with the bidirectional goals of the PDSDQ Core, an action research paradigm was used in which the Core interacted with Demonstration Projects, observed data quality assessment plans and practices, participated where invited, and synthesized experience to generalize information for others embarking on similar research. We report here the observations and iteratively developed (and still-evolving) data quality assessment methodology from the initial planning grant year for the Collaboratory’s first seven Demonstration Projects. These results have been vetted by the PDSDQ Core and other Collaboratory participants and represent the experience of this group at the time of development; however, they do not represent official NIH opinions or positions.
Data Quality Assessment Background

Depending on the scientific question posed by a given study, PCTs may rely on data generated during routine care or on data collected prospectively for the study. Therefore, data quality assurance and assessment for such studies necessarily includes methods for both situations: 1) collection of data specifically for a study, where the researcher is able to influence or control the original data collection and 2) use of data generated in routine care, where the researcher has little or no control over the data collection. For the former, significant guidance is available via the Good Clinical Data Management Practices (GCDMP) document,4 and we do not further discuss quality assurance and assessment methods for these types of prospective research data. Instead, this guidance will focus on the use or re-use of data generated from routine patient care, based on the following:
1. Existing literature on data quality assessment for healthcare data that are re-used for research
2. Experience during the first year of the Collaboratory

In this document, we rely on a multidimensional definition of data quality. The dimensions of accuracy and completeness are the most commonly assessed in health-related research.5 A recent review identified five dimensions that have been assessed in electronic health record (EHR) data used for research; they include completeness, correctness, concordance, plausibility, and currency.6 Accuracy, completeness, and consistency (Table 1) most closely affect the capacity of data to support research conclusions and are therefore the focus of our discussion here. A brief review of the literature on defining data quality is provided in Appendix I, and specific dimensions used here are defined below. Unfortunately, definitions of data quality dimensions are highly variable in the literature. The sections below outline conceptual definitions of these dimensions followed by operational examples.

Table 1. Data Quality Dimensions Determining Fitness for Use of Research Data

Completeness
    Conceptual definition: Presence of the necessary data
    Operational examples: Presence of necessary data elements; percent of missing values for a data element; percent of records with sufficient data to calculate a required variable (e.g., an outcome)

Accuracy
    Conceptual definition: Closeness of agreement between a data value and the true value*
    Operational examples: Percent of data values found to be in error based on a gold standard; percent of physically implausible values; percent of data values that do not conform to range expectations

Consistency
    Conceptual definition: Relevant uniformity in data across clinical investigation sites, facilities, departments, units within a facility, providers, or other assessors
    Operational examples: Comparable proportions of relevant diagnoses across sites; comparable proportions of documented order fulfillment (e.g., returned procedure report for ordered diagnostic tests)

* Consistent with the International Organization for Standardization (ISO) 8000 Part 2 definition of accuracy;7 “property value” in the ISO 8000 definition is replaced with “data value” for consistency with the language used in clinical research.
Based on the literature relevant to data quality assessment in the secondary use of EHR data and our experience thus far with the Collaboratory (described in Appendices II and III), we offer a set of data quality assessment recommendations for Collaboratory projects. First, we summarize important dimensions, common or reported approaches to characterizing them, and characteristics of an ideal operationalization. Next, we present streamlined, specific recommendations for researchers using data generated from routine care.
Data Quality Assessment Dimensions

Completeness

Conceptually, completeness is the presence of necessary data. The operationalization of completeness presented below was adapted from recent theoretical work by Weiskopf et al.,8 in which a comprehensive assessment of completeness covers four mutually exclusive areas:
1. Data element completeness: The presence of all necessary variables in a candidate dataset; i.e., “Are the right ‘columns’ present?” Data element completeness is assessed by examining metadata, such as a data dictionary or list of data elements contained in a dataset and their accompanying definitions, and comparing this information against the variables required in the analytic or statistical plan. With adequate data documentation, data element completeness can be assessed without examining any data values.
2. “Column” data value completeness: The percentage of data values present for each data element. Note, however, that often (as in normalized structures) more than one data element may be stored in a database column. The word column is used to help the reader visualize the concept and because normalized data structures are often flattened to a 1-column-per-data-element format to generate and report data quality–related statistics. Column data value completeness is assessed by structuring the dataset in a “1-column-per-data-element” format and calculating the percentage of non-missing data for each column, with non-missing defined as “not null and not otherwise coded to a null flavor.” Null flavors (e.g., not applicable, not done) are defined in the International Organization for Standardization (ISO) 21090,9 and Health Level Seven International (HL7)10 data type definition standards.
3. Ascertainment completeness: The percentage of eligible cases present; i.e., “Do you have the right ‘rows’ in the dataset?” Ascertainment usually cannot be verified with absolute certainty. Assessment options are typically comparison based and include but are not limited to: 1) chart review in a representative sample and 2) comparison to one or more independent data sources covering the same population or a subset of that population. Ascertainment completeness is affected by data quality problems, by phenotype definition and execution, and by factors that bias membership of a dataset. Other issues commonly evaluated in an ascertainment
assessment include the presence and extent of duplicate records and records for patients that do not exist (for example: an error in the medical record number creates a new case; a patient gives a name other than his or her own), or duplicate events such as a single procedure being documented more than once. Ascertainment completeness and phenotype validation significantly overlap in goals and can be accomplished together.
4. “Row” data value completeness: The percentage of cases/patients with sufficient data values present for a given data use. Row data value presence is assessed using study-specific algorithms programmed to calculate the percentage of cases with all data or with study-relevant combinations of missing and non-missing data (e.g., in the case of body mass index [BMI], the percent missing of “either weight OR height” might be calculated, because missing either data point renders the case unusable for calculating BMI).
A comprehensive completeness assessment consists of all four components. In terms of effort, column completeness is accomplished through a review of data elements available in a data source, and column data value completeness and row data value completeness are straightforward computational activities. Ascertainment completeness, however, can be a resource-intensive task (e.g., chart review on a representative sample; electronic comparison among several data sources). Additional guidance and discussion regarding data completeness in the setting of EHR data extracted for pragmatic clinical research is available here.11
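The column and row data value completeness calculations described above are straightforward to implement. The following sketch is illustrative only: the data elements, values, and null-flavor codes are hypothetical examples chosen for demonstration, not drawn from any Collaboratory project or standard value set.

```python
# Illustrative sketch: "column" and "row" data value completeness over a
# flattened, 1-column-per-data-element dataset. NULL_FLAVORS is a
# hypothetical stand-in for codes defined in ISO 21090/HL7.
NULL_FLAVORS = {None, "", "not applicable", "not done"}

def column_completeness(rows, element):
    """Percent of non-missing values for one data element (column)."""
    if not rows:
        return 0.0
    present = sum(1 for r in rows if r.get(element) not in NULL_FLAVORS)
    return 100.0 * present / len(rows)

def row_completeness(rows, required):
    """Percent of cases with every required element non-missing,
    e.g., both weight and height for a BMI calculation."""
    if not rows:
        return 0.0
    usable = sum(1 for r in rows
                 if all(r.get(e) not in NULL_FLAVORS for e in required))
    return 100.0 * usable / len(rows)

cases = [
    {"weight_kg": 70, "height_m": 1.75},
    {"weight_kg": 82, "height_m": None},          # height missing
    {"weight_kg": "not done", "height_m": 1.60},  # weight coded to a null flavor
    {"weight_kg": 95, "height_m": 1.80},
]
print(column_completeness(cases, "weight_kg"))             # → 75.0
print(row_completeness(cases, ["weight_kg", "height_m"]))  # → 50.0
```

As the BMI example in the text illustrates, the row measure can be stricter than any single column measure: weight is 75% complete here, but only 50% of cases are usable for BMI.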
Completeness, although necessary to establish fitness for use in clinical research, is not sufficient to evaluate the competence of a dataset to support research conclusions. Assessments of accuracy and consistency are also necessary.
Accuracy

In keeping with ISO 8000 standards,7 we define data accuracy as the property exhibited by a data value when it reflects the true state of the world at the stated or implied point of assessment. It follows that an inaccurate or errant datum does not reflect the true state of the world at the stated or implied point of assessment.12 Data errors are instances of inaccuracy.
Detection of data errors is accomplished through comparison; for example, comparison of a dataset to some other source of information (Figure 1). The comparison may be between the data value and a “source of truth,” a known standard, a set of valid values, a redundant measurement, independently collected data for the same concept, an upstream data source, a validated indicator of possible errors, or aggregate statistics. As the source for comparison moves farther from a “source of truth,” we move from identification of data errors to indications that a datum may be in error.
DATA ACCURACY: The closeness of agreement between a data value and the true value. —adapted from ISO 8000,6
We use the term error to denote any deviation from accuracy regardless of the cause. For example, a programming problem in data transformation that renders an originally accurate value incorrect is considered to have caused a data error. Because data are subject to multiple processing steps, some count the number of errors (consider, for instance, a data value that has sustained two problems that each would have individually caused an error, for a total of two errors). From an outcomes perspective, it is the number of fields in error that matters rather than the number of errors; thus, in data quality assessment, the number of data values in error is counted rather than the number of errors. Different agreement statistics may be applied depending on whether the source of comparison is considered a source of truth or gold standard versus an independent source of information.
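The counting convention above (counting data values in error rather than errors) can be illustrated with a short sketch; the gold-standard comparison, data elements, and values below are hypothetical examples, not drawn from any particular study.

```python
# Illustrative sketch: percent of data values in error against a gold
# standard. A value that sustained two processing problems still counts
# once, because FIELDS in error are counted, not individual errors.
def percent_values_in_error(data, gold, fields):
    """Compare records pairwise to a gold standard; return the percent
    of compared data values whose value differs from the standard."""
    compared = in_error = 0
    for rec, truth in zip(data, gold):
        for f in fields:
            compared += 1
            if rec.get(f) != truth.get(f):
                in_error += 1
    return 100.0 * in_error / compared if compared else 0.0

# Hypothetical records: one transposed age (17 vs. 71) among 4 values
data = [{"dx": "E11.9", "age": 54}, {"dx": "I10", "age": 17}]
gold = [{"dx": "E11.9", "age": 54}, {"dx": "I10", "age": 71}]
print(percent_values_in_error(data, gold, ["dx", "age"]))  # → 25.0
```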
Operationally, an instance of inaccuracy or data error is any discrepancy identified through such a comparison that cannot be explained by documentation.12,13 The caveat “not explained by documentation” is important because efforts to identify data discrepancies (i.e., potential errors) can be undertaken on data at different stages of processing. Such processing sometimes includes transformations on the data, such as imputations, that purposefully change the value. In these cases, a data consumer should expect the changes to be supported by documentation and be traceable through all of the data processing steps.
Accuracy has been described in terms of two basic concepts: 1) representational adequacy/inadequacy, defined as the extent to which an operationalization is consistent with/differs from the desired concept (validity), including but not limited to imprecision or semantic variability hampering interpretation of data, and 2) information loss and degradation, including but not limited to reliability, change over time, and error.14
Representational inadequacy is the degree to which a data element differs from the desired concept. For example, a researcher seeking obese patients for a study uses BMI to define the obesity phenotype, knowing that a small percentage of bulky but lean bodybuilders may be included. Representational inadequacy is best addressed at the point in research design when data elements and sources are selected.
Representational inadequacy can be affected by local work and data flows of data elements used in a study, e.g., differences in local coding practices causing differences in datasets across institutions. Thus, harmonization of data elements across sites is emphasized in NIH review criteria for Collaboratory PCTs (Appendix I). Documenting work and data flows for each data element, from the point of origin to the analysis dataset (traceability), has long been required in regulated research,4 reported as a best practice in the information quality literature, and implemented in healthcare settings.15 Comparisons of data definitions,
REPRESENTATIONAL INADEQUACY: The degree to which a data element differs from the desired concept.

INACCURACY/DATA ERROR: Any discrepancy that cannot be explained by documentation.
workflows, and data flows across research sites are as important in assessing representational inadequacy of healthcare data as is the use of validated questionnaires in assessing subjective concepts. Some differences in workflow will not affect representation, while others may; the only way to know is to understand the workflow at each site and evaluate the effect, if any, on representation. Such documentation for data collected in healthcare settings may not be as precise as that for clinical trial data collection processes. For example, it can be difficult to assess the data capture process of patient-reported outcomes (PROs) in healthcare settings due to differences in individual departments, clinics, and hospitals within an individual healthcare organization. The workflow can also vary over time as refinements are made.
Results of such assessments for representational inadequacy are often qualitative and used either as formative assessments in research design or to describe limitations in reported results. Comparisons of aggregate or distributional statistics (e.g., marginal distributions), as performed by the Observational Medical Outcomes Partnership (OMOP),16 have also been used to identify representational variations in datasets caused by differences in local practice among the institutions providing data.16 Using both process-oriented and data-based approaches in concert to confirm representational adequacy of data elements is recommended. A process-oriented approach may be used at the time of site selection; once consistency is confirmed, a data-based approach may be used to monitor consistency during the study.
Information loss or degradation is the loss of information content over time and can arise from errors or purposeful decisions in data collection and processing (for example: data reduction such as interval data collected as ordinal data; separation of data values from contextual data elements; or data values that lose accuracy or relevance over time). Information loss and degradation may be prevented or mitigated by decisions made during research design. Because such errors and omissions are sensitive to many organizational factors (e.g., local clinical documentation practices, mapping decisions made for warehoused data), they should be assessed for any data source. Thus, workflow and data flow documentation also help to assess sources of information loss and degradation.15
Assessing data accuracy, primarily with regard to information loss and degradation, involves comparisons, either of individual values (as is commonly done in clinical trials14 and registries5) or of aggregate or distributional statistics.14,16-18
1. Individual value comparisons: At the individual-value level, the comparison could be to the truth (if known), to an independent measurement, to a validated indicator, or to valid (expected, physically plausible, or logically consistent) values.14,17,19 In practice, the options for comparison (Figure 1) represent a continuum from truth to measurements of lesser proximity to the truth, such as an accepted gold standard or valid values. Thus, accuracy assessment usually provides a disagreement rate and, much less often, an actual error rate. Further, in some prospective settings,4 the identification of data discrepancies is done for the purpose of resolving them; in other settings, where data correction is not possible, data discrepancies are
identified for the purpose of reporting a data discrepancy rate or to inform statistical analysis.
2. Aggregate and distributional comparisons: Aggregate and distributional comparisons (such as frequency counts or measures of central tendency and dispersion) can be used as surrogate accuracy assessments. For example, differences in aggregate or distributional measures between a research dataset and an independent data source with a similar population may indicate possible data discrepancies, while similar measures would increase confidence in the research data. Differences in central tendency and dispersion measures in age or a socioeconomic status measure may indicate significant differences in the populations in two datasets. Aggregate and distributional comparisons can also be performed within a dataset,16-18 between multiple sites in a multicenter study,17,18 or between subgroups as measures of consistency.
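As one illustration of an aggregate and distributional comparison, the following sketch flags differences in central tendency or dispersion between a research dataset and an independent source. The data and the 10% relative tolerance are hypothetical choices for demonstration, not validated thresholds from the literature.

```python
# Illustrative sketch: compare mean and standard deviation of a variable
# (e.g., age) between a research dataset and an independent source with
# a similar population; large differences flag possible discrepancies.
from statistics import mean, stdev

def compare_distributions(a, b, tolerance=0.10):
    """Return the labels of summary statistics whose values differ by
    more than `tolerance` (relative to the larger magnitude)."""
    flags = []
    for label, fa, fb in [("mean", mean(a), mean(b)),
                          ("sd", stdev(a), stdev(b))]:
        if abs(fa - fb) > tolerance * max(abs(fa), abs(fb)):
            flags.append(label)
    return flags

study_ages = [34, 45, 51, 62, 58, 47]      # hypothetical research dataset
external_ages = [35, 46, 50, 61, 57, 48]   # hypothetical independent source
print(compare_distributions(study_ages, external_ages))  # → []
```

Similar summary statistics, as here, increase confidence in the research data; a flagged statistic would indicate a possible discrepancy, or simply a genuinely different population, and would warrant investigation rather than automatic correction.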
In the absence of a source of truth, comprehensive accuracy assessment of multisite studies includes use of individual value, aggregate, and distributional measures.17 To emphasize the importance of these within- and between-dataset comparisons, a third dimension, consistency (described below), was added. The difference between the two dimensions here lies not in the measures, but in the purpose of the comparisons and in the choice of data on which to run them.
An accuracy assessment requires selecting a source for comparison, making the comparison, and then quantifying the results. In Figure 1, sources for comparison are listed in descending order of their proximity to truth. If there are multiple options, those sources for comparison toward the top of the list in Figure 1 are preferred because they are closer to the truth. Thus, sources for comparison toward the top provide quantitative assessments of accuracy, whereas sources for comparison in the middle provide partial measures of accuracy and, depending on the data source used for the comparison, may enable identification of errors or may only indicate discrepancies. Sources for comparison toward the bottom identify only data discrepancies, i.e., items that may or may not represent an actual error. For example, if it has been shown that a percentage of missing values is inversely correlated with data accuracy, then percent missing may be an indicator of lower accuracy.
The hierarchy of sources for comparison shown in Figure 1 provides a list of possible comparisons ranging (from bottom to top) from those that are achievable in every situation but provide less information about true data accuracy, to the ideal but rarely achievable case that provides an actual data error rate. This hierarchy simplifies the selection of sources for comparison: where more than one source for comparison exists, the highest practical comparison in the list should be used.
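The selection rule above can be made concrete. In this sketch (illustrative only; the string labels are informal shorthand for the Figure 1 categories), the hierarchy is encoded in descending proximity to truth, and the function returns the highest practical source among those available for a study:

```python
# Illustrative sketch: encode the Figure 1 hierarchy of sources for
# comparison (descending proximity to truth) and select the highest
# practical option from those available.
HIERARCHY = [
    "source of truth",
    "independent measurement",
    "independently managed data",
    "upstream data source",
    "known standard",
    "valid values",
    "validated indicators",
    "aggregate statistics",
]

def best_comparison(available):
    """Return the available source for comparison closest to truth,
    or None if no source is available."""
    ranked = [s for s in HIERARCHY if s in available]
    return ranked[0] if ranked else None

# An upstream data source outranks valid values in the hierarchy
print(best_comparison({"valid values", "upstream data source"}))
# → upstream data source
```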
Comparison to a source of “truth”           Accuracy
Comparison to an independent measurement
--------------------------------------------------------------
Comparison to independently managed data    Partial accuracy
Comparison to an upstream data source
Comparison to a known standard              Discrepancy detection
Comparison to valid values
--------------------------------------------------------------
Comparison to validated indicators          Gestalt
Comparison to aggregate statistics
Figure 1. Data Accuracy Assessment Comparison Hierarchy. Comparison of data to sources listed above the top line provides full assessment of data accuracy; sources listed below the top line provide only partial assessments of accuracy. Sources above the bottom line can be used to detect actual data discrepancies, whereas sources below the bottom line can only indicate that discrepancies may exist. Items at the top of the list identify actual errors; items in the middle identify discrepancies that may or may not in fact be errors; items toward the bottom merely indicate that discrepancies may exist.
The strength of the accuracy assessment depends not only on the proximity to truth of the source for comparison, but also on the importance and number of data elements for which accuracy can be assessed. Accuracy assessments are often performed on subsets of data elements or subsets of the population, rather than across the whole dataset. Common subsets assessed include data elements used in subject or cohort identification, data elements used to derive clinical outcomes, and patients for whom an independent source of data (such as registry or Medicare claims data) is readily available for comparison. Accuracy assessments should be done for cohort identification data elements, outcome data elements, and covariates. Accuracy assessments for a given study may use different sources for comparison.
Comparisons for data accuracy assessments will likely differ based on the underlying nature of the phenomena about which the data values were collected. Examples of different phenomena include anatomic or pathologic phenomena, physiologic or functional phenomena, imaging or laboratory findings, patients’ symptomatic experiences, and patients’ behaviors or functioning. The data values collected about these phenomena may be the result of inherently different processes, including but not limited to measurement of a physical quantity, direct observation, clinical interpretation of available information, asking patients directly, or psychometric measurements. These are not complete lists, and we do not provide a deterministic map of phenomena and measurement processes to associated error sources. We simply note that different phenomena and measurement or collection processes are sometimes characteristically prone to different sources of error.
Such associations should be considered when data elements and comparisons for data quality assessment are chosen.
Consistency

Consistency is defined here as relevant uniformity in data across clinical investigation sites, facilities, departments, units within a facility, providers, or other assessors. Inconsistencies, therefore, are instances of difference. In other frameworks,18,20,21 the label consistency is used for several different things, such as uniformity of data over time or conformance of data values to other values in the dataset (e.g., gender correctness of gender-specific diagnoses and procedures, procedure dates before discharge date). Here, we view these valid value comparisons as surrogate indicators of accuracy (Figure 1).
There are many ways that data can be inconsistent; for example, clinical documentation policies or practices may vary over time within a facility, between facilities, or between individuals in a facility. Consider a study where the outcome measure is whether or not patient medication-taking behavior changes. If some sites document filled prescriptions from pharmacy data sources while others rely on patient reporting, the outcome measure would be inconsistent between the sites. Actions should be taken to improve similarity in documentation or to use other documentation that is common across all sites. Otherwise, such inconsistencies may introduce bias and affect the capacity of the data to support study conclusions. Thus, the consistency dimension comes into play particularly in multisite or multifacility studies and when such differences may exist in clinical documentation, data collection, or data handling within a study. Comparisons of multisite data over time to examine expected and unexpected changes in aggregate or distributional data can also be useful. For example, changes in EHR systems, such as new data being captured, data no longer being captured, or even implementation of a new system, are commonplace and affect data. Assessing consistency during a study (data quality monitoring) is the only way to ensure that such changes will be detected.
Targetedconsistencyassessmentsareimportantduringthefeasibility-assessmentphaseofstudyplanning.Forexample,toascertainwhetherdataaresufficientlyconsistentacrossfacilitiestosupportaproposedstudy,consistencyassessmentsmaybeoperationalizedbyqualitativeassessmentssuchasreviewofclinicaldocumentationpoliciesandprocedures,interviewswithfacilitiescoveringclinicaldocumentationproceduresandpractice,ordirectobservationofworkflow.Initialconsistencycheckscanalsobeestablishedusingaggregateordistributionalstatistics.Oncedatacollectionhasstarted,consistencyshouldbemonitoredovertimeoracrossindividuals,units,orfacilitiesbyaggregateordistributionalstatistics.
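As one illustration of monitoring consistency with aggregate statistics, the sketch below compares each site's most recent rate for a data element against that site's own earlier values. The site names, rates, and the 50% change threshold are hypothetical assumptions for illustration, not prescriptions from this report.

```python
from statistics import mean

# Hypothetical monthly rates (per 100 encounters) of a documented data element.
site_rates = {
    "site_a": [12.1, 11.8, 12.4, 12.0],
    "site_b": [11.5, 12.2, 11.9, 12.3],
    "site_c": [12.0, 11.7, 3.1, 2.8],  # a drop suggesting a documentation change
}

def flag_site_shifts(rates_by_site, rel_change=0.5):
    """Crude screen: flag sites whose latest rate differs from their own
    earlier mean by more than rel_change (a relative fraction)."""
    flagged = []
    for site, series in rates_by_site.items():
        baseline = mean(series[:-1])
        if baseline and abs(series[-1] - baseline) / baseline > rel_change:
            flagged.append(site)
    return flagged

print(flag_site_shifts(site_rates))
```

A production check would typically also compare distributions across sites, in the spirit of the aggregate and distributional statistics described above, rather than only within-site change.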
OMOP16 and Mini-Sentinel18 both provide publicly available consistency checks that are executable against the OMOP and Mini-Sentinel common data models, respectively.
CONSISTENCY: Relevant uniformity in data across clinical investigation sites, facilities, departments, units within a facility, providers, or other assessors.
Although PCTs are less likely to utilize a common data model, the OMOP and Mini-Sentinel programs provide excellent examples of checks that can be used to evaluate consistency across investigational sites, facilities, departments, clinical units, providers, or assessors in PCTs.

As with accuracy assessments, consistency assessments should be conducted for data elements used in subject or cohort identification, outcome data elements, and covariates.
Data Quality Assessment Recommendations for Collaboratory Projects

We have defined critical components of data quality assessment for research using data generated in healthcare settings that we consider necessary in demonstrating the capacity of data to support research conclusions. Our recommendations below for data quality assessment for Collaboratory research projects are based on these key components:
Recommendation 1 – Key data quality dimensions

We recommend that accuracy, completeness, and consistency be formally assessed for data elements used in subject identification, outcome measures, and important covariates.
Recommendation 2 – Description of formal assessments

1. Completeness assessment recommendation: Use of a four-part completeness assessment. The same column and data value completeness measures can be employed for monitoring completeness throughout the project. The completeness assessment applies to both prospectively collected and secondary use data. Additional requirements suggested by the GCDMP, such as on-screen prompts for missing data where appropriate, apply to data collected prospectively for a study.
2. Accuracy assessment recommendation: Identification and conduct of project-specific accuracy assessments for subject/cohort identification data elements, outcome data elements, and covariates. The highest practical accuracy assessment in the hierarchy shown in Figure 1 should be used. The same measures may be applicable for monitoring data accuracy throughout the project. Additional requirements suggested by the GCDMP, such as on-screen prompts for inconsistent data where appropriate, apply to prospectively collected data.
3. Consistency assessment recommendation: Identification of a) areas where differences in clinical documentation, data collection, or data handling may exist between individuals, units, facilities, sites, or assessors, or over time; and b) measures to assess consistency and monitor it throughout the project. A systematic approach to identifying candidate consistency assessments should be used. Such an approach will likely be based on review of available data sources, accompanied by an approach for systematically identifying and evaluating the likelihood and impact of possible inconsistencies. This recommendation applies to both prospectively collected data and secondary use data.
4. Impact assessment recommendation: Use of completeness, accuracy, and consistency assessment results by the project statistician to test the sensitivity of the analyses to anticipated or identified data quality problems, including a plan for reassessing based on results of data quality monitoring throughout the project.
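As a minimal sketch of how the column and data value completeness measures named in Recommendation 2 might be computed (the record layout and field names here are hypothetical):

```python
# Hypothetical extract; None marks a missing value.
extract_rows = [
    {"mrn": "001", "dob": "1950-02-01", "systolic_bp": 128},
    {"mrn": "002", "dob": None,         "systolic_bp": 141},
    {"mrn": "003", "dob": "1962-07-19", "systolic_bp": None},
]

def column_completeness(rows):
    """Fraction of non-missing values in each column."""
    return {col: sum(r[col] is not None for r in rows) / len(rows)
            for col in rows[0]}

def value_completeness(rows, required):
    """Fraction of required data values that are present across all records."""
    present = sum(r[col] is not None for r in rows for col in required)
    return present / (len(rows) * len(required))
```

The same measures can be recomputed on each data refresh to support the monitoring called for above; ascertainment completeness (whether expected records exist at all) requires an external denominator and is not shown.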
Recommendation 3 – Reporting data quality assessment with research results

As recommended elsewhere, results of data quality assessments should be reported with research results.1,22 Data quality assessments are the only way to demonstrate that data quality is sufficient to support the research conclusions. Thus, data quality assessment results must be accessible to consumers of research.
Use of workflow and dataflow diagrams to inform data quality assessment

In our initial recommendations (Appendix III), we encouraged the creation and use of dataflow and workflow diagrams to aid in identifying accuracy assessments and in conducting consistency assessments; however, this strategy has both advantages and disadvantages. Among the advantages is that the diagrams are helpful in other aspects of operationalizing a research project and in managing institutional information architecture. Thus, they may already exist, and if not, they will likely be used for other purposes. Understanding workflow around clinical documentation of cohort identifiers, outcomes data, and covariates is necessary for assessing potential inconsistencies between sites.
Workflow knowledge is also required in cases where the clinical workflow will be modified for the research, e.g., collecting study-specific data within clinical processes or using routine clinical data to trigger research activities. In the Collaboratory STOP CRC Demonstration Project, documentation of a patient's colonoscopy "turns off" further fecal occult blood test screening interventions for a period of time. Logic decisions similar to these would be clearly documented in the workflow and dataflow analysis. On our test project, the process of creating and reviewing the diagrams prompted discussion of potential data quality issues as well as strategies for prevention or mitigation of problems.
Alternatively, if workflow diagrams do not exist for a facility, creation of these diagrams solely for the purpose of such an analysis may not be feasible. Consider a study with 30 small participating investigational sites from different institutions. Creation of workflow and dataflow diagrams de novo for a study would consume significant resources. In such cases, where the effort associated with creating and reviewing such diagrams is not practical, we offer the following set of questions that could be reviewed with personnel at each facility. These questions were developed based on our experience with the testing of the initial recommendations.
1. Talk through each of the data elements used for cohort identification. Can you explain how and where each one is documented in the clinic/on the unit (i.e., what information system, what screen, at what point in the clinical process, and by whom)?
2. When you send us the data or connect data to a federated system, what data store will you create/use? Importantly, please describe all data transformation between the source system and the data store used for this research.

3. For each data element used in the cohort identification, do you know of any difference in data capture or clinical documentation practices across clinics at your site or for different subsets of your population?

4. For each data element used in cohort identification, do you know of any subsets of data that may be documented differently, such as data from specialist or hospital reports external to your group versus data from your practice, or internal laboratory data from analyzers on site versus those that you receive from external clinical laboratories?

The four questions above should be applied to other important data elements such as outcome measures and covariates.
Concluding Remarks

Moving forward, attention to data quality will be critical and increasingly expected, as in the case of the data validation review criteria for the Collaboratory. Although generalized computational approaches have shown great promise in large national initiatives such as Mini-Sentinel and OMOP, they are currently dependent on the existence of a common data model. However, as healthcare institutions across the country embark upon data governance initiatives, and as standard data elements become a reality for healthcare and health-related research, more and better machine-readable metadata are becoming available. Ongoing research in this arena will work toward leveraging this information to increase automation of data quality assessment and create metadata-driven, next-generation approaches to computational data quality assessment.

Funding

This work was supported by a cooperative agreement (U54 AT007748) from the NIH Common Fund for the NIH Health Care Systems Research Collaboratory. The views presented here are solely the responsibility of the authors and do not necessarily represent the official views of the National Institutes of Health.
References

1. Brown JS, Kahn M, Toh S. Data quality assessment for comparative effectiveness research in distributed data networks. Med Care 2013;51(8 suppl 3):S22–S29. PMID: 23793049. doi:10.1097/MLR.0b013e31829b1e2c.

2. Saltz J. Report on Pragmatic Clinical Trials Infrastructure Workshop. Available at: https://www.ctsacentral.org/sites/default/files/documents/IKFC%201%204%202013.pdf. Accessed July 28, 2014.
3. Richesson RL, Hammond WE, Nahm M, et al. Electronic health records based phenotyping in next-generation clinical trials: a perspective from the NIH Health Care Systems Collaboratory. J Am Med Inform Assoc 2013;20:e226–e231. PMID: 23956018. doi:10.1136/amiajnl-2013-001926.

4. Society for Clinical Data Management. Good Clinical Data Management Practices (GCDMP). Available at: http://www.scdm.org/sitecore/content/be-bruga/scdm/Publications/gcdmp.aspx. Accessed July 2, 2014.

5. Arts DG, De Keizer NF, Scheffer GJ. Defining and improving data quality in medical registries: a literature review, case study, and generic framework. J Am Med Inform Assoc 2002;9:600–611. PMID: 12386111. doi:10.1197/jamia.M1087.

6. Weiskopf NG, Weng C. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc 2013;20:144–151. PMID: 22733976. doi:10.1136/amiajnl-2011-000681.

7. International Organization for Standardization. ISO 8000-2:2012(E) Data Quality – Part 2: Vocabulary. 1st ed. June 15, 2012.

8. Weiskopf NG, Hripcsak G, Swaminathan S, Weng C. Defining and measuring completeness of electronic health records for secondary use. J Biomed Inform 2013;46:830–836. PMID: 23820016. doi:10.1016/j.jbi.2013.06.010.

9. International Organization for Standardization. ISO 21090. Health Informatics – Harmonized Data Types for Information Interchange. 2011.

10. Health Level Seven International. HL7 Data Type Definition Standards. Available at: http://www.hl7.org/implement/standards/product_section.cfm?section=2&ref=nav. Accessed July 2, 2014.

11. NIH Health Care Systems Collaboratory Biostatistics and Study Design Core. Key issues in extracting usable data from electronic health records for pragmatic clinical trials. Version 1.0 (June 26, 2014). Available at: https://www.nihcollaboratory.org/Products/Extracting-EHR-data_V1.0.pdf. Accessed July 28, 2014.

12. Nahm M, Bonner J, Reed PL, Howard K. Determinants of accuracy in the context of clinical study data. International Conference on Information Quality (ICIQ), Paris, France, November 2012. Available at: http://mitiq.mit.edu/ICIQ/2012/. Accessed July 2, 2014.

13. Nahm ML, Pieper CF, Cunningham MM. Quantifying data quality for clinical trials using electronic data capture. PLoS ONE 2008;3:e3049. PMID: 18725958. doi:10.1371/journal.pone.0003049.

14. Tcheng J, Nahm M, Fendt K. Data quality issues and the electronic health record. Drug Information Association Global Forum 2010;2:36–40.
15. Davidson B, Lee WY, Wang R. Developing data production maps: meeting patient discharge data submission requirements. International Journal of Healthcare Technology and Management 2004;6:223–240.

16. Observational Medical Outcomes Partnership. Generalized Review of OSCAR Unified Checking. 2011. Available at: http://omop.org/GROUCH. Accessed July 2, 2014.

17. Kahn MG, Raebel MA, Glanz JM, Riedlinger K, Steiner JF. A pragmatic framework for single-site and multisite data quality assessment in electronic health record-based clinical research. Med Care 2012;50(suppl):S21–S29. PMID: 22692254. doi:10.1097/MLR.0b013e318257dd67.

18. Mini-Sentinel Operations Center. Mini-Sentinel Common Data Model Data Quality Review and Characterization Process and Programs. Program Package version 3.1.2. September 2013. Available at: http://mini-sentinel.org/data_activities/distributed_db_and_data/details.aspx?ID=131. Accessed July 2, 2014.

19. Brown PJ, Warmington V. Data quality probes – exploiting and improving the quality of electronic patient record data and patient care. Int J Med Inform 2002;68:91–98. PMID: 12467794. doi:10.1016/S1386-5056(02)00068-0.

20. Weiskopf NG. Enabling the Reuse of Electronic Health Record Data through Data Quality Assessment and Transparency. Doctoral dissertation, Columbia University, June 13, 2014.

21. Sebastian-Coleman L. Measuring Data Quality for Ongoing Improvement: A Data Quality Assessment Framework. Waltham, MA: Morgan Kaufmann (Elsevier); 2013.

22. Kahn MG, Brown J, Chun A, et al. A consensus-based data quality reporting framework for observational healthcare data. Submitted to eGEMs Journal, December 2013. Draft version available at: http://repository.academyhealth.org/cgi/viewcontent.cgi?article=1001&context=dqc. Accessed July 2, 2014.
Appendix I

Defining data quality

The ISO 8000 series of standards focuses on data quality.1 Quality is defined as the "…degree to which a set of inherent characteristics fulfills requirements."2 Thus, data quality is the degree to which a set of inherent characteristics of the data fulfills requirements for the data.

Describing data quality in terms of characteristics inherent to data means that we subscribe to a multidimensional conceptualization of data quality.3 Briefly, these inherent characteristics, also called dimensions of data quality, include concepts such as accuracy, relevance, accessibility, contemporaneity, timeliness, and completeness. The initial work establishing the multidimensional conceptualization of data quality identified over 200 dimensions in use across surveyed organizations from different industries.4 For most data uses, only a handful of dimensions are deemed important enough to formally measure and assess. The dimensions measured in data quality assessment should be those necessary to indicate fitness of the data for a particular use. In summary, data quality is assessed by identifying important dimensions and measuring them.
Defining the quality of research data

The Collaboratory has embraced the definition of quality data from the 1999 Institute of Medicine Workshop Report titled Assuring Data Quality and Validity in Clinical Trials for Regulatory Decision Making,5 in which fitness for use (i.e., quality data) in clinical research is defined as "data that sufficiently support conclusions and interpretations equivalent to those derived from error-free data."5 The job, then, of assessing the quality of research data begins with identifying those aspects of data that bear most heavily on the capacity of the data to support conclusions drawn from the research.

Immediately prior to the April 2013 Collaboratory Steering Committee meeting, the program office released the review criteria for Demonstration Projects applying for funds for trial conduct:
• Criterion 1: "Are data collection methods adequately validated?"
• Criterion 2: "Validated methods for the electronic health record information?"
• Criterion 3: "Demonstrated quality assurance and harmonization of data elements across healthcare systems/sites?"
• Criterion 4: "Plans adequate for data quality control during the UH3 (trial conduct) phase?"

In keeping with the Institute of Medicine definition of quality data, the goal of these requirements is to provide reasonable assurance that data used for Collaboratory Demonstration Projects are capable of supporting the research conclusions.
The requirements were not further defined at the time of release. To aid in operationalizing data quality assessment, the PDSDQ Core drafted definitions for each criterion. These draft definitions (provided below) reflect the consensus of the Core and do not necessarily represent the opinions or official positions of the NIH.

Briefly, Criterion 1 pertains to data prospectively collected for research only (i.e., in addition to data generated in routine care). Criterion 2 applies to data generated in routine care. Criterion 3 pertains to a priori activities to assure consistency in data collection and clinical documentation across clinical sites. Criterion 4 requires plans to assess and control data quality throughout trial conduct. The criteria can be decomposed into data quality activities and the data sources to which they apply (Figure A1). The third axis of consideration is the data quality dimensions important for a given study.
Data quality activity | Routine care | Data collected solely for the research
Validation of data collection methods | | ✔
Data quality assurance | ✔ |
Harmonization of data elements | ✔ | ✔
Data quality control | ✔ | ✔

Figure A1. Graphic Representation of Review Criteria
Historically, in clinical trials conducted for regulatory review for marketing authorization, identification of data discrepancies was followed by a communication back to the source of the data in an attempt to ascertain the correct value.6 This process of identifying and resolving data discrepancies is a type of data cleaning. Correction of data discrepancies is best applied to prospective trials with prospectively collected data. As described above, some trials conducted in healthcare settings will collect "add-on data" (i.e., data necessary for the research that are not captured in routine care).

Our initial Demonstration Project data quality assessment inventory (data available upon request) confirmed that multiple Demonstration Projects were collecting prospective data. Five Demonstration Projects planned to collect PROs, and one added screens in the local EHR to capture study-specific data. All projects also used routine care data and administrative data. Details of the Demonstration Project data quality assessment inventory are provided in Appendix II.
Data quality–related review criteria

The following four UH3 review criteria (February 12, 2013 UH3 Transition Criteria Draft) were provided by the National Center for Complementary and Alternative Medicine. The PDSDQ Core has defined the criteria as outlined below.
Criterion 1: Are data collection methods adequately validated?

Scope: This criterion applies to data collected prospectively for the project (i.e., collected outside of routine clinical documentation).

Purpose: The purpose of this criterion is to provide assurance that project-specific data collection tools, systems, and processes produce data that can support the intended analysis and ultimately the research conclusions.

Data collection methods: The processes used to measure, observe, or otherwise obtain and document study assessments.

Adequate: Evidence that the error rate has been characterized and will not likely impact the intended analysis and ultimately the conclusions.

Validated: Shown to consistently represent and record the intended concept. For questionnaires and rating scales, this refers to evidence that the tool measures the intended concept in the intended population. With respect to measurement of physical quantities or observation of phenomena, this refers to the ability of the measurement or observation to consistently and accurately capture the actual state of the patient. With respect to data processing, this refers to evidence of fidelity in operations performed on the data.
Criterion 2: Validated methods for the electronic health record information?

Scope: This criterion applies to data collected during routine care (i.e., during or associated with a clinical encounter or assessment). It applies to patient-reported data collected in conjunction with routine care (e.g., intake forms, questionnaires, or rating scales used in routine care and collected through healthcare information systems such as patient portals or EHRs). NOTE: Questionnaires administered through stand-alone systems created for a research study are not included in this criterion.

Purpose: The purpose of this criterion is to provide assurance that health system data used for the project can support the intended analysis and ultimately the research conclusions.

See the definition of validated above.

EHR information: For our purposes, this definition encompasses data from information systems used in patient care and self-monitoring; this includes such data obtained through organizational data warehouses.

Criterion 3: Demonstrated quality assurance and harmonization of data elements across healthcare systems/sites?

Scope: This criterion applies to data elements collected for the project, including both those collected through healthcare systems and those collected through add-on systems for the study.
Purpose: The purpose of this criterion is to provide assurance that the meaning and format of data are consistent across facilities and that the methods of measurement, observation, and collection uphold the intended consistency.

Quality assurance (within this criterion): All the planned and systematic activities implemented within the quality system that can be demonstrated to provide confidence that a product or service will fulfill requirements for quality. Here, quality assurance pertains to activities undertaken to 1) assess the existence of and potential for inconsistent data across participating facilities and 2) establish technical, managerial, or procedural controls to maintain consistency throughout the UH3 phase. NOTE: The U.S. Food and Drug Administration has defined quality assurance as independent.
Harmonization of data elements across health systems/sites: Use of, or mapping of organizational data to, common data elements.

Common data elements: Data elements with the same semantics and representation as defined by the ISO 11179 standard.7

Data element: As defined by the ISO 11179 standard, a data element is a pairing of a concept and a set of valid values.7
Criterion 4: Are plans adequate for data quality control during the UH3 phase?

Scope: This criterion applies to data collected for the project, including both those collected through healthcare systems and those collected through add-on systems for the study.

Purpose: The purpose of this criterion is to provide assurance that data quality monitoring and control processes are in place to maintain the desired quality levels and consistency between data collection facilities/sites.

Quality control: The operational techniques and activities used to fulfill requirements for quality. Quality control activities are usually thought of as those activities performed as part of routine operations to measure, monitor, and take corrective action as necessary to maintain the desired quality levels within acceptable variance (e.g., re-abstracting a sample of charts on a quarterly basis to measure inter-rater reliability and provide feedback to abstractors).
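The re-abstraction example in this definition can be quantified with simple agreement statistics. The sketch below computes percent agreement and Cohen's kappa between an original and a re-abstracted set of chart values; kappa is one common inter-rater reliability statistic, used here as an illustrative choice rather than one mandated by the criterion, and the sample values are hypothetical.

```python
from collections import Counter

def percent_agreement(a, b):
    """Fraction of paired abstractions that match exactly."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: chance-corrected agreement for two raters."""
    n = len(a)
    po = percent_agreement(a, b)                                 # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)   # chance agreement
    return (po - pe) / (1 - pe)

# Hypothetical quarterly sample: original vs. re-abstracted chart values.
original   = ["yes", "yes", "no", "no", "yes", "no"]
reabstract = ["yes", "no",  "no", "no", "yes", "no"]
```

Feeding each quarter's sample through such a calculation yields the trackable quality control measure the criterion asks for, with feedback to abstractors when agreement falls below the project's threshold.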
References

1. International Organization for Standardization. ISO 8000-2:2012(E) Data Quality – Part 2: Vocabulary. 1st ed. June 15, 2012.

2. International Organization for Standardization. ISO 9000:2005(E) Quality management systems – Fundamentals and vocabulary, definition 3.1.1. 3rd ed. September 15, 2005.

3. Lee YW, Pipino LL, Funk JD, Wang RY. Journey to Data Quality. Cambridge, MA: MIT Press; 2006.
4. Wang R, Strong D. Beyond accuracy: what data quality means to data consumers. Journal of Management Information Systems 1996;12:4–34. Available at: http://www.jstor.org/stable/40398176. Accessed July 2, 2014.

5. Institute of Medicine Roundtable on Research and Development of Drugs, Biologics, and Medical Devices. Davis JR, Nolan VP, Woodcock J, Estabrook RW, eds. Assuring Data Quality and Validity in Clinical Trials for Regulatory Decision Making. Workshop Report. Washington, DC: National Academies Press; 1999. Available at: http://books.nap.edu/openbook.php?record_id=9623. Accessed July 2, 2014.

6. Society for Clinical Data Management. Good Clinical Data Management Practices (GCDMP). Available at: http://www.scdm.org/sitecore/content/be-bruga/scdm/Publications/gcdmp.aspx. Accessed July 2, 2014.

7. International Organization for Standardization. ISO 11179. Information Technology – Metadata Registries (MDR).
Appendix II: Data Quality Assessment Plan Inventory

The initial plan of the PDSDQ Core was to inventory planned data quality assessment practice from the UH2 applications, review the statements of data quality assessment plans with the Demonstration Projects, summarize the plans in the context of the existing literature, and support the projects as needed in following the existing plans or in formulating and undertaking new plans if desired.

The PDSDQ Core conducted a data quality assessment inventory to characterize data quality assessment plans across the initial seven UH2-funded Demonstration Projects. The inventory was conducted in March 2013 and reported at the April 29–30, 2013 Collaboratory Steering Committee meeting (data available upon request).

Given the Collaboratory's focus on PCTs in its Demonstration Projects, we expected variability in the data sources used as well as in the extent to which any project relied upon any one data source. To characterize their use, data sources commonly used by the Demonstration Projects were classified into five categories (Table A1):
1. External PRO: PRO or other questionnaire data collected outside of an EHR, such as those using a separate personal health record system, interviews, or paper questionnaires.
2. PRO in EHR: PRO or other questionnaire data collected using an EHR system.
3. Research-specific EHR screens: Data collection fields, modules, or screens rendered for users as if they were a part of the EHR system.
4. Clinical data from an institutional data warehouse: Medications, reports from laboratory and diagnostic tests, clinical notes, and structured clinical data such as vital signs originating from a patient care–facing system that are accessed through an institutional clinical data warehouse rather than directly from the transactional system.
5. Administrative data from an institutional data warehouse: Coded diagnoses and procedures used for reimbursement.
There was also variability in the extent to which Demonstration Projects relied on existing versus prospectively collected data. Five of seven projects were identified as collecting research data in addition to routine care data; four projects used PRO data (one of which included patient and staff interviews), and one project added data collection screens to the EHR.
Table A1. Data Source Summary

Project | External PRO | PRO in EHR | Research-specific EHR screens | Data warehouse/EHR clinical data | Data warehouse admin. data
Collaborative care for chronic pain in primary care* | Paper, interview | X | | X | X
Nighttime dosing of anti-hypertensive medications: a pragmatic clinical trial | Personal health record or interview | | | X | X
Decreasing bioburden to reduce healthcare-associated infections and readmissions*,† | X | | | X | X
Strategies and opportunities to stop colon cancer in priority populations* | Patient interviews | | | X | X
A pragmatic trial of lumbar image reporting with epidemiology (LIRE)‡ | | | | X | X
Pragmatic trial of population-based programs to prevent suicide attempt | | X | | X | X
Pragmatic trials in maintenance hemodialysis | | | X | X | X

*Includes staff interview data. †Includes data from external laboratory. ‡Includes externally enhanced data.
Variation in data quality assessment methods corresponds with variation in data sources. To characterize data quality assessment practices across Demonstration Projects, initial applications were reviewed, and each project provided updates describing their planned data quality assessment practices (Table A2).

Due to the dependence of data quality assessment on the type of data and the available sources for comparison, opportunistic data quality assessments that leverage available sources for comparison should be expected, rather than uniformity with respect to comparisons.
Table A2. Data Quality Assessment Activities

Project | Collection control | Completeness (ascertainment; % column complete) | Accuracy (individual; aggregate)
Collaborative care for chronic pain in primary care | Procedural; technical | Part of ETL into warehouse | Part of ETL into warehouse
Nighttime dosing of anti-hypertensive medications: a pragmatic clinical trial | Procedural (abstraction forms) | 100-case chart review; AC PPV/NPV ≥ 90%; 1. Yes on n = 1000 cases, AC < 5% per DE; 2. Site-to-site variability, completeness | 1. Comparison to patient self-report; 2. Comparison to NDI; 3. IRR abstraction threshold; 4. Endpoint review; 5. Out-of-range values
Decreasing bioburden to reduce healthcare-associated infections and readmissions | Procedural; technical | Yes (monthly monitoring) | Health system validated
Strategies and opportunities to stop colon cancer in priority populations | | Independent data; % chart review | Call audit; % chart review
A pragmatic trial of lumbar image reporting with epidemiology (LIRE) | | Yes | Site-to-site variability
Pragmatic trial of population-based programs to prevent suicide attempt | | | % chart review
Pragmatic trials in maintenance hemodialysis | Procedural | Yes | Valid values

AC, ascertainment completeness; DE, data error; ETL, extract-transform-load; IRR, interrater reliability; NPV, negative predictive value; PPV, positive predictive value.
In most cases, data quality assurance and control activities for data collected de novo were not described in detail.

This inventory was reported to the Collaboratory in the context of the existing literature.
Appendix III: Initial Data Quality Assessment Recommendations for Collaboratory Projects

At the April 2013 Collaboratory Steering Committee meeting, an approach for addressing the data validation requirements was presented (data available upon request). This approach included:

1. Completeness assessment: A four-dimensional completeness assessment that could be conducted by all Demonstration Projects.
2. Accuracy assessment: Identification and conduct of project-specific data quality (accuracy) assessments.
3. Impact assessment: Use of the completeness and accuracy assessment results by the project statistician to test sensitivity of the analyses to anticipated data quality problems.
A Total Data Quality Management approach1,2 was applied to identify and prioritize project-specific data quality needs (step 2 above). The following project information was reviewed:

1. Data elements collected and used for the project's statistical analysis
2. Workflow diagrams for clinic processes that generate data used in the study
3. Dataflow diagrams for data elements used in the study
Higher priority was to be given to cohort identification and outcome data elements. The initial data element list from each project application was reviewed and updated where needed; specification of the source system for each data element was added. The workflow and dataflow diagrams concentrated on processes used to generate data utilized in the study, without regard to whether these processes were part of routine clinic practice or specific to the study. The development and discussion of the diagrams were used to surface potential sources of inconsistency or data error.

The Core proposed one-on-one work with, or individual work by, each Demonstration Project team to determine the type of accuracy assessment attainable and the targeted data validation assessments valuable for each project.
Testing the recommendations with the STOP CRC project

At the Steering Committee meeting, one project, Strategies and Opportunities to Stop Colon Cancer in Priority Populations (STOP CRC), came forward to work through the proposed approach with the Core. A series of calls was conducted over 2 months as the project and the Core worked through the approach described above. The calls were attended by a co–principal investigator of the STOP CRC trial, the project informatician overseeing the study's multifacility EHR implementation, the first author of this report, and two informaticians from the Coordinating Center. As planned, the development and discussion of the diagrams were used to surface potential sources of inconsistency or data error.
The data quality assessment work was reported on monthly calls and was summarized both in writing and by a template for reporting a data quality assessment.

Summary of findings from testing with the STOP CRC project

A workflow diagram existed and was contributed by the STOP CRC project team, as was an updated data dictionary. A Coordinating Center informatician conducted interviews with the STOP CRC project co–principal investigator and informatician to understand the workflow and complete the dataflow diagram. The majority of the time on the calls was spent 1) educating the Collaboratory Coordinating Center informaticians about the study and the local data policies, procedures, and systems; 2) discussing data sources and reviewing the workflow and dataflow diagrams; 3) discussing possible data quality problems, based on the co–principal investigator's experience with a similar project, as well as potential solutions; and 4) creating a plan for initial and ongoing assessments of data quality and completeness. The STOP CRC statistician attended the final call to discuss the data validation plan results, impact assessments, and plans for ongoing data quality assurance during the multisite trial.
Because data quality assessment plans existed for each project and were deemed acceptable through the grant review, the systematic approach to designing a data quality assessment plan was offered on a voluntary basis. Given the inclusion of the data validation criteria in the review criteria for the trial-conduct funding decision, we anticipated that most if not all Demonstration Projects would show interest in the offered approach and support. However, only one project engaged the Core for a systematic assessment. No other projects reported making use of the draft review criteria definitions or the data quality assessment information, plans, or templates produced.
References

1. Davidson B, Lee WY, Wang R. Developing data production maps: meeting patient discharge data submission requirements. International Journal of Healthcare Technology and Management 2004;6:223–240.

2. Lee YW, Pipino LL, Funk JD, Wang RY. Journey to Data Quality. Cambridge, MA: MIT Press; 2006.