The challenges of codes in real-world data

PAGE 64 IMS HEALTH REAL-WORLD EVIDENCE SOLUTIONS
INSIGHTS RWE PLATFORM DEVELOPERS

The author
Christian Reich, MD, PhD
Vice President, RWE Solutions, IMS Health
[email protected]

Real-world data is the backbone of evidence generation, but as a resource created for very different purposes, its confounding characteristics can be a challenge for unfamiliar users. Here we consider the particular complexities of coding – a critical prerequisite for data aggregation but one that demands quite specific solutions to tap into and realize the value of the underlying content.

Upload: imshealthrwes

Post on 07-Jan-2017




ACCESSPOINT • VOLUME 5 • ISSUE 10 PAGE 65

Addressing the ‘curse’ of RWE

Real-world data (RWD) – most of it – is coded. Not much is represented in textual form. Things that happen to us as patients – diagnoses, complaints and symptoms, drug treatments, lab tests, diagnostic and therapeutic procedures and applications of medical devices – are represented by codes from standardized coding schemes.

This is an amazing fact given that we are talking about healthcare – an industry that has been lagging in computerization and the introduction of industrialized processes by decades. In fact, these coding schemes are not only pervasive but also designed to be comprehensive, covering every relevant situation in the typical healthcare settings. The reason lies in the primary purpose of collecting the data when it first became digital: mortality and morbidity reporting and reimbursement claims processing. Electronic medical records (EMR) were very rare at the time and even today still have some way to go to create a complete representation of the facts in structured form: discharge summaries and pathology reports are often still just plain text.

The first wave of healthcare digitization came in the 1980s with systems that helped to process claims. In order to make that a repeatable and reliable process, all the various services were standardized and assigned a code. Since the services required justification of why they were rendered, the justifications themselves were coded as well. This brought us coding conventions for procedures and diagnoses. Next came drugs, this time for pharmacy reimbursement by payers for filling prescription medicines and for the FDA to know what products were on the market (resulting in the introduction of the National Drug Code (NDC)). At least, some would claim, with money at stake the quality of this data should be more reliable.

All this would be just an interesting fact were it not that the entire RWE industry is based on it. Without the codes it would be impossible to aggregate data at the necessary scale for it to become the foundation of evidence generation. Neither would it be possible to put together and interrogate databases of more than 80 million patients and explore the natural history of a condition, its treatments and their effectiveness, and compare them with other treatments or with no treatment at all. While the number is impressive, the downside is that this data is ‘shallow’ with a lot of the detail missing and with a half-life of six months per patient. Nevertheless, compared to the epidemiological research of old – some poor analyst in the basement of a hospital sifting through dusty patient records and counting facts using a clipboard – this is an amazing leap forward and opens monumental opportunities in understanding disease and improving healthcare. That’s the good news.

The bad news is that everything is coded; the codes are all we have and we must live with them. But that isn’t always easy.

Curses of coding

The problem is that codes are made for a purpose – the management of healthcare processes (claim reimbursement or medical transactions). This means they represent facts that are relevant for those particular transactions but not necessarily for understanding a patient’s underlying etiological and pathogenetic processes or their treatments. Specifically, they make life difficult in four key ways:

1. Overabundance of coding schemes. There is a myriad of competing coding schemes representing more or less the same domains. For example, there are ICD9, ICD10 (with national versions), Read and SNOMED for diagnoses. There are more than half a dozen coding schemes for drugs: NDC, GPI, FDB, Multum, Multilex, DM+D, Gemscript as well as Read and SNOMED. The situation for lab tests and procedures is similar. Unless crosswalks or mappings are provided, it is down to the analyst to navigate this Babylonian language jumble. However, it takes a long time to become truly ‘fluent’ in these terminologies, making such analysts a rare breed, which is a big problem for the customers.

2. Ambiguity in precision. For some common conditions there are multiple codes representing various details of the disease. In the case of diabetes mellitus, for example, there are 95 ICD9 codes. This level of detail is due to diabetes being a very prevalent disease with many different complications. In the case of HIV, another example of a frequent disease, there are only four codes in ICD9.

Code systems are designed to be comprehensive. However, in some cases the enumeration of all possible variants of a disease would result in a very large number of codes and the medical community might not agree on the exact composition of such an enumeration. For those cases, the concept ‘not otherwise specified’ or NOS was created. For example, ICD9CM 362.14 stands for “Retinal microaneurysms NOS”, meaning any microaneurysm that does not have a cause coded in another retinopathy, such as hypertensive or diabetic retinopathy.

However, to understand exactly which conditions are summed up in this NOS, the analyst will need to know all the codes where the microaneurysms are ‘specified’. Hence, the meaning of this code depends on the meaning of an unknown number of other codes. This may be an acceptable solution for billing but not for dissecting precise medical conditions.

Then there are codes which are just general catch-all concepts. One particularly interesting one is 729.99 ‘Other disorders of soft tissue’, which is completely useless for observational research or RWE generation since half of all diseases could be construed as a disorder of a soft tissue.

3. Inconsistent hierarchical structure. When a physician diagnoses a disease, the result will reflect the level of work-up. For example, a patient with a dilated cardiomyopathy due to taurine deficiency will present first as a cardiomyopathy or disease of the myocardium leading to a dilation of the heart. After excluding primary causes, such as ischemic or infectious cardiomyopathies, the search will go into causes of the disease that are the result of another illness, such as a metabolic disorder or nutrient deficiency. Only at the end will the cause be determined as lack of taurine, a major constituent of bile. However, each of these is a legitimate diagnosis, and they are nested into each other: a secondary cardiomyopathy is a cardiomyopathy but not necessarily the other way around.

All of this matters because the coding schemes make it look as though there is a linear list of all possible impairments and that one has nothing to do with the other. Statistical analyses do the same thing, using a code as a single covariate to calculate risk or probability. In other words, they treat codes as in a one-man, one-vote system. In reality, these conditions are all heavily interdependent; our ability to generate precise evidence depends on the ability to understand these relationships.

Some code systems have hierarchical relationships built in. SNOMED-CT, for example, has a fully developed hierarchy of diseases and other domains. Other coding systems are less robust. ICD9, for example, features a simple three-layer hierarchy micro-coded in the codes. However, this hierarchy is very primitive, only allowing up to one parent for each code; some of the hierarchical relationships are daring at best. For example, ICD9 785 ‘Symptoms involving cardiovascular system’ has as descendants such completely unrelated conditions as arrhythmias, abnormal heart sounds, gangrene, enlargement of lymph nodes (which are not part of the cardiovascular system) and shock (Figure 1).

4. Mixing of domains. Coding schemes are controlled vocabularies for a certain area or domain of medicine – diagnoses and conditions, drugs, devices, procedures, tests, etc. That is how they start. Then, due to their role in organizing the healthcare processes, those strict limitations are broken down by a growing number of exceptions. For example, the coding scheme CPT4 stands for ‘Current Procedural Terminology, 4th Edition’. The assumption is that the codes contain procedures. Indeed, CPT4 has almost 12,000 codes for procedures that can be administered by the provider. However, it also contains over 600 quality survey codes, such as 0583F ‘Transfer of care checklist used (Peri2)’, and about 100 drugs, mostly vaccines. HCPCS, another coding system commonly assumed to represent procedures in the USA, has only a minority of about 1,000 procedures but 3,500 medical devices, such as L7007 ‘Electric hand, switch or myoelectric controlled, adult’ or a simple thing such as L7360 ‘Six volt battery’.

[Figure 1: Relationships of medical entities. The diagram shows concepts around ‘Pulmonary tuberculosis’ drawn from four domains (Anatomical Site, Microorganism, Clinical Finding (Disease) and Pathology), connected by ‘Finding site’, ‘Causative agent’ and ‘Associated morphology’ relationships. Source: IMS Health]


HCPCS even contains diagnoses, such as G8848 ‘Mild obstructive sleep apnea’ (Figure 2).
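The ICD9 hierarchy described under point 3 is encoded in the code string itself, so it can be recovered mechanically. The following Python sketch is illustrative only: the function names are invented here, it covers just the category and decimal levels (not the chapter ranges above them), and it assumes codes are written with their decimal point.

```python
def icd9_ancestors(code: str) -> list[str]:
    """Return the parents implied by the code string, nearest parent first."""
    code = code.strip()
    if "." not in code:
        # A bare category such as "785" has no parent encoded in the string.
        return []
    category, decimals = code.split(".", 1)
    ancestors = []
    # Drop one decimal digit at a time, e.g. 785.51 -> 785.5
    for length in range(len(decimals) - 1, 0, -1):
        ancestors.append(f"{category}.{decimals[:length]}")
    # The three-character category is always the last step, e.g. 785
    ancestors.append(category)
    return ancestors


def is_descendant(code: str, ancestor: str) -> bool:
    """True if `ancestor` lies on the single parent chain of `code`."""
    return ancestor in icd9_ancestors(code)


if __name__ == "__main__":
    # Codes under category 785, used purely as illustrative inputs.
    for code in ["785.4", "785.51", "785.6"]:
        print(code, "->", icd9_ancestors(code))
```

Because each string has exactly one chain of prefixes, a code can never sit under two parents. That is the limitation noted above: a code such as 785.6 rolls up to the cardiovascular symptoms category whether or not that grouping makes clinical sense.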

None of these issues is insurmountable, provided the analyst knows, for example, to find sleep apnea patients in a procedure coding system and is familiar with all the other coding idiosyncrasies. However, while this is already a problem for an integrator who is intimately familiar with the data, it is a bigger one for the customer. Fortunately, solutions do exist.

Cracking the code

Coding schemes are perceived to make it hard to generate reliable evidence, particularly when it comes to ensuring that a code list correctly represents a certain patient population and that nothing has been left out. They also call for in-depth knowledge of the underlying healthcare system in which the codes are used, creating an additional burden when generating evidence across different countries.

The solutions that can systematically address the issues presented by the coding schemes and enable reliable evidence generation need to include two aspects: a comprehensive map of the entire semantic space of medical entities, and a tool to navigate it:

• Master Catalog, containing the universe of coding schemes and their codes, including lifecycle information such as deprecation and succession to keep them fresh for actual use in RWD. Currently, users trying to access that information have to select from public websites of unclear quality, one for each coding scheme.

• Mapping between equivalent codes. Equivalence is defined here as supporting the purpose of a certain evidence generation rather than any type of semantic equivalence. This will allow cross-walking between coding systems.

• Hierarchical grouping of codes. Such hierarchies need to be pre-populated for typical drug classes and disease hierarchies, etc., but should allow user-defined groupings as well.

• Lateral or semantic relationships between codes. These represent medical facts, such as indications of drugs, complications of procedures, etc.
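Three of the components listed above (the catalog with lifecycle information, mapping, and lateral relationships) can be pictured as one small data structure. The Python sketch below is a toy illustration, not a description of any IMS Health product: the class and method names, the vocabularies VOCAB_A, VOCAB_B and DRUG_VOCAB, and every code in it are hypothetical placeholders. Hierarchical grouping is left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import Optional

Key = tuple[str, str]  # (vocabulary, code)


@dataclass
class Entry:
    name: str
    deprecated: bool = False
    successor: Optional[Key] = None  # code that replaces a deprecated one


@dataclass
class Catalog:
    entries: dict[Key, Entry] = field(default_factory=dict)
    mappings: dict[Key, set[Key]] = field(default_factory=dict)
    relations: list[tuple[Key, str, Key]] = field(default_factory=list)

    def add(self, key: Key, name: str, deprecated: bool = False,
            successor: Optional[Key] = None) -> None:
        self.entries[key] = Entry(name, deprecated, successor)

    def current(self, key: Key) -> Key:
        """Follow succession links until a non-deprecated code is reached."""
        seen: set[Key] = set()
        while self.entries[key].deprecated and self.entries[key].successor:
            if key in seen:
                raise ValueError(f"succession cycle at {key}")
            seen.add(key)
            key = self.entries[key].successor
        return key

    def map_equivalent(self, a: Key, b: Key) -> None:
        """Record that two codes are equivalent for the purpose at hand."""
        self.mappings.setdefault(a, set()).add(b)
        self.mappings.setdefault(b, set()).add(a)

    def crosswalk(self, key: Key, target_vocabulary: str) -> set[Key]:
        """Equivalent codes in another vocabulary, after resolving lifecycle."""
        key = self.current(key)
        return {k for k in self.mappings.get(key, set())
                if k[0] == target_vocabulary}

    def relate(self, source: Key, relation: str, target: Key) -> None:
        self.relations.append((source, relation, target))

    def related(self, source: Key, relation: str) -> list[Key]:
        return [t for (s, r, t) in self.relations
                if s == source and r == relation]


if __name__ == "__main__":
    catalog = Catalog()
    # All vocabularies and codes below are made-up placeholders.
    catalog.add(("VOCAB_A", "A1"), "Condition (retired code)",
                deprecated=True, successor=("VOCAB_A", "A2"))
    catalog.add(("VOCAB_A", "A2"), "Condition (current code)")
    catalog.add(("VOCAB_B", "B9"), "Same condition in another scheme")
    catalog.add(("DRUG_VOCAB", "D1"), "Some drug")
    catalog.map_equivalent(("VOCAB_A", "A2"), ("VOCAB_B", "B9"))
    catalog.relate(("DRUG_VOCAB", "D1"), "has indication", ("VOCAB_A", "A2"))
    print(catalog.crosswalk(("VOCAB_A", "A1"), "VOCAB_B"))
    print(catalog.related(("DRUG_VOCAB", "D1"), "has indication"))
```

Resolving the lifecycle before cross-walking means a retired code found in older data still lands on the same equivalent as its successor, which is one way to keep a code list valid across data vintages.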

The problems have been recognized. For example, the OMOP Common Data Model, which is geared towards evidence generation from observational data, includes Standardized Vocabularies with mapping, relationships and hierarchical classes. There are also commercial providers of medical terminologies, such as Health Language, Intelligent Medical Objects or Apelon. However, none of these solutions really solves the problem for the researchers: allowing the generation of evidence on the basis of a robust understanding of the semantic space of the involved medical concepts or entities.
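How mapping and hierarchy combine to build a code list can be sketched with two tables modeled on the OMOP Standardized Vocabularies. In the Python example below, the table and column names (concept_relationship, concept_ancestor, the 'Maps to' relationship) follow the OMOP vocabulary tables as we understand them, reduced to the columns needed here. All concept IDs are invented, the helper source_codes_for is a hypothetical name, and the sketch assumes that every concept is listed as its own ancestor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept_relationship (
    concept_id_1    INTEGER,   -- source code (any vocabulary)
    concept_id_2    INTEGER,   -- standard concept it maps to
    relationship_id TEXT
);
CREATE TABLE concept_ancestor (
    ancestor_concept_id   INTEGER,
    descendant_concept_id INTEGER
);
""")

# Invented IDs: 9xxx are source codes, 1xxx are standard concepts.
conn.executemany(
    "INSERT INTO concept_relationship VALUES (?, ?, 'Maps to')",
    [(9001, 1001), (9002, 1002), (9003, 1003), (9004, 1004)],
)
# 1003 has two parents (1002 and 1004), which a prefix-based
# hierarchy such as ICD9's cannot express.
conn.executemany(
    "INSERT INTO concept_ancestor VALUES (?, ?)",
    [(1001, 1001), (1001, 1002), (1001, 1003),
     (1002, 1002), (1002, 1003),
     (1003, 1003),
     (1004, 1004), (1004, 1003)],
)


def source_codes_for(connection, ancestor_id):
    """All source codes mapping to a concept or to any of its descendants."""
    rows = connection.execute(
        """
        SELECT DISTINCT cr.concept_id_1
        FROM concept_ancestor AS ca
        JOIN concept_relationship AS cr
          ON cr.concept_id_2 = ca.descendant_concept_id
        WHERE ca.ancestor_concept_id = ?
          AND cr.relationship_id = 'Maps to'
        ORDER BY cr.concept_id_1
        """,
        (ancestor_id,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(source_codes_for(conn, 1001))
    print(source_codes_for(conn, 1004))
```

Starting from a high-level concept and expanding downwards is what guards against the worry raised earlier that something has been left out of a code list: the analyst names the clinical idea once and lets the hierarchy and mappings enumerate the codes.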

Future solutions of the IMS Health application and technology platform Evidence 360™ will incorporate this functionality, providing the user with self-service tools to navigate through the maze of domains, schemas and codes and allow generation of the same type of evidence reliably across all IMS Health data assets, regardless of the healthcare system of origin or type of data capture. Existing open-source or public solutions such as the UMLS or the OMOP Standardized Vocabularies can serve as a starting point for a comprehensive and industrialized solution to the problem.

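[Figure 2: Composition of select vocabularies. The chart breaks down each vocabulary (NDC, GPI, SNOMED, ICD9CM, ICD10, HCPCS, CPT4, MedDRA, Read) by domain: Observations, Drugs, Procedures, Devices and Conditions. Counts shown on the chart: NDC 624,965; SNOMED 395,822; MedDRA 96,494; Read 98,021. Source: IMS Health]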
