an introduction to analysing a snomed ct coded dataset
Post on 04-Jun-2022
An introduction to analysing a SNOMED CT coded dataset using a FHIR terminology server
Matt Cordell, Terminology Specialist
A quick introduction to SNOMED CT, FHIR & Ontoserver
SNOMED CT
• Much larger than most other code systems traditionally used in healthcare (ICD, ICPC etc.)
• Primary purpose is recording clinical notes with the specificity required by clinicians, and interoperability.
  – Structure* supports secondary uses (analytics).
• Codes have no intrinsic meaning; they are simply identifiers, e.g. 278285008|Left hemiplegia| and 278284007|Right hemiplegia|.
• Concepts in the terminology are associated by a range of relationships, forming an ontology.
• Expression Constraint Language (ECL) – language that supports sophisticated queries against the terminology.
FHIR
• Latest Interoperability standard from HL7, supporting modern RESTful practices. (ValueSets)
Ontoserver
• Provides FHIR based access to terminology, including ECL support
• Made available for use throughout Australia via the National Clinical Terminology Service (NCTS)
ECL in 90 seconds
<396234004|Infective arthritis|
All subtypes of Infective arthritis
<64572001|Disease|:116676008|Associated morphology|=23583003|Inflammation|
All Diseases associated with inflammation
<928000|Musculoskeletal disorder|:246075003|Causative agent|=<<49872002|Virus|
Musculoskeletal disorders with some viral involvement
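An ECL expression reaches a FHIR terminology server as part of an implicit ValueSet URL. The following is a minimal sketch of that step using only the standard library. The helper name `ecl_to_expand_url` is hypothetical; the endpoint and URL prefix are the ones used in the deck's own code further on.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint and implicit ValueSet prefix as used in the deck's own example.
ENDPOINT = "https://ontoserver.csiro.au/stu3-latest"
SCT_ECL_PREFIX = "http://snomed.info/sct?fhir_vs=ecl/"


def ecl_to_expand_url(ecl: str) -> str:
    """Build a ValueSet/$expand request URL for an ECL expression (hypothetical helper)."""
    # urlencode percent-encodes characters such as '<', '|', ':' and spaces.
    query = urlencode({"url": SCT_ECL_PREFIX + ecl})
    return f"{ENDPOINT}/ValueSet/$expand?{query}"


ecl = "<64572001|Disease|:116676008|Associated morphology|=23583003|Inflammation|"
request_url = ecl_to_expand_url(ecl)

# Decoding the query string recovers the original expression unchanged.
decoded = parse_qs(urlsplit(request_url).query)["url"][0]
print(decoded == SCT_ECL_PREFIX + ecl)
```

Encoding matters because ECL uses characters (`<`, `|`, `:`, `=`) that are not safe in a raw query string.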
What might a SNOMED CT dataset look like?
Unique Conditions : 24647
Unique Medications: 10128
Rows : 500,000
* Randomly generated synthetic dataset
Index Sex DoB PostCode Condition Medication
0 F 26/04/1998 B03 102930000 7086011000036102
1 F 24/01/1953 E00 49512000 1112071000168105
2 M 7/09/1943 E00 277627005 5604011000036100
3 M 1/01/1966 E00 3109008 3231000036108
4 F 14/02/1957 E00 723409007 6286011000036105
5 M 14/08/1961 E00 3272007 761951000168100
6 F 28/01/1986 C04 86225009 921045011000036104
7 F 15/06/1983 C04 163577001 NaN
8 F 23/05/1967 C04 191737008 927853011000036101
… … … … … …
499998 M 16/01/1984 B09 443919007 36227011000036103
499999 M 28/03/1995 B09 723913009 5081011000036108
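SNOMED CT identifiers can run to 18 digits, and a column that contains NaN is normally read by pandas as floating point, which cannot hold every 18-digit integer exactly. A minimal sketch of loading the codes as strings instead; the CSV layout is an assumption, since the slides do not say how the dataset is stored.

```python
import io
import pandas as pd

# Three rows from the table above, laid out as CSV (the file format is an assumption).
csv_text = """Index,Sex,DoB,PostCode,Condition,Medication
6,F,28/01/1986,C04,86225009,921045011000036104
7,F,15/06/1983,C04,163577001,
8,F,23/05/1967,C04,191737008,927853011000036101
"""

# Reading the code columns as str keeps every identifier exactly as written;
# the empty Medication cell still becomes NaN.
codeSet = pd.read_csv(
    io.StringIO(csv_text),
    index_col="Index",
    dtype={"Condition": str, "Medication": str},
)

print(codeSet.loc[6, "Medication"])
print(codeSet["Medication"].isna().sum())

# A round trip through float changes this identifier, which is why str is used.
print(int(float("921045011000036104")) == 921045011000036104)
```

String codes also compare directly with the codes returned in a FHIR expansion, which are strings.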
Basic outline of approach to SNOMED CT analytics
o Define aggregation categories using SNOMED CT Expression Constraint Language (ECL)
o Identify all the codes that match each category, using Ontoserver to perform ValueSet expansions.
o Store the results of each expansion in a Hash Set for fast lookup.
o Use the Sets to filter our dataset, and optionally create human readable labels.
o Use standard analytic approaches to report and visualise the data.
Populate Set with ECL
• Create a GET request with the ECL parameter
• Parse the JSON response to a FHIR Value Set
• Iterate through the Value Set and populate the Hash with just the codes.
• Return the Hash.
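import requests  # for REST calls
from fhir.resources.valueset import ValueSet

def PopulateSetWithECL(ecl):
    """Expand an ECL expression on Ontoserver and return the matching codes as a set."""
    endpoint = "https://ontoserver.csiro.au/stu3-latest"
    expandAPI = "/ValueSet/$expand"
    sctValueSetUrl = "http://snomed.info/sct?fhir_vs=ecl/"
    urlParam = {"url": sctValueSetUrl + ecl}  # requests URL-encodes the ECL
    response = requests.get(endpoint + expandAPI, params=urlParam)
    response.raise_for_status()  # fail early on an HTTP error
    j = response.json()
    # As on the slide; pydantic-based releases of fhir.resources may need ValueSet(**j) instead.
    vs = ValueSet(j)
    _set = set()
    # expansion.contains is absent when no concept matches the ECL
    for e in vs.expansion.contains or []:
        _set.add(e.code)
    return _set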
Creating Health Condition Labels
o A list of tuples, each tuple consisting of an ECL definition and label
o Iterate through this list
o Create the Hash Set based on the ECL
o Create a Boolean filter for concepts that match the Set
o Label accordingly in a new “Category” column.
healthCategories = [
    ('<<106028002', 'Musculoskeletal problems'),
    ('<<106048009', 'Respiratory problems'),
    ('<<195967001', 'Asthma'),
    ('<<363346000', 'Cancer'),
    ('<<13645005', 'COPD'),
    ('<<73211009', 'Diabetes mellitus'),
    ('<<106063007', 'Cardiovascular problems'),
    ('<<249578005', 'Kidney problems'),
    ('<<74732009', 'Mental illness'),
    ('<<40733004', 'Infectious disease'),
    ('<<414022008', 'Blood disease')]

# Assumed default: the output that follows labels unmatched rows "Other Condition".
codeSet["Category"] = "Other Condition"
for ecl, label in healthCategories:
    categorySet = PopulateSetWithECL(ecl)
    mask = codeSet["Condition"].isin(categorySet)  # Boolean filter for this category
    codeSet.loc[mask, "Category"] = label
Index   Sex  Condition  Medication         Category
0       F    102930000  7086011000036102   Other Condition
1       F    49512000   1112071000168105   Mental illness
2       M    277627005  5604011000036100   Cancer
…       …    …          …                  …
499998  M    443919007  36227011000036103  Mental illness
499999  M    723913009  5081011000036108   Mental illness
codeSet.groupby(['Category','Sex']).size()
Category Sex Count
Blood disease F 7741
M 3295
Cancer F 1909
M 3298
Cardiovascular problems F 13716
M 10481
Diabetes mellitus F 18463
M 10362
Infectious disease F 1435
M 368
Kidney problems F 531
M 356
Mental illness F 106980
M 104910
Musculoskeletal problems F 1817
M 1400
Other Condition F 107163
M 105340
Respiratory problems F 230
M 205
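The grouped counts can be easier to compare as a table with one row per category and one column per sex. A minimal sketch on a tiny frame; the column names match the deck, but the values are invented for illustration.

```python
import pandas as pd

# Tiny illustrative frame; values are invented, not taken from the dataset.
codeSet = pd.DataFrame({
    "Sex": ["F", "M", "F", "F", "M"],
    "Category": ["Cancer", "Cancer", "Asthma", "Cancer", "Asthma"],
})

counts = codeSet.groupby(["Category", "Sex"]).size()

# unstack moves the inner index level (Sex) into columns; fill_value covers empty cells.
table = counts.unstack("Sex", fill_value=0)
table["Total"] = table.sum(axis=1)
print(table)
```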
Category Overlap
Overlap managed by:
• Categories ordered by priority
  – Later categories overwrite; or
  – Only label unlabelled
• Build disjointness into ECL
<<106048009|Respiratory|
MINUS (
<<363346000|Cancer|
OR <<106028002|Musculoskeletal|
OR <<40733004|Infectious|
)
Use case dependent,
especially where double counting
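The three ways of handling overlap can be compared side by side. This is a minimal sketch with made-up single-letter codes standing in for SNOMED CT identifiers; in practice each set would come from an ECL expansion.

```python
import pandas as pd

# Hypothetical category sets; code "C" deliberately belongs to both.
categories = [
    ("Respiratory problems", {"A", "B", "C"}),
    ("Cancer", {"C", "D"}),
]
codeSet = pd.DataFrame({"Condition": ["A", "C", "D", "Z"]})

# Strategy 1: later categories overwrite earlier labels.
codeSet["Overwrite"] = "Other Condition"
for label, codes in categories:
    match = codeSet["Condition"].isin(codes)
    codeSet.loc[match, "Overwrite"] = label

# Strategy 2: only label rows that are still unlabelled, so earlier categories win.
codeSet["FirstWins"] = "Other Condition"
for label, codes in categories:
    match = codeSet["Condition"].isin(codes) & (codeSet["FirstWins"] == "Other Condition")
    codeSet.loc[match, "FirstWins"] = label

# Strategy 3: make the sets disjoint up front, the set-level analogue of ECL MINUS.
respiratory_only = categories[0][1] - categories[1][1]

print(codeSet)
print(sorted(respiratory_only))
```

Strategies 1 and 2 give each row exactly one label, so nothing is double counted. If double counting is wanted, one Boolean column per category is an alternative.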
Counting Opioids
o Again, iterate through this list as before, adding an “Opioid” label
opioids = [('<34841011000036108', 'dihydrocodeine'),
    ('<21821011000036104', 'codeine'),
    ('<21705011000036108', 'pholcodine'),
    ('<21232011000036101', 'buprenorphine'),
    ('<21357011000036109', 'methadone'),
    ('<135971000036102', 'tapentadol'),
    ('<21258011000036102', 'fentanyl'),
    ('<21259011000036105', 'oxycodone'),
    # …
    ('<21252011000036100', 'morphine'),
    ('<21486011000036105', 'tramadol'),
    ('<21901011000036101', 'dextropropoxyphene'),
    ('<34839011000036106', 'pethidine'),
    ('<1247191000168104', 'sufentanil')]
for ecl, label in opioids:
    opioidSet = PopulateSetWithECL(ecl)
    mask = codeSet["Medication"].isin(opioidSet)  # rows whose medication contains this opioid
    codeSet.loc[mask, "Opioid"] = label
Index Sex Medication Opioid
65 M 7349011000036100 oxycodone
219 M 1070441000168107 codeine
648 F 1048081000168105 buprenorphine
... ... ... ...
499738 F 34022011000036100 methadone
499802 M 785911000168101 fentanyl
499951 M 36062011000036104 dextropropoxyphene
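An alternative to one Boolean filter per ingredient is to flatten all the sets into a single code-to-label dictionary and apply it with `Series.map`. This is not the approach shown on the slide, only a sketch of a different technique; the codes below are made up.

```python
import pandas as pd

# Hypothetical expansions; real sets would be returned by PopulateSetWithECL.
expansions = {
    "codeine": {"111", "112"},
    "oxycodone": {"221"},
}

# Flatten to one dictionary: medication code -> ingredient label.
code_to_opioid = {code: label for label, codes in expansions.items() for code in codes}

codeSet = pd.DataFrame({"Medication": ["221", "999", "112", None]})
codeSet["Opioid"] = codeSet["Medication"].map(code_to_opioid)  # unmatched rows stay NaN

opioid_rows = codeSet.dropna(subset=["Opioid"])
print(opioid_rows)
print(opioid_rows["Opioid"].value_counts())
```

A product that appears in more than one set keeps only the last label written to the dictionary, the same behaviour as the overwrite loop.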
Opioids
Using AMT’s “Concrete domain” in ECL
/*High Dose, 200mg or greater*/
<30497011000036103|medicinal product|:
{
30364011000036101|has Au BoSS|=1817011000036100|aspirin|,
700000111000036105|Strength| >= #200,
177631000036102|has unit|=700000801000036102|mg/each|
},
[1..1] 700000081000036101|has intended active ingredient|=ANY
Example match: 53798011000036101|Ecotrin 650 mg enteric tablet|
/*Low Dose <200mg */
<30497011000036103|medicinal product|:
{
30364011000036101|has Au BoSS|=1817011000036100|aspirin|,
700000111000036105|Strength| < #200,
177631000036102|has unit|=700000801000036102|mg/each|
},
[1..1] 700000081000036101|has intended active ingredient|=ANY
/*Combination Aspirin Products*/
<21719011000036107| aspirin (MP)|:
[2..*] 700000081000036101|has intended active ingredient|=ANY
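The high-dose and low-dose expressions differ only in the strength comparison, so they can be generated from one template. A minimal sketch; `aspirin_strength_ecl` is a hypothetical helper, and the identifiers are copied from the expressions above.

```python
# Template copied from the high/low dose expressions; only the comparison varies.
# Doubled braces are literal braces in str.format.
ASPIRIN_STRENGTH_ECL = """<30497011000036103|medicinal product|:
{{
  30364011000036101|has Au BoSS|=1817011000036100|aspirin|,
  700000111000036105|Strength| {operator} #{mg},
  177631000036102|has unit|=700000801000036102|mg/each|
}},
[1..1] 700000081000036101|has intended active ingredient|=ANY"""

# Numeric comparison operators defined by ECL.
ALLOWED_OPERATORS = {"<", "<=", ">", ">=", "=", "!="}


def aspirin_strength_ecl(operator: str, mg: int) -> str:
    """Return the single-ingredient aspirin ECL for a strength comparison (hypothetical helper)."""
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported comparison operator: {operator!r}")
    return ASPIRIN_STRENGTH_ECL.format(operator=operator, mg=mg)


high_dose = aspirin_strength_ecl(">=", 200)
low_dose = aspirin_strength_ecl("<", 200)
print(high_dose)
```

Each generated string can be passed to the expansion function in the same way as the category definitions.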
“Concrete Domain” expansions
High Dose – 28 concepts
o Solprin 300 mg dispersible tablet
o Disprin Direct 300 mg chewable tablet
o Alka-Seltzer Lemon-Lime 324 mg effervescent tablet
Low Dose – 27 concepts
o Spren 100 mg tablet
o Cardasa 100 mg enteric tablet
o Aspirin Low Dose (Nyal) 100 mg enteric tablet
Combination Products – 54 concepts
o Clopidogrel/Aspirin 75/100 (AN) tablet
o Duoprel 75/100 tablet
o Action Cold and Flu effervescent tablet
Additional Resources
SNOMED CT ECL Specification: snomed.org/ecl
Shrimp Browser: ontoserver.csiro.au/shrimp
Agency ECL examples: github.com/AuDigitalHealth/ecl-examples
Supplementary Jupyter Notebook: bit.ly/SNOMED_HDA19
Contact us
1300 901 001
help@digitalhealth.gov.au
healthterminologies.gov.au
twitter.com/AuDigitalHealth
Help Centre
Website