Tracy Steven Brown
Extract, Transform & Load (ETL) – Database Development
Tracy’s Biomedical Informatics Services Group
The following examples highlight a small subset of Microsoft
Transact-SQL (T-SQL) queries, views, and functions developed by
Tracy Steven Brown in his Appointed Professional position within
the prestigious University of Arizona Center for Biomedical
Informatics and Biostatistics under the leadership of Don Saner,
Assistant Chief Knowledge Officer. The following examples are
based on EPIC Clarity, an extensive hospital electronic medical
record system.
Tracy’s team is directly affiliated with the Arizona Health Sciences
Center (AHSC) Biomedical Informatics Core and the Office of the
Senior Vice President of Health Sciences. Our mission is to foster
meaningful translational and applied clinical information
management research. This core service group provides a single
point of service for comprehensive biomedical informatics and
associated analytics.
Our medical research objectives are to extract specific and
targeted patient medical information from the University of
Arizona Medical Center (UA-UMC) EPIC/Clarity medical record
system in support of the research mission set by the Office of the
Senior Vice President of Health Sciences (AHSC).
Database Development & ETL Style
Most of the example code has been extracted from much
larger queries to illustrate the use of temporary tables, inner
joins to a table-valued function, or inline views (derived tables)
embedded in larger SELECT statements. I often use the
ROW_NUMBER() function to number rows within a partition of
specific columns in order to isolate duplicate records, as an
optimization instead of relying on the ubiquitous DISTINCT
qualifier; with that said, I do use DISTINCT where appropriate.
In addition to filtering duplicates, I use ROW_NUMBER() with
record partitioning to count procedures per patient and extract
the matching records; for example, three or more hospital inpatient
hemodialysis procedures over a specific time interval. This
example illustrates the use of a temporary table to store a
targeted patient list that is later used for a pharmacology
query.
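The techniques described above can be sketched as follows. This is a minimal illustration, not the original query: the table variable, the temporary table #TargetPatients, and all column names and sample rows are hypothetical and do not reflect the actual Clarity schema.

```sql
-- Hypothetical sample data; names are illustrative only.
DECLARE @Procedures TABLE (
    PatientId     INT,
    ProcedureName VARCHAR(50),
    ProcedureDate DATE
);

INSERT INTO @Procedures (PatientId, ProcedureName, ProcedureDate) VALUES
    (1, 'HEMODIALYSIS', '2015-01-05'),
    (1, 'HEMODIALYSIS', '2015-01-05'),  -- duplicate record
    (1, 'HEMODIALYSIS', '2015-01-07'),
    (1, 'HEMODIALYSIS', '2015-01-09'),
    (2, 'HEMODIALYSIS', '2015-01-06'),
    (2, 'HEMODIALYSIS', '2015-01-08');

IF OBJECT_ID('tempdb..#TargetPatients') IS NOT NULL
    DROP TABLE #TargetPatients;

WITH Deduplicated AS (
    -- Number identical rows; DuplicateRank = 1 keeps one copy of each.
    SELECT PatientId, ProcedureName, ProcedureDate,
           ROW_NUMBER() OVER (
               PARTITION BY PatientId, ProcedureName, ProcedureDate
               ORDER BY ProcedureDate) AS DuplicateRank
    FROM @Procedures
    WHERE ProcedureName = 'HEMODIALYSIS'
      AND ProcedureDate >= '2015-01-01'
      AND ProcedureDate <  '2015-02-01'
),
Numbered AS (
    -- Number each patient's procedures in date order.
    SELECT PatientId, ProcedureDate,
           ROW_NUMBER() OVER (
               PARTITION BY PatientId
               ORDER BY ProcedureDate) AS ProcedureNumber
    FROM Deduplicated
    WHERE DuplicateRank = 1
)
-- A patient has a row numbered 3 only with three or more procedures.
SELECT PatientId, ProcedureDate AS ThirdProcedureDate
INTO #TargetPatients
FROM Numbered
WHERE ProcedureNumber = 3;

-- With the sample rows above, only patient 1 qualifies.
SELECT PatientId, ThirdProcedureDate
FROM #TargetPatients;
```

Selecting the row numbered 3 yields exactly one row per qualifying patient, so the temporary table can be joined to later queries without a further DISTINCT.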
The following example extends the illustration above
and highlights basic filtering within an inner join.
It is a typical ETL case study in which a University
of Arizona College of Pharmacy graduate researcher
is specifically interested in the efficacy of certain
medications in the treatment of hemodialysis
patients during inpatient stays at our world-class
medical center.
Stylistically, you can see my code is extensively
commented, clean, and well organized. This
example also shows the use of the LIKE operator as
well as filtering NULL values from the result set. Also
of note is the use of Boolean logic combinations as
a means to identify and extract the appropriate
result set.
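A minimal sketch of this kind of query, under stated assumptions: the table variables, column names, medication names, and status values below are hypothetical stand-ins chosen for illustration, not the medications or schema of the original study.

```sql
-- Hypothetical target patient list and medication orders.
DECLARE @TargetPatients TABLE (PatientId INT PRIMARY KEY);
DECLARE @MedicationOrders TABLE (
    PatientId      INT,
    MedicationName VARCHAR(100),
    OrderStatus    VARCHAR(20),
    AdministeredOn DATE NULL
);

INSERT INTO @TargetPatients (PatientId) VALUES (1);

INSERT INTO @MedicationOrders
    (PatientId, MedicationName, OrderStatus, AdministeredOn) VALUES
    (1, 'EPOETIN ALFA INJ',   'COMPLETED', '2015-01-05'),
    (1, 'HEPARIN SODIUM INJ', 'COMPLETED', '2015-01-07'),
    (1, 'HEPARIN SODIUM INJ', 'CANCELED',  NULL),
    (1, 'ACETAMINOPHEN TAB',  'COMPLETED', '2015-01-07'),
    (1, 'EPOETIN ALFA INJ',   'COMPLETED', NULL),
    (2, 'HEPARIN SODIUM INJ', 'COMPLETED', '2015-01-08');

SELECT m.PatientId, m.MedicationName, m.AdministeredOn
FROM @MedicationOrders AS m
INNER JOIN @TargetPatients AS t
        ON t.PatientId = m.PatientId
       AND m.OrderStatus <> 'CANCELED'       -- basic filtering inside the join
WHERE m.AdministeredOn IS NOT NULL           -- drop orders never administered
  AND (   m.MedicationName LIKE 'EPOETIN%'   -- Boolean combination of
       OR m.MedicationName LIKE '%HEPARIN%') -- LIKE patterns
ORDER BY m.PatientId, m.AdministeredOn;
```

With the sample rows above, the query returns the two administered orders for patient 1 (epoetin and heparin); the canceled, unadministered, non-matching, and non-target rows are filtered out.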
Data Transformation
Record extraction is one aspect of our group’s
mission. We are also responsible for transforming
records from one form into another to support
statistical review, readability, or loading into a
separate data warehouse. The following short
example illustrates data transformation where
information is extracted from patient laboratory
medical records and displayed in specific columns.
Specifically, this
example extracts biopsy information from
pathology reports stored in our hospital
EPIC/Clarity medical record system. The original
data set was extracted from a formalin-fixed and
paraffin-embedded tissue sample dataset and
linked to our medical records data warehouse.
While the full query is several hundred lines of
code, the example shows Tracy’s use of T-SQL
common table expressions (CTEs) and, more importantly, the technical challenges associated with parsing free-form
text using only built-in T-SQL string functions. The combination of pattern indexing, PATINDEX(), with character indexing,
CHARINDEX(), yields an extremely effective technique for parsing targeted text from lengthy medical reports and
laboratory notes. Query results show biopsy details as dictated by the pathologist.
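The parsing approach described above can be sketched as follows. This is an illustrative example only: the report layout, the DIAGNOSIS label, and the table and column names are assumptions, and real pathology reports are far less regular than these sample rows.

```sql
-- Hypothetical free-text reports; the section layout is assumed.
DECLARE @PathologyReports TABLE (ReportId INT, ReportText VARCHAR(500));

INSERT INTO @PathologyReports (ReportId, ReportText) VALUES
    (1, 'SPECIMEN: Kidney, left, core biopsy. DIAGNOSIS: Acute tubular injury. COMMENT: None.'),
    (2, 'SPECIMEN: Liver, wedge biopsy. DIAGNOSIS; Mild steatosis. COMMENT: See note.'),
    (3, 'Addendum only; no structured sections.');

WITH Located AS (
    -- PATINDEX accepts wildcards, so the label matches whether it is
    -- followed by a colon or a (mistyped) semicolon.
    SELECT ReportId, ReportText,
           PATINDEX('%DIAGNOSIS[:;]%', ReportText) AS LabelStart
    FROM @PathologyReports
),
Bounded AS (
    -- CHARINDEX finds the first period after the label, which ends the value.
    SELECT ReportId, ReportText,
           LabelStart + LEN('DIAGNOSIS:') AS ValueStart,
           CHARINDEX('.', ReportText, LabelStart + LEN('DIAGNOSIS:')) AS ValueEnd
    FROM Located
    WHERE LabelStart > 0
)
SELECT ReportId,
       -- CASE guards SUBSTRING against a non-positive length.
       CASE WHEN ValueEnd > ValueStart
            THEN LTRIM(RTRIM(SUBSTRING(ReportText, ValueStart, ValueEnd - ValueStart)))
       END AS Diagnosis
FROM Bounded
ORDER BY ReportId;
```

With the sample rows above, reports 1 and 2 yield 'Acute tubular injury' and 'Mild steatosis', and report 3 is skipped because it has no matching label.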
Support Code and Other Examples
While our ETL group is composed of several individuals,
Tracy has developed several support queries, views, and
functions to help reduce duplicate work effort and to
help team members perform their duties effectively.
The partial function definition to the right provides a
unique set of patient diagnosis records based on ICD-9-
CM and ICD-10-CM data points through the use of
CROSS APPLY and a custom string parsing function. The
function definition is truncated due to space constraints;
however, the style conveys the simple and elegant
approach to database development that is customary
in Tracy’s work.
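As a rough sketch of the CROSS APPLY pattern, the following query unpacks delimited diagnosis code lists into one row per unique code. It is not the original function: it assumes the codes arrive as comma-separated lists, it uses the built-in STRING_SPLIT (SQL Server 2016 and later) in place of the custom parsing function, and the table and column names are hypothetical.

```sql
-- Hypothetical encounter-level diagnosis lists.
DECLARE @EncounterDiagnoses TABLE (
    PatientId INT,
    CodeSet   VARCHAR(10),
    CodeList  VARCHAR(200)
);

INSERT INTO @EncounterDiagnoses (PatientId, CodeSet, CodeList) VALUES
    (1, 'ICD-9-CM',  '585.6,V45.11'),
    (1, 'ICD-10-CM', 'N18.6, Z99.2'),
    (1, 'ICD-10-CM', 'N18.6'),        -- repeated on a later encounter
    (2, 'ICD-10-CM', 'Z99.2');

-- CROSS APPLY runs the splitter once per input row; DISTINCT then
-- reduces the output to a unique set of codes per patient and code set.
SELECT DISTINCT
       d.PatientId,
       d.CodeSet,
       LTRIM(RTRIM(s.value)) AS DiagnosisCode
FROM @EncounterDiagnoses AS d
CROSS APPLY STRING_SPLIT(d.CodeList, ',') AS s
WHERE LTRIM(RTRIM(s.value)) <> ''
ORDER BY d.PatientId, d.CodeSet, DiagnosisCode;
```

With the sample rows above, patient 1 yields four distinct codes (two per code set) and patient 2 yields one.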
Tracy has several other support queries including
laboratory records, treatment order records, biological
specimen order records, as well as medication records,
diagnosis records, and patient demographics records.
The next example is a support function Tracy wrote to
split comma-separated values found in column data or
in function argument parameters. Its primary use is to
unpack a comma-separated list into a table where each
row represents a single entry from the
original list.
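A function of this kind might look like the sketch below. The name dbo.SplitCsv, its signature, and its handling of whitespace and empty entries are assumptions made for illustration; they are not taken from the original code. The GO lines are batch separators understood by SSMS and sqlcmd.

```sql
IF OBJECT_ID('dbo.SplitCsv', 'TF') IS NOT NULL
    DROP FUNCTION dbo.SplitCsv;
GO

-- Hypothetical splitter: returns one row per non-empty entry of @List.
CREATE FUNCTION dbo.SplitCsv (@List VARCHAR(MAX))
RETURNS @Items TABLE (Position INT, Item VARCHAR(MAX))
AS
BEGIN
    DECLARE @Start    INT = 1,
            @Comma    INT,
            @Position INT = 0,
            @Item     VARCHAR(MAX);

    -- A NULL list makes the loop condition unknown, so no rows are returned.
    WHILE @Start <= LEN(@List) + 1
    BEGIN
        -- Find the next comma; treat the end of the list as a final comma.
        SET @Comma = CHARINDEX(',', @List, @Start);
        IF @Comma = 0 SET @Comma = LEN(@List) + 1;

        SET @Item = LTRIM(RTRIM(SUBSTRING(@List, @Start, @Comma - @Start)));
        IF @Item <> ''
        BEGIN
            SET @Position += 1;
            INSERT INTO @Items (Position, Item) VALUES (@Position, @Item);
        END;

        SET @Start = @Comma + 1;
    END;
    RETURN;
END;
GO

-- Usage: returns (1, 'A'), (2, 'B'), (3, 'C').
SELECT Position, Item
FROM dbo.SplitCsv('A, B,C')
ORDER BY Position;
```

A multi-statement function is used here for readability; on SQL Server 2016 and later, the built-in STRING_SPLIT covers the same basic need.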
The code examples and descriptions illustrated in this
document are the work of Tracy Steven Brown and show
only a subset of his work portfolio. The purpose is to
highlight specific techniques and style to give the reader
a better idea of the caliber of work to expect.