ETL & Database Development


Upload: tracy-brown

Post on 18-Aug-2015


Page 1: ETL & Database Development

Tracy Steven Brown

Extract, Transform & Load (ETL) – Database Development

Tracy’s Biomedical Informatics Services Group

The following examples highlight a small subset of Microsoft Transact-SQL (T-SQL) queries, views, and functions developed by Tracy Steven Brown in his Appointed Professional position within the prestigious University of Arizona Center for Biomedical Informatics and Biostatistics under the leadership of Don Saner, Assistant Chief Knowledge Officer. The following examples are based on EPIC Clarity, an extensive hospital electronic medical record system.

Tracy’s team is directly affiliated with the Arizona Health Sciences Center (AHSC) Biomedical Informatics Core and the Office of the Senior Vice President of Health Sciences. Our mission is to foster meaningful translational and applied clinical information management research. This core service group provides a single point of service for comprehensive biomedical informatics and associated analytics.

Our medical research objectives are to extract specific and targeted patient medical information from the University of Arizona Medical Center (UA-UMC) EPIC/Clarity medical record system in support of the research mission set by the office of the Senior Vice President of Health Sciences (AHSC).

Database Development & ETL Style

Most of the example code has been extracted from much larger queries to illustrate the use of temporary tables, inner joins to a table-valued function, or inline views (derived tables) embedded in larger SELECT statements. I often use the ROW_NUMBER() function to number rows within partitions defined by specific columns in order to isolate duplicate records, as an optimization instead of relying on the ubiquitous DISTINCT qualifier; that said, I do use DISTINCT where appropriate. In addition to filtering duplicates, I use ROW_NUMBER() with record partitioning to isolate specific procedure counts when extracting records; for example, three or more hospital inpatient hemodialysis procedures over a specific time interval. This example illustrates the use of a temporary table to store a targeted patient list that is later used for a pharmacology query.
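
The ROW_NUMBER() approach described above can be sketched roughly as follows. This is a minimal illustration, not the original query: the table variable, column names, dates, and sample rows are all hypothetical and are not Clarity objects. The first ROW_NUMBER() isolates duplicates, the second numbers each patient's remaining procedures, and the patients who reach a third procedure are stored in a temporary table.

```sql
-- Hypothetical schema and data for illustration only; these are not Clarity objects.
DECLARE @ProcedureLog TABLE (
    PatientId     INT,
    ProcedureName VARCHAR(50),
    ProcedureDate DATE
);

INSERT INTO @ProcedureLog (PatientId, ProcedureName, ProcedureDate) VALUES
    (1, 'HEMODIALYSIS', '2015-01-02'),
    (1, 'HEMODIALYSIS', '2015-01-02'),  -- exact duplicate row
    (1, 'HEMODIALYSIS', '2015-01-04'),
    (1, 'HEMODIALYSIS', '2015-01-06'),
    (2, 'HEMODIALYSIS', '2015-01-03'),
    (2, 'HEMODIALYSIS', '2015-01-05');

-- Step 1: number the rows inside each duplicate group and keep only row 1.
-- Step 2: number the surviving procedures per patient; a patient who reaches
--         sequence number 3 has at least three procedures in the interval.
WITH Deduplicated AS (
    SELECT PatientId, ProcedureName, ProcedureDate,
           ROW_NUMBER() OVER (PARTITION BY PatientId, ProcedureName, ProcedureDate
                              ORDER BY ProcedureDate) AS DuplicateRank
    FROM @ProcedureLog
    WHERE ProcedureDate >= '2015-01-01'
      AND ProcedureDate <  '2015-02-01'
),
Sequenced AS (
    SELECT PatientId,
           ROW_NUMBER() OVER (PARTITION BY PatientId
                              ORDER BY ProcedureDate) AS ProcedureSeq
    FROM Deduplicated
    WHERE DuplicateRank = 1
      AND ProcedureName = 'HEMODIALYSIS'
)
SELECT PatientId
INTO #TargetPatients          -- temporary table holding the targeted patient list
FROM Sequenced
WHERE ProcedureSeq = 3;       -- exactly one row per qualifying patient

SELECT PatientId FROM #TargetPatients;

DROP TABLE #TargetPatients;
```

With the sample rows above, only patient 1 qualifies: the duplicate on 2015-01-02 is discarded, leaving three distinct procedures, while patient 2 has only two. Filtering on `ProcedureSeq = 3` rather than `>= 3` keeps the patient list free of repeats without needing DISTINCT.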


The following example is an extension of the above illustration, highlighting basic filtering within an inner join. This example is a typical ETL case study in which a University of Arizona College of Pharmacy graduate researcher is specifically interested in the efficacy of certain medications in the treatment of hemodialysis patients during inpatient stays at our world-class medical center.

Stylistically, you can see my code is extensively commented, clean, and well organized. This example also shows the use of the LIKE operator as well as filtering NULL values from the result set. Also of note is the use of Boolean logic combinations as a means to identify and extract the appropriate result set.
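
A filtered inner join of this kind might look like the sketch below. It is an assumption-laden illustration rather than the original query: the temporary table, the medication table variable, the column names, the medication names, and the status values are all hypothetical. It combines LIKE, a NULL filter, and Boolean logic inside the join condition.

```sql
-- Hypothetical tables and values for illustration only; these are not Clarity objects.
CREATE TABLE #TargetPatients (PatientId INT PRIMARY KEY);
INSERT INTO #TargetPatients (PatientId) VALUES (1), (3);

DECLARE @MedicationOrder TABLE (
    PatientId      INT,
    MedicationName VARCHAR(100),
    OrderStatus    VARCHAR(20),
    AdministeredOn DATETIME NULL
);

INSERT INTO @MedicationOrder (PatientId, MedicationName, OrderStatus, AdministeredOn) VALUES
    (1, 'EPOETIN ALFA 10,000 UNIT/ML', 'Completed', '2015-01-04T09:30:00'),
    (1, 'HEPARIN 1,000 UNIT/ML',       'Completed', NULL),                   -- never administered
    (2, 'EPOETIN ALFA 4,000 UNIT/ML',  'Completed', '2015-01-05T10:00:00'),  -- not a target patient
    (3, 'HEPARIN 5,000 UNIT/ML',       'Canceled',  '2015-01-06T08:00:00'),  -- canceled order
    (3, 'HEPARIN 5,000 UNIT/ML',       'Completed', '2015-01-06T14:15:00');

SELECT tp.PatientId, mo.MedicationName, mo.AdministeredOn
FROM #TargetPatients AS tp
INNER JOIN @MedicationOrder AS mo
    ON  mo.PatientId = tp.PatientId
    -- Drop orders that were never administered.
    AND mo.AdministeredOn IS NOT NULL
    -- Keep only the medications of interest (prefix match).
    AND (mo.MedicationName LIKE 'EPOETIN%' OR mo.MedicationName LIKE 'HEPARIN%')
    -- Exclude canceled orders.
    AND mo.OrderStatus <> 'Canceled'
ORDER BY tp.PatientId, mo.AdministeredOn;

DROP TABLE #TargetPatients;
```

With the sample rows, the query returns two rows: the epoetin order for patient 1 and the completed heparin order for patient 3. The parentheses around the two LIKE predicates matter, because AND binds more tightly than OR in T-SQL.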

Data Transformation

Record extraction is one aspect of our group’s mission. We are also responsible for transforming records from one form into another to support statistical review, readability, or loading into a separate data warehouse. The following short example illustrates data transformation in which information is extracted from patient laboratory medical records and displayed in specific columns. Specifically, this example extracts biopsy information from pathology reports stored in our hospital EPIC/Clarity medical record system. The original data set was extracted from a formalin-fixed and paraffin-embedded tissue sample dataset and linked to our medical records data warehouse.

While the full query is several hundred lines of code, the example shows Tracy’s use of T-SQL inner Common Table Expressions (CTEs) and, more importantly, the technical challenges associated with parsing free-form text using standard SQL syntax. The combination of pattern indexing, PATINDEX(), with character indexing, CHARINDEX(), yields an extremely effective technique for parsing targeted text from lengthy medical reports and laboratory notes. Query results show biopsy details as dictated by the pathologist.
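
The PATINDEX() and CHARINDEX() combination can be sketched as below. This is a simplified illustration, not the original several-hundred-line query: the report text, the `SPECIMEN:` label, and the assumption that a period ends the value are all hypothetical. PATINDEX() locates the label by pattern, and CHARINDEX() then searches forward from that point for the terminator.

```sql
-- Hypothetical report text and section label for illustration only.
DECLARE @PathologyReport TABLE (ReportId INT, ReportText VARCHAR(MAX));

INSERT INTO @PathologyReport (ReportId, ReportText) VALUES
    (1, 'CLINICAL HISTORY: Mass. SPECIMEN: Left breast core biopsy. DIAGNOSIS: Benign.'),
    (2, 'No labelled sections were dictated in this note.');

WITH Located AS (
    -- PATINDEX finds the label by pattern; it returns 0 when the label is absent.
    SELECT ReportId, ReportText,
           PATINDEX('%SPECIMEN:%', ReportText) AS LabelStart
    FROM @PathologyReport
),
Bounded AS (
    -- CHARINDEX, starting just past the label, finds the period that ends the value.
    SELECT ReportId, ReportText,
           LabelStart + LEN('SPECIMEN:') AS ValueStart,
           CHARINDEX('.', ReportText, LabelStart + LEN('SPECIMEN:')) AS ValueEnd
    FROM Located
    WHERE LabelStart > 0
)
SELECT ReportId,
       LTRIM(RTRIM(SUBSTRING(ReportText, ValueStart, ValueEnd - ValueStart))) AS Specimen
FROM Bounded
WHERE ValueEnd > 0;
```

For the sample data this yields one row, report 1 with `Left breast core biopsy`; report 2 has no label and is skipped. Real dictated reports are rarely this regular, which is why a production query would need many more patterns and guards than this sketch shows.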


Support Code and Other Examples

While our ETL group is composed of several individuals, Tracy has developed several support queries, views, and functions to help reduce duplicate work effort and to help team members perform their duties effectively.

The partial function definition to the right provides a unique set of patient diagnosis records based on ICD-9-CM and ICD-10-CM data points through the use of CROSS APPLY and a custom string-parsing function. The function definition is truncated due to space constraints; however, the style conveys a simple and elegant approach to database development that is customary in Tracy’s work.
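
The general shape of CROSS APPLY with a string-parsing function is sketched below. Since the original function is not reproduced here, this sketch swaps in the built-in STRING_SPLIT (available in SQL Server 2016 and later) as a stand-in for the custom parser; the table variable, its columns, and the sample rows are hypothetical.

```sql
-- Hypothetical table and sample rows for illustration only.
-- STRING_SPLIT stands in for the custom parsing function and needs SQL Server 2016+.
DECLARE @EncounterDiagnosis TABLE (
    PatientId INT,
    CodeSet   VARCHAR(10),   -- 'ICD-9-CM' or 'ICD-10-CM'
    CodeList  VARCHAR(200)   -- comma-separated diagnosis codes
);

INSERT INTO @EncounterDiagnosis (PatientId, CodeSet, CodeList) VALUES
    (1, 'ICD-9-CM',  '585.6, 250.00'),
    (1, 'ICD-9-CM',  '585.6'),          -- repeats a code already listed above
    (1, 'ICD-10-CM', 'N18.6'),
    (2, 'ICD-10-CM', 'N18.6, E11.9');

-- CROSS APPLY runs the parser once per source row and joins each parsed
-- code back to that row; DISTINCT then collapses repeated diagnoses.
SELECT DISTINCT
       ed.PatientId,
       ed.CodeSet,
       LTRIM(RTRIM(parsed.value)) AS DiagnosisCode
FROM @EncounterDiagnosis AS ed
CROSS APPLY STRING_SPLIT(ed.CodeList, ',') AS parsed
ORDER BY ed.PatientId, ed.CodeSet, DiagnosisCode;
```

The six parsed codes reduce to five unique patient, code set, and diagnosis rows, because 585.6 appears twice for patient 1. On versions that predate STRING_SPLIT, a user-defined table-valued function would take its place in the CROSS APPLY.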

Tracy has several other support queries including laboratory records, treatment order records, biological specimen order records, as well as medication records, diagnosis records, and patient demographics records.

The next example is a support function Tracy wrote to separate comma-separated values within column data or function argument parameters. Its primary use is to unpack a set of comma-separated values into a table where each row represents a single entry from the original list.
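
A function of this kind could be written along the following lines. This is only a sketch under stated assumptions: the name `dbo.SplitCsvSketch`, its signature, and the choice of a CHARINDEX() loop are hypothetical and are not taken from the original function. `GO` is the batch separator used by SQL Server Management Studio and sqlcmd, needed because CREATE FUNCTION must be the first statement in its batch.

```sql
-- Hypothetical name and signature; a sketch, not the function described above.
CREATE FUNCTION dbo.SplitCsvSketch (@List VARCHAR(MAX))
RETURNS @Items TABLE (Position INT, Item VARCHAR(4000))
AS
BEGIN
    DECLARE @Start    INT = 1,   -- first character of the current entry
            @Comma    INT,       -- position of the next comma
            @Position INT = 0;   -- 1-based ordinal of the current entry

    -- A NULL list yields an empty table.
    IF @List IS NULL RETURN;

    -- Append a trailing comma so the last entry is handled like the others.
    SET @List  = @List + ',';
    SET @Comma = CHARINDEX(',', @List, @Start);

    WHILE @Comma > 0
    BEGIN
        SET @Position = @Position + 1;

        -- Store the text between the previous comma and this one, trimmed.
        INSERT INTO @Items (Position, Item)
        VALUES (@Position,
                LTRIM(RTRIM(SUBSTRING(@List, @Start, @Comma - @Start))));

        SET @Start = @Comma + 1;
        SET @Comma = CHARINDEX(',', @List, @Start);
    END;

    RETURN;
END;
GO

-- Usage: each entry from the list becomes one row.
SELECT Position, Item
FROM dbo.SplitCsvSketch('585.6, N18.6,E11.9');
```

The usage query returns three rows: `585.6`, `N18.6`, and `E11.9`, numbered 1 through 3. Keeping the `Position` column lets callers preserve the original order of the list, which a set-based split does not otherwise guarantee.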

The code examples and descriptions illustrated in this document are the work of Tracy Steven Brown and show only a subset of his work portfolio. The purpose is to highlight specific techniques and style to help the reader gain a better idea of the caliber of work you can expect.