Custom ETL Software


Upload: tracy-brown

Post on 16-Aug-2015


Page 1: Custom ETL Software

Tracy Steven Brown

Custom Extract, Transform & Load (ETL) Software

WinForms/C# ETL Biomedical Informatics Software

The following project highlights an example of custom biomedical informatics extraction, transformation and loading (ETL) software designed and developed by Tracy Steven Brown in his Appointed Professional position within the prestigious University of Arizona Center for Biomedical Informatics and Biostatistics, under the leadership of Don Saner, Assistant Chief Knowledge Officer.

This ETL software, codenamed Project Casanova, extracts lengthy and complex data from EPIC Clarity, an extensive hospital electronic medical record system. The data is transformed in C# and T-SQL and loaded into the University of Arizona Translational Sciences REDCap Research Database through the REDCap HTTP API using JavaScript Object Notation (JSON). REDCap (Research Electronic Data Capture) is a mature, secure web application for building and managing online surveys and databases.
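As a rough illustration of the loading step, the sketch below builds the form fields that a REDCap record import request typically carries (`token`, `content`, `format`, `type`, `data`). It is not Project Casanova's code: the class and method names, the use of `System.Text.Json`, and the record field names in the usage note are assumptions made for this example.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public static class RedcapImportSketch
{
    // Builds the form fields for a REDCap "import records" request.
    // apiToken and the record field names are placeholders (hypothetical).
    public static Dictionary<string, string> BuildImportForm(
        string apiToken, List<Dictionary<string, string>> records)
    {
        return new Dictionary<string, string>
        {
            ["token"] = apiToken,
            ["content"] = "record",   // import or export records
            ["format"] = "json",      // the payload in "data" is JSON
            ["type"] = "flat",        // one JSON object per record row
            ["data"] = JsonSerializer.Serialize(records)
        };
    }
}
```

The returned dictionary could then be wrapped in a `FormUrlEncodedContent` and sent with `HttpClient.PostAsync` to the REDCap site's API endpoint. Each record is a flat set of field name and value pairs, for example a hypothetical `record_id` and `mrn`.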

Project Casanova uses a set of Microsoft SQL Server User-Defined Functions (UDFs) to extract data points based on patient medical record numbers and hospital admission dates. The application user interface combines custom-drawn objects, rendered with the Windows GDI+ graphics library, with traditional third-party Telerik controls. The main Windows UI thread is kept free of lengthy database lookups by offloading ETL operations to an asynchronous background thread. The background thread calls a progress method on the main UI thread to update the progress meter during asynchronous operation.

The background ETL worker thread supports cancellation and terminates asynchronous operations gracefully. Cancelled operations and error conditions repaint the main application user interface to notify the user of the condition. Successful conditions repaint lines and recolor shading in green, error conditions repaint in red, amber highlights events, and purple marks dry runs.
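The worker pattern described above (a background loop that reports progress, honors cancellation, and ends in a distinct outcome) can be sketched as follows. This is a minimal illustration and not the application's source: `EtlOutcome`, `EtlWorkerSketch`, and the step list are hypothetical names, and the sketch uses `CancellationToken` where the original may well have used a different mechanism such as `BackgroundWorker`.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public enum EtlOutcome { Succeeded, Cancelled, Failed }

public static class EtlWorkerSketch
{
    // Runs each step in order, reporting percent complete after every step.
    // Intended to be called from a background thread, never the UI thread.
    public static EtlOutcome Run(
        IReadOnlyList<Action> steps, Action<int> reportProgress, CancellationToken token)
    {
        try
        {
            for (int i = 0; i < steps.Count; i++)
            {
                // Stop between steps if the user asked to cancel.
                token.ThrowIfCancellationRequested();
                steps[i]();
                reportProgress((i + 1) * 100 / steps.Count);
            }
            return EtlOutcome.Succeeded;
        }
        catch (OperationCanceledException)
        {
            return EtlOutcome.Cancelled;   // graceful stop, not an error
        }
        catch (Exception)
        {
            return EtlOutcome.Failed;      // caller repaints the error state
        }
    }
}
```

In a WinForms form, one option is to create an `IProgress<int>` with `new Progress<int>(handler)` on the UI thread, pass its `Report` method as `reportProgress`, and call `Run` inside `Task.Run`. `Progress<T>` captures the synchronization context in which it was created, so the handler that updates the progress meter executes on the UI thread. The returned outcome then decides how the interface is repainted.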

The activity log, located at the bottom of the screen, updates asynchronously while the ETL worker thread steps through a complex set of data manipulation tasks. Text is color coded to help the user identify conditions associated with application progress.
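One simple way to keep color coding consistent is a single mapping from condition to color. The sketch below assumes the log reuses the green, red, amber, and purple scheme described for the interface, which the text does not state outright. The names `LogCondition` and `LogColorSketch` are hypothetical, and the amber RGB value is an assumption because `System.Drawing.Color` has no named amber.

```csharp
using System.Drawing;

public enum LogCondition { Success, Error, Event, DryRun }

public static class LogColorSketch
{
    // Maps a log condition to the color used to draw its text.
    public static Color ColorFor(LogCondition condition) => condition switch
    {
        LogCondition.Success => Color.Green,
        LogCondition.Error => Color.Red,
        LogCondition.Event => Color.FromArgb(255, 191, 0), // amber (assumed RGB)
        LogCondition.DryRun => Color.Purple,
        _ => Color.Black                                    // fallback for unknown values
    };
}
```

With a WinForms `RichTextBox` as the log control, a line could be written by setting `SelectionColor` to `ColorFor(condition)` and then calling `AppendText`.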


Database Development & ETL Style

The REDCap Electronic Data Capture System contains several data capture instruments, and each instrument, or form, has a corresponding Microsoft SQL Server User-Defined Function (UDF). The WinForms/C# source code iterates through each instrument's SQL function, updating a progress meter and reporting progress in the application log. The application checks each instrument for an incomplete status before extracting data from EPIC Clarity. After the data has been transformed and loaded, the application sets the instrument status to “unverified” so that clinical practitioners perform a sanity check.
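The per-instrument flow described above can be sketched in memory as follows. The status values mirror REDCap's form status codes (0 Incomplete, 1 Unverified, 2 Complete). Everything else is an assumption for illustration: the names are hypothetical, the real extraction and loading are replaced by a delegate, and the sketch assumes that instruments which are not incomplete are skipped.

```csharp
using System;
using System.Collections.Generic;

public enum FormStatus { Incomplete = 0, Unverified = 1, Complete = 2 }

public static class InstrumentLoopSketch
{
    // Processes every instrument whose status is Incomplete and marks it
    // Unverified afterwards; returns the names of the instruments processed.
    public static List<string> Process(
        IDictionary<string, FormStatus> statuses,
        Action<string> extractTransformLoad,
        Action<string> log)
    {
        var processed = new List<string>();
        // Copy the keys so statuses can be updated while looping.
        foreach (var instrument in new List<string>(statuses.Keys))
        {
            if (statuses[instrument] != FormStatus.Incomplete)
            {
                log($"{instrument}: skipped (status {statuses[instrument]})");
                continue;
            }
            extractTransformLoad(instrument);   // stands in for the UDF call and API load
            statuses[instrument] = FormStatus.Unverified;
            processed.Add(instrument);
            log($"{instrument}: loaded, marked Unverified");
        }
        return processed;
    }
}
```

Skipping anything that is already unverified or complete keeps a re-run from overwriting data that a clinician has reviewed or is about to review.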

The UDF syntactic style is based on inner and outer common table expressions (CTEs), an alternative to inner views, although several inner views are still used within the common table expressions to cleanly extract desired data points or to perform necessary calculations on extracted data. The example here is a very small subset of the larger SQL codebase, showcasing an OUTER APPLY with an embedded CROSS APPLY wrapped within an inner view to extract several important clinical data points. The code shown here also highlights Tracy’s technique for extracting the laboratory values that most closely match a given event time, in this case a blood draw for the University of Arizona Biorepository. For instance, the ‘DISTANCE’ variable in this example represents the absolute time between a secondary event and the indicated primary event, whether the secondary event occurred before or after it. Used together with ORDER BY and OFFSET FETCH, it extracts the desired value.
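The closest-match idea can be illustrated in memory with LINQ. This is an analog of the SQL technique and not the author's T-SQL: `LabResult` and `ClosestLabSketch` are hypothetical names, and `OrderBy` followed by `Skip(0).Take(1)` plays the role that `ORDER BY DISTANCE OFFSET 0 ROWS FETCH NEXT 1 ROWS ONLY` would play in a query.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public record LabResult(DateTime ResultTime, string Value);

public static class ClosestLabSketch
{
    // Returns the lab result whose time is nearest to eventTime,
    // before or after it, or null when there are no results.
    public static LabResult? ClosestTo(IEnumerable<LabResult> labs, DateTime eventTime) =>
        labs.Select(lab => new
            {
                Lab = lab,
                // Absolute time between the lab result and the event.
                Distance = (lab.ResultTime - eventTime).Duration()
            })
            .OrderBy(x => x.Distance)   // smallest distance first
            .Skip(0).Take(1)            // keep only the first row
            .Select(x => x.Lab)
            .FirstOrDefault();
}
```

Because `OrderBy` is a stable sort, two results at the same distance resolve to whichever appears first in the input. A SQL version would need an explicit tie-breaking column in its ORDER BY to be deterministic.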


Software Development – In Design

The development process for Project Casanova was very organic in nature, beginning with a couple of notebook pages of sketches of the potential product.

Although the final product does not match the sketch exactly, the sketch illustrates how Tracy began the project, along with his initial thoughts and plans.

The sketches show the planned C#/WinForms user interface as well as a C#/WinForms User Control. Notes are scattered around the illustration as reminders and ideas for the project.