LabKey Server ETL Workshop
LabKey Software
Friday, September 20, 2013

Objectives
- Understand basic workings of LabKey Server
  - Administrator & developer views
- Know how to use LabKey's Query capability
- Build a module to extend LabKey
  - Update data model with incremental scripts
  - Expose data & metadata to LabKey Server
- Learn ETL options
  - Run ETLs
  - Create simple ETLs

Course format
- Alternate talking & doing
- Using Amazon-hosted VMs running LabKey Server + SQL Server
  - Run via Remote Desktop
  - Everyone has a VM with full admin rights
  - Everyone has their own SQL Server instance
- Workshop, not one-way training

Caveats
- Never done this before
  - Probably "bugs" in course material
- The code is fresh
  - Code from LabKey "trunk"
  - Basic ETL services in place
  - Extending over next few months
- Keeping fingers crossed for reliable wifi

Agenda
- About LabKey Server
- Getting Connected
- LabKey Folder Setup
- Data in LabKey
- LabKey SQL
- Database & Module Architecture
- Building a Module
- ETL in Modules
- Q & A
https://hosted.labkey.com/project/ETLTraining/begin.view?

LabKey Server
[Figure: architecture diagram. Labels: LabKey Server (modular, Java-based web app); File System; File System 2; SAS Share; Data 1; Data 2; LabKey Database (PostgreSQL, MS SQL) with LabKey Schemas and More Schemas; external databases (Oracle, MS SQL, MySQL).]
Source: Nelson et al., LabKey Server: An open source platform for scientific data integration, analysis and collaboration

Getting Connected
- See instructions on getting to your server at Amazon
- Should connect via Remote Desktop
- You can use SQL Management Studio to get direct access to the database
- Full admin gives you power to break anything
  - Won't be true in FHCRC environment

Starting The Server
- Start server with icon on desktop
  - Production installs use a Windows Service
- Use web browser on remote desktop machine
  - You'll connect to http://localhost:8080/labkey
- Set up a site administrator password
- Server will "upgrade itself"
  - Run SQL scripts to initialize modules
  - We'll go over this process later when you build your own modules

Basic Organization and Security
- Site is a server administration level
  - Connectivity to resources, site-wide groups
- Projects are top-level folders
  - Add groups, customized interfaces
- Subfolders secure subsets of data
- Physically, each container is a row in a database with a GUID
  - Other tables often have a "container" column
- Try the tutorial

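The container idea can be pictured with a small Python sketch. This is an illustration only, not LabKey's implementation: the folder names, the `samples` table, and the `rows_visible_in` helper are all hypothetical. It assumes each folder is keyed by a GUID and each data row carries a `container` value naming the folder it belongs to.

```python
import uuid

# Hypothetical container table: one entry per folder, keyed by a GUID.
containers = {
    "ProjectA": str(uuid.uuid4()),
    "ProjectA/Sub1": str(uuid.uuid4()),
}

# Hypothetical data table whose rows carry a "container" column.
samples = [
    {"id": 1, "name": "S-001", "container": containers["ProjectA"]},
    {"id": 2, "name": "S-002", "container": containers["ProjectA/Sub1"]},
    {"id": 3, "name": "S-003", "container": containers["ProjectA/Sub1"]},
]


def rows_visible_in(rows, container_guid):
    """Return only the rows that belong to the given container."""
    return [r for r in rows if r["container"] == container_guid]


sub1_rows = rows_visible_in(samples, containers["ProjectA/Sub1"])
print([r["name"] for r in sub1_rows])
```

Scoping every query by the container value is what lets one physical table hold data for many folders while each folder shows only its own rows.
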
Data Connectivity in LabKey
- A relational data store designed for scientists
- Built on a robust SQL database
- Property and vocabulary service
- Secure SQL query service
- Data grid for exploring data
- File sharing and linking
[Figure: request flow. Labels: UI, ETL or Custom Application; SQL Query or Table + Column List; API Layer; LabKey Query Service; Translated SQL; Relational DB; LabKey Server.]

LabKey data model terminology
- Tabular data: data in the form of rows and columns
- Schema: a named collection of related tables and queries
- Metadata: information about the data contained in a tabular data set, including field names, types, formats, links
- Query: a named, saved SQL SELECT statement written in LabKey SQL; can be parameterized
- Custom grid view
  - Subset of query functionality (field list, sort, filter)
  - Intended for UI definition (not defined in SQL)
  - Can do implicit joins via lookups

Tutorial: Data Analysis
- Import a spreadsheet into a list
- Explore the data grid view of the list
  - Sort
  - Filter
  - Paging
- Create a scatter plot of the data
- View the plot over subsets of the data
- Change the ARVRegimen field to be a lookup

Lookups in LabKey Server
- "Lookup" is a special field type
  - A field in one table whose values consist of key values from another table
  - Target: the table whose key values are kept in the lookup
  - Title field: attribute of the target; specifies the field of the target that will be displayed in place of the key values contained in the lookup
- In SQL terms, known as a single-column FOREIGN KEY
- Always many-to-one or one-to-one from lookup field table to target

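A lookup can be pictured as a key-to-row mapping plus a choice of which target field to display. The Python sketch below is illustrative only: the table contents, the `Description` title field, and the `display_value` helper are hypothetical, and only the `ARVRegimen` field name comes from the tutorial.

```python
# Hypothetical target table: key value -> row. "Description" plays the title field.
arv_regimens = {
    1: {"Description": "Regimen A"},
    2: {"Description": "Regimen B"},
}

# Hypothetical table with a lookup field (ARVRegimen) that stores target keys.
# Two rows point at key 1, which is the many-to-one case.
demographics = [
    {"ParticipantId": "P1", "ARVRegimen": 1},
    {"ParticipantId": "P2", "ARVRegimen": 2},
    {"ParticipantId": "P3", "ARVRegimen": 1},
]


def display_value(row, lookup_field, target, title_field):
    """Show the target's title field in place of the stored key.

    Falls back to the raw key when the target has no matching row.
    """
    key = row[lookup_field]
    target_row = target.get(key)
    return target_row[title_field] if target_row is not None else key


for row in demographics:
    print(row["ParticipantId"],
          display_value(row, "ARVRegimen", arv_regimens, "Description"))
```

Because the display text lives only in the target table, changing a description there changes what every referencing row shows.
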
Uses of lookups
- Display more meaningful data values
- Allow users to explore data without writing SQL
- Constrain user input to a fixed set of choices
- Allow updating display values in one place
- Add expression columns to base data sets

Configuring fields
- The Field Editor is the main UI for configuring field-level properties
- For developer-defined tables, data is supplied in XML

SQL In LabKey
- LabKey allows folks to write SQL
  - But they don't get access to the underlying database
- Within any folder, the available schemas can be browsed
- Create new queries
  - Equivalent to database views

Query Schema Browser
New Query
Query Web Part

LabKey SQL vs MS SQL
- Full SELECT syntax
  - Update/Insert/Merge accessible via ETL pipeline, APIs, UI
- Easy lookup syntax replaces JOIN in many cases
- Use || for string concat (like Oracle, PostgreSQL)
- PIVOT queries
- GROUP_CONCAT
- PARAMETERS

Queries to Try
- Joins
- Group_Concat: all visits for a patient
- PIVOT: one column for each visit

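To make the intent of the two aggregate exercises concrete, here is a rough Python sketch of what they compute. It is not LabKey SQL and does not show LabKey's syntax; the `visits` rows and both helper functions are hypothetical stand-ins for a visit table.

```python
from collections import defaultdict

# Hypothetical visit rows: (patient, visit, measured value).
visits = [
    ("P1", "V1", 70.0),
    ("P1", "V2", 71.5),
    ("P2", "V1", 64.0),
]


def group_concat_visits(rows, sep=","):
    """One row per patient, with all of that patient's visits joined into a string."""
    grouped = defaultdict(list)
    for patient, visit, _value in rows:
        grouped[patient].append(visit)
    return {patient: sep.join(vs) for patient, vs in grouped.items()}


def pivot_by_visit(rows):
    """One row per patient, with one column per distinct visit (None if missing)."""
    columns = sorted({visit for _patient, visit, _value in rows})
    table = defaultdict(lambda: dict.fromkeys(columns))
    for patient, visit, value in rows:
        table[patient][visit] = value
    return dict(table)


print(group_concat_visits(visits))
print(pivot_by_visit(visits))
```
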
LabKey Modules
- LabKey Server is based on modules
  - Look in Admin->Folder Management->FolderType
- Each module can provide
  - HTML views
  - JavaScript/CSS
  - LabKey SQL queries
    - Enables easy movement of sets of queries between servers
  - ETL definitions
  - Reports in R and JavaScript
  - Database-level schema definition
    - Only run at restart so DBAs can approve
  - XML to add metadata to database schema
  - Java code

Building first Module
- See tutorial

ETLs: Why In LabKey
- Provenance
  - For every row in HIDRA_Prime, know when & how it got there
- Auditing
  - For every row that leaves HIDRA_Prime, know when & how it left
  - Down to individual patient info
- History of all runs
- Clear packaging & deployment
- Re-invent the axle, but not the wheel…
  - Use Stored Procs (coming soon)
  - Wrap existing ETL frameworks

LabKey ETL Infrastructure
- Still under development
- Basic functionality is in place
  - Query-based ETLs
  - Checkers (identify whether work is to be done)
  - Scheduling
  - Logging all output

Still Not Done
- User Interface
  - Management User Interface
    - Scheduling
    - Lists of Transform Runs
    - Detail views
  - ETL Creation
- Stored Procedure-based ETLs
- Support for external ETL packages (SSIS, Kettle)

ETL Steps (from Design Spec)
- Change identification
- Initiation
- Query
- Transformation
- Staging
- Load/Merge
- Finalize

ETL Basics
- ETLs are defined in the etls directory of a module
  - Each ETL is an XML file
- Each ETL consists of a set of Transform Steps
- Key components of a transform
  - Source query (LabKey SQL for now)
  - Destination table
    - May be in unrelated database
  - Filter strategy
    - Identifies rows to transform & if there is work to do
  - Schedule

Filter Strategies
- Choose which rows to move to target table
- SelectAllFilterStrategy
  - Just get all the data, every time
- ModifiedSinceFilterStrategy
  - Rows with a DateTime column newer than last run
  - Records most recent value
- RunFilterStrategy
  - Based on incrementing integer value (e.g. Run ID)
  - Any rows with higher value than last time are transferred
  - Useful for rows written by previous ETLs
- But can "forget" previous runs and re-run from scratch
  - "Reset State" in the UI

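The three strategies differ only in what they remember between runs. The Python sketch below models that bookkeeping under stated assumptions: the function names, the `state` dictionary, and the sample rows are hypothetical, and this is not LabKey's code. Only the `updtDtTm` column name is borrowed from the workshop data.

```python
from datetime import datetime


def select_all(rows, state):
    """SelectAll-style: take every row, every time; no state is kept."""
    return list(rows)


def modified_since(rows, state, column="updtDtTm"):
    """ModifiedSince-style: take rows newer than the timestamp recorded last run."""
    last = state.get("last_timestamp")
    selected = [r for r in rows if last is None or r[column] > last]
    if selected:
        # Record the most recent value seen, for the next run.
        state["last_timestamp"] = max(r[column] for r in selected)
    return selected


def run_filter(rows, state, column="run_id"):
    """Run-style: take rows whose integer run id is higher than last time."""
    last = state.get("last_run_id")
    selected = [r for r in rows if last is None or r[column] > last]
    if selected:
        state["last_run_id"] = max(r[column] for r in selected)
    return selected


def reset_state(state):
    """Forget previous runs so the next run starts from scratch."""
    state.clear()


rows = [
    {"id": 1, "updtDtTm": datetime(2013, 9, 1), "run_id": 10},
    {"id": 2, "updtDtTm": datetime(2013, 9, 5), "run_id": 11},
]
state = {}
print(len(modified_since(rows, state)))   # first run
print(len(modified_since(rows, state)))   # second run, nothing changed
rows[0]["updtDtTm"] = datetime(2013, 9, 20)
print([r["id"] for r in modified_since(rows, state)])  # after touching row 1
```

In this model a second run with no source changes selects nothing, and a row is picked up again only once its timestamp moves past the recorded value.
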
Target Options
- How to add data to target table
- truncate
  - Delete all rows and add the selected ones
- append
  - Add new rows to the target table
  - Will fail if duplicate primary keys
- merge
  - Update or insert
  - Matches primary keys

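The three target options can be compared side by side with a small in-memory model. This Python sketch is an assumption-laden illustration, not LabKey's loader: the `load` function, the dictionary-as-table representation, and the `id` primary key are hypothetical.

```python
def load(target, rows, option, key="id"):
    """Apply rows to target (a dict of primary key -> row) using one option."""
    if option == "truncate":
        # Delete all existing rows, then add the selected ones.
        target.clear()
        for r in rows:
            target[r[key]] = dict(r)
    elif option == "append":
        # Refuse the whole batch if any primary key already exists.
        dupes = [r[key] for r in rows if r[key] in target]
        if dupes:
            raise ValueError(f"duplicate primary keys: {dupes}")
        for r in rows:
            target[r[key]] = dict(r)
    elif option == "merge":
        # Update rows whose key matches, insert the rest.
        for r in rows:
            target.setdefault(r[key], {}).update(r)
    else:
        raise ValueError(f"unknown target option: {option!r}")
    return target


table = {1: {"id": 1, "name": "old"}}
load(table, [{"id": 1, "name": "new"}, {"id": 2, "name": "added"}], "merge")
print(table)
```
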
Overwrite Full Table Every Hour
<?xml version="1.0" encoding="UTF-8"?>
<etl xmlns="http://labkey.org/etl/xml">
  <name>Overwrite</name>
  <description>Replaces target with source query.</description>
  <transforms>
    <transform id="1hour">
      <source schemaName="external" queryName="etl_source" />
      <destination schemaName="patient" queryName="etl_target" targetOption="truncate"/>
    </transform>
  </transforms>
  <incrementalFilter className="SelectAllFilterStrategy" />
  <schedule><poll interval="1h"></poll></schedule>
</etl>

Merge Changed Rows
<?xml version="1.0" encoding="UTF-8"?>
<etl xmlns="http://labkey.org/etl/xml">
  <name>Overwrite</name>
  <description>Replaces target with source query.</description>
  <transforms>
    <transform id="1hour">
      <source schemaName="external" queryName="etl_source" />
      <destination schemaName="patient" queryName="etl_target" targetOption="merge"/>
    </transform>
  </transforms>
  <incrementalFilter className="ModifiedSinceFilterStrategy" timestampColumnName="Date" />
  <schedule><poll interval="1h"></poll></schedule>
</etl>

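One way to check an ETL definition before deploying it is to read it back with a standard XML parser. The Python sketch below uses only the standard library and the element and attribute names visible in the merge example; the `summarize_etl` helper is hypothetical and is not part of LabKey.

```python
import xml.etree.ElementTree as ET

# The merge example from the slides, kept as bytes so the encoding
# declaration is handled by the parser.
ETL_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<etl xmlns="http://labkey.org/etl/xml">
  <name>Overwrite</name>
  <description>Replaces target with source query.</description>
  <transforms>
    <transform id="1hour">
      <source schemaName="external" queryName="etl_source" />
      <destination schemaName="patient" queryName="etl_target" targetOption="merge"/>
    </transform>
  </transforms>
  <incrementalFilter className="ModifiedSinceFilterStrategy" timestampColumnName="Date" />
  <schedule><poll interval="1h"></poll></schedule>
</etl>
"""

NS = {"etl": "http://labkey.org/etl/xml"}


def summarize_etl(xml_bytes):
    """Pull out the pieces of an ETL file that the slides call key components."""
    root = ET.fromstring(xml_bytes)
    transforms = []
    for t in root.findall("etl:transforms/etl:transform", NS):
        src = t.find("etl:source", NS)
        dst = t.find("etl:destination", NS)
        transforms.append({
            "id": t.get("id"),
            "source": (src.get("schemaName"), src.get("queryName")),
            "destination": (dst.get("schemaName"), dst.get("queryName")),
            "targetOption": dst.get("targetOption"),
        })
    flt = root.find("etl:incrementalFilter", NS)
    poll = root.find("etl:schedule/etl:poll", NS)
    return {
        "name": root.find("etl:name", NS).text,
        "transforms": transforms,
        "filter": flt.get("className") if flt is not None else None,
        "interval": poll.get("interval") if poll is not None else None,
    }


print(summarize_etl(ETL_XML))
```

A parse step like this also catches curly quotes pasted from slides, since they make the attribute values malformed XML.
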
Storing ETL Information
- Couple of key tables in the dataintegration schema
- TransformConfiguration
  - One row for each ETL
  - Controls whether ETL is active
  - Quick access to state of last run
- TransformRun
  - Stores information about every transform
  - Success or failure
  - Total # of rows transferred
- Pipeline
  - Detailed log of steps

Try an Early HIDRA ETL
Enable hidra and hidra_uw_intake
Amalga_Import has some Data
Let’s Try a Transform

A look inside
Files in: C:\LabKey\modules\hidra_uw_intake
<?xml version="1.0" encoding="UTF-8"?>
<etl xmlns="http://labkey.org/etl/xml">
  <name>Amalga to hidraPrime - Patients</name>
  <description>Move uw_patient, uw_patientidentifier, uw_encounter from Amalga to hidraPrime</description>
  <transforms>
    <transform id="patient">
      <source schemaName="AmalgaImport_queries" queryName="uw_patient" timestampColumnName="updtDtTm" />
      <destination schemaName="hidraPrime" queryName="Patient" targetOption="merge"/>
    </transform>
    <transform id="patientidentifier_mrn">
      <source schemaName="AmalgaImport_queries" queryName="uw_patientidentifier_mrn" timestampColumnName="lastUpdateTime"/>
      <destination schemaName="hidraPrime" queryName="PatientIdentifier" targetOption="merge"/>
    </transform>
    <transform id="patientidentifier_epi">
      <source schemaName="AmalgaImport_queries" queryName="uw_patientidentifier_epi" timestampColumnName="lastUpdateTime" />
      <destination schemaName="hidraPrime" queryName="PatientIdentifier" targetOption="merge"/>
    </transform>
    <transform id="encounter">
      <source schemaName="AmalgaImport_queries" queryName="uw_encounter" timestampColumnName="lastUpdateTime" />
      <destination schemaName="hidraPrime" queryName="Encounter" targetOption="merge"/>
    </transform>
  </transforms>
  <incrementalFilter className="ModifiedSinceFilterStrategy" timestampColumnName="lastUpdateTime" />
</etl>

Patient Query
SELECT
  (SELECT OID FROM AmalgaImport_azAEID.AEID204
   WHERE AEID204.EIDForOID = UW_PID601.EIDForOID) AS GPID,
  LName AS LastName,
  FName AS FirstName,
  MName AS MiddleName,
  MotherMaidenName AS MaidenNameMother,
  DOB,
  Sex AS Gender,
  Language AS PrimaryLanguage,
  PatientAlias,
  Race,
  Street1 AS AddressLine1,
  Street2 AS AddressLine2,
  …
FROM AmalgaImport_azADT.UW_PID601

Run Again
- Nothing happens
- Change some data in Amalga_Import.azADT.UW_PID601
  - Remember to update the updtDtTm field
- Now try again

Data in external databases
- Researchers often have data in existing relational databases
  - LIMS systems
  - Clinical data
  - Locally developed applications
- LabKey Server offers two mechanisms to incorporate this data
  - Define an external schema connection (link)
  - Use Extract, Transform and Load support (copy)

Database tables and LabKey Server modules
- LabKey Server consists of many separate modules
- Server modules usually contain SQL scripts to create the database objects used by the module
  - CREATE or ALTER, TABLEs and VIEWs in native syntax
  - Schema usually specific to a module
  - Supported DBs: PostgreSQL and Microsoft SQL Server
  - Script runner figures out which scripts are needed for upgrade

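How a script runner might choose upgrade scripts can be sketched in Python. Everything here is an assumption made for illustration: the `schema-from-to.sql` file naming, the `hidra` script names, and the "take the longest jump from the current version" rule are a simplified model, not a description of LabKey's actual script runner.

```python
import re
from decimal import Decimal

# Assumed naming pattern: <schema>-<fromVersion>-<toVersion>.sql
SCRIPT_RE = re.compile(r"^(?P<schema>\w+)-(?P<start>\d+\.\d+)-(?P<end>\d+\.\d+)\.sql$")


def parse_script(name):
    """Return (schema, from_version, to_version, name), or None if the name does not fit."""
    m = SCRIPT_RE.match(name)
    if m is None:
        return None
    return (m["schema"], Decimal(m["start"]), Decimal(m["end"]), name)


def scripts_to_run(names, schema, installed):
    """Chain scripts from the installed version, preferring the longest jump each step."""
    candidates = [p for p in map(parse_script, names) if p and p[0] == schema]
    current = Decimal(installed)
    plan = []
    while True:
        steps = [p for p in candidates if p[1] == current and p[2] > current]
        if not steps:
            return plan
        best = max(steps, key=lambda p: p[2])
        plan.append(best[3])
        current = best[2]


available = [
    "hidra-0.00-1.00.sql",
    "hidra-0.00-1.10.sql",
    "hidra-1.00-1.10.sql",
    "hidra-1.10-1.20.sql",
    "other-0.00-1.00.sql",
]
print(scripts_to_run(available, "hidra", "0.00"))
print(scripts_to_run(available, "hidra", "1.00"))
```

Under this model a fresh install and an upgrade from an older version reach the same final version by different script chains, which is the point of incremental scripts.
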
After install or upgrade, the SQL sent to the database
- Mostly SELECTs and 1-row UPDATE/INSERT/DELETE
- SELECTs can be issued by a user or an application in LabKey SQL
  - LabKey translates into the back-end database dialect

External schemas and data sources
- Provides a way to link from LabKey Server to another data source so that LabKey's functions and Client API work directly on the external data
- LabKey translates its own SQL into the dialect of the external schema
- Supported databases include Oracle, SAS, and MySQL in addition to Postgres and SQL Server
- Options:
  - Expose only some tables to LabKey
  - Read only or read/write
  - Implement folder-based security if a containerId is included
  - Add additional metadata (for example, field display properties) via an XML file

Folders, files and tabular data
[Figure: Folder 1 and Folder 2, with labels Files, Proteomics, Flow.]
- Tabular data rows and files are visible in folders