geoff kimber. realize the importance of learning sql recognize sql platform differences identify...
TRANSCRIPT
SQL ReportingGeoff Kimber
With thanks to Shawn Toy
Objectives
Realize the importance of learning SQL Recognize SQL platform differences Identify relational database concepts Apply SQL syntax to a real-life example Review SQL references
Hierarchical Databases
Data organized in tree structure with parent and child segments
Parent to child is a one to many relationship Implies repeating information, generally in
child segments Fileman
Prescription file with refill multiple field
Relational Databases
Data consists of records stored in tables Each row is unique Column values are of the same data type Sequence of rows and columns is insignificant
What is SQL?
Structured Query Language [SQL] Is an ANSI computer language that is used to
interact with many relational databases▪ Oracle▪ MS-SQL Server▪ MySQL, PostgreSQL and others
SQL is a set-based language SQL is structured like English
▪ SEQUEL (Simple English QUEry Language)
Why Learn SQL?
VA is truly data rich Each site has between 2-4 TB of data†
▪ non-imaging data VA FileManager [FileMan]
Only within VistA Limited functionality for non-programmers Runs in the production environment
▪ Workflow and resource concerns† - Pham R. The VA Data Lifecycle (Internals, Data Flows, and Business Intelligence)
Why Learn SQL?
Data repositories (warehouses) Moves the analysis off-line Provides accessible, yet very secure, data
▪ ODBC, ADO and LINQ Greater functionality and access control
▪ The “official” data is secured within VistA▪ Independent of VistA
▪ Microsoft Active Directory based access▪ Maintain the front-line stance of CPRS/VistA
Why Learn SQL?
Various VA relational databases Corporate Data Warehouse (CDW) Regional Data Warehouse (RDW) VISN Data Warehouse (VDW) Potential for local data warehouses
▪ Extract raw data from VistA off-hours▪ UPSERT the data into an SQL server automatically▪ Users run their own reports and access is secure
Why Learn SQL?
Data warehouses = relational databases SQL = the language of relational databases
Relational databases are everywhere Learn a portable skill Take a systems approach to collaboration Build reusable code Work more efficiently
Why SQL Server?
VA has an enterprise license Microsoft SQL Server is functionally superior
to Microsoft Access Secure Backups Procedural code No database size limit (software) Efficiency
Microsoft Access & SQL Server
Two of the different SQL flavors Transact SQL (Microsoft SQL Server)
▪ Manage databases/services, procedural code Jet SQL (Microsoft Access)
▪ Information retrieval, program component Each program has its own SQL flavor
Core keywords and functions follow standards ANSI guidelines are “considered”
A Set-Based Language
In SQL: Something is either part of a set or it is not Order is meaningless SQL Server supports procedural code
▪ VBA can be used in MS Access for procedures
Corporate data warehouse structure
Dimension tables
Fact tables
Dimension tables
Contain attributes that describe records in fact tables
Each record contains unique identifier Generally more stable
Drug name, provider name, patient name and address▪ Similar to Drug file (#50), New Person file (#200), Patient
file (#2)
Fact tables
Capture operational data Contain multiple unique identifiers
One identifier for each record One identifier for each data element in dimension
tables Similar to Fileman IEN
Fact tables
Usually contain numerical and date fields Prescription fills, Outpatient visits, lab test results Similar to prescription file (#52), V POV
(#9000010.07)
Dimension table examples
Dim.LocalDrug LocalDrugSID Sta3n (3 digit station number) LocalDrugNameWithDose VaClassification PricePerDispenseUnit NationalDrugSID
Dimension table examples
LocalDrugSID LocalDrugNameWithDose
VaClassification
PricePerDispenseUnit
NationalDrugSID
LocalDrugSID
735055 ATENOLOL 50MG TAB CV100 0.0079 2223484 735055
735056 ATENOLOL 100MG TAB CV100 0.0075 2221541 735056
739331 ATENOLOL 25MG TAB CV100 0.0068 2223485 739331
741066 ATENOLOL 12.5MG (1/2 OF 25MG) TAB CV100 0.0042 2223485 741066
748289 STUDY: ATENOLOL TAB IN000 NULL -1 748289
Dimension table examples
Dim.NationalDrug NationalDrugSID DrugNameWithoutDose DosageForm Strength
Dimension table examples
NationalDrugSID DrugNameWithoutDose DosageForm Strength
2221541 ATENOLOL TAB 100
2221542 ATENOLOL INJ 500
2221543 ATENOLOL TAB 50
2221544 ATENOLOL/CHLORTHALIDONE TAB NULL
2223485 ATENOLOL TAB 25
Dimension table examples
sPatient.sPatient PatientSID PatientName PatientSSN AddressLine1 City State ZipCode
Fact table examples
RxOut.RxOutpat RxOutpatSID PatientSID ProviderSID LocalDrugSID IssueDate RxNumber
Fact table examples
Rxout.RxoutpatFill RxoutpatFillSID RxOutpatSID FillType ReleaseDateTime Qty DaysSupply UnitPrice
CDW Tables
Metadata Information about data Includes CDW to VistA field crosswalk http://vaww.dwh.cdw.portal.va.gov/metadata/default.aspx
CDW diagram
Drug file to national drug file
Prescription to prescription fill
Common SQL Clauses
Clauses perform set-based calculationsCLAUSE DefinitionSELECT Defines what you will see. You can identify columns,
calculations, literal text, and even perform branching.
FROM Defines the origin of the data. Tables can be joined.
WHERE Defines data filters. Filters data before it is queried.
GROUP BY Defines how data is grouped. Works hand in hand with aggregate functions, i.e. how many in each group
HAVING Defines data filters. Filters data after the query is run. An example: return only those items having a count > 5.
ORDER BY Defines the field(s) used to sort the data – ASC or DESC
Common SQL Functions
Function DefinitionCOUNT Aggregate Function: Counts the number in a group.
SUM Aggregate Function: Sums the numbers in a group
AVG Aggregate Function: Averages numbers in a group
MIN and MAX Aggregate Function: Finds the minimum and maximum numeric values in a group.
UCASE and LCASE Scalar Function: Returns text in uppercase or lowercase, respectively
LEN Scalar Function: Returns the length of a string
Microsoft Access supports the aggregate functions FIRST and LASTSQL Server has a COUNT(DISTINCT...) aggregate function
Common SQL OperatorsOperator Definition
AND Logical Operator: A and B are both true to equal TRUE
OR Logical Operator: A or B is true to equal TRUE
NOT Logical Operator: Switches TRUE to FALSE and vice versa
=, >, <, >=, <= Equality Operators: equals, greater than, etc.
BETWEEN Equality Operator: TRUE if value is between the two criteria. Values that equal the criteria are included.
Like Equality Operator: TRUE if text contains the pattern defined. LIKE allows for wildcard characters (see below)
+, -, *, / Mathematical Operators
+ or & Concatenation Operators: Join two text strings. (see below)
Access uses “*” as a wildcard and double quotes; SQL Server uses “%” and single quotesAccess uses the ampersand to concatenate “&”, SQL Server the plus sign “+”
Code example
Select LocalDrugSID,NationalDrugSID,LocalDrugNameWithDose
from vdw.dim.localdrugWhere LocalDrugNameWithDose like '%atenolol%'
Code example
Case is not normally important, but helps with readability
Spaces, tabs, and hard returns have limited importance Important in text fields Not important between commands Help improve readability
Microsoft query analyzer color codes key words for readability
Code example
SQL databases have many different data types Varchar Int
▪ Bigint▪ Smallint
Decimal Float Datetime Money
Code example
Select Main command used to retrieve data from SQL
tables Followed by list of fields to return ‘*’ (without quotes) returns all fields in table
▪ Normally, avoid using ‘*’ unless you need all the fields in a table
Code example
From vdw.dim.localdrug Indicates source of data from which you are
selecting Vdw.dim.localdrug
▪ VDW is database name▪ VDW => VISN Data Warehouse▪ RDW => Regional Data Warehouse▪ CDW => Corporate Data Warehouse
▪ Dim is schema▪ Localdrug is table name
Code example
Where Allows you to restrict or filter the table contents to
just the results you want Extremely versatile
Like ‘%’ is a multicharacter wildcard ‘%atenolol%’ - contains ‘atenolol’ ‘atenolol%’ – begins with ‘atenolol’ ‘%atenolol’ – ends with ‘atenolol’
Code example
Case is not normally important, but helps with readability
Spaces, tabs, and hard returns have limited importance Important in text fields Not important between commands Help improve readability
Microsoft query analyzer color codes key words for readability
Code example
Select LocalDrugSID,NationalDrugSID,LocalDrugNameWithDose
from vdw.dim.localdrugWhere localdrugnamewithdose like '%atenolol%'
Code examples - Results
LocalDrugSID NationalDrugSID LocalDrugNameWithDose
735055 2223484 ATENOLOL 50MG TAB
735056 2221541 ATENOLOL 100MG TAB
739331 2223485 ATENOLOL 25MG TAB
741066 2223485 ATENOLOL 12.5MG (1/2 OF 25MG) TAB
1200004446 2223484 STUDY: ATENOLOL 50MG TAB
Fileman equivalent
Select VA FileMan Option: PRINT File Entries
OUTPUT FROM WHAT FILE: DRUG// SORT BY: GENERIC NAME// 'GENERIC
NAME["ATENOLOL" WITHIN GENERIC NAME["ATENOLOL", SORT BY: FIRST PRINT FIELD: NUMBERTHEN PRINT FIELD: PSNDF:NUMBERTHEN PRINT FIELD: GENERIC NAME THEN PRINT FIELD:
Fileman equivalent
DRUG LIST PAGE 1NUMBER NUMBER GENERIC NAME------------------------------------------------------------------------3217 4331 ATENOLOL 25MG TAB4068 4329 ATENOLOL 50MG TAB4069 4328 ATENOLOL 100MG TAB9751 4331 ATENOLOL 12.5MG (1/2 OF 25MG) TAB 12975 4329 STUDY: ATENOLOL 50MG TAB
Joins
Used to connect two or more tables together using data elements that are shared by individual tables
Similar to fileman jumps, with more power Inner joins, outer joins and full joins
Inner Joins
Produces a result that includes only the records that are the same in both tables
Inner Joins
Inner joins – Example
Select a.localdrugnamewithdose, b.strength
from vdw.dim.localdrug as a inner join vdw.dim.nationaldrug b on a.NationalDrugSID = b.NationalDrugSIDwhere LocalDrugNameWithDose like ‘%atenolol
%’
Inner joins - Results
localdrugsid localdrugnamewithdose strength
735055 ATENOLOL 50MG TAB 50
735056 ATENOLOL 100MG TAB 100
739331 ATENOLOL 25MG TAB 25
741066 ATENOLOL 12.5MG (1/2 OF 25MG) TAB 25
Outer joins
Results contain all records in one table and only matching records in another table Left joins – return all results in table specified in
‘from’ statement and all matching tables from ‘joined’ table
Right joins – return all results in table specified in ‘joined’ table ‘from’ statement and all matching tables from ‘from’ statement
Outer joins
Outer joins – example
Select a.localdrugnamewithdose, b.strength
from vdw.dim.localdrug as a left outer join vdw.dim.nationaldrug b on a.NationalDrugSID = b.NationalDrugSIDwhere LocalDrugNameWithDose like ‘%atenolol
%’
Outer joins - example
localdrugsid localdrugnamewithdose strength
735055 ATENOLOL 50MG TAB 50
735056 ATENOLOL 100MG TAB 100
739331 ATENOLOL 25MG TAB 25
741066 ATENOLOL 12.5MG (1/2 OF 25MG) TAB 25
1200004446 STUDY: ATENOLOL 50MG TAB NULL
Null
Indicates lack of data Does *not* mean zero Math operations with null values have null results
▪ 2*null = null▪ 0 * null = null
Text and logic operations on null values return null Several functions exist to deal specifically with null
values
Full joins
Returns all values in both tables
Not particularly applicable in our setting
Full joins
Full join - example
Select a.localdrugnamewithdose ,b.drugnamewithdose , b.strength from vdw.dim.localdrug as a full join vdw.dim.nationaldrug b on a.NationalDrugSID = b.NationalDrugSID where b.DrugNameWithoutDose like
'%atenolol%'
Full join - example
localdrugnamewithdose drugnamewithdose strength
NULL ATENOLOL 25MG TAB,UD 25
ATENOLOL 100MG TAB ATENOLOL 100MG TAB 100
NULL ATENOLOL 500MCG/ML INJ 500
NULL ATENOLOL 50MG TAB,UD 50ATENOLOL 100MG/CHLORTHALIDONE 25MG TAB
ATENOLOL 100MG/CHLORTHALIDONE 25MG TAB NULL
ATENOLOL 50MG/CHLORTHALIDONE 25MG TAB
ATENOLOL 100MG/CHLORTHALIDONE 25MG TAB NULL
NULL ATENOLOL 100MG TAB,UD 100
ATENOLOL 50MG TAB ATENOLOL 50MG TAB 50
ATENOLOL 25MG TAB ATENOLOL 25MG TAB 25
Where to Start Learning
http://www.w3schools.com/sql/ w3schools is an excellent learning resource
http://msdn.microsoft.com Microsoft Developer Network is the “go to” place
for all coding questions for Microsoft SAMS Teach Yourself SQL in 10-Minutes
Excellent starting point (uses older SQL style) SAMS Teach Yourself Microsoft SQL Server T-SQL in 10-
Minutes (good intro for Microsoft SQL Server)
Where to Start Learning
\\vhacdwa10\BIPLTrainingContent\CDW_Data_101 Recorded Live Meetings
http://vaww.dwh.cdw.portal.va.gov/Pages/welcome.aspx CDW Home
CDW Best practices MetaData Relationship Diagrams Additional Training
References Forta, Ben. SAMS Teach Yourself Microsoft SQL Server T-SQL in 10-Minutes. Indianapolis, IN:
SAMS, 2008. Print. Forta, Ben. SAMS Teach Yourself SQL in 10-Minutes – Third Edition. Indianapolis, IN: SAMS,
2004. Print. Kline, Kevin, Hunt, Brand, and Kline, Daniel. SQL in a Nutshell – Third Edition. Cambridge,
MA: O’Reilly Media, 2008. Molinaro, Anthony. SQL Cookbook. Cambridge, MA: O’Reilly Media, 2005. Henderson, Ken. The guru's guide to Transact-SQL. Boston: Addison-Wesley, 2000. Print. Roman Steven. Access Database Design & Programming – Third Edition. Cambridge, MA:
O’Reilly Media, 2002. 11-124. Print Bluttman Ken, and Freeze Wayne S. Access Data Analysis Cookbook. Cambridge, MA:
O’Reilly Media, 2007. 1-106. Print Getz, Ken, Litwin, Paul, and Baron, Andy. Access Cookbook Cambridge, MA: O’Reilly Media,
2002. 1-56. Print Date CJ. SQL and Relational Theory. Cambridge, MA: O’Reilly Media, 2009. http://www.w3schools.com/sql/ http://msdn.microsoft.com
Questions