David Weston SSIS Portfolio


Uploaded by dlweston on 24-Dec-2014


Description: Sample ETL Package



Name: David Weston
Email: [email protected]
Phone: (617) 692-0608

Business Intelligence Portfolio

SQL Server Integration Services (SSIS)


Table of Contents

Overview

Data Model

Sample Package – Timesheet

Sample Package – Timesheet: Control Flow

Sample Package – Timesheet: Data Flow

Master Package

Database Maintenance Package

SQL Server Agent Job


AllWorks Inc.

• Introduction:

AllWorks currently uses spreadsheets and Oracle data (exported as XML) as part of its systems. Employee and client geography data, along with overhead and job order master data, are stored in spreadsheets. The feed of material purchases comes from an XML file, and the timesheet data comes from CSV files.

• Project Goals:

Based on the file structures of these spreadsheets and XML files, create a normalized (3NF) OLTP database to hold all the data in the source files. Create an SSIS package for each table in the data model to read in the source files, validate the data, and load it. Create a master package to run all the individual packages in the appropriate order. Then create a database maintenance package to back up, shrink, and re-index the database after each load. Finally, create and schedule a SQL Server Agent job to run the master package nightly. The initial run will populate the tables for the first time, and updates will occur nightly.


Part 1: Create the Data Model

Note: AllWorks currently allows only one work order project per invoice. It wants a single invoice to be able to cover multiple projects and, when payment is received, to track how much was received for each project on each invoice. Because this makes the relationship between projects and invoices many-to-many, a cross-reference (bridge) table was created between the Project and Invoice tables.
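To make the bridge-table design concrete, here is a minimal sketch in Python using the standard-library sqlite3 module rather than SQL Server. The table and column names (Project, Invoice, InvoiceProject, AmountReceived) and the sample rows are assumptions for illustration, not the actual AllWorks schema. The composite primary key on the bridge table permits each invoice/project pair exactly once, which is what lets one invoice cover many projects while still recording the amount received per project.

```python
import sqlite3

# Hypothetical, simplified schema: table and column names are illustrative,
# not taken from the actual AllWorks data model.
SCHEMA = """
CREATE TABLE Project (
    ProjectID   INTEGER PRIMARY KEY,
    Description TEXT NOT NULL
);
CREATE TABLE Invoice (
    InvoiceID   INTEGER PRIMARY KEY,
    InvoiceDate TEXT NOT NULL
);
-- Bridge table: one row per (invoice, project) pair, so an invoice can cover
-- many projects and a project can appear on many invoices.
CREATE TABLE InvoiceProject (
    InvoiceID      INTEGER NOT NULL REFERENCES Invoice (InvoiceID),
    ProjectID      INTEGER NOT NULL REFERENCES Project (ProjectID),
    AmountReceived REAL    NOT NULL DEFAULT 0,
    PRIMARY KEY (InvoiceID, ProjectID)
);
"""


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Project VALUES (?, ?)",
                     [(1, "Warehouse roof"), (2, "Office remodel")])
    conn.execute("INSERT INTO Invoice VALUES (?, ?)", (100, "2014-12-01"))
    # One invoice covering two projects, with payment tracked per project.
    conn.executemany("INSERT INTO InvoiceProject VALUES (?, ?, ?)",
                     [(100, 1, 5000.0), (100, 2, 1500.0)])
    conn.commit()
    return conn


def received_by_project(conn, invoice_id):
    """Return {ProjectID: AmountReceived} for one invoice."""
    cur = conn.execute(
        "SELECT ProjectID, AmountReceived FROM InvoiceProject "
        "WHERE InvoiceID = ? ORDER BY ProjectID", (invoice_id,))
    return dict(cur.fetchall())
```

In SQL Server the same shape would use a composite primary key and two foreign keys on the bridge table; only the SQL dialect changes.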


Part 2: Create SSIS packages

Create an SSIS package to load each table in the data model, except MaterialType, which is static and currently contains only three valid material types. Data must be validated as it is loaded, and invalid data should be written to log files. A success or failure email should be sent when each package completes. Success emails should contain the number of rows updated, the number of rows inserted, and the number of invalid rows.

Sample Package – Timesheet

The Timesheet package should loop through all timesheet files in the /time folder and load their data into the Timesheet table. Once processed, each file should be moved to the /time/processed folder.
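The folder-processing behavior can be sketched in Python as follows. This is not SSIS code: it assumes the EmpTime*.csv naming convention described for the control flow, and the load_file callback is a hypothetical stand-in for the package's data flow. Loading a file before moving it means a file that fails to load stays in /time for the next run.

```python
import shutil
import tempfile
from pathlib import Path


def process_timesheet_folder(time_dir, load_file):
    """Load each EmpTime*.csv file in time_dir, then move it to time_dir/processed.

    load_file is a hypothetical callback standing in for the package's data
    flow; if it raises, the loop stops and the current file stays in place.
    """
    time_dir = Path(time_dir)
    processed_dir = time_dir / "processed"
    processed_dir.mkdir(exist_ok=True)
    moved = []
    # glob() is non-recursive, so files already in processed/ are not re-read;
    # sorted() builds the full list before any file is moved.
    for csv_path in sorted(time_dir.glob("EmpTime*.csv")):
        load_file(csv_path)
        target = processed_dir / csv_path.name
        shutil.move(str(csv_path), str(target))
        moved.append(target.name)
    return moved


if __name__ == "__main__":
    # Demo against a throwaway folder.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("EmpTime1.csv", "EmpTime2.csv", "notes.txt"):
            Path(tmp, name).write_text("EmployeeID,ProjectID,WorkDate,Hours\n")
        print(process_timesheet_folder(tmp, lambda p: None))
```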


Timesheet Package: Control Flow

The first step cleans up any old log files left over from prior runs.

A Foreach Loop Container then loops through all timesheet files (named according to the pattern EmpTime*.csv) and processes them one at a time.

A Script Task accumulates the inserted, updated, and invalid row totals across all processed files.

Finally, a success or failure email is sent with the row counts.
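The loop, tally, and notification steps can be approximated in Python, as below. The RowCounts class, the run_timesheet_loop function, and the email wording are all hypothetical; the sketch only mirrors the idea that per-file counts are summed into package-level totals and that a single success or failure message is produced at the end. Old-log cleanup and the actual sending of email are left out.

```python
from dataclasses import dataclass


@dataclass
class RowCounts:
    inserted: int = 0
    updated: int = 0
    invalid: int = 0

    def add(self, other):
        """Fold one file's counts into the running package totals."""
        self.inserted += other.inserted
        self.updated += other.updated
        self.invalid += other.invalid


def run_timesheet_loop(files, process_file):
    """Process files one at a time and return (status, email_body).

    process_file is a hypothetical stand-in for the per-file data flow; it
    must return a RowCounts for the file it processed.
    """
    totals = RowCounts()
    try:
        for path in files:
            totals.add(process_file(path))
    except Exception as exc:  # any failure in the loop -> failure email
        return "FAILURE", f"Timesheet load failed: {exc}"
    body = (f"Timesheet load succeeded. Inserted: {totals.inserted}, "
            f"Updated: {totals.updated}, Invalid: {totals.invalid}")
    return "SUCCESS", body
```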


Timesheet Package: Data Flow


The first steps of the data flow read the source file, convert the data to SQL Server data types, and then use a Lookup transformation to determine whether each record is an update or an insert.

Update Logic: If a source record matches an existing row, a Conditional Split transformation checks whether the record actually contains changed data, so that unnecessary updates are avoided. A Row Count transformation then stores the number of changed rows in the UpdatedRows variable, which the control flow uses to accumulate total row counts. Finally, an OLE DB Command transformation updates the EmpTime table.
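The lookup and change-detection logic amounts to a keyed comparison. Here is a minimal Python sketch, assuming a hypothetical natural key of (EmployeeID, ProjectID, WorkDate) and a single compared column, Hours; the real package's key and columns may differ. The dictionary lookup plays the role of the Lookup transformation, and the field comparison plays the role of the Conditional Split.

```python
def split_updates_and_inserts(source_rows, existing, key_fields, compare_fields):
    """Classify source rows as inserts, changed updates, or unchanged.

    existing maps a key tuple to the current target row (a dict); the key and
    compared field names are supplied by the caller.
    """
    inserts, updates, unchanged = [], [], 0
    for row in source_rows:
        key = tuple(row[k] for k in key_fields)
        current = existing.get(key)  # Lookup: does the row already exist?
        if current is None:
            inserts.append(row)  # no match -> insert path
        elif any(row[f] != current[f] for f in compare_fields):
            updates.append(row)  # Conditional Split: only rows with changed data
        else:
            unchanged += 1  # identical -> skip the update
    return inserts, updates, unchanged
```

Here len(updates) corresponds to the value the Row Count transformation would store in UpdatedRows.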


Insert Logic: Incoming records that are inserts must be validated before they are loaded. First, a Lookup transformation checks that the ProjectID is valid; the EmployeeID is then validated as well. The final validation ensures that the timesheet's WorkDate is earlier than the project's Close Date and that the project's Closed flag has not been set to true. Records that pass validation are counted (again into a variable that the control flow tallies) and then inserted into the EmpTime table.

Invalid records are sent to a log file for inspection.
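The three insert validations can be expressed as a small Python function. The shape of the projects dictionary (ProjectID mapped to a close date and a closed flag) is an assumption, as is treating a missing close date as an open project. Each invalid row is returned with a reason, which is the kind of information one would write to the log file.

```python
from datetime import date


def validate_inserts(rows, projects, employee_ids):
    """Split insert candidates into (valid, invalid).

    projects maps ProjectID -> (close_date, closed_flag); close_date may be
    None, which this sketch treats as "not yet closed" (an assumption).
    invalid holds (row, reason) pairs destined for the log file.
    """
    valid, invalid = [], []
    for row in rows:
        project = projects.get(row["ProjectID"])
        if project is None:  # first check: ProjectID lookup
            invalid.append((row, "unknown ProjectID"))
            continue
        if row["EmployeeID"] not in employee_ids:  # second check: EmployeeID
            invalid.append((row, "unknown EmployeeID"))
            continue
        close_date, closed_flag = project
        # Final check: WorkDate must be before the close date, and the
        # project must not be flagged as closed.
        if closed_flag or (close_date is not None and row["WorkDate"] >= close_date):
            invalid.append((row, "project closed"))
            continue
        valid.append(row)
    return valid, invalid


if __name__ == "__main__":
    projects = {10: (date(2014, 12, 31), False)}
    rows = [{"EmployeeID": 1, "ProjectID": 10, "WorkDate": date(2015, 1, 5)}]
    valid, invalid = validate_inserts(rows, projects, {1})
    print(len(valid), [reason for _, reason in invalid])
```

The number of inserted rows reported to the control flow would then be len(valid).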


Master Package

The master package, pictured to the left, uses a Sequence Container to execute the individual packages in an order determined by their data dependencies. Packages that have no dependency on one another can run simultaneously. When the sequence completes, the master package executes the Database Maintenance package.
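Ordering packages by data dependency is a topological sort. The Python sketch below uses the standard-library graphlib module (Python 3.9+) to group packages into waves, where packages in the same wave do not depend on one another and could run simultaneously. The package names and dependencies are hypothetical, and the sketch illustrates the ordering idea only; it is not how SSIS schedules tasks inside a Sequence Container.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each package lists the packages whose tables
# it references (for example, timesheet rows reference employees and projects).
DEPENDENCIES = {
    "Employee": set(),
    "Client": set(),
    "Project": {"Client"},
    "Invoice": {"Client"},
    "InvoiceProject": {"Invoice", "Project"},
    "MaterialPurchase": {"Project"},
    "Timesheet": {"Employee", "Project"},
}


def execution_waves(dependencies):
    """Group packages into waves in dependency order.

    Packages within one wave do not depend on each other, so they could run
    simultaneously; running the waves one after another respects every
    dependency. Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(DEPENDENCIES), start=1):
        print(number, wave)
    print("then: DatabaseMaintenance")
```

For this sample map the first wave contains Client and Employee, and the Database Maintenance package would run after the final wave.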


Database Maintenance Package

The final package to execute is the Database Maintenance package, which performs routine maintenance on the database after a large number of records have been loaded. It shrinks the database, rebuilds indexes, updates statistics, and backs up the database. Rebuilding indexes and updating statistics ensures that the SQL Server query optimizer works from current index and statistics information when it chooses execution plans for future queries.
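As an illustration of the maintenance steps, the Python sketch below assembles the corresponding T-SQL statements in the order the slide lists them (DBCC SHRINKDATABASE, ALTER INDEX ... REBUILD, sp_updatestats, BACKUP DATABASE). The database name, table list, and backup path are hypothetical values supplied by the caller, and names are not quoted, so they must be plain identifiers. Shrinking before rebuilding matters: a shrink fragments indexes, so running it after the rebuild would undo much of the rebuild's benefit.

```python
def maintenance_statements(database, tables, backup_path):
    """Return T-SQL for shrink, index rebuild, statistics update, and backup.

    database and tables are assumed to be plain (unquoted) identifiers such as
    AllWorks and dbo.EmpTime; backup_path is a server-side file path.
    """
    safe_path = backup_path.replace("'", "''")  # escape quotes for the N'...' literal
    statements = [
        f"DBCC SHRINKDATABASE ({database});",  # shrink first: it fragments indexes
        f"USE {database};",  # remaining steps run in the target database
    ]
    statements += [f"ALTER INDEX ALL ON {table} REBUILD;" for table in tables]
    statements.append("EXEC sp_updatestats;")  # update out-of-date statistics in the current database
    # WITH INIT overwrites any earlier backup sets in the same file.
    statements.append(f"BACKUP DATABASE {database} TO DISK = N'{safe_path}' WITH INIT;")
    return statements


if __name__ == "__main__":
    print("\n".join(maintenance_statements(
        "AllWorks", ["dbo.EmpTime"], r"C:\Backups\AllWorks.bak")))
```

In an SSIS package these statements would typically live in Execute SQL Tasks or a Maintenance Plan; generating them in Python here is only a way to show the sequence.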


SQL Server Agent Job

Finally, all packages were deployed to SQL Server, and a SQL Server Agent job was created to run the master package daily at noon.