Instrumenting, Monitoring and Auditing of SSIS ETL Solutions SQL Bits 2009 - Manchester Davide Mauri [email protected]

Upload: michael-henry

Post on 29-Dec-2015


Instrumenting, Monitoring and Auditing of SSIS ETL Solutions

SQL Bits 2009 - Manchester

Davide Mauri

[email protected]

EXEC sp_help 'Davide Mauri'

• MCDBA, MCAD, MCT
• Microsoft SQL Server MVP
• Has worked with SQL Server since version 6.5
• Has worked on BI since 2003
• President of UGISS (Italian SQL Server UG)
• Mentor @ Solid Quality Mentors
  – Italian Subsidiary

Agenda

• ETL Story
• Logging SSIS in MS way
• Workarounds
• Logging SSIS in MY way
  – The developer’s corner
• Adding value to log data

© 2007 Solid Quality Mentors

ETL Story


ETL Story

• ETL processes grow in complexity
• Since packages won’t be run from BIDS in production, you need something that helps you:
  – Understand what went wrong when a package didn’t work as expected
    • And maybe this happens only at night…
  – Monitor the performance of your package to forecast its ability to stay within a given time frame

Logging SSIS in MS way

• Flexibility
  – You can decide what to log and where to log it
  – You have a lot of ready-to-use log providers
  – It is available out of the box
    • Well, you have to remember to activate it

Nice things


Logging SSIS in MS way

• Logging needs to be set up within the package
  – If you need to change logging, you need to edit the package in VS
• Too little information is given
  – No variable values
  – No expression results
  – No information on the Data Flow
  – Cannot handle chains of packages (>=2) very well
    • Problems using Parent Package Variables to propagate logging configuration (e.g. log file path)
    • You just lose information when you have more than 2 packages in the chain

Not-so-nice things


Logging SSIS in MS way

• DTExec seems to let you control logging at runtime
  – Unfortunately, you need to have a properly configured connection manager in advance

Things that don’t work as expected


Logging SSIS in MS way

• Some new features added with SQL 2008
  – SQLDumper
    • Too much detailed information on one hand, and again too little on the other

“Improved” things from 2005 to 2008


Logging SSIS in MS way

• Doesn’t really help you understand what went wrong
  – Too little information is given
• Hey, I’d like to log the Data Flow too! Do I really have to do everything by hand?
  – This can take a lot of time!
• I want to change my logged data. How can I do it without having to open the package in BIDS and release-test-deploy it?
  – You can’t!

Conclusions


Logging SSIS in MS way

• Use a specific task (Script, Custom or Execute SQL) before and after each task you want to instrument
• Create an event handler for each event you want to log (e.g. PreExecute, PostExecute)
  – Better still if you then use a tool to create SSIS templates and standardize them
    • Like MDDE (Metadata-driven ETL)
      – http://www.codeplex.com/SQLServerMDDEStudio/

Workarounds


Logging SSIS in MS way

DEMO 1. The usual way

Logging SSIS in MY way

• Basically, I’d like to have all the information that BIDS gives you, but outside BIDS.
• Now, if BIDS can, WE can
  – No magic here, you just need to know the APIs!
    • Just a little bit complex…but we’ll simplify things here
• The key is the Execute method of the Package class
  – In particular, the overloads that take the IDTSEvents interface parameter
    • Whose documentation is not very rich

Learn from BIDS


Logging SSIS in MY way

• IDTSEvents is implemented by the base class DefaultEvents
• We have to create a custom event handler class deriving from DefaultEvents and then override all default event handlers
• Use an instance of the newly created class as a parameter for the Execute method of the Package object
  – Now all events will be intercepted by our event handler!

Developer’s corner
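
The pattern on this slide (a base class with do-nothing handlers, a derived class that overrides them, and an instance handed to the runner) can be sketched in a few lines. This is only a language-neutral illustration in Python, not the SSIS API: the class names mirror the slide, but the method names, the `execute` function and the task list are all hypothetical stand-ins.

```python
class DefaultEvents:
    """Stand-in base class: every handler exists but does nothing."""

    def on_pre_execute(self, source):
        pass

    def on_post_execute(self, source):
        pass

    def on_error(self, source, message):
        pass


class LoggingEvents(DefaultEvents):
    """Overrides the handlers so that every event is recorded."""

    def __init__(self):
        self.seen = []

    def on_pre_execute(self, source):
        self.seen.append(("PreExecute", source))

    def on_post_execute(self, source):
        self.seen.append(("PostExecute", source))

    def on_error(self, source, message):
        self.seen.append(("Error", source, message))


def execute(tasks, events):
    """Toy 'package' runner: reports each step to the events object it was given."""
    for name, action in tasks:
        events.on_pre_execute(name)
        try:
            action()
        except Exception as exc:
            events.on_error(name, str(exc))
        events.on_post_execute(name)


def failing_step():
    # Hypothetical task that always fails, to trigger the error handler.
    raise RuntimeError("source file missing")


handler = LoggingEvents()
execute([("Truncate staging", lambda: None), ("Load file", failing_step)], handler)
for entry in handler.seen:
    print(entry)
```

Because the runner only knows the base class interface, swapping in a different handler changes what is logged without touching the package itself, which is the point the slide makes.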


Logging SSIS in MY way

• The event handler methods can call a custom method to log data
  – Beware! The SSIS runtime makes heavy use of threads
  – We have to deal with the fact that our class is used by different threads at the same time
    • We have to be sure that race conditions cannot occur
• We have to be fast to avoid impacting performance too much
  – Log the minimum for all events except errors
  – Log everything we can for errors
    • They should never happen

Developer’s corner
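
The two rules above (protect shared state from concurrent handlers, and log little for normal events but everything for errors) might look like this in a minimal Python sketch. The `EventLog` class, the `simulate_task` function and the variable names are assumptions made for illustration; the real runtime calls the handlers itself, and a .NET implementation would use its own locking primitives.

```python
import threading
import time


class EventLog:
    """Collects event records from many threads through a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.records = []

    def log(self, event, source, details=None):
        # Normal events carry only the minimum; errors keep everything we were given.
        record = {"time": time.time(), "event": event, "source": source}
        if event == "OnError" and details:
            record["details"] = dict(details)
        with self._lock:  # one writer at a time on the shared list
            self.records.append(record)


def simulate_task(log, name, fail=False):
    # Stand-in for the runtime calling our handlers from a worker thread.
    context = {"variables": {"RowLimit": 100}}
    log.log("OnPreExecute", name, context)
    if fail:
        log.log("OnError", name, {**context, "message": "boom"})
    log.log("OnPostExecute", name)


log = EventLog()
threads = [
    threading.Thread(target=simulate_task, args=(log, f"Task {i}", i == 3))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

errors = [r for r in log.records if r["event"] == "OnError"]
print(len(log.records), "records,", len(errors), "error(s)")
```

The record is built outside the lock and only the append happens inside it, which keeps the time each thread spends waiting as short as possible.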


Logging SSIS in MY way

• All containers will raise events
• Inside each event handler method we can access all runtime information for that container
  – Variables
  – Connections
  – Configurations
  – Properties
    • And their expressions

Developer’s corner


Logging SSIS in MY way

• Variables: use the Variables collection available in each container
• Connections: use the Connections collection (the ConnectionManager objects) available in the Package class
• Configurations: use the Configurations collection in the Package class
  – The EnableConfigurations property also tells you if a Package will try to look for “default” configurations

Developer’s corner


Logging SSIS in MY way

• Extracting properties is a bit tricky…
  – First we have to ask the container for its properties through the Properties collection of the IDTSPropertiesProvider interface
  – For each property we have to call GetValue on the property, passing the object that the property comes from as a parameter (!!!)

Developer’s corner


Logging SSIS in MY way

• Now, for Control Flow, we’re done. What about Data Flows?
• No specific native logging infrastructure...but BIDS is able to show us how many rows flow between components
  – …so this information is available somewhere!
• The Data Flow is able to generate events through the FireCustomEvent method

Developer’s corner


Logging SSIS in MY way

• Custom events are described by the EventInfo class
  – Every container has an EventInfos property (a collection of EventInfo)
• The key event here is the “OnPipelineRowsSent” data flow custom event
  – Here we have an array of objects that contains interesting things
    • For this event the array contains 8 entries

Developer’s corner


Logging SSIS in MY way

• OnPipelineRowsSent payload
  – Source Object (e.g. System.__ComObject)
  – DataFlow Object ID (e.g. 140)
  – DataFlow Object Name (e.g. OLE DB Source Output)
  – Object ID (e.g. 134)
  – Object Name (e.g. TransformationName)
  – Input Object ID (e.g. 135)
  – Input Object Name (e.g. Derived Column Input)
  – Row Count (e.g. 744)
• Not documented in EventInfo

Developer’s corner
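
Since the payload is a positional array, it helps to give each of the 8 entries a name as soon as the event arrives. A minimal Python sketch, assuming the order listed on the slide; the field names and the `decode_rows_sent` helper are hypothetical, and a plain `object()` stands in for the opaque COM source object.

```python
from typing import NamedTuple


class RowsSent(NamedTuple):
    source_object: object
    dataflow_object_id: int
    dataflow_object_name: str
    object_id: int
    object_name: str
    input_object_id: int
    input_object_name: str
    row_count: int


def decode_rows_sent(arguments):
    """Turn the 8-entry argument array into a named record."""
    if len(arguments) != 8:
        raise ValueError(f"expected 8 entries, got {len(arguments)}")
    return RowsSent(*arguments)


# Example values taken from the slide.
payload = [object(), 140, "OLE DB Source Output", 134, "TransformationName",
           135, "Derived Column Input", 744]
event = decode_rows_sent(payload)
print(f"{event.dataflow_object_name} -> {event.input_object_name}: "
      f"{event.row_count} rows")
```

Checking the length up front means a change in the undocumented payload shows up as an explicit error rather than as silently shifted fields.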


Logging SSIS in MY way

• So, by filtering on custom events we’re able to profile the entire DataFlow!
  – On a per-buffer basis
• We can also count how many times a DataFlow has been invoked when placed in a For Loop or Foreach Loop container
  – Together with knowledge of the variable values, this gives us information on the impact of each iteration

Developer’s corner
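
As a rough illustration of both ideas, the Python sketch below sums the per-buffer row counts for each path and counts how often a Data Flow ran inside a loop. The event tuples are invented sample data, and treating each "OnPreExecute" as the start of a new iteration is an assumption of this sketch, not something the slides state.

```python
from collections import defaultdict

# Each tuple is (event name, data flow task, path name, rows in this buffer).
# "OnPreExecute" is assumed to mark the start of one execution of the task.
events = [
    ("OnPreExecute", "Load Orders", None, 0),
    ("OnPipelineRowsSent", "Load Orders", "OLE DB Source Output", 500),
    ("OnPipelineRowsSent", "Load Orders", "OLE DB Source Output", 244),
    ("OnPreExecute", "Load Orders", None, 0),
    ("OnPipelineRowsSent", "Load Orders", "OLE DB Source Output", 120),
]

executions = defaultdict(int)            # task -> number of times it ran
rows_per_iteration = defaultdict(list)   # (task, path) -> rows for each run

for name, task, path, rows in events:
    if name == "OnPreExecute":
        executions[task] += 1
    elif name == "OnPipelineRowsSent":
        totals = rows_per_iteration[(task, path)]
        # Pad with zeros so that index i always means "iteration i + 1".
        while len(totals) < max(executions[task], 1):
            totals.append(0)
        totals[-1] += rows

for (task, path), totals in rows_per_iteration.items():
    print(f"{task} / {path}: {executions[task]} run(s), rows per run {totals}")
```

Keeping one total per iteration, rather than one grand total, is what makes it possible to line the row counts up with the variable values of each loop pass.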


Logging SSIS in MY way

DEMO 2. Show me the code!

Logging SSIS in MY way

• The result is DTLoggedExec
  – Current version 0.2.1.5 beta
• Logs everything needed
  – Package version
  – Variable values
  – Property expressions
  – Data Flow profiling

DTLoggedExec


Logging SSIS in MY way

• Additional features
  – Handles long package chains correctly
  – Supports the majority of DTExec options
  – Pluggable architecture
    • Easy to create custom Log Providers
    • In future it will also be possible to add custom Data Flow profilers
• Supported platforms
  – All platforms and architectures are supported
    • 2005, 2008
    • x86, x64, IA64

DTLoggedExec


Logging SSIS in MY way

DEMO 3. Test it!

Logging SSIS in MY way

• Profiled data from DataFlow packages can be huge…better to put it into a database
• DTLoggedExec comes with a full set of scripts and batch files to create a specific database and to bulk load data
  – Currently only data profiled from DataFlows can be imported
  – In the near future, data from the CSV log provider will also have its place here
    • 99% done, testing in progress

DTLoggedExec DB
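
To give an idea of what loading profiled data into a database buys you, here is a small self-contained Python sketch using an in-memory SQLite database. It is not the DTLoggedExec database or its scripts, which target SQL Server; the CSV layout, the table name and the column names are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical profile export: one line per OnPipelineRowsSent event.
profile_csv = io.StringIO(
    "package,dataflow,path,row_count\n"
    "LoadDWH,Load Orders,OLE DB Source Output,500\n"
    "LoadDWH,Load Orders,OLE DB Source Output,244\n"
    "LoadDWH,Load Customers,Flat File Source Output,90\n"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataflow_profile ("
    "package TEXT, dataflow TEXT, path TEXT, row_count INTEGER)"
)

rows = [
    (r["package"], r["dataflow"], r["path"], int(r["row_count"]))
    for r in csv.DictReader(profile_csv)
]
# executemany sends the whole batch in one call instead of one statement per row.
conn.executemany("INSERT INTO dataflow_profile VALUES (?, ?, ?, ?)", rows)
conn.commit()

totals = conn.execute(
    "SELECT dataflow, SUM(row_count) FROM dataflow_profile "
    "GROUP BY dataflow ORDER BY dataflow"
).fetchall()
print(totals)
```

Once the per-buffer records sit in a table, questions such as "how many rows did each Data Flow move" become a single GROUP BY instead of a pass over a large log file.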


Logging SSIS in MY way

DEMO 4. Load profiled data

Logging SSIS in MY way

• Control Flow
  – Performance is affected by the amount of logging you decide to have
• Data Flow
  – Impact of performing dataflow profiling: < 5%
• DTLoggedExec can be improved to have even less impact if needed
  – Better buffering

Performance?

Logging SSIS in MY way

• DTLoggedExec is under a Creative Commons license
  – Anyone can contribute
• Official homepage
  – http://dtloggedexec.davidemauri.it
  – Wiki with documentation
• Download, source code, issues and forum
  – http://dtloggedexec.codeplex.com/

Support


Adding value

• “Native” auditing
  – When, by whom and how was a row imported into my DWH?
• Performance monitoring of a single package
  – Or Data Flow
• Performance monitoring over time
• Easy to monitor discarded rows
  – Very useful in a dashboard

• Monitor SLA
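
Performance monitoring over time and SLA checks both reduce to simple arithmetic on logged start and end times. The Python sketch below is only an illustration: the run data, the two-hour window and the package name are invented, and in practice the rows would come from the log database.

```python
from datetime import datetime, timedelta

SLA = timedelta(hours=2)  # assumed batch window

# Hypothetical (package, start, end) rows as they might be read from the log.
runs = [
    ("LoadDWH", datetime(2009, 3, 1, 1, 0), datetime(2009, 3, 1, 2, 30)),
    ("LoadDWH", datetime(2009, 3, 2, 1, 0), datetime(2009, 3, 2, 2, 45)),
    ("LoadDWH", datetime(2009, 3, 3, 1, 0), datetime(2009, 3, 3, 3, 10)),
]

durations = [end - start for _, start, end in runs]
breaches = [
    (name, start.date(), end - start)
    for name, start, end in runs
    if end - start > SLA
]
# Average growth per run: a crude indicator of how fast the window is being used up.
growth = (durations[-1] - durations[0]) / (len(durations) - 1)

for name, day, took in breaches:
    print(f"{name} on {day}: {took} exceeds the {SLA} window")
print("average growth per run:", growth)
```

The growth figure is deliberately naive; it is here to show that a trend can be read from the same data that detects a breach.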


Logging SSIS in MY way

DEMO 5. Adding value

Questions & Answers

DTLoggedExec

References

• DTLoggedExec
  – http://dtloggedexec.davidemauri.it
• Jamie Thomson, “Custom Logging Using Event Handlers”
  – http://blogs.conchango.com/jamiethomson/archive/2005/06/11/SSIS_3A00_-Custom-Logging-Using-Event-Handlers.aspx
• Andy Leonard, “ETL Instrumentation”
  – http://sqlblog.com/blogs/andy_leonard/archive/tags/ETL+Instrumentation/default.aspx

Thanks! DTLoggedExec
