TRANSCRIPT
Instrumenting, Monitoring and Auditing of SSIS ETL Solutions
SQL Bits 2009 - Manchester
Davide Mauri
EXEC sp_help 'Davide Mauri'
• MCDBA, MCAD, MCT
• Microsoft SQL Server MVP
• Has worked with SQL Server since version 6.5
• Has worked on BI since 2003
• President of UGISS (Italian SQL Server UG)
• Mentor @ Solid Quality Mentors
  – Italian Subsidiary
Agenda
• ETL Story
• Logging SSIS in MS way
• Workarounds
• Logging SSIS in MY way
  – The developer’s corner
• Adding value to log data
© 2007 Solid Quality Mentors
ETL Story
• ETL processes grow in complexity
• Since packages won’t be run from BIDS in production, you need something that helps you to
  – Understand what went wrong when a package didn’t work as expected
    • And maybe this happens only at nighttime…
  – Monitor the performance of your package to forecast its ability to stay within a given timeframe
Logging SSIS in MS way
• Flexibility
  – You can decide what to log and where to log it
  – You have a lot of ready-to-use log providers
  – It is available out of the box
    • Well, you have to remember to activate it
Nice things
Logging SSIS in MS way
• Logging needs to be set up within the package
  – If you need to change logging you need to edit the package in VS
• Too little information given
  – No variable values
  – No expression results
  – No information on Data Flow
  – Cannot handle chains of packages very well (>=2)
    • Problems using Parent Package Variables to propagate logging configuration (e.g. log file path)
    • You just lose information when you have more than 2 packages in the chain
Not-so-nice things
Logging SSIS in MS way
• DTExec seems to allow you to control logging at runtime
  – Unfortunately you need to have a properly configured connection manager in advance
Things that don’t work as expected
Logging SSIS in MS way
• Some new features added with SQL 2008
  – SQLDumper
    • Too much detailed information on the one hand, and again too little on the other
“Improved” things from 2005 to 2008
Logging SSIS in MS way
• Doesn’t really help you understand what went wrong
  – Too little information given
• Hey, I’d like to log the Data Flow too! Do I really have to do everything by hand?
  – This can take a lot of time!
• I want to change my logged data. How can I do it without having to open the package in BIDS and release-test-deploy it?
  – You can’t!
Conclusions
Logging SSIS in MS way
• Use a specific task (Script, Custom or Execute SQL) before and after each task you want to instrument
• Create an event handler for each event you want to log (e.g. PreExecute, PostExecute)
  – Better if you then use a tool to create SSIS templates and standardize them
    • Like MDDE (Metadata-driven ETL)
      – http://www.codeplex.com/SQLServerMDDEStudio/
Workarounds
Logging SSIS in MY way
• Basically I’d like to have all the information that BIDS gives you, but outside BIDS.
• Now, if BIDS can, WE can
  – No magic here, we just need to know the APIs!
    • Just a little bit complex…but we’ll simplify things here
• The key is the Execute method of the Package class
  – In particular the overloads that take the IDTSEvents interface parameter
    • Whose documentation is not very rich
Learn from BIDS
Logging SSIS in MY way
• IDTSEvents is implemented by the base class DefaultEvents
• We have to create a custom event handler class deriving from DefaultEvents and then override all default event handlers
• Use an instance of the newly created class as a parameter for the Execute method on the Package object
  – Now all events will be intercepted by our event handler!
Developer’s corner
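The override-and-pass-in pattern can be sketched outside .NET. This is a minimal Python illustration of the idea only, not the SSIS API: the classes `DefaultEvents`, `LoggingEvents` and `Package` below are simplified stand-ins, and their method names and signatures are assumptions made for the example.

```python
# Sketch of the pattern only: a base class with do-nothing handlers,
# a subclass that overrides them, and an execute() call that receives
# the handler instance. All names here are illustrative stand-ins.

class DefaultEvents:
    """Base class: every handler exists but does nothing."""

    def on_pre_execute(self, source):
        pass

    def on_post_execute(self, source):
        pass

    def on_error(self, source, message):
        pass


class LoggingEvents(DefaultEvents):
    """Custom handler: overrides each handler and records what it sees."""

    def __init__(self):
        self.records = []

    def on_pre_execute(self, source):
        self.records.append(("PreExecute", source))

    def on_post_execute(self, source):
        self.records.append(("PostExecute", source))

    def on_error(self, source, message):
        self.records.append(("Error", source, message))


class Package:
    """Hypothetical stand-in for a package that runs a list of named tasks."""

    def __init__(self, tasks):
        self.tasks = tasks  # list of (name, callable)

    def execute(self, events):
        # Every task reports to the handler object passed in by the caller.
        for name, task in self.tasks:
            events.on_pre_execute(name)
            try:
                task()
            except Exception as exc:
                events.on_error(name, str(exc))
            events.on_post_execute(name)


def failing_task():
    raise RuntimeError("boom")


events = LoggingEvents()
Package([("Load", lambda: None), ("Transform", failing_task)]).execute(events)
```

The caller never changes the package itself: what gets logged is decided entirely by the handler object handed to `execute`.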
Logging SSIS in MY way
• The event handler methods can call a custom method to log data
  – Beware! The SSIS runtime makes heavy use of threads
  – We have to deal with the fact that our class is used by different threads at the same time.
    • We have to be sure that race conditions cannot occur
• We have to be fast to avoid impacting performance too much
  – Log the minimum for all events except errors
  – Log everything we can for errors
    • They should never happen
Developer’s corner
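The two constraints above (many threads calling the same handler, and cheap logging except on errors) can be sketched as follows. This is a Python illustration under assumptions, not DTLoggedExec code: the `EventLog` class, its method names and the `BatchId` variable are hypothetical.

```python
import threading


class EventLog:
    """Shared log that many threads may write to at the same time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.entries = []

    def log(self, event, source, details=None):
        # Build the record outside the lock, then hold the lock only
        # for the append, so threads block each other as little as possible.
        record = (event, source, details)
        with self._lock:
            self.entries.append(record)

    def on_progress(self, source):
        # Ordinary events: log the minimum.
        self.log("Progress", source)

    def on_error(self, source, message, variables):
        # Errors: log everything available, including a copy of the variables.
        self.log("Error", source, {"message": message, "variables": dict(variables)})


log = EventLog()


def worker(name):
    for i in range(1000):
        log.on_progress(f"{name}-task-{i}")


threads = [threading.Thread(target=worker, args=(f"thread-{n}",)) for n in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Hypothetical error with a hypothetical variable value.
log.on_error("thread-0-task-0", "sample failure", {"BatchId": 42})
```

Keeping the critical section down to a single append is the design choice that matters here: anything expensive (formatting, copying variables) happens before the lock is taken.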
Logging SSIS in MY way
• All containers will raise events
• Inside each event handler method we can access all runtime information for that container
  – Variables
  – Connections
  – Configurations
  – Properties
    • And their expressions
Developer’s corner
Logging SSIS in MY way
• Variables: use the Variables collection available in each container
• Connections: use the Connections collection (of ConnectionManager objects) available in the Package class
• Configurations: use the Configurations collection in the Package class
  – The EnableConfigurations property also tells you if a Package will try to look for “default” configurations
Developer’s corner
Logging SSIS in MY way
• Extracting properties is a bit tricky…
  – First we have to ask the container for its properties through the Properties collection of the IDTSPropertiesProvider interface
  – For each property we have to call the GetValue method on the property, passing the object the property comes from as a parameter (!!!)
Developer’s corner
Logging SSIS in MY way
• Now, for Control Flow, we’re done. What about Data Flows?
• No specific native logging infrastructure...but BIDS is able to show us how many rows flow between components
  – …so this information is available somewhere!
• DataFlow is able to generate events through the FireCustomEvent method
Developer’s corner
Logging SSIS in MY way
• Custom events are described by the EventInfo class
  – Every container has an EventInfos property (a collection of EventInfo)
• The key event here is the “OnPipelineRowsSent” data flow custom event
  – Here we have an array of objects that contains interesting things
    • For this event the array contains 8 entries
Developer’s corner
Logging SSIS in MY way
• OnPipelineRowsSent payload
  – Source Object (e.g. System.__ComObject)
  – DataFlow Object ID (e.g. 140)
  – DataFlow Object Name (e.g. OLE DB Source Output)
  – Object ID (e.g. 134)
  – Object Name (e.g. TransformationName)
  – Input Object ID (e.g. 135)
  – Input Object Name (e.g. Derived Column Input)
  – Row Count (e.g. 744)
• Not documented in EventInfo
Developer’s corner
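Since the payload is a positional array, giving each of the 8 entries a name makes the logging code easier to read. A minimal Python sketch, assuming the entries arrive in exactly the order listed on the slide; the `RowsSent` type and `decode_rows_sent` function are hypothetical names, and the sample values are the ones shown above.

```python
from collections import namedtuple

# Field names follow the order of the payload entries listed on the slide.
RowsSent = namedtuple("RowsSent", [
    "source_object",
    "dataflow_object_id",
    "dataflow_object_name",
    "object_id",
    "object_name",
    "input_object_id",
    "input_object_name",
    "row_count",
])


def decode_rows_sent(arguments):
    """Turn the positional event arguments into a named record."""
    if len(arguments) != 8:
        raise ValueError(f"expected 8 entries, got {len(arguments)}")
    return RowsSent(*arguments)


# Sample payload built from the example values on the slide.
sample = [
    "System.__ComObject", 140, "OLE DB Source Output",
    134, "TransformationName", 135, "Derived Column Input", 744,
]
event = decode_rows_sent(sample)
print(event.dataflow_object_name, "->", event.input_object_name, ":", event.row_count)
```

The length check is deliberate: because the layout is not documented, failing loudly on an unexpected shape is safer than silently mislabeling fields.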
Logging SSIS in MY way
• So, by filtering on Custom Events we’re able to profile the entire DataFlow!
  – On a per-buffer basis
• We can also count how many times a DataFlow has been invoked when placed in a For Loop or Foreach Loop container
  – Together with the knowledge of variable values, this gives us information on the impact of each iteration
Developer’s corner
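Because one event arrives per buffer, profiling comes down to summing row counts per path and per iteration. A minimal Python sketch, assuming each event has already been reduced to a tuple of (iteration number, output name, input name, row count) and that the iteration number is tracked separately, for example by counting how often the DataFlow task starts. The `profile` function and the sample figures are made up for illustration.

```python
from collections import defaultdict


def profile(events):
    """Aggregate per-buffer events into row and buffer totals per path and iteration."""
    rows = defaultdict(int)
    buffers = defaultdict(int)
    for iteration, output_name, input_name, row_count in events:
        key = (iteration, output_name, input_name)
        rows[key] += row_count   # total rows sent along this path
        buffers[key] += 1        # one event per buffer
    return dict(rows), dict(buffers)


# Made-up sample: the same DataFlow runs twice inside a loop.
sample = [
    (1, "OLE DB Source Output", "Derived Column Input", 744),
    (1, "OLE DB Source Output", "Derived Column Input", 256),
    (2, "OLE DB Source Output", "Derived Column Input", 500),
]

rows, buffers = profile(sample)
invocations = len({iteration for iteration, _, _, _ in sample})

for key in sorted(rows):
    print(key, "rows:", rows[key], "buffers:", buffers[key])
print("DataFlow invocations:", invocations)
```

Joining these totals with the variable values captured at each iteration is what makes it possible to tell which iteration was the expensive one.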
Logging SSIS in MY way
• The result is DTLoggedExec
  – Current version 0.2.1.5 beta
• Log everything needed
  – Package version
  – Variable values
  – Properties’ expressions
  – DataFlow profiling
DTLoggedExec
Logging SSIS in MY way
• Additional features
  – Handles long package chains correctly
  – Supports the majority of DTExec options
  – Pluggable architecture
    • Easy to create custom Log Providers
    • In the future it will also be possible to add custom Data Flow profilers
• Supported platforms
  – All platforms and architectures are supported
    • 2005, 2008
    • x86, x64, IA64
DTLoggedExec
Logging SSIS in MY way
• Profiled data from DataFlow packages can be huge…better to put it into a database
• DTLoggedExec comes with a full set of scripts and batch files to create a specific database and to bulk load data
  – Currently only data profiled from DataFlows can be imported
  – In the near future data from the CSV log provider will also have its place here
    • 99% done, testing in progress
DTLoggedExec DB
Logging SSIS in MY way
• Control Flow
  – Performance is affected by the amount of logging you decide to have
• Data Flow
  – Impact of performing dataflow profiling: < 5%
• DTLoggedExec can be improved to have even less impact if needed
  – Better buffering
Performance?
Logging SSIS in MY way
• DTLoggedExec is under a Creative Commons license
  – Anyone can contribute
• Official homepage
  – http://dtloggedexec.davidemauri.it
  – Wiki with documentation
• Download, source code, issues and forum
  – http://dtloggedexec.codeplex.com/
Support
Adding value
• “Native” Auditing
  – When, by whom and how has a row been imported into my DWH?
• Performance monitoring of a single package
  – Or Dataflow
• Performance monitoring over time
• Easy to monitor discarded rows
  – Very useful in a dashboard
• Monitor SLA
References
• DTLoggedExec
  – http://dtloggedexec.davidemauri.it
• Jamie Thomson, “Custom Logging Using Event Handlers”
  – http://blogs.conchango.com/jamiethomson/archive/2005/06/11/SSIS_3A00_-Custom-Logging-Using-Event-Handlers.aspx
• Andy Leonard, “ETL Instrumentation”
  – http://sqlblog.com/blogs/andy_leonard/archive/tags/ETL+Instrumentation/default.aspx