ssis design pattern

Upload: vermasaketpramod7132

Post on 04-Apr-2018

217 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/31/2019 SSIS Design Pattern

    1/6

    Design Pattern: Avoiding Transactions

    Design Patterns

    byMatt

    SSIS has built-in support for transactions, but their use isnt ideal for all scenarios. This postcovers how SSIS transactions work at a high level, some of the challenges associated with them,

    and provides some design alternatives.

    Disclaimer: Many SSIS packages use transactions without issue. Using the built-in functionality

    is the easiest approach, and (typically) doesnt require a redesign of your package logic to

    implement. If your packages are currently using transactions, please do not feel the need to go in

    and change anything! If, however, you are looking to improve the performance of your packages,

    or are encountering the issues I describe below, consider using the design patterns presented

    here.

    How SSIS Transactions Work

    SSIS transactions are based on theMicrosoft Distributed Transaction Coordinator(MSDTC).

    They areenabledat the Task/Container level by setting the TransactionOption property to

    Required. For a transaction to work at runtime, MSDTC must be enabled (and configured) onall machines involved in the transaction (i.e. the source machine, the destination, and the

    machine running the SSIS package and of course, in some environments, this might all be thesame machine).

    Challenges

    http://www.mattmasson.com/index.php/category/design-patterns/http://www.mattmasson.com/index.php/category/design-patterns/http://www.mattmasson.com/index.php/author/admin/http://www.mattmasson.com/index.php/author/admin/http://www.mattmasson.com/index.php/author/admin/http://en.wikipedia.org/wiki/Distributed_Transaction_Coordinatorhttp://en.wikipedia.org/wiki/Distributed_Transaction_Coordinatorhttp://en.wikipedia.org/wiki/Distributed_Transaction_Coordinatorhttp://msdn.microsoft.com/en-us/library/ms141144.aspxhttp://msdn.microsoft.com/en-us/library/ms141144.aspxhttp://msdn.microsoft.com/en-us/library/ms141144.aspxhttp://www.mattmasson.com/wp-content/uploads/2011/12/image3.pnghttp://msdn.microsoft.com/en-us/library/ms141144.aspxhttp://en.wikipedia.org/wiki/Distributed_Transaction_Coordinatorhttp://www.mattmasson.com/index.php/author/admin/http://www.mattmasson.com/index.php/category/design-patterns/
  • 7/31/2019 SSIS Design Pattern

    2/6

    Configuration

    Configuring DTC to communicate across hosts and machine boundaries can be tricky. Timeoutsand unexpected termination of connections on long running queries are another common source

    of problems. Finally, the server(s) you are accessing must support DTC this isnt always

    possible when your database is using a non-Windows platform.

    Performance

    Performance is probably the biggest challenge that customers face with SSIS transactions. Using

    any sort of transaction will impact the performance of your packages, but DTC transactions can

    be especially heavy-weight, as they can require coordination between multiple hosts.

    DB Provider

    To enlist in a transaction, the underlying data provider used by your connection manager must

    also support DTC transactions. Microsoft providers (like SQL Native Client and SqlClient)typically support DTC, but the same may not be true for third party providers.

    Transaction Support

    My biggest complaint (and perhaps its something we can address in the future) is that there is no

    way to tell which Tasks actually support transactions. Take the following control flow:

    Were missing some information from the UI:

    We cant tell if transactions are enabled We cant tell where the transaction starts We cant tell if any of the tasks have opted out of the transaction (TransactionOption =

    NotSupported)

  • 7/31/2019 SSIS Design Pattern

    3/6

    To get the information we need, we have to rely on good documentation (i.e. annotations), or

    take a look at the TransactionOption property for each of the tasks/containers in the package.

    Even if we can assume that transactions are enabled at the package level, and none of the tasks

    have opted out, not every Task is able to rollback their work when a transaction fails. Using the

    example above:

    Supports Transactions (rollback)

    Execute SQL Task Analysis Services Execute DDL Task Data Flow Task

    Does Not Support Transactions (rollback)

    XML Task

    File System Task

    The Script Task may or may not support transactions, depending on how it was written.

    Design Alternatives

    Manual Transactions

    A common alternative to using the built-in transactions is to create them yourself. This is done

    by setting the Connection Managers RetainSameConnection property to True, andcreating/committing/rolling back the transaction using Execute SQL Tasks.Jamie Thomson

    blogged about this back in 2005, and is a very common design pattern amongst SSIS developerstoday. When RetainSameConnection is set to True, the connection manager will attempttoreuse an existing connection when a task asks for one (instead of creating a new connection each

    time).

    http://sqlblog.com/blogs/jamie_thomson/http://sqlblog.com/blogs/jamie_thomson/http://sqlblog.com/blogs/jamie_thomson/http://consultingblogs.emc.com/jamiethomson/archive/2005/08/20/SSIS-Nugget_3A00_-RetainSameConnection-property-of-the-OLE-DB-Connection-Manager.aspxhttp://consultingblogs.emc.com/jamiethomson/archive/2005/08/20/SSIS-Nugget_3A00_-RetainSameConnection-property-of-the-OLE-DB-Connection-Manager.aspxhttp://consultingblogs.emc.com/jamiethomson/archive/2005/08/20/SSIS-Nugget_3A00_-RetainSameConnection-property-of-the-OLE-DB-Connection-Manager.aspxhttp://sqlblog.com/blogs/jamie_thomson/
  • 7/31/2019 SSIS Design Pattern

    4/6

    The advantage here is that your transactions become light-weight and perform better, as they

    no longer need to be coordinated across multiple machines using DTC. However, this approachis not a silver bullet, and there are some limitations to what RetainSameConnection can do.

    The behavior ofRetainSameConnection is determined by the underlying providerthere isvery little logic in the Connection Manager itself to handle connection reuse. Not all providers

    will be able to support this functionality (although it should work fine with the SQL Server

    providers). Also note that not all Connection Managers support RetainSameConnection, either.File Connection Managers, for example, simply return a string to the file or directory they are

    pointing to - they dont actually open up a connection object.

    RetainSameConnection might not work if the a connection manager is being used multiple

    times in parallel. You usually encounter this issue if you are trying to SELECT and INSERT at

    the same time. Some common examples:

    1. Running multiple Execute SQL Tasks in parallel using the same connection managero Solution: Use separate connection managers, or run the tasks serially

    2. Both the Source and Destination component in your Data Flow are using the sameConnection Manager

    o Solution: Use a separate connection manager for the source component3. The Destination and another transform (usually Lookup, or Script) are using the same

    Connection Manager

    o Solution: Use a Full Cache Lookup or staging tablesThe last thing to note is that if your data flow is writing to multiple destinations, you will need to

    use multiple connection managers. Be sure to start a separate transaction for each one.

  • 7/31/2019 SSIS Design Pattern

    5/6

    Note: If you are using this pattern and encountering issues, try usingSQL Profilerto detect if

    your queries are in fact using the same SPID.

    Design for Restartability

    Instead of using transactions, consider designing your packages for restartability. This doesntnecessarily mean the use ofCheckpoints(although you can use them if you are so inclined)it

    means that your packages take any corrective actions they need to ensure that they can be safelyrun multiple times. The type of actions you take will depend on your packages logic. Forexample, a package that loads multiple flat files from a directory could move/rename a file once

    it has been successfully processed to avoid loading it a second time if the package needs to be

    restarted. This approach requires you to put more effort into the design of your package, but willgive you a much more robust ETL solution.

    There will be scenarios where the use of transactions is required, but it is usually possible to limittheir scope. This typically involves splitting up your ETL logic across multiple tasks (or

    packages), staging your data in between each step, and enabling transactions only when absolutenecessary. For example, you may create a Data Flow that pulls data from a source, performs the

    required transformations, and then loads the data into a staging table. You can then use anotherData Flow (or an Execute SQL Task if the staging table is area in the destination database) to

    insert/merge the data into the final destination. With this approach, you will only need to use a

    transaction for the final insertion.

    Handling Failed Inserts when using Fast Load

    A useful pattern when designing for restartability is to isolate the specific rows that fail to insert

    into your destination. If you are using Bulk / Fast Load at your destination (which is

    recommended if you are at all concerned with performance) and one of the rows fails to insert, itmay fail the entire batch (the logic depends on the provider and destination you are using). To

    isolate the failed row, you can add a second destination to the error output that does row by row

    insertions (i.e. does not use Bulk / Fast Load). You can then redirect the error output of thesecond destination to a flat file / staging table so it can be manually inspected and corrected.

    http://msdn.microsoft.com/en-us/library/ms173789.aspxhttp://msdn.microsoft.com/en-us/library/ms173789.aspxhttp://msdn.microsoft.com/en-us/library/ms173789.aspxhttp://msdn.microsoft.com/en-us/library/ms140226.aspxhttp://msdn.microsoft.com/en-us/library/ms140226.aspxhttp://msdn.microsoft.com/en-us/library/ms140226.aspxhttp://msdn.microsoft.com/en-us/library/ms140226.aspxhttp://msdn.microsoft.com/en-us/library/ms173789.aspx
  • 7/31/2019 SSIS Design Pattern

    6/6

    Summary

    SSIS has built-in support for transactions, but to get the best performance, you might want to

    consider an alternative design. I highly recommend designing your packages for restartability,and limiting the use and scope of transactions in your packages.

    I hope you found this blog post useful. Id love to hear about specific scenarios where you areusing transactions, and alternative package designs you might have used to implement your own

    form of restartability.