ssis basics (1)

Upload: anhell-awaits

Post on 03-Apr-2018


  • 7/28/2019 SSIS Basics (1)

    1/18

    SSIS Basics: Setting Up Your Initial Package

    When working with databases, the use of SQL Server Integration Services (SSIS) is a skill that often needs to be acquired quickly, from scratch. Until now, it has been a curiously frustrating search to find out the basics fast enough to get up and running. No longer, as Annette comes up with a simple introduction for the rest of us.

    I started using SQL Server Integration Services (SSIS) when I had a job that required me to move and manipulate data between files and other data sources. I did a bit of research using the resources available (Twitter, Simple-Talk, SQL Server Central, etc.) and concluded that SSIS was the right way to go. I had to get my head around it quite rapidly, but I couldn't find help at the level I required. For that reason, I noted the points where I had struggled so that, once I'd learned more, I could help others who might otherwise struggle as I did.

    In this article, the first of the SSIS Basics series, I go through the basics required for anyone starting out with SSIS, before he or she can venture off into more exotic uses for the tool. In subsequent articles, we'll cover such topics as variables, for-each loops, and XML. If you're already a regular SSIS user, this series is not for you!

    What can you use SSIS for?

    Essentially, SSIS can be used for any task related to files or data. You can, for example,

    Move and rename files

    Delete, update or insert data

    Execute SQL scripts or stored procedures

    Import and export data between different file types, such as Access, Excel, or any data source that supports an ODBC connection

    These are, of course, only a few of the tasks you can perform in SSIS. As we work through this series, you'll get a better sense of how extensive SSIS actually is.

    Getting Started

    SSIS is available only in SQL Server 2005 onwards. You create and develop SSIS packages in SQL Server Business Intelligence Development Studio (BIDS), a visual development tool based on Microsoft Visual Studio. (BIDS became SQL Server Data Tools (SSDT) in SQL Server 2012.)

    Before going further, there is some terminology that's important to understand. SSIS files are organized into packages, projects, and solutions. The package is at the bottom of the hierarchy and contains the tasks necessary to perform the actual extract, transform, and load (ETL) operations. Each package is saved as a .dtsx file and is part of a project. You can include one or more packages in a project. That project, in turn, is part of a solution, which is at the top of the hierarchy. You can include one or more projects within a solution.
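    The hierarchy described above can be pictured as a small data model. The following is only an illustrative sketch in Python, not anything SSIS itself exposes: the class names and the DemoSolution, DemoProject, and Package names are hypothetical, and the path it builds is a logical one, not a statement about where BIDS stores files on disk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Package:
    """Bottom of the hierarchy: holds the ETL tasks, saved as a .dtsx file."""
    name: str

    @property
    def file_name(self) -> str:
        return f"{self.name}.dtsx"


@dataclass
class Project:
    """Middle of the hierarchy: contains one or more packages."""
    name: str
    packages: List[Package] = field(default_factory=list)


@dataclass
class Solution:
    """Top of the hierarchy: contains one or more projects."""
    name: str
    projects: List[Project] = field(default_factory=list)

    def package_paths(self) -> List[str]:
        # Logical solution/project/package path for every package.
        return [
            f"{self.name}/{project.name}/{package.file_name}"
            for project in self.projects
            for package in project.packages
        ]


# Hypothetical names, used only for illustration.
solution = Solution("DemoSolution", [Project("DemoProject", [Package("Package")])])
print(solution.package_paths())
```

    The point of the sketch is the containment: a package never exists on its own, and anything scoped to a project is shared by every package inside it.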

    When you first open BIDS, you're presented with the interface shown in Figure 1.


    Figure 1: The SSIS interface in BIDS

    To create an SSIS package, click the File menu, point to New, and click Project. This launches the New Project dialog box, shown in Figure 2.

    Figure 2: The New Project dialog box in BIDS

    In the New Project dialog box, select the Integration Services Project template. Then, provide a name for the project in the Name text box. Next, in the Location text box, specify the folder where your project files should be saved, and then provide a name for the solution in the Solution Name text box.

    After you've entered the project and solution information, click OK. Your new package will open in the SSIS window, as shown in Figure 3.


    Figure 3: Creating a new SSIS package in BIDS

    Notice that the SSIS interface is divided into the following five sections (windows):

    Control Flow Items: The components necessary to control a package's workflow. For example, the section includes components that let you move or copy data, run SQL statements, or send emails. (The components will be explained in more detail in this article and in articles that will follow.)

    Connection Managers: The connections to your data sources (whether retrieving or loading data). Your data sources can include SQL Server databases, CSV files, Excel spreadsheets, and a variety of other sources.

    Solution Explorer: A hierarchical view of the data sources, data source views, packages, and other components included within the current solution.

    Properties: The properties and their values for the package or the selected component within that package.

    SSIS Designer: The main working area for developing your SSIS package. SSIS Designer is broken into four tabs: Control Flow, Data Flow, Event Handlers, and Package Explorer. We'll look at each of these in greater detail as we progress through this series.

    Control Flow Items

    In this article, I focus on setting up the SSIS package and defining the data connections. I do not cover all the components in the Control Flow Items window. In the next article, I will demonstrate what I think is the most important of these components, the Data Flow task, and I will cover other control flow tasks in subsequent articles.

    Connection Managers

    I will now explain how to create connection managers that connect to both Excel files and a SQL Server database. However, it is important to note that any connection created through the Connection Managers window is available only to the package it is created in.

    Connecting to an Excel File


    One of the first steps you'll often take when developing an SSIS package is to create the connection managers necessary to retrieve data from or load data into your data sources. You can also set up connections on the fly, so if you miss creating one here, it can be done as part of other tasks. This approach is most commonly used when you wish to create a connection based on the source. For example, if you wish to copy data out of a SQL Server database and export it to an Excel spreadsheet, you can create the connection manager when you set up your data flow.

    To add a connection manager, right-click the blank area in the Connection Managers window, where it says "Right-click here to add a new connection manager to the SSIS package", as shown in Figure 4.

    Figure 4: Adding a connection manager to your SSIS package

    This will launch a context menu that provides a number of options for creating various types of connections, as Figure 5 illustrates.

    Figure 5: Selecting which type of connection manager to create

    Notice you can create connections for such sources as OLE DB, ADO.NET, Analysis Services, and different types of files. In this case, we want to create a connection to an Excel file, so click the New File Connection option. This will launch the File Connection Manager Editor dialog box, shown in Figure 6.

    Figure 6: The File Connection Manager Editor dialog box

    For this example, we'll be connecting to an Excel file I created for demonstration purposes. Figure 7 shows the worksheet I set up in this file.

    Figure 7: Excel worksheet used for demonstration purposes

    I named the Excel file Employees.xlsx and saved it in the C:\Users\Annette\Documents folder.


    In the Usage type drop-down list in the File Connection Manager Editor dialog box, select Existing file. Next, click the Browse button, navigate to the folder that contains the Excel file, and select the file. The dialog box should now look like the one shown in Figure 8.

    Figure 8: Configuring the File Connection Manager Editor dialog box

    Once you've selected the file, click OK. The new connection manager will be added to the Connection Managers window and will be assigned the name of the file, as shown in Figure 9.

    Figure 9: Viewing the new connection manager in the Connection Managers window

    It is very easy to rename the connection manager to something that may be more appropriate. To do so, right-click the new connection manager and select Rename from the context menu, as shown in Figure 10.

    Figure 10: Renaming a connection manager

    The name then becomes editable and you can rename it to whatever you like. In this case, I renamed the connection manager Employees (Excel), as shown in Figure 11.


    Figure 11: Viewing the new connection manager name

    When you view a connection manager in the Connection Managers window, you'll see that each connection type is associated with a different icon. If you created an Excel connection from here, it is displayed with the same icon used for any flat file connection. However, if you create an Excel connection when adding a component to the Data Flow tab, the connection manager will display an Excel icon.
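    The two kinds of connection differ in more than their icon. As far as I understand it, a File connection manager stores little more than the path to the file, whereas an Excel connection manager stores an OLE DB style connection string that names a provider. The sketch below, written in Python purely for illustration, builds a string of the kind typically generated for an .xlsx workbook. The function name is hypothetical, and the provider and Extended Properties values are assumptions about a typical setup, so compare them with the ConnectionString property that SSIS shows for your own connection.

```python
def excel_connection_string(path: str, first_row_has_headers: bool = True) -> str:
    """Build an OLE DB style connection string for an .xlsx workbook.

    Assumption: the ACE 12.0 provider is used, which is the usual choice
    for Excel 2007+ files; your environment may differ.
    """
    hdr = "YES" if first_row_has_headers else "NO"
    return (
        "Provider=Microsoft.ACE.OLEDB.12.0;"
        f"Data Source={path};"
        f'Extended Properties="Excel 12.0 XML;HDR={hdr}";'
    )


# The file path used in this article.
excel_path = r"C:\Users\Annette\Documents\Employees.xlsx"
print(excel_connection_string(excel_path))
```

    HDR=YES tells the provider to treat the first worksheet row as column names, which matches a worksheet that has a header row.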

    Connecting to a SQL Server Table

    Because our example will retrieve data from a SQL Server database, you'll also need to create a connection manager for that database. To do so, you should again right-click the Connection Managers window to open the context menu, but this time, click the New OLE DB Connection option. The Configure OLE DB Connection Manager dialog box will appear, as shown in Figure 12.

    Figure 12: Creating an OLE DB connection manager

    If any OLE DB connections have already been defined on the package, they will appear in the Data connections list. You can use one of these, if it fits your needs, or you can create a new one. To create a new connection, click the New button to launch the Connection Manager dialog box, shown in Figure 13.


    Figure 13: Configuring an OLE DB connection manager

    To configure the connection manager, select the SQL Server instance from the Server name drop-down list, and then select the authentication type. In this case, I selected the Use SQL Server Authentication option and provided a username and password. You might decide to select the Use Windows Authentication option, in which case your current Windows credentials will be used to establish the connection with SQL Server. In a later article, when we look at deploying the package, we will look at how the connections can be altered at run time and therefore how the login details can be changed then. For now, ensure that you set up the login the way you need it to run the package while you're developing it.

    From the Select or enter a database name drop-down list, select the name of the AdventureWorks database. Your Connection Manager dialog box should now look similar to the one shown in Figure 14.
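    Behind the dialog box, these choices end up in a single connection string. The following Python sketch shows roughly how the server, database, and authentication options map onto the keywords of an OLE DB connection string. It is an illustration, not what SSIS runs: the function name, the localhost server name, and the demo_user login are hypothetical, and SQLNCLI10 (the SQL Server 2008 Native Client) is only an assumed default provider.

```python
from typing import Optional


def oledb_connection_string(
    server: str,
    database: str,
    user: Optional[str] = None,
    password: Optional[str] = None,
    provider: str = "SQLNCLI10",  # assumption: SQL Server 2008 Native Client
) -> str:
    """Assemble an OLE DB connection string for SQL Server."""
    parts = [
        f"Provider={provider}",
        f"Data Source={server}",
        f"Initial Catalog={database}",
    ]
    if user is None:
        # Windows Authentication: use the current Windows credentials.
        parts.append("Integrated Security=SSPI")
    else:
        # SQL Server Authentication: explicit login and password.
        parts.append(f"User ID={user}")
        parts.append(f"Password={password}")
    return ";".join(parts) + ";"


# Hypothetical server and login, for illustration only.
windows_auth = oledb_connection_string("localhost", "AdventureWorks")
sql_auth = oledb_connection_string("localhost", "AdventureWorks", "demo_user", "secret")
print(windows_auth)
print(sql_auth)
```

    Seeing the two variants side by side makes it clearer why the login details can be swapped later without touching the rest of the package: only the last part of the string changes.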

    Figure 14: Configuring an OLE DB connection manager

    Be sure to click the Test Connection button to verify that you can connect to the target database. The system will display a message similar to the one in Figure 15 to confirm whether you've successfully connected to the database.

    Figure 15: Testing your database connection

    After you've confirmed your connection, click OK to close the message box, and then click OK to close the Connection Manager dialog box. You will be returned to the Configure OLE DB Connection Manager dialog box, shown in Figure 16.


    Figure 16: Finalizing your OLE DB connection manager

    Notice that your new connection has been added to the Data connections list. Click OK to close the dialog box. The Connection Managers window will show your two connections. You're now ready to start working with them.

    Solution Explorer

    Within Solution Explorer, you can view all projects, packages, data sources, and data source views that make up the solution.

    Adding New Projects

    If you wish to add an additional project to a solution, click File on the menu bar, point to Add, and click New Project, as shown in Figure 17.

    Figure 17: Adding a new project to a solution

    The Add New Project window opens. Select Integration Services Project and, in the Name box, enter the name you wish to give the new project, as shown in Figure 18.


    Figure 18: Add New Project Wizard

    As you can see in Figure 19, a new project is added to the solution named Dev and will appear in Solution Explorer. The project will contain three empty folders named Data Sources, Data Source Views, and Miscellaneous. The project will also contain a folder named SSIS Packages and, within that folder, a file named Package.dtsx, which is an empty SSIS package created automatically when the project is created. Figure 19 shows the new project and its folders in Solution Explorer.

    Figure 19: The folders and package created in a new SSIS project

    Data Sources

    Earlier I showed you how to create connections in the Connection Managers window. As I mentioned, if a connection is created in the Connection Managers window, it is available only to the package it was created in. However, you can also create connections known as data sources, which are available to all packages in a project.

    To create a new data source, right-click Data Sources in Solution Explorer to open the Connection Manager dialog box (shown in Figure 20). Then fill in the options as you did when you created an OLE DB connection manager. Be sure to click Test Connection to confirm the connection has been created successfully.


    Figure 20: Creating a new data source connection

    The Data Source Wizard will appear, with the new data connection highlighted, as shown in Figure 21. After you review the settings, click Next.

    Figure 21: The data connection in the Data Source Wizard

    When the next page of the wizard appears, type in a name for the data source. As this is project wide, I would recommend you fully describe the source using the server and database name. I have renamed my data source RGTest_AdventureWorks2008, as shown in Figure 22. I try to set up and follow consistent naming conventions.


    Figure 22: Renaming the data source

    After you've renamed the data source, click Finish. Your data source should now be listed under Data Sources in Solution Explorer, as shown in Figure 23. Notice that the data source is saved with the .ds file extension to indicate that it is indeed a data source.

    Figure 23: Creating a data source in Solution Explorer

    Initially, the new data source is not listed in your package's Connection Managers window; however, it is available to your package. Once you have made use of the data source in the package, it will be visible in the Connection Managers window.

    Data Source Views

    Data source views, like data sources, are available to all packages in a project. A data source view is used to define a subset of the data from a data source. The data source view can include only some of the tables, or it can be used to define relationships or calculated columns.

    Because a data source view is based on a data source, you would normally create the data source before starting to create the data source view. However, this is not compulsory because you can create the data source when you're creating the data source view. To create a data source view, right-click the Data Source Views folder and click New Data Source View, as shown in Figure 24.


    Figure 24: Creating a data source view in Solution Explorer

    When the Data Source View Wizard appears, click Next. The next page shows the data sources available to the project, as shown in Figure 25. (In this case, there's only one.)

    Figure 25: Available data sources

    As you can see, the page shows the name of the data source in the Relational data sources list. The properties for the selected data source appear to the right, in the Data source properties window. A data source must be selected before you can continue with the wizard. If you haven't created the data source you need, you can create one now by clicking the New Data Source button.

    Once you've selected the necessary data source, click Next. The new page provides a list of the tables and views available through the selected data source. If you wish to filter the list, type the filter criteria in the Filter text box below the Available objects list, and then click the filter icon to the right of the text box. For example, I typed Emp in the Filter text box, which reduced the list of available objects to those that contain Emp in their name, as shown in Figure 26.
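    The filter behaves like a simple "name contains" test. As a rough sketch of that idea in Python, the snippet below keeps only the names that contain the filter text. The object names are illustrative rather than a listing of the actual database, the function name is hypothetical, and treating the match as case-insensitive is my assumption, not something the wizard documents here.

```python
from typing import List


def filter_objects(available: List[str], text: str) -> List[str]:
    """Keep the object names that contain the filter text.

    Assumption: the comparison ignores case.
    """
    needle = text.lower()
    return [name for name in available if needle in name.lower()]


# Illustrative object names only.
available_objects = [
    "HumanResources.Employee",
    "HumanResources.EmployeeAddress",
    "Person.Contact",
    "Sales.Store",
]
filtered = filter_objects(available_objects, "Emp")
print(filtered)
```

    Because the test is a substring match, a short filter such as Emp also matches longer names like EmployeeAddress, which is usually what you want when hunting for a family of related tables.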


    Figure 26: Filtering tables and views in the data source

    The next step is to determine which tables and views you want to include in your data source view. From the filtered list of tables and views in the Available objects list, select the objects you want to include. You can select more than one object by clicking the first one, holding down the Ctrl key, and then clicking the additional objects. Once you've selected the objects, click the single right arrow button to move those objects to the Included objects list. If you want to move all the listed objects, simply click the double right arrow button.

    Once an object has been moved to the Included objects list, the single left arrow button and double left arrow button become active. These work the same way as the right arrows, in reverse. The single left arrow moves a single selected object or multiple selected objects from the Included objects list back to the Available objects list. The double left arrow moves all objects in the Included objects list back to the Available objects list.

    Figure 27 shows the full list of available objects (without filters), minus the two objects that have been moved to the Included objects list. Notice that two additional objects are selected in the Available objects list. As you would expect, you can move these objects to the Included objects list by clicking the single right arrow button.

    If you click the Add Related Tables button beneath the Included objects list, all tables related to the objects in the Included objects list will be automatically added.


    Figure 27: Adding tables and views to a data source view

    Once all required objects have been selected, click Next. You can now see a preview of what you have selected, and you can rename the data source view to something more appropriate. If you have missed an object, click the Back button to return to the previous page.

    For this example, I renamed my data source view AW2008-Employees. As you're changing the name in the Name text box, the name is also updated in the Preview window, as shown in Figure 28.

    Figure 28: Renaming the data source view

    If you are happy with the configuration, click Finish. The data source view is saved with the .dsv file extension and is added to the Data Source Views folder in Solution Explorer. A new window appears in SSIS Designer and shows the data source view in design mode, as shown in Figure 29.


    Figure 29: Data source view in design mode

    Amending a Data Source View

    SSIS provides a number of options for modifying a data source view. Most of those options are at the table level. If you right-click the table name either on the design surface or in the Tables pane (on the left side of the screen), you can choose from the following options:

    Adding a calculation

    Adding a relationship

    Replacing a table

    Deleting a table

    Reviewing data

    Deleting an object

    Suppose I added the Store table in error. I can delete the table from my data source view by right-clicking the table name and selecting the Delete Table from DSV option, as shown in Figure 30.

    Figure 30: Deleting a table from a data source view

    You'll then be prompted to confirm your deletion. When the Delete Objects message box appears, click OK, as shown in Figure 31.


    Figure 31: Deleting objects from a data source view

    When you click OK, the object is permanently removed from the data source view.

    Adding a new column

    To add a calculated column to a data source view, right-click the table name and select New Named Calculation to open the Create Named Calculation dialog box. Enter the new column name in the Column name text box, add an appropriate description in the Description text box, if required, and then create the calculation in the Expression text box. For this example, I've assigned the name Age to the column and added the description "Current Age based on Birth Date". For the expression, I added the one shown in Figure 32. Note that, at this stage, there is no way to test whether your code is correct!
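    Because the dialog box gives you no way to test the expression, it can help to reason through the logic separately first. The expression in Figure 32 is not reproduced here, so the following Python sketch is not the article's expression; it only illustrates the logic an age calculation needs. The function names and dates are hypothetical. It also shows a common pitfall: subtracting the years alone overstates the age of anyone whose birthday has not yet occurred this year.

```python
from datetime import date


def naive_age(birth_date: date, today: date) -> int:
    """Year difference only; wrong before the birthday in a given year."""
    return today.year - birth_date.year


def age_on(birth_date: date, today: date) -> int:
    """Full years completed between birth_date and today."""
    years = today.year - birth_date.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


# Hypothetical dates, for illustration only.
born = date(1980, 6, 15)
day_before_birthday = date(2012, 6, 14)
print(naive_age(born, day_before_birthday), age_on(born, day_before_birthday))
```

    Whatever expression you enter in the Expression text box, checking a value just before and just after a birthday in the Explore Data view is a quick way to confirm that it handles this case.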

    Figure 32: Creating a calculated column

    Figure 33 shows us that the Age column has been added to our table. The icon next to the column shows that it is a calculated column.


    Figure 33: Verifying that the calculated column has been added

    To view the data in the table and verify that the new column has been created correctly, right-click one of the columns and then click Explore Data, as shown in Figure 34.

    Figure 34: Viewing the data in the table

    The Explore Employee Table window appears, as shown in Figure 35. We can now view all the data in the Employee table. Notice that the Age column has been added to the table (on the far right side) and displays the data returned by our expression.

    Figure 35: Viewing data in the Employee table

    Once you have made all the necessary changes, save the data source view. It will then be available for you to use in any of your packages in the project.


    Summary

    In this article, I've shown you how to create an SSIS package and set up connection managers, data sources, and data source views. In the next article, I will show you how to set up a package that retrieves data from a SQL Server database and loads it into an Excel file. I will also show you how to add a derived column that calculates the data to be inserted into the file. In addition, I will demonstrate how to run the package.

    In future articles, I plan to show you how to deploy the package so it can be run as part of a scheduled job or called in other ways. I also plan to cover how to use variables and how they can be passed between tasks. I also aim to cover more control flow tasks and data flow components, including those that address conditional flow logic and for-each looping logic. There is much, much more that can be done using SSIS, and I hope over the course of this series to cover as much information as possible.