Integration Services

You are currently browsing articles tagged Integration Services.

A while ago I wrote an article about how you can use the SQL Server Import and Export wizard to import Excel data into a SQL Server database.  In this sequel I’m going to show you some problems which you may encounter when using that wizard.  Just like any good old wizard, he’s only as good as his recipes.  If a recipe is missing an ingredient, the resulting potion will probably not behave as expected and before you know it it explodes in your face.

I’ll be using the same Excel file as in my previous article.  In case you’re having some problems understanding the data or locating certain screens mentioned in this article, I recommend you to first read the prequel.

Some Common Wizard Pitfalls And Their Solution

Drop And Re-Create Destination Table

Imagine that for your import process you’re planning to import that Excel data regularly.  So the logical step is to make sure that each time the process runs, it starts from scratch.  Following that, what seems to be an interesting checkbox is located on the Column Mappings screen.  This checkbox is called Drop and re-create destination table.

Column Mappings: Drop and re-create destination table

So you activate that checkbox and execute the package.  However, it ends with an error in the Executing phase:

SQL Server Import and Export Wizard: error while Executing

Here’s what the error details say:

image

Error 0xc002f210: Drop table failed… Cannot drop the table dbo.ProductList$ because it does not exist or you do not have permission.

So why does this error occur?  Let’s examine the SSIS package that was generated.  This is what the Control Flow looks like:

A Control Flow that doesn't take a non-existing table into account

New here is that Drop table SQL Task.  Taking a closer look at the task, here’s the query:

drop table [dbo].[ProductList$]
GO

So the first task is to drop the table.  This flow does not take into account that the table may not yet exist, as is the case here, causing the error.

However, do you notice the blue arrow connecting the Drop table task with the next SQL Task?  That means that it doesn’t require the execution of the task to end successfully, the flow will continue even when an error occurred (unlike when the green connector – meaning Success – is used).

Double-clicking the blue connector gives us the Precedence Constraint Editor window:

Precedence Constraint Editor

Indeed, the value for the constraint is set to Completion.

This also explains why the wizard continued executing all phases, even though an error occurred.  And the end result was that the table is really created, containing the data as expected.

If you’d like to avoid the error and handle the non-existing table, you could replace the query in the Drop table SQL Task with the following one:

if exists
(
    select * from INFORMATION_SCHEMA.TABLES T
    where T.TABLE_NAME = 'ProductList$'
        and T.TABLE_SCHEMA = 'dbo'
)
    drop table [dbo].[ProductList$];

It first checks if the table exists, taking schema into account, and will only perform the DROP TABLE statement if the table actually exists.

Field Length Insufficient

On to the next possible issue.  When examining our Excel sheet we find out that the content of the Class column is either blank or just one character:

Possible values of the Class field

So we decide to change its type to char(1), again using the Column Mappings screen.

Column Mappings: change type to char(1)

However, clicking Finish at the end of the wizard gives us a fatal error:

SQL Server Import and Export Wizard: Operatoin stopped in Error

And these are the details of the error:

    • Error 0xc02020c5: Data Flow Task 1: Data conversion failed while converting column “Class” (63) to column “Class” (139).  The conversion returned status value 4 and status text “Text was truncated or one or more characters had no match in the target code page.”.(SQL Server Import and Export Wizard)
    • Error 0xc020902a: Data Flow Task 1: The “output column “Class” (139)” failed because truncation occurred, and the truncation row disposition on “output column “Class” (139)” specifies failure on truncation. A truncation error occurred on the specified object of the specified component.

      (SQL Server Import and Export Wizard)

    • Error 0xc0047022: Data Flow Task 1: SSIS Error Code DTS_E_PROCESSINPUTFAILED.  The ProcessInput method on component “Data Conversion 0 – 0″ (131) failed with error code 0xC020902A while processing input “Data Conversion Input” (132). The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running.  There may be error messages posted before this with more information about the failure.

      (SQL Server Import and Export Wizard)

Wow, that’s quite a long list for such a simple change!  Apparently it wasn’t such a good idea to limit the length of this field.  But why does this error occur?  After all, the values in this field are just one character long.  Except, after taking a closer look at the Excel sheet it turns out that these values contain a trailing space!

One way of avoiding this issue is by specifying a higher length for the field.  In this particular case however, you may decide to modify your Data Flow so that it removes the trailing space.  Here’s what the Data Flow currently looks like:

Data Flow to import Excel data

The Data Conversion transformation takes care of converting the Class column into a DT_STR of length 1.

Data Conversion Transformation Editor

We’re going to replace that Data Conversion Transformation with another one, a Derived Column transformation.  This is one of the transformations which I’m using all the time.

Derived Column Transformation Editor

As you can see in the screenshot, I’ve set it up so that it uses the TRIM() function on the Class input column to remove any leading or trailing spaces.  Furthermore I’m using a cast to DT_STR of length 1 to ensure the correct field type.  The resulting column is called Class_STR, to clearly indicate that it has been converted to DT_STR.

This is the resulting Data Flow:

Data Flow: Destination component complains about lineage ID

Looks like we’ve got an issue with our Destination component now.  Which is quite logical: we still need to tell it that it should use the newly-created Class_STR column.  So double-click the component.

 Restore Invalid Column References Editor

Double-clicking the Destination component will open up the Restore Invalid Column References Editor.  In Available Columns, select the new Class_STR column to replace the Class column which was used previously.

That’s it, your SSIS package will now remove the trailing spaces from the Class column and store it in a column of char(1).

Excel Layout Change

Here’s another common issue when dealing with Excel SSIS imports.  If you’re not 100% in control of that Excel sheet, someone will someday make a structural change to it and it will cause your import process to fail.

In this really simple example I’ve opened up the Excel sheet and renamed the Color column to Colour.  Which is something that may happen in real life: a British person takes over the maintenance of that product list and sees that the Color column is spelled the wrong way, and corrects it without informing anyone.

What does that mean for our import?  Here’s the result when manually executing the package using DTExecUI.exe:

DTExecUI: Package Execution Progress

In this case execution will fail because the package cannot find the Color column in the Excel sheet.  More precisely it says VS_NEEDSNEWMETADATA.  This is a really common error when using Integration Services, but you need to manually update the package to handle such changes.

In other words: try to be in control of that sheet as much as you can, and if possible: set up another way to maintain such data.  For instance by using a Master Data Management system.  But that’s stuff for later on, in future articles.

Conclusion

As long as you’re aware of some of the common issues into which you may run, I still think the Import and Export Wizard is an interesting option to start your first SSIS package.  When running into an issue, I recommend to open up the SSIS package and have a closer look through the Business Intelligence Development Studio.  Then make any changes there so you can handle the errors.

Happy importing!

Valentino.

  • Share/Bookmark

Tags: , , ,

In a previous article I’ve shown you how to import data from an Excel sheet using the OPENROWSET() function.  And I concluded by stating that it’s not the best option when automating your data import.

Today I’ll repeat the Excel data import process by using SQL Server Integration Services, also known as SSIS.

I’ll be using SQL Server 2008 R2, but I’m quite sure that the process is very similar to the first release of 2008, and even to 2005.  The Excel file that I will be importing is the one used in my previous article, and I’ll also refer to some parts of that article, so you may want to have a read over that one when something here isn’t clear.

Furthermore I’m using a Windows 7 64-bit machine, with the ACE 14 driver (beta) installed.  To avoid any discussion about versions and for my own (future) reference, here’s the result of a SELECT @@VERSION:

Microsoft SQL Server 2008 R2 (CTP) – 10.50.1352.12 (X64)   Oct 30 2009 18:06:48   Copyright (c) Microsoft Corporation  Enterprise Evaluation Edition (64-bit) on Windows NT 6.1 <X64> (Build 7600: )

Create SSIS Package To Import Excel Data

Usually you will start by creating a new package in an Integration Services project, add an Excel source to a new Data Flow, throw in some Data Flow Transformations and end your flow with an OLE DB Destination connecting to your SQL Server.

But that’s not the approach that I’ll take in this article.  I’ll make use of a shortcut (and meanwhile I’m showing you how well integrated some components really are).

First, I’m creating a new database called ExcelImport, using the Management Studio (aka SSMS).  Once the database is created, right-click on it and choose Tasks > Import Data.

Start the Import Data Wizard through Management Studio

This will open up the SQL Server Import And Export Data Wizard.  Like all good wizards, it starts with a Welcome screen containing an introductory text about its purpose – something about “create simple packages that import and export data between many popular data formats including databases, spreadsheets” – and so on.  It also includes a checkbox that says “Do not show this starting page again”.  That’s my favorite object on the page :-)   Okay, I agree, the page is useful for people who have never seen the wizard before and who may have opened it up by accident, but that’s about as far as its use goes methinks.

So, do whatever you like with the checkbox and click Next.  That opens up the Choose a Data Source screen.

SQL Server Import and Export Wizard - Choose a Data Source

In that screen you have several options in the Data Source dropdown.  The one we’re interested in is called Microsoft Excel.  Once that option is selected, the controls further down the screen change into what is shown in the screenshot.  Select your file and the right version, in my case I’ve got an Excel 2007 file.  If your sheet contains a header row, activate the First row has column names checkbox.

Clicking Next will open up the Choose a Destination screen.

SQL Server Import and Export Wizard - Choose a Destination

In that screen, select Microsoft OLE DB Provider for SQL Server as Destination.  Ensure that your SQL Server instance is the right one in the Server Name dropdown and the Authentication is filled out as expected.  The correct database should be selected by default because we did a right-click on it to start the wizard.

Another Next click opens up the Specify Table Copy or Query window.

SQL Server Import and Export Wizard - Specify Table Copy or Query

Here we can choose between either retrieving the full table – all rows, all columns – or writing a query ourselves.  I’ll go for the first option.

Click Next once more to open the Select Source Tables and Views screen.  That name seems a bit weird in the context of an Excel import but I guess that’s not really important here.  Just read it as “Select the sheet that you’d like to import”.

SQL Server Import and Export Wizard - Select Source Table and Views 

I’ll go for the ProductList$ Source.  The sheet in my Excel file is called ProductList.  Note that the Destination is editable – I’m changing the destination table to dbo.ProductList (with the dollar sign removed).

If you’d like to view your data right now you can hit the Preview button.  It opens up a window such as this one:

SQL Server Import and Export Wizard - Select Source Table and Views - Preview Data 

That should look familiar.

A more interesting button is the one called Edit Mappings.  Clicking that button opens a screen that lets you change the destination table’s schema.  By default, all columns that seem to be numeric are mapped to a type of float and all the others are mapped to nvarchar with a length of 255.  Depending on your situation you can either leave it as it is (in case you’re just loading a staging table and want to handle data conversions later on) or you should review each column (for instance when you’re erasing and loading the full table every night and this is the actual table being used by other processes).

Note: whenever a column contains a blank cell, its type will be nvarchar(255), even when all other values are numeric.  Also, if you don’t need Unicode support, don’t use nvarchar but change it to varchar instead.

Here’s the Column Mappings screen with some of the defaults changed:

The Column Mappings window

Everything that I changed has been indicated using the yellow marker.  I’ve changed some field types and lenghts, made one field non-nullable and adapted the destination name.

Clicking OK followed by Next brings us to the following screen, Review Data Type Mapping.

SQL Server Import and Export Wizard - Review Data Type Mapping

This screen gives another overview of all the columns that we’re planning to import.  Note that each column of which we’ve changed the type has gotten a nice yellow warning sign.  This happens because a data conversion needs to occur, and there’s always something that can go wrong when converting data.  The On Error and On Truncation columns show the action that should happen when such an event occurs.  We’ll leave them all at Use Global.  The Global settings are located at the bottom of the screen and are both set to Fail.  That’s the best option at the moment (the only other one is Ignore but that means you won’t get any notification in case of an error or truncation problem).

After clicking Next we end up at the Save and Run Package window.

SQL Server Import and Export Wizard - Save and Run Package

I have activated the Save SSIS Package checkbox and chose the Do not save sensitive data option.  By default it is saved in SQL Server, which is actually the MSDB.  That’s good because we’re going to examine the contents of the package later on so we want to keep it.

The Run immediately checkbox was activated by default.  This will execute the package in the last step of the wizard.

Another Next click and we’re on the Save SSIS Package screen.

SQL Server Import and Export Wizard - Save SSIS Package

Here we can give our package a decent name, such as ExcelImport.  You can also see the server on which I’m saving the package.

The final Next click brings us to the Complete the Wizard screen, woohoo!

 SQL Server Import and Export Wizard - Complete the Wizard

We can see a short overview of what we’ve configured in the previous steps.  Click Finish to execute and save the package!

SQL Server Import and Export Wizard - The execution was successful

And we’ve successfully executed the package!

Taking A Closer Look At The SSIS Package

When connecting to the Integration Services server (through SSMS for instance), I now have a new package in the root of the MSDB folder.

The ExcelImport SSIS package, stored in MSDB

It’s located there because I chose to save the package to SQL Server.

Adding An Existing Package To An SSIS Project

We’re now going to open it in the Business Intelligence Development Studio (aka BIDS).  So, open the BIDS and create a new SSIS Project (or use an existing one, doesn’t really matter).

Once the project is open, right-click the SSIS Packages folder in the Solution Explorer and select Add Existing Package.

Right-click SSIS Packages folder in Solution Explorer to Add Existing Package

That opens up the Add Copy of Existing Package window.

Sidenote: do you see that SSIS Import and Export Wizard option in the previous screenshot?  That’s right, the wizard that we’ve used extensively in the earlier part of this article can be launched from here as well.

Add Copy of Existing Package

Select SSIS Package Store as location of the package and enter the name of your server in the second dropdown.  Once that is done you can click the button with the ellipsis and select your package under the MSDB node.

Clicking OK will add the package to your project in the Solution Explorer.  Double-click it to open it up.

The Control Flow

In the Control Flow you can see two components: a SQL Task that contains a CREATE TABLE statement and a Data Flow.

Here’s what the CREATE TABLE statement looks like:

CREATE TABLE statement in the Execute SQL Task

As you can see, the table is created using the column names and types just like we configured them through the wizard.

Important to note here is that the Control Flow does not take anything else into account.  For instance, what happens if we execute the package twice?  It will fail because the table already exists!

In case you don’t believe me, just try it out!

Execute SQL Statement failed There is already an object named 'ProductList' in the database. 

The Data Flow

Opening up the Data Flow we see that it contains three components: an Excel source component that uses an Excel Connection Manager to connect to our now well-known Excel sheet, a Data Conversion Transformation to take care of the conversions that we requested and an OLE DB Destination that uses an OLE DB Connection Manager to connect to our SQL Server.

The Data Flow to transfer Excel data into SQL Server

Important to note here is that whenever an issue occurs, such as a conversion problem, the flow will fail.

In production environments, certain events need to be taken into account.  The purpose of this article was just to show you how you can use a wizard to generate an SSIS package for you.  You can now use this package as the basis for a well-developed Excel Import template.

Conclusion

With this article I hope to have shown you how to use Integration Services to import Excel data, and also that the Management Studio knows how to use other SQL Server components, such as SSIS, quite well.

If you’re running into some issues while using the wizard, or you just like reading what I write, check out my follow-up article covering some common pitfalls.

Need to go to sleep now, long drive tomorrow, PASS European Conference in Germany!  I do hope that Adam Saxton will be there because I was planning to see his presentations all day long.  I already read that one of the speakers – Brent Ozar – won’t be able to make it.  Darned ash cloud…  Next time someone starts talking to me about Azure I’ll run away screaming.

Just kidding :-)

Have fun!

Valentino.

References

Microsoft Support: How to import data from Excel to SQL Server

How to: Run the SQL Server Import and Export Wizard

  • Share/Bookmark

Tags: , , , ,

A while ago I posted a query to create a list of all the Integration Services packages deployed to the MSDB.  I am now using that query to take it a step further.

If you’ve been using SSIS for a while you’ve probably noticed that the Management Studio doesn’t like to delete Integration Services folders that are not empty.  In fact, it will politely ask you if you’re sure that you want to delete the folder on which you’ve just selected the “Delete” option through the right-click menu.

Right-click pop-up menu on SSIS folder

I am sure I want to delete this non-empty SSIS folder

So you click the Yes button.  But then it shows you the following message:

SSIS folder ‘FolderWithSubfolders’ contains packages and/or other folders. You must drop these first. (Microsoft SQL Server Native Client 10.0)

Graphically it looks like this:

Object Explorer pop-up: you can't delete SSIS folders that contain packages or other folders

And this message can be really annoying if you’ve got a main folder with, let’s say, five subfolders, and each subfolder contains about 20-30 packages.  If you want to delete this folder you first need to delete each package separately and then delete the five subfolders, and then you can finally delete the main folder.  And all that through the right-click pop-up menu because you can’t just select the object in the Object Explorer and hit the Delete button on the keyboard – it doesn’t have an action on SSIS objects…

So, I wasn’t planning on doing such a job manually and came up with the following stored procedure.

It’s probably a bit long but don’t run away just yet, I will explain what’s going on down below the code, and there are some comments in the code as well.

/*
DESCRIPTION: Deletes all folders and packages under, and including, specified folder.
WRITTEN BY:  Valentino Vranken
CREATED:     2010-02-28
VERSION:     1.0
USAGE:
  -- mind the forward slash
  EXEC dbo.SSIS_RecursiveDeleteFolder '/FolderWithSubfolders'
  -- to delete a subfolder
  EXEC dbo.SSIS_RecursiveDeleteFolder '/FolderWithSubfolders/ASubfolderWithPackages'

COPIED FROM: http://blog.hoegaerden.be

Note 1: folder names are not case-sensitive
Note 2: uses system tables and (undocumented) stored procedures located in MSDB.
Note 3: this code was written for SQL Server 2008. For 2005:
  o sysssispackagefolders -> sysdtspackagefolders90
  o sysssispackages -> sysdtspackages90
  o sp_ssis_deletefolder -> sp_dts_deletefolder
  o sp_ssis_deletepackage -> sp_dts_deletepackage
*/
CREATE PROCEDURE dbo.SSIS_RecursiveDeleteFolder
    @Folder varchar(2000)
AS
BEGIN
    set nocount on;

    declare @foldersToDelete table
    (
        folderid uniqueidentifier,
        Lvl int
    );

    declare @packagesToDelete table
    (
        PackageName sysname,
        folderid uniqueidentifier,
        Lvl int
    );

    --retrieve list of folders to be deleted
    with ChildFolders
    as
    (
        select PARENT.parentfolderid, PARENT.folderid, PARENT.foldername,
            cast('' as sysname) as RootFolder,
            cast(PARENT.foldername as varchar(max)) as FullPath,
            0 as Lvl
        from msdb.dbo.sysssispackagefolders PARENT
        where PARENT.parentfolderid is null
        UNION ALL
        select CHILD.parentfolderid, CHILD.folderid, CHILD.foldername,
            case ChildFolders.Lvl
                when 0 then CHILD.foldername
                else ChildFolders.RootFolder
            end as RootFolder,
            cast(ChildFolders.FullPath + '/' + CHILD.foldername as varchar(max))
                as FullPath,
            ChildFolders.Lvl + 1 as Lvl
        from msdb.dbo.sysssispackagefolders CHILD
            inner join ChildFolders on ChildFolders.folderid = CHILD.parentfolderid
    )
    insert into @foldersToDelete
    select F.folderid, F.Lvl
    from ChildFolders F
    where F.FullPath like @Folder + '%';

    --retrieve list of packages to be deleted
    with ChildFolders
    as
    (
        select PARENT.parentfolderid, PARENT.folderid, PARENT.foldername,
            cast('' as sysname) as RootFolder,
            cast(PARENT.foldername as varchar(max)) as FullPath,
            0 as Lvl
        from msdb.dbo.sysssispackagefolders PARENT
        where PARENT.parentfolderid is null
        UNION ALL
        select CHILD.parentfolderid, CHILD.folderid, CHILD.foldername,
            case ChildFolders.Lvl
                when 0 then CHILD.foldername
                else ChildFolders.RootFolder
            end as RootFolder,
            cast(ChildFolders.FullPath + '/' + CHILD.foldername as varchar(max))
                as FullPath,
            ChildFolders.Lvl + 1 as Lvl
        from msdb.dbo.sysssispackagefolders CHILD
            inner join ChildFolders on ChildFolders.folderid = CHILD.parentfolderid
    )
    insert into @packagesToDelete
    select P.name, F.folderid, F.Lvl
    from ChildFolders F
        inner join msdb.dbo.sysssispackages P on P.folderid = F.folderid
    where F.FullPath like @Folder + '%';

    --use cursor to loop over objects to be deleted
    declare objectsToDelete_cursor cursor
    for
        select P.folderid, P.Lvl, P.PackageName, 'P' as ObjectType
        from @packagesToDelete P
        UNION ALL
        select F.folderid, F.Lvl, null, 'F'
        from @foldersToDelete F
        order by Lvl desc, ObjectType desc;

    open objectsToDelete_cursor;

    declare @folderid uniqueidentifier;
    declare @lvl int;
    declare @packageName sysname;
    declare @objectType char;

    fetch next from objectsToDelete_cursor
    into @folderid, @lvl, @packageName, @objectType;

    while @@FETCH_STATUS = 0
    begin
        if @objectType = 'F'
        begin
            print 'exec msdb.dbo.sp_ssis_deletefolder '
                + cast(@folderid as varchar(max));
            exec msdb.dbo.sp_ssis_deletefolder @folderid;
        end
        else
        begin
            print 'exec msdb.dbo.sp_ssis_deletepackage '
                + @packageName + ', ' + cast(@folderid as varchar(max));
            exec msdb.dbo.sp_ssis_deletepackage @packageName, @folderid;
        end

        fetch next from objectsToDelete_cursor
        into @folderid, @lvl, @packageName, @objectType;
    end;

    close objectsToDelete_cursor;
    deallocate objectsToDelete_cursor;
END

Before trying to dismantle this stored procedure, I recommend you to read my previous article on retrieving the list of packages.  That already explains half of the code, if not 75%.

Our mission is to find a way to recursively delete packages and folders contained in a specified folder.  To be able to loop over those objects in the correct order (from the deepest level up until the level of the folder specified), the SP creates two table variables: one to hold all folders under the specified folder (@foldersToDelete) and one to hold the packages under the specified folder, including all subfolders (@packagesToDelete).

Based on those two lists I create a cursor that joins these two together, taking their level and object type into consideration.  That’s important because we first need to delete the packages in the lowest level folder, followed by their containing folder, then move one level up and do the same.

We then use the cursor to loop over the packages and folders and use two undocumented system stored procedures – one for each object type- to delete the package or folder.  These system SPs are located in the MSDB.  Here’s how they are defined:

ALTER PROCEDURE [dbo].[sp_ssis_deletefolder]
  @folderid uniqueidentifier
AS

ALTER PROCEDURE [dbo].[sp_ssis_deletepackage]
  @name sysname,
  @folderid uniqueidentifier
AS

As you can see, the parameters for these procedures are not that complicated.  Both of them expect a uniqueidentifier as identification for the folder.  That’s okay, these IDs are stored in the msdb.dbo.sysssispackagefolders table and retrieved by our queries to create the list of to-be-deleted objects.

Furthermore, the sp_ssis_deletepackage SP expects the name of the package to be deleted.  Not a problem either, those names are obtained from the msdb.dbo.sysssispackages table.

Note for SQL Server 2005 users: this code was written for SQL Server 2008.  The system stored procedures and system tables exist in 2005 as well, but they have different names.  See the comment header of my SP for more details.

So, let’s give it a little test.  Following screenshot shows the setup.  What I will do is use the stored procedure to delete the FolderWithSubfolders folder.  If you’ve been paying close attention, that is the same folder which I tried to delete manually through the Management Studio’s right-click menu (see first screenshot above).

Overview of my deployed folders and packages

After creating the SP, I ran following command:

EXEC dbo.SSIS_RecursiveDeleteFolder '/FolderWithSubfolders'

And that gave me the following output in the Messages pane:

exec msdb.dbo.sp_ssis_deletepackage AnotherPackage, 7F38288D-4370-40A8-80E3-E92283033E4C

exec msdb.dbo.sp_ssis_deletepackage Package, 7F38288D-4370-40A8-80E3-E92283033E4C

exec msdb.dbo.sp_ssis_deletefolder 4102ED59-ED75-4D93-BBAE-0A162447BF02

exec msdb.dbo.sp_ssis_deletefolder 7F38288D-4370-40A8-80E3-E92283033E4C

exec msdb.dbo.sp_ssis_deletefolder C156B436-8C78-4BF9-99F9-5ABFAB10C405

I have deliberately put a couple of print commands in the stored procedure to dump the commands that are actually being executed.  This gives us a good idea of what’s going on.

That’s it for now folks.  Thank you for reading this, and if you found it useful or you’ve got some questions about it: post a comment!

Have fun!

Valentino.

  • Share/Bookmark

Tags: , , , ,

Sometimes my posts are over 20 pages long when pasted into a Word document.  That’s when I call them article, or tutorial.  Other times I post real quickies about little things that have annoyed me in the past, because I had to spend too much time looking for a solution to a certain issue, or just because they are not very obvious and I can image fellow developers doing a search on the internet on that specific subject.

This post is one of the latter.

In a SQL Server Integration Services project, have you ever wondered how on earth you can get files into that Miscellaneous folder?  When you right-click on the folder in the Solution Explorer, nothing happens, no pop-up menu.

Well, the answer is simple, once you know it.

All you need to do is go one level up in the tree and right-click the Project node in the Solution Explorer.  In the menu that appears, select Add > Existing Item….

How to add a file to the SSIS Miscellaneous folder

In the pop-up window, navigate to any file that you’d like to add to the project.  Files of a different type than the usual SSIS files such as .dtsx and .ds are automatically added under the Miscellaneous folder.  When adding files, they will be automatically copied to the SSIS project folder, no matter where they were stored originally.  See, not that complicated… once you know it!

In the following screenshot I’ve added three different file types to the folder, just to prove that it’s working.

Miscellaneous folder with some files added to it

I like using this folder to store files that belong with that particular project.  Examples of those are configuration files, XSLT files for complex XML conversions, and also the SQL scripts that create my databases.  Each time when I make schema changes, I update the scripts.  And as I’m using TFS integration, I can rest assured that I always have a backup of my files.  (At least, assuming the TFS team is doing their job – this is usually beyond my responsibility :-) )

Speaking about TFS, watch out if you use Business Intelligence Development Studio 2008 to connect to Team Foundation Server 2005!  There’s an interesting setting in the Options screen (through the Tools menu), located in the Source Control > Visual Studio Team Foundation Server page.  This setting is called Get latest version of file on check out.

Get latest version of item on check out

However, there’s one caveat!  On TFS2005 it doesn’t do anything!  If you’re not aware of that, you may get an annoying surprise when you’re trying to check in your changes because you may have been working on an outdated version of your package!  And as you probably already know: merging two versions of an SSIS package is, well, what shall I call it, a challenge?

Another setting that helps you to avoid the issue described above is located under Source Control > Environment and is called Get everything when a solution or project is opened.  Activate this setting and each time when you open your project, you’ll get a popup window which allows you to Get the latest version of the files.

Get evrything when a solution or project is opened

That leaves one more possible conflict situation.  If someone changes a package on the same day as you, the second person will need to explicitly do a Get Latest Version or he/she will be working on an outdated version.  So, communicate with your team mates so that you know if someone has gotten an assignment that collides with yours.  Of course, this last problem is just a theoretical possibility.  In teams, work is usually divided so that developers do not need to work with more than one person on the same piece of code.  The same logic applies to SSIS packages.

And remember, have fun!

Valentino.

  • Share/Bookmark

Tags: , , ,

If you’ve got some experience building Data Flows in your Integration Services packages, you probably already know that SSIS has its own representation of the different data types.  The names of these data types start with DT_, followed by the actual type such as BOOL for boolean and STR for string.

Some data types you use every day and others only once in a while.  So some you already know by heart and others you don’t.  A nice example of one that I certainly don’t need to look up anymore is the following cast as used in a Derived Column Transformation: (DT_STR, 100, 1252)YourField.  It converts YourField to a string field of length 100 using the ANSI – Latin 1 code page.

But the reason for this post are the other types, the ones not used daily.  Because each time that I need info on one of those, I find myself ploughing through several BOL pages before finding that page of which I know it exists but where, oh where?!

So, on the following Books Online page you can find a list of all SSIS data types and their definition: Integration Services Data Types

And at the bottom of that same page there’s an interesting table showing the mapping between the SSIS types and those of several RDBMS, including SQL Server of course.

Another interesting page is this one: Working with Data Types in the Data Flow

The bottom half of this page contains a mapping table between the SSIS data types and their corresponding managed type in .NET.  This is especially useful when you’re doing some custom development like a custom data flow component.  But you don’t need to take it that far, even when you’re just using the built-in Script Component task for some complex data conversions, you’ll find this useful.  After all, you’re also programming in a managed .NET language in that component.

Okay, that’s it for now, hopefully this will save you some time searching for those reference pages.  I already know that I’ll be coming back to this post once in a while :-)

  • Share/Bookmark

Tags: ,

When deploying packages to SQL Server Integration Services, it’s advisable to set up a folder structure so that you can easily distinguish packages belonging to different projects.  Furthermore it may be interesting to create subfolders under the main project folder to separate packages according to the different phases in your ETL (Extract, Transform, Load) process.  When loading a data warehouse, interesting folder names are Dimensions for your dimension ETLs and Facts for the packages that load the fact tables.

After a while you end up with lots of packages spread over lots of folders.  To get a good view of what is deployed on your server, it may be interesting to find a way to list all the packages.  And that’s exactly the reason why I’m writing this article.

The query further down generates a list of all packages deployed in the MSDB database on your SQL Server.  What you get is the name of the packages, their location and version-related information.  I’ve also created a  RootFolder column so that it’s easy to filter on project.  (See now why it’s interesting to create separate folders per project?)

It’s important to note that packages deployed to the File System will not be shown in the list.  After all, they are not stored in the MSDB database but in a folder somewhere on the server’s hard drive, more precisely in a subfolder of where you’ve installed your SQL Server.  In case you’ve forgotten where that was, here’s a small tip.  On your server, open up the list of Windows Services (Start > Run > type "services.msc" > enter) and locate the service called SQL Server Integration Services 10.0.  Open the properties of that service and have a look at the Path to executable value in the General tab.  Take the path, drop the \Binn part and add \Packages instead.  That is where, by default, the packages are deployed.  (If you’re running SQL Server 2005, apply the same procedure but look for a service called SQL Server Integration Services.)

On my system, this is where the packages are located: D:\Program Files\Microsoft SQL Server\100\DTS\Packages.  I will also prove it with following screenshot:

Packages deployed to the file system on SSIS 2008

Okay, time for the real stuff, the query:

/*
    DESCRIPTION: Lists all SSIS packages deployed to the MSDB database.
    WRITTEN BY: Valentino Vranken
    VERSION: 1.1
    COPIED FROM: http://blog.hoegaerden.be

    Note: this query was written for SQL Server 2008. For SQL2005:
        o sysssispackagefolders => sysdtspackagefolders90
        o sysssispackages => sysdtspackages90
*/
with ChildFolders
as
(
    select PARENT.parentfolderid, PARENT.folderid, PARENT.foldername,
        cast('' as sysname) as RootFolder,
        cast(PARENT.foldername as varchar(max)) as FullPath,
        0 as Lvl
    from msdb.dbo.sysssispackagefolders PARENT
    where PARENT.parentfolderid is null
    UNION ALL
    select CHILD.parentfolderid, CHILD.folderid, CHILD.foldername,
        case ChildFolders.Lvl
            when 0 then CHILD.foldername
            else ChildFolders.RootFolder
        end as RootFolder,
        cast(ChildFolders.FullPath + '/' + CHILD.foldername as varchar(max))
            as FullPath,
        ChildFolders.Lvl + 1 as Lvl
    from msdb.dbo.sysssispackagefolders CHILD
        inner join ChildFolders on ChildFolders.folderid = CHILD.parentfolderid
)
select F.RootFolder, F.FullPath, P.name as PackageName,
    P.description as PackageDescription, P.packageformat, P.packagetype,
    P.vermajor, P.verminor, P.verbuild, P.vercomments,
    cast(cast(P.packagedata as varbinary(max)) as xml) as PackageData
from ChildFolders F
    inner join msdb.dbo.sysssispackages P on P.folderid = F.folderid
order by F.FullPath asc, P.name asc;

The query uses a recursive CTE (Common Table Expression) to get data out of a system table called sysssispackagefolders, located in the MSDB system database.  The CTE gives us a list of all folders stored in the database and at the same time uses the hierarchical structure of the table to build the FullPath and the Lvl columns.

Note: the CAST() calls are needed because the data type of the foldername column is sysname.  And sysname does not implicitly convert to varchar, which is needed for the concatenation building the FullPath column.

The CTE is joined with another system table called sysssispackages, also located in MSDB.  Not all columns are being retrieved from that table but I believe I’ve selected the most important ones.  Have a look in the Books Online for more info on the columns available.

There’s one column however on which I’d like to add some additional info myself.  That column is called packagedata and it contains the actual SSIS package.  The data type of this column is image, not sure why because after all, an SSIS package (or .dtsx file for that matter) is pure XML.  So why isn’t it stored as XML? If anyone knows the reason: post a comment!

Update: since I wrote the above paragraph I’ve come across the answer myself.  The reason that the XML is not stored as xml datatype is because of the overhead that this would cause.  So there you go, use image instead of xml if you’re not going to query the xml structure itself.

Anyway, as you can see in the query, to get it converted from image to XML you need to go through varbinary.  The image datatype cannot convert directly to XML.  See the Books Online here on what casts are allowed: http://msdn.microsoft.com/en-us/library/ms187928.aspx

Note for SQL Server 2005 users: as mentioned in the query’s comments, these tables don’t exist in SQL Server 2005.  Well, actually they do, they just have different names.  See the comment in the code for their equivalent.

To finish off I’ll show you what the results look like when executing the query on my test system.  But first, following screenshot shows all deployed packages as reported by the Management Studio.  As you can see, two packages are deployed to the File System.  These two packages were shown earlier in the first screenshot.  Some other packages have been deployed to the MSDB database.

Object Explorer showing all deployed SSIS packages

And here are the results of the query:

A list of all SSIS packages deployed to my SQL Server 2008 MSDB database

To be honest, I added a little filter to keep the results clean.  The Data Collector, a new feature of SQL Server 2008, also uses some packages so I’ve filtered those out by adding a WHERE clause to the SELECT statement at the bottom of the full query:

select F.RootFolder, F.FullPath, P.name as PackageName,
    P.description as PackageDescription, P.packageformat, P.packagetype,
    P.vermajor, P.verminor, P.verbuild, P.vercomments,
    cast(cast(P.packagedata as varbinary(max)) as xml) as PackageData
from ChildFolders F
    inner join msdb.dbo.sysssispackages P on P.folderid = F.folderid
where F.RootFolder <> 'Data Collector'
order by F.FullPath asc, P.name asc;

If you’ve been paying attention, you’ve noticed that the two packages deployed to the File System are not mentioned in the output of the query, as expected.

Now that you know how to list your packages, check out my article on deleting them.

That’s all for now folks, have fun!

References

BOL 2008: Tutorial: Creating a Simple ETL Package

BOL 2008: sysssispackagefolders (Transact-SQL)

BOL 2008: sysssispackages (Transact-SQL)

BOL 2008: Recursive Queries Using Common Table Expressions

  • Share/Bookmark

Tags: , , , , ,

Initially I was going to call this article “Struggling With Collation: The SeQueL”, but it just doesn’t have the same ring to it as “Fun With Strings”.  In that previous article I showed how you might get different results when loading data from a temporary table or table variable and I suggested that one way of solving this is by switching your data type to nvarchar.

Unicode Or Not?

Reason #1 For Not

Today I’m going to show you that nvarchar is not always what we want to use, especially if we don’t need to support Unicode strings.  Imagine a staging scenario when loading a data warehouse.  Often the Business Keys (BK) are strings, and depending on the source system, sometimes very long strings – I’ve seen situations with a combined business key of over 500 bytes!  (You can’t imagine what some data sources look like but that’s another story.)  Do we really want to convert these to Unicode, and thus double their size?  Furthermore, to improve lookups we put indexes on those BKs.  These indexes would double in size as well.  So no, we don’t really want to make these fields Unicode, and certainly not when we want our ETLs to perform as fast as possible.

Reason #2 For Not

That was reason number one why nvarchar is not always the solution.  And here comes reason number two.  In my scenario, the source tables are located in an Oracle database.  And guess what: by default Oracle’s ORDER BY behaves different than SQL Server’s ORDER BY (when using the regular Latin1_General_CI_AS or SQL_Latin1_General_CP1_CI_AS collations)!  By default Oracle uses binary string comparison to sort its data and the reason for it appears to be that that’s the only way to prevent a full table scan.  I’m no Oracle expert but that’s what the documentation states.

Here’s a little demonstration.  The following script prepares a table variable and selects the data from it, sorted ascending.

declare @tbl table( col1 varchar(20));
insert into @tbl select ‘AA’;
insert into @tbl select ‘A’;
insert into @tbl select ‘A-’;
insert into @tbl select ‘A A’;
insert into @tbl select ‘BA’;
insert into @tbl select ‘0′;
insert into @tbl select ‘1′;
insert into @tbl select ‘-0′;
insert into @tbl select ‘-1′;
insert into @tbl select ‘A0′;
insert into @tbl select ‘0A’;
insert into @tbl select ‘-A’;
insert into @tbl select ‘-B’;
insert into @tbl select ‘a’;
insert into @tbl select ‘b’;
insert into @tbl select ‘ ‘;

select * from @tbl
order by col1 asc;

I have executed it once just as stated above (while connected to a database that uses the SQL_Latin1_General_CP1_CI_AS collation) and once more while using nvarchar as data type for the column in the table variable.  The first execution will sort the data using a non-Unicode sorting algorithm, while the second execution will order the data according to the Unicode sorting method.  The results will be shown further below for easier comparison.

On Oracle I performed a similar procedure, as shown in following script.

select cast(‘AA’ as varchar(20)) as col1 from Dual union
select ‘A’ as col1 from Dual union
select ‘A-’ as col1 from Dual union
select ‘A A’ as col1 from Dual union
select ‘BA’ as col1 from Dual union
select ‘0′ as col1 from Dual union
select ‘1′ as col1 from Dual union
select ‘-0′ as col1 from Dual union
select ‘-1′ as col1 from Dual union
select ‘A0′ as col1 from Dual union
select ‘0A’ as col1 from Dual union
select ‘-A’ as col1 from Dual union
select ‘-B’ as col1 from Dual union
select ‘a’ as col1 from Dual union
select ‘b’ as col1 from Dual union
select ‘ ‘ as col1 from Dual
order by col1 asc;

The Oracle script doesn’t use a table variable, it just creates a result set using several select statements with a union in between.  But for our test that doesn’t matter, the results using this method are suitable.

In the table below you can see the result of the three executions.

SQL non-Unicode SQL Unicode Oracle Binary
< space
-0 0 -0
-1 -0 -1
-A 0A -A
-B 1 -B
0 -1 0
0A A 0A
1 a 1
A -A A
a A- A A
A A A A A-
A- A0 A0
A0 AA AA
AA b BA
b -B a
BA BA b

As you can see, they only agree on one thing: space really is the smallest character in my test set!  And that’s not what I want, I want all the data to be sorted consistently, no matter what the source is.

But Why Sorted?

You may wonder why I need to sort the data.  Well, some components in Integration Services expect the incoming data flows to be ordered.  One of the standard components that requires this is the Merge Transformation.  Another (custom!) component is Table Difference.  I could of course add a Sort Transformation to my Data Flow, but that would not be interesting for performance.  I want the data to come from the database server in the expected order.  So now I’ll show you how you can do that.

Taking Control!

SQL Server: ORDER BY … COLLATE …

On SQL Server this was fairly easy.  The ORDER BY clause has a COLLATE part where you can specify what collation should be used to order the data.  Because Oracle sorts its data using a binary algorithm, I’ll tell SQL Server to do that as well.  More precisely I’ll tell SQL Server to use the Latin1_General_BIN collation.  The updated SELECT statement from the T-SQL script above looks like this:

select * from @tbl
order by col1 collate Latin1_General_BIN asc;

Oracle: ORDER BY NLSSORT()

To ensure that results from Oracle are always returned using the same sorting algorithm, I will also tell the Oracle server to sort it’s data using the binary algorithm.

The first way I came up with was to change the NLS_SORT setting on the session.  That can be done by executing the following command before the SELECT statement:

ALTER SESSION SET NLS_SORT=BINARY;

This method is fine when you’re running the queries manually from a client such as Oracle SQL Developer.  However, in SSIS the OLE DB Source component will not accept anything else besides the SELECT statement.

Then I found another way.  There’s a function called NLSSORT() which you can apply to a column in the ORDER BY clause.  The following statement demonstrates how to use this function.  (I only show the ORDER BY clause as it can be applied to the Oracle script mentioned earlier.)

ORDER BY NLSSORT(col1, ‘NLS_SORT=BINARY’)

The following table shows the results from both binary sort queries:

SQL Binary Oracle Binary
-0 -0
-1 -1
-A -A
-B -B
0 0
0A 0A
1 1
A A
A A A A
A- A-
A0 A0
AA AA
BA BA
a a
b b

Finally I am able to get data from both Oracle and SQL Server using a consistent sort order.

But, How Big Is NULL?

However, even on this straightforward request, both database servers do not fully agree!  Here’s what they have to say about the topic:

“NULL is the smallest.”

“No, it’s the largest.”

“No, smallest!”

“Largest!!”

“Smallest.”

“Largest I tell you!!!”

“Bladiebla, not hearing you, anyway, it’s NOTHING!”

“No, it isn’t!”

*discussion goes on and on*

If I add NULL to my test data set, SQL Server will sort it first (thus NULL is the smallest value in my test set), while Oracle will put it last.  In my situation it wasn’t really an issue (the BKs are not supposed to be NULL), but it’s quite important to remember in cases where NULLs are actually possible.

Conclusion

When working with strings, always keep collation in mind.  And even more so when dealing with several different source systems!

Additional reference material:

Database Journal: The Globalization of Language in Oracle – The NLS_COMP and NLS_SORT variables

BOL 2008: How to: Sort Data for the Merge and Merge Join Transformations

  • Share/Bookmark

Tags: , , , , , ,

If you’ve ever needed to load a wide table, for instance one that consists of about 250 columns, you may have come to the conclusion that some manipulations in your SSIS package take forever.  Here’s a quick tip how to speed up your package development.

This tip is actually quite generic and applicable to daily Windows use.  It’s just that sometimes you forget about the most simple things because you’re used to a different way of working.  Why make it easy when difficult works fine, right?  Well, maybe not.

Okay, here it goes: don’t forget that you’ve got a keyboard, do not always use the mouse! The Business Intelligence Development Studio offers a very graphical way of designing your data flow (or control flow for that matter) but sometimes it’s really faster to use the keyboard.

An example.  Imagine the Derived Column Transformation.  To avoid typographical errors you want to select a column name from the tree view on the left in the Derived Column Transformation Editor.  And then you discover that the list of column names is not ordered alphabetically, and on top of that, there’s not way to rearrange them!  Good luck finding your column.

Derived Column Transformation Editor

And then you remember that you’ve got a keyboard.  By typing the first letters of the column name, the selected item will jump down to the first match.  Isn’t that a great invention?!  In the screenshot above I typed “pu” and it selected PurchaseOrderNumber, which is about 32 columns down from the top.

This tip not only works in our Derived Column Transformation but in any screen where a similar tree view appears, such as in the “Inputs and Outputs Properties” tab of the Advanced Editor of an OLE DB Source:

Advanced Editor of OLE DB Source

This can be useful if you need to tell the component that you’ve ordered the data on certain columns.  And why on earth would you want to do that, you may ask yourself.  Well, there are certain components that expect the data to be ordered, such as the Merge item.

If you apply this tip whenever you need to locate something in a tree view, you’ll find that your live has become just a bit easier.

And if you’re not satisfied with the way of working as described above, there’s always the XML!  That’s right, just right-click on your package in the Solution Explorer and select View Code.

Context menu on package in Solution Explorer

Then you’ll discover that a .dtsx file is made up completely of XML.  If you spend some time examining the XML code, you’ll probably find what you’re looking for.

As an example let’s say we want to locate our PurchaseOrderNumber from earlier and change its SortKeyPosition property to 1 (which means the data coming in is ordered by PurchaseOrderNumber).  So I open the XML and scroll down a bit.  I recognize a component which I’ve named “OLE_SRC Wide Table”:

<component id=”1″ name=”OLE_SRC Wide Table”

I also recognize my SELECT statement.  Scrolling down a bit further I see the

<outputs>

followed by a bunch of outputColumn items.  I put the cursor at the <outputs> line and hit CTRL-F.  Then I type “pur” and hit ENTER.

Find and Replace window in dtsx XML

The selection jumps to the column that we’re after.  I close the Find and Replace window and scroll to the right to find the sortKeyPosition property and I update its value to 1.

To confirm that this is working I open up the Advanced Editor again and locate our column.  SortKeyPosition is set to 1!

SortKeyPosition has been updated through XML

I hope that with this I showed you that sometimes it’s interesting to think about what you’re actually doing and try to find a way to do it better, and faster.

  • Share/Bookmark

Tags: , , , ,

Imagine an Integration Services package where you need to load data from an Oracle database.  Typical for ETL you try to use a timestamp field in the source table to determine if there were any changes since last load.  Let’s say that field is called LAST_MODIFICATION.  The query that you’re using to load the data is parameterized and the value passed in is the highest timestamp of the previous Load (or should I say Extract?).  So your query may look something like this:

SELECT * FROM THE_SCHEMA.THE_TABLE
WHERE LAST_MODIFICATION > to_date(?, ‘yyyy/mm/dd hh:mi:ss’)

In this query the question mark gets replaced with a datetime value, for instance ‘2009/08/27 21:26:32’.  (Yeah, I know, you shouldn’t select star but that’s not the point of this post.)

When running your package you suddenly get the following error:

ORA-01849: hour must be between 1 and 12

After some cursing and trial and error, you decide to have a closer look at Oracle’s to_date() function.  And there is your answer: HH returns the same as HH12, an hour between 1 and 12.  Which is not what we want here, we want hours starting at 1 and ending at 23.  So we should use HH24 instead.  A correct query would be:

SELECT * FROM THE_SCHEMA.THE_TABLE
WHERE LAST_MODIFICATION > to_date(?, ‘yyyy/mm/dd hh24:mi:ss’)

Another lesson learned… (and noted in case I need it again :-) )

  • Share/Bookmark

Tags: , , ,

If you’re using Integration Services 2008 and the Foreach Loop Container in the Control Flow, you’ll very likely encounter this bug.

The Foreach Loop has several enumerators available.  By default it selects the Foreach File Enumerator.  However, as the screenshot below shows, there’s no way to configure it – the Enumerator configuration group just shows blank space.

Foreach Loop Editor with empty Enumerator configuration

This phenomenon is caused by a bug which has been reported on Microsoft Connect.  The workaround, if you really need the File Enumerator, is to select another enumerator first and then switch back to the File Enumerator.  You’ll notice that the regular controls show up and on you go, define that folder name!

  • Share/Bookmark

Tags: , , , ,

« Older entries

© 2008-2010 A Developer's Blog All Rights Reserved