ETL

You are currently browsing articles tagged ETL.

When deploying packages to SQL Server Integration Services, it’s advisable to set up a folder structure so that you can easily distinguish packages belonging to different projects.  Furthermore it may be interesting to create subfolders under the main project folder to separate packages according to the different phases in your ETL (Extract, Transform, Load) process.  When loading a data warehouse, interesting folder names are Dimensions for your dimension ETLs and Facts for the packages that load the fact tables.

After a while you end up with lots of packages spread over lots of folders.  To get a good view of what is deployed on your server, it may be interesting to find a way to list all the packages.  And that’s exactly the reason why I’m writing this article.

The query further down generates a list of all packages deployed in the MSDB database on your SQL Server.  What you get is the name of the packages, their location and version-related information.  I’ve also created a  RootFolder column so that it’s easy to filter on project.  (See now why it’s interesting to create separate folders per project?)

It’s important to note that packages deployed to the File System will not be shown in the list.  After all, they are not stored in the MSDB database but in a folder somewhere on the server’s hard drive, more precisely in a subfolder of where you’ve installed your SQL Server.  In case you’ve forgotten where that was, here’s a small tip.  On your server, open up the list of Windows Services (Start > Run > type “services.msc” > enter) and locate the service called SQL Server Integration Services 10.0.  Open the properties of that service and have a look at the Path to executable value in the General tab.  Take the path, drop the \Binn part and add \Packages instead.  That is where, by default, the packages are deployed.  (If you’re running SQL Server 2005, apply the same procedure but look for a service called SQL Server Integration Services.)

On my system, this is where the packages are located: D:\Program Files\Microsoft SQL Server\100\DTS\Packages.  I will also prove it with following screenshot:

Packages deployed to the file system on SSIS 2008

If you’re looking for a way to list the packages deployed to the file system by using a T-SQL statement, check out the following article by Phil Factor: The TSQL of Text Files.

Okay, time for the real stuff, the query:

/*
    DESCRIPTION: Lists all SSIS packages deployed to the MSDB database.
    WRITTEN BY: Valentino Vranken
    VERSION: 1.1
    COPIED FROM: http://blog.hoegaerden.be

    Note: this query was written for SQL Server 2008. For SQL2005:
        o sysssispackagefolders => sysdtspackagefolders90
        o sysssispackages => sysdtspackages90
*/
with ChildFolders
as
(
    select PARENT.parentfolderid, PARENT.folderid, PARENT.foldername,
        cast('' as sysname) as RootFolder,
        cast(PARENT.foldername as varchar(max)) as FullPath,
        0 as Lvl
    from msdb.dbo.sysssispackagefolders PARENT
    where PARENT.parentfolderid is null
    UNION ALL
    select CHILD.parentfolderid, CHILD.folderid, CHILD.foldername,
        case ChildFolders.Lvl
            when 0 then CHILD.foldername
            else ChildFolders.RootFolder
        end as RootFolder,
        cast(ChildFolders.FullPath + '/' + CHILD.foldername as varchar(max))
            as FullPath,
        ChildFolders.Lvl + 1 as Lvl
    from msdb.dbo.sysssispackagefolders CHILD
        inner join ChildFolders on ChildFolders.folderid = CHILD.parentfolderid
)
select F.RootFolder, F.FullPath, P.name as PackageName,
    P.description as PackageDescription, P.packageformat, P.packagetype,
    P.vermajor, P.verminor, P.verbuild, P.vercomments,
    cast(cast(P.packagedata as varbinary(max)) as xml) as PackageData
from ChildFolders F
    inner join msdb.dbo.sysssispackages P on P.folderid = F.folderid
order by F.FullPath asc, P.name asc;

The query uses a recursive CTE (Common Table Expression) to get data out of a system table called sysssispackagefolders, located in the MSDB system database.  The CTE gives us a list of all folders stored in the database and at the same time uses the hierarchical structure of the table to build the FullPath and the Lvl columns.

Note: the CAST() calls are needed because the data type of the foldername column is sysname.  And sysname does not implicitly convert to varchar, which is needed for the concatenation building the FullPath column.

The CTE is joined with another system table called sysssispackages, also located in MSDB.  Not all columns are being retrieved from that table but I believe I’ve selected the most important ones.  Have a look in the Books Online for more info on the columns available.

There’s one column however on which I’d like to add some additional info myself.  That column is called packagedata and it contains the actual SSIS package.  The data type of this column is image, not sure why because after all, an SSIS package (or .dtsx file for that matter) is pure XML.  So why isn’t it stored as XML? If anyone knows the reason: post a comment!

Update: since I wrote the above paragraph I’ve come across the answer myself.  The reason that the XML is not stored as xml datatype is because of the overhead that this would cause.  So there you go, use image instead of xml if you’re not going to query the xml structure itself.

Anyway, as you can see in the query, to get it converted from image to XML you need to go through varbinary.  The image datatype cannot convert directly to XML.  See the Books Online here on what casts are allowed: http://msdn.microsoft.com/en-us/library/ms187928.aspx

Note for SQL Server 2005 users: as mentioned in the query’s comments, these tables don’t exist in SQL Server 2005.  Well, actually they do, they just have different names.  See the comment in the code for their equivalent.

To finish off I’ll show you what the results look like when executing the query on my test system.  But first, following screenshot shows all deployed packages as reported by the Management Studio.  As you can see, two packages are deployed to the File System.  These two packages were shown earlier in the first screenshot.  Some other packages have been deployed to the MSDB database.

Object Explorer showing all deployed SSIS packages

And here are the results of the query:

A list of all SSIS packages deployed to my SQL Server 2008 MSDB database

To be honest, I added a little filter to keep the results clean.  The Data Collector, a new feature of SQL Server 2008, also uses some packages so I’ve filtered those out by adding a WHERE clause to the SELECT statement at the bottom of the full query:

select F.RootFolder, F.FullPath, P.name as PackageName,
    P.description as PackageDescription, P.packageformat, P.packagetype,
    P.vermajor, P.verminor, P.verbuild, P.vercomments,
    cast(cast(P.packagedata as varbinary(max)) as xml) as PackageData
from ChildFolders F
    inner join msdb.dbo.sysssispackages P on P.folderid = F.folderid
where F.RootFolder <> 'Data Collector'
order by F.FullPath asc, P.name asc;

If you’ve been paying attention, you’ve noticed that the two packages deployed to the File System are not mentioned in the output of the query, as expected.

Now that you know how to list your packages, check out my article on deleting them.

That’s all for now folks, have fun!

References

BOL 2008: Tutorial: Creating a Simple ETL Package

BOL 2008: sysssispackagefolders (Transact-SQL)

BOL 2008: sysssispackages (Transact-SQL)

BOL 2008: Recursive Queries Using Common Table Expressions

Share

Tags: , , , , ,

© 2008-2014 BI: Beer Intelligence? All Rights Reserved