T-SQL

You are currently browsing the archive for the T-SQL category.

I usually don’t write posts just to mention a link to another site.  Except when I’ve come across articles which are so good that I want everyone to know about them.  Here are a couple of articles in that particular category.

What Does Microsoft Do With Those Memory Dumps?

The first one is written by Adam W. Saxton, a member of the Microsoft Customer Service and Support (CSS) SQL Server Escalation Services team.

The subject of the post is an issue that developers of Reporting Services reports may come across: “InvalidReportParameterException with Data Driven Subscription”.  The report complains about an invalid parameter while the parameters seem to be okay, visually.  Then the author goes on to describe the actual issue: trailing spaces!  If you’ve been doing BI for a while, this is a quite common issue.  Adam also shows the difference between the LEN() and the DATALENGTH() functions.

But that’s not the reason that I’m mentioning that article here.  The best part starts following the Summary chapter, and is entitled Techie Details.  In that extra chapter he shows how the CSS team uses those crash memory dumps which I’m sure you’ve all seen now and then. 

I won’t give you any details about that, just have a look at the article.  If you’re a developer, there’s some very valuable information there!

I am definitely looking forward to seeing his pre-conference session at the SQL PASS European Conference in Germany this year: Tackling Top Reporting Services Issues!

Some Crazy Code Samples

The second article that I’d like to mention here is written by Phil Factor, a regular at the Simple-Talk site.

This article is called Laying out SQL Code and mentions some database naming conventions and T-SQL coding layout, as the title already implies.  Even if you’re not interested in that (although as a serious developer you should be!!), the article is worth the effort of reading just for its code samples.

Here’s my favorite one:

CREATE TABLE "╚╦╩╗" ( "└┬┴┐" nvarchar(10))
DECLARE @ nvarchar(10)  set @='═'
INSERT  INTO "╚╦╩╗"
        ( "└┬┴┐" )
        SELECT  replicate(@,5)

SELECT  *
FROM    "╚╦╩╗"

And just to prove to you that this really is valid code, here’s the result:

Result of crazy query

So, how about that database for that plumbing company? :-)

BTW: when running that code without paying attention to the details (such as what DB your SSMS is connected to), you may end up with something like this:

Silly table stored in master DB - that's not a best practice! 

So you may want to clean up after running the query using this statement:

drop table "╚╦╩╗"

Have fun!

Share

Tags: , , , ,

When deploying packages to SQL Server Integration Services, it’s advisable to set up a folder structure so that you can easily distinguish packages belonging to different projects.  Furthermore it may be interesting to create subfolders under the main project folder to separate packages according to the different phases in your ETL (Extract, Transform, Load) process.  When loading a data warehouse, interesting folder names are Dimensions for your dimension ETLs and Facts for the packages that load the fact tables.

After a while you end up with lots of packages spread over lots of folders.  To get a good view of what is deployed on your server, it may be interesting to find a way to list all the packages.  And that’s exactly the reason why I’m writing this article.

The query further down generates a list of all packages deployed in the MSDB database on your SQL Server.  What you get is the name of the packages, their location and version-related information.  I’ve also created a  RootFolder column so that it’s easy to filter on project.  (See now why it’s interesting to create separate folders per project?)

It’s important to note that packages deployed to the File System will not be shown in the list.  After all, they are not stored in the MSDB database but in a folder somewhere on the server’s hard drive, more precisely in a subfolder of where you’ve installed your SQL Server.  In case you’ve forgotten where that was, here’s a small tip.  On your server, open up the list of Windows Services (Start > Run > type “services.msc” > enter) and locate the service called SQL Server Integration Services 10.0.  Open the properties of that service and have a look at the Path to executable value in the General tab.  Take the path, drop the \Binn part and add \Packages instead.  That is where, by default, the packages are deployed.  (If you’re running SQL Server 2005, apply the same procedure but look for a service called SQL Server Integration Services.)

On my system, this is where the packages are located: D:\Program Files\Microsoft SQL Server\100\DTS\Packages.  I will also prove it with following screenshot:

Packages deployed to the file system on SSIS 2008

If you’re looking for a way to list the packages deployed to the file system by using a T-SQL statement, check out the following article by Phil Factor: The TSQL of Text Files.

Okay, time for the real stuff, the query:

/*
    DESCRIPTION: Lists all SSIS packages deployed to the MSDB database.
    WRITTEN BY: Valentino Vranken
    VERSION: 1.1
    COPIED FROM: http://blog.hoegaerden.be

    Note: this query was written for SQL Server 2008. For SQL2005:
        o sysssispackagefolders => sysdtspackagefolders90
        o sysssispackages => sysdtspackages90
*/
with ChildFolders
as
(
    select PARENT.parentfolderid, PARENT.folderid, PARENT.foldername,
        cast('' as sysname) as RootFolder,
        cast(PARENT.foldername as varchar(max)) as FullPath,
        0 as Lvl
    from msdb.dbo.sysssispackagefolders PARENT
    where PARENT.parentfolderid is null
    UNION ALL
    select CHILD.parentfolderid, CHILD.folderid, CHILD.foldername,
        case ChildFolders.Lvl
            when 0 then CHILD.foldername
            else ChildFolders.RootFolder
        end as RootFolder,
        cast(ChildFolders.FullPath + '/' + CHILD.foldername as varchar(max))
            as FullPath,
        ChildFolders.Lvl + 1 as Lvl
    from msdb.dbo.sysssispackagefolders CHILD
        inner join ChildFolders on ChildFolders.folderid = CHILD.parentfolderid
)
select F.RootFolder, F.FullPath, P.name as PackageName,
    P.description as PackageDescription, P.packageformat, P.packagetype,
    P.vermajor, P.verminor, P.verbuild, P.vercomments,
    cast(cast(P.packagedata as varbinary(max)) as xml) as PackageData
from ChildFolders F
    inner join msdb.dbo.sysssispackages P on P.folderid = F.folderid
order by F.FullPath asc, P.name asc;

The query uses a recursive CTE (Common Table Expression) to get data out of a system table called sysssispackagefolders, located in the MSDB system database.  The CTE gives us a list of all folders stored in the database and at the same time uses the hierarchical structure of the table to build the FullPath and the Lvl columns.

Note: the CAST() calls are needed because the data type of the foldername column is sysname.  And sysname does not implicitly convert to varchar, which is needed for the concatenation building the FullPath column.

The CTE is joined with another system table called sysssispackages, also located in MSDB.  Not all columns are being retrieved from that table but I believe I’ve selected the most important ones.  Have a look in the Books Online for more info on the columns available.

There’s one column however on which I’d like to add some additional info myself.  That column is called packagedata and it contains the actual SSIS package.  The data type of this column is image, not sure why because after all, an SSIS package (or .dtsx file for that matter) is pure XML.  So why isn’t it stored as XML? If anyone knows the reason: post a comment!

Update: since I wrote the above paragraph I’ve come across the answer myself.  The reason that the XML is not stored as xml datatype is because of the overhead that this would cause.  So there you go, use image instead of xml if you’re not going to query the xml structure itself.

Anyway, as you can see in the query, to get it converted from image to XML you need to go through varbinary.  The image datatype cannot convert directly to XML.  See the Books Online here on what casts are allowed: http://msdn.microsoft.com/en-us/library/ms187928.aspx

Note for SQL Server 2005 users: as mentioned in the query’s comments, these tables don’t exist in SQL Server 2005.  Well, actually they do, they just have different names.  See the comment in the code for their equivalent.

To finish off I’ll show you what the results look like when executing the query on my test system.  But first, following screenshot shows all deployed packages as reported by the Management Studio.  As you can see, two packages are deployed to the File System.  These two packages were shown earlier in the first screenshot.  Some other packages have been deployed to the MSDB database.

Object Explorer showing all deployed SSIS packages

And here are the results of the query:

A list of all SSIS packages deployed to my SQL Server 2008 MSDB database

To be honest, I added a little filter to keep the results clean.  The Data Collector, a new feature of SQL Server 2008, also uses some packages so I’ve filtered those out by adding a WHERE clause to the SELECT statement at the bottom of the full query:

select F.RootFolder, F.FullPath, P.name as PackageName,
    P.description as PackageDescription, P.packageformat, P.packagetype,
    P.vermajor, P.verminor, P.verbuild, P.vercomments,
    cast(cast(P.packagedata as varbinary(max)) as xml) as PackageData
from ChildFolders F
    inner join msdb.dbo.sysssispackages P on P.folderid = F.folderid
where F.RootFolder <> 'Data Collector'
order by F.FullPath asc, P.name asc;

If you’ve been paying attention, you’ve noticed that the two packages deployed to the File System are not mentioned in the output of the query, as expected.

Now that you know how to list your packages, check out my article on deleting them.

That’s all for now folks, have fun!

References

BOL 2008: Tutorial: Creating a Simple ETL Package

BOL 2008: sysssispackagefolders (Transact-SQL)

BOL 2008: sysssispackages (Transact-SQL)

BOL 2008: Recursive Queries Using Common Table Expressions

Share

Tags: , , , , ,

After having successfully survived the festivities, here I am again, ready for another year of blogging, article writing, forum answering, conference attending, …

Happy New Year to anyone reading this!!

In this year’s first post I’d like to share my preferred way of adding leading zeroes to a number, useful for any situation where you need to convert a number to a string using SQL.  In fact, this method can be used whenever a certain string needs to get increased to a specific length using a particular character, not just numbers.

And here’s the code sample:

declare @number int = 42;
declare @character char(1) = '0';
declare @expectedLength int = 8;

select REPLICATE(@character, @expectedLength - LEN(@number)) + CAST(@number as varchar(8)) as Result;

 

I’m using the REPLICATE() function to create a string of our leading character, in this case a zero.  The length of this string of zeroes is the expected length minus the length of the actual number.  The LEN() function is used to get the length of our number – note that LEN() expects a string as parameter so an implicit cast from int to varchar is being performed here.

Then I use the + operator to concatenate the string of zeroes with our actual number.  This number gets converted to varchar using the CAST() function.  The length of the varchar should be equal to the maximum length that our output string can be.  This way we ensure that all acceptable values are being converted to string.  In the case that our number is longer than the expected output string, the result will be NULL.

And here’s the result:

Result of sample query to add leading zeroes to a number

Have fun in 2010!

Share

Tags: ,

This post is meant partly as a “reminder to self” but also to show that, if you need to generate a list of dates, all you need are numbers.  Oh well, if that doesn’t make sense: read on!

One of the undocumented tables of SQL Server is called spt_values and it is located in the master database.  This table serves as a lookup table for SQL Server itself, more precisely it’s used by several system stored procedures.

Being a lookup table, it doesn’t consist of many columns.  One of the most important columns though is called type.  This one defines the meaning of the record.  An interesting value for this column is “LNG”.  When filtering on “LNG” it returns the following:

List of languages with their LCID

I’m quite sure that SharePoint developers recognize these values stored in the number column.  That’s the locale identifier of the language mentioned in the name column.  Could be interesting in case you need a list of languages to be shown somewhere, in a report parameter for instance.

But anyway, these records are not the point of this blog post so let’s move on.  The title of this post says we’re going to generate some dates out of numbers.  So to get started, what we need are those numbers.  Guess what the spt_values table contains?  Indeed, when filtering the name column on “P”, what we get are numbers starting at zero and ending at 2047:

List of numbers retrieved from master..spt_values

And how can we turn these into dates?  Using some creativity it’s not that complicated.  Let’s say we want to generate a list of dates starting at the first of January of the current year up until today, could be interesting for YTD calculations.  Following query does just that:

select CAST(DATEADD(dd, -number, GETDATE()) as date) dt
from master..spt_values
where type = 'P' and number < DATEPART(dy, GETDATE())

As you can see, several datetime-related functions are being used here.  The easiest one is GETDATE() which returns the current system timestamp of the database server.

Then we’ve got DATEPART() in combination with the “dy” parameter.  This returns the numeric value for the day of the year that the date parameter represents.  The following query returns 336 when ran today (December 2, 2009):

select DATEPART(dy, GETDATE())

(For readability purposes I will not repeat “when ran today” each time, the remainder of the post assumes everything is being executed on the aforementioned date.)

These two functions are being used in the WHERE clause.   So what the WHERE clause says is that we want all numbers smaller than 336.

Okay, so let’s move on to the SELECT.  It uses the DATEADD() function to subtract the retrieved number values from the current system date.  How does that work?  In combination with the “dd” parameter it tells the function what we’re working on are the days, and the minus sign in front of the number field ensures that days are being subtracted instead of added.

And to finish off, the value returned by DATEADD() is being converted to the date type using the CAST() function.  This removes the time value of the datetime type which is good because all we needed are dates.

This is what the query returns:

YTD date values

As you can see, we get 336 records which matches exactly with the values returned by DATEPART(dy, GETDATE()).  (BTW: the number field starts at zero which is why we get 336 instead of 335 records here)

For more advanced usage of a list of numbers I’d like to point you the following article by colleague Experts Exchange Expert Mark Wills: Fun with MS SQL spt_values for delimited strings and virtual calendars

Have fun!

Valentino.

Share

Tags: ,

Initially I was going to call this article “Struggling With Collation: The SeQueL”, but it just doesn’t have the same ring to it as “Fun With Strings”.  In that previous article I showed how you might get different results when loading data from a temporary table or table variable and I suggested that one way of solving this is by switching your data type to nvarchar.

Unicode Or Not?

Reason #1 For Not

Today I’m going to show you that nvarchar is not always what we want to use, especially if we don’t need to support Unicode strings.  Imagine a staging scenario when loading a data warehouse.  Often the Business Keys (BK) are strings, and depending on the source system, sometimes very long strings – I’ve seen situations with a combined business key of over 500 bytes!  (You can’t imagine what some data sources look like but that’s another story.)  Do we really want to convert these to Unicode, and thus double their size?  Furthermore, to improve lookups we put indexes on those BKs.  These indexes would double in size as well.  So no, we don’t really want to make these fields Unicode, and certainly not when we want our ETLs to perform as fast as possible.

Reason #2 For Not

That was reason number one why nvarchar is not always the solution.  And here comes reason number two.  In my scenario, the source tables are located in an Oracle database.  And guess what: by default Oracle’s ORDER BY behaves different than SQL Server’s ORDER BY (when using the regular Latin1_General_CI_AS or SQL_Latin1_General_CP1_CI_AS collations)!  By default Oracle uses binary string comparison to sort its data and the reason for it appears to be that that’s the only way to prevent a full table scan.  I’m no Oracle expert but that’s what the documentation states.

Here’s a little demonstration.  The following script prepares a table variable and selects the data from it, sorted ascending.

declare @tbl table( col1 varchar(20));
insert into @tbl select ‘AA’;
insert into @tbl select ‘A’;
insert into @tbl select ‘A-’;
insert into @tbl select ‘A A’;
insert into @tbl select ‘BA’;
insert into @tbl select ’0′;
insert into @tbl select ’1′;
insert into @tbl select ‘-0′;
insert into @tbl select ‘-1′;
insert into @tbl select ‘A0′;
insert into @tbl select ’0A’;
insert into @tbl select ‘-A’;
insert into @tbl select ‘-B’;
insert into @tbl select ‘a’;
insert into @tbl select ‘b’;
insert into @tbl select ‘ ‘;

select * from @tbl
order by col1 asc;

I have executed it once just as stated above (while connected to a database that uses the SQL_Latin1_General_CP1_CI_AS collation) and once more while using nvarchar as data type for the column in the table variable.  The first execution will sort the data using a non-Unicode sorting algorithm, while the second execution will order the data according to the Unicode sorting method.  The results will be shown further below for easier comparison.

On Oracle I performed a similar procedure, as shown in following script.

select cast(‘AA’ as varchar(20)) as col1 from Dual union
select ‘A’ as col1 from Dual union
select ‘A-’ as col1 from Dual union
select ‘A A’ as col1 from Dual union
select ‘BA’ as col1 from Dual union
select ’0′ as col1 from Dual union
select ’1′ as col1 from Dual union
select ‘-0′ as col1 from Dual union
select ‘-1′ as col1 from Dual union
select ‘A0′ as col1 from Dual union
select ’0A’ as col1 from Dual union
select ‘-A’ as col1 from Dual union
select ‘-B’ as col1 from Dual union
select ‘a’ as col1 from Dual union
select ‘b’ as col1 from Dual union
select ‘ ‘ as col1 from Dual
order by col1 asc;

The Oracle script doesn’t use a table variable, it just creates a result set using several select statements with a union in between.  But for our test that doesn’t matter, the results using this method are suitable.

In the table below you can see the result of the three executions.

SQL non-Unicode SQL Unicode Oracle Binary
< space
-0 0 -0
-1 -0 -1
-A 0A -A
-B 1 -B
0 -1 0
0A A 0A
1 a 1
A -A A
a A- A A
A A A A A-
A- A0 A0
A0 AA AA
AA b BA
b -B a
BA BA b

As you can see, they only agree on one thing: space really is the smallest character in my test set!  And that’s not what I want, I want all the data to be sorted consistently, no matter what the source is.

But Why Sorted?

You may wonder why I need to sort the data.  Well, some components in Integration Services expect the incoming data flows to be ordered.  One of the standard components that requires this is the Merge Transformation.  Another (custom!) component is Table Difference.  I could of course add a Sort Transformation to my Data Flow, but that would not be interesting for performance.  I want the data to come from the database server in the expected order.  So now I’ll show you how you can do that.

Taking Control!

SQL Server: ORDER BY … COLLATE …

On SQL Server this was fairly easy.  The ORDER BY clause has a COLLATE part where you can specify what collation should be used to order the data.  Because Oracle sorts its data using a binary algorithm, I’ll tell SQL Server to do that as well.  More precisely I’ll tell SQL Server to use the Latin1_General_BIN collation.  The updated SELECT statement from the T-SQL script above looks like this:

select * from @tbl
order by col1 collate Latin1_General_BIN asc;

Oracle: ORDER BY NLSSORT()

To ensure that results from Oracle are always returned using the same sorting algorithm, I will also tell the Oracle server to sort it’s data using the binary algorithm.

The first way I came up with was to change the NLS_SORT setting on the session.  That can be done by executing the following command before the SELECT statement:

ALTER SESSION SET NLS_SORT=BINARY;

This method is fine when you’re running the queries manually from a client such as Oracle SQL Developer.  However, in SSIS the OLE DB Source component will not accept anything else besides the SELECT statement.

Then I found another way.  There’s a function called NLSSORT() which you can apply to a column in the ORDER BY clause.  The following statement demonstrates how to use this function.  (I only show the ORDER BY clause as it can be applied to the Oracle script mentioned earlier.)

ORDER BY NLSSORT(col1, ‘NLS_SORT=BINARY’)

The following table shows the results from both binary sort queries:

SQL Binary Oracle Binary
-0 -0
-1 -1
-A -A
-B -B
0 0
0A 0A
1 1
A A
A A A A
A- A-
A0 A0
AA AA
BA BA
a a
b b

Finally I am able to get data from both Oracle and SQL Server using a consistent sort order.

But, How Big Is NULL?

However, even on this straightforward request, both database servers do not fully agree!  Here’s what they have to say about the topic:

“NULL is the smallest.”

“No, it’s the largest.”

“No, smallest!”

“Largest!!”

“Smallest.”

“Largest I tell you!!!”

“Bladiebla, not hearing you, anyway, it’s NOTHING!”

“No, it isn’t!”

*discussion goes on and on*

If I add NULL to my test data set, SQL Server will sort it first (thus NULL is the smallest value in my test set), while Oracle will put it last.  In my situation it wasn’t really an issue (the BKs are not supposed to be NULL), but it’s quite important to remember in cases where NULLs are actually possible.

Conclusion

When working with strings, always keep collation in mind.  And even more so when dealing with several different source systems!

Additional reference material:

Database Journal: The Globalization of Language in Oracle – The NLS_COMP and NLS_SORT variables

BOL 2008: How to: Sort Data for the Merge and Merge Join Transformations

Share

Tags: , , , , , ,

« Older entries § Newer entries »

© 2008-2017 BI: Beer Intelligence? All Rights Reserved