Integration Services

Quick post to ask a moment of your time.

Have you ever wanted a feature in the BIDS to quickly identify package variables which have become obsolete?  Well, I’ve been involved in the cleanup of existing packages and I can tell you, that feature would be very handy!

After a search on the internet, it turns out that one of the planned features for the BIDS Helper contains exactly that.  The request is a bit wider than what I need, but at least “highlight unused variables” is part of it.

All I’m asking now is just a minute of your time to vote for that feature request.

Have fun, and thank you!

Valentino.


When I opened an existing SSIS project in the new SQL Server 2012 RC0, I came to an interesting discovery: an empty Toolbox pane!  Even with an SSIS package open in the designer.  Hmm, that’s funny!  So where are my SSIS components?

Take a good look at the following screenshot:

The Toolbox is no longer the SSIS Toolbox but the new SSIS Toolbox is!

That’s right, they are not in the Toolbox anymore but in the SSIS Toolbox instead.  This new toolbox is a bit different from the old one.  Besides the changed grouping of components, the most important difference is that it automatically detects any custom components.  You no longer need to right-click, select Choose Items, go fetch a coffee, wait until it cools down a bit, drink it and finally … select your custom component.  No, you’ll have to find another reason to get that coffee shot.  Actually, that’s not entirely true: you still need to right-click and select Refresh Toolbox before the custom components are shown.

Another difference is that it’s split in two parts.  The bottom half of the pane now contains a description of the selected item, including a link that should lead to samples and a link to the Books Online.

The new SSIS Toolbox shows a description of the selected item

Out of curiosity I tried the Find Samples link a couple of times, but for now it doesn’t seem to deliver much content:

Not many results through Find Samples link

Okay, so one thing remains: how do you open the new SSIS Toolbox pane?  According to the Books Online it should be opened automatically when you open an existing project.  Well, apparently not all the time!

The first place I’d look is in the View menu.  But alas, SSIS Toolbox is not one of the menu items.  Not even in the Other Windows submenu.  Why oh why?!

Long story short: do you see those two buttons in the below screenshot?  They’re new!

Package designer has gotten two new buttons

The first button opens the Variables pane, the second one opens the SSIS Toolbox.  Good to know, isn’t it?!

Further investigation led me to the following: according to the Books Online, the SSIS Toolbox item should actually be located in the View > Other Windows menu.  As that is not the case and I think it’s only logical to have that pane added to the View menu as well, I’ve filed a bug on Microsoft Connect.  Go ahead and vote!

Have fun!

Valentino.

References

SSIS Toolbox


A couple of days ago I came across a funny and weird object in the Control Flow.  It looked like this:

Group Tasks

As you can see, this container looks a bit similar to the Sequence Container but the Sequence Container has a different icon.  Also, this container does not have any connectors sticking out.  I’m currently involved in a 2005 > 2008 migration project, which is how I came across this container in the first place, and thus I really needed to find out what this object actually is.  So I started to investigate.  Unfortunately the properties were not very helpful:

Group does not have any properties

No properties, nothing at all, not even an object name.  Then I started to scan the Toolbox pane for the icon.  Guess what?  Right, nothing either!  Is this some kind of custom control??

The logical next step was to perform some internet searches, and that delivered the answer: this is standard SSIS functionality that has existed since SQL Server 2005!  WTF! (is what I thought at that moment)

The Grouping Container

In the Control Flow, if you select one or more tasks and right-click, you get the following options:

The right-click context menu with the Group option

And when you select Group, you’ll get that group container around the selected objects, allowing you to collapse the items.  Well, okay, I’ll probably still never use it now that I know it exists.  Which is possibly the reason I didn’t know about it in the first place.

How Not To Use It

Using that grouping functionality you can end up with some weird-looking flows:

How not to use the group tasks functionality

What have we learned?  That even with years of SSIS experience, it’s still possible to discover “new” functionality.

And speaking of new functionality, that brings us to SQL Server 2012 (formerly known as Denali).

Grouping In SQL Server 2012

As of SQL Server 2012, you’ll not only be able to group tasks in the Control Flow, but also components in the Data Flow.  Now that may prove more interesting than its Control Flow counterpart.  Why?  Because in the Control Flow chances are that you’ve already grouped your tasks using some Sequence Containers, while there are no containers available in the Data Flow.

Here’s what it looks like:

SQL Server 2012 allows grouping components in the Data Flow

As you can see, the icon is now gone.  But the properties pane is still totally empty with the group control selected.  Oh well…

Have fun!

Valentino.

References

How to: Group Tasks and Containers in a Control Flow


When you’ve used SSIS for a while, you may have run into the following situation already.  Or maybe today is your first time and that’s the reason that you’ve arrived here.

“Huh, what’s he talking about?”, I hear you thinking.  Read on then. :-)

The Scenario

You’ve got a stored procedure or another SQL statement that needs to get called from the Execute SQL Task in the Control Flow of your package.  So far so good.  One of the parameters that needs to get passed into the statement is of the DateTime type.

How would you do that?

Parameter Mapping – Take 1

Following the KISS principle, let’s say we’ve got the following really complex table in our database:

create table dt ( dtVal datetime );

And in our Execute SQL task we have this extremely complex INSERT statement:

insert into dt values (?)
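As a quick aside on that question mark: it’s a positional parameter marker, bound at execute time.  Python’s sqlite3 module happens to use the same syntax, so here’s a tiny, purely illustrative sketch (nothing to do with the package itself) of what binding a value to such a marker looks like:

```python
import sqlite3
from datetime import datetime

# Purely illustrative: sqlite3 uses the same '?' positional parameter
# marker as the statement in the Execute SQL Task.
conn = sqlite3.connect(":memory:")
conn.execute("create table dt ( dtVal datetime )")

# Bind the value at execute time instead of concatenating it into the SQL.
start_time = datetime(2011, 5, 31, 17, 59, 37)
conn.execute("insert into dt values (?)", (start_time.isoformat(sep=" "),))

print(conn.execute("select dtVal from dt").fetchone()[0])
# 2011-05-31 17:59:37
```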

The statement is expecting one parameter.  The parameter that I want to pass into it is System::StartTime which is of type DateTime as shown in the screenshot below.

Show the system variables by activating the Show System Variables button

“Hang on, how did you get the Variables window to display the system variables?”

Ah, good question, by clicking that Show System Variables button, indicated with the red rectangle.

So you set up the Parameter Mapping as follows, specifying DBTIMESTAMP as Data Type and zero as Parameter Name because it’s the first parameter in the statement:

Execute SQL Task: Parameter Mapping

Then you decide to give it a test run.  But alas, it throws you the following error:

Error: 0xC002F210 at Execute SQL Task 1, Execute SQL Task: Executing the query “insert into dt values (?)” failed with the following error: “Invalid time format”. Possible failure reasons: Problems with the query, “ResultSet” property not set correctly, parameters not set correctly, or connection not established correctly.

So we’ve got a datetime column in the table and we’ve got a DateTime package variable.  But alas, the Execute SQL Task is not happy with passing this value to the query.

Now what?

The SqlStatementSource Expression

Let’s try another method then.  Instead of passing the parameter’s value through the Parameter Mapping page, we’ll set up an expression that constructs the whole INSERT statement, including the parameter’s value.

Have a look at the following expression:

"insert into dt values ('" +
(DT_STR, 4, 1252) DATEPART("yyyy", @[System::StartTime]) + "-" +
(DT_STR, 2, 1252) DATEPART("mm", @[System::StartTime]) + "-" +
(DT_STR, 2, 1252) DATEPART("dd", @[System::StartTime]) + " " +
(DT_STR, 2, 1252) DATEPART("hh", @[System::StartTime]) + ":" +
(DT_STR, 2, 1252) DATEPART("mi", @[System::StartTime]) + ":" +
(DT_STR, 2, 1252) DATEPART("ss", @[System::StartTime]) + "." +
(DT_STR, 3, 1252) DATEPART("ms", @[System::StartTime]) + "')"

It uses the DATEPART function to fetch the parts of the System::StartTime variable and feed them into the INSERT statement using a format that works all the time (YYYY-MM-DD HH:MM:SS.MIL).  Here’s what it generated when I clicked the Evaluate Expression button in the Expression Builder:

insert into dt values ('2011-5-31 17:59:37.0')
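Out of curiosity you can mimic the expression outside the BIDS.  Here’s a small Python sketch (illustrative only, the build_insert name is my own) that performs the same concatenation.  Note that the parts are not zero-padded: the (DT_STR, n, 1252) cast limits the length but doesn’t pad, which explains the “5” instead of “05” in the month above.

```python
from datetime import datetime

def build_insert(start_time):
    # Mirrors the SSIS expression: each DATEPART value becomes a string
    # without zero-padding, glued together with the date/time separators.
    return ("insert into dt values ('"
            + str(start_time.year) + "-"
            + str(start_time.month) + "-"
            + str(start_time.day) + " "
            + str(start_time.hour) + ":"
            + str(start_time.minute) + ":"
            + str(start_time.second) + "."
            + str(start_time.microsecond // 1000) + "')")

print(build_insert(datetime(2011, 5, 31, 17, 59, 37)))
# insert into dt values ('2011-5-31 17:59:37.0')
```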

So where exactly would you specify that expression?  In the Execute SQL Task editor, open up the Expressions page.  Then click the Expressions item in the Misc list so that the button with the ellipsis appears.  Now click that button, select SqlStatementSource as property and click the Ellipsis button in the Expression field to get to the Expression Builder.

Then you’ll end up with something like this:

The Property Expressions Editor with an expression specified for the SqlStatementSource property

Give the package another run.  If everything has been set up as expected, the Execute SQL Task should color green and a select on the table should give one record:

Our test table contains one timestamp!

Hang on, does it really have to be this complicated?

Well, maybe not…

Parameter Mapping – Take 2

So let’s give the Parameter Mapping another go.

Set up the Execute SQL Task just like in Take 1 above, with one small difference: select DATE instead of DBTIMESTAMP as Data Type for the parameter.

Choose DATE as Data Type when passing a DateTime package variable into the Execute SQL Task

Now give the package another run.  Look at that, it colors green and there’s an extra record in the table:

An extra timestamp was written to the table

DATE doesn’t seem like the most logical type to choose in this scenario, which is why most people won’t even consider it.  But it works!  Actually, “DATE” is not really the best name that could be given to this particular data type.  Here’s the description of DT_DATE (not to be confused with DT_DBDATE!) according to MSDN:

A date structure that consists of year, month, day, hour, minute, seconds, and fractional seconds. The fractional seconds have a fixed scale of 7 digits.

The DT_DATE data type is implemented using an 8-byte floating-point number. Days are represented by whole number increments, starting with 30 December 1899, and midnight as time zero. Hour values are expressed as the absolute value of the fractional part of the number. However, a floating point value cannot represent all real values; therefore, there are limits on the range of dates that can be presented in DT_DATE.

On the other hand, DT_DBTIMESTAMP is represented by a structure that internally has individual fields for year, month, day, hours, minutes, seconds, and milliseconds. This data type has larger limits on ranges of the dates it can present.

What this means is that you have to be careful when using this type.  Even though it works fine today, it may not run fine in a similar scenario that requires different date ranges.  But obviously you’ve got that covered by your unit test scenarios!
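To make that DT_DATE description a bit more tangible, here’s a short Python sketch of the 8-byte floating-point representation (helper names are mine; it only handles dates on or after the 1899-12-30 epoch, since earlier OLE dates encode the time part differently, as the quoted description hints):

```python
from datetime import datetime, timedelta

# DT_DATE is an OLE Automation date: a float counting days since
# 30 December 1899, with the time of day in the fractional part.
OA_EPOCH = datetime(1899, 12, 30)

def to_dt_date(dt):
    # Whole days in the integer part, time of day in the fraction.
    return (dt - OA_EPOCH) / timedelta(days=1)

def from_dt_date(value):
    return OA_EPOCH + timedelta(days=value)

print(to_dt_date(datetime(1900, 1, 1, 12, 0)))  # 2.5
```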

Conclusion

In this article I have demonstrated how a DateTime package variable can be passed as a parameter into the Execute SQL Task in more than one way.  My method of preference is the one using the DATE type in the Parameter Mapping.

Have fun!

Valentino.

References

SSIS Execute SQL Task

SSIS DatePart function

KISS Principle

SSIS Junkie: Datetime variables don’t always do what you expect


In my previous article I showed you how the XML Source component can be used to load XML files into a SQL Server database, using fairly simple XML structures.  In this follow-up article I will demonstrate how to tackle the complex XML issue.

The Complex XML Example

You probably know that SSRS reports, RDLs, are actually XML files.  And they’re not the easiest types of XML files around.  To humans they are still readable but the structure can be quite complex.  So there we’ve got our example: an RDL.  More specifically I’ll be using the RDL that’s available for download in one of my earlier articles.

The Goal

Every good example has got a goal.  Our goal today is to retrieve a list of datasets and fields as defined in the RDL.  Shouldn’t be too difficult, right?

Using The XML Source Component

Let’s try to get this done through the XML Source component with which we’re very familiar by now.  You know the drill: drag an XML Source into your Data Flow, open it up and configure the XML and XSD locations.

Note: to be able to do this I cheated a bit by manually manipulating the RDL a little.  More precisely, I removed all the namespace references from the <Report> tag and further down the XML (removed “rd:”).

With both files configured, let’s have a look at the Columns page:

The XML Source component handling a really complex XML file

Look at that massive list of output flows!  In total I’ve gotten 45 of them, all for free!  Even if you’re up to the task of creating 45 output tables, do you really want to find out how to get these joined together?  To prevent creating that bunch of tables you may consider using the Merge Join component… 45 times in your data flow. Didn’t think so!

Sure, it would run fine if you manage to get it all constructed.  But in my opinion this is just too silly to try out because there’s an interesting alternative.

And that alternative is XSLT – eXtensible Stylesheet Language Transformations.

Using XSLT

With XSLT you describe what you want to retrieve from the XML document and what it should look like.  In this example we’ll be retrieving the list of datasets and their fields, in CSV format.  CSV stands for Comma-Separated Values, although I prefer the term “Character-Separated Values” as the separator is not always a comma.

To be able to write correct XSLT, you need to know what the XML structure looks like.  Here are the first 31 lines of the sample RDL file mentioned earlier.

<?xml version="1.0" encoding="utf-8"?>
<Report>
  <AutoRefresh>0</AutoRefresh>
  <InitialPageName>A Very Unique Name</InitialPageName>
  <DataSources>
    <DataSource Name="srcContosoDW">
      <DataSourceReference>ContosoDW</DataSourceReference>
      <SecurityType>None</SecurityType>
      <DataSourceID>b7a3d32c-e95d-4acf-bb99-9d60755303ea</DataSourceID>
    </DataSource>
  </DataSources>
  <DataSets>
    <DataSet Name="dsProductList">
      <Query>
        <DataSourceName>srcContosoDW</DataSourceName>
        <CommandText>select DPC.ProductCategoryName, DPS.ProductSubcategoryName, DP.ProductName
from dbo.DimProduct DP
inner join dbo.DimProductSubcategory DPS
    on DPS.ProductSubcategoryKey = DP.ProductSubcategoryKey
inner join dbo.DimProductCategory DPC
    on DPC.ProductCategoryKey = DPS.ProductCategoryKey;</CommandText>
      </Query>
      <Fields>
        <Field Name="ProductCategoryName">
          <DataField>ProductCategoryName</DataField>
          <TypeName>System.String</TypeName>
        </Field>
        <Field Name="ProductSubcategoryName">
          <DataField>ProductSubcategoryName</DataField>
          <TypeName>System.String</TypeName>
        </Field>

As you can see, the main node is called Report.  Nested under Report we’ve got DataSets, which can have several DataSet elements.  Each DataSet has a set of Fields with one or more Field elements.  Using that information we come to the following XSLT.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <xsl:output method="text" version="1.0" encoding="UTF-8" indent="no"/>
  <xsl:template match="/">
    <xsl:text>DataSource;DataSet;Field</xsl:text>
    <xsl:text>&#13;&#10;</xsl:text>

    <xsl:for-each select="Report/DataSets/DataSet/Fields/Field">
      <xsl:text>"</xsl:text>
      <xsl:value-of select="../../Query/DataSourceName"/>
      <xsl:text>";"</xsl:text>
      <xsl:value-of select="../../@Name"/>
      <xsl:text>";"</xsl:text>
      <xsl:value-of select="@Name"/>
      <xsl:text>"</xsl:text>
      <xsl:text>&#13;&#10;</xsl:text>
    </xsl:for-each>

  </xsl:template>
</xsl:stylesheet>

So, what is the XSLT describing?  On line three, we say that the output should be text in UTF-8 encoding.  The “template match” on the fourth line takes the whole XML document into consideration, hence the forward slash.  Then on line five we start writing output through the xsl:text tag.  This is our header line.  As you can see we’re using the semi-colon as column separator in the CSV output.  Line six adds a CRLF (carriage-return + line feed) to the output.

Then the fun part starts.  If you have experience with XPath, the way XSLT walks through the XML document should look familiar to you.

The xsl:for-each tag loops over all the Fields in all the DataSets in the document.

Using the xsl:value-of tag, we can fetch values out of the XML.  The first value being retrieved is the name of the data source that the dataset is using.  (I’ve added the retrieval of the data source to demonstrate how element values are retrieved.)  The path to the DataSourceName element is Report/DataSets/DataSet/Query/ so we use the double-dot syntax to navigate two levels up in the XML tree.  The value of the element itself is retrieved by just using its name, as demonstrated in the XSLT above.

The next value-of tag retrieves the Name attribute of the DataSet, hence the two levels up, and the final value-of fetches the Name attribute of the Field element.
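If you want to double-check that navigation logic before involving SSIS at all, the same walk can be sketched with Python’s standard xml.etree.ElementTree module (an illustrative side-experiment using a trimmed-down version of the sample RDL; function and variable names are mine):

```python
import xml.etree.ElementTree as ET

# A trimmed-down version of the sample RDL from the article.
rdl = """<Report>
  <DataSets>
    <DataSet Name="dsProductList">
      <Query>
        <DataSourceName>srcContosoDW</DataSourceName>
      </Query>
      <Fields>
        <Field Name="ProductCategoryName"/>
        <Field Name="ProductName"/>
      </Fields>
    </DataSet>
  </DataSets>
</Report>"""

def dataset_fields(xml_text):
    # Same idea as the XSLT: visit every DataSet, look up its data source
    # through Query/DataSourceName, then list the Field names below it.
    root = ET.fromstring(xml_text)
    rows = []
    for dataset in root.findall("DataSets/DataSet"):
        source = dataset.findtext("Query/DataSourceName")
        for field in dataset.findall("Fields/Field"):
            rows.append((source, dataset.get("Name"), field.get("Name")))
    return rows

for row in dataset_fields(rdl):
    print(";".join('"%s"' % value for value in row))
```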

Now that the XSLT is clear for everyone, how do we apply it to our XML document?  Here comes the time for SSIS once more!

Open up the BIDS with the Control Flow of an SSIS package active and throw in an XML Task component.

The XML Task, one of the Control Flow Items in Integration Services

Double-click the component to open up the XML Task Editor.  This is what it looks like by default:

XML Task Editor: default settings

As this is an all-round XML task that can handle several XML-related tasks, the first setting that we need to modify is called OperationType.  That’s not too complicated because it comes with a dropdown and XSLT is one of the possible values.

The different operation types supported by the XML Task

With XSLT selected, the editor transforms into the following:

The XML Task Editor with XSLT as OperationType

Now we need to configure where the task can find our XML file, through the Source property.  Click the Source textbox to make the dropdown appear and select <New File connection…>.

You can create a new File Connection through the XML Task Editor

In the File Connection Manager Editor, leave the Usage type at Existing file and select the RDL.

Next up we’re going to specify where the task can find the XSLT that needs to be applied to the XML.  That can be done through the Second Operand settings.  As SecondOperandType, select File Connection.  Use the dropdown of the SecondOperand property to create a second new file connection that points to your XSLT file.

With that set up as well, only one step remains.  The task still doesn’t know where the output should be saved.  Or that it actually should get saved.  So first switch the SaveOperationResult property to True.  As you can see, DestinationType is already set to File Connection, that’s what we need.  Use the dropdown of the Destination property to create a third new file connection.  This time however, Usage Type should be set to Create File.  Specify path and filename for the output file and click OK to close the File Connection Manager Editor.

This is what our XML Task now looks like in the editor:

The XML Task Editor with all input and output files specified, as expected for our XSLT experiment

As shown above, I’ve called the output file DatasetInfo.csv.

One more property that can be interesting is the OverwriteDestination property.  Setting it to True can ease the testing of your package if you need to execute it multiple times.  Which you’ll probably want when your XSLT is not giving the expected output.  Don’t forget to set it to False afterwards (depending on what behavior you actually expect from your package).

Okay, now close the XML Task Editor and execute the package.  If you haven’t made any mistakes, the task should color green and you should have an extra file on your hard drive somewhere.  Here’s what the content of my DatasetInfo.csv looks like:

DataSource;DataSet;Field
"srcContosoDW";"dsProductList";"ProductCategoryName"
"srcContosoDW";"dsProductList";"ProductSubcategoryName"
"srcContosoDW";"dsProductList";"ProductName"
"srcContosoDW";"dsProductList";"ProductCategoryColor"
"srcContosoDW";"dsProductList";"EasterEgg"

Look at that, a list of fields, all part of the dsProductList dataset.

“Hang on, wasn’t this article going to demonstrate how to get complex XML files imported into our database?  And now you’re writing the data to a file?!”

Well yeah, you’re right.  Unfortunately the XML Task does not offer the possibility to write to a table in a database.  So to get the data imported into your database you’ll need to set up a Data Flow that imports the CSV file.  But that shouldn’t be too difficult to achieve, right?
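To sketch that final step: the output uses the semicolon as separator and the double quote as text qualifier, so it’s easy to parse in any tool.  Here’s an illustrative Python snippet (names are mine) that reads a fragment of the DatasetInfo.csv content:

```python
import csv
import io

# A fragment of the DatasetInfo.csv produced by the XML Task.
sample = '''DataSource;DataSet;Field
"srcContosoDW";"dsProductList";"ProductCategoryName"
"srcContosoDW";"dsProductList";"ProductName"
'''

def read_dataset_info(fileobj):
    # ';' is the column separator, '"' the text qualifier,
    # matching what the XSLT emits.
    return list(csv.DictReader(fileobj, delimiter=";", quotechar='"'))

rows = read_dataset_info(io.StringIO(sample))
print(rows[0]["Field"])  # ProductCategoryName
```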

Mission accomplished!

Conclusion

With this article I have shown how Integration Services can be used to retrieve data out of complex XML files, without actually using the XML Source component.  I hope you’ve enjoyed reading it as much as I had while writing.  Or maybe you know another interesting method to get complex XML imported.  Feel free to post comments!

If, after reading and applying the above technique, you are struggling to get special characters such as é, è, ö or even ô imported, make sure to read my follow-up article on SSIS, Flat Files and Accents.  It also gives some more insight into what the above method actually produces (code page UTF-8 is a hint).

Have fun!

Valentino.

References

XSLT (Wikipedia)

CSV (Wikipedia)

XML Task (MSDN)



© 2008-2017 BI: Beer Intelligence? All Rights Reserved