Community Day 2012

Here we are with already the sixth installment of our Belgian Community Day.

To please everyone, there are 6 different tracks with over 40 sessions!  As you’re reading my blog, you’re probably most interested in what sessions the SQLUG has got to offer.  Here they are:

Time Track Speaker Session
0930 – 1030 6

Pieter Vanhove

I know what you did last summer… on my database: SQL Audit on SQL 2012

1315-1415 4

Andre Kamman

Load-testing with the SQL Server 2012 Distributed Replay Controller

1715-1800 4

Koen Verbeeck

Power up your data warehouse with SQL Server 2012

When? Thursday June 21st 2012

Where? Utopolis Mechelen, Spuibeekstraat 5, 2800 Mechelen

Price? Nada, niente, nougabolle! (yes, it’s still free)

Of course there are many more sessions, some of them even related to SQL Server, such as this one by the CLUG: Business Intelligence with SharePoint and SQL Server 2012: what’s new by Serge Luca and Isabelle Van Campenhoudt. 

And that’s not all, here’s one more by the BIWUG: SQL Server for SharePoint geeks and other involuntary DBA’s by Thomas Vochten.

I’m also looking forward to the session on (M)ogre 3D game development by my Ordina-colleague Mario Van Hissenhoven.

So, convinced? Register now without hesitation because places are getting filled really quick!

Have fun!

Valentino.

Share

Tags: ,

This is a follow-up to my article on Loading Complex XML Using SSIS and XSLT.  In that article I demonstrated how you can convert complex XML into simple CSV using XSLT in SSIS.

The resulting DTSX package and input files can be downloaded from my SkyDrive through this link.

Dealing With Special Characters

If you’ve followed the instructions in my article mentioned above and you need to deal with special characters such as the é and è encountered in the French language, you probably noticed that it wouldn’t really work as expected.  In fact, in your final result you may have ended up with the special characters being replaced with other, even more special, characters.  Obviously not good.

Here’s an explanation on the reason why that happens, and also how to deal with it.

Setting The Scene

Imagine the following sample XML, representing a really huge book collection:


<books>
    <book>
        <title>The Hitchhiker's Guide to the Galaxy</title>
        <author>Douglas Adams</author>
        <language>EN</language>
        <description>The Hitchhiker's Guide to the Galaxy is a science fiction comedy series created by Douglas Adams.</description>
    </book>
    <book>
        <title>Le Trône de fer</title>
        <author>George R.R. Martin</author>
        <language>FR</language>
        <description>Le Trône de fer est une série de romans de fantasy de George R. R. Martin, dont l'écriture et la parution sont en cours. Martin a commencé à l'écrire en 1991 et le premier volume est paru en 1996. Prévue à l'origine comme une trilogie, la série compte désormais cinq volumes publiés et deux autres sont attendus.</description>
    </book>
</books>

As you can see, the second book in the list is the French version of the first book in the A Song of Ice and Fire series by George R.R. Martin and as it goes with French, there are some accents in the description of the book.

We’ll use the following XSLT to convert it to CSV:


<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs=http://www.w3.org/2001/XMLSchema xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <xsl:output method="text" version="1.0" encoding="UTF-8" indent="no"/>
  <xsl:template match="/">
    <xsl:text>BookTitle;Author;Language;Description</xsl:text>
    <xsl:text>&#13;&#10;</xsl:text>

    <xsl:for-each select="books/book">
      <xsl:text>"</xsl:text>
      <xsl:value-of select="title"/>
      <xsl:text>";"</xsl:text>
      <xsl:value-of select="author"/>
      <xsl:text>";"</xsl:text>
      <xsl:value-of select="language"/>
      <xsl:text>";"</xsl:text>
      <xsl:value-of select="description"/>
      <xsl:text>"</xsl:text>
      <xsl:text>&#13;&#10;</xsl:text>
    </xsl:for-each>

  </xsl:template>
</xsl:stylesheet>

Using an XML Task in the Control Flow, as explained in my article, we’d get the following output:

BookTitle;Author;Language;Description

"The Hitchhiker’s Guide to the Galaxy";"Douglas Adams";"EN";"The Hitchhiker’s Guide to the Galaxy is a science fiction comedy series created by Douglas Adams."

"Le Trône de fer";"George R.R. Martin";"FR";"Le Trône de fer (A Song of Ice and Fire) est une série de romans de fantasy de George R. R. Martin, dont l’écriture et la parution sont en cours. Martin a commencé à l’écrire en 1991 et le premier volume est paru en 1996. Prévue à l’origine comme une trilogie, la série compte désormais cinq volumes publiés et deux autres sont attendus."

So far so good, all accents are still present!

Then we’d import the file using a Flat File Source component in a Data Flow Task.  Here’s what the General page of the Flat File Connection Manager would look like:

Flat File Connection Manager: General

We’ve set double-quote as Text Qualifier and checked the Column names in the first data row textbox.

Switching to the Columns page we’d get the following:

Flat File Connection Manager: Columns - the Preview has messed up the accents!

Hang on, that’s not right!  The Preview is not displaying our accents as expected!  Oh my, what’s going on here? Let’s call the code page detectives!

A Mismatch Investigation

Take a good look at the XSLT which we’ve used to convert the XML into CSV, especially the xsl:output line:

<xsl:output method="text" version="1.0" encoding="UTF-8" indent="no"/>

That line specifies that the text output should be encoded using the UTF-8 code page.

Now take a good look at the General page in the screenshot earlier, more precisely this part:

Code page: 1252 (ANSI - Latin I) is not what we need right now!

Indeed, code page 1252 (ANSI – Latin I).  While the input is UTF-8.  Of course that results in a mismatch of certain characters, as demonstrated here.  The fix is fairly easy, just change the Code page setting to 65001 (UTF-8).

Code page: 65001 (UTF-8) - much better!

If we now switch back to the Columns page we should come to the following result:

Flat File Connection Manager: Columns page preview with accents!

Ah, sure looks better doesn’t it?  All accents are present as expected.

But in case you thought that’s it, I’d advise you to think again.  Don’t worry, I’ll demonstrate what I mean.  Let’s do that by setting up a simple Data Flow.

Setting Up The Data Flow

Throw in a Flat File Source and specify our Flat File Connection Manager.  I also prefer to keep NULLs as they come in, using the Retain null values from the source as null values in the data flow checkbox.

Flat File Source: Connection Manager

If you click the Preview button you should get similar output as shown one screenshot earlier.

Now hook this up to an OLE DB Destination that writes the incoming data into a table in your favorite database:

OLE DB Destination is not happy :(

As you can see, our destination is not entirely happy with all this.  Here are the details of one of the error messages:

Validation error. Data Flow Task: Data Flow Task: The column "BookTitle" cannot be processed because more than one code page (65001 and 1252) are specified for it.

Looks like once more we’ve got a code page conflict.  And we sure do. Clicking the Data Flow connector between the Flat File source and OLE DB destination shows us the following:

Data Flow Path Editor shows that our strings are encoded using the 65001 code page.

Each of our incoming string values is encoded using the 65001 (UTF-8) code page.  But our database was created using the Latin1_General_CI_AS collation.  So we’ve indeed got a code page conflict!

Fear not, that’s easily remedied.  Add a Derived Column transformation in between the source and destination and convert each incoming string value using a cast expression such as this one:

(DT_STR, 50, 1252)BookTitle_IN

Note: whenever I need to manipulate incoming columns to create a second version of the same column, I rename the incoming column to TheColumn_IN.  The new version will be called TheColumn and preferably TheColumn is the name of the field in the destination table.  This makes it easy to distinguish all columns later down the flow.

Here’s what the final version of the Derived Column looks like:

Using the Derived Column transformation to cast the incoming strings into the correct code page.

 

Next we’ll need to open the Destination and change the mapped fields to the new ones.  Because my new columns are called exactly the same as the fields in the destination table, I can do that easily.  In the Mappings page, all I need to do is right-click the grey background in between the two tables and click Select All Mappings, hit the Delete button, right-click again and click Map Items By Matching Names:

Using Map Items By Matching Names, easy!

With the data flow finished, let’s give our package a run!

Flat File Source has got a length issue!

Ouch, our source is not happy!  A closer examination of the Output pane brings us to the following error:

Error: 0xC02020A1 at Data Flow Task, Flat File Source [16]: Data conversion failed. The data conversion for column "Description" returned status value 4 and status text "Text was truncated or one or more characters had no match in the target code page.".

Oh right, so far we haven’t bothered looking at the actual length of the data that we’re importing.  Actually, what is the length of our data flow columns??  Well, if you’ve been paying close attention you should have noticed the number 50 several times in the screenshots and expressions above.  That’s indeed the default length for text columns when importing a flat file.

And if you scroll back up to the sample XML, you’ll notice that the content for the description is longer than 50 characters, thus causing our error!  Let’s find out how to get that solved!

Fixing The Field Length Issue

The first step in getting this fixed is opening up the Advanced page in the Flat File Connection Manager editor.

Flat File Connection Manager: using the Advanced page to change field length.

Then select the Description field and change its OutputColumnWidth property from 50 to 500.

That will cause the source to generate a warning.  Remove this warning by opening and closing the source editor.  Click the Yes button in the popup that appears.

The next step is changing the expression for the Description field in the Derived Column to this:

(DT_STR,500,1252)Description_IN

Indeed, the field length is one of the parameters in that cast.  The other numeric parameter is obviously the code page.

Having done that you’ll notice that the destination will start complaining.  Of course, you’ll need to adapt the destination table to reflect the field length increase as well.  So change the table definition and open/close the destination editor to make it happy.

Alright, let’s run the package once more!

Finally the data flow is happy with it all and has inserted two records:

That's more like it: all components colored green!

And what does our table contain?  Let’s find out:

All accents have been imported!

That’s looking good for sure!

Conclusion

In this follow-up article I have demonstrated what might go wrong when you need to deal with special characters while importing flat files, and how to solve your possible issues.  In case you missed the original article, have a look through this link.

Have fun!

Valentino.

References

Wikipedia: UTF-8

Share

Tags: ,

If you have ever had the chance to see a presentation by Rafal Lukawiecki, I’m sure you’ll agree that he surely knows how to deliver his stuff, and on top of that he knows what he’s talking about.

In other words: Rafal is once more in our small country!  Next Thursday he’ll be talking about the present and future of the Microsoft Business Intelligence stack.

Location: Utopolis Mechelen, Spuibeekstraat 5, Mechelen, Belgium.

Date/Time: Thursday April 26th, 0900 – 1700.

Check out the Belgian SQLUG site for more details.

Ow, and it’s a full-day free event, so what are you waiting for?  If you’re into BI, you’d better have a real good excuse to miss this opportunity!  Register now!

See you there!

Valentino.

Share

Tags: ,

You may or may not already have heard about it, but the next version of our favorite database, SQL Server 2012, ships with a client tool called SQL Server Data Tools (SSDT).  Just like the BIDS was actually Visual Studio 2008 (or 2005 if you’re long enough in the business) with BI-related project templates, SSDT is Visual Studio 2010.  But that’s not all: it contains additional functionality!  Do you remember the database projects you could create using the “data dude”, or officially known as Visual Studio 2008 with the GDR2 add-on?  Well, this is version-next of the data dude.

Let’s find out how such a database project can be created!

Getting Started With SSDT

Obviously the first step is installing SQL Server 2012 and making sure to select SQL Server Data Tools in the Feature Selection page:

SQL Server 2012 Feature Selection with the SQL Server Data Tools checked

With that up and running, launch the SQL Server Data Tools from the Microsoft SQL Server 2012 folder in the Start menu and go to File > New Project.

New Project - SQL Server Data Tools

One of the template categories is called SQL Server.  Under that you’ll find a template called SQL Server Data Tools – Database Projects (Web Install).

Erm, hang on, what do you mean, “web install”?  Does that mean it’s not installed yet?  The description on the right also gives some clue to what’s going on: “provides instructions for installing SQL Server Data Tools – Database Projects from the web”.  Okay, let’s get on with it then!

Click the OK button to proceed to the installer.

Installing the SSDT Database Projects template

SQL Server Data Tools - Database Projects (Web Install)

I think the message in that window above is a bit misleading.  It says that “Database Projects” is required, but isn’t that what we’re going to install now?  Sounds like chicken and egg to me!  Let’s just ignore the message then and click the Install button.

Data Developer Center - Download SQL Server Data Tools

We’re now presented with a web page in Internet Explorer.  In this Data Developer Center, click the blue Download SQL Server Data Tools link.

Next, when presented with the following pop-up, click Allow:

Do you want to allow this website to open a program on your computer? Yes we do!

The next window is the Web Platform Installer 3.0 that wants to install the Microsoft SQL Server Data Tools.  Click the Install button to get to the next step.

Web Platform Installer 3.0 - 1 items to be installed

As usual with any installer from Microsoft, we need to accept the license terms so click the Accept button.

Web Platform Installation - Accept License Terms

Finally, the installation begins!

Web Platform Installation - Progress Bar

To keep us busy, we’re presented with another pop-up with a progress bar:

image

Woah!  Is this thing installing SQL Server 2012 Express LocalDB like it says in the message?  I didn’t ask for that!  MS people, as I already have a version 2012 DB engine running on my machine, please make this optional…

Just a little later we’re presented with this:

Web Platform Installation - Congratulations!

Clicking the Finish button gives us yet one more pop-up:

Web Platform Installer 3.0 - Spotlight

The Web Platform Installer presents us with a list of applications we can install, including Microsoft SQL Server Data Tools.  Oh my, I thought we just finished installing it?  Let’s not be silly and click the Exit button

To conclude, we need to restart Visual Studio 2010, or aka SSDT.

Really Getting Started With SSDT

With everything up and running smoothly this time, open SSDT once more and in File > New Project you’ll now find the SQL Server Database Project under the SQL Server collection:

New Project - SQL Server Database Project

If you also have Visual Studio 2010 installed in Premium or Ultimate edition, you should take care to open up the correct SQL Server template collection.  The reason for that is because the data dude projects are included in those versions of Visual Studio, as shown in the screenshot below:

The SQL Server templates in Visual Studio 2010

So the templates under Database > SQL Server are not what you’re looking for!

That’s it for now, I’ll demonstrate my favorite SSDT feature in an upcoming post!

Have fun!

Valentino.

References

SSDT Main page on Data Developer Center

SQL Server Data Tools Team Blog

Share

Tags: , ,

SQLUG Event

Huh, the what?  I know INDEX and I know CLUSTERED INDEX.  Heck, I even know the filtered index, but COLUMNSTORE?  That’s new to me!

If that’s your reaction, or you simply want to find out how this new SQL Server 2012 feature has been used in one of the largest data warehouses in the country, you should be present next Monday for our upcoming SQLUG event.

Location: Microsoft Belgium, Corporate Village, Leonardo Da Vincilaan 3, 1935 Zaventem

Presenter: Ludo Bernaerts (ITN Senior Operational Engineer at Belgacom)

Date: 16/04/2012

Time: 1800 – 2100

Ow yeah, registration goes through here!

See you there!

Valentino.

Share

Tags: ,

« Older entries

© 2008-2012 A Developer's Blog All Rights Reserved