SQL Server

You are currently browsing the archive for the SQL Server category.

MCTS++

imageThe regular readers among you may have noticed that my posting frequency went down a little these last few weeks.  One reason for that is the excellent weather in Belgium, combined with the two-months long holiday of our kids.

Another reason is that I enrolled myself for a Microsoft certification exam, so I needed my spare time to prepare for that.

So last Friday I went to the Prometric test center and completed the exam.

What exam did I do?  Well, a couple of years ago I participated in two beta exams for SQL Server 2008 and achieved the MCTS DBA (71-432) and the MCTS BI (71-448) certificates.  Which means I was still missing the MCTS Developer (70-433) title.

Indeed, the exam in which I participated is the one leading to the MCTS: SQL Server 2008, Database Development certificate.  And I passed with success!! :-)

At the end of the exam you’re given the opportunity to comment on the experience.  One of my comments was that test takers should be allowed to take a drink with them inside the room.  The current temperatures here in Belgium are over 35°C (95°F) and I was inside for more than two hours.  Even though the temperature inside wasn’t as hot as outside, my throat was dry by the end of the exam.  If Microsoft is worried that people cheat by hiding notes inside the container, just add “unopened” and “see-through” to the requirements of the drink bottle.  I don’t see why that shouldn’t be possible.

So, I can now update my About page with the following logo:

The full SQL Server 2008 MCTS stack

Okay, time to enjoy a mojito now!

Have fun!

Valentino.

  • Share/Bookmark

Tags: , ,

On the forums I now and then encounter questions regarding images on SSRS reports.  Instead of re-inventing the wheel each time, I decided to write an article about the subject.  So in this article I’ll be discussing and demonstrating several different ways of how images can be put on a report.

I’m using SQL Server Reporting Services 2008 R2 CTP, more precisely version 10.50.1352.12, but the methods explained here will work on any SSRS 2008.  Furthermore I’m using the AdventureWorks2008R2 database, available at CodePlex.

The resulting report, including image files, can be downloaded from my Skydrive.

The Scenario

The marketing department has requested a product catalogue.  This catalogue should contain all products produced by our two daughter companies: The Canyon Peak and Great Falls Soft.  The catalogue should be grouped on company, with the next company’s products starting on a new page.

Further requirements are:

    1. Each page needs an image in its header, with even pages displaying a different image than odd pages.
    2. Each company has a logo.  The logo should be displayed in the company’s header.
    3. Each product has a logo.  The logo should be displayed as part of the product details.

A design document containing the expected layout, including all image material, has been provided.

The Data

The following query provides us with all the data needed to produce the report:

SELECT 'The Canyon Peak' as Company, 'TheCanyonPeak_logo.png' CompanyLogo,
    'The Canyon Peak company specializes in all kinds of bikes, such as touring and road bikes.' CompanyDescription,
    P.Name as Product, PS.Name as Subcategory, PC.Name as Category,
    PP.LargePhoto, P.ListPrice, P.Weight, P.Size,
    P.SizeUnitMeasureCode, P.WeightUnitMeasureCode
FROM Production.Product AS P
    INNER JOIN Production.ProductSubcategory AS PS
        ON PS.ProductSubcategoryID = P.ProductSubcategoryID
    INNER JOIN Production.ProductCategory AS PC
        ON PC.ProductCategoryID = PS.ProductCategoryID
    LEFT OUTER JOIN Production.ProductProductPhoto PPP
        ON PPP.ProductID = P.ProductID
    LEFT OUTER JOIN Production.ProductPhoto PP
        ON PPP.ProductPhotoID = PP.ProductPhotoID
WHERE PC.Name = 'Bikes' --The Canyon Peak sells bikes
    and PP.ProductPhotoID > 1 --I don't want NO IMAGE AVAILABLE
UNION ALL
SELECT 'Great Falls Soft' as Company, 'GreatFallsSoft_logo.png' CompanyLogo,
    'Great Falls Soft uses only the softest tissues available for those sporting clothes.  And on top of that, they''re waterproof.' CompanyDescription,
    P.Name as Product, PS.Name as Subcategory, PC.Name as Category,
    PP.LargePhoto, P.ListPrice, P.Weight, P.Size,
    P.SizeUnitMeasureCode, P.WeightUnitMeasureCode
FROM Production.Product AS P
    INNER JOIN Production.ProductSubcategory AS PS
        ON PS.ProductSubcategoryID = P.ProductSubcategoryID
    INNER JOIN Production.ProductCategory AS PC
        ON PC.ProductCategoryID = PS.ProductCategoryID
    LEFT OUTER JOIN Production.ProductProductPhoto PPP
        ON PPP.ProductID = P.ProductID
    LEFT OUTER JOIN Production.ProductPhoto PP
        ON PPP.ProductPhotoID = PP.ProductPhotoID
WHERE PC.Name = 'Clothing' --Great Falls Soft sells clothes, waterstopping soft clothes
    and PP.ProductPhotoID > 1 --I don't want NO IMAGE AVAILABLE
ORDER BY Category asc, Subcategory asc, Product asc;

I’m not going into the details of this query.  Let’s just say that I’m manipulating data from the database in combination with some hardcoded data to get usable data for our example.  I’ve added some comments to make it clear what the query is doing.  If you have a look at its output, you’ll see that it produces a list of products with some additional fields.

Results of the query

Different Ways Of Adding Images

To get started, open up a SSRS solution, add a new report, add a data source connecting to your AdventureWorks 2008 R2 DB, and add a dataset using the above query.

Embedding Images In Your Report

The first way of adding images to a report that we’ll take a look at is by embedding them inside the report.  Looking at the scenario requirements described earlier, this is requirement 1.

Let’s add a header to the report.  In the BIDS menu, select Report > Add Page Header.

Adding a header to a report

If you don’t see the Report menu item, you probably have not selected your report.  Click your report in the Design view to select it.

From the Toolbox, drag the Image report item onto the header portion of the report.  Doing that will show a pop-up window, the Image Properties.  By default, the Select the image source combobox is set to Embedded.  Good, that’s what we need at this point.  What we now need to do is import an image into the report, using the Import button.

Clicking the Import button shows a common file Open dialog.  Our marketing department has given me two images for use in the header: Cloudy_banner.png and AnotherCloudy_banner.png.  Let’s select the first one.

Adding an image to a report by using the Import button on the Image Properties window

If you don’t see any images, have a look at that filter dropdown as highlighted in the screenshot above.  By default this is set to JPEG files.

Here’s the result in the Image Properties:

Image Properties with an image selected

On the Size page, select Clip instead of Fit proportional.  This is a setting that you’ll need to look at case per case.  For our header images, Clip is the most suitable option.

Image Properties: set Display to Clip

Close the Image Properties window and enlarge the image placeholder so that it occupies the whole header area:

Image added to report header

As you can see, we now have an image in the header.  But we haven’t fully implemented the requirement yet.  The even pages should display a different image than the uneven ones.

To be able to do that, we’ll first add the second banner image to the report.  In the Report Data pane, locate the Images node and open it up.  You’ll notice that the image that we inserted earlier can be found here.

The Images node in the Report Data pane shows all embedded images

Right-click the Images node and select Add Image.

Right-click Images node to add an embedded image to the report

That opens up the familiar file Open dialog which was used to add the first image.  So I’m now selecting the file called AnotherCloudy_banner.png, after changing the default filter to PNG.  After clicking OK, the image gets added under the Images node.

Second banner image added to the report 

With the second image added, all that remains to be done is tell the header that it should pick different images depending on the page number.

Right-click the image in the header and select Image Properties.  On the General page, when you click the dropdown of the setting called Use this image, you’ll notice that there are two values now.  These are the same values as displayed in the Report Data pane.  And these are the values to be used in the expression that we’ll create to rotate the images depending on page number.

Click the fx button next to the dropdown and enter the following expression:

=IIF(Globals!PageNumber Mod 2 = 0, "Cloudy_banner", "AnotherCloudy_banner")

This is a fairly simple expression, using the Mod operator and the IIF() function.  When page number can be divided by two, which means it’s an even page number, Cloudy_banner is displayed.  Otherwise the other banner is displayed.

That’s it, the report header is finished.  When you have a look at the report in Preview, it should now show the second banner on the first page – this is an uneven page.

To conclude this chapter I’d like to mention that this method is usually not the preferred one.  A disadvantage here is that the images are stored inside the report RDL and thus cannot be modified without altering the report itself.

Here’s the evidence:

 <EmbeddedImages>
    <EmbeddedImage Name="Cloudy_banner">
      <MIMEType>image/png</MIMEType>
      <ImageData>iVBORw0KGgoAAAANSUhEUgAABVsAAABaCAIAAA...

To have a look at the RDL yourself, just right-click the report in the Solution Explorer and select View Code.

On to requirement number two!

Displaying Images Through A URL

At the moment, the report body is still empty, so drag a Table onto it.  Put the Table in the upper-left corner, remove one of the columns so that two remain, remove the Header row and make it a bit wider.

Now set the DataSetName property of the Tablix to the name of your dataset, in my case that’s dsProducts.

The report should display the data grouped on company, so right-click on the line that says Details in the Row Groups window part at the bottom of the Design View.  Select Add Group > Parent Group.

Right-click the Details line in Row Groups to add a new parent group

Group by Company and add a group header:

Tablix grouping

Remove the extra first column that just got generated:

Remove unwanted column

We’ve now got an empty tablix with two columns, a Details row and a Company header row.  In our dataset, one of the fields is called CompanyDescription.  Hover the mouse pointer above the textbox in the top-right, click the small icon that appears and choose the field from the dropdown that appears when you click the icon.

Click the small icon to get a list of fields

To add the company’s logo, drag an Image from the Toolbox pane into the textbox on the left of the company description.  Doing this opens up the by now familiar Image Properties dialog.

Give it a good name, such as CompanyLogo, and select External as image source.

Click the fx button next to the Use this image box and enter an expression such as this one:

="file:C:\vavr\test\" + Fields!CompanyLogo.Value

When using External as image source, the image expression should result in a valid URL, any valid URL.  In my example the files are located in a local folder called c:\vavr\test.  Keep in mind that, when you deploy the report to a server, the images should by located in that same folder, this time located on the server.

The Image Properties configured to display an External image

By default the image gets displayed using the Fit Proportional setting.  You can verify this in the Size page of the Image Properties.  We want the image to get fully displayed while maintaining the aspect ratio, so leave the setting as it is.  Close the image properties dialog.

Vertically enlarge the first row in our tablix to an acceptable size.  In my case the marketing department specified to use a height of 1.5 inches for the company logo.  With the image selected, locate the Size > Height property and set it to “1,5in”.  Note that the decimal separator used here depends on your local settings.

Now have a look at the report in Preview:

The report with company logos added

Note that I’ve removed the borders of all textboxes by setting their BorderStyle property to None.

With the logo images implemented we have fulfilled requirement two.  On to number three.

Retrieving Images From The Database

In this last requirement we’ll have a look at displaying images that are retrieved from the database, also known as data-bound images.

The retrieving part is actually already implemented.  In our dataset there’s a field called LargePhoto, that one contains a picture of the product.

Let’s add some product details and a picture in that remaining blank row.  To get full control over layout I want to make the detail part of the tablix a freestyle part.  First merge the two cells together by selecting both of them, then right-click and choose Merge Cells.

Merging two cells together in a tablix

Now select a Rectangle in the Toolbox pane and drop it into the merged area.  To add fields such as Subcategory and Product you can just select them from the Report Data pane and drop them inside the rectangle.  I’m also adding some additional labels and fields, as shown in the next screenshot.

The product details in Design view

As you can see I’ve modified the fonts a bit.  The rendered version:

The rendered product details

This is the expression used for displaying the weight:

=IIF(
    IsNothing(Fields!Weight.Value),
    "unknown",
    Fields!Weight.Value & " " & Fields!WeightUnitMeasureCode.Value
)

And here’s the expression for the size field:

=Fields!Size.Value & " " & Fields!SizeUnitMeasureCode.Value

For the layout the price field I’ve just entered C in the Format property of the textbox.

With the textual product details completed, all that remains to be done is adding the product image.

From the Toolbox pane, drag an Image into the remaining whitespace in the rectangle, next to the product details.  (You did keep some space available, right?)

Again we get the familiar Image Properties popup.  Give it a good name, like ProductImage, and select the image source that we haven’t used yet, Database.  In the Use this field dropdown, select LargePhoto, and select image/gif as MIME type.

Note: the images are stored as GIF.  You can verify this by running a select on the Production.ProductPhoto table.  Looking at the LargePhotoFileName field we see that the extension is .gif.

There one textbox on the General page that’s still blank.  That one is called Tooltip.  Click the fx button next to it and enter following formula:

=Fields!Product.Value

Click sufficient OK buttons until the properties dialog is gone, then resize the image placeholder so that it occupies the remaining whitespace.

Here’s what the result looks like in preview:

The final report, with a tooltip on the product image

When hovering the mouse pointer above the product image, you’ll get a nice tooltip.

Conclusion

In this article I have illustrated the three possible methods of adding an image to your Reporting Services report.

Have fun!

Valentino.

References

BOL: Adding Images to a Report

  • Share/Bookmark

Tags: , , , ,

As a BI developer I regularly use Excel to analyze my data on correctness.  In Excel it’s easy to filter out unrelated records, as you’ll probably already know.

How do I get my data in Excel?

With a simple copy/paste.  Since SQL Server 2008, you can just right-click on the results in the Management Studio and select Copy with Headers.  This will put the selected data into memory, including the column headers.

SSMS 2008: right-click on results to Copy with Headers

You can even make a selection in the Results pane and only copy that, as shown in the screenshot above.  Here’s what it looks like when pasted into Excel:

Data with headers pasted into Excel

How do I make the columns wider?

Excel will not expand the column widths by default, shown in that last screenshot.  You could manually enlarge them one by one, but that’s not convenient when you last filled column is called “DI”.  Instead, I use the following trick.  First I select the whole sheet by clicking on the grey square in the top-left corner:

Select the whole sheet using the grey square

Then I double-click on the vertical divider in between columns A and B.  This will adjust all column widths so that the data fits nicely.

Double-click the divider in between two columns to adjust width automatically

Does Excel know SELECT DISTINCT?

The other day I needed to filter out all duplicate records out of a really large dataset.  I first tried by activating the regular filter.  This is an Excel feature that I use a lot.  Just clicking the Filter button on the Data ribbon will add the small dropdowns next to each column’s header:

Excel's Filter functionality

But in those dropdowns I couldn’t find an option which would get me to the required result.

(Note that I added a couple of duplicate records in my dataset.)

After a little search, I found the magic checkbox!  I didn’t know that it could be so easy.

To find it, click the Advanced button in the Data ribbon.

The Advanced button on the Data ribbon

In the Advanced Filter window that appears you see a checkbox called Unique records only.

Advanced Filter: Unique records only checkbox

There’s your SELECT DISTINCT!

And here’s the sheet with the duplicates filtered out:

Excel sheet after filtering out duplicate records with the "Unique records only" option

Do you also see that the filter dropdowns are gone, although I did not remove them?  The Advanced Filter cannot be used in combination with the regular filter.  Once you activate the regular filter by clicking the Filter button, your duplicates will re-appear.

If you want to further filter your data after filtering out all duplicate records, you should select the Copy to another location option in the Advanced Filter window.  Copy it below your data on the same sheet and apply the regular filter on that new set of data.

Filtered set of data below full dataset on same sheet

Copying to another sheet is not supported.  Apparently you can only copy to the active sheet.  Should you try it out anyway, you’ll get this message:

Warning: You can only copy filtered data to the active sheet

Happy filtering!

Valentino.

  • Share/Bookmark

Tags: ,

Lately I’ve been using Visual Studio Team System 2008 Database Edition with the GDR R2 add-on, also known as “The Data Dude”, to compare databases with each other.

And during a schema compare I noticed that the dude can be quite stubborn now and then.

Here’s an example and an explanation of why, as far as I can tell, the dude is behaving that way.

The Scenario

At work we use the data dude’s schema compare to prepare our deployment scripts.  The most fun can be had when preparing a script to deploy the latest finalized version, version x, from the Development (DEV) to the Acceptance (UAT) environment.

In our DEV database there have already been changes for the next version, version x+1, so these changes need to be skipped.

Now imagine the following situation:

  • a table called TableA exists
  • a new field called NewFieldA was added to TableA
  • a view called ViewA exists and was modified – it uses TableA but not NewFieldA

The changes to ViewA belong with version x, while the changes to TableA are part of version x+1.  So the script that I’m going to generate should contain an ALTER VIEW ViewA … but not an ALTER TABLE TableA ….

The Behavior

Using the dude, I open up the dropdown on the line that states Different definition on TableA and I select Skip.  The value gets changed from Update to Skip.  However, when the focus changes to another line in the grid the value changes back to Update!  Furthermore, it’s now grayed out, impossible to change!

The first time that you see this happening you can’t believe your eyes and try it once more, first closing everything and then re-doing your actions.  Just to see if you didn’t select the wrong option somewhere.  But alas, the dude remains stubborn!

Tip: if you have just executed the compare and you find that some update actions cannot be modified, try to do a refresh (right-click on the background of the comparison window and click Refresh).  I noticed that after the refresh you will get the dropdown for these lines and thus will be able to change the Update Action.

The Explanation

It took me a while to realize, but here’s what seems to be happening.  Like with all things stubborn, it helps if you know what the thing in question is actually being stubborn for.

The dude knows that ViewA depends on TableA and it also knows that both ViewA and TableA where changed.  However, the dude is a little short-sighted, it doesn’t know that NewFieldA is not used in ViewA.  It doesn’t know that the change to ViewA was not related to the change on TableA.  So it automatically assumes that you’re making a mistake and prevents you from messing up.  It thinks, “You stupid user, I know it better, I am smart software!  And that’s not all, I AM THE DATA DUDE! And if there’s one thing you don’t want, that’s to mess with the Data Dude!”.

Here’s a screenshot of the dude’s stubbornness:

Grayed-out Update Action: the result of a stubborn data dude

In this particular case it refused to skip certain new tables, even though there were no dependent objects that needed those tables!  The first two lines show the grayed-out items.  These tables were located under a new schema and I had to put the schema’s Update Action to Skip to get the tables be skipped as well.  The Update Action for the tables was changed automatically when I modified the action for the schema.

But, if you ask me, I don’t see why I shouldn’t be able to Skip a new table even when I am going to create the new schema to which it belongs?  It’s the table that depends on the schema, not the other way around.

The Feature Request

It would have been great if I could manually override the dude’s decision.  As that was not the case, I have to manually change the generated script so that it only contains the changes that I want.  As long as it’s technically feasible, I think manual changes to the Update Action should be allowed.

One More Tip

Take a good look at the following screenshot.

Beware the miniature exclamation mark in the icon's corner!

By default, the node that reads Different dependencies was collapsed.  So you think, good, the line says Skip so nothing will happen.  Except, did you notice that minuscule exclamation mark in the bottom-left corner of the icon?  That means that other actions may be happening further down the dependency tree, as illustrated in the screenshot.

So don’t let the dude mislead you into thinking nothing will happen when it says Skip.  Sometimes the actual actions are hidden further down and you need to open up every node with the exclamation mark on the icon to verify if that’s what you want.

Have fun training the dude,

Valentino.

  • Share/Bookmark

Tags: , ,

In this article I will show you a couple of different T-SQL queries to fetch aggregated data.  The main purpose is to illustrate how the OVER clause can be used to aggregate data.

For the examples I will use data from the AdventureWorks2008R2 database, available at CodePlex.

The Data

The AdventureWorks 2008 R2 database contains a view called Sales.vSalesPerson.  This is the data with which I’ll be working in the examples below.  Here’s what it looks like:

My Working Data

I’ve hidden some fields so that all the relevant ones are in view.

The Scenario

Your manager has asked you to create one query, to be executed on the Sales.vSalesPerson table, that returns a list of:

  • all employees (FirstName, LastName, JobTitle, CountryRegionName, StateProvinceName, City),
  • their sales of last year (SalesLastYear),
  • the sum of the sales of last year for their country,
  • the average of the sales of last year compared to all employees with the same type of phone (PhoneNumberType)
  • the overall average and sum of the sales of last year.

Using Derived Tables

No problem you say, coming right up.  So you start building your query, retrieving all fields as requested.

After quite some typing, here’s what your query looks like:

select S.FirstName, S.LastName, S.JobTitle, S.PhoneNumberType, S.CountryRegionName,
    S.StateProvinceName, S.City, S.SalesLastYear,
    GeographicSales.SalesLastYearGeographic_SUM,
    SalesByPhoneType.SalesLastYearByPhoneNumberType_AVG,
    SalesSUM.SalesLastYear_AVG, SalesSUM.SalesLastYear_SUM
from Sales.vSalesPerson S
--Derived Table 1: the overall aggregates
cross join (
    select SUM(SalesLastYear) SalesLastYear_SUM, AVG(SalesLastYear) SalesLastYear_AVG
    from Sales.vSalesPerson
) SalesSUM
--Derived Table 2: the aggregate on Country level
inner join (
    select CountryRegionName, SUM(SalesLastYear) SalesLastYearGeographic_SUM
    from Sales.vSalesPerson
    group by CountryRegionName
) GeographicSales on GeographicSales.CountryRegionName = S.CountryRegionName
--Derived Table 3: the aggregate on phone type
inner join (
    select PhoneNumberType, AVG(SalesLastYear) SalesLastYearByPhoneNumberType_AVG
    from Sales.vSalesPerson
    group by PhoneNumberType
) SalesByPhoneType on SalesByPhoneType.PhoneNumberType= S.PhoneNumberType;

The main query is retrieving all fields as requested.  Further down there are three derived table queries, each one retrieving aggregates on a different level.

The first derived table is retrieving the overall aggregates.  These are cross-joined with every record in our main query so for each record the totals will be the same, which is what we want.

The second derived table retrieves the aggregates on Country level, including the CountryRegionName.  This is done using the conventional GROUP BY method.  The CountryRegionName is the key on which the derived table is joined to the main table.

The third derived table uses a similar system, this time for the aggregate on phone type.

And here’s the query’s output:

Output of the query using subqueries

Happy with this result, you go up to the cafeteria to finally have lunch with your colleagues (who left 15 minutes earlier but you wanted to get your query finished first).

Using The OVER Clause

During lunch you explain to your peers what kind of funny request you got from management and told them how you solved it.

Then one of them speaks up and says: “Want to know how you can avoid all that typing?  Use the OVER clause!  I’ll show you when we are back at our desks.”

After lunch, here’s what your colleague helps to produce:

select S.FirstName, S.LastName, S.JobTitle, S.PhoneNumberType, S.CountryRegionName,
    S.StateProvinceName, S.City, S.SalesLastYear,
    SUM(SalesLastYear) OVER (PARTITION BY CountryRegionName)
        SalesLastYearGeographic_SUM,
    AVG(SalesLastYear) OVER (PARTITION BY PhoneNumberType)
        SalesLastYearByPhoneNumberType_AVG,
    SalesSUM.SalesLastYear_AVG, SalesSUM.SalesLastYear_SUM
from Sales.vSalesPerson S
--Derived Table 1: the overall aggregates
cross join (
    select SUM(SalesLastYear) SalesLastYear_SUM, AVG(SalesLastYear) SalesLastYear_AVG
    from Sales.vSalesPerson
) SalesSUM;

As you can see, derived tables 2 and 3 are gone.  They have been replaced with the OVER clause, in combination with PARTITION BY.  What you say with the OVER clause is: “partition the dataset by the fields specified in the PARTITION BY and apply the aggregation on those partitions”.  Another word for this is aggregate window function.

As you like the approach, you ask your co-worker how you can get rid of that cross join.  He doesn’t really know but then another colleague who overheard your conversation says: “On this blog the other day I read that you can use the OVER clause and partition by anything you want.  As long as it’s a constant, it will work!”.

So you give that a try and you end up with the following final query:

select FirstName, LastName, JobTitle, PhoneNumberType, CountryRegionName,
    StateProvinceName, City, SalesLastYear,
    SUM(SalesLastYear) OVER (PARTITION BY CountryRegionName)
        SalesLastYearGeographic_SUM,
    AVG(SalesLastYear) OVER (PARTITION BY PhoneNumberType)
        SalesLastYearByPhoneNumberType_AVG,
    AVG(SalesLastYear) OVER (PARTITION BY 'duh') SalesLastYear_AVG,
    SUM(SalesLastYear) OVER (PARTITION BY 1) SalesLastYear_SUM
from Sales.vSalesPerson;

As illustrated in the example, you can use any constant value to calculate overall aggregates over the whole dataset using the OVER clause.

You happily thank your colleagues and tell them that next time you’ll be able to join them for lunch on time.

About a week later you’re explaining to one of your friends how you’ve gotten to know the OVER clause. After hearing how you use it to aggregate over the whole dataset, he smiles and says: “I know how you can simplify it even more! Don’t partition at all!”.

Taking a closer look it turns out that the PARTITION BY is actually optional:

Ranking Window Functions
< OVER_CLAUSE > :: =     OVER ( [ PARTITION BY value_expression, ... [ n ] ]            <ORDER BY_Clause> )Aggregate Window Functions
< OVER_CLAUSE > :: =     OVER ( [ PARTITION BY value_expression, ... [ n ] ] )

See those square brackets? Means it’s optional.

So here is the real final query:

select FirstName, LastName, JobTitle, PhoneNumberType, CountryRegionName,
    StateProvinceName, City, SalesLastYear,
    SUM(SalesLastYear) OVER (PARTITION BY CountryRegionName)
        SalesLastYearGeographic_SUM,
    AVG(SalesLastYear) OVER (PARTITION BY PhoneNumberType)
        SalesLastYearByPhoneNumberType_AVG,
    AVG(SalesLastYear) OVER () SalesLastYear_AVG,
    SUM(SalesLastYear) OVER () SalesLastYear_SUM
from Sales.vSalesPerson;

Conclusion

When you compare the final query with the first one, tell me, which one would you prefer to maintain?  Do you prefer to have lunch with your peers or to arrive late and miss all the fun?

Have fun!

Valentino.

References

OVER Clause (Transact-SQL)

Join Fundamentals

SELECT (Transact-SQL)

  • Share/Bookmark

Tags: , , ,

Earlier I wrote an article about the new lookup functions that ship with SQL Server 2008 R2.  Today I’m going to show you another new feature of SSRS 2008 R2, this time in the visualization department.  This feature is the Data Bar.  With this new component it’s fairly easy to make your tabular data a lot more visual, and thus easier to interpret.  And here’s how to do it.

I’ll be starting from the report that I created in Your First OLAP Report.  That allows me to focus on the visualization part, without first needing to build a table report.  (Okay, I admit, it’s not 100% the same report – I’ve modified the colors a bit because I felt the green was too dark.) But obviously this method will work with any report that’s showing data in a table.

Furthermore I’m running SQL Server 2008 R2 Nov CTP, 64-bit, and I’m using the BIDS to develop the report.

The final result can be downloaded from Skydrive here.

Implementing The Data Bar

The report that we’re using is showing some sales figures grouped in three levels: Country, State/Province and City.  We’re going to add an extra column on the right of the table to contain the data bar.  Let’s first explore the Toolbox pane to discover the new visualization report items.

The new Reporting Services 2008 R2 report items: Map, Data Bar, Sparkline and Indicator

The new items have been highlighted in yellow.  As you can see, besides Data Bar there’s also Sparkline, Map and Indicator.  But those are not on topic now.

To add a Data Bar, simply drag it from the Toolbox into a textbox on the report.  Doing that will show the following popup window:

Select Data Bar Type window

From left to right, there’s Bar, Stacked Bar and 100% Stacked Bar.  And those are also available in vertical direction, Column.  I’m going to use the regular Bar as highlighted in the screenshot.

The Data Bar has now been added to the report, but it doesn’t do anything yet.  We first need to tell it what data to visualize.  Clicking it once will select it, clicking it once more will show us the following Chart Data popup:

Data Bar: Chart Data

Click the plus icon to get a drop-down of fields in the dataset.  Select the numeric field that you want to visualize, in my case that’s the Reseller_Sales_Amount.

Data Bar: Chart Data with Reseller_Sales_Amount selected=

By default, the aggregation used on the data is Sum.  But there are other options as well, have a look at the dropdown next to the [Sum(Reseller_Sales…  In the example here I’m going to keep the Sum.

Other aggregation functions of the Data Bar

The report Preview looks like this:

Report preview with the data bar

In this report we can easily see what states have got a higher sales amount: those with the longer bars.

Let’s add bars for the Country level as well.  There are two ways to achieve that: you can either drag a new Data Bar onto the report or you can just copy/paste the textbox containing our first Data Bar.

To make the report easier to read I will change the color of the Data Bar to the color of the group’s background.  Setting up the color of the Data Bar is done as follows: select the Data Bar so that you get the Chart Data popup.  In the Properties pane you should see that the Chart Series is selected.  Then right-click on the bar and select Series Properties.

The right-click menu on the Data Bar.

In the Series Properties window, select the Fill page and select your favorite color.  If you want you can also use a gradient fill or pattern.  You can even use a Switch statement and color them differently depending on their value, similar to the method that I used in my SSRS and MDX: Detecting Missing Fields article.  Well, in short, any expression that you can think of and results in a color will work fine!

Series Properies: setting up the Fill color

With the second data bar and custom coloring set up, here’s the rendered report:

Report with data bar on two grouping levels

Hang on, is that correct? As you can see, the data bar for California is longer than the one for Canada while the sales amount for Canada is definitely the higher one.  An even nicer example is United Kingdom with only one state England.  Both amounts are equal yet their data bars are certainly not.

Well, this is because the data bar by default uses the same scope as the group where it’s put.  Canada and United Kingdom are in the Country group and all Country data bars compare nicely to each other.  California and England are in the State/Province group and also compare nicely to each other!

Depending on the report’s requirements this may or may not be the desired effect.  But I wouldn’t be mentioning this if there weren’t any other options, would I?

Setting The Maximum Value Of The Horizontal Axis

A correct column label for our current column would be “% of group”.  Let’s add a second column which will show the percentage of the row compared to the total of the dataset, “% of overall total”.  After adding the extra column, copy/paste the data bars from the first column over into the new textboxes in the second column.

To get what we want, we need to tell the data bar that the maximum for the horizontal axis is the total of the dataset and not the total of the grouping level of the table.  So, right-click one of the data bars in the new column and select Horizontal Axis Properties.

Right-click menu on tablix textbox with a data bar

As you can see, the default for the Maximum value is set to “Auto”.  Note that the name of my tablix is “Tablix1”, as shown in the Align axes in dropdown.

Horizontal Axis Properties

Let’s replace the Auto as Maximum with the following expression:

=Sum(Fields!Reseller_Sales_Amount.Value, "Tablix1")

This expression says that we need the sum of the Reseller_Sales_Amount, scoped over the whole tablix.

Having changed the Maximum value on both data bars gives us the following preview:

Report rendered with status bar scoped over whole dataset

As you can see, this time around the data bars for England and United Kingdom have gotten the same size.  Bars from different grouping levels can now be compared with each other.

Showing Labels On The Data Bars

In this report it would be interesting to add a label that displays the percentage to the status bars, so let’s do that.

For the percentage calculation of the first column of data bars we need to get the total of the current group and divide that by the total of the group one level higher.  For the State/Province level that gives us the following expression:

=Sum(Fields!Reseller_Sales_Amount.Value)
    / Sum(Fields!Reseller_Sales_Amount.Value, "grpCountry")

The grpCountry refers to the name of the grouping one level higher, as shown in following screenshot:

The groupings as defined on my example tablix

And for the Country level we need this expression:

=Sum(Fields!Reseller_Sales_Amount.Value)
    / Sum(Fields!Reseller_Sales_Amount.Value, "Tablix1")

The only difference with the previous expression is the scope.

“Ok, so where do I type those expressions?” I hear you think.  Right-click the Data Bar (after first selecting the textbox that contains it) and choose Show Data Labels.

Show Data Labels in right-click popup on Chart Series

That adds a label to the chart but we still need to configure it to show the percentage.  By default it uses the value as label.  This can be verified in the Chart Series properties:

Chart Series properties: UseValueAsLabel

The UseValueAsLabel is set to True, confirming what I just stated.  Furthermore, activating the Show Data Labels option also set the Visible property to True.  So another way of adding the label is by just setting this property to True.

Let’s now configure that label.  Right-click it and select Series Label Properties.

Right-click menu on label

The General page of the Series Label Properties window allows you to define the Label data.  That’s where you need to enter the expression that I mentioned earlier.

After you’ve entered the expression, the BIDS will ask you if you want to set the UseValueAsLabel to False, so click Yes.

Do you want to set the UseValueAsLabel to False?

With the label selected as shown in the next screenshot you can use the formatting toolbar buttons to give it a decent font and all.

Data Bar with Label selected

For our second column of data bars the expressions need to take into account that we need the percentage as compared to all the data, unrelated to the level.  This actually makes it easier because we can use the same expression on both the State/Province and Country level.  Furthermore, it’s exactly the same expression as the one used on Country level in the first data bar column (i. e. the second expression above) because Country is the highest level.

Let’s have a look at the result in preview:

Preview of report with labels on data bars

Woah, looks like we forgot something doesn’t it?  We forgot to format the label as being a percentage!  There are two ways to get that done: through the Number page on the Series Label Properties window:

Series Label Properties: formatting number as percentage

Or by using the properties with the label selected (the Properties pane should show Chart Series Labels in the dropdown on top):

Chart Series Labels properties

I’ve opted for a percentage without any decimals.

Let’s have another look at that report:

The final report with two data bar columns

That sure looks better doesn’t it?  The status bars on the right compare over the grouping levels while those on the left compare with each other within the same group.  And they all have a clear label indicating what they represent.

Have fun putting those data bars on your reports!

Valentino.

References

BOL 2008R2: What’s New (Reporting Services)

BOL 2008R2: Sparklines and Data Bars

An Introduction to Data Bars in SQL Server Reporting Services 2008 R2

  • Share/Bookmark

Tags: , , , ,

Almost a year ago I wrote a small blog post to ask everyone to vote on a certain Connect item related to the linking of two datasets in a Reporting Services report.  At this moment there have been 87 positive votes on the request.

Now that SQL Server 2008 R2 has been RTM’ed, it’s time to demonstrate new functionality in SSRS that allows the retrieval of data from another dataset, a dataset not linked to your data region.  It’s not completely the same as actually joining two datasets, but it’s better than nothing – and it works!

In this article I’ll show you how exactly this works by using the new Lookup, LookupSet and MultiLookup SSRS functions.

I’m using the AdventureWorks 2008R2 relational database and the AdventureWorksDW 2008R2 data warehouse, available from CodePlex.  The main data is coming from the data warehouse while all the lookups are done on the relational database.

Setting Up A Basic Table Report

In my report I’ve created a dataset called dsInternetSales.  This dataset is retrieving data from the AdventureWorksDW2008R2 data warehouse using the following query:

select PROD.EnglishProductName, PROD.ProductAlternateKey, PROD.ListPrice,
    PSC.EnglishProductSubcategoryName, PC.EnglishProductCategoryName,
    S.OrderQuantity, S.SalesAmount
from FactInternetSales S
inner join DimProduct PROD on S.ProductKey = PROD.ProductKey
inner join DimProductSubcategory PSC
    on PROD.ProductSubcategoryKey = PSC.ProductSubcategoryKey
inner join DimProductCategory PC on PSC.ProductCategoryKey = PC.ProductCategoryKey

Using that dataset, I’ve set up a Table as shown in following screenshot.

Basic Table Report

The Details group has been set up to group on ProductAlternateKey.  On top of the Details group, I’ve grouped on EnglishProductSubcategoryName and the top-level group is grouping on EnglishProductCategoryName.  The Order Quantity column is displaying the sum of the OrderQuantity values for each ProductAlternateKey.

The result is a report that shows all (internet) sales per product, without any filtering.  Very useful report if you want to know how many items your company has sold since it’s existence.  Okay, management would probably like to see some filtering on here, but that’s not the purpose of this article.

Here’s what it looks like in Preview:

Basic Table Report Rendered

 

Adding Data From Another Database

Imagine now that you need to add an extra line under each product, containing the product description.  But this description is not available in the data warehouse.  In fact it could even be stored on another server.

In the example here we will retrieve the description from the AdventureWorks2008R2 relational database.

Setting Up The Second Dataset

I’ve created an additional dataset called dsProductInfo, using the following query:

select P.ProductNumber, PD.Description
from Production.Product P
inner join Production.ProductModel PM on P.ProductModelID = PM.ProductModelID
inner join Production.ProductModelProductDescriptionCulture PMPDC
    on PMPDC.ProductModelID = PM.ProductModelID
    and PMPDC.CultureID = 'en'
inner join Production.ProductDescription PD
    on PMPDC.ProductDescriptionID = PD.ProductDescriptionID

Not only does it retrieve the product’s description, we’re also fetching the ProductNumber.  Here’s what part of the result looks like:

Result of product description query

The reason that we’re retrieving ProductNumber as well is because it matches with the ProductAlternateKey which we’ve retrieved earlier in our first dataset.  And this is very important because that’s the key on which we’re going to link the datasets.

Using The Lookup Function

I’ve added an additional row inside the Details group and inserted a Placeholder to retrieve the product’s description, using the new Lookup function.

Additional row inside Details group

So, what does the Placeholder’s expression look like?  Here it is:

=Lookup(
    Fields!ProductAlternateKey.Value,
    Fields!ProductNumber.Value,
    Fields!Description.Value,
    "dsProductInfo"
)

As you can see, the Lookup function requires four parameters.

The first parameter is the key value in your current dataset, the dataset used by the table data region.  In our case that’s the ProductAlternateKey field in the dsInternetSales dataset.

The second parameter is the name of the key field in the second dataset, the one on which the lookup will happen.  In our case that’s the ProductNumber in the dsProductInfo dataset.

The third parameter is the field from the second dataset that you’re wanting to retrieve using the lookup, in our case the Description field from dsProductInfo.

And finally, the last parameter is the name of the dataset on which you want to do the lookup.

Please note that parameter number four is a string parameter, so the value needs to be enclosed by double quotes.  If you forget about that, you’ll get a couple of nice error messages like these:

[rsInvalidLookupScope]  The Value expression for the textrun ‘Textbox29.Paragraphs[0].TextRuns[0]’ has a scope parameter that is not valid for a lookup function. The scope parameter must be set to a string constant that is the name of a dataset.

So, don’t forget the quotes.

With the Lookup call set up as explained, here’s the updated report Preview:

Rendered report with product descriptions added through Lookup

How cool is that huh?  Each product has gotten a description, retrieved from another database, and still in the same table data region.  Before R2 of SQL Server 2008, this wasn’t possible to achieve (well, not easily anyway) and now it’s actually fairly simple!

Is that all?  Ha, I was kinda hoping that you were going to ask that.  No, it’s not all, there are two more new lookup functions: LookupSet and MultiLookup.

More Lookups: The LookupSet Function

In case you’re wondering, in the Expression Builder the new lookup functions are located under the Miscellaneous node:

Expression Builder: the new lookup functions are under Miscellaneous

Let’s say that you want to add another detail row, this time it needs to show all colors in which the product is manufactured.  Again this additional info is coming from the AdventureWorks relational database.

I’ve created a dataset called dsProductColors using the following query:

select distinct LEFT(P.ProductNumber, 6) as ProductCodeWithoutColorAndSize,
    P.Color
from Production.Product P
where P.Color is not null

When looking at the product codes, I noticed that for the products which are available in several colors and sizes, the last four characters represent the color and size.  Which means the first six characters define the product itself, without color or size.  That’s why the query is using the Left function to create a product code of only the first six characters of the ProductNumber.  Using the distinct keyword, we remove any duplicate records.

(Please note that I’m not 100% sure if this logic applies to all products but for this demo it’s fine.)

Here’s what the query retrieves:

List of product colors per product

As you can see, for some products there’s more than one record.  And that’s exactly what the LookupSet function was made for: it retrieves a set of data based on the key given to it.  This is different from the Lookup, where for each key value it would fetch only one value.

Again I’ve added an additional row inside the Details group and used a placeholder with the following expression:

=Join(
    LookupSet(
        Left(Fields!ProductAlternateKey.Value, 6),
        Fields!ProductCodeWithoutColorAndSize.Value,
        Fields!Color.Value,
        "dsProductColors"
    ),
    ", "
)

The LookupSet call itself looks very similar to the Lookup, with the same four parameters.  I’ve used the Left function on the first parameter to apply the same logic to the ProductAlternateKey as we did with the ProductNumber.

However, there’s one important difference: the call of the Join function.  This is needed because the LookupSet is returning a set, or better, a VariantArray, not just a single value.  And an array cannot be visualized without first concatenating the values somehow.  With the Join, we can concatenate the different values, using a comma as separator.

And here’s the resulting report:

Our report displaying the list of colors for each product

With the first two lookup functions covered there’s one more to go.

Just One More Lookup: The MultiLookup Function

Guess what crazy request the business people have come up with this time?!  The report should have a multi-value filter on region, and for each region selected, the top of the table should list the number of shops opened in the first year in those regions.  For example, if the first shop in France was opened in 1970 and in that same year there were two other shops opened in France, the report should state “France: 3 shop(s) opened in 1970”.

Ow, and that list should be located right under the main header so deciding to use a textbox outside of the table is not a good idea :-)

Sounds like we can use the MultiLookup function for this request.  But let’s first set up the filter.

I’ve created a dataset called dsRegions, using the following query on the data warehouse:

select DST.SalesTerritoryAlternateKey,
    DST.SalesTerritoryCountry + ' - ' + DST.SalesTerritoryRegion as CountryRegion
from DimSalesTerritory DST
where DST.SalesTerritoryAlternateKey > 0

Then I’ve added a multi-value parameter called Regions with the Available Values coming from the dsRegions dataset.

Regions parameter: the Available Values

This parameter can now be used in our main dataset.  Here’s the updated query:

select PROD.EnglishProductName, PROD.ProductAlternateKey, PROD.ListPrice,
    PSC.EnglishProductSubcategoryName, PC.EnglishProductCategoryName,
    S.OrderQuantity, S.SalesAmount
from FactInternetSales S
inner join DimProduct PROD on S.ProductKey = PROD.ProductKey
inner join DimProductSubcategory PSC
    on PROD.ProductSubcategoryKey = PSC.ProductSubcategoryKey
inner join DimProductCategory PC on PSC.ProductCategoryKey = PC.ProductCategoryKey
inner join DimSalesTerritory DST on S.SalesTerritoryKey = DST.SalesTerritoryKey
where DST.SalesTerritoryAlternateKey in (@Regions)

The only difference with the previous query are the two last lines: we add DimSalesTerritory to the joins and filter it on SalesTerritoryAlternateKey.

Don’t forget to set up the parameter.

Setting up the parameter on dsInternetSales

With the filter implemented, let’s get started on that extra lookup. 

First we need to add the dataset containing the data that we need.  I’ve created a dataset called dsShopsOpenedInFirstYear, using the following query on the relational database:

with ShopOpened as
(
    select T.TerritoryID, T.Name Territory, S.Name ShopName,
        S.[Demographics].value('declare default element namespace "http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey";
        (/StoreSurvey/YearOpened)[1]', 'integer') AS [YearOpened]
    from Sales.Store S
    inner join Sales.SalesPerson SP on S.SalesPersonID = SP.BusinessEntityID
    inner join Sales.SalesTerritory T on SP.TerritoryID = T.TerritoryID
),
FirstShopOpened as
(
    select MIN(YearOpened) YearOpened, TerritoryID
    from ShopOpened
    group by TerritoryID
)
select SO.TerritoryID,
    SO.Territory + ': ' + CAST(COUNT(*) as varchar(100)) +
    ' shop(s) opened in ' + CAST(SO.YearOpened as char(4)) as ShopString
from FirstShopOpened FSO
inner join ShopOpened SO on SO.TerritoryID = FSO.TerritoryID
    and SO.YearOpened = FSO.YearOpened
group by SO.TerritoryID, SO.Territory, SO.YearOpened

This query uses a couple of Common Table Expressions to get to the result as we need it.  The first CTE, ShopOpened, creates a list of all shops with their territory and the opening year.  The second CTE, FirstShopOpened uses the ShopOpened CTE to retrieve the first opening year for each territory.

And finally the main query uses both CTEs to create the following result:

Number of shops opened in first year per territory

For each territory we’ve constructed a string that shows how many shops were opened in the first year of that region, and in what year it happened.  Coincidentally all regions had shops opened in 1970.

The TerritoryID corresponds with the SalesTerritoryAlternateKey, which is the value of our Regions parameter.

I’ve added an extra row under the top row in the table data region, and I’m using the following expression in that row:

=Join(
    MultiLookup(
        Parameters!Regions.Value,
        Fields!TerritoryID.Value,
        Fields!ShopString.Value,
        "dsShopsOpenedInFirstYear"
    ),
    "<br>"
)

The MultiLookup takes four parameters, just like the two previous lookup functions.  They are all the same, except for the first one.  It may not be very obvious in the example here, but the Parameters!Regions.Value is in fact not just a single value.  It’s an array because we’ve set up the parameter as being multi-valued.

And that’s exactly what the MultiLookup function requires.  Here’s the description for that first parameter, as stated in the Books Online:

(VariantArray) An expression that is evaluated in the current scope and that specifies the set of names or keys to look up. For example, for a multivalue parameter, =Parameters!IDs.value.

Just like the LookupSet function, MultiLookup returns a VariantArray, so we use the Join function to concatenate the values.

Interesting to note here is that I’m adding the break HTML tag as separator.  I want the result of the expression to be treated as HTML, so that each value retrieved ends up at a new line in the textbox.  To get this to work as expected, you need to tell the Placeholder that the resulting value should be treated as HTML:

Using HTML in a Placeholder

Everything is now set up to have another report Preview.  The following screenshot shows the report with the data filtered on Canada, France and Australia:

The final report performing three different lookups

Seems to be working fine, doesn’t it?

Okay, that’s it for now, have fun looking up that data!

Valentino.

References

BOL2008R2: Lookup Function

BOL2008R2: LookupSet Function

BOL2008R2: MultiLookup Function

  • Share/Bookmark

Tags: , , , ,

Consider the following scenario. You’ve been developing some SSIS packages, nicely making sure that all errors and warnings are gone from the Error List window. So the next step is to deploy the packages to the server and schedule them. After some days you decide to have a look at the sysssislog table and discover that it’s filled with OnWarning records!

Well, this is an extreme example of course. Careful developers would notice the warnings in either the Output window or the Progress tab while testing the package in the BIDS.

All these warnings are indicating that some columns in your data flows have become obsolete.  Here’s an example:

The output column “BKCustomerID” (13529) on output “OLE DB Source Output” (12088) and component “OLE_SRC MySource” (12077) is not subsequently used in the Data Flow task. Removing this unused output column can increase Data Flow task performance.

The problem with these warnings is not only that they slow down the transformation process but they obfuscate the SSIS log with thousands of records, making it difficult for you when you need to investigate an issue. And this is something that you really don’t want, especially when the business people are calling you to ask why the data is not up-to-date on the production servers.

If you ever wished that you’d see these warnings whilst developing the package instead of at runtime only, now’s your chance to make a change. Or better, to ask Microsoft to implement the change.

Follow this link to cast your vote on Connect!

Thank you Jamie for pointing this out.

Have fun!

Valentino.

  • Share/Bookmark

Tags: , ,

A while ago I wrote an article about how you can use the SQL Server Import and Export wizard to import Excel data into a SQL Server database.  In this sequel I’m going to show you some problems which you may encounter when using that wizard.  Just like any good old wizard, he’s only as good as his recipes.  If a recipe is missing an ingredient, the resulting potion will probably not behave as expected and before you know it it explodes in your face.

I’ll be using the same Excel file as in my previous article.  In case you’re having some problems understanding the data or locating certain screens mentioned in this article, I recommend you to first read the prequel.

Some Common Wizard Pitfalls And Their Solution

Drop And Re-Create Destination Table

Imagine that for your import process you’re planning to import that Excel data regularly.  So the logical step is to make sure that each time the process runs, it starts from scratch.  Following that, what seems to be an interesting checkbox is located on the Column Mappings screen.  This checkbox is called Drop and re-create destination table.

Column Mappings: Drop and re-create destination table

So you activate that checkbox and execute the package.  However, it ends with an error in the Executing phase:

SQL Server Import and Export Wizard: error while Executing

Here’s what the error details say:

image

Error 0xc002f210: Drop table failed… Cannot drop the table dbo.ProductList$ because it does not exist or you do not have permission.

So why does this error occur?  Let’s examine the SSIS package that was generated.  This is what the Control Flow looks like:

A Control Flow that doesn't take a non-existing table into account

New here is that Drop table SQL Task.  Taking a closer look at the task, here’s the query:

drop table [dbo].[ProductList$]
GO

So the first task is to drop the table.  This flow does not take into account that the table may not yet exist, as is the case here, causing the error.

However, do you notice the blue arrow connecting the Drop table task with the next SQL Task?  That means that it doesn’t require the execution of the task to end successfully, the flow will continue even when an error occurred (unlike when the green connector – meaning Success – is used).

Double-clicking the blue connector gives us the Precedence Constraint Editor window:

Precedence Constraint Editor

Indeed, the value for the constraint is set to Completion.

This also explains why the wizard continued executing all phases, even though an error occurred.  And the end result was that the table is really created, containing the data as expected.

If you’d like to avoid the error and handle the non-existing table, you could replace the query in the Drop table SQL Task with the following one:

if exists
(
    select * from INFORMATION_SCHEMA.TABLES T
    where T.TABLE_NAME = 'ProductList$'
        and T.TABLE_SCHEMA = 'dbo'
)
    drop table [dbo].[ProductList$];

It first checks if the table exists, taking schema into account, and will only perform the DROP TABLE statement if the table actually exists.

Field Length Insufficient

On to the next possible issue.  When examining our Excel sheet we find out that the content of the Class column is either blank or just one character:

Possible values of the Class field

So we decide to change its type to char(1), again using the Column Mappings screen.

Column Mappings: change type to char(1)

However, clicking Finish at the end of the wizard gives us a fatal error:

SQL Server Import and Export Wizard: Operatoin stopped in Error

And these are the details of the error:

    • Error 0xc02020c5: Data Flow Task 1: Data conversion failed while converting column “Class” (63) to column “Class” (139).  The conversion returned status value 4 and status text “Text was truncated or one or more characters had no match in the target code page.”.(SQL Server Import and Export Wizard)
    • Error 0xc020902a: Data Flow Task 1: The “output column “Class” (139)” failed because truncation occurred, and the truncation row disposition on “output column “Class” (139)” specifies failure on truncation. A truncation error occurred on the specified object of the specified component.

      (SQL Server Import and Export Wizard)

    • Error 0xc0047022: Data Flow Task 1: SSIS Error Code DTS_E_PROCESSINPUTFAILED.  The ProcessInput method on component “Data Conversion 0 – 0″ (131) failed with error code 0xC020902A while processing input “Data Conversion Input” (132). The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running.  There may be error messages posted before this with more information about the failure.

      (SQL Server Import and Export Wizard)

Wow, that’s quite a long list for such a simple change!  Apparently it wasn’t such a good idea to limit the length of this field.  But why does this error occur?  After all, the values in this field are just one character long.  Except, after taking a closer look at the Excel sheet it turns out that these values contain a trailing space!

One way of avoiding this issue is by specifying a higher length for the field.  In this particular case however, you may decide to modify your Data Flow so that it removes the trailing space.  Here’s what the Data Flow currently looks like:

Data Flow to import Excel data

The Data Conversion transformation takes care of converting the Class column into a DT_STR of length 1.

Data Conversion Transformation Editor

We’re going to replace that Data Conversion Transformation with another one, a Derived Column transformation.  This is one of the transformations which I’m using all the time.

Derived Column Transformation Editor

As you can see in the screenshot, I’ve set it up so that it uses the TRIM() function on the Class input column to remove any leading or trailing spaces.  Furthermore I’m using a cast to DT_STR of length 1 to ensure the correct field type.  The resulting column is called Class_STR, to clearly indicate that it has been converted to DT_STR.

This is the resulting Data Flow:

Data Flow: Destination component complains about lineage ID

Looks like we’ve got an issue with our Destination component now.  Which is quite logical: we still need to tell it that it should use the newly-created Class_STR column.  So double-click the component.

 Restore Invalid Column References Editor

Double-clicking the Destination component will open up the Restore Invalid Column References Editor.  In Available Columns, select the new Class_STR column to replace the Class column which was used previously.

That’s it, your SSIS package will now remove the trailing spaces from the Class column and store it in a column of char(1).

Excel Layout Change

Here’s another common issue when dealing with Excel SSIS imports.  If you’re not 100% in control of that Excel sheet, someone will someday make a structural change to it and it will cause your import process to fail.

In this really simple example I’ve opened up the Excel sheet and renamed the Color column to Colour.  Which is something that may happen in real life: a British person takes over the maintenance of that product list and sees that the Color column is spelled the wrong way, and corrects it without informing anyone.

What does that mean for our import?  Here’s the result when manually executing the package using DTExecUI.exe:

DTExecUI: Package Execution Progress

In this case execution will fail because the package cannot find the Color column in the Excel sheet.  More precisely it says VS_NEEDSNEWMETADATA.  This is a really common error when using Integration Services, but you need to manually update the package to handle such changes.

In other words: try to be in control of that sheet as much as you can, and if possible: set up another way to maintain such data.  For instance by using a Master Data Management system.  But that’s stuff for later on, in future articles.

Conclusion

As long as you’re aware of some of the common issues into which you may run, I still think the Import and Export Wizard is an interesting option to start your first SSIS package.  When running into an issue, I recommend to open up the SSIS package and have a closer look through the Business Intelligence Development Studio.  Then make any changes there so you can handle the errors.

Happy importing!

Valentino.

  • Share/Bookmark

Tags: , , ,

In a previous article I’ve shown you how to import data from an Excel sheet using the OPENROWSET() function.  And I concluded by stating that it’s not the best option when automating your data import.

Today I’ll repeat the Excel data import process by using SQL Server Integration Services, also known as SSIS.

I’ll be using SQL Server 2008 R2, but I’m quite sure that the process is very similar to the first release of 2008, and even to 2005.  The Excel file that I will be importing is the one used in my previous article, and I’ll also refer to some parts of that article, so you may want to have a read over that one when something here isn’t clear.

Furthermore I’m using a Windows 7 64-bit machine, with the ACE 14 driver (beta) installed.  To avoid any discussion about versions and for my own (future) reference, here’s the result of a SELECT @@VERSION:

Microsoft SQL Server 2008 R2 (CTP) – 10.50.1352.12 (X64)   Oct 30 2009 18:06:48   Copyright (c) Microsoft Corporation  Enterprise Evaluation Edition (64-bit) on Windows NT 6.1 <X64> (Build 7600: )

Create SSIS Package To Import Excel Data

Usually you will start by creating a new package in an Integration Services project, add an Excel source to a new Data Flow, throw in some Data Flow Transformations and end your flow with an OLE DB Destination connecting to your SQL Server.

But that’s not the approach that I’ll take in this article.  I’ll make use of a shortcut (and meanwhile I’m showing you how well integrated some components really are).

First, I’m creating a new database called ExcelImport, using the Management Studio (aka SSMS).  Once the database is created, right-click on it and choose Tasks > Import Data.

Start the Import Data Wizard through Management Studio

This will open up the SQL Server Import And Export Data Wizard.  Like all good wizards, it starts with a Welcome screen containing an introductory text about its purpose – something about “create simple packages that import and export data between many popular data formats including databases, spreadsheets” – and so on.  It also includes a checkbox that says “Do not show this starting page again”.  That’s my favorite object on the page :-)   Okay, I agree, the page is useful for people who have never seen the wizard before and who may have opened it up by accident, but that’s about as far as its use goes methinks.

So, do whatever you like with the checkbox and click Next.  That opens up the Choose a Data Source screen.

SQL Server Import and Export Wizard - Choose a Data Source

In that screen you have several options in the Data Source dropdown.  The one we’re interested in is called Microsoft Excel.  Once that option is selected, the controls further down the screen change into what is shown in the screenshot.  Select your file and the right version, in my case I’ve got an Excel 2007 file.  If your sheet contains a header row, activate the First row has column names checkbox.

Clicking Next will open up the Choose a Destination screen.

SQL Server Import and Export Wizard - Choose a Destination

In that screen, select Microsoft OLE DB Provider for SQL Server as Destination.  Ensure that your SQL Server instance is the right one in the Server Name dropdown and the Authentication is filled out as expected.  The correct database should be selected by default because we did a right-click on it to start the wizard.

Another Next click opens up the Specify Table Copy or Query window.

SQL Server Import and Export Wizard - Specify Table Copy or Query

Here we can choose between either retrieving the full table – all rows, all columns – or writing a query ourselves.  I’ll go for the first option.

Click Next once more to open the Select Source Tables and Views screen.  That name seems a bit weird in the context of an Excel import but I guess that’s not really important here.  Just read it as “Select the sheet that you’d like to import”.

SQL Server Import and Export Wizard - Select Source Table and Views 

I’ll go for the ProductList$ Source.  The sheet in my Excel file is called ProductList.  Note that the Destination is editable – I’m changing the destination table to dbo.ProductList (with the dollar sign removed).

If you’d like to view your data right now you can hit the Preview button.  It opens up a window such as this one:

SQL Server Import and Export Wizard - Select Source Table and Views - Preview Data 

That should look familiar.

A more interesting button is the one called Edit Mappings.  Clicking that button opens a screen that lets you change the destination table’s schema.  By default, all columns that seem to be numeric are mapped to a type of float and all the others are mapped to nvarchar with a length of 255.  Depending on your situation you can either leave it as it is (in case you’re just loading a staging table and want to handle data conversions later on) or you should review each column (for instance when you’re erasing and loading the full table every night and this is the actual table being used by other processes).

Note: whenever a column contains a blank cell, its type will be nvarchar(255), even when all other values are numeric.  Also, if you don’t need Unicode support, don’t use nvarchar but change it to varchar instead.

Here’s the Column Mappings screen with some of the defaults changed:

The Column Mappings window

Everything that I changed has been indicated using the yellow marker.  I’ve changed some field types and lenghts, made one field non-nullable and adapted the destination name.

Clicking OK followed by Next brings us to the following screen, Review Data Type Mapping.

SQL Server Import and Export Wizard - Review Data Type Mapping

This screen gives another overview of all the columns that we’re planning to import.  Note that each column of which we’ve changed the type has gotten a nice yellow warning sign.  This happens because a data conversion needs to occur, and there’s always something that can go wrong when converting data.  The On Error and On Truncation columns show the action that should happen when such an event occurs.  We’ll leave them all at Use Global.  The Global settings are located at the bottom of the screen and are both set to Fail.  That’s the best option at the moment (the only other one is Ignore but that means you won’t get any notification in case of an error or truncation problem).

After clicking Next we end up at the Save and Run Package window.

SQL Server Import and Export Wizard - Save and Run Package

I have activated the Save SSIS Package checkbox and chose the Do not save sensitive data option.  By default it is saved in SQL Server, which is actually the MSDB.  That’s good because we’re going to examine the contents of the package later on so we want to keep it.

The Run immediately checkbox was activated by default.  This will execute the package in the last step of the wizard.

Another Next click and we’re on the Save SSIS Package screen.

SQL Server Import and Export Wizard - Save SSIS Package

Here we can give our package a decent name, such as ExcelImport.  You can also see the server on which I’m saving the package.

The final Next click brings us to the Complete the Wizard screen, woohoo!

 SQL Server Import and Export Wizard - Complete the Wizard

We can see a short overview of what we’ve configured in the previous steps.  Click Finish to execute and save the package!

SQL Server Import and Export Wizard - The execution was successful

And we’ve successfully executed the package!

Taking A Closer Look At The SSIS Package

When connecting to the Integration Services server (through SSMS for instance), I now have a new package in the root of the MSDB folder.

The ExcelImport SSIS package, stored in MSDB

It’s located there because I chose to save the package to SQL Server.

Adding An Existing Package To An SSIS Project

We’re now going to open it in the Business Intelligence Development Studio (aka BIDS).  So, open the BIDS and create a new SSIS Project (or use an existing one, doesn’t really matter).

Once the project is open, right-click the SSIS Packages folder in the Solution Explorer and select Add Existing Package.

Right-click SSIS Packages folder in Solution Explorer to Add Existing Package

That opens up the Add Copy of Existing Package window.

Sidenote: do you see that SSIS Import and Export Wizard option in the previous screenshot?  That’s right, the wizard that we’ve used extensively in the earlier part of this article can be launched from here as well.

Add Copy of Existing Package

Select SSIS Package Store as location of the package and enter the name of your server in the second dropdown.  Once that is done you can click the button with the ellipsis and select your package under the MSDB node.

Clicking OK will add the package to your project in the Solution Explorer.  Double-click it to open it up.

The Control Flow

In the Control Flow you can see two components: a SQL Task that contains a CREATE TABLE statement and a Data Flow.

Here’s what the CREATE TABLE statement looks like:

CREATE TABLE statement in the Execute SQL Task

As you can see, the table is created using the column names and types just like we configured them through the wizard.

Important to note here is that the Control Flow does not take anything else into account.  For instance, what happens if we execute the package twice?  It will fail because the table already exists!

In case you don’t believe me, just try it out!

Execute SQL Statement failed There is already an object named 'ProductList' in the database. 

The Data Flow

Opening up the Data Flow we see that it contains three components: an Excel source component that uses an Excel Connection Manager to connect to our now well-known Excel sheet, a Data Conversion Transformation to take care of the conversions that we requested and an OLE DB Destination that uses an OLE DB Connection Manager to connect to our SQL Server.

The Data Flow to transfer Excel data into SQL Server

Important to note here is that whenever an issue occurs, such as a conversion problem, the flow will fail.

In production environments, certain events need to be taken into account.  The purpose of this article was just to show you how you can use a wizard to generate an SSIS package for you.  You can now use this package as the basis for a well-developed Excel Import template.

Conclusion

With this article I hope to have shown you how to use Integration Services to import Excel data, and also that the Management Studio knows how to use other SQL Server components, such as SSIS, quite well.

If you’re running into some issues while using the wizard, or you just like reading what I write, check out my follow-up article covering some common pitfalls.

Need to go to sleep now, long drive tomorrow, PASS European Conference in Germany!  I do hope that Adam Saxton will be there because I was planning to see his presentations all day long.  I already read that one of the speakers – Brent Ozar – won’t be able to make it.  Darned ash cloud…  Next time someone starts talking to me about Azure I’ll run away screaming.

Just kidding :-)

Have fun!

Valentino.

References

Microsoft Support: How to import data from Excel to SQL Server

How to: Run the SQL Server Import and Export Wizard

  • Share/Bookmark

Tags: , , , ,

« Older entries

© 2008-2010 A Developer's Blog All Rights Reserved