October 2009

You are currently browsing the monthly archive for October 2009.

A while ago I wrote an article called Chart Optimization Tips.  This article explained how to optimize a Column Chart.  Today I have returned to show you some Pie Chart implementation techniques.

As usual, I will be using the AdventureWorks2008 database, available at CodePlex.  The chart itself will be implemented using SQL Server 2008 Reporting Services.

Retrieving The Data

The dataset in our report uses the following query:

select SWD.*, SWA.City, SWA.StateProvinceName,
    SWA.PostalCode, SWA.CountryRegionName, SWA.AddressType
from Sales.vStoreWithDemographics SWD
inner join Sales.vStoreWithAddresses SWA
    on SWA.BusinessEntityID = SWD.BusinessEntityID

This query illustrates a bad coding practice: never use “SELECT *”.  Ideally you should only retrieve the columns that you need for the report.  That will optimize performance when generating the report.  But that is not the goal of this article so I’ll leave the query as it is.

A Basic Pie Chart

To get started with our Pie Chart I have selected the third icon in the list of Shape charts.  This adds a regular 3D pie chart to the report.

Select Chart Type window

To set up the chart I dragged AnnualSales from the Report Data pane into the “Drop data fields here” area and StateProvinceName into the “Drop category fields here”.

In case you don’t see the Report Data view (it has a tendency to disappear now and then), you can open it through the main menu: View > Report Data.

Report Data Pane

This is what our report looks like in Preview:

Basic Pie Chart

Wow, we’ve still got some work to do, this looks like a kids color book!  You wouldn’t say that this chart is showing the annual sales, would you?  There aren’t even any numbers on it!  Let’s get started on improving this.

Sorting The Numbers

A good implementation practice is to sort the slices from large to small.  If the slices are not sorted, it’s difficult to tell which state is performing better than another.  Just take a look at the previous image and compare the following two slices:

  • the pie shown in grey at 12:00
  • the pie shown in yellow at 03:00

Which one is the larger of the two?  Indeed, “I don’t know” is the right answer.

To implement the sorting you need to think about what you want to achieve.  What is it that we want to sort?  The states.  And these are shown as categories on the chart, so we should take a look at its properties.  As shown in the following screenshot, right-clicking on the [StateProvinceName] button gives a pop-up menu.  Select Category Group Properties.

Context menu for Category Group

Then we need to think about how we want the states to get sorted.  For that we should look at what is being shown as data of the chart.  In our case that is the sum of the AnnualSales field.

In the Category Group Properties, select the page called Sorting.  Clicking the Add button will add a line in the sorting options list.  Use the following expression for the “Sort by” field:

=Sum(Fields!AnnualSales.Value)

As we want to sort the largest values first, select “Z to A” for the Order dropdown.

Category Group Properties

Right, time to have another look at our report in Preview.

Sorted Pie Chart

So, we went from a colorful mess to an ordered colorful mess.  Chaos has been reduced a bit, but this is still one difficult-to-read report.

On to the next improvement!

Limiting The Pies

As you have noticed, a pie chart is not suitable to show that many categories.  We need to find a way to reduce the slices.  One way to do that is by adding a filter.  Another way is to add the smallest slices together into one slice.  This can be interesting in cases where we want to use all the data but we’re only interested in the larger slices.  Luckily, this can be done using standard pie chart properties.

Click on the pie itself, this will select the Chart Series.  One way to tell if you’ve selected the correct part of the chart is by looking at the Properties pane.  Its selection should show something like “AnnualSales Chart Series”, where AnnualSales is the name of the chart series.  Another way to tell is by the small white selector bulbs: they should be surrounding the pie.

Now, among the properties of the Chart Series you will find a property group called CustomAttributes.  Open this one by clicking the plus icon in front of it.  Change the CollectedStyle property to SingleSlice.  This tells the chart that we want to group the smallest slices into one slice.

Other interesting properties here are CollectedThreshold and CollectedThresholdUsePercent.  I’ve put CollectedThreshold to 2 and CollectedThresholdUsePercent to True (which is its default).  This means that any slice smaller than 2 percent of the pie will be added into the “collected slice”.

More useful properties are CollectedLabel, that’s the text that is shown on the slice itself, and CollectedLegendText, the text shown in the legend.

Chart Series - collected slice properties

The CollectedStyle property has another option besides the one I’ve shown, called CollectedPie.  Choosing that will generate a second pie next to the main one to represent all the small slices.  See the following screenshot for what it looks like.  In some cases this may be an interesting option but not in our example here.

Pie Chart showing a CollectedPie

The collected pie can also show labels by setting the CollectedChartShowLabels property to True, and the categories shown on the collected pie can be shown in the pie’s legend by setting CollectedChartShowLegend to True.

If you’d like the collected slice to jump out, there’s a property called CollectedSliceExploded.  Setting it to True will produce something like the following:

Exploded Collected Slice

As you have noticed, the previous screenshots have started to show text on the slices.  This can be easily activated by right-clicking the pie and selecting the Show Data Labels menu item.

Context menu on pie chart - Show Data Labels

And the next screenshot shows what our chart currently looks like.

Pie chart with collected slice

The small slices have been replaced by a really large one, and the text on the large slice is our customized version.  The other slices are showing some rather large numbers, so we still have some work to do.

Displaying Percentages

Let’s customize the label shown on the slices.  As the numbers are really large, I recommend to divide them by 1,000.  As long as it’s clearly mentioned on the report, it will make everything more readable.

Furthermore I’ll show you how to use built-in chart keywords (only available to ToolTips, custom legend text, and data point label properties), such as #PERCENT.

Right-click on one of the data labels and select Series Label Properties.

Context menu of data labels - Series Label Properties

Click the expression (fx) button on the General page and enter the following expression:

=FormatCurrency(Sum(Fields!AnnualSales.Value) / 1000, 0) & " (#PERCENT{P1})"

The first part divides the sum of AnnualSales by 1,000 and then applies the FormatCurrency function to the result.  The second parameter for FormatCurrency tells the function that we don’t want any decimals.  The result of this function call is concatenated with the second part using Visual Basic string concatenation (&).

The second part looks like a regular string but it contains a built-in keyword: #PERCENT.  This will show the percentage that the slice represents.  Furthermore, there’s a custom string formatter appended: P1.  By default the percentage would show 2 decimals.  This way it will only use one digit for the decimal fraction.

See here for a list of all built-in keywords and this page for more information on the available formatting options.

And following screenshot shows what our chart now looks like.

Pie chart showing percentages on slices

I’ve also given it a clear title, decreased the Data Label font size to 8 and moved the legend down.

To move the legend: right-click it, select Legend Properties and play with the radio buttons for the Legend Position.

Legend Properties

So, we’ve now got a fairly readable chart.  It’s not perfect, some labels are overlapping, but it’s doable.  However, we won’t rest here.  On to the next tip.

Rotating The Pie

Some people may ask you, “Why on earth does the first slice (the blue one representing 10.6% in our example) start at this weird angle at 4 o’clock?  Why can’t it start at 12:00?”.

Again we’re lucky because this can be controlled using a standard property.  Among the Chart Series CustomAttributes property group there are still some properties which haven’t been mentioned earlier.  One of them is called PieStartAngle.  By default it is set to zero.  Funny enough, zero stands for 30°.  Try it out and enter 30 for the property value.  Did you see the effect?  Indeed, nothing happens!  Now enter 90.  Did you see the chart rotate, even in Design mode?  Switch to Preview to get a better view of what the impact is.  As you can see, setting it to 90 will cause the first slice to start at 06:00.  To make it start at 12:00, we thus need to set the property to 270 degrees.

Pie Chart with customized rotation angle

Labels Outside Pie Chart

Other people may tell you, “But I don’t want all these labels on the pie itself, I want them next to it.”.

We’re still lucky because again this can be achieved using standard properties.  Still in the Chart Series CustomAttributes, there’s a property named PieLabelStyle.  Its default value is Inside.  Switching it to Outside will render the labels outside the pie, with lines attaching them to their respective slice.

Other interesting properties for the outside labeling are 3DLabelLineSize and MinimumRelativePieSize.

3DLabelLineSize defines the amount of space used for drawing the line between the label and its corresponding slice and is a percentage of its default size.  Values range from 30 to 200.  I’ve put it to 30 to get as much space as possible for the pie itself and the labels.

MinimumRelativePieSize represents a percentage of the chart area size and defines the minimum acceptable pie size.  Values range from 10 to 70.  I’ve put this one to 70 to maximize the size of the pie.

Chart Series properties with properties for outside labeling highlighted

With these modifications we’ve actually gotten some extra space for the labels.  Let’s take advantage of that and add extra information in the labels.  Change the Data Label expression to the following:

="#LEGENDTEXT" & vbcrlf &
FormatCurrency(Sum(Fields!AnnualSales.Value) / 1000, 0) & " (#PERCENT{P1})"

Our expression uses another built-in keyword: #LEGENDTEXT.  This will add the legend text to the label itself, which means the legend becomes obsolete.  So I’ve removed it.

And this is what our chart now looks like:

Pie chart showing labels on the outside

With Halloween coming up I thought it would be nice to create a spidery chart :-)

One More Custom Attribute

I’ve already mentioned several CustomAttribute properties of the Chart Series and I’d like to mention one more.  This property is called PieDrawingStyle and it will only appear in the list of properties when 3D is not enabled.  After disabling 3D I could set it to either SoftEdge or Concave.  I also noticed that labels outside of a pie chart will only have lines attached to them when rendered in 3D, so I’ve switched back to Inside for the PieLabelStyle property.

This is what SoftEdge looks like.  I think it’s rather nice.

Pie chart using SoftEdge drawing style

Custom Coloring

To conclude, there may be some people who tell you, “I don’t like those colours, and I don’t like any of the predefined sets.  I want to specify custom colours.”.

So again we’re lucky because even that is supported by default.

To get started with our color customization, select the chart object.  To know if you’ve made the correct selection, the Property pane should show “Chart” as non-bold part of the dropdown.  Alternatively you can just use that dropdown to select the Chart.  As Chart is a main object on the report, it is shown in the list (whereas parts of a Chart, such as Chart Series and Chart Area, are not shown in that list).

With the Chart selected, locate the Palette property.  By default it is set to BrightPastel.  In case you’re happy with one of the predefined palettes you can just select it here.  But we go for Custom, located at the bottom.

Next, locate the CustomPaletteColors property.  Selecting the property will show a button with an ellipsis as button text.  Click this button to get to the ReportColorExpression Collection Editor (what a name for a popup window!).  This window allows you to specify a list of colors.  I’ve specified the following 10 colors:

ReportColorExpression Collection Editor

And finally this is what our report looks like.  To stay in the theme, I’ve specified some colors which are suitable for Halloween-time charting.

Pie chart with custom colors - Halloween-style!

Coloring Consistency Using Dynamic Colors

And now to really conclude this article I’d like to mention one additional tip related to chart coloring.  In some occasions it may be interesting to have coloring consistency between different charting periods.  With that I mean that California would always show in grey, no matter whether it came first or not, Washington as brown, and so on.  This is currently not the case.  With the current implementation it’s the first pie that gets the grey color, the second pie is brown, and so on.

The best way to achieve that is to store the colors in the database and then fetch them in the same dataset that is used to retrieve the chart data.  The AdventureWorks database hasn’t got any color codes stored so I’ll just illustrate what I mean using a little cheat.

In order to get our dynamic coloring working, we will override the colors from the palette.  This is how it’s done.  Right-click on the pie and select Series Properties.  Select the Fill page and click the Expression (fx) button to define the color.  In the case where you’re selecting the color code as one of the database fields, your expression would look similar to this (assuming that colors are stored using their 6-digit hexadecimal representation with 000000 being black and FFFFFF being white):

="#" & Fields!ColourCode.Value

To imitate dynamic coloring I’ve used the following expression:

=Switch
(
    Fields!StateProvinceName.Value = "California", "Blue",
    Fields!StateProvinceName.Value = "Washington", "Red",
    Fields!StateProvinceName.Value = "Florida", "Green",
    True , "#888888"
)

The expression gives three states their own color and all the others will be colored a kind of grey.

This is what it looks like:

Pie Chart using dynamic colors

So, I hope you’ve enjoyed reading this article.  Feel free to post any comments should you wish to do so, and… happy charting!

References

SQL Server 2008 Books Online: Pie Charts

How to: Collect Small Slices on a Pie Chart

Formatting Data Points on a Chart

How to: Define Colors on a Chart Using a Palette

Share

Tags: , , , ,

image

For the second year in a row, the Belgian SQL Server User Group organizes the SQL Server Day.

Like last year, the event will take place at Utopolis Mechelen.

If you’re ready for a day filled with learning and networking opportunities and you can make yourself available at December 3, register at www.sqlserverday.be!

There will be some interesting speakers such as:

  • Chris Webb – SSAS and Gemini
  • Henk van der Valk – SSIS world record
  • Dirk Gubbels – Change tracking
  • my colleagues Luc Lemaire and Mario Van Hissenhoven – Report Builder 3.0

If you’re attending as well and would like to meet up for a chat, don’t hesitate to post a comment here!

See you there!

Share

Tags: , ,

Initially I was going to call this article “Struggling With Collation: The SeQueL”, but it just doesn’t have the same ring to it as “Fun With Strings”.  In that previous article I showed how you might get different results when loading data from a temporary table or table variable and I suggested that one way of solving this is by switching your data type to nvarchar.

Unicode Or Not?

Reason #1 For Not

Today I’m going to show you that nvarchar is not always what we want to use, especially if we don’t need to support Unicode strings.  Imagine a staging scenario when loading a data warehouse.  Often the Business Keys (BK) are strings, and depending on the source system, sometimes very long strings – I’ve seen situations with a combined business key of over 500 bytes!  (You can’t imagine what some data sources look like but that’s another story.)  Do we really want to convert these to Unicode, and thus double their size?  Furthermore, to improve lookups we put indexes on those BKs.  These indexes would double in size as well.  So no, we don’t really want to make these fields Unicode, and certainly not when we want our ETLs to perform as fast as possible.

Reason #2 For Not

That was reason number one why nvarchar is not always the solution.  And here comes reason number two.  In my scenario, the source tables are located in an Oracle database.  And guess what: by default Oracle’s ORDER BY behaves different than SQL Server’s ORDER BY (when using the regular Latin1_General_CI_AS or SQL_Latin1_General_CP1_CI_AS collations)!  By default Oracle uses binary string comparison to sort its data and the reason for it appears to be that that’s the only way to prevent a full table scan.  I’m no Oracle expert but that’s what the documentation states.

Here’s a little demonstration.  The following script prepares a table variable and selects the data from it, sorted ascending.

declare @tbl table( col1 varchar(20));
insert into @tbl select ‘AA’;
insert into @tbl select ‘A’;
insert into @tbl select ‘A-’;
insert into @tbl select ‘A A’;
insert into @tbl select ‘BA’;
insert into @tbl select ’0′;
insert into @tbl select ’1′;
insert into @tbl select ‘-0′;
insert into @tbl select ‘-1′;
insert into @tbl select ‘A0′;
insert into @tbl select ’0A’;
insert into @tbl select ‘-A’;
insert into @tbl select ‘-B’;
insert into @tbl select ‘a’;
insert into @tbl select ‘b’;
insert into @tbl select ‘ ‘;

select * from @tbl
order by col1 asc;

I have executed it once just as stated above (while connected to a database that uses the SQL_Latin1_General_CP1_CI_AS collation) and once more while using nvarchar as data type for the column in the table variable.  The first execution will sort the data using a non-Unicode sorting algorithm, while the second execution will order the data according to the Unicode sorting method.  The results will be shown further below for easier comparison.

On Oracle I performed a similar procedure, as shown in following script.

select cast(‘AA’ as varchar(20)) as col1 from Dual union
select ‘A’ as col1 from Dual union
select ‘A-’ as col1 from Dual union
select ‘A A’ as col1 from Dual union
select ‘BA’ as col1 from Dual union
select ’0′ as col1 from Dual union
select ’1′ as col1 from Dual union
select ‘-0′ as col1 from Dual union
select ‘-1′ as col1 from Dual union
select ‘A0′ as col1 from Dual union
select ’0A’ as col1 from Dual union
select ‘-A’ as col1 from Dual union
select ‘-B’ as col1 from Dual union
select ‘a’ as col1 from Dual union
select ‘b’ as col1 from Dual union
select ‘ ‘ as col1 from Dual
order by col1 asc;

The Oracle script doesn’t use a table variable, it just creates a result set using several select statements with a union in between.  But for our test that doesn’t matter, the results using this method are suitable.

In the table below you can see the result of the three executions.

SQL non-Unicode SQL Unicode Oracle Binary
< space
-0 0 -0
-1 -0 -1
-A 0A -A
-B 1 -B
0 -1 0
0A A 0A
1 a 1
A -A A
a A- A A
A A A A A-
A- A0 A0
A0 AA AA
AA b BA
b -B a
BA BA b

As you can see, they only agree on one thing: space really is the smallest character in my test set!  And that’s not what I want, I want all the data to be sorted consistently, no matter what the source is.

But Why Sorted?

You may wonder why I need to sort the data.  Well, some components in Integration Services expect the incoming data flows to be ordered.  One of the standard components that requires this is the Merge Transformation.  Another (custom!) component is Table Difference.  I could of course add a Sort Transformation to my Data Flow, but that would not be interesting for performance.  I want the data to come from the database server in the expected order.  So now I’ll show you how you can do that.

Taking Control!

SQL Server: ORDER BY … COLLATE …

On SQL Server this was fairly easy.  The ORDER BY clause has a COLLATE part where you can specify what collation should be used to order the data.  Because Oracle sorts its data using a binary algorithm, I’ll tell SQL Server to do that as well.  More precisely I’ll tell SQL Server to use the Latin1_General_BIN collation.  The updated SELECT statement from the T-SQL script above looks like this:

select * from @tbl
order by col1 collate Latin1_General_BIN asc;

Oracle: ORDER BY NLSSORT()

To ensure that results from Oracle are always returned using the same sorting algorithm, I will also tell the Oracle server to sort it’s data using the binary algorithm.

The first way I came up with was to change the NLS_SORT setting on the session.  That can be done by executing the following command before the SELECT statement:

ALTER SESSION SET NLS_SORT=BINARY;

This method is fine when you’re running the queries manually from a client such as Oracle SQL Developer.  However, in SSIS the OLE DB Source component will not accept anything else besides the SELECT statement.

Then I found another way.  There’s a function called NLSSORT() which you can apply to a column in the ORDER BY clause.  The following statement demonstrates how to use this function.  (I only show the ORDER BY clause as it can be applied to the Oracle script mentioned earlier.)

ORDER BY NLSSORT(col1, ‘NLS_SORT=BINARY’)

The following table shows the results from both binary sort queries:

SQL Binary Oracle Binary
-0 -0
-1 -1
-A -A
-B -B
0 0
0A 0A
1 1
A A
A A A A
A- A-
A0 A0
AA AA
BA BA
a a
b b

Finally I am able to get data from both Oracle and SQL Server using a consistent sort order.

But, How Big Is NULL?

However, even on this straightforward request, both database servers do not fully agree!  Here’s what they have to say about the topic:

“NULL is the smallest.”

“No, it’s the largest.”

“No, smallest!”

“Largest!!”

“Smallest.”

“Largest I tell you!!!”

“Bladiebla, not hearing you, anyway, it’s NOTHING!”

“No, it isn’t!”

*discussion goes on and on*

If I add NULL to my test data set, SQL Server will sort it first (thus NULL is the smallest value in my test set), while Oracle will put it last.  In my situation it wasn’t really an issue (the BKs are not supposed to be NULL), but it’s quite important to remember in cases where NULLs are actually possible.

Conclusion

When working with strings, always keep collation in mind.  And even more so when dealing with several different source systems!

Additional reference material:

Database Journal: The Globalization of Language in Oracle – The NLS_COMP and NLS_SORT variables

BOL 2008: How to: Sort Data for the Merge and Merge Join Transformations

Share

Tags: , , , , , ,

© 2008-2017 BI: Beer Intelligence? All Rights Reserved