28 Feb

SSIS Excel Run As 32bit (Run64BitRuntime)

I know I’m not the only one who does this, and I know I’ll do it again, but when I use Excel files in SSIS the package always crashes with the following error.

[Excel Source [2]] Error: SSIS Error Code DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER. The AcquireConnection method call to the connection manager “Excel Connection Manager” failed with error code 0xC0209303. There may be error messages posted before this with more information on why the AcquireConnection method call failed.

It’s so clear that I have no idea how it could confuse anyone. It tells you exactly what the problem is and since the progress window text doesn’t wrap you couldn’t possibly get confused.  You will also have this error message, which will make a lot more sense.

[Connection manager “Excel Connection Manager”] Error: The requested OLE DB provider Microsoft.ACE.OLEDB.12.0 is not registered. If the 64-bit driver is not installed, run the package in 32-bit mode. Error code: 0x00000000.
An OLE DB record is available. Source: “Microsoft OLE DB Service Components” Hresult: 0x80040154 Description: “Class not registered”.

The error means exactly what it says, you have to run the project in 32bit mode.  Here is how you do that.

  1. Open the “Solution Explorer”.
  2. Right-click on the project and select Properties.
  3. Under Configuration Properties, click Debugging.
  4. Change Run64BitRuntime to False.

I’m not here to do a dissertation on why SSIS develops one way, and runs in another, I just wanted to give you a quick answer to an annoying problem that I always forget about.

Here’s more information.

http://technet.microsoft.com/en-us/library/ms141766%28v=sql.105%29.aspx

28 Feb

Minimum And Maximum Length For A Column

This will be easy, and short, for a change.  Suppose you want to get the minimum and maximum length of the values in a column.  All you need to do is the following:

SELECT max(len(country)) as maxLength, 
min(len(country)) as minLength
FROM militaryspending;

maxLength	minLength
22	        4

Now if I want to find all of the records with that minimum length I simply do the following,

SELECT country, spending
FROM militaryspending
WHERE LEN(country) = (SELECT MIN(LEN(country)) FROM militaryspending);

country	spending
Togo	59
Mali	194
Chad	242
Peru	2098
Oman	4291
Iraq	5845

This works mostly the same with MySQL except that LEN is CHAR_LENGTH over on that side of the house.

mysql> SELECT country, spending
    -> FROM militaryspending
    -> WHERE CHAR_LENGTH(country) = (SELECT MIN(CHAR_LENGTH(country)) FROM militaryspending);
+---------+----------+
| country | spending |
+---------+----------+
| Togo    |       59 |
| Mali    |      194 |
| Chad    |      242 |
| Peru    |     2098 |
| Oman    |     4291 |
| Iraq    |     5845 |
+---------+----------+
6 rows in set (0.00 sec)
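Outside the database, the same min-length filter is a one-liner. Here’s a quick Python sanity check over a few assumed rows from the table (just an illustration, not part of either query):

```python
# A handful of (country, spending) rows, assumed from the table above.
rows = [("Togo", 59), ("Mali", 194), ("Chad", 242), ("Brazil", 35360),
        ("United States", 711421), ("Peru", 2098)]

# MIN(LEN(country)) ...
shortest = min(len(country) for country, _ in rows)

# ... WHERE LEN(country) = that minimum
result = [(c, s) for c, s in rows if len(c) == shortest]
```
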

You should also be aware of how these functions treat spaces. Leading spaces always count, so ‘ Alpha’ = 6. Trailing spaces are where the two engines differ: MySQL’s CHAR_LENGTH counts them (‘Alpha ‘ = 6), but SQL Server’s LEN excludes them (‘Alpha ‘ = 5); use DATALENGTH on the SQL Server side if trailing spaces matter to you. It’s just something to be aware of.

See, I can do a short post.

27 Feb

Row Letters Over A Range – SQL Server

This is another continuation of the prior day’s post. This time I’m going to group by letter on the SQL Server side of the house over a range and produce something like this,

country		        spending	rowLtr
United States	        711421		A
China			142859		A
Russia			71853		A
United Kingdom	        62685		A
France			62535		A
Japan			59327		B
India			48889		B
Saudi Arabia	        48531		B
Germany			46745		B
Brazil			35360		B

This turned out to be trickier than I expected. The obvious function to use is NTILE. It’s easy to use except it doesn’t quite do what we want. Here’s my first iteration at the problem with NTILE.

SELECT TOP(10) country, 
spending, 
NTILE(26) OVER (PARTITION BY spending ORDER BY spending DESC) AS rowLtr
FROM militaryspending
ORDER BY spending DESC;

country			spending	rowLtr
United States	        711421		1
China			142859		1
Russia			71853		1
United Kingdom	        62685		1
France			62535		1
Japan			59327		1
India			48889		1
Saudi Arabia	        48531		1
Germany			46745		1
Brazil			35360		1

The problem here is that PARTITION BY spending creates one partition per distinct spending value, and, spending being what it is, literally every value in the table is unique. Each partition holds a single row, so every row lands in tile 1. But it is close. We need to give NTILE a constant partition to work against so that it tiles our spending categories properly. So, for the first attempt I just tossed a constant at it, which didn’t go well.

SELECT TOP(10) country, 
spending, 
1 as 'someCol',
NTILE(26) OVER (PARTITION BY 'someCol' ORDER BY 'someCol' DESC) AS rowLtr
FROM militaryspending
ORDER BY spending DESC;

But it didn’t like that.

Msg 5309, Level 16, State 1, Line 4
Windowed functions and NEXT VALUE FOR functions do not support constants as ORDER BY clause expressions.

That actually makes some sense to me. But I’m a cheater, and cheaters prosper, so here is what I did,

SELECT TOP(10) country, 
spending, 
CHAR(64+NTILE(26) OVER (PARTITION BY FLOOR(rand()) ORDER BY spending DESC)) as rowLtr
FROM militaryspending
ORDER BY spending DESC;

country			spending	rowLtr
United States	        711421		A
China			142859		A
Russia			71853		A
United Kingdom	        62685		A
France			62535		A
Japan			59327		B
India			48889		B
Saudi Arabia	        48531		B
Germany			46745		B
Brazil			35360		B

Let’s take a look at each part of the rowLtr column to see what is happening.

The first part of the expression is a simple CHAR, identical to what MySQL was doing: I start at 64 and add the tile number NTILE produces, so tile 1 becomes CHAR(65), the letter A.

NTILE(26) says to break out the data into 26 sections, one per letter.  It just happens that the military spending table has 126 records in it and 126 / 5 rounded up is 26.

FLOOR(RAND()) gives us that constant to partition on: RAND() returns a value between 0 and 1, which FLOOR rounds down to 0, so every row falls in a single partition, and we’re golden. Or not. Here are the last 8 records in the result set.

country		        spending	rowLtr
Guyana			29.9		Y
Timor Leste		27.3		Y
Sierra Leone	        26		Y
Moldova			20.8		Y
Belize			15.7		Z
Mauritius		10.1		Z
Cape Verde		9.7		Z
Seychelles		9.3		Z

Oops. NTILE breaks the data into 26 as-even-as-possible groups, not 25 groups of 5 with a final group of 1. So, in this case we have 22 groups of 5 and 4 groups of 4. That’s probably a defensible position but not quite what we want.
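NTILE’s distribution rule is simple arithmetic: the first (rows mod tiles) tiles each get one extra row. A quick Python sketch (the function name is mine, purely for illustration):

```python
def ntile_sizes(n_rows, n_tiles):
    """Tile sizes the way NTILE hands out rows: the first
    (n_rows % n_tiles) tiles each get one extra row."""
    base, extra = divmod(n_rows, n_tiles)
    return [base + 1] * extra + [base] * (n_tiles - extra)

# 126 rows into 26 tiles: 22 tiles of 5, then 4 tiles of 4
sizes = ntile_sizes(126, 26)
```
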

So let’s dump the lousy rowLtr and try a better one, one that works this time.

First, we add ROW_NUMBER() to the query.

SELECT country, 
spending, 
ROW_NUMBER() OVER (ORDER BY spending DESC) as rowLtr
FROM militaryspending
ORDER BY spending DESC

country			spending	rowLtr
United States	        711421		1
China			142859		2
Russia			71853		3
United Kingdom	        62685		4
France			62535		5
Japan			59327		6
India			48889		7
Saudi Arabia	        48531		8
Germany			46745		9
Brazil			35360		10

So far, so good. Now we need to increment the rowLtr counter every 5 records. So, let’s divide the ROW_NUMBER() value by 5, apply the CEILING function to it, and we should be close.

SELECT TOP(10) country, 
spending, 
CEILING(ROW_NUMBER() OVER (ORDER BY spending DESC) / 5) as rowLtr
FROM militaryspending
ORDER BY spending DESC

country			spending	rowLtr
United States	        711421		0
China			142859		0
Russia			71853		0
United Kingdom	        62685		0
France			62535		1
Japan			59327		1
India			48889		1
Saudi Arabia	        48531		1
Germany			46745		1
Brazil			35360		2

This almost works but we only have 4 rows at 0 when we need 5. In case you are wondering, FLOOR has the same problem as CEILING, because T-SQL integer division has already truncated the result before either function sees it. Anyway, now we take a dollop of cheese, put on our cowboy hats, and cowboy up a solution by subtracting 1. Yes, the magical, mystical solution to an unexplained problem in the middle of your code.  It’s best not to document these so you can save hard drive space because you know it costs a fortune these days.
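In case the cowboy math deserves a footnote after all, the integer arithmetic is easy to see in a sketch (Python’s // mirrors T-SQL’s integer division of two INTs):

```python
# Two INTs divided in T-SQL use integer division, so CEILING(row/5) never
# sees a fraction; it behaves exactly like FLOOR(row/5). Shifting the row
# number down by 1 is all the "-1" does: it lines the groups up so that
# rows 1-5 map to 0, rows 6-10 map to 1, and so on.
naive   = [row // 5 for row in range(1, 11)]        # CEILING(n/5) in int math
shifted = [(row - 1) // 5 for row in range(1, 11)]  # FLOOR((n-1)/5)
```
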

SELECT TOP(10) country, 
spending, 
FLOOR((ROW_NUMBER() OVER (ORDER BY spending DESC)-1) / 5) as rowLtr
FROM militaryspending
ORDER BY spending DESC

country			spending	rowLtr
United States	        711421		0
China			142859		0
Russia			71853		0
United Kingdom	        62685		0
France			62535		0
Japan			59327		1
India			48889		1
Saudi Arabia	        48531		1
Germany			46745		1
Brazil			35360		1

Perfect, we have five 0’s to begin the record set. Now, for the final touch, which is simply to add CHAR to the query.

SELECT TOP(10) country, 
spending, 
CHAR(FLOOR((ROW_NUMBER() OVER (ORDER BY spending DESC)-1) / 5)+65) as rowLtr
FROM militaryspending
ORDER BY spending DESC

country			spending	rowLtr
United States	        711421		A
China			142859		A
Russia			71853		A
United Kingdom	        62685		A
France			62535		A
Japan			59327		B
India			48889		B
Saudi Arabia	        48531		B
Germany			46745		B
Brazil			35360		B

And the last 10 records?

country		        spending	rowLtr
Lesotho			57		X
Nicaragua		54.7		X
Guyana			29.9		X
Timor Leste		27.3		X
Sierra Leone	        26		Y
Moldova			20.8		Y
Belize			15.7		Y
Mauritius		10.1		Y
Cape Verde		9.7		Y
Seychelles		9.3		Z

Ah, perfect!

Honestly, I still have a hard time imagining any real use for this. NTILE should be more than adequate but I find it an interesting intellectual exercise, and it is my blog, so there ya go.

26 Feb

Row Letters Over A Range – MySQL

This is a continuation of yesterday’s post on numbering rows with letters.  Let’s change things around a bit: instead of changing the letter every row, let’s change it every 5 results.  Here is yesterday’s code.

mysql> SELECT CHAR(@ctr := @ctr + 1) AS rowLtr, country, 
    -> spending
    -> FROM militarySpending
    -> JOIN (SELECT @ctr := 64) as A
    -> ORDER BY spending DESC LIMIT 10;
+--------+----------------+----------+
| rowLtr | country        | spending |
+--------+----------------+----------+
| A      | United States  |   711421 |
| B      | China          |   142859 |
| C      | Russia         |    71853 |
| D      | United Kingdom |    62685 |
| E      | France         |    62535 |
| F      | Japan          |    59327 |
| G      | India          |    48889 |
| H      | Saudi Arabia   |    48531 |
| I      | Germany        |    46745 |
| J      | Brazil         |    35360 |
+--------+----------------+----------+
10 rows in set (0.00 sec)

Now we need to add a second counter so that we can increment the CHAR value every 5 rows. We will also use the modulus operator (%) to generate a remainder; whenever the remainder is 0, the letter counter is incremented.

mysql> SELECT IF(@rowCtr % 5 = 0, @ctr:= @ctr + 1, 0) AS modulusTest,
    -> @rowCtr := @rowCtr + 1 AS RowIncrement,
    -> CHAR(@ctr) AS rowLtr,
    -> country,
    -> spending
    -> FROM militarySpending
    -> JOIN (SELECT @ctr := 64, @rowCtr := 0) as A
    -> ORDER BY spending DESC LIMIT 10;
+-------------+--------------+--------+----------------+----------+
| modulusTest | RowIncrement | rowLtr | country        | spending |
+-------------+--------------+--------+----------------+----------+
|          65 |            1 | A      | United States  |   711421 |
|           0 |            2 | A      | China          |   142859 |
|           0 |            3 | A      | Russia         |    71853 |
|           0 |            4 | A      | United Kingdom |    62685 |
|           0 |            5 | A      | France         |    62535 |
|          66 |            6 | B      | Japan          |    59327 |
|           0 |            7 | B      | India          |    48889 |
|           0 |            8 | B      | Saudi Arabia   |    48531 |
|           0 |            9 | B      | Germany        |    46745 |
|           0 |           10 | B      | Brazil         |    35360 |
+-------------+--------------+--------+----------------+----------+
10 rows in set (0.00 sec)

I left the modulus and row counters in the output to show what I’m doing. However, we need to clean this up, and the simplest way is to wrap the whole thing in a subquery like this,

mysql> SELECT rowLtr, country, spending
    -> FROM (
    -> SELECT IF(@rowCtr % 5 = 0, @ctr:= @ctr + 1, 0) AS modulusTest,
    -> @rowCtr := @rowCtr + 1 AS RowIncrement,
    -> CHAR(@ctr) AS rowLtr,
    -> country,
    -> spending
    -> FROM militarySpending
    -> JOIN (SELECT @ctr := 64, @rowCtr := 0) as A
    -> ORDER BY spending DESC) as B LIMIT 10;
+--------+----------------+----------+
| rowLtr | country        | spending |
+--------+----------------+----------+
| A      | United States  |   711421 |
| A      | China          |   142859 |
| A      | Russia         |    71853 |
| A      | United Kingdom |    62685 |
| A      | France         |    62535 |
| B      | Japan          |    59327 |
| B      | India          |    48889 |
| B      | Saudi Arabia   |    48531 |
| B      | Germany        |    46745 |
| B      | Brazil         |    35360 |
+--------+----------------+----------+
10 rows in set (0.00 sec)
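The user-variable trick translates almost line for line into any procedural language. Here’s a Python sketch of the same two running counters, over assumed sample data from the top-10 list above:

```python
# Top-10 (country, spending) rows, assumed from the result above.
spending = [("United States", 711421), ("China", 142859), ("Russia", 71853),
            ("United Kingdom", 62685), ("France", 62535), ("Japan", 59327),
            ("India", 48889), ("Saudi Arabia", 48531), ("Germany", 46745),
            ("Brazil", 35360)]

ctr, row_ctr = 64, 0          # mirrors @ctr := 64 and @rowCtr := 0
labeled = []
for country, spend in spending:
    if row_ctr % 5 == 0:      # the modulus test: bump the letter every 5 rows
        ctr += 1
    row_ctr += 1              # @rowCtr := @rowCtr + 1
    labeled.append((chr(ctr), country, spend))
```
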

Tomorrow I’ll take a look at the same thing with SQL Server.

25 Feb

GROUP BY and ordering, SQL Server This Time

A couple of weeks ago I wrote about how MySQL will apply an ORDER BY to a GROUP BY query. Well, I did a bit of research and it turns out that SQL Server will do the same thing, and in a couple of other places, only it’s not quite the same thing. First, as always, let’s get some data,

CREATE TABLE orderby
(
	id INT
)
DECLARE @ctr INT = 1;
SET NOCOUNT ON;
WHILE @ctr <= 10
BEGIN
	INSERT INTO orderby VALUES (@ctr)
	SET @ctr = @ctr + 1
END
SET NOCOUNT OFF;

Now let’s run a couple of queries with the execution plan enabled. First a simple SELECT.

SELECT *
FROM orderby;

[execution plan screenshot: a simple Table Scan]

Nothing unexpected here, just a normal table scan which you would expect.

SELECT SUM(id)
FROM orderby
GROUP BY id;

[execution plan screenshot: a Sort feeding a Stream Aggregate]

This, on the other hand, is more interesting: it shows a Sort. That’s because the query optimizer processed the query as a stream aggregate, which needs its input sorted. I’m not going into stream aggregates, especially when you have this guy, who does a much better job of it here:

http://blogs.msdn.com/b/craigfr/archive/2006/09/06/743116.aspx.

All I’m really doing is pointing out that in some narrow cases SQL Server will sort without an ORDER BY. Here’s another case.

SELECT distinct id
FROM orderby;

[execution plan screenshot: the same Sort and Stream Aggregate]

Hey, DISTINCT does the same thing.

Now, just for the fun of it, let’s pile another 9,990 records onto the table.

DECLARE @ctr int = 11;
SET NOCOUNT ON;
WHILE @ctr <= 10000
BEGIN
	INSERT INTO orderby VALUES (@ctr)
	SET @ctr = @ctr + 1;
END
SET NOCOUNT OFF;

Now, we’ll run the same queries.

SELECT *
FROM orderby;

[execution plan screenshot: still a Table Scan]

That looks familiar enough.

SELECT SUM(id)
FROM orderby
GROUP BY id;

[execution plan screenshot: a Hash Match aggregate has replaced the Sort and Stream Aggregate]

Oo, that’s not the same thing. Quick, let’s look at DISTINCT.

SELECT distinct id
FROM orderby;

[execution plan screenshot: a Hash Match aggregate again]
Yep, the optimizer is using a hash aggregate in both cases, and a hash aggregate doesn’t sort its output. Sigh, so much for relying on SQL Server to do the sort for me. Here’s more on hash aggregates courtesy of the same site,

http://blogs.msdn.com/b/craigfr/archive/2006/09/13/752728.aspx

That’s it for today.

24 Feb

Result Numbering By Letter

I happened to stumble across this over on the SQL Team forums, liked it, and decided to see how many posts I could get out of it. This one will be for MySQL because it’s a slightly different approach and probably less familiar to my legions of readers (that’s sarcasm, in case anyone is confused).

First, we’ll go back to my military spending data.

mysql> SELECT country, spending
    -> FROM militarySpending
    -> ORDER BY spending DESC LIMIT 10;
+----------------+----------+
| country        | spending |
+----------------+----------+
| United States  |   711421 |
| China          |   142859 |
| Russia         |    71853 |
| United Kingdom |    62685 |
| France         |    62535 |
| Japan          |    59327 |
| India          |    48889 |
| Saudi Arabia   |    48531 |
| Germany        |    46745 |
| Brazil         |    35360 |
+----------------+----------+
10 rows in set (0.00 sec)

First, let’s add a counter to that.

mysql> SELECT @ctr := @ctr + 1 AS rowCtr, country, spending
    -> FROM militarySpending
    -> JOIN (SELECT @ctr := 64) as A
    -> ORDER BY spending DESC LIMIT 10;
+--------+----------------+----------+
| rowCtr | country        | spending |
+--------+----------------+----------+
|     65 | United States  |   711421 |
|     66 | China          |   142859 |
|     67 | Russia         |    71853 |
|     68 | United Kingdom |    62685 |
|     69 | France         |    62535 |
|     70 | Japan          |    59327 |
|     71 | India          |    48889 |
|     72 | Saudi Arabia   |    48531 |
|     73 | Germany        |    46745 |
|     74 | Brazil         |    35360 |
+--------+----------------+----------+
10 rows in set (0.00 sec)

You are probably wondering why I initialized my counter to 64 so that it starts producing 65. That’s because, when we feed it to the CHAR function, 65 is the code for the first capital letter of the alphabet; you know it as the letter A.

mysql> SELECT CHAR(@ctr := @ctr + 1) AS rowCtr, country, spending
    -> FROM militarySpending
    -> JOIN (SELECT @ctr := 64) as A
    -> ORDER BY spending DESC LIMIT 10;
+--------+----------------+----------+
| rowCtr | country        | spending |
+--------+----------------+----------+
| A      | United States  |   711421 |
| B      | China          |   142859 |
| C      | Russia         |    71853 |
| D      | United Kingdom |    62685 |
| E      | France         |    62535 |
| F      | Japan          |    59327 |
| G      | India          |    48889 |
| H      | Saudi Arabia   |    48531 |
| I      | Germany        |    46745 |
| J      | Brazil         |    35360 |
+--------+----------------+----------+
10 rows in set (0.00 sec)

And that’s how you can use letters to number your rows, and it’s a great solution all the way up to, uh, 26 rows.

Oh, and here are some other numbers.

mysql> SELECT CHAR(64) AS '64', CHAR(91) AS '91', CHAR(97) AS '97', CHAR(126) AS '126';
+------+------+------+------+
| 64   | 91   | 97   | 126  |
+------+------+------+------+
| @    | [    | a    | ~    |
+------+------+------+------+
1 row in set (0.00 sec)
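Python’s chr() mirrors MySQL’s CHAR() over the ASCII range, so you can poke at the same boundary codes without a database (a throwaway sketch, nothing more):

```python
# The codes just outside the ranges used above: '@' sits right before 'A',
# '[' right after 'Z', 'a' starts the lowercase run, and '~' ends printable ASCII.
boundaries = [chr(64), chr(91), chr(97), chr(126)]

# The full A-Z run the row-lettering trick depends on.
letters = "".join(chr(65 + i) for i in range(26))
```
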

23 Feb

GROUP BY CUBE

I wanted something fast, but useful, for a weekend post, especially after the ORDER BY post.  So here goes.

CUBE is used to generate aggregate values for all possibilities in the CUBE portion of the GROUP BY.  In other words, it provides subtotals for every grouping combination contained in the CUBE.  Let’s go back to the cake data used in procMeans.  Here is the data.


SELECT * FROM cake;

LastName	Age	PresentScore	TasteScore	Flavor		Layers
Orlando		27	93		80		Vanilla		1
Ramey		32	84		72		Rum		2
Goldston	46	68		75		Vanilla		1
Roe		38	79		73		Vanilla		2
Larsen		23	77		84		Chocolate	0
Davis		51	86		91		Spice	        3
Strickland	19	82		79		Chocolate	1
Nguyen		57	77		84		Vanilla		0
Hildenbrand	33	81		83		Chocolate	1
Byron		62	72		87		Vanilla		2
Sanders		26	56		79		Chocolate	1
Jaeger		43	66		74				1
Davis		28	69		75		Chocolate	2
Conrad		69	85		94		Vanilla		1
Walters		55	67		72		Chocolate	2
Rossburger	28	78		81		Spice		2
Matthew		42	81		92		Chocolate	2
Becker		36	62		83		Spice		2
Anderson	27	87		85		Chocolate	1
Merritt		62	73		84		Chocolate	1

Now let’s run a simple aggregate using AVG against the data.

SELECT layers, flavor, AVG(presentscore) AS avgScore
FROM cake
WHERE flavor = 'Chocolate'
GROUP BY layers, flavor
ORDER BY layers, flavor

layers	flavor	        avgScore
0	Chocolate	77
1	Chocolate	75
2	Chocolate	72

This is a simple result: it measures the average of the presentScore value by layer when the flavor is chocolate. Since the WHERE clause already limits us to chocolate, you could drop the flavor column and get the same result like this.

SELECT layers, AVG(presentscore) AS avgScore
FROM cake
WHERE flavor = 'Chocolate'
GROUP BY layers
ORDER BY layers

layers	avgScore
0	77
1	75
2	72

Now, let’s add a GROUP BY CUBE to this result set.

SELECT layers, AVG(presentscore) AS avgScore
FROM cake
WHERE flavor = 'Chocolate'
GROUP BY CUBE(layers)
ORDER BY layers

layers	avgScore
NULL	74
0	77
1	75
2	72

This really only did one thing: it added an average score across all rows. The NULL value in this case designates the aggregate row. It’s the same thing as writing a separate query with

SELECT AVG(presentscore)
FROM cake
WHERE flavor = 'Chocolate'

However, if we go back to our original query, and CUBE it we get something a bit more interesting as far as explaining how CUBE works.

SELECT layers, flavor, AVG(presentscore) AS avgScore
FROM cake
WHERE flavor = 'Chocolate'
GROUP BY CUBE (layers, flavor)
ORDER BY layers, flavor

layers	flavor		avgScore
NULL	NULL		74
NULL	Chocolate	74
0	NULL		77
0	Chocolate	77
1	NULL		75
1	Chocolate	75
2	NULL		72
2	Chocolate	72

The cube has added a number of grouped values for the data.

Row 1 – The average value for all rows returned in the query.
Row 2 – The average value for all rows where the flavor is chocolate.
Row 3 – The average value for all rows where the layer is 0, regardless of flavor (which the WHERE clause limits to chocolate anyway).
Row 5 – The same, for layer 1.
Row 7 – The same, for layer 2.
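What CUBE computes can be reproduced by hand: aggregate once for every subset of the grouping columns, with NULL standing in for “all values”. Here’s a Python sketch of that idea (the function and data layout are mine; the chocolate rows come from the cake table above, and the integer division mimics T-SQL’s AVG on ints):

```python
from itertools import combinations
from collections import defaultdict

# The chocolate rows of the cake table: (layers, flavor, presentscore)
rows = [(0, "Chocolate", 77), (1, "Chocolate", 82), (1, "Chocolate", 81),
        (1, "Chocolate", 56), (1, "Chocolate", 87), (1, "Chocolate", 73),
        (2, "Chocolate", 69), (2, "Chocolate", 67), (2, "Chocolate", 81)]

def cube(rows, keys=(0, 1), measure=2):
    """Average the measure for every subset of the grouping columns,
    with None playing the part of SQL's NULL 'all values' marker."""
    out = {}
    for r in range(len(keys) + 1):
        for subset in combinations(keys, r):
            groups = defaultdict(list)
            for row in rows:
                # Columns outside the subset collapse to None (SQL's NULL).
                label = tuple(row[k] if k in subset else None for k in keys)
                groups[label].append(row[measure])
            for label, vals in groups.items():
                out[label] = sum(vals) // len(vals)  # integer AVG, like T-SQL
    return out

result = cube(rows)
```

The (None, None) entry is the grand average, (1, None) the layer-1 subtotal, and so on down to the fully grouped (layer, flavor) pairs.
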

In other words, CUBE provides summary rows for all combinations in the group by column. It will be even clearer if I remove the chocolate restriction.

SELECT layers, flavor, AVG(presentscore) AS avgScore
FROM cake
GROUP BY CUBE (layers, flavor)
ORDER BY layers, flavor

layers	flavor		avgScore
NULL	NULL		76
NULL			66
NULL	Chocolate	74
NULL	Rum		84
NULL	Spice		75
NULL	Vanilla		79
0	NULL		77
0	Chocolate	77
0	Vanilla		77
1	NULL		76
1			66
1	Chocolate	75
1	Vanilla		82
2	NULL		74
2	Chocolate	72
2	Rum		84
2	Spice		70
2	Vanilla		75
3	NULL		86
3	Spice		86

So, because of how I ordered the columns we have the following combinations,

  • The average score for all rows.
  • The average score for each flavor.
  • The average score by layer and flavor.
  • The average score by layer.

Keep in mind that the aggregate rows are designated by the NULL values.  The blank in the data represents an actual empty field; it came over that way from the SAS site.  SAS has its own way of dealing with NULL values and I just left that data alone.

22 Feb

Is Order Guaranteed Without ORDER BY?

Anyone who’s been around a database, probably any version, always hears “if you want a guaranteed result order, use ORDER BY”. Well, it’s true. A result set, in order to technically qualify as a result set, is unordered. Further, you can easily get in trouble relying on a result set to show up in a certain order and then, for all kinds of reasons, it doesn’t. So, if you want a guaranteed result order, 100% of the time, use ORDER BY.

But, on the other side of the story, SQL Server does have certain rules in certain situations that do, in effect, guarantee sort order. It’s just that, well, you can’t rely on them, at all, so, they aren’t really rules, except that, they are.

Let’s create some data.

CREATE TABLE orderby
(
	id decimal(3,1),
	letter char(1)
);
INSERT INTO orderby VALUES (1,'a'),(5,'e'),(8,'h'),(4,'d'),(9,'i'),(7,'g'),
(10,'j'),(2,'b'),(11,'k'),(3,'c'),(6,'f');
SELECT * FROM orderby;

id	letter
1.0	a
5.0	e
8.0	h
4.0	d
9.0	i
7.0	g
10.0	j
2.0	b
11.0	k
3.0	c
6.0	f

As you can see the result set was returned in the order that the data was entered. If you could archive that data, and never, ever, touch it again except by the prior query you could count on that query returning in that order. Why? Because that’s how SQL Server rolls.

Now let’s add a clustered index to the table and run a select against it.

CREATE CLUSTERED INDEX idx_orderby ON orderby (id DESC);
SELECT * FROM orderby;

11.0	k
10.0	j
9.0	i
8.0	h
7.0	g
6.0	f
5.0	e
4.0	d
3.0	c
2.0	b
1.0	a

Look, the results were returned in the order of the clustered index. If, as I said in the last paragraph, you could archive that data, and never, ever, touch it again except for that query, you would have a guaranteed sort order. Because, that’s how SQL Server rolls.

However, before I go any further, you should have just realized why you can’t rely on the database ordering your data for you, right? Because people can do things to it. For instance, they could add data.

INSERT INTO orderby VALUES (3,'m'),(3,'b');
SELECT * FROM orderby;
id	letter
11.0	k
10.0	j
9.0	i
8.0	h
7.0	g
6.0	f
5.0	e
4.0	d
3.0	c
3.0	m
3.0	b
2.0	b
1.0	a

Here, as you can see, SQL Server stored the data according to the clustered index, right at the end of the 3.0 section, but it stored it in insert order. So, is it technically sorted? Yeah, but probably not in a way that you could rely on absent an ORDER BY.

Now let’s drop the clustered index and query the data.

DROP INDEX idx_orderby ON orderby;
SELECT * FROM orderby;

id	letter
11.0	k
10.0	j
9.0	i
8.0	h
7.0	g
6.0	f
5.0	e
4.0	d
3.0	c
3.0	m
3.0	b
2.0	b
1.0	a

Hey! It’s still sorted based on the clustered index. That’s because the clustered index ordered the stored data and, because, that’s how SQL Server rolls, it returns data in that order.

Now let’s add another row to the table.

INSERT INTO orderby VALUES (9,'n');
SELECT * FROM orderby;

id	letter
11.0	k
10.0	j
9.0	i
8.0	h
7.0	g
6.0	f
5.0	e
4.0	d
3.0	c
3.0	m
3.0	b
2.0	b
1.0	a
9.0	n

As you can see the INSERT put the data at the end of the table, presumably, not in the order that we want it. Oh, and the archive stuff, SQL Server rolling, actually still applies to this query and this data.

Finally, let’s put another clustered index on the table and query that.

CREATE CLUSTERED INDEX idx_orderby ON orderby (id);
SELECT * FROM orderby;

id	letter
1.0	a
2.0	b
3.0	c
3.0	m
3.0	b
4.0	d
5.0	e
6.0	f
7.0	g
8.0	h
9.0	i
9.0	n
10.0	j
11.0	k

Well, at least the numbers are sorted, and reliably so at this point, that is, if you weren’t looking for the results in descending order per your earlier clustered index, and you never changed anything.

Hopefully, by now you are convinced, but, if not, here’s one last test. First, we’ll drop and recreate the orderby table with a single column. Then I’ll insert 200,000 records numbered 1 to 200000 in order. The results are,

DROP TABLE orderby;
GO
CREATE TABLE orderby
(
	id int
)
DECLARE @ctr INT = 1;
DECLARE @totrows INT = 200000;

WHILE @ctr <= @totrows
BEGIN
INSERT INTO orderby VALUES(@ctr);
SET @ctr = @ctr + 1;
END
SELECT * FROM orderby

First 5 records
id
1
2
3
4
5

Last 5 records
id
199996
199997
199998
199999
200000

This is exactly what you’d expect given the earlier examples. Now let’s delete some rows, insert some rows, and see what happens.

DELETE FROM orderby
WHERE id BETWEEN 10000 and 190000;

DECLARE @ctr INT = 1;
DECLARE @totrows INT = 10000;

WHILE @ctr <= @totrows
BEGIN
INSERT INTO orderby VALUES(@ctr+500000);
SET @ctr = @ctr + 1;
END
SELECT * FROM orderby

First 5 records
id
1
2
3
4
5

Last 5 Records
id
500887
500888
500889
500890
500891

Records 9,995 through 10,005 in the result set.
id
9995
9996
9997
9998
9999
500107
500214
500321
500428
500535
500642

Oops! Absent an index, SQL Server just dumped the data into the heap in any available space it could find, and this is the result.  If you had a lot of updates and deletes on a smaller table you would end up with the same problem: SQL Server tossing data seemingly hither and yon into its tables.

If we place a clustered index on the table, we could, in fact, guarantee order, again, on this specific table.

One final, but important note. The reason a clustered index can reliably sort this data is because of the following:

  • It’s the only index on the table.
  • The table only has one column.
  • You don’t have any weird collations that might affect the results.
  • You have the right data type.
  • You have a very specific query.
  • Most importantly, the table can’t change, ever.

And anything I didn’t list can’t happen either.

In other words, you have a super narrow situation that will never happen outside of an example like this.  In the real world your tables are constantly receiving inserts, deletions and updates.  They have multiple indexes, indexes are added and deleted, and who knows what kind of query will be written against it.  All of that combines into a very combustible environment that brings about the golden rule:

ALWAYS USE AN “ORDER BY” TO GUARANTEE RESULT ORDER!

This is an issue that causes a lot of confusion.  New users who work with small samples of data run a query and get the result the same way every time, and rightly wonder, “isn’t order guaranteed”, when it just isn’t, absent an ORDER BY, except in the scenario they just ran.

Anyway, I’ll probably catch hell for this, if anyone ever reads it, but I hope it’s worthwhile to someone, someday.

21 Feb

Put A Smile In Your Messages

OK, this isn’t exactly the most useful thing you’ll ever see, or do, but it can actually be fun in the right situation. Here’s the code:

IF OBJECT_ID('smiley','P') IS NOT NULL
DROP PROCEDURE smiley
GO
CREATE PROCEDURE smiley
AS
BEGIN
DECLARE @smiley nvarchar(4000)='                          oooo$$$$$$$$$$$$oooo
                      oo$$$$$$$$$$$$$$$$$$$$$$$$o
                   oo$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$o         o$   $$ o$
   o $ oo        o$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$o       $$ $$ $$o$
oo $ $ "$      o$$$$$$$$$    $$$$$$$$$$$$$    $$$$$$$$$o       $$$o$$o$
"$$$$$$o$     o$$$$$$$$$      $$$$$$$$$$$      $$$$$$$$$$o    $$$$$$$$
  $$$$$$$    $$$$$$$$$$$      $$$$$$$$$$$      $$$$$$$$$$$$$$$$$$$$$$$
  $$$$$$$$$$$$$$$$$$$$$$$    $$$$$$$$$$$$$    $$$$$$$$$$$$$$  """$$$
   "$$$""""$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$     "$$$
    $$$   o$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$     "$$$o
   o$$"   $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$       $$$o
   $$$    $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$" "$$$$$$ooooo$$$$o
  o$$$oooo$$$$$  $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$   o$$$$$$$$$$$$$$$$$
  $$$$$$$$"$$$$   $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$     $$$$""""""""
 """"       $$$$    "$$$$$$$$$$$$$$$$$$$$$$$$$$$$"      o$$$
            "$$$o     """$$$$$$$$$$$$$$$$$$"$$"         $$$
              $$$o          "$$""$$$$$$""""           o$$$
               $$$$o                                o$$$"
                "$$$$o      o$$$$$$o"$$$$o        o$$$$
                  "$$$$$oo     ""$$$$o$$$$$o   o$$$$""
                     ""$$$$$oooo  "$$$o$$$$$$$$$"""
                        ""$$$$$$$oo $$$$$$$$$$
                                """"$$$$$$$$$$$
                                    $$$$$$$$$$$$
                                     $$$$$$$$$$"
                                      "$$$""""';
PRINT @smiley;
END

All this does is put a smiley in the messages tab.

You can do this all kinds of ways, so I’ll leave the details to you, but if you get the urge to be weird some day, this happens to be one way.

20 Feb

Knight’s Microsoft SQL Server 2012 Integration Services 24-Hour Trainer

I’ve been studying for the 70-463 test and I’ve been having serious problems with the 70-463 Training Kit.  Things just aren’t working.  Maybe it’s me, maybe it’s the book, who knows. SSIS can be really messy and easy to mess up when you are following instructions because it’s so easy to skip a step. All I know for sure is that it was taking me too much time to process parts of that book.

So I switched books to Knight’s 2012 Integration Services 24-Hour Trainer and it’s been a massive improvement.  The explanations are clear, concise and most importantly, they work.  The book also covers a fairly broad spectrum of topics ranging from installation, to data and control flow tasks, variable usage, deployment, error handling and even administration.  None of it is super in-depth, meaning that no way is it top-end test preparation, but it’s more than enough to get you going in a positive direction.

I also have the 2008 version, but SSIS has changed in 2012 so I repurchased the book.  From what I can tell a fair amount of the book is similar, but there is a stronger emphasis on data warehousing and BI solutions than in the 2008 version.  The 2008 book is mostly, “here’s an EXECUTE SQL Task and this is what you do with it”.  The 2012 version still does this but it will go deeper in a later lesson by using it in the context of building a fact table or something like that.  Data warehousing topics were barely mentioned in the 2008 edition from what I remember.

The book is also longer than the 2008 version by 140 pages and 13 lessons.  Some of that is just due to the version change but it really is a more substantive book than the prior edition.

I highly recommend the book to someone trying to get their hands dirty with SSIS 2012 for the first time.

Positives

  • I’m at chapter 44, and with one minor exception, everything has worked.
  • Much expanded from the 2008 version.
  • A great introduction to the product.
  • Easy to follow exercises.

Negatives

  • The CD doesn’t include exercise content.  It’s just videos so you still have to download everything.

This and the 2008 edition are available at amazon.com.

Knight’s Microsoft SQL Server 2012 Integration Services 24-Hour Trainer

Knight’s 24-Hour Trainer: Microsoft SQL Server 2008 Integration Services