31 Jan

Training Kit (70-462): Administering Microsoft SQL Server 2012 Databases

70-462 Training Kit

70-462 Training Kit

Well, one more test down. I have to admit, straight up, that I enjoyed this one a lot less than I enjoyed 70-461. When I did the Oracle SQL Expert test it was hard, but honest. I never had the feeling that the official study guide was missing anything major and I felt the questions on the test were really clear and direct. They were just mean at times.

This test felt more like what I remember from Microsoft tests, a tendency to rely on vagueness or very subtle word choices, and minutiae, rather than testing knowledge. Honestly, a large part of this test were simply things that unless you were a full-time DBA, then it would all be 2nd nature to you, are things the rest of us would look up.

But, this is about the book, so here goes.

First, this book doesn’t even make the attempt to be interesting. It’s simply, here are the steps to do Y, for 500 pages. It’s just a straight up grind through the content. I don’t know if that’s the author’s fault, or the content of the test, which is a straight-up grind. Just be aware that this is, a, grind.

Second, the book, at times looks to have been lifted almost directly from the online documentation. Again, that’s probably due to the content, but when you see sentences that are essentially identical to the online content you wonder, a little, as to why you bothered to buy the book.

Other commentators have noted that you need to set up 6 servers. I didn’t do this, I used brute-force on the process, but I can see that helping. There’s so much micro stuff on the test that going through the process could help, they certainly test it. I wish I’d done it but at the same time the test is so heavy on memorization that maybe it wouldn’t have made any difference.

I guess if you’ve gotten this far you’ve figured out that I wasn’t a big fan of this test or the book. The prior training kit was excellent but the content of that book is rife for more interesting writing and reading. The question I can’t really answer is the pain I felt plowing this book, due to the book, the test content or less interest on my part?

Finally, you really have no where else to go to study for the test. The book does provide enough to pass the test, so there is that, but for me it was just painful to get done. This sort of test screams out for a solid study guide and this isn’t that. I felt like I was fighting the book and the test so I’m very happy to have this one in my rear view mirror.

If I had to rate this book I’d probably give it a 2-1/2 out of 5. No way is this a 5, or even a 4 but at least it gets you through the basics of the content.

The book is available at amazon.com.

31 Jan

Row Numbering SQL Server Version

So, lets do this the easier way. SQL Server has four numbering function but this post will only deal with row_number() and it will duplicate yesterday’s row number post from the SQL Server side.

Here’s the fist part, without using the row_number() function.

SELECT top(10) *
FROM spending
ORDER BY spending DESC;

country		spending	gdp
United States	711421		4.8
China		142859		2.1
Russia		71853		3.9
United Kingdom	62685		2.6
France		62535		2.3
Japan		59327		1
India		48889		2.7
Saudi Arabia	48531		10.1
Germany		46745		1.4
Brazil		35360		1.6

This is basically the same query as MySQL except that to limit the result to the top 10 I use TOP (10) instead of LIMIT 10. So now lets add row numbers to the result set.

SELECT TOP(10) ROW_NUMBER() OVER(ORDER BY spending DESC) AS ‘Row Number’, country, spending
FROM spending;

Row Number	country		spending
1		United States	711421
2		China		142859
3		Russia		71853
4		United Kingdom	62685
5		France		62535
6		Japan		59327
7		India		48889
8		Saudi Arabia	48531
9		Germany		46745
10		Brazil		35360

As you can see this is a completely different animal. The biggest change, besides the use of the ROW_NUMBER function is that the ORDER BY is not at the end of the SELECT statement. It is, however, very handy and easy to use so I’ll forgive Microsoft, 7 years after they implemented it, for doing so.

30 Jan

Row numbering MySQL version

I haven’t posted in a bit, and to make it worse, I’m going to do a minor rerun. In this case I’m going to discuss how to set up a row counter in MySQL. Unlike SQL Server MySQL doesn’t have easy-to-use built-in row numbering functionality. So you have to cheat a little bit.

First lets look at some data. This is SIPRI military spending data for 2011 and we’re going to look at the 10 largest military expenditures. The initial query and data look like this:

mysql> SELECT *
    -> FROM spending
    -> ORDER BY spending DESC LIMIT 10;
+--------------+----------+---------+
| country      | spending | gdpperc |
+--------------+----------+---------+
| USA          |   711421 |     4.8 |
| China, P.R.  |   142859 |     2.1 |
| Russia       |    71853 |     3.9 |
| UK           |    62685 |     2.6 |
| France       |    62535 |     2.3 |
| Japan        |    59327 |       1 |
| India        |    48889 |     2.7 |
| Saudi Arabia |    48531 |    10.1 |
| Germany      |    46745 |     1.4 |
| Brazil       |    35360 |     1.6 |
+--------------+----------+---------+
10 rows in set (0.00 sec)

Now we want to add row numbers but because there isn’t any direct functionality we have to create a variable and increment that as the results are returned. We do that as follows:

mysql> use military;
Database changed
mysql> SET @row = 0;
Query OK, 0 rows affected (0.00 sec)

+------------+--------------+----------+
| Row Number | country      | spending |
+------------+--------------+----------+
|          1 | USA          |   711421 |
|          2 | China, P.R.  |   142859 |
|          3 | Russia       |    71853 |
|          4 | UK           |    62685 |
|          5 | France       |    62535 |
|          6 | Japan        |    59327 |
|          7 | India        |    48889 |
|          8 | Saudi Arabia |    48531 |
|          9 | Germany      |    46745 |
|         10 | Brazil       |    35360 |
+------------+--------------+----------+
10 rows in set (0.00 sec)

So, what we do in this query is first create a variable @row in this case, set it to zero, then all we need to do is add it to the query and increment it’s value with the @row := @row + 1. This query can also be written without the SET as follows,

mysql> SELECT @row := @row + 1 as 'Row Number', country, spending
    -> FROM spending
    -> JOIN (SELECT @row := 0) AS a
    -> ORDER BY spending DESC LIMIT 10;

The difference between this and the prior query is simply that you define the @row variable with a SELECT statement in the JOIN. I don’t know if there is any official consensus on which one to use. They both work, the first version is more readable and the latter version is a bit cleaner because it doesn’t have a SET.

27 Jan

ANSI_NULLS setting

Back in my Cyberquery days we used to search for NULL values with colname = NULL. I always thought it was strange but that’s how it worked. Well, ou can do the same thing with SQL Server by using the ANSI_NULLS setting.

SET ANSI_NULLS ON; — This is the default setting.
with cte as
(
select null as col1
union
select 1
)
select * from cte where col1 = NULL;

Returns no rows.

However, the same query with ANSI_NULLS OFF returns a different result.

SET ANSI_NULLS OFF; — Turns ANSI_NULLS off so the = operator will work.
with cte as
(
select null as col1
union
select 1
)
select * from cte where col1 = NULL;

Returns

col1
NULL

You’ll probably never run into it but if you see “= NULL”, and it works, this is what’s going on.

Note: At some point in the future this will be turned off so that this setting is always on and trying to turn it off will result in an error.

MSDN: http://msdn.microsoft.com/en-us/library/ms188048.aspx

27 Jan

COALESCE And NULL Ordering

NULL’s are a strange concept in that they equal nothing so how do you order them? In the case of SQL Server they come first. Other databases have options to sort them first or last and the ANSI standard says either is OK as long as you pick one. Here is a quick example,

SELECT null AS col1
UNION
SELECT 1
UNION
SELECT 2
ORDER BY col1; — This could be written as ORDER BY 1

col1
NULL
1
2

And that’s that. Actually the most relevant parts of this are that the alias can only apply to the first SELECT and the ORDER BY, just like in a regular query must appear last. This is all pretty standard stuff.

But, what if you wanted to put the NULL into the middle of this result set because, well, just because.

The easiest way to do that would be to ORDER BY a COALESCE statement as follows:

SELECT null AS col1
UNION 
SELECT 1 as 
UNION
SELECT 2
ORDER BY COALESCE(col1,1.5);
go

Except for the small problem that it returns the following errors,

Msg 207, Level 16, State 1, Line 6
Invalid column name ‘col1’.
Msg 207, Level 16, State 1, Line 6
Invalid column name ‘col1’.
Msg 104, Level 16, State 1, Line 6
ORDER BY items must appear in the select list if the statement contains a UNION, INTERSECT or EXCEPT operator.

Ok, that is kind of obvious. What’s interesting about that is outside of a UNION, INTERSECT or EXCEPT you can do exactly what I just tried to do and it will work. And, the ORDER BY item does, in fact, appear in the SELECT List. Fine, be that way!!!!!

So, what do we do? We use a Common Table Expression and call it a post.

WITH cte AS
(
SELECT null
UNION
SELECT 1
UNION
SELECT 2
)
SELECT *
FROM cte
ORDER BY col1

col1
1
NULL
2

I can’t really imagine using this, at least not with a NULL, but it does show how to use a SET operator with an alias or an ORDER BY and it also shows that you can order a SET by a column that isn’t in the result set, unless, you are using a UNION, INTERSECT or EXCEPT.

26 Jan

Clustered Indexes on MySQL (MyISAM / Innodb) and SQL Server

I’m going to write about a topic that comes up occasionally in regards to MySQL, SQL Server and clustered indexes. It’s nothing gigantic but I’ve seen it misrepresented.

A clustered index causes data to be stored in the order of the index and as such only one clustered index can exist on a given table. This can result in queries being returned more quickly in some cases. OK, now that I’ve said the same thing that 1000 other blogs have said, lets look at the different table types I’m discussing. We’ll start with MySQL.

MyISAM was the default engine on MySQL until version 5.5 and it’s not transactional. In regards to clustered indexes it’s very easy to talk about, it doesn’t have them. Next!

InnoDB, unlike MyISAM is transactional, is now the default engine type in MySQL, and most importantly for this discussion, has clustered indexes. Specifically, clustered indexes are created as follows under InnoDB.

  • If there is a primary key that becomes the clustered index.
  • If there is no primary key the first unique index on the table becomes the clustered index.
  • If there is no unique index or primary key, InnoDB will create a hidden clustered index.

Now, lets talk about SQL Server tables.  SQL Server is transactional and it uses clustered indexes.  However, because of the way clustered indexes are created it’s a bit different than the other two table types I’ve discussed.

  • A primary key will be the clustered index, unless, the primary key is created as nonclustered.
  • If the primary key is nonclustered you can create a different index as the clustered index.
  • If all of your indexes are nonclustered then your table will not have a clustered index.

This latter piece tends to confuse people because a primary key defaults to clustered so people think that the primary key is always clustered when, in fact, it can be nonclustered.  Actually, this is all I really wanted to say in this post but it would have been a really short post if that is all that I did.

Hope this helps someone.

26 Jan

Correlated Subquery For Top N

In my last post I used a CTE on some terrible data (I know that, and, I know this example is even worse) to produce a TOP (2) Sales By Region. I’m going to do the same thing on the same data.

Note: This works because the names are unique. If they weren’t I’d have to have a unique identifier for the rep names. But this isn’t about normalized or tight data, it’s about a correlated subquery. So, here’s the data,

SELECT * FROM mysales

Region	Rep	Sales
reg1	Bob	1000
reg1	Bill	2000
reg1	Joe	3000
reg1	Frank	4000
reg2	Sam	5000
reg2	Jill	6000
reg2	Greg	7000
reg2	Harry	8000

and, again, we want the TOP 2 reps by region. So, here’s how the same query would look using a correlated subquery,

select *
FROM mysales o1
WHERE o1.rep IN (SELECT TOP (2) rep as big
FROM mysales o2
WHERE o1.region = o2.region
ORDER BY o2.sales DESC)
ORDER BY region, sales DESC

Like a CTE this query essentially runs two select statements: the main query and the correlated subquery.  The first item of note is the IN operator.  The IN operator says to match any item in the result set that the subquery returns.  The second thing to note is the WHERE clause in the subquery, it’s how we link the two queries together.  In this case it’s saying to run the subquery against the region in the outer query.  This is how the query knows what region to search against.

To think of it another way, the main query is saying: give us the top 2 reps in each region and it’s doing it with the IN operator and the WHERE clause.

All things considered a CTE is probably easier to write and cleaner to read but not all databases have that functionality available.  It’s very probable that some day you’ll need a correlated subquery for a problem like this.

25 Jan

Top N By Group With Ranking

This is a continuation of my earlier article on LIMIT and TOP.  It’s a similar exercise but it’s a bit more complex.  Essentially, you get asked at some point, probably an interview, to write a query that returns the TOP 2 of something ranked by something.  That’s pretty vague on my part so lets be a bit more specific with some data.  Suppose you have a simple data set like the following:

SELECT * FROM mysales

Region	Rep	Sales
reg1	Bob	1000
reg1	Bill	2000
reg1	Joe	3000
reg1	Frank	4000
reg2	Sam	5000
reg2	Jill	6000
reg2	Greg	7000
reg2	Harry	8000

Now your manager comes to you with a request to see the top 2 sales reps by region. If you use a TOP (n) you’ll only get the top 2 reps. There are other ways to do this but this is the most common, and easiest, on the SQL Server side of the house.

WITH CTE
AS
(
SELECT row_number() OVER (partition by region ORDER BY sales desc) as rowNum, Region, Rep, Sales
FROM mysales
)
SELECT rowNum, Region, Rep, Sales
FROM CTE
WHERE rowNum IN(1,2);

rowNum	Region	Rep	Sales
1	reg1	Frank	4000
2	reg1	Joe	3000
1	reg2	Harry	8000
2	reg2	Greg	7000

What we’re doing here is creating a common table expression that assigns a row number to the query results that is partitioned by region. In other words, the row numbers are recycled for each region. Then all we have to do is run a SELECT against the CTE and limit the results to a row number of 1 or 2, and we’re done.

24 Jan

A Multiplication exercise

I was looking through some programming exercises and one that came up was build a multiplication table.  This is really easy in most programming languages.  You set up two loops, roll through them and you produce something that looks like

1 2 3
2 4 6
3 6 9

The easiest way to do it in SQL would be to use two WHILE loops and roll through it, but that wouldn’t be much fun so I’m going to do it differently.  I’m going to make the easy, hard.

STEP 1: Create and populate a  temporary Table

CREATE TABLE #Temp
(
col1 INT,
col2 INT
)
DECLARE @ctr INT = 1
WHILE @ctr <= 12
BEGIN
INSERT INTO #Temp VALUES (@ctr, @ctr)
SET @ctr += 1
END

Already, a bad sign, I have two columns.  Why two columns?  Because it makes the PIVOT table that I’l create in a second work.

Step 2: Create a pivot table on the data

SELECT * FROM #Temp
PIVOT (SUM(col1) FOR col2 IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])) AS a

This produces a result set that looks like the following:

1 2 3 4 5 6 7 8 9 10 11 12
1 2 3 4 5 6 7 8 9 10 11 12

Step 3: Repeat this data 12 times

Before I repeat the data I need to do a quick math class.  It’ll make sense in a second.  Once you’ve learned the multiplication tables we just automatically think 12 * 6 = 72.  In other words, we know it and we forget that 12 * 6 is actually 12 + 12 + 12 + 12 + 12 + 12.  Why does this matter here?  Because we’re working with a result set and we’re going to build a running total in a second.

So lets replicate the data with a FULL OUTER JOIN.  In case you forget a FULL OUTER JOIN returns all combinations from both sides of the table.  Since one side of our join is a single row this is that rare time that a FULL OUTER JOIN is going to be useful.

SELECT *
FROM #Temp
FULL OUTER JOIN
(
SELECT * FROM #Temp
PIVOT (SUM(col1) FOR col2 IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])) AS a
) AS b ON [12] >= col1
ORDER BY col1

Which Produces

col1 col2 1 2 3 4 5 6 7 8 9 10 11 12
1    1    1 2 3 4 5 6 7 8 9 10 11 12
2    2    1 2 3 4 5 6 7 8 9 10 11 12
3    3    1 2 3 4 5 6 7 8 9 10 11 12
4    4    1 2 3 4 5 6 7 8 9 10 11 12
5    5    1 2 3 4 5 6 7 8 9 10 11 12
6    6    1 2 3 4 5 6 7 8 9 10 11 12
7    7    1 2 3 4 5 6 7 8 9 10 11 12
8    8    1 2 3 4 5 6 7 8 9 10 11 12
9    9    1 2 3 4 5 6 7 8 9 10 11 12
10   10   1 2 3 4 5 6 7 8 9 10 11 12
11   11   1 2 3 4 5 6 7 8 9 10 11 12
12   12   1 2 3 4 5 6 7 8 9 10 11 12

Now you are probably wondering about some of the code. First, ON [12] >= col1 is there because I have to have a JOIN condition, I want the join to always fire and this is how I did it.  Almost any test would work that didn’t exclude specific values in the data.  So, 1=1, 1 <> 1, [12] IS NOT NULL, etc, would all be fine.  On the other hand, [12] IS NULL or [12] < 6 wouldn’t work.

Second, the ORDER BY clause is there because I need the sort order guaranteed.

Now lets clean this up a bit.

STEP 4: Create running totals in the data

SELECT col1 AS ‘ ‘,
SUM([1]) OVER ( ORDER BY col1) AS ‘1’,
SUM([2]) OVER ( ORDER BY col1) AS ‘2’,
SUM([3]) OVER ( ORDER BY col1) AS ‘3’,
SUM([4]) OVER ( ORDER BY col1) AS ‘4’,
SUM([5]) OVER ( ORDER BY col1) AS ‘5’,
SUM([6]) OVER ( ORDER BY col1) AS ‘6’,
SUM([7]) OVER ( ORDER BY col1) AS ‘7’,
SUM([8]) OVER ( ORDER BY col1) AS ‘8’,
SUM([9]) OVER ( ORDER BY col1) AS ‘9’,
SUM([10]) OVER ( ORDER BY col1) AS ’10’,
SUM([11]) OVER ( ORDER BY col1) AS ’11’,
SUM([12]) OVER ( ORDER BY col1) AS ’12’
FROM #Temp
FULL OUTER JOIN
(
SELECT * FROM #Temp
PIVOT (SUM(col1) FOR col2 IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])) AS a
) AS b ON [12] >= col1
ORDER BY col1

Which produces

     1  2  3  4  5  6  7  8  9   10  11  12
1    1  2  3  4  5  6  7  8  9   10  11  12
2    2  4  6  8  10 12 14 16 18  20  22  24
3    3  6  9  12 15 18 21 24 27  30  33  36
4    4  8  12 16 20 24 28 32 36  40  44  48
5    5  10 15 20 25 30 35 40 45  50  55  60
6    6  12 18 24 30 36 42 48 54  60  66  72
7    7  14 21 28 35 42 49 56 63  70  77  84
8    8  16 24 32 40 48 56 64 72  80  88  96
9    9  18 27 36 45 54 63 72 81  90  99  108
10   10 20 30 40 50 60 70 80 90  100 110 120
11   11 22 33 44 55 66 77 88 99  110 121 132
12   12 24 36 48 60 72 84 96 108 120 132 144

These changes do a couple of things.  The obvious one is that it uses SUM with an OVER clause to generate running totals for each column.  Remember how I said  12 * 6 is actually 12 + 12 + 12 + 12 + 12 + 12?  Well, this is what it was leading up to.  I also dropped col2 from the results because I wanted to have a cleaner axis and aliased col1 as ‘ ‘ so the results looked a bit cleaner.

STEP 5: Drop it all in a stored procedure

IF OBJECT_ID(‘sp_junk’,’P’) IS NOT NULL
DROP PROCEDURE sp_junk;
GO
CREATE PROCEDURE sp_junk
AS
BEGIN
SET NOCOUNT ON

CREATE TABLE #Temp
(
col1 INT,
col2 INT
)
DECLARE @ctr INT = 1
WHILE @ctr <= 12
BEGIN
INSERT INTO #Temp VALUES (@ctr, @ctr)
SET @ctr += 1
END

SELECT col1 AS ‘ ‘,
SUM([1]) OVER ( ORDER BY col1) AS ‘1’,
SUM([2]) OVER ( ORDER BY col1) AS ‘2’,
SUM([3]) OVER ( ORDER BY col1) AS ‘3’,
SUM([4]) OVER ( ORDER BY col1) AS ‘4’,
SUM([5]) OVER ( ORDER BY col1) AS ‘5’,
SUM([6]) OVER ( ORDER BY col1) AS ‘6’,
SUM([7]) OVER ( ORDER BY col1) AS ‘7’,
SUM([8]) OVER ( ORDER BY col1) AS ‘8’,
SUM([9]) OVER ( ORDER BY col1) AS ‘9’,
SUM([10]) OVER ( ORDER BY col1) AS ’10’,
SUM([11]) OVER ( ORDER BY col1) AS ’11’,
SUM([12]) OVER ( ORDER BY col1) AS ’12’
FROM #Temp
FULL OUTER JOIN
(
SELECT * FROM #Temp
PIVOT (SUM(col1) FOR col2 IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])) AS a
) AS b ON [12] >= col1
ORDER BY col1

SET NOCOUNT OFF
END

Other than the stored procedure I made two other minor changes.  Temporary tables are automatically dropped when you exit a stored procedure so I removed the DROP TABLE code and I also added the SET NOCOUNT option to get rid of the INSERT messages.

STEP 6: Wonder why I did it

Honestly, and I ‘ll talk about this someday, I see SQL as at least something of an art.  Yes, there are ways to optimize code, to be precise, and you want to do those kinds of things, especially if you are being paid, but sometimes, “I did it because it was there” is a good enough answer.

23 Jan

LIMIT vs TOP vs OFFSET

One of the questions that comes up on the MySQL forums is how do you do a TOP (N) in MySQL?  The answer people give is to use the LIMIT clause, except, LIMIT isn’t really the same as TOP.  It’s kind of a combination of TOP and a bit of OFFSET FETCH but really all 3 are different things.  Lets run through them.

TOP (N)

This simply says give me the first N records in a result set.  For instance,

SELECT TOP(5) score FROM stats.scores;

This works exactly the same as the following LIMIT clause in MySQL.

SELECT score FROM stats.scores LIMIT 5;

Top can also be used with the PERCENT clause to return the N percent rows in the result set.  So, if you have 20 records in a result set the following query would return the first two rows.

SELECT TOP(10) PERCENT score FROM stats.scores;

You cannot do this with the MySQL LIMIT clause.

T-SQL also has the OFFSET FETCH clause that allow you to return a certain number  of rows in a result set from any position in the result set.  Unlike the LIMIT clause or the TOP clause you must use the ORDER BY clause in a query for it to work.  Here are some examples.

First 5 rows (same as LIMIT and TOP clauses earlier):

SELECT score FROM stats.scores ORDER BY score OFFSET 0 ROWS FETCH NEXT 5 ROWS ONLY;

Return rows 6 to 10:

SELECT score FROM stats.scores ORDER BY score OFFSET 5 ROWS FETCH NEXT 5 ROWS ONLY;

You can also write a query like this with MySQL using the LIMIT clause:

SELECT score FROM stats.scores ORDER BY score LIMIT 5,5;

In this case the LIMIT clause is saying to bypass the first 5 records and return the next 5 records.  Again, keep in mind that MySQL does not require the use of ORDER BY with the LIMIT clause.

This comes up a lot on the MySQL world so hopefully it helps someone.