21 Feb

Put A Smile In Your Messages

OK, this isn’t exactly the most useful thing you’ll ever see, or do, but it can actually be fun in the right situation. Here’s the code:

IF OBJECT_ID('smiley','P') IS NOT NULL
DROP PROCEDURE smiley
GO
CREATE PROCEDURE smiley
AS
BEGIN
DECLARE @smiley nvarchar(4000)='                          oooo$$$$$$$$$$$$oooo
                      oo$$$$$$$$$$$$$$$$$$$$$$$$o
                   oo$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$o         o$   $$ o$
   o $ oo        o$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$o       $$ $$ $$o$
oo $ $ "$      o$$$$$$$$$    $$$$$$$$$$$$$    $$$$$$$$$o       $$$o$$o$
"$$$$$$o$     o$$$$$$$$$      $$$$$$$$$$$      $$$$$$$$$$o    $$$$$$$$
  $$$$$$$    $$$$$$$$$$$      $$$$$$$$$$$      $$$$$$$$$$$$$$$$$$$$$$$
  $$$$$$$$$$$$$$$$$$$$$$$    $$$$$$$$$$$$$    $$$$$$$$$$$$$$  """$$$
   "$$$""""$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$     "$$$
    $$$   o$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$     "$$$o
   o$$"   $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$       $$$o
   $$$    $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$" "$$$$$$ooooo$$$$o
  o$$$oooo$$$$$  $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$   o$$$$$$$$$$$$$$$$$
  $$$$$$$$"$$$$   $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$     $$$$""""""""
 """"       $$$$    "$$$$$$$$$$$$$$$$$$$$$$$$$$$$"      o$$$
            "$$$o     """$$$$$$$$$$$$$$$$$$"$$"         $$$
              $$$o          "$$""$$$$$$""""           o$$$
               $$$$o                                o$$$"
                "$$$$o      o$$$$$$o"$$$$o        o$$$$
                  "$$$$$oo     ""$$$$o$$$$$o   o$$$$""
                     ""$$$$$oooo  "$$$o$$$$$$$$$"""
                        ""$$$$$$$oo $$$$$$$$$$
                                """"$$$$$$$$$$$
                                    $$$$$$$$$$$$
                                     $$$$$$$$$$"
                                      "$$$""""';
PRINT @smiley;
END

All this does is put a smiley in the messages tab.

You can do this all kinds of ways, so I’ll leave the details to you, but if you get the urge to be weird some day, this happens to be one way.

15 Feb

Regular Expressions, A Quickie

I wanted to toss a quickie into the mix, because it’s been sitting in the back of my mind, and that is a quick run through a simple regular expression example. Suppose you have the following data.

WITH cte AS
(
SELECT '123-456-7890' as phoneNumber
UNION ALL
SELECT '1234567890'
UNION ALL
SELECT '0987-654-3210'
UNION ALL
SELECT '123-4567890'
)
SELECT * FROM cte

Results
phoneNumber
123-456-7890
1234567890
0987-654-321
123-4567890

And you want to select the values that meet a specific criteria, which I’ll keep simple, and that criteria is that the value in the phone number is in a format like XXX-XXX-XXXX.

There are some hard ways to do this. For instance, you could write a really long WHERE clause, which could get it done or you could use a regular expression like the following code does.

WITH cte AS
(
SELECT '123-456-7890' as phoneNumber
UNION ALL
SELECT '1234567890'
UNION ALL
SELECT '0987-654-321'
UNION ALL
SELECT '123-456-78901'
)
SELECT * FROM cte
WHERE phoneNumber LIKE '[0-9][0-9][0-9]-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]';

Result
phoneNumber
123-456-7890

In essence, in a very inelegant way, you are saying give me any pattern with 3 numbers, followed by a dash, followed by 3 numbers, followed by a dash, followed by 4 numbers. It’s not pretty, and you might need to resort to other options in more complex searches but for something quick and dirty this will get it done.

MySQL, on the other hand, actually has a robust regular expression implementation. The above example could be written there as

mysql> mysql> SELECT *
    -> FROM
    -> (
    -> SELECT '123-456-7890' as phoneNumber
    -> UNION ALL
    -> SELECT '1234567890'
    -> UNION ALL
    -> SELECT '0987-654-321'
    -> UNION ALL
    -> SELECT '123-456-78901'
    -> ) as A
    -> WHERE phoneNumber REGEXP '^[0-9]{3}-[0-9]{3}-[0-9]{4}$';
+--------------+
| phoneNumber  |
+--------------+
| 123-456-7890 |
+--------------+
1 row in set (0.00 sec)

What we’re doing here is slightly different. The {n} means repeat the criteria for n spaces. The ^ symbol means search from the start of the string and the $ means search from the end of the string. In essence, I’m saying go 10 characters forward and 10 characters backward so that I don’t end up with a result that had too many numbers at the beginning or the end of the string.

I don’t write a lot of regular expressions, and really complex one’s definitely require a visit to supporting documentation of some sort, but they are handy to know about.

15 Feb

PROC MEANS Errors Out

When we left the PROC MEANS project it was throwing errors in some cases, which is not good, I bring the deep insights, so I’m going to fix that. Before I do that I’m going to make a very minor change to the PROC MEANS code by adding a TRY / CATCH block around the code that executes the dynamic SQL.

BEGIN TRY
	EXEC SP_EXECUTESQL @SQL
END TRY
BEGIN CATCH
	SELECT ERROR_MESSAGE()
END CATCH

Actually we can make this code a bit more descriptive by including the error number.

BEGIN TRY
	EXEC SP_EXECUTESQL @SQL
END TRY
BEGIN CATCH
	SELECT CAST(ERROR_NUMBER() AS varchar) + ' - ' + ERROR_MESSAGE()
END CATCH

What this code does is pretty simple, when an error occurs in the EXEC command processing moves to the CATCH block where it returns the error number, the error message. My code is actually quite happy to error when something goes wrong so I don’t need to do a whole lot with this but here is a complete list of error functions.

  1. ERROR_NUMBER() returns the error number.
  2. ERROR_MESSAGE() returns the complete text of the error message. The text includes the values supplied for any substitutable parameters such as lengths, object names, or times.
  3. ERROR_SEVERITY() returns the error severity.
  4. ERROR_STATE() returns the error state number.
  5. ERROR_LINE() returns the line number inside the routine that caused the error.
  6. ERROR_PROCEDURE() returns the name of the stored procedure or trigger where the error occurred

You can read more about them at the link below.

http://msdn.microsoft.com/en-us/library/ms179296%28v=sql.105%29.aspx

Now I need to create some errors so I can see what’s happening and provide a better error message for them.

EXEC procMeans @collist = 'PresentScore1~TasteScore', @table = 'cake';

207 - Invalid column name 'PresentScore1'.

This is a good descriptive error. It does exactly what I want.

EXEC procMeans @collist = 'PresentScore~TasteScore~', @table = 'cake';

156 - Incorrect syntax near the keyword 'FROM'.

That’s not so good! I’ll have to do something with this one.

EXEC procMeans @collist = 'PresentScore~~TasteScore', @table = 'cake';

156 - Incorrect syntax near the keyword 'FROM'.

I seem to like that one.

Finally, what about a weak attempt at an injection?

EXEC procMeans @collist = '; SELECT 1;PresentScore~TasteScore', @table = 'cake';

102 - Incorrect syntax near ';'.

I’d like that to be a bit more descriptive, but, I’m happy with that the code errors in this case. This isn’t really where the risk might be, in any case, but I’ll look at that in a later post.

EXEC procMeans @collist = 'PresentScore~TasteScore', @table = '';

156 - Incorrect syntax near the keyword 'UNION'.

and

EXEC procMeans @collist = 'PresentScore~TasteScore', @table = '123';

102 - Incorrect syntax near '123'.

and

EXEC procMeans @collist = 'PresentScore~TasteScore', @table = 'cake1';

208 - Invalid object name 'cake1'.

and

EXEC procMeans @collist = 'schema_id', @table = 'sys.all_objects';

Variable	N	Mean	Maximum	Minimum	Range	StdDev
schema_id	2092	3	4	1	3	0.1885

Oops! OK, this probably isn’t the end of the world, or even the beginning of an end, or even an error, but do we really want our users doing this? Well, I guess, the answer is, it depends on the user and the access we give them. I was able to do this because I’m logged in as a sysadmin. The real message in this is that you need to test odd things and see what happens. This is especially important when you don’t have a QA department and very, very, very hard when you are testing your own code.

Anyways, this is error testing, not security, so I’m going to ignore this one, for now.

Here are the obvious problems that 10 minutes of testing produced.

  • Tilde at the end of the collist variable.
  • Incorrectly spelled columns, this includes double tilde’s and spaces
  • Semi-colons in the code.
  • Bad table names (this can produce more than one error).

And probably a few more that I missed but I’ll trap those with a generic message.

Here is the modified error block.

BEGIN TRY
   EXEC SP_EXECUTESQL @SQL
END TRY
BEGIN CATCH
   IF @collist = '' or @table = ''
      SELECT '@collist or @table cannot be empty.'
   ELSE IF RIGHT(@collist, 1) = '~' OR RIGHT(@table, 1) = '~'
      SELECT 'The @collist or @table variables cannot end in a tilde.'
   ELSE IF CHARINDEX(@table,'~') > 0
      SELECT 'The @table variable does not recognize tildes'
   ELSE IF CHARINDEX(';',@collist) > 0 OR CHARINDEX(';',@table) > 0
      SELECT 'Please, ' + SYSTEM_USER + ' no semicolons in the variables.'
   ELSE IF ERROR_NUMBER() IN (102,156,208)
      SELECT 'Just to show it can be done.'
   ELSE
      SELECT CAST(ERROR_NUMBER() AS varchar) + ' - ' + ERROR_MESSAGE()
END CATCH

Obviously I’m not trying to catch every error and all I really wanted to show was what could be done. Here is a code break down.

IF @collist = '' or @table = ''
      SELECT '@collist or @table cannot be empty.'

Tests if the variables are empty.

ELSE IF LEN(@collist) > 255
	  SELECT 'Too many characters in the @collist variable.'
   ELSE IF LEN(@table) > 50
      SELECT 'Too many characters in the @table variable.

Tests if the variables are too long. I should probably just nvarchar(4000) those and not worry about the test.

ELSE IF RIGHT(@collist, 1) = '~' OR RIGHT(@table, 1) = '~'
      SELECT 'The @collist or @table variables cannot end in a tilde.'

I suspect users would do this, on a semi-regular basis, so I trapped the error to give them a decent error message.

ELSE IF CHARINDEX(@table,'~') > 0
      SELECT 'The @table variable does not recognize tildes'

Coders will hate this but casual users can react positively to it.

ELSE IF CHARINDEX(';',@collist) > 0 OR CHARINDEX(';',@table) > 0
      SELECT 'Please, ' + SYSTEM_USER + ' no semicolons in the variables.'

I only did this to show that you can have a personal touch with error messages. I firmly believe it’s good to confuse the user base as to what you can, or can’t, track.

ELSE IF ERROR_NUMBER() IN (102,156,208)
      SELECT 'Just to show it can be done.'

I wouldn’t have done this, well, maybe a trap for 208, but I wanted to show that you could, in fact, trap by error number and use the IN expression here.

ELSE
      SELECT CAST(ERROR_NUMBER() AS varchar) + ' - ' + ERROR_MESSAGE()

Finally, the catch all, for everything I didn’t test for. You can do fairly complex things with error trapping. For instance, you could fire a email message or log the error but that is way beyond the scope of this bad boy.

Finally, here is the current version of PROC MEANS,

IF (OBJECT_ID('procMeans', 'P') IS NOT NULL) DROP PROCEDURE procMeans
GO
CREATE PROCEDURE procMeans
@collist VARCHAR(255), -- List of column names.  I figure 255 characters is enough.
@table NVARCHAR(128) -- Table name.
AS
DECLARE @x int = 0; -- Counter for character position.
DECLARE @colname VARCHAR(50); -- Variable to build column names.
DECLARE @SQL NVARCHAR(4000)

WHILE @x <= LEN(@collist) -- Loops through column list.
BEGIN
    SET @x = @x + 1

    IF SUBSTRING(@collist, @x, 1) = '~' OR LEN(@collist) < @x	
    BEGIN
        IF LEN(@SQL) IS NULL
            SET @SQL = 'WITH cte AS (SELECT ''' + @colname + ''' AS Variable, ' + 
            @colname + ' AS Value FROM ' + @table
        ELSE
            SET @SQL = @SQL + ' UNION ALL SELECT ''' + @colname + ''', ' + @colname + 
            ' FROM ' + @table 

    IF LEN(@collist) >= @x
        SET @colname = '';

    END
    ELSE
        SET @colname = CONCAT(@colname,SUBSTRING(@collist, @x, 1)); -- Builds @colname.
END

SET @SQL = @SQL + ') 
SELECT Variable,
COUNT(*) AS N,
AVG(value) AS Mean,
MAX(value) AS Maximum,
MIN(value) AS Minimum,
MAX(value) - MIN(value) AS Range,
ROUND(STDEV(value),4) AS StdDev
FROM cte
GROUP BY Variable'

BEGIN TRY
	EXEC SP_EXECUTESQL @SQL
END TRY
BEGIN CATCH
    IF @collist = '' or @table = ''
        SELECT '@collist or @table cannot be empty.'
    ELSE IF RIGHT(@collist, 1) = '~' OR RIGHT(@table, 1) = '~'
        SELECT 'The @collist or @table variables cannot end in a tilde.'
    ELSE IF CHARINDEX(@table,'~') > 0
        SELECT 'The @table variable only accepts a single table.'
    ELSE IF CHARINDEX(';',@collist) > 0 OR CHARINDEX(';',@table) > 0
        SELECT 'Please, ' + SYSTEM_USER + ' no semicolons in the variables.'
    ELSE IF ERROR_NUMBER() IN (102,156,208)
        SELECT 'Just to show it can be done.'
    ELSE
        SELECT CAST(ERROR_NUMBER() AS varchar) + ' - ' + ERROR_MESSAGE()
END CATCH
GO

EXEC procMeans @collist = 'PresentScore~TasteScore', @table = 'cake';

Variable	N	Mean	Maximum	Minimum	Range	StdDev
PresentScore	20	76	93	56	37	9.3768
TasteScore	20	81	94	72	22	6.6116
14 Feb

What Did I Just Create?

So you create a stored procedure but you don’t have a handy website like this, where you post stored procedures, to look it up. You’ve got to go to the database. In SQL Server you have all kinds of options, although some of them kind of suck, whereas with MySQL you really only have one, well two if you count MySQL Workbench which I don’t even have installed.

In SQL Server Manager Studio you can navigate to your database / Programmability / Stored Procedures. This is probably the best format because it will open it right into the New Query window.

You can use the sp_helptext stored procedure as follows:

sp_helptext @objname = ‘dbo.procMeans’;

This is a decent and readable format as well.

You can also use either of the following two queries,

SELECT definition
FROM sys.sql_modules
WHERE object_id = (OBJECT_ID(N’sas.dbo.procMeans’));

SELECT OBJECT_DEFINITION (OBJECT_ID(N’sas.dbo.procMeans’));

But they both kind of suck because it drops the entire query on one line with no formatting. It’s very hard to read.

Now, congratulate me for that, I just copied the msdn, like a pro blogger.  All I need to do now is forget to credit the msdn and I’d be a super pro blogger.  You can find the msdn page here: http://msdn.microsoft.com/en-us/library/ms345443.aspx#TsqlProcedure

MySQL, on the hand, has an interesting alternative. You use the SHOW CREATE PROCEDURE command. It works like this,

mysql>mysql> DELIMITER //
mysql> CREATE PROCEDURE test()
    -> BEGIN
    -> SELECT 1,2,3;
    -> END
    -> //
Query OK, 0 rows affected (0.00 sec)

mysql> DELIMITER ;
mysql> SHOW CREATE PROCEDURE test;
+-----------+----------------------------------------------------------------+--
----------------------------------------------------------------------------+---
-------------------+----------------------+--------------------+
| Procedure | sql_mode                                                       | C
reate Procedure                                                             | ch
aracter_set_client | collation_connection | Database Collation |
+-----------+----------------------------------------------------------------+--
----------------------------------------------------------------------------+---
-------------------+----------------------+--------------------+
| test      | STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION | C
REATE DEFINER=`root`@`localhost` PROCEDURE `test`()
BEGIN
SELECT 1,2,3;
END | latin1               | latin1_swedish_ci    | latin1_swedish_ci  |
+-----------+----------------------------------------------------------------+--
----------------------------------------------------------------------------+---
-------------------+----------------------+--------------------+
1 row in set (0.00 sec)

Yes, it is a thing of beauty.  It’s all cramped like that because it jams the procedure into a single field. But, on the good side, I didn’t need the manual to look that one up.

Oh, and just ignore that root@localhost bit.  Pretend it isn’t even there.  Because it isn’t.  I’d never do that……

13 Feb

Separating A String Based On A Delimiter

In PHP it’s really easy to split a string. You use the explode() function and before you know it you have a populated array. For instance,

$var = explode(“,”,”col1,col2,col3″);

Would produce an array as follows,

$var[0] = “col1”;
$var[1] = “col2”;
$var[2] = “col3”;

Simple, clean and straight-forward. Unfortunately it’s not like that with SQL.

Now, I’m certain there are better ways to do this, but this solution will work. What I do here is loop through a string using a WHILE loop and break it on the delimiter that is specified in the code, in this case the tilde (~). I like the tilde for delimiters because you don’t find it in the middle of your data and it doesn’t interfere with SQL in any way so you don’t have to worry much about it.

It’s really simple code so no one should have any trouble with it, except, perhaps, to bash it.

DECLARE @string varchar(50) = 'col1~col2~col3~col1 and col2';
DECLARE @x int = 0;
DECLARE @table varchar(50);

WHILE @x <= LEN(@string)
BEGIN
    SET @x = @x + 1

    IF SUBSTRING(@string, @x, 1) = '~' OR LEN(@string) < @x
    BEGIN
        PRINT @table;
	SET @table = '';
    END
    ELSE
    SET @table = CONCAT(@table,SUBSTRING(@string, @x, 1));
END

Results

col1
col2
col3
col1 and col2

All I wanted to do here was take a string and separate it based on the delimiter. In a later post I’ll show you why I did this and why I really don’t need anything more complicated than this.

12 Feb

Count Of A Specific Character In A String

I thought I’d look at something that comes up semi-regularly. In this case, how to count the number of times a string appears in a string.

Here is the string,

DECLARE @string nvarchar(50) = ‘Now is the time’;

Step 1

Determine the length of the string.

SELECT LEN(@string); — 15

Step 2

Replace the string you are searching for, in the string you are searching, with empty space.

SELECT REPLACE(@string, ‘ ‘, ”); — Nowisthetime

Step 3

Calculate the length of this new string.

SELECT LEN(REPLACE(@string, ‘ ‘, ”)); — 12

Step 4

Subtract the the new length of the string from it’s original length.

SELECT LEN(@string) – LEN(REPLACE(@string, ‘ ‘, ”)) AS ‘# Spaces’; — 3

This works the same in both MySQL and SQL Server. I’m kind of surprised that no has come up with count variant that does this, but for now, if you need this information this is how you get it.

11 Feb

PROC MEANS In T-SQL

SAS has a basic procedure called PROC MEANS.  It essentially takes a column of data and performs some basic statistical analysis on it.  Specifically it looks at the following data,

  • N – number of occurrences
  • Mean
  • Maximum
  • Minimum
  • Range (Maximum – Minimum)
  • Standard Deviation.

You can read more about PROC MEANS at this page: http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002473539.htm.

Because this site is for my entertainment I figured I’d use the cake data as found in the above example and create that result set. It’s actually fairly easy to do as the above examples map to T-SQL functions quite easily as follows,

  • COUNT(presentscore) as N
  • AVG(presentscore) as Mean
  • MAX(presentscore) as Maximum
  • MIN(presentscore) as Minimum
  • MAX(presentscore) – MIN(presentscore) as Range
  • ROUND(STDEV(presentscore),4) as ‘Std Dev’

So to reproduce the output generated by the PROC MEANS example on that page all I need to do is the following,

SELECT 'PresentScore' as Variable, 
COUNT(presentscore) as N, 
AVG(presentscore) as Mean,
MAX(presentscore) as Maximum,
MIN(presentscore) as Minimum,
MAX(presentscore) - MIN(presentscore) as Range,
ROUND(STDEV(presentscore),4) as 'Std Dev'
FROM cake
UNION ALL
SELECT 'TasteScore', 
COUNT(tastescore), 
AVG(tastescore),
MAX(tastescore),
MIN(tastescore),
MAX(tastescore) - MIN(tastescore),
ROUND(STDEV(tastescore),4)
FROM cake

Variable	N	Mean	Maximum	Minimum	Range	Std Dev
PresentScore	20	76	93	56	37	9.3768
TasteScore	20	81	94	72	22	6.6116

There isn’t much to note here.  I build two queries, alias them, add a UNION ALL and call it a day.  Perhaps the most interesting thing is that it lets you build aggregates against a character value.  Otherwise, it’s all very straight-forward.

However, there is an even easier way to do this.

WITH cte AS
(
SELECT 'PresentScore' as Variable, PresentScore as value
FROM cake
UNION ALL
SELECT 'TasteScore', TasteScore
FROM cake
)
SELECT Variable as Variable,
COUNT(*) as N,
AVG(value) as Mean,
MAX(value) as Maximum,
MIN(value) as Minimum,
MAX(value) - MIN(value) as Range,
ROUND(STDEV(value),4) as 'Std Dev'
FROM cte
GROUP BY Variable;

The change is that we build the CTE with a union then select against it.  It’s much easier to work with.  This can also be written as an inline-view.

SELECT Variable as Variable,
COUNT(*) as N,
AVG(value) as Mean,
MAX(value) as Maximum,
MIN(value) as Minimum,
MAX(value) - MIN(value) as Range,
ROUND(STDEV(value),4) as 'Std Dev'
FROM (
SELECT 'PresentScore' as Variable, PresentScore as value
FROM cake
UNION ALL
SELECT 'TasteScore', TasteScore
FROM cake ) as A
GROUP BY Variable

I’ll be revisiting this in a later post but for right now you have a working simulation of PROC MEANS.

08 Feb

Non Equi-Joins

This is something that not a lot of people think about because 98% of the time when we do a JOIN the ON clause we toss in an = sign and call it a day. But it doesn’t have to be that way.  Lets look at some data and do a couple of joins on it.

create table test
(
	id int
);
insert into junk values (1),(2),(3),(4),(5);

SELECT t1.id as 't1.id', t2.id as 't2.id'
FROM test t1
LEFT JOIN test t2
ON t1.id = t2.id

t1.id	t2.id
1	1
2	2
3	3
4	4
5	5

This is clean, a LEFT JOIN against itself, and because it’s based on the equal sign, it’s very clean. Now, lets change it a bit,

SELECT t1.id as 't1.id', t2.id as 't2.id'
FROM test t1
LEFT JOIN test t2
ON t1.id < t2.id

t1.id	t2.id
1	2
1	3
1	4
1	5
2	3
2	4
2	5
3	4
3	5
4	5
5	NULL

What this does is return everything from the right side of the join that is greater than the value on the left side. You can also do this with the <> clause and NULL tests.

SELECT t1.id as 't1.id', t2.id as 't2.id'
FROM test t1
LEFT JOIN test t2
ON t1.id <> t2.id

1	2
1	3
1	4
1	5
2	1
2	3
2	4
2	5
3	1
3	2
3	4
3	5
4	1
4	2
4	3
4	5
5	1
5	2
5	3
5	4

This query returns every value where the left part of the join does not match the right part.

This concept is kind of simple compared to some of my other posts, it’s Friday after all, but it’s something that I know I tend to forget about and every so often, it’s useful.

06 Feb

Binomial Distribution In SQL

A few years ago I ran across a coin toss example that played out as a binomial distribution. It was really nicely done and it always stayed with me as something to do as a small coding project although I’m not really coding any more. Unfortunately I couldn’t find the project, or I’d link to it, so I decided to do something like it with T-SQL.

However, before I start coding we need to understand what I’m doing here, and for that I turn to the source of all my lizard pictures: Wikipedia.

In probability theory and statistics, the binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p. Such a success/failure experiment is also called a Bernoulli experiment or Bernoulli trial; when n = 1, the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the popular binomial test of statistical significance.

The binomial distribution is frequently used to model the number of successes in a sample of size n drawn with replacement from a population of size N. If the sampling is carried out without replacement, the draws are not independent and so the resulting distribution is a hypergeometric distribution, not a binomial one. However, for N much larger than n, the binomial distribution is a good approximation, and widely used.

Source: http://en.wikipedia.org/wiki/Binomial_distribution

Got it, right? Well, to be honest, it’s been a long time since I took a statistics course too. Basically, a binomial distribution is a sequence of events (like a coin toss) and calculating the likelihood that a given sequence would occur. For instance, if you toss a coin 5 times, and repeat that sequence 1000 times, how many times would you toss 5 heads in a row? That, in effect, is what I’m doing here.

And just to be clear:

I’m being more than a little bit screwball with this. Do not try this without adult supervision or without a heavy dose of Red Bull, because as we all know, anything is possible when you are amped out on caffeine.

I’m also positive that I could clean this up a bit more but I’m somewhat time-constrained and spending 4 hours on a project like this is about 2 more than I want to commit.

I’ve included a number of comments in the code to explain the details but in a nutshell, I hope, here is what it’s doing.

  1. Creates a loop based on the length of the sequence and the number of times to run it.
  2. Each time the coin is tossed the row position and the column position is written to a temporary table (#Temp).
  3. A pivot table is generated from the data.  This why I have the @ctr set to 1.  There may be a better way of doing this but it happened to work.
  4. The PIVOT table is written to table #Temp1 so it can be more easily cleansed.
  5. Data is selected from #Temp1 and cleaned up by removing headers, NULL values from the result, and unneeded fields for the final output.

The key to the whole thing is that the actual sequence of the throws doesn’t matter to SQL.  We just need to know that throw 3000 produced a result for a specific row and column which we record in a table.  Once we have that information we simply need to produce a result set from it.  It’s very counter-intuitive in a sense because we tend to think of the process as a sequence of HTHHH and we try to record that.  What I’m doing instead is recording where on the table the first H would be, then the second, then the third, etc. and recording how each time the sequence landed in each position on the table.  Once, I have that data I aggregate it and can produce the final result set.

In a sense I’m playing Plinko with T-SQL.

Finally,  I’m still working on how to put code in WordPress.  I want it to look like how I write it, which is the case here, but I also want you to be able to easily copy/paste it, which is not so much the case, yet.  However, you should be able to copy/paste and have the code work or at least it tested out correctly in the latest version of FireFox.

I’m going back to shorter things for awhile.

IF object_id('binomial','P') IS NOT NULL
DROP PROCEDURE binomial
GO
CREATE PROCEDURE binomial
	@totSequences int = 1000,  -- The total sequences to run
	@seqLength int = 5         -- The sequence length
AS
DECLARE @col int = @seqLength + 1;  -- Center point of sequence
DECLARE @ctr int = 1;               -- Used for pivot calculation.  Is always 1.
DECLARE @random int;                -- Returns 1 or 2. If 1 it is +1.  If 2 it is -1.
DECLARE @row int;                   -- Row counter for pivot.  There are alternatives.

CREATE TABLE #Temp
(
	rowNum int,            -- Row number from @row counter.
	result int,            -- This is the current value of the running calculation, 
			       -- for instance, after the first toss it would be 5 or 7.
	ctr int default 1      -- Counter, always 1.
)

WHILE (@ctr <= @totSequences * @seqLength)  -- Total number of coin tosses.
BEGIN
	SET @random = rand() * 2;  -- Generates H/T toss.
	IF @random = 1
		SET @col = @col + 1;   -- Heads
	ELSE
		SET @col = @col - 1;   -- Tails

	SET @row = @ctr % @seqLength;  -- Determines the row number.  The default settings 
                                       -- are 0 to 4.
				       -- If 0 it is set to 5 in the next section.
	IF @row = 0
		SET @row = @seqLength;

	INSERT INTO #Temp VALUES (@row, @col, 1);  
	SET @ctr += 1;                             -- Increments loop counter
	IF @row = @seqLength
		SET @col = @seqLength + 1;
END
SELECT *
INTO #Temp1
FROM (
SELECT *
FROM #Temp 
PIVOT
(
COUNT(ctr)
FOR result IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11]) 
) as A ) as B
SELECT 
IIF([1] = 0, ' ', cast([1] as nvarchar)) AS '-5', -- This part is not self-sustaining.  If you 
IIF([2] = 0, ' ', cast([2] as nvarchar)) AS '-4', -- expand the sequence range you need to add 
IIF([3] = 0, ' ', cast([3] as nvarchar)) AS '-3', -- more columns here and to the pivot table.
IIF([4] = 0, ' ', cast([4] as nvarchar)) AS '-2',  
IIF([5] = 0, ' ', cast([5] as nvarchar)) AS '-1', -- The CAST and IIF are done to remove the 
IIF([6] = 0, ' ', cast([6] as nvarchar)) AS '0',  -- NULLs or a 0 from the final result.  It 
IIF([7] = 0, ' ', cast([7] as nvarchar)) AS '+1', -- just looks better that way.
IIF([8] = 0, ' ', cast([8] as nvarchar)) AS '+2',
IIF([9] = 0, ' ', cast([9] as nvarchar)) AS '+3',   
IIF([10] = 0, ' ', cast([10] as nvarchar)) AS '+4', 
IIF([11] = 0, ' ', cast([11] as nvarchar)) AS '+5' 

FROM #Temp1 ORDER BY rowNum
go
exec binomial 

Clean results as produced by the code above.
-5	-4	-3	-2	-1	0	+1	+2	+3	+4	+5
 	 	 	 	515	 	485	 	 	 	 
 	 	 	248	 	498	 	254	 	 	 
 	 	127	 	375	 	380	 	118	 	 
 	59	 	263	 	376	 	254	 	48	 
22	 	169	 	311	 	320	 	149	 	29

Unclean results that hopefully, maybe, make a bit more sense.

rowNum	1	2	3	4	5	6	7	8	9	10	11
1	0	0	0	0	516	0	484	0	0	0	0
2	0	0	0	265	0	481	0	254	0	0	0
3	0	0	128	0	374	0	373	0	125	0	0
4	0	56	0	265	0	362	0	247	0	70	0
5	32	0	154	0	317	0	302	0	165	0	30
05 Feb

T-SQL Craps

I guess sometimes I’m still a programmer at heart.  Maybe not a great one but one nonetheless because I decided to write a SQL stored procedure that plays Craps.  Craps is an easy game so it’s easy to write a program for it.  All you need is a loop, a few variables and a few IF statements.  The only mildly complex thing I did was to create a temporary table to track the results of each roll.  Otherwise, it’s just roll, test the results for a W/L, write the results to a temporary table and then use a SELECT to return the results to the screen.

Here’s the code:

IF object_id(‘craps’,’P’) IS NOT NULL
DROP PROCEDURE craps
GO
CREATE PROCEDURE craps
AS
DECLARE @dice1 int;
DECLARE @dice2 int;
DECLARE @roll int = 1; — Roll counter, also used for sequencing the results.
DECLARE @exit int = 0; — Set to 1 when result determined
DECLARE @point int; — If first roll not a W/L point is target roll
DECLARE @throw int; — dice1 + dice2
DECLARE @result varchar(50); — When W/L determined this is the result

CREATE TABLE #Temp — Temporary table will suicide when leaves scope of stored proceduer so no DROP TABLE.
(
roll int,
dice1 int,
dice2 int,
result varchar(50)
);

WHILE (@exit <> 1) — Runs until result determined
BEGIN
SET @dice1 = ceiling(rand() * 6); — Set roll result.
SET @dice2 = ceiling(rand() * 6);
SET @throw = @dice1 + @dice2;

IF @throw IN (7,11) AND @roll = 1 — If 7 or 11 on first roll the player wins.
BEGIN
SET @exit = 1;
SET @result = ‘Natural Win’;
END
ELSE IF @throw IN (2,3,12) AND @roll = 1 — If 2, 3 or 12 on first roll the player loses.
BEGIN
SET @exit = 1;
SET @result = ‘Craps Lose’;
END
ELSE IF @roll = 1 — Sets the point that the player must roll before a 7 to win in future rolls.
BEGIN
SET @point = @throw
END
ELSE IF (@throw = 7) — Player loses.
BEGIN
SET @result = ‘Loser’;
SET @exit = 1
END
ELSE IF (@throw = @point) — Player Wins.
BEGIN
SET @result = ‘Winner’;
SET @exit = 1
END
INSERT INTO #Temp VALUES (@roll,@dice1,@dice2,@result); — Inserts results into temporary table.
SET @roll = @roll + 1;
END
SELECT roll, dice1, dice2, COALESCE(result,”) as Result FROM #Temp ORDER BY roll;
GO

EXEC craps;

Results,

roll	dice1	dice2	Result
1	5	3	
2	3	6	
3	3	3	
4	3	3	
5	6	6	
6	6	6	
7	3	5	Winner