11 Sep

SQL Server: Parsing A String Into Rows

I haven’t written a SQL article in, well, quite awhile, and I guess this won’t really count as one either.  Anyways, I tossed this up over on the SQL Team forum and I liked it so I’m putting it here.

The solution relied on a couple of arcane concepts and functions.  First, I used PARSENAME which parses a string separated by periods.  It’s basically the same idea as SUBSTRING_INDEX in MySQL only a bit less flexible because of the period.

I’ve talked about walking the tree before but, in essence, it breaks down a delimited string by joining a table of numbers and using that sequence to get, say, the third delimited value.

You have the following string.

12345678 SCINC, SCNRQ, SRPPR

The first part is the ORDER_ID and the rest of the string is the SKU’s attached to that order.  Yeah, it’s a bit weird but you see things like this in converted older system or in some really ad hoc reporting tools.

Now, we want to convert that string as follows:

Order_ID	SKU
12345678 	SRPPR             
12345678 	SCNRQ
12345678 	SCINC

Obviously, that isn’t easy but here’s one way to do it.

DECLARE @var char(30);
DECLARE @var1 char(30);
SET @var = '12345678 SCINC, SCNRQ, SRPPR';
SET @var1 = REPLACE(REPLACE(@var,SUBSTRING(@var,1,CHARINDEX(' ',@var)),''),', ','.')

SELECT SUBSTRING(@var,1,CHARINDEX(' ',@var)-1) AS Order_ID,
PARSENAME(@var1,zombie) AS SKU
FROM walkers
WHERE LEN(@var) - LEN(REPLACE(@var,' ','')) >= zombie

First, I created a walkers table as follows:

CREATE TABLE walkers (zombie int);

Which I populated with a running sequence of numbers starting at 1.

SELECT * FROM walkers

zombie
1
2
3
4
5
6
7
8
9
10
.....

Here’s what it does. @var1 does two things: it converts the comma/space to a period so PARSENAME can use it and it also removes the ORDER_ID field and using the variable makes it easier to explain. This is what it does:

SELECT REPLACE(REPLACE(@var,SUBSTRING(@var,1,CHARINDEX(' ',@var)),''),', ','.')

SCINC.SCNRQ.SRPPR

This code:

SUBSTRING(@var,1,CHARINDEX(' ',@var)-1) AS Order_ID

is a straight-forward grab of the order id from the string. Basically everything from the first space backwards.

Note: My original post or the forum didn’t include the -1.

Finally, this code

PARSENAME(@var1,zombie) AS SKU

uses the PARSENAME function and the zombie field from the walkers table.

This might make more sense if we change the code slightly to include the zombies field.

SELECT SUBSTRING(@var,1,CHARINDEX(' ',@var)-1) AS Order_ID,
PARSENAME(@var1,zombie) AS SKU, zombie
FROM walkers
WHERE LEN(@var) - LEN(REPLACE(@var,' ','')) >= zombie

Order_ID	SKU	zombie
12345678	SRPPR   1
12345678	SCNRQ	2
12345678	SCINC	3

Finally, this was for a single record. If we had full order entry system you would simply join the walkers table to the order data and limit the results to the number SKU’s for each record. My code does this but since it’s only for a single record it might seem confusing. Anyways, a full outer join or a non equi join would fit the bill.

I hope this isn’t too confusing. I did it on the fly and didn’t sequence it as well as I have in most of my examples. Honestly, I just liked the solution and wanted to add it here. Too often, I think, you see people run to a function to solve this kind of problem, especially on the SQL Server side of things when there are other ways to solve the problem.

12 Feb

Count Of A Specific Character In A String

I thought I’d look at something that comes up semi-regularly. In this case, how to count the number of times a string appears in a string.

Here is the string,

DECLARE @string nvarchar(50) = ‘Now is the time’;

Step 1

Determine the length of the string.

SELECT LEN(@string); — 15

Step 2

Replace the string you are searching for, in the string you are searching, with empty space.

SELECT REPLACE(@string, ‘ ‘, ”); — Nowisthetime

Step 3

Calculate the length of this new string.

SELECT LEN(REPLACE(@string, ‘ ‘, ”)); — 12

Step 4

Subtract the the new length of the string from it’s original length.

SELECT LEN(@string) – LEN(REPLACE(@string, ‘ ‘, ”)) AS ‘# Spaces’; — 3

This works the same in both MySQL and SQL Server. I’m kind of surprised that no has come up with count variant that does this, but for now, if you need this information this is how you get it.

06 Feb

Order By Without Order By

This is a MySQL only trick as SQL Server will toss an error if you try it on that side of the house. Here is the question, sort the following data, in a guaranteed sort order, with NULLs last, without using ORDER BY.

mysql> DESC test;
+-------+---------+------+-----+---------+-------+
| Field | Type    | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| id    | int(11) | YES  |     | NULL    |       |
+-------+---------+------+-----+---------+-------+
1 row in set (0.00 sec)

mysql> INSERT INTO test VALUES (1),(NULL),(2);
Query OK, 3 rows affected (0.02 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM test;
+------+
| id   |
+------+
|    1 |
| NULL |
|    2 |
+------+

You can actually get trickier and put the NULL last so someone might assume that sort order is guaranteed but that isn’t my purpose and the question itself is a tricky cheat anyway so why be really mean when you can just be mean? Anyways, here is how you do it, step by step, although it’s really only a single step, use GROUP BY.

mysql> SELECT id, SUM(id) FROM test GROUP BY id;
+------+---------+
| id   | sum(id) |
+------+---------+
| NULL |    NULL |
|    1 |       1 |
|    2 |       2 |
+------+---------+
3 rows in set (0.00 sec)

GROUP BY in MySQL, and it may work this way in other SQL variants as well, automatically orders the columns in the GROUP BY. In my first step above you can see that NULLs are first and the result is reacting exactly as if it had an ORDER BY but we still need to get the code to sort with NULLs last. To do that we change the GROUP BY clause.

mysql> SELECT id, SUM(id) FROM test GROUP BY COALESCE(id,3);
+------+---------+
| id   | sum(id) |
+------+---------+
|    1 |       1 |
|    2 |       2 |
| NULL |    NULL |
+------+---------+
3 rows in set (0.00 sec)

This is kind of a cheat as it only works because we know that the data doesn’t have any values higher than 2, but, it does work.

mysql> SELECT id FROM test GROUP BY COALESCE(id,3);
+------+
| id   |
+------+
|    1 |
|    2 |
| NULL |
+------+
3 rows in set (0.00 sec)

Finally, we remove the SUM, leave the COALESCE and we have successfully sorted the data with an implicit ORDER BY.

Look, I’m not saying this is something you should do, EVER, it’s just something that happens to work, and it’s quirky, which I also happen to like, and for me that is fun.

MySQL page on ORDER BY OPTIMIZATION:

 

21 Jan

FIZZ BUZZ

Fizz Buzz is a simple programming test that can come up on occasion. I don’t know how often it’s used today but I saw a reference to it on another blog so I figured I’d write a post about it.

The logic here is very simple. You cycle through a sequence of numbers and if a number is divisible by 3 then you print FIZZ, if it’s divisible by 5 then you print BUZZ, and if it’s divisible by both (15) then you print FIZZ BUZZ.

In order to calculate whether an item is divisible or not you need the modulus function, also called the remainder function in some circles. The modulus function uses the % sign and is used in place of the division symbol. For instance,

SELECT 3 % 3, — returns 0
4 % 3, — returns 1
5 % 3; — returns 2

Once you have this information it’s relatively easy to set up a simple FIZZ BUZZ program. Here’s a SQL Server version,

declare @tot int = 20;
declare @ctr int = 1;
declare @fizzbuzz varchar(15);

while (@ctr <= @tot)
begin
set @fizzbuzz = @ctr;
if @ctr % 3 = 0
set @fizzbuzz += ‘ FIZZ’;
if @ctr % 5 = 0
set @fizzbuzz += ‘ BUZZ’;
print @fizzbuzz;
set @ctr = @ctr + 1;
end

Which produces,

1
2
3 FIZZ
4
5 BUZZ
6 FIZZ
7
8
9 FIZZ
10 BUZZ
11
12 FIZZ
13
14
15 FIZZ BUZZ
16
17
18 FIZZ
19
20 BUZZ

Also, be aware, that this won’t work in MySQL, for several reasons, but the main one is that you can’t roll a while loop outside of a procedure. That, and it has a different while syntax, and if syntax, and declare syntax, and, and…..

18 Jan

NULLIF

NULLIF is a relatively clean function but when I did some online searching on it I noticed that almost every explanation misses something that can burn you on a certification test.

What NULLIF does is perform a test between two values. If they match the function returns a NULL value and if they don’t it returns the first value. This is pretty common knowledge but it also does one more thing which I’ll discuss in a minute.

So here are a few samples of NULLIF at work.

mysql> SELECT NULLIF('Erik','Haselhofer');
+-----------------------------+
| NULLIF('Erik','Haselhofer') |
+-----------------------------+
| Erik                        |
+-----------------------------+
1 row in set (0.00 sec)

mysql> SELECT NULLIF('Haselhofer','Erik');
+-----------------------------+
| NULLIF('Haselhofer','Erik') |
+-----------------------------+
| Haselhofer                  |
+-----------------------------+
1 row in set (0.00 sec)

mysql> SELECT NULLIF('Erik','Erik');
+-----------------------+
| NULLIF('Erik','Erik') |
+-----------------------+
| NULL                  |
+-----------------------+
1 row in set (0.00 sec)

As you can see NULLIF returned the first value when the comparison values did not match and NULL when they did. This is all perfectly normal and every article on NULLIF mentions this and as far as I know most databases work this way. Now watch what happens in the following example:

SQL Server 2012
SELECT NULLIF(‘Erik’,1);
Msg 245, Level 16, State 1, Line 1
Conversion failed when converting the varchar value ‘erik’ to data type int.

MySQL, on the other hand, returns a warning but will process the statement.

mysql> SELECT NULLIF('Erik',1);
+------------------+
| NULLIF('Erik',1) |
+------------------+
| Erik             |
+------------------+
1 row in set, 1 warning (0.00 sec)

And one more example from SQL Server,

SELECT NULLIF(1,’2′);
Returns the value 1.

and

SELECT NULLIF(‘1’,1);
Returns NULL

What happens is that SQL Server tries to convert the data types but if you have a string that can’t be converted to a numeric type it throws an error.

Now, is this earth-shattering super secret knowledge that you must have? Nope! In fact, you’d probably catch the error very quickly. But if you are taking a SQL certification test of some sort, and you see a NULLIF, make sure that there isn’t type conversion involved because it can trip you up.

17 Jan

COALESCE

So, this scenario was actually thrown at me in an interview.  It was buried amongst a bunch of stuff and the overall process was really nasty.  Don’t worry, this won’t be nasty.

COALESCE per MSDN: Returns the first nonnull expression among its arguments.

In it’s simplest version it would work something like this,

mysql> SELECT COALESCE(NULL,NULL,'I AM NOT A NULL',NULL) as 'COALESCE';
+-----------------+
| COALESCE        |
+-----------------+
| I AM NOT A NULL |
+-----------------+
1 row in set (0.00 sec)

Yes, I’m using MySQL, again, because it’s easy to copy/paste.

As you can see it returns the string “I AM NOT A NULL” because that’s the first value that isn’t a NULL in the string.

Now that isn’t terribly useful but lets suppose you get hit with a table that looks something like this:

mysql> DESC junk;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| name  | varchar(10) | YES  |     | NULL    |       |
| yr1   | int(11)     | YES  |     | NULL    |       |
| yr2   | int(11)     | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+
3 rows in set (0.01 sec)

And it contains the following values:

mysql> SELECT * FROM junk;
+-------+------+------+
| name  | yr1  | yr2  |
+-------+------+------+
| Erik  | NULL |  100 |
| Frank |  100 |  200 |
+-------+------+------+
2 rows in set (0.00 sec)

Now assume this is sales data and your boss wants to see how each individual performed in the first year that they had sales. In this example you could work an IF statement and hammer it because they are only 2 possibilities. But what if those were months and the table kept 24 months of history? Yep, it’s a mess, you are free to swear, but you wouldn’t want to use an IF statement. This is one place where COALESCE can save you with the following query,

mysql> SELECT name, COALESCE(yr1,yr2) as 'First Year Sales' FROM junk;
+-------+------------------+
| name  | First Year Sales |
+-------+------------------+
| Erik  |              100 |
| Frank |              100 |
+-------+------------------+
2 rows in set (0.01 sec)

If you wanted to see his last years sales you would just reverse the order of the COALESCE.

15 Jan

MySQL 5.0 Certification Study Guide

$38.41 from Amazon.com

I’ve read a fair amount of certification books over the years and to be honest, this was, by far, the best book I’ve read. Everything you need to know is here, the sample questions are excellent, and it should be more than enough to pass the test.

Unfortunately, it’s been almost 2 years since I’ve read the book, and I no longer have a copy, but here is what I remember. The book is broken into 4 sections, one for each exam. Each section runs from 100 to 150 pages and the sections are remarkably focused on each section. Additionally, and I believe I downloaded these online, but there are a series of sample questions that come with the book. They are opened ended questions, rather than T/F or pick one of the four, and if you know them, nail them, you will pass the exams.

Without giving anything away, the developer side was harder than the administrative side. I pretty much cruised the administration part, which could have been because I did them last, or the fact that there is a lot less material there. They could bury you in minutiae but all things considered they were pretty fair in that regard. Plus, everything that I saw in the test was covered in the book. There won’t be any surprises because you didn’t read every line of the online documentation.

Also, and I have to be honest about this, this isn’t the hardest exam in the game. By contrast, the Oracle SQL Expert exam is many times harder and it’s only dealing with SQL, no administration at all. I can’t say much more but there were questions on the MySQL exam that I think you could answer correctly with no MySQL experience, conversely, doing that with the Oracle exam you would be murdered, so to speak.

Finally, the book is a bit dated a this time. We’ve moved beyond 5.0 so if you get it, get it with that caveat.

All that said, if you want to sit this exam, and I believe it’s still offered, and you don’t have 10 years experience with the product, by all means get the book. It really will help you get through the exam as it’s rock-solid test preparation.

Available at amazon.com.

11 Jan

Explaining NULL values

I haven’t written much in a very long time so I thought I should toss something this way even if it was originally written somewhere else.

The content below is my attempt at explaining a NULL value to a SQL beginner. Some of the semantics aren’t 100% beautiful but I think it mostly conveys the concepts and how to work with a NULL in a basic SELECT.

=============================

A field in SQL can consist of two things, a value, or an unknown.

A value is something like a letter, number, combination of those, an image, even a space, and even, just to make it confusing, an empty field.

An unknown, designated with a NULL, means that the field has no value that we can know. NULL is a marker for that kind of field. The reason an empty field is not a NULL is because we know that the field is empty.

When comparing values and the NULL designation there are three possible combinations: a value vs a value, a value vs a NULL and a NULL vs a NULL. Lets look at each of these.

Value vs. Value (Can return true or false)
1 = 1 — True
1 = 2 — False
1 <> 1 — False
1 <> 2 — True
” = 1 — False

Value vs. Null (Always false)
1 = NULL — False
1 <> NULL — False
” <> NULL — False as empty does not, not equal NULL
” = NULL — False as empty does not equal NULL

NULL vs. NULL (Always false)
NULL = NULL — False
NULL <> NULL — False

In other words, comparing an unknown, to anything else whether it be known, or unknown, will always have a result of unknown.

If we try to add a NULL, to either a value or another unknown, the result will also always be unknown. For example:

1 + NULL = NULL
NULL – 1 = NULL
1 * NULL = NULL
1 / NULL = NULL
NULL / 1 = NULL
NULL + NULL = NULL

A NULL, because it’s unknowable, is always unknowable even if you add a known value to it. So how do you deal with a NULL in data? By Testing if a field “IS NULL” (unknown), or “IS NOT NULL” (known/value).

1 IS NULL — False
1 IS NOT NULL — True
NULL IS NULL — True
NULL IS NOT NULL — False

In a way, you have two things going on with the data: known values, which work one way, and unknowns, which work another way and are tested for in a different way.

=============================

Also, something I didn’t include in the post, is what happens with SET operators. SET operators work in an opposite fashion so that a NULL will equal a NULL. For example in SQL Server:

SELECT NULL
UNION
SELECT NULL

will return a single NULL value, however, in theory, it should return two NULL values because a NULL cannot equal a NULL.

SELECT NULL
INTERSECT
SELECT NULL

Should return nothing but it will, in fact, return a single NULL value.

I don’t know why this is done this way, I’ll have to do some research one of these days, but as far as NULL values go, just know that it is so.

11 Jan

My Reporting Structure At A Prior Company

For a recent job interview I put together a short 2-page summary of the things I’d done at Higgins. Mainly this is a data flow diagram showing where the data started and ended. I was involved in all of it (ok, not every single piece, I had an IT manager who liked command line programming) but for the most part I did all of the design, programming, testing and implementation for our reporting.

Reporting (PDF)

What follows is an explanation that I sent to a hiring manager.

The main business ERP was developed by a company called Activant (now Epicor) Falcon. It was a lumber based ERP (used by Higgins and Golden State) that was developed in and ran on Progress (the database and language are tied together in many ways). I don’t have the total database size but it would have been several hundred gigabytes. This is where the majority of our data was stored. The Pella side of the business was much smaller and did most of their analysis in an Access database. This was merged with our data.

Data was pulled from the Falcon ERP in multiple ways and used in multiple systems as follows:

  • Ad-hoc Reporting using Cyberquery – This was standard reporting (Excel, PDF, command line, automated emails, etc.).
  • Cyberquery hold files – These were flat-file based data extracts that were very fast. We used these extensively.
  • Cyberquery data extracts – These were extracted CSV files that were eventually imported into our intranet (Horde application framework – LAMP based) for reporting, CRM (purchasing used SugarCRM), Sharepoint (KPI’s displayed there) and web-based reporting for our on-line customers.
  • We did occasionally use an ODBC connection to access this data directly from PHP but it was awkward so in general we stuck to exporting it into MySQL.

We moved the data around, for the most part, using UNIX/Linux commands (rsync, cat, mysqlimport). Cyberquery also allowed you to create macros which ran AIX commands.

This data wound up in two main interfaces (company websites and our intranet which ran on the Horde application framework). The following are all LAMP based except for some minor reporting done using SQL Server and ASP.

  • Our company websites (www.goldenstateflooring.com and www.higginshardwoods.com) where customers could see their statements, invoice history, payment history and view product information on our blowout sku’s. Everything on those sites was designed and built by me except the appearance.
  • One-Stop Reporting (see the attachment for more information). The concept behind One-Stop was to take multiple reports from our ERP (which often had very poor indexing) and combine them into a single interface (LAMP based) improving efficiency and speed (in some cases reducing a report from hours to seconds). It was focused on four specific areas: purchasing, sales, credit and customer data. For instance, One-Stop Sales allowed you to select from 15+ parameters (inside rep, outside rep, warehouse, sku) and run 6 different reports based around some of the same factors. So it was possible to see sales by rep by warehouse, sales by rep for a given sku, sales by inside rep for a sku, etc. Data was updated once/day.
  • There was also customized sales reporting for the outside sales team, including automated emails, that allowed a salesperson to view all of his sales, customer and credit information. Management could access it to see a salesperson’s performance.
  • We also did some work with exporting data to Sharepoint and SugarCRM. The latter was used by purchasing and we exported PO data to it and the former was for some simple KPI’s for individual warehouses.

This is a bit long but it does clarify the process we were using.

11 Jan

Creating a UNION statement in Cyberquery

I’m going to venture into something that is a small bit of risk, only because I no longer have access to Cyberquery documentation and that is how to perform a UNION using Cyberquery.  The reason for this is that, well, unless it’s been added, there is no such thing in Cyberquery.

In case you don’t know a UNION done to combine the results of two queries.  For instance, you had monthly historical sales data that you had broken off from the main table and now you needed to combine 2005 and 2006 data for some reason.  In MySQL the syntax for a join is the following:

<pre>
mysql> SELECT 1, 2, 3 UNION SELECT 1, 2, 3;
+—+—+—+
| 1 | 2 | 3 |
+—+—+—+
| 1 | 2 | 3 |
+—+—+—+
1 row in set (0.00 sec)

mysql> SELECT 1, 2, 3 UNION ALL SELECT 1, 2, 3;
+—+—+—+
| 1 | 2 | 3 |
+—+—+—+
| 1 | 2 | 3 |
| 1 | 2 | 3 |
+—+—+—+
2 rows in set (0.00 sec)
</pre>

The first example uses the default syntax (you can also use the keyword DISTINCT after the UNION keyword) to produce the same result.  The ALL keyword combines all of the results and omitting it, or using the DISTINCT keyword, returns only unique results.  Also, to confuse the issue a tiny bit, Oracle has additional keywords like INTERSECT, MINUS and you can even do a MERGE instead of a UNION.

So, back to Cyberquery and creating a UNION in that environment.  Here is how I would go about creating the same queries in Cyberquery.  This example is based on a UNIX environment.  If you using it on a different platform the concepts will be the same but the commands will be different.

I would write 2 Cyberqueries:

var1 = 1;
var2 = 2;
var3 = 3;

List/hold=”query1″
var1
var2
var3

var1 = 1;
var2 = 2;
var3 = 3;

List/hold=”query2″
var1
var2
var3

This will create 2 files called query1.hf and query2.hf in, well, wherever your environment puts them.

To produce a UNION ALL from the 2 hold files.

mv query1.hf query1.bak
cat query2.hf >> query1.hf

This simply appends the data from query2 to query1 and any Cyberquery written against query1 would contain both results.

To produce a UNION DISTINCT.

mv query1.hf query1.bak
cat query2.hf >> query1.hf
uniq query1.hf > query3.hf
mv query3.hf query1.hf

In this case you are appending the duplicate data to the first file, then using the uniq command to create a new file with unique data in it, and then overwriting the first file with the uniq file.  Assuming the permissions are set correctly you shouldn’t have any problems doing this.  Oh, and the first mv command creates a backup in case you screw something up.

And, in case you are wondering, all a hold file is in the Cyberquery environment is a flat file with a definition file (hfd).  The definition file provides variable type and positional information so that a Cyberquery can read it and know what the data at column 7 means.

One last point.  Do not do this with different file layouts because the .hfd file won’t understand the positioning and you’ll get something akin to garbage when you try to run the Cyberquery.  The line lengths, and the data in each line, needs to be identical as far as positioning and type are concerned.  It will allow you to write the query, probably, based on the .hfd file but the results will be, at best, interesting.

Hope this helps the few remaining Cyberquery users out there.