11 Sep

SQL Server: Parsing A String Into Rows

I haven’t written a SQL article in, well, quite awhile, and I guess this won’t really count as one either.  Anyways, I tossed this up over on the SQL Team forum and I liked it so I’m putting it here.

The solution relied on a couple of arcane concepts and functions.  First, I used PARSENAME which parses a string separated by periods.  It’s basically the same idea as SUBSTRING_INDEX in MySQL only a bit less flexible because of the period.

I’ve talked about walking the tree before but, in essence, it breaks down a delimited string by joining a table of numbers and using that sequence to get, say, the third delimited value.

You have the following string.


The first part is the ORDER_ID and the rest of the string is the SKU’s attached to that order.  Yeah, it’s a bit weird but you see things like this in converted older system or in some really ad hoc reporting tools.

Now, we want to convert that string as follows:

Order_ID	SKU
12345678 	SRPPR             
12345678 	SCNRQ
12345678 	SCINC

Obviously, that isn’t easy but here’s one way to do it.

DECLARE @var char(30);
DECLARE @var1 char(30);
SET @var = '12345678 SCINC, SCNRQ, SRPPR';
SET @var1 = REPLACE(REPLACE(@var,SUBSTRING(@var,1,CHARINDEX(' ',@var)),''),', ','.')

SELECT SUBSTRING(@var,1,CHARINDEX(' ',@var)-1) AS Order_ID,
PARSENAME(@var1,zombie) AS SKU
FROM walkers
WHERE LEN(@var) - LEN(REPLACE(@var,' ','')) >= zombie

First, I created a walkers table as follows:

CREATE TABLE walkers (zombie int);

Which I populated with a running sequence of numbers starting at 1.

SELECT * FROM walkers


Here’s what it does. @var1 does two things: it converts the comma/space to a period so PARSENAME can use it and it also removes the ORDER_ID field and using the variable makes it easier to explain. This is what it does:

SELECT REPLACE(REPLACE(@var,SUBSTRING(@var,1,CHARINDEX(' ',@var)),''),', ','.')


This code:

SUBSTRING(@var,1,CHARINDEX(' ',@var)-1) AS Order_ID

is a straight-forward grab of the order id from the string. Basically everything from the first space backwards.

Note: My original post or the forum didn’t include the -1.

Finally, this code

PARSENAME(@var1,zombie) AS SKU

uses the PARSENAME function and the zombie field from the walkers table.

This might make more sense if we change the code slightly to include the zombies field.

SELECT SUBSTRING(@var,1,CHARINDEX(' ',@var)-1) AS Order_ID,
PARSENAME(@var1,zombie) AS SKU, zombie
FROM walkers
WHERE LEN(@var) - LEN(REPLACE(@var,' ','')) >= zombie

Order_ID	SKU	zombie
12345678	SRPPR   1
12345678	SCNRQ	2
12345678	SCINC	3

Finally, this was for a single record. If we had full order entry system you would simply join the walkers table to the order data and limit the results to the number SKU’s for each record. My code does this but since it’s only for a single record it might seem confusing. Anyways, a full outer join or a non equi join would fit the bill.

I hope this isn’t too confusing. I did it on the fly and didn’t sequence it as well as I have in most of my examples. Honestly, I just liked the solution and wanted to add it here. Too often, I think, you see people run to a function to solve this kind of problem, especially on the SQL Server side of things when there are other ways to solve the problem.

07 May

Returning The Bottom Two Items (No One Will Like This)

In an interview, guys like to do slightly exotic things.  Often, they’re hitting you with a problem that is solved pretty easily on one platform, but not so easily on another.  For instance, finding the top, or bottom 2, of an ordered result is pretty easy with the ROW_NUMBER function, but it’s less easy in MySQL.  I guess I’m feeling persnickety (it’s good that I have a spell check) because I’m going to come up with an exotic answer to the problem.

By the way, this is, really, a very unfair interview question on the MySQL side of the house, except, maybe in rare cases.  The kind of guy who knows this sort of thing, especially this approach, in an interview, is a freak.  Period.  I had to process this in the background for awhile until I sat down and came up with it.  In fact, really, until the SQL Cookbook entered my life I had never really even used GROUP_CONCAT.

Note: This is a MySQL thing. In SQL Server you’d just throw out a ROW_NUMBER and move on.

So, we’ll be using my old military spending data.  Here it is, all normalized, clean, and showing incredible attention to detail (I copied it off the SIRI site, tossed it into an editor and created a table with it because I’m pro).

mysql> SELECT * FROM militaryspending;
| country                | spending | gdpperc | region        | id  |
| Seychelles             |      9.3 |     1.3 | Africa        |   1 |
| Cape Verde             |      9.7 |     0.5 | Africa        |   2 |
| Mauritius              |     10.1 |     0.1 | Africa        |   3 |
| Belize                 |     15.7 |     1.1 | North America |   4 |
| Moldova                |     20.8 |     0.3 | Europe        |   5 |
| France                 |    62535 |     2.3 | Europe        | 122 |
| United Kingdom         |    62685 |     2.6 | Europe        | 123 |
| Russia                 |    71853 |     3.9 | Asia          | 124 |
| China                  |   142859 |     2.1 | Asia          | 125 |
| United States          |   711421 |     4.8 | North America | 126 |
126 rows in set (0.01 sec)

Actually, I did clean it up.  It didn’t have an ID/primary key field before.

Anyways, we have 6 regions and we want the result set to only show the lowest two for each region.

To do this we’re first going to do something completely weird, use the GROUP_CONCAT function combined with the SUBSTRING_INDEX function. The code and the result will look like this.

mysql> SELECT region,
    -> SUBSTRING_INDEX(GROUP_CONCAT(country ORDER BY spending DESC),',',-2) AS b
    -> FROM militaryspending
    -> GROUP BY region;
| region        | bottomTwo                    |
| Africa        | Cape Verde,Seychelles        |
| Asia          | Mongolia,Timor Leste         |
| Australia     | New Zealand,Papua New Guinea |
| Europe        | Malta,Moldova                |
| North America | Nicaragua,Belize             |
| South America | Paraguay,Guyana              |
6 rows in set (0.00 sec)

So, technically, you have provided an answer to a sloppy interview question. These are, the two lowest military spenders by region.

What you still need to do is provide the results, in an ordered list, with spending data included.  They will ask for it, immediately.  Here’s how I did this although it’s not really an optimal solution by any means.

mysql> SELECT a.region, a.country, a.spending
    -> FROM militaryspending a
    -> (
    -> SELECT region,
    -> SUBSTRING_INDEX(GROUP_CONCAT(id ORDER BY spending DESC),',',-2) AS bottomTwo
    -> FROM militaryspending
    -> GROUP BY region
    -> ) b
    -> ON a.region = b.region
    -> WHERE FIND_IN_SET(a.id, b.bottomtwo) > 0
    -> ORDER BY a.region, a.spending;
| region        | country          | spending |
| Africa        | Seychelles       |      9.3 |
| Africa        | Cape Verde       |      9.7 |
| Asia          | Timor Leste      |     27.3 |
| Asia          | Mongolia         |     81.4 |
| Australia     | Papua New Guinea |     61.1 |
| Australia     | New Zealand      |     1792 |
| Europe        | Moldova          |     20.8 |
| Europe        | Malta            |     63.5 |
| North America | Belize           |     15.7 |
| North America | Nicaragua        |     54.7 |
| South America | Guyana           |     29.9 |
| South America | Paraguay         |      250 |
12 rows in set (0.00 sec)

I should probably explain this.

First, I join the earlier query to the militaryspending table in a straight old inner join. This give me 126 results with the country and my GROUP_CONCAT tied together. Then all I do is write the WHERE clause by using FIND_IN_SET to go hunting for a country in the GROUP_CONCAT results. If it finds something it will return a result greater than 1, and there you go.

If you do this with variables, and some IF magic, that will produce a better result because you won’t need the join. On the other hand, if you toss this at an interviewer, unless they are very good, they will have no idea what you just hit them with.  You’ll either get big points for teaching them something, or, they’ll bounce you, for teaching them something.

Have fun!