16 May

SAS: NMISS and another installment of SAS vs. The World

First, lets look at a specific function, NMISS.  NMISS is used to count the number of missing records.  Think of it as a NULL counter. First, some data:

proc sql;
SELECT *
FROM mysql.demog
quit;

ID         Age  Sex      Date
-----------------------------
A001        21  m    05/22/07
A002        32  m    06/15/06
A003        24  f    08/17/07
A004         .       01/27/06
A005        44  f    02/24/05
A007        39  m    11/11/05

Here is NMISS in action.

proc sql;
SELECT COUNT(age), NMISS(age), COUNT(*) - COUNT(age)
FROM mysql.demog
quit;

----------------------------
5         1         1

Don’t get mad, I, intentionally, didn’t alias the columns. So, each part of the SELECT did the following:

COUNT(age)

Counted all columns that did not contain a NULL.

NMISS(age)

Counted all NULL values.

COUNT(*) – COUNT(age)

This part used COUNT(*), which counts all the rows in the table, including NULLs and then subtracts all the records in the age column that are not NULL. It is the equivalent of the NMISS function.

So, why didn’t I alias the columns? Because my first try didn’t work. Here’s a sample that I ran in MySQL (SQL Server does the same thing.

mysql> select count(age) AS 'count(age)' FROM junk;
+------------+
| count(age) |
+------------+
|         21 |
+------------+
1 row in set (0.00 sec)

All is good in the world. Now, let’s try that with SAS, because, as we all know by now, something weird is about to happen.

proc sql;
SELECT count(*) AS 'count(*)'
FROM mysql.demog
quit;

                              __________
                              22
ERROR 22-322: Expecting a name.

Yup, it errors. I guess it doesn’t like aliasing a function. So, what’s the workaround? You use the label command, which, I guess, explains why they have it.

proc sql;
SELECT count(*) label='count(*)'
FROM mysql.demog
quit;

count(*)
--------
       6

Man, SAS, is just a special little snowflake.

02 May

Days Between Two Dates (Excluding Weekends)

I’m still having fun with the SQL Cookbook (I need to review it and I think I’ve already said that) and he talks about calculating the difference between two dates, which is easy, but then excluding weekends, which is less easy, but not that hard.

Because I’m a Monday Morning SQL writer, does that analogy really work here, I’m going to offer a different solution to what the author proposed.  Technically, all I’m doing is changing the SQL, as this is basically what he proposes.  I’m not going to show his code in this case as what he did is a bit long and he’s doing a couple of other things in it so I’ll just show you how to do it.

This will go through a number of steps and follow a similar pattern to my last SQL example in that it will join to a sequence of numbers.

1. Calculate the number of days between two dates.

mysql> SELECT DATEDIFF('2007-08-01','2007-01-01') as diff;
+------+
| diff |
+------+
|  212 |
+------+

So, there are 212 days between the two dates. Simple enough but we still have to get rid of weekends, which will take a few more steps.

One other thing that is important is to consider whether you want to include the first date in our data. The answer is yes in this case so our numbers table will need to include a zero. I’ve added it to the table, however, yesterday’s example needed to start at 1, so it did not have a zero.  In other words, if you include the two date ranges there are actually 213 dates in the list.

2. Select data from the numbers table.

In a normal example we would have to do a join but because we’re calculating the difference between two dates all we need to do is a simple select.

mysql> SELECT DATEDIFF('2007-08-01','2007-01-01') as totDays,
    -> id as diff
    -> FROM numbers
    -> WHERE id <= DATEDIFF('2007-08-01','2007-01-01');
+---------+------+
| totDays | diff |
+---------+------+
|     212 |    0 |
|     212 |    1 |
|     212 |    2 |
|     212 |    3 |
|     212 |    4 |

Snipped...

|     212 |  208 |
|     212 |  209 |
|     212 |  210 |
|     212 |  211 |
|     212 |  212 |
+---------+------+
212 rows in set (0.00 sec)

So now we have a list of 212 numbers. All we need to do is convert those to dates as follows.

3. Build a list of dates.

mysql> SELECT DATEDIFF('2007-08-01','2007-01-01') as totDays,
    -> id as diff,
    -> ADDDATE('2007-01-01',id) as newDate
    -> FROM numbers
    -> WHERE id <= DATEDIFF('2007-08-01','2007-01-01');
+---------+------+------------+
| totDays | diff | newDate    |
+---------+------+------------+
|     212 |    0 | 2007-01-01 |
|     212 |    1 | 2007-01-02 |
|     212 |    2 | 2007-01-03 |
|     212 |    3 | 2007-01-04 |
|     212 |    4 | 2007-01-05 |

Snipped...

|     212 |  208 | 2007-07-28 |
|     212 |  209 | 2007-07-29 |
|     212 |  210 | 2007-07-30 |
|     212 |  211 | 2007-07-31 |
|     212 |  212 | 2007-08-01 |
+---------+------+------------+
212 rows in set (0.00 sec)

4. Evaluate each date using the DAYOFWEEK function.

We’re going to clean this up in a second, but for now we’ll simply add the DAYOFWEEK function to the code. Note that 1 and 7 are Sunday and Saturday.

mysql> SELECT DATEDIFF('2007-08-01','2007-01-01') as totDays,
    -> id as diff,
    -> ADDDATE('2007-01-01',id) as newDate,
    -> DAYOFWEEK(ADDDATE('2007-01-01',id)) as dayOfWeek
    -> FROM numbers
    -> WHERE id <= DATEDIFF('2007-08-01','2007-01-01');
+---------+------+------------+-----------+
| totDays | diff | newDate    | dayOfWeek |
+---------+------+------------+-----------+
|     212 |    0 | 2007-01-01 |         2 |
|     212 |    1 | 2007-01-02 |         3 |
|     212 |    2 | 2007-01-03 |         4 |
|     212 |    3 | 2007-01-04 |         5 |
|     212 |    4 | 2007-01-05 |         6 |

Snipped...

|     212 |  208 | 2007-07-28 |         7 |
|     212 |  209 | 2007-07-29 |         1 |
|     212 |  210 | 2007-07-30 |         2 |
|     212 |  211 | 2007-07-31 |         3 |
|     212 |  212 | 2007-08-01 |         4 |
+---------+------+------------+-----------+
212 rows in set (0.00 sec)

5. Remove Saturdays and Sundays.

In order to do this all we need to do is add a WHERE clause that checks for the values of 1 and 7 which are Saturday and Sunday with the DAYOFWEEK function.

mysql> SELECT DATEDIFF('2007-08-01','2007-01-01') as totDays,
    -> id as diff,
    -> ADDDATE('2007-01-01',id) as newDate,
    -> DAYOFWEEK(ADDDATE('2007-01-01',id)) as dayOfWeek
    -> FROM numbers
    -> WHERE id <= DATEDIFF('2007-08-01','2007-01-01')     
    -> AND DAYOFWEEK(ADDDATE('2007-01-01',id)) NOT IN (1,7);
+---------+------+------------+-----------+
| totDays | diff | newDate    | dayOfWeek |
+---------+------+------------+-----------+
|     212 |    0 | 2007-01-01 |         2 |
|     212 |    1 | 2007-01-02 |         3 |
|     212 |    2 | 2007-01-03 |         4 |
|     212 |    3 | 2007-01-04 |         5 |
|     212 |    4 | 2007-01-05 |         6 |

Snipped...

|     212 |  206 | 2007-07-26 |         5 |
|     212 |  207 | 2007-07-27 |         6 |
|     212 |  210 | 2007-07-30 |         2 |
|     212 |  211 | 2007-07-31 |         3 |
|     212 |  212 | 2007-08-01 |         4 |
+---------+------+------------+-----------+
153 rows in set (0.01 sec)

Notice that we’ve returned 153 rows in the set which is how many working days there are between the two dates. Now we just need to clean up the code so we can get the number of days between the two.

6. Clean up the code.

This is super easy. Just remove the entire SELECT and replace it with a COUNT(*).

mysql> SELECT COUNT(*) as numWorkingDays
    -> FROM numbers
    -> WHERE id <= DATEDIFF('2007-08-01','2007-01-01')     
    -> AND DAYOFWEEK(ADDDATE('2007-01-01',id)) NOT IN (1,7);
+----------------+
| numWorkingDays |
+----------------+
|            153 |
+----------------+
1 row in set (0.00 sec)

You can do the same thing in SQL Server, however, the code, is a bit different. Also, the DATEFIRST value can be set so that Sunday is not the first day in a week. I won’t cover that but just be aware that it can happen. The SQL Server default is the same as it is in MySQL.

Here is the SQL Server code.

SELECT COUNT(*) as numWorkingDays
FROM numbers
WHERE id <= DATEDIFF(day,'2007-01-01','2007-08-01')
AND DATEPART(weekday,DATEADD(day,id,'2007-01-01')) NOT IN (1,7);

numWorkingDays
153

As you can see the DATEDIFF is different, it requires you to specify day (days), and you have to use DATEPART to get the specific day from a date, and not just using DAYOFWEEK in MySQL. Otherwise, it’s the same general concept.

I really didn’t add anything beyond the book except that my example is a bit cleaner and isn’t trying to do two things.

Hope it helps as this does come up periodically in business calculations.

09 Mar

Word Count

Absent any weird collations, or other unexpected oddities, getting a word count is very easy.  Here’s the code.

DECLARE @string varchar(50) = 'Now   is the time''s for all good men';

SET @string = REPLACE(@string, '''','');

SELECT LEN(REPLACE(LTRIM(RTRIM(@string)),'  ','')) -
LEN(REPLACE(@string,' ','')) + 1 AS wordCount;

wordCount
8

What it does.

  1. The SET removes any apostrophes from the data.
  2. Trims leading and trailing spaces from the string.
  3. Removes all multiple space instances from the first operation but leaves single spaces.
  4. Removes all spaces from the second operation.
  5. Subtracts the length of the first operation from the second..
  6. Adds one to the result.

On the MySQL side of the house,

mysql> SET @string = 'Now   is\' the time for all good men';
Query OK, 0 rows affected (0.00 sec)

mysql>
mysql> SET @string = REPLACE(@string, '\'', '');
Query OK, 0 rows affected (0.00 sec)

mysql>
mysql> SELECT LENGTH(REPLACE(LTRIM(RTRIM(@string)),'  ','')) -
    -> LENGTH(REPLACE(@string,' ','')) + 1 AS wordCount;
+-----------+
| wordCount |
+-----------+
|         8 |
+-----------+
1 row in set (0.00 sec)

Pretty much the same thing except that LEN, is LENGTH, you use SET instead of DECLARE and the apostrophes are escaped with \’ rather than ”.