01 Jan

Sticky: The SAS Section

I’ve had the SAS Base certification study guide for more than a year but I didn’t go “all the way” with it. It’s not a hard language but it is abstract and I’m not certain I’ll ever use it.  However, I’m burned out on SSIS writing, and by extension the next 2 projects on the list (SSAS and SSRS) so I’m going to slay the SAS Base beast, as a break.  Since the “write all kinds of blog posts” worked so well for SSIS that’s what I’m going to do here.  If everything goes well, I’ll be in and out in less than 2-weeks.

These posts should be shorter than what I did in the SSIS section.  SAS, at least at this level, doesn’t need nearly as much step-by-step explanation as SSIS.  I can imagine doubling my post count, yet, reducing my word count  by half.  At least I hope that is how it works.

Also, because I’m writing this as I go through the Study Guide, it will tend to follow that sequence.  As such I may skip things that you feel deserve more detail or even repeat some topics.

Finally, this is test preparation, so I’m not attempting to delve into every detail of a topic.  If it helps you, great, I hope it does, but, if it’s not as deep as you’d like, there are other options available.

26 May

CREATE TABLE AS

Creating a table is, more or less, similar among the three variants that I write about.  You create a table and you roll with it.  However, the CTAS statement (CREATE TABLE AS) is handled somewhat differently amongst them.  So, I’m going to hit that one up.

Really, I’m just writing this up because outside of MySQL I always forget this syntax. Also, when you create a table this way, it won’t include

MySQL

Create a copy of a a table with all of its columns included.

mysql> CREATE TABLE myjunk1 AS SELECT * FROM myjunk;
Query OK, 3 rows affected (0.08 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM myjunk1;
+------+--------+
| name | number |
+------+--------+
| A    |     11 |
| B    |     22 |
| C    |     33 |
+------+--------+
3 rows in set (0.00 sec)

Create a copy of a table, however, do not include any of its columns.

mysql> CREATE TABLE myjunk1 AS SELECT * FROM myjunk WHERE 1 = 2;
Query OK, 0 rows affected (0.12 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM myjunk1;
Empty set (0.00 sec)

mysql> DESC myjunk1;
+--------+---------+------+-----+---------+-------+
| Field  | Type    | Null | Key | Default | Extra |
+--------+---------+------+-----+---------+-------+
| name   | char(1) | YES  |     | NULL    |       |
| number | int(11) | YES  |     | NULL    |       |
+--------+---------+------+-----+---------+-------+
2 rows in set (0.00 sec)

Also constraints are not included, for example.

mysql> ALTER TABLE myjunk ADD PRIMARY KEY(number);
Query OK, 3 rows affected (0.19 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> CREATE TABLE myjunk1 AS SELECT * FROM myjunk;
Query OK, 3 rows affected (0.08 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> DESC myjunk1;
+--------+---------+------+-----+---------+-------+
| Field  | Type    | Null | Key | Default | Extra |
+--------+---------+------+-----+---------+-------+
| name   | char(1) | YES  |     | NULL    |       |
| number | int(11) | NO   |     | 0       |       |
+--------+---------+------+-----+---------+-------+
2 rows in set (0.01 sec)

mysql> desc myjunk;
+--------+---------+------+-----+---------+-------+
| Field  | Type    | Null | Key | Default | Extra |
+--------+---------+------+-----+---------+-------+
| name   | char(1) | YES  |     | NULL    |       |
| number | int(11) | NO   | PRI | 0       |       |
+--------+---------+------+-----+---------+-------+
2 rows in set (0.00 sec)

Note that the primary key is not included in myjunk1.

I could go on but this is to help me remember the syntax.  You can read more here:

http://dev.mysql.com/doc/refman/5.6/en/create-table.html

SQL Server

Conceptually, SQL Server works in a similar fashion to MySQL, except, that they have their own syntax for the process. This won’t work:

CREATE TABLE myjunk1 AS SELECT * FROM myjunk;

Msg 156, Level 15, State 1, Line 1
Incorrect syntax near the keyword 'SELECT'.

Instead you need do something like this:

SELECT *
INTO myjunk1
FROM myjunk
WHERE number > 22;
SELECT * FROM myjunk1;

Name	Number
C	33

I had a really hard time remembering this syntax at first because I was so used to the CTAS variant.

http://msdn.microsoft.com/en-us/library/ms188029.aspx

SAS

SAS allows the CTAS variant, so I won’t repeat that but it also does something else. When you are creating a copy of a table, but don’t want to include any of it’s data, you can do the following:

proc sql;
CREATE TABLE work.myjunk
(
	name char(1),
	number int
);
INSERT INTO work.myjunk VALUES ('A',1)
                        VALUES ('B',2)
                        VALUES ('C',3);
CREATE TABLE work.myjunk1
LIKE work.myjunk;

DESCRIBE TABLE work.myjunk1;
quit;

From the log.

31         DESCRIBE TABLE work.myjunk1;
NOTE: SQL table WORK.MYJUNK1 was created like:

create table WORK.MYJUNK1( bufsize=131072 )
  (
   name char(1),
   number num
  );

Constraints, more or less, appear to follow the same rules as they do in the other variants.

All Variants – Adding A New Column

You can also create a new column simply by adding it to the result set. Be careful about using apostrophes to name the new column as it might not work.

mysql> CREATE TABLE myjunk1 AS
    -> SELECT name, number, 123 AS newCol
    -> FROM myjunk;
Query OK, 3 rows affected (0.10 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> DESCRIBE myjunk1;
+--------+---------+------+-----+---------+-------+
| Field  | Type    | Null | Key | Default | Extra |
+--------+---------+------+-----+---------+-------+
| name   | char(1) | NO   |     | NULL    |       |
| number | int(11) | NO   |     | 0       |       |
| newCol | int(3)  | NO   |     | 0       |       |
+--------+---------+------+-----+---------+-------+
3 rows in set (0.00 sec)

mysql> SELECT * FROM myjunk1;
+------+--------+--------+
| name | number | newCol |
+------+--------+--------+
| A    |     11 |    123 |
| B    |     22 |    123 |
| C    |     33 |    123 |
+------+--------+--------+
3 rows in set (0.00 sec)

You can also CAST / CONVERT the data type. For example.

mysql> CREATE TABLE myjunk1 AS
    -> SELECT name, number, CAST(123 AS char) AS newCol
    -> FROM myjunk;
Query OK, 3 rows affected (0.12 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> DESC myjunk1;
+--------+------------+------+-----+---------+-------+
| Field  | Type       | Null | Key | Default | Extra |
+--------+------------+------+-----+---------+-------+
| name   | char(1)    | NO   |     | NULL    |       |
| number | int(11)    | NO   |     | 0       |       |
| newCol | varchar(3) | YES  |     | NULL    |       |
+--------+------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

I’m pretty sure that SAS doesn’t work that way but I’m not ready to write that today as I’m still learning that side of the house.  Another day, another post.

25 May

SAS: INSERT / VALUES

Originally, I’d planned this post to be about using CASE in an UPDATE.  However, strangely, unexpectedly,  it turned out that the big 3 all used the same syntax.  So, this will just be a quick overview of the differences between SAS and MySQL / SQL Server using INSERT.

Here’s some SAS code that creates a table, inserts some data and then updates it.

proc sql;
CREATE TABLE work.myjunk
(
	name CHAR(1),
	number INT
);
INSERT INTO work.myjunk VALUES ('A',1)
                        VALUES ('B',2)
                        VALUES ('C',3);
UPDATE work.myjunk
SET number =
CASE WHEN name = 'A' then 11
     WHEN name = 'B' then 22
	 ELSE 33
	 END;
SELECT * FROM work.myjunk;
quit;

name    number
--------------
A           11
B           22
C           33

There are two things worth noting in this code. First the INT data type. SAS accepts it, 9 others according to the Study Guide which it will then convert to one of the two SAS data types: number and character. In this case, INT is converted to a number.

I, finally, got smart and created a work database for MySQL and work schema for SQL Server.  It should help my code look a bit more consistent, even if work. means one thing in SAS, another in MySQL and still another in SQL Server.

The other thing, which is what led me to write this post, is that SAS uses a VALUES clause before each row in an INSERT. As you’ll see in the MySQL code (it’s the same with SQL Server) we only use a single VALUES clause.

mysql> CREATE TABLE work.myjunk
    -> (
    ->  name char(1),
    ->  number int
    -> );
Query OK, 0 rows affected (0.06 sec)

mysql> INSERT INTO work.myjunk VALUES ('A',1),('B',2),('C',3);
Query OK, 3 rows affected (0.02 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> UPDATE work.myjunk
    -> SET number =
    -> CASE WHEN name = 'A' then 11
    ->      WHEN name = 'B' then 22
    ->   ELSE 33
    ->   END;
Query OK, 3 rows affected (0.02 sec)
Rows matched: 3  Changed: 3  Warnings: 0

mysql> SELECT * FROM work.myjunk;
+------+--------+
| name | number |
+------+--------+
| A    |     11 |
| B    |     22 |
| C    |     33 |
+------+--------+
3 rows in set (0.00 sec)

Hopefully, this makes sense.  You use a single VALUES clause for SQL Server / MySQL and a VALUES clause for each row with SAS.  And, just be aware that you can use a CASE statement in an UPDATE.

24 May

INSERT with a SET

This is something I did not know you could do, but you can use SET for an INSERT statement.  I guess I’m a noob.  Anyways, here is how it works in the three SQL variants that I write about.

SQL Server

INSERT INTO myjunk1
SET id = 1;

Msg 156, Level 15, State 1, Line 2
Incorrect syntax near the keyword 'set'.

I think it’s time I deleted the tables I’ve named junk. Anyways, no surprise that SQL Server doesn’t like it.

MySQL

mysql> INSERT INTO myjunk
    -> SET name = 'E',
    -> number = 6;
Query OK, 1 row affected (0.04 sec)

Hmmm, another junk. Anyways, it works fine here. I’ll give more detail in the next section as I’m really writing this article for SAS.

SAS

proc sql;
CREATE TABLE work.myjunk
(
	id num
);
INSERT INTO myjunk
SET id = 2;
SELECT * 
FROM work.myjunk;
quit;

      id
--------
       2

You can also add multiple values to the table as follows:

proc sql;
CREATE TABLE work.myjunk
(
	id num
);
INSERT INTO myjunk
SET id = 3
SET id = 4
SET id = 5;
SELECT * 
FROM work.myjunk;
quit;

      id
--------
       3
       4
       5

Interestingly, you can recreate a table, albeit this is a temporary table, by running a CREATE TABLE on top of it, which is why there are only 3 values in this table. If you have multiple columns in the table, you treat them exactly like you would an UPDATE statement.

proc sql;
CREATE TABLE work.myjunk
(
	id num,
	id1 num
);
INSERT INTO myjunk
SET id = 3,
   id1 = 13
SET id = 4,
   id1 = 14
SET id = 5,
   id1 = 15;
SELECT * 
FROM work.myjunk;
quit;

      id       id1
------------------
       3        13
       4        14
       5        15

Anyways, that’s pretty much it for INSERT / SET. It’ll probably come up on the SAS Advanced test, assuming I get there, and I have to admit that I’d have blown this one.

23 May

SET Operators: The SAS Special

SAS has a couple of twists worth mentioning in a bit more detail.

ALL

The difference between ALL, and an operator without it, is that ALL returns all records and does not uniquify (my spell check says that isn’t a word, but, we know better) the result set.  This includes duplicate records in either result set that it applies to.

It cannot be used with an OUTER UNION.

CORR / CORRESPONDING

Corr is something I’d never seen before and depending on the operator it behaves differently.

  • EXCEPT, UNION and INTERSECT: It removes any column that does not have the same name.
  • OUTER UNION: It overlays columns with the same name.

Corr will also use an alias when it’s assigned to a column.

Here are some examples of CORR in action.

This is a UNION and CORR removes both the age and the sex columns from the report.

proc sql;
SELECT name, age
FROM mysql.admit
WHERE sex = 'F'
UNION CORR
SELECT name, sex
FROM mysql.admit
WHERE sex = 'F';
quit;

Name
--------------
Almers, C     
Bonaventure, T
Eberhardt, S  
Ivan, H       
Johnson, R    
Nunnelly, A   
Oberon, M     
Quigley, M    
Reberson, P   
Takahashi, Y  
Wilcox, E

CORR can be used with ALL. Here, because we use CORR the age and sex columns are removed.

proc sql;
SELECT name, age
FROM mysql.admit
WHERE sex = 'F'
UNION ALL CORR
SELECT name, sex
FROM mysql.admit
WHERE sex = 'F';
quit;

Name
--------------
Almers, C     
Bonaventure, T
Eberhardt, S  
Ivan, H       
Johnson, R    
Nunnelly, A   
Oberon, M     
Quigley, M    
Reberson, P   
Takahashi, Y  
Wilcox, E  
Almers, C     
Bonaventure, T
Eberhardt, S  
Ivan, H       
Johnson, R    
Nunnelly, A   
Oberon, M     
Quigley, M    
Reberson, P   
Takahashi, Y  
Wilcox, E

Now, so it’s clear, here is an OUTER JOIN with CORR, written as CORRESPONDING.

proc sql;
SELECT name, age
FROM mysql.admit
WHERE sex = 'F'
OUTER UNION CORRESPONDING
SELECT name, sex
FROM mysql.admit
WHERE sex = 'F';
quit;

Name                 Age  Sex
-----------------------------
Almers, C             34     
Bonaventure, T        31     
Johnson, R            43     
Reberson, P           32     
Eberhardt, S          49     
Nunnelly, A           44     
Oberon, M             28     
Quigley, M            40     
Takahashi, Y          43     
Ivan, H               22     
Wilcox, E             41     
Almers, C              .  F  
Bonaventure, T         .  F  
Johnson, R             .  F  
Reberson, P            .  F  
Eberhardt, S           .  F  
Nunnelly, A            .  F  
Oberon, M              .  F  
Quigley, M             .  F  
Takahashi, Y           .  F  
Ivan, H                .  F  
Wilcox, E              .  F

Here is a link that goes into a lot more detail on the keywords and SET operators from the SAS support site.

http://support.sas.com/documentation/cdl/en/sqlproc/62086/HTML/default/viewer.htm#a001361224.htm

22 May

SET Operators: Intersect

INTERSECT, is essentially the reverse of EXCEPT. What it does is return unique records that match in the two tables that intersect. For example,

SELECT name, sex, age
FROM junk
INTERSECT
SELECT name, sex, age
FROM allMale

name	        sex	age
Cameron, L	M	47
Derber, B	M	25
Jones, M	M	29
King, E	        M	35
LaMance, K	M	51
Murray, W	M	27
Peterson, V	M	30
Pitts, D	M	34
Underwood, K	M	60
Warren, C	M	54

In reality it’s simply an inner join. In fact, that’s one way we would write this query in MySQL.

mysql> SELECT j.name, j.sex, j.age
    -> FROM junk j
    -> INNER JOIN allMale a
    -> USING (name, sex, age);
+--------------+------+------+
| name         | sex  | age  |
+--------------+------+------+
| Murray, W    | M    |   27 |
| LaMance, K   | M    |   51 |
| Jones, M     | M    |   29 |
| King, E      | M    |   35 |
| Pitts, D     | M    |   34 |
| Peterson, V  | M    |   30 |
| Cameron, L   | M    |   47 |
| Underwood, K | M    |   60 |
| Derber, B    | M    |   25 |
| Warren, C    | M    |   54 |
+--------------+------+------+
10 rows in set (0.00 sec)

That can also be written a couple of other ways.

mysql> SELECT j.name, j.sex, j.age
    -> FROM junk j, allMale a
    -> WHERE j.name = a.name AND j.sex = a.sex AND j.age = a.age;
+--------------+------+------+
| name         | sex  | age  |
+--------------+------+------+
| Murray, W    | M    |   27 |
| LaMance, K   | M    |   51 |
| Jones, M     | M    |   29 |
| King, E      | M    |   35 |
| Pitts, D     | M    |   34 |
| Peterson, V  | M    |   30 |
| Cameron, L   | M    |   47 |
| Underwood, K | M    |   60 |
| Derber, B    | M    |   25 |
| Warren, C    | M    |   54 |
+--------------+------+------+
10 rows in set (0.00 sec)

Or,

mysql> SELECT DISTINCT j.name, j.sex, j.age
    -> FROM junk j
    -> WHERE EXISTS (SELECT name FROM allMale WHERE j.name = name AND j.sex = sex 
       and j.age = age);
+--------------+------+------+
| name         | sex  | age  |
+--------------+------+------+
| Murray, W    | M    |   27 |
| LaMance, K   | M    |   51 |
| Jones, M     | M    |   29 |
| King, E      | M    |   35 |
| Pitts, D     | M    |   34 |
| Peterson, V  | M    |   30 |
| Cameron, L   | M    |   47 |
| Underwood, K | M    |   60 |
| Derber, B    | M    |   25 |
| Warren, C    | M    |   54 |
+--------------+------+------+
10 rows in set (0.00 sec)

One other note, because SAS, always is just so cute. You can write an INTERSECT with SAS as follows:

proc sql;
SELECT name, age
FROM mysql.admit
WHERE sex = 'M'
INTERSECT ALL
SELECT name, age
FROM mysql.admit;
quit;

Name                 Age
------------------------
Cameron, L            47
Derber, B             25
Jones, M              29
King, E               35
LaMance, K            51
Murray, W             27
Peterson, V           30
Pitts, D              34
Underwood, K          60
Warren, C             54

“We’re developing a new citizenry. One that will be very selective about cereals and automobiles, but won’t be able to think.”
Rod Serling

SAS is particularly vague about what ALL does in the case of INTERSECT but it’s essentially the same as UNION and UNION ALL. In other words, it returns all records that match both tables in the INTERSECT rather than only unique records which is what a simple INTERSECT would do.

SQL Server does not support INTERSECT ALL and MySQL does not support INTERSECT at all.

I hope this is clear to anyone reading this. I always found SET operators really clean. Unfortunately, we’re about to enter the Twilight Zone as SAS has one that is unique to them, but then, that’s “situation normal” for SAS.

21 May

SET Operators: OUTER UNION

This is something that I’ve never seen before.  It’s a SAS operator, well, at least in the context of this article, maybe another platform has used it as some point.

What it does is concatenate tables.  It’s not complex, and I don’t see too many tricks to it, which is a pleasant change with SAS, so a couple of examples should suffice.

proc sql;
SELECT name, sex
FROM mysql.admit
WHERE sex = 'F'
OUTER UNION 
SELECT name, age
FROM mysql.admit
WHERE sex = 'F';
quit;

Name            Sex  Name                 Age
---------------------------------------------
Almers, C       F                           .
Bonaventure, T  F                           .
Johnson, R      F                           .
Reberson, P     F                           .
Eberhardt, S    F                           .
Nunnelly, A     F                           .
Oberon, M       F                           .
Quigley, M      F                           .
Takahashi, Y    F                           .
Ivan, H         F                           .
Wilcox, E       F                           .
                     Almers, C             34
                     Bonaventure, T        31
                     Johnson, R            43
                     Reberson, P           32
                     Eberhardt, S          49
                     Nunnelly, A           44
                     Oberon, M             28
                     Quigley, M            40
                     Takahashi, Y          43
                     Ivan, H               22
                     Wilcox, E             41

As you can see OUTER UNION simply appends the data and creates new columns for the second table. The other thing it does is set values to missing in the new, and old, rows. Bizarrely, the result set doesn’t seem to care if you have the same name. It just happily returns the new column. Fortunately, SAS has a solution for this: CORR.

CORR

I’m going to deal more with CORR in a future post but this is a useful place to put it because it’s about the only way that a OUTER UNION makes sense. What CORR does is to combine records that have the same column name. For example, we could change the prior query as follows:

proc sql;
SELECT name, sex
FROM mysql.admit
WHERE sex = 'F'
OUTER UNION CORR
SELECT name, age
FROM mysql.admit
WHERE sex = 'F';
quit;

Name            Sex       Age
-----------------------------
Almers, C       F           .
Bonaventure, T  F           .
Johnson, R      F           .
Reberson, P     F           .
Eberhardt, S    F           .
Nunnelly, A     F           .
Oberon, M       F           .
Quigley, M      F           .
Takahashi, Y    F           .
Ivan, H         F           .
Wilcox, E       F           .
Almers, C                  34
Bonaventure, T             31
Johnson, R                 43
Reberson, P                32
Eberhardt, S               49
Nunnelly, A                44
Oberon, M                  28
Quigley, M                 40
Takahashi, Y               43
Ivan, H                    22
Wilcox, E                  41

CORR will also happily duplicate records. For example if we modify the earlier examples to use the same query it will do the following:

proc sql;
SELECT name, age
FROM mysql.admit
WHERE sex = 'F'
OUTER UNION CORR
SELECT name, age
FROM mysql.admit
WHERE sex = 'F';
quit;

Name                 Age
------------------------
Almers, C             34
Bonaventure, T        31
Johnson, R            43
Reberson, P           32
Eberhardt, S          49
Nunnelly, A           44
Oberon, M             28
Quigley, M            40
Takahashi, Y          43
Ivan, H               22
Wilcox, E             41
Almers, C             34
Bonaventure, T        31
Johnson, R            43
Reberson, P           32
Eberhardt, S          49
Nunnelly, A           44
Oberon, M             28
Quigley, M            40
Takahashi, Y          43
Ivan, H               22
Wilcox, E             41

That’s pretty much it for a OUTER UNION. But, just for kicks, I did a cheesy version of it in MySQL. I decided to not get cute and did it with a UNION ALL.

mysql> SELECT name, age, '', ''
    -> FROM junk
    -> WHERE sex = 'F'
    -> UNION ALL
    -> SELECT '', '', name, age
    -> FROM junk
    -> WHERE sex = 'F';
+----------------+------+----------------+------+
| name           | age  |                |      |
+----------------+------+----------------+------+
| Almers, C      | 34   |                |      |
| Bonaventure, T | 31   |                |      |
| Johnson, R     | 43   |                |      |
| Reberson, P    | 32   |                |      |
| Eberhardt, S   | 49   |                |      |
| Nunnelly, A    | 44   |                |      |
| Oberon, M      | 28   |                |      |
| Quigley, M     | 40   |                |      |
| Takahashi, Y   | 43   |                |      |
| Ivan, H        | 22   |                |      |
| Wilcox, E      | 41   |                |      |
|                |      | Almers, C      | 34   |
|                |      | Bonaventure, T | 31   |
|                |      | Johnson, R     | 43   |
|                |      | Reberson, P    | 32   |
|                |      | Eberhardt, S   | 49   |
|                |      | Nunnelly, A    | 44   |
|                |      | Oberon, M      | 28   |
|                |      | Quigley, M     | 40   |
|                |      | Takahashi, Y   | 43   |
|                |      | Ivan, H        | 22   |
|                |      | Wilcox, E      | 41   |
+----------------+------+----------------+------+
22 rows in set (0.00 sec)

And, if you decide to use CORR, it’s can literally be the same thing as a UNION ALL.

mysql> SELECT name, age
    -> FROM junk
    -> WHERE sex = 'F'
    -> UNION ALL
    -> SELECT name, age
    -> FROM junk
    -> WHERE sex = 'F';
+----------------+------+
| name           | age  |
+----------------+------+
| Almers, C      |   34 |
| Bonaventure, T |   31 |
| Johnson, R     |   43 |
| Reberson, P    |   32 |
| Eberhardt, S   |   49 |
| Nunnelly, A    |   44 |
| Oberon, M      |   28 |
| Quigley, M     |   40 |
| Takahashi, Y   |   43 |
| Ivan, H        |   22 |
| Wilcox, E      |   41 |
| Almers, C      |   34 |
| Bonaventure, T |   31 |
| Johnson, R     |   43 |
| Reberson, P    |   32 |
| Eberhardt, S   |   49 |
| Nunnelly, A    |   44 |
| Oberon, M      |   28 |
| Quigley, M     |   40 |
| Takahashi, Y   |   43 |
| Ivan, H        |   22 |
| Wilcox, E      |   41 |
+----------------+------+
22 rows in set (0.00 sec)

Happy Hunting!

20 May

SET Operators: EXCEPT

I was thinking this would be hard going into it, but, it’s actually really easy, except, that EXCEPT doesn’t exist in SQL Server.

EXCEPT is very simple.  What it does is return every unique row in the first result set, that is not in the second result set.  Here are two very simple examples:

SELECT 1 AS r1,2 AS r2,3 AS r3
EXCEPT
SELECT 3,4,5;

r1	r2	r3
1	2	3

SELECT 1 AS r1,2 AS r2,3 AS r3
EXCEPT
SELECT 1,2,3;

r1	r2	r3

Those are simple queries but they show how the EXCEPT operator works. Now, what happens if we have records that are duplicates in the first table? Well, this happens:

SELECT 1 AS r1,2 AS r2,3 AS r3
UNION ALL
SELECT 1,2,3
r1	r2	r3
1	2	3
1	2	3

(
SELECT 1 AS r1,2 AS r2,3 AS r3
UNION ALL
SELECT 1,2,3
)
EXCEPT
SELECT 3,4,5

r1	r2	r3
1	2	3

That’s really clean compared to most of my posts, and, it’s going to stay that way. The reason? I’m not going to mess with SAS and I’m going to let MySQL go right ahead not doing it. If you decide to do it in MySQL, and one of these days, I’ll do it, you need to get a bit creative with a JOIN or a couple of other options such as a correlated subquery and NOT EXISTS.

OK, I talked myself into it, but only because I literally just did this the other day.

First, I’ll create a new table that has just the records where sex = ‘M’

mysql> CREATE TABLE allMale AS SELECT * FROM junk WHERE sex = 'M';
Query OK, 10 rows affected (0.11 sec)
Records: 10  Duplicates: 0  Warnings: 0

mysql> SELECT name, sex, age FROM allMale;
+--------------+------+------+
| name         | sex  | age  |
+--------------+------+------+
| Murray, W    | M    |   27 |
| LaMance, K   | M    |   51 |
| Jones, M     | M    |   29 |
| King, E      | M    |   35 |
| Pitts, D     | M    |   34 |
| Peterson, V  | M    |   30 |
| Cameron, L   | M    |   47 |
| Underwood, K | M    |   60 |
| Derber, B    | M    |   25 |
| Warren, C    | M    |   54 |
+--------------+------+------+
10 rows in set (0.00 sec)

Maybe I shouldn’t have called these tables junk?

Anyways, here is a query in SQL Server using the EXCEPT operator.

SELECT name, sex, age
FROM junk
EXCEPT
SELECT name, sex, age
FROM allMale

name	        sex	age
Almers, C	F	34
Bonaventure, T	F	31
Eberhardt, S	F	49
Ivan, H	        F	22
Johnson, R	F	43
Nunnelly, A	F	44
Oberon, M	F	28
Quigley, M	F	40
Reberson, P	F	32
Takahashi, Y	F	43
Wilcox, E	F	41

Now, for the MySQL version.

mysql> SELECT DISTINCT name, sex, age
    -> FROM JUNK j
    -> WHERE NOT EXISTS (SELECT name FROM allMale WHERE j.name = name AND j.
age = age AND j.weight = weight);
+----------------+------+------+
| name           | sex  | age  |
+----------------+------+------+
| Almers, C      | F    |   34 |
| Bonaventure, T | F    |   31 |
| Johnson, R     | F    |   43 |
| Reberson, P    | F    |   32 |
| Eberhardt, S   | F    |   49 |
| Nunnelly, A    | F    |   44 |
| Oberon, M      | F    |   28 |
| Quigley, M     | F    |   40 |
| Takahashi, Y   | F    |   43 |
| Ivan, H        | F    |   22 |
| Wilcox, E      | F    |   41 |
+----------------+------+------+
11 rows in set (0.00 sec)

I had to write that about 10 times because I kept forgetting to change the SELECT clause. Yeah, for me, SELECT = ‘HARD’, EXISTS = ‘Easy’.

The only part of this query that really matters is the WHERE clause. Essentially it uses a correlated subquery to compare the two data sets and where the record NOT EXISTS in allMale, it is returned to the result set. It’s not very clean, and all things being equal, I’d rather use EXCEPT.

OK, the SELECT matters too.  I have to use the DISTINCT there so the query won’t return duplicate values, or, as the documentation I’ve read for EXCEPT says “unique values only”.

And just for kicks, Oracle doesn’t have an EXCEPT operator either. Over on that part of the ranch, they roll with MINUS.

Fortunately, I’m done with EXCEPT, except, until I do the “SAS is special” operator article.

19 May

SET Operators: UNION

The easiest set operator to deal with is a UNION.  It’s basically a data set appended to a data set.  It has a couple of minor quirks, column naming, ordering, and then, of course, SAS picked the path least chosen, but for the most part, anytime you see a UNION, or a UNION ALL, it’s going to work the same way.

The code for this example, except where noted, is in MySQL.

Here is a very simple example of a UNION.

mysql> SELECT 1,2,3
    -> UNION
    -> SELECT 4,5,6;
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 1 | 2 | 3 |
| 4 | 5 | 6 |
+---+---+---+
2 rows in set (0.00 sec)

As you can see the second SELECT is added to the first set. Typically, the data types do not have to match, as automatic type conversion will take place. Except, of course, if you are working with SAS, where the following will happen:

proc sql;
SELECT name, age
FROM mysql.admit
UNION
SELECT age, name
FROM mysql.admit;
quit;

ERROR: Column 1 from the first contributor of UNION is not the same type as its counterpart from the second.
ERROR: Column 2 from the first contributor of UNION is not the same type as its counterpart from the second.

A straight UNION will remove duplicates similar to what a DISTINCT does in a SELECT statement.

mysql> SELECT 1,2,3
    -> UNION
    -> SELECT 1,2,3
    -> UNION
    -> SELECT 3,4,5;
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 1 | 2 | 3 |
| 3 | 4 | 5 |
+---+---+---+
2 rows in set (0.00 sec)

If you, instead, write the query as a UNION ALL it will include all rows, including any duplicates. This applies in SAS, although, they approach the ALL keyword slightly differently but I’ll deal with that in a later article.

mysql> SELECT 1,2,3
    -> UNION ALL
    -> SELECT 1,2,3
    -> UNION ALL
    -> SELECT 1,2,3;
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 1 | 2 | 3 |
| 1 | 2 | 3 |
| 1 | 2 | 3 |
+---+---+---+
3 rows in set (0.00 sec)

You can also mix a UNION and a UNION ALL.

mysql> SELECT 1,2,3
    -> UNION ALL
    -> SELECT 1,2,3
    -> UNION
    -> SELECT 1,2,3;
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
1 row in set (0.00 sec)

Each UNION only applies to the SELECT statements that it separates.

You can also alias a UNION, however, the alias must be applied to the first SELECT statement’s columns. You cannot apply it to later columns, well, you can, but as you can see it won’t work. For example:

mysql> SELECT 1 as one, 2 as two, 3 as three
    -> UNION
    -> Select 4,5,6;
+-----+-----+-------+
| one | two | three |
+-----+-----+-------+
|   1 |   2 |     3 |
|   4 |   5 |     6 |
+-----+-----+-------+
2 rows in set (0.00 sec)

mysql> SELECT 1,2,3
    -> UNION ALL
    -> SELECT 1 as one, 2 as two, 3 as three;
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 1 | 2 | 3 |
| 1 | 2 | 3 |
+---+---+---+
2 rows in set (0.00 sec)

ORDER BY statements must be applied at the end of the UNION. For example:

mysql> SELECT name FROM junk ORDER BY name
    -> UNION
    -> SELECT name FROM junk;
ERROR 1221 (HY000): Incorrect usage of UNION and ORDER BY
mysql>

mysql> SELECT name FROM junk
    -> UNION
    -> SELECT name FROM junk ORDER BY name;
+----------------+
| name           |
+----------------+
| Almers, C      |
| Bonaventure, T |
| Cameron, L     |
| Derber, B      |
| Eberhardt, S   |
| Ivan, H        |
| Johnson, R     |
| Jones, M       |
| King, E        |
| LaMance, K     |
| Murray, W      |
| Nunnelly, A    |
| Oberon, M      |
| Peterson, V    |
| Pitts, D       |
| Quigley, M     |
| Reberson, P    |
| Takahashi, Y   |
| Underwood, K   |
| Warren, C      |
| Wilcox, E      |
+----------------+
21 rows in set (0.04 sec)

And now, it gets a bit funky, because SAS joined the party.

This is viable in SAS:

proc sql;
SELECT name, age
FROM mysql.admit
UNION
SELECT name
FROM mysql.admit;
quit;

Name                 Age
------------------------
Almers, C              .
Almers, C             34
Bonaventure, T         .
Bonaventure, T        31
Cameron, L             .
Cameron, L            47
Derber, B              .
Derber, B             25
Eberhardt, S           .
Eberhardt, S          49
Ivan, H                .
Ivan, H               22
Johnson, R             .
Johnson, R            43

MySQL and SQL Server will both error.

mysql> SELECT 1,2
-> UNION
-> SELECT 1,2,3;
ERROR 1222 (21000): The used SELECT statements have a different number of column

In other words, SQL Server, and MySQL, expect that a UNION operator will have the same number of columns.  SAS, doesn’t care, and simply NULLs, or in SAS terminology, sets those fields to missing.

SET operators, all things considered, are really easy to use, assuming SAS doesn’t get involved and even there, as long as you stay with a UNION or UNION ALL, it’s sort of, maybe, kind of, close.

As far as the SAS Advanced test goes, I think I’ll keep in mind that SAS allows a different number of columns, and, it doesn’t allow different data types in a column. It also has other quirks but I’ll deal with those in later posts.

18 May

GROUP BY, Again, This Time It’s Final, No Really It Is.

I think this will be the last of these.  I’ve gotten four posts out of GROUP BY and this time I’m going to double-dip, with the HAVING clause.  So, lets start with SQL Server and blow things up.

SQL Server

SELECT actlevel AL, SUM(height) AS 'sumHeight'
FROM junk
GROUP BY AL
HAVING SUM(height) > 400;

Msg 207, Level 16, State 1, Line 3
Invalid column name 'AL'.

SELECT actlevel AL, SUM(height) AS 'sumHeight'
FROM junk
GROUP BY actlevel
HAVING sumHeight > 400;

Msg 207, Level 16, State 1, Line 4
Invalid column name 'sumHeight'.

Both of these errors are saying that you can’t use an alias in a GROUP BY or HAVING clause. Honestly, this is kind of the expected behavior if you understand how SQL theoretically works (another post).

However, the other two guys I write about are just fine with it.

MySQL and SAS

mysql> SELECT actlevel AL, sum(height) as mysum
    -> FROM junk
    -> GROUP BY AL
    -> HAVING mysum > 400;
+------+-------+
| AL   | mysum |
+------+-------+
| HIGH |   491 |
| LOW  |   465 |
| MOD  |   477 |
+------+-------+
3 rows in set (0.00 sec)

proc sql;
SELECT actlevel AL, sum(height) AS mysum
FROM mysql.admit
GROUP BY AL
HAVING mysum > 400;
quit;

ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, ',', -, '.', 
/, <, <=, <>, =, >, >=, ?, AND, AS, CONTAINS, EQ, EQT, FROM, GE, GET, GT, GTT, LE, LET, LIKE, LT, 
LTT, NE, NET, OR, ^=, |, ||, ~=.

Ah, SAS, the 4th grade comedian that drove the teacher crazy. SAS requires the AS keyword.

proc sql;
SELECT actlevel AS AL, sum(height) as mysum
FROM mysql.admit
GROUP BY AL
HAVING mysum > 400;
quit;

AL       mysum
--------------
HIGH       491
LOW        465
MOD        477

So, that’s all good. I can end now, right. I mean, I wouldn’t be tempted to do something exotic, just to see what SAS, or maybe even MySQL, does with it, would I? Nope, not me.

mysql> SELECT actlevel 'AL', sum(height) as mysum
    -> FROM junk
    -> GROUP BY 'AL'
    -> HAVING mysum > 400;
+------+-------+
| AL   | mysum |
+------+-------+
| HIGH |  1433 |
+------+-------+
1 row in set (0.00 sec)

Err, OK. Maybe this will be better.

mysql> SELECT actlevel AL, sum(height) as 'mysum'
    -> FROM junk
    -> GROUP BY AL
    -> HAVING 'mysum' > 400;
Empty set, 1 warning (0.00 sec)

Oops! MySQL, ever the professionals at making it clear what went wrong. But, I wonder how SAS will handle this code.

proc sql;
SELECT actlevel 'AL', sum(height) as mysum
FROM mysql.admit
GROUP BY 'AL'
HAVING mysum > 400;
quit;

AL       mysum
--------------
HIGH      1433
HIGH      1433
LOW       1433
MOD       1433
LOW       1433
HIGH      1433
MOD       1433
MOD       1433
LOW       1433
LOW       1433
HIGH      1433
LOW       1433
MOD       1433
HIGH      1433
MOD       1433
LOW       1433
MOD       1433
HIGH      1433
LOW       1433
HIGH      1433
MOD       1433
Okey, dokey.

And,
proc sql;
SELECT actlevel AL, sum(height) as 'mysum'
FROM junk
GROUP BY AL
HAVING 'mysum' > 400;
quit;

ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, ',', -, 
'.', /, <, <=, <>, =, >, >=, ?, AND, AS, CONTAINS, EQ, EQT, FROM, GE, GET, GT, GTT, LE, LET, 
LIKE, LT, LTT, NE, NET, OR, ^=, |, ||, ~=.

Lets try this, instead:

proc sql;
SELECT actlevel AL, sum(height) as 'mybum'
FROM junk
GROUP BY AL
HAVING 'mybum' > 400;
quit;

ERROR: Expression using greater than (>) has components that are of different data types.

Really? No, Really? SAS is just a giant box of endless posts.

Well, I guess it has to be this way.

proc sql;
SELECT actlevel AS AL, sum(height) AS mybum
FROM mysql.admit
GROUP BY AL
HAVING mybum > 400;
quit;

AL       mysum
--------------
HIGH       491
LOW        465
MOD        477

Or, I suppose, you could just write it like you are supposed to.

proc sql;
SELECT actlevel AS AL, sum(height) AS mybum
FROM mysql.admit
GROUP BY actlevel
HAVING sum(height) > 400;
quit;

AL       mysum
--------------
HIGH       491
LOW        465
MOD        477

I just wish I could trust SAS to write the SAS Advanced certification that cleanly but after the Base I simply don’t trust them not to use all of this stuff.

I’m out!

16 May

SAS: NMISS and another installment of SAS vs. The World

First, lets look at a specific function, NMISS.  NMISS is used to count the number of missing records.  Think of it as a NULL counter. First, some data:

proc sql;
SELECT *
FROM mysql.demog
quit;

ID         Age  Sex      Date
-----------------------------
A001        21  m    05/22/07
A002        32  m    06/15/06
A003        24  f    08/17/07
A004         .       01/27/06
A005        44  f    02/24/05
A007        39  m    11/11/05

Here is NMISS in action.

proc sql;
SELECT COUNT(age), NMISS(age), COUNT(*) - COUNT(age)
FROM mysql.demog
quit;

----------------------------
5         1         1

Don’t get mad, I, intentionally, didn’t alias the columns. So, each part of the SELECT did the following:

COUNT(age)

Counted all columns that did not contain a NULL.

NMISS(age)

Counted all NULL values.

COUNT(*) – COUNT(age)

This part used COUNT(*), which counts all the rows in the table, including NULLs and then subtracts all the records in the age column that are not NULL. It is the equivalent of the NMISS function.

So, why didn’t I alias the columns? Because my first try didn’t work. Here’s a sample that I ran in MySQL (SQL Server does the same thing.

mysql> select count(age) AS 'count(age)' FROM junk;
+------------+
| count(age) |
+------------+
|         21 |
+------------+
1 row in set (0.00 sec)

All is good in the world. Now, let’s try that with SAS, because, as we all know by now, something weird is about to happen.

proc sql;
SELECT count(*) AS 'count(*)'
FROM mysql.demog
quit;

                              __________
                              22
ERROR 22-322: Expecting a name.

Yup, it errors. I guess it doesn’t like aliasing a function. So, what’s the workaround? You use the label command, which, I guess, explains why they have it.

proc sql;
SELECT count(*) label='count(*)'
FROM mysql.demog
quit;

count(*)
--------
       6

Man, SAS, is just a special little snowflake.