19 May

SET Operators: UNION

The easiest set operator to deal with is a UNION.  It’s basically a data set appended to a data set.  It has a couple of minor quirks, column naming, ordering, and then, of course, SAS picked the path least chosen, but for the most part, anytime you see a UNION, or a UNION ALL, it’s going to work the same way.

The code for this example, except where noted, is in MySQL.

Here is a very simple example of a UNION.

mysql> SELECT 1,2,3
    -> UNION
    -> SELECT 4,5,6;
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 1 | 2 | 3 |
| 4 | 5 | 6 |
+---+---+---+
2 rows in set (0.00 sec)

As you can see the second SELECT is added to the first set. Typically, the data types do not have to match, as automatic type conversion will take place. Except, of course, if you are working with SAS, where the following will happen:

proc sql;
SELECT name, age
FROM mysql.admit
UNION
SELECT age, name
FROM mysql.admit;
quit;

ERROR: Column 1 from the first contributor of UNION is not the same type as its counterpart from the second.
ERROR: Column 2 from the first contributor of UNION is not the same type as its counterpart from the second.

A straight UNION will remove duplicates similar to what a DISTINCT does in a SELECT statement.

mysql> SELECT 1,2,3
    -> UNION
    -> SELECT 1,2,3
    -> UNION
    -> SELECT 3,4,5;
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 1 | 2 | 3 |
| 3 | 4 | 5 |
+---+---+---+
2 rows in set (0.00 sec)

If you, instead, write the query as a UNION ALL it will include all rows, including any duplicates. This applies in SAS, although, they approach the ALL keyword slightly differently but I’ll deal with that in a later article.

mysql> SELECT 1,2,3
    -> UNION ALL
    -> SELECT 1,2,3
    -> UNION ALL
    -> SELECT 1,2,3;
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 1 | 2 | 3 |
| 1 | 2 | 3 |
| 1 | 2 | 3 |
+---+---+---+
3 rows in set (0.00 sec)

You can also mix a UNION and a UNION ALL.

mysql> SELECT 1,2,3
    -> UNION ALL
    -> SELECT 1,2,3
    -> UNION
    -> SELECT 1,2,3;
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
1 row in set (0.00 sec)

Each UNION only applies to the SELECT statements that it separates.

You can also alias a UNION, however, the alias must be applied to the first SELECT statement’s columns. You cannot apply it to later columns, well, you can, but as you can see it won’t work. For example:

mysql> SELECT 1 as one, 2 as two, 3 as three
    -> UNION
    -> Select 4,5,6;
+-----+-----+-------+
| one | two | three |
+-----+-----+-------+
|   1 |   2 |     3 |
|   4 |   5 |     6 |
+-----+-----+-------+
2 rows in set (0.00 sec)

mysql> SELECT 1,2,3
    -> UNION ALL
    -> SELECT 1 as one, 2 as two, 3 as three;
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 1 | 2 | 3 |
| 1 | 2 | 3 |
+---+---+---+
2 rows in set (0.00 sec)

ORDER BY statements must be applied at the end of the UNION. For example:

mysql> SELECT name FROM junk ORDER BY name
    -> UNION
    -> SELECT name FROM junk;
ERROR 1221 (HY000): Incorrect usage of UNION and ORDER BY
mysql>

mysql> SELECT name FROM junk
    -> UNION
    -> SELECT name FROM junk ORDER BY name;
+----------------+
| name           |
+----------------+
| Almers, C      |
| Bonaventure, T |
| Cameron, L     |
| Derber, B      |
| Eberhardt, S   |
| Ivan, H        |
| Johnson, R     |
| Jones, M       |
| King, E        |
| LaMance, K     |
| Murray, W      |
| Nunnelly, A    |
| Oberon, M      |
| Peterson, V    |
| Pitts, D       |
| Quigley, M     |
| Reberson, P    |
| Takahashi, Y   |
| Underwood, K   |
| Warren, C      |
| Wilcox, E      |
+----------------+
21 rows in set (0.04 sec)

And now, it gets a bit funky, because SAS joined the party.

This is viable in SAS:

proc sql;
SELECT name, age
FROM mysql.admit
UNION
SELECT name
FROM mysql.admit;
quit;

Name                 Age
------------------------
Almers, C              .
Almers, C             34
Bonaventure, T         .
Bonaventure, T        31
Cameron, L             .
Cameron, L            47
Derber, B              .
Derber, B             25
Eberhardt, S           .
Eberhardt, S          49
Ivan, H                .
Ivan, H               22
Johnson, R             .
Johnson, R            43

MySQL and SQL Server will both error.

mysql> SELECT 1,2
-> UNION
-> SELECT 1,2,3;
ERROR 1222 (21000): The used SELECT statements have a different number of column

In other words, SQL Server, and MySQL, expect that a UNION operator will have the same number of columns.  SAS, doesn’t care, and simply NULLs, or in SAS terminology, sets those fields to missing.

SET operators, all things considered, are really easy to use, assuming SAS doesn’t get involved and even there, as long as you stay with a UNION or UNION ALL, it’s sort of, maybe, kind of, close.

As far as the SAS Advanced test goes, I think I’ll keep in mind that SAS allows a different number of columns, and, it doesn’t allow different data types in a column. It also has other quirks but I’ll deal with those in later posts.

Leave a Reply

Your email address will not be published. Required fields are marked *