31 May

SAS: Indexes

This site needed a photo..

The site needed a photo..

This is a very broad overview of indexes in SAS.  Honestly, for the most part, indexes seem to work like you would expect indexes to work and they aren’t complicated at least in the sense that I’m looking at them.

And, just a reminder, I write these as Study Guides for me, that means that I won’t, usually, go step-by-step in these things.  That will, especially, be the case here.

SAS has a handful of guidelines for creating indexes.  They are:

  • Use fewer indexes to save on storage and update costs.
  • Don’t create indexes for small tables.  In those cases, sequential access, otherwise known as a table scan, is faster.
  • Create indexes on columns that have a large number of distinct values.
  • Use indexes when a relatively small number of rows will be returned.  They specifically state “less than 15%”.

Creating indexes is, almost, straight-forward.  The syntax is

CREATE  INDEX  ON table (column(s))

The only hiccup with indexes is that an index must be named the same as the column, if, you only index a single column. In other words, you can name composite indexes but not a simple index. For example:

CREATE INDEX idx_myjunk ON work.myjunk (number);

ERROR: For a simple index, the index name must be the same as the column name.

On the other hand, here are 3 successfully created indexes.  It’s very, very, very, unlikely that you would create all 3 of these indexes on a table.  However, I haven’t been able to determine if SAS will use a partial composite index, say, if you queried the number1 column.  My guess, based on what I’m reading, and the way I think things work, is that it would not.  But, don’t take that as gospel.  I guess all of that means that I could see you doing it in a specific scenario of some sort.

proc sql;
CREATE TABLE work.myjunk
(
	number int,
	number1 int
);
CREATE INDEX number ON work.myjunk (number);
CREATE UNIQUE INDEX number1 ON work.myjunk(number1);
CREATE INDEX idx_composite ON work.myjunk(number, number1);
DESCRIBE TABLE work.myjunk;
quit;

From the log.

28         DESCRIBE TABLE work.myjunk;
NOTE: SQL table WORK.MYJUNK was created like:

create table WORK.MYJUNK( bufsize=131072 )
  (
   number num,
   number1 num
  );
create index idx_composite on WORK.MYJUNK(number,number1);
create unique index number1 on WORK.MYJUNK(number1);
create index number on WORK.MYJUNK(number);

That is a big values-add on my part.  It’s also OK to create a composite index with the same name as a simple index, however, you cannot name a composite index with the same name as a column in the index.

You can tell if SAS is using an index by using the MSGLEVEL= option. You use it as follows:

options msglevel= i or n;

Yes, it’s a global option and it will stay in effect for the session. There are two setting for message level:

  • N: Displays notes, errors and warnings.  This is the default.
  • I:  Displays additional information regarding index usage, sorts and merges.

You can force SAS to use a specific index or to process indexes in a specific fashion.  To tell SAS to use, or not use an index by using IDXWHERE.  It has two options:

  • IDXWHERE=No: Tells SAS to not use an index and instead to do a sequential scan.
  • IDXWHERE=Yes: Tells SAS to use an index and to not consider a sequential scan.

You use IDXWHERE as follows:

proc sql;
SELECT *
FROM work.myjunk (idxwhere=no)
WHERE number < 5;
quit;

You can force SAS to use a specific index with the IDXNAME= option. You simply name the index and use it the same place you use IDXWHERE as follows:

proc sql;
SELECT *
FROM work.myjunk (idxname=number)
WHERE number < 5;
quit;

And, once again, I’d like to point out that I bring incredible value to the process because the WHERE clause would pretty much guarantee that SAS used that index.

One other note on IDXNAME is that the index you specify must exist, and, it must have its first, or only, column match the condition in the WHERE clause.

SAS indexes are like a clustered index, in a very loose sense, in that they are sorted, in ascending order. However, you cannot create a SAS index in descending order, and unlike a true clustered index, because you can have multiple indexes on a table, the underlying data is not sorted as you would have in a clustered index. Frankly, they aren’t a clustered index, I just wanted to mention them in this article because I think the cool kids would do that.

Finally, you can delete an index as follows:

28         DROP INDEX number, number1 ON myjunk;
NOTE: Index number has been dropped.
NOTE: Index number1 has been dropped.

I figured it would be easier to post the log on this one. As you can see it’s acceptable to drop multiple indexes on a table. You just separate each index name with a comma.

I probably skipped a few things but it looks like, at least at a basic level, that indexes are pretty straight-forward in SAS.

30 May

SAS: Constraint Messages

SAS allows you to create custom messages when a constraint occurs.  There are two options for a constraint:

  • MESSAGE=:  This is a custom message up to 250 characters.
  • MSGTYPE=:  This has two options.  NEWLINE means that it is displayed in addition to the original error message.  USER means that only the custom message is displayed.

Here’s a couple of examples that should make it clear.

proc sql UNDO_POLICY=REQUIRED;
CREATE TABLE work.myjunk
(
	number int,
	number1 int CHECK (number < 5) Message='***This is my error***' Msgtype=USER
);
INSERT INTO work.myjunk
   VALUES (6,6);
SELECT * FROM work.myjunk;
quit;

ERROR: ***This is my error*** 
NOTE: This insert failed while attempting to add data from VALUES clause 1 to the data set.

proc sql UNDO_POLICY=REQUIRED;
CREATE TABLE work.myjunk
(
	number int,
	number1 int CHECK (number < 5) Message='***This is my error***' Msgtype=NEWLINE
);
INSERT INTO work.myjunk
   VALUES (6,6);
SELECT * FROM work.myjunk;
quit;

ERROR: ***This is my error*** Add/Update failed for data set WORK.MYJUNK because data value(s) do not comply with integrity
constraint _CK0001_.

NEWLINE is the default option.

29 May

SAS: UNDO_POLICY

This option tells the proc sql statement what to do if an INSERT or UPDATE fails for some reason. Usually this would be a constraint violation but it could also occur because of data type mismatch or probably something else that I’m not thinking about right this second.

This option has 3 settings: REQUIRED, OPTIONAL and NONE

REQUIRED

Proc sql will undo an INSERT / UPDATE if it fails.

This is the default option.

NONE

Skips INSERT / UPDATE records that fail.  In other words, if you have 3 statements, and the second fails, statements one and three will still run.

When this happens a warning is written to the log.

OPTIONAL

In this case, if SAS can undo the statement, it will.  If, for some reason, it cannot then it will not.

There are two cases where UNDO is not reliable, which I don’t really understand, at this time.  For now, it appears they have something to do with SAS/SHARE and SAS/ACCESS.

SAS likens UNDO_POLICY to a rollback in a transaction, which is probably an OK comparison but this isn’t a rollback and SAS doesn’t support the ROLLBACK statement.

Here’s an example:

proc sql UNDO_POLICY=OPTIONAL;
CREATE TABLE work.myjunk
(
	number int,
	number1 int CHECK (number < 5)
);
INSERT INTO work.myjunk
   VALUES (1,1)
   VALUES (2,2)
   VALUES (6,6);
SELECT * FROM work.myjunk;
quit;

ERROR: Add/Update failed for data set WORK.MYJUNK because data value(s) do not comply with 
integrity constraint _CK0001_.
NOTE: This insert failed while attempting to add data from VALUES clause 3 to the data set.
NOTE: Deleting the successful inserts before error noted above to restore table to a consistent 
state.

proc sql UNDO_POLICY=NONE;
CREATE TABLE work.myjunk
(
	number int,
	number1 int CHECK (number < 5)
);
INSERT INTO work.myjunk
   VALUES (1,1)
   VALUES (2,2)
   VALUES (6,6);
SELECT * FROM work.myjunk;
quit;

ERROR: Add/Update failed for data set WORK.MYJUNK because data value(s) do not comply with 
integrity constraint _CK0001_.
NOTE: This insert failed while attempting to add data from VALUES clause 3 to the data set.
NOTE: 3 rows were inserted into WORK.MYJUNK -- of these 1 row was rejected as an ERROR, leaving 
2 rows that were inserted successfully.

proc sql UNDO_POLICY=REQUIRED;
CREATE TABLE work.myjunk
(
	number int,
	number1 int CHECK (number < 5)
);
INSERT INTO work.myjunk
   VALUES (1,1)
   VALUES (2,2)
   VALUES (6,6);
SELECT * FROM work.myjunk;
quit;

ERROR: Add/Update failed for data set WORK.MYJUNK because data value(s) do not comply with 
integrity constraint _CK0001_.
NOTE: This insert failed while attempting to add data from VALUES clause 3 to the data set.
NOTE: Deleting the successful inserts before error noted above to restore table to a consistent 
state.

As you can see they all fail, but, the one where the UNDO_POLICY was set to NONE did not rollback the two inserts that were successful.

28 May

SAS: Constraints

This will be ugly.  I’m looking at the Study Guide and it does a terrible job on constraints.  For instance, it mentions referential integrity (foreign keys) but it does not show an example.  However, it also lists several keywords that would apply to it.  So, do I need to know it for the test?

There are about 6 pages on the guide so I’m going to try and dig out what I think I would need to know.  Since I already know most of this, from other databases, I’ll surely skip useful parts.  And, no way am I doing each database version. This post is only about SAS.

Anyways, if it helps, great,  If it doesn’t, well, you got what you paid for, I guess.

CHECK

A check constraint is similar to a WHERE clause but it’s applied at the column level.  For example,

proc sql;
CREATE TABLE work.myjunk
(
	name char(10) CHECK (name IN ('Billy','Bob','Thornton')),
	number int CHECK (number < 10)
);
DESCRIBE TABLE CONSTRAINTS work.myjunk;
quit;     Integrity              Where
#    Constraint    Type     Clause
----------------------------------------------------------------
1    _CK0001_      Check    name in ('Billy', 'Bob', 'Thornton')
2    _CK0002_      Check    number

This syntax creates two CHECK constraints:

  • name: It must be one of Billy, Bob or Thornton.
  • number: It must be less than 10.

Because we didn’t name the constraints, SAS assigned a default name to them. I’m not going to list them as they’ll be in the sample code for each constraint. You can also name a constraint through the ALTER TABLE statement or adding the constraint to the end of the CREATE TABLE syntax as follows:

proc sql;
CREATE TABLE work.myjunk
(
	name char(10),
	number int,
	CONSTRAINT mycheck CHECK (number < 10)
);

DESCRIBE TABLE CONSTRAINTS work.myjunk;
quit;
     Integrity               Where
#    Constraint     Type     Clause
-----------------------------------------------------------------
1    mycheck        Check    number

I won’t do this every time. This is already going to be a very long, and it’s already disorganized enough, but I’ll try to sprinkle the variations throughout the post so you are comfortable seeing them both ways.

A CHECK constraint can also apply to a different column in the same row.

proc sql;
CREATE TABLE work.myjunk
(
	number int,
	number1 int CHECK (number1 < number)
);
DESCRIBE TABLE CONSTRAINTS work.myjunk;
quit;

     Integrity              Where
#    Constraint    Type     Clause
------------------------------------------
1    _CK0001_      Check    number1<number

This is precisely the type of thing I would expect SAS to ask you on a test.

NOT NULL

NOT NULL simply says no NULL, or missing (this is SAS) values, in the column. You write the constraint as follows:

proc sql;
CREATE TABLE work.myjunk
(
	name char(10) NOT NULL,
	number int,
	CONSTRAINT myconstraint NOT NULL (number)
);
DESCRIBE TABLE CONSTRAINTS work.myjunk;
quit;

     Integrity
#    Constraint      Type        Variables
------------------------------------------
1    _NM0001_        Not Null    name     
2    myconstraint    Not Null    number

I thought you could add this with an ALTER TABLE / MODIFY statement but apparently not.

proc sql;
CREATE TABLE work.myjunk
(
	name char(10) NOT NULL,
	number int
);
ALTER TABLE work.myjunk MODIFY number int NOT NULL;
DESCRIBE TABLE CONSTRAINTS work.myjunk;
quit;

     Integrity
#    Constraint    Type        Variables
----------------------------------------
1    _NM0001_      Not Null    name

To be honest, I was sure it, would work. In MySQL:

mysql> CREATE TABLE work.myjunk
    -> (
    ->  name char(10) NOT NULL,
    ->  number int
    -> );
Query OK, 0 rows affected (0.11 sec)

mysql> ALTER TABLE work.myjunk MODIFY number int NOT NULL;
Query OK, 0 rows affected (0.28 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> desc work.myjunk;
+--------+----------+------+-----+---------+-------+
| Field  | Type     | Null | Key | Default | Extra |
+--------+----------+------+-----+---------+-------+
| name   | char(10) | NO   |     | NULL    |       |
| number | int(11)  | NO   |     | NULL    |       |
+--------+----------+------+-----+---------+-------+

The Null column shows NO for Nulls so it worked just fine on that side of the house.  I guess the answer is to only define a NOT NULL constraint in the CREATE TABLE statement for SAS.  You should probably also make a mental note of _NM0001_.  The automatic naming policy of SAS uses NM, not NN, for NOT NULL.  Yeah, that’s logical.  I suppose that’s how they keep the trick questions flowing.

UNIQUE

This constraint simply specifies that all values in the column must be unique. The study guide doesn’t mention this but UNIQUE usually allows for a NULL / missing value, but, only one as two would violate the constraint.

proc sql;
CREATE TABLE work.myjunk
(
	name char(10) UNIQUE,
	number int
);
DESCRIBE TABLE CONSTRAINTS work.myjunk;
quit;

     Integrity
#    Constraint    Type      Variables
--------------------------------------
1    _UN0001_      Unique    name

The constraint seems to work similarly to the NOT NULL constraint in terms of how you define it, however, you may want to apply it to multiple columns as follows:

proc sql;
CREATE TABLE work.myjunk
(
	name char(10),
	number int,
	CONSTRAINT myconstraint unique (number, name)
);
DESCRIBE TABLE CONSTRAINTS work.myjunk;
quit;
     Integrity
#    Constraint      Type      Variables
------------------------------------------
1    myconstraint    Unique    number name

PRIMARY KEY

SAS defines the PRIMARY KEY as a combination of the UNIQUE and NOT NULL constraints. I’m assuming, and have not verified this, yet, that unlike SQL Server, or INNODB tables in MySQL that it is not clustered. However, I haven’t hit the index part of the Study Guide so SAS may do something like that.

There can only be one primary key on a table. ALL of the following can be used to create a primary key.

proc sql;
CREATE TABLE work.myjunk
(
	name char(10),
	number int PRIMARY KEY
);
DESCRIBE TABLE CONSTRAINTS work.myjunk;
quit;

     Integrity
#    Constraint    Type           Variables
-------------------------------------------
1    _PK0001_      Primary Key    number

proc sql;
CREATE TABLE work.myjunk
(
name char(10),
number int,
CONSTRAINT mypk PRIMARY KEY (number)
);
DESCRIBE TABLE CONSTRAINTS work.myjunk;
quit;

Integrity
# Constraint Type Variables
-------------------------------------------
1 mypk Primary Key number

proc sql;
CREATE TABLE work.myjunk
(
name char(10),
number int,
CONSTRAINT mypk PRIMARY KEY (name,number)
);
DESCRIBE TABLE CONSTRAINTS work.myjunk;
quit;

     Integrity
#    Constraint    Type           Variables
---------------------------------------------
1    mypk          Primary Key    name number

The last version specifies that both the name and number fields are part of the primary key and that the restrictions for a primary key apply to both columns. For instance, the combination of the two columns must be unique, but you can repeat a value in a column. For example, this is ok,

name  number
A     1
A     2

But this would not be,

name  number
A     1
A     1

This also applies to a unique constraint that isn’t a primary key.

I’m going to stop now. I find myself getting a bit mixed up on the syntax with all the copy/pasting I’m doing and I fear I’m going to blow it. I think, however, this should give a decent starting point to constraints even if it feels woefully incomplete to me.

27 May

Inserting with a SELECT

Honestly, I don’t think I’ve ever done this outside of a book, but, it’s something that comes up so it’s worth mentioning.  Fortunately, everyone happily works the same way so I’m just going to do this in MySQL.

Suppose we have two identical tables and we want to copy data from one to another. We could use an external tool, like BCP, or mysqldump, or we could just use a INSERT / SELECT as follows;

mysql> DESC myjunk;
+——–+———+——+—–+———+——-+
| Field | Type | Null | Key | Default | Extra |
+——–+———+——+—–+———+——-+
| name | char(1) | NO | | NULL | |
| number | int(11) | NO | | 0 | |
+——–+———+——+—–+———+——-+
2 rows in set (0.00 sec)

mysql> DESC myjunk1;
+——–+———+——+—–+———+——-+
| Field | Type | Null | Key | Default | Extra |
+——–+———+——+—–+———+——-+
| name | char(1) | NO | | NULL | |
| number | int(11) | NO | | 0 | |
+——–+———+——+—–+———+——-+
2 rows in set (0.00 sec)

mysql> SELECT * FROM myjunk;
+——+——–+
| name | number |
+——+——–+
| A | 11 |
| B | 22 |
| C | 33 |
+——+——–+
3 rows in set (0.00 sec)

mysql> SELECT * FROM myjunk1;
Empty set (0.00 sec)

mysql> INSERT INTO myjunk1
-> SELECT * FROM myjunk;
Query OK, 3 rows affected (0.02 sec)
Records: 3 Duplicates: 0 Warnings: 0

mysql> SELECT * FROM myjunk1;
+——+——–+
| name | number |
+——+——–+
| A | 11 |
| B | 22 |
| C | 33 |
+——+——–+
3 rows in set (0.00 sec)

This, thankfully, is super clean and for the most part it just works. However, there are a couple of tweaks to it.

mysql> INSERT INTO myjunk1
    -> SELECT number, name FROM myjunk;
ERROR 1406 (22001): Data too long for column 'name' at row 1

This was caused because the Name column is a char(1) so it can’t handle the longer number types.

mysql> ALTER TABLE myjunk1 MODIFY name varchar(10);
Query OK, 0 rows affected (0.17 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> INSERT INTO myjunk1
    -> SELECT number, name FROM myjunk;
ERROR 1366 (HY000): Incorrect integer value: 'A' for column 'number' at row 1

It would have been nice if it had dropped this error first, before I modified the column, but MySQL isn’t know for solid/clean errors. Anyways, just make sure that if you are switching columns around, for some reason, that you don’t bonk the data types.

Also, you must have an equal number of columns. For example, this will not work.

mysql> desc myjunk1;
+--------+-------------+------+-----+---------+-------+
| Field  | Type        | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| name   | varchar(10) | YES  |     | NULL    |       |
| number | int(11)     | NO   |     | 0       |       |
| newCol | varchar(10) | YES  |     | NULL    |       |
+--------+-------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

mysql> desc myjunk;
+--------+---------+------+-----+---------+-------+
| Field  | Type    | Null | Key | Default | Extra |
+--------+---------+------+-----+---------+-------+
| name   | char(1) | NO   |     | NULL    |       |
| number | int(11) | NO   |     | 0       |       |
+--------+---------+------+-----+---------+-------+
2 rows in set (0.00 sec)

mysql> INSERT INTO myjunk1
    -> SELECT * FROM myjunk;
ERROR 1136 (21S01): Column count doesn't match value count at row 1

This, correctly, tossed an error in SQL Server, SAS and MySQL.

Good luck, and hope it helps someone, someday.

26 May

CREATE TABLE AS

Creating a table is, more or less, similar among the three variants that I write about.  You create a table and you roll with it.  However, the CTAS statement (CREATE TABLE AS) is handled somewhat differently amongst them.  So, I’m going to hit that one up.

Really, I’m just writing this up because outside of MySQL I always forget this syntax. Also, when you create a table this way, it won’t include

MySQL

Create a copy of a a table with all of its columns included.

mysql> CREATE TABLE myjunk1 AS SELECT * FROM myjunk;
Query OK, 3 rows affected (0.08 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM myjunk1;
+------+--------+
| name | number |
+------+--------+
| A    |     11 |
| B    |     22 |
| C    |     33 |
+------+--------+
3 rows in set (0.00 sec)

Create a copy of a table, however, do not include any of its columns.

mysql> CREATE TABLE myjunk1 AS SELECT * FROM myjunk WHERE 1 = 2;
Query OK, 0 rows affected (0.12 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM myjunk1;
Empty set (0.00 sec)

mysql> DESC myjunk1;
+--------+---------+------+-----+---------+-------+
| Field  | Type    | Null | Key | Default | Extra |
+--------+---------+------+-----+---------+-------+
| name   | char(1) | YES  |     | NULL    |       |
| number | int(11) | YES  |     | NULL    |       |
+--------+---------+------+-----+---------+-------+
2 rows in set (0.00 sec)

Also constraints are not included, for example.

mysql> ALTER TABLE myjunk ADD PRIMARY KEY(number);
Query OK, 3 rows affected (0.19 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> CREATE TABLE myjunk1 AS SELECT * FROM myjunk;
Query OK, 3 rows affected (0.08 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> DESC myjunk1;
+--------+---------+------+-----+---------+-------+
| Field  | Type    | Null | Key | Default | Extra |
+--------+---------+------+-----+---------+-------+
| name   | char(1) | YES  |     | NULL    |       |
| number | int(11) | NO   |     | 0       |       |
+--------+---------+------+-----+---------+-------+
2 rows in set (0.01 sec)

mysql> desc myjunk;
+--------+---------+------+-----+---------+-------+
| Field  | Type    | Null | Key | Default | Extra |
+--------+---------+------+-----+---------+-------+
| name   | char(1) | YES  |     | NULL    |       |
| number | int(11) | NO   | PRI | 0       |       |
+--------+---------+------+-----+---------+-------+
2 rows in set (0.00 sec)

Note that the primary key is not included in myjunk1.

I could go on but this is to help me remember the syntax.  You can read more here:

http://dev.mysql.com/doc/refman/5.6/en/create-table.html

SQL Server

Conceptually, SQL Server works in a similar fashion to MySQL, except, that they have their own syntax for the process. This won’t work:

CREATE TABLE myjunk1 AS SELECT * FROM myjunk;

Msg 156, Level 15, State 1, Line 1
Incorrect syntax near the keyword 'SELECT'.

Instead you need do something like this:

SELECT *
INTO myjunk1
FROM myjunk
WHERE number > 22;
SELECT * FROM myjunk1;

Name	Number
C	33

I had a really hard time remembering this syntax at first because I was so used to the CTAS variant.

http://msdn.microsoft.com/en-us/library/ms188029.aspx

SAS

SAS allows the CTAS variant, so I won’t repeat that but it also does something else. When you are creating a copy of a table, but don’t want to include any of it’s data, you can do the following:

proc sql;
CREATE TABLE work.myjunk
(
	name char(1),
	number int
);
INSERT INTO work.myjunk VALUES ('A',1)
                        VALUES ('B',2)
                        VALUES ('C',3);
CREATE TABLE work.myjunk1
LIKE work.myjunk;

DESCRIBE TABLE work.myjunk1;
quit;

From the log.

31         DESCRIBE TABLE work.myjunk1;
NOTE: SQL table WORK.MYJUNK1 was created like:

create table WORK.MYJUNK1( bufsize=131072 )
  (
   name char(1),
   number num
  );

Constraints, more or less, appear to follow the same rules as they do in the other variants.

All Variants – Adding A New Column

You can also create a new column simply by adding it to the result set. Be careful about using apostrophes to name the new column as it might not work.

mysql> CREATE TABLE myjunk1 AS
    -> SELECT name, number, 123 AS newCol
    -> FROM myjunk;
Query OK, 3 rows affected (0.10 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> DESCRIBE myjunk1;
+--------+---------+------+-----+---------+-------+
| Field  | Type    | Null | Key | Default | Extra |
+--------+---------+------+-----+---------+-------+
| name   | char(1) | NO   |     | NULL    |       |
| number | int(11) | NO   |     | 0       |       |
| newCol | int(3)  | NO   |     | 0       |       |
+--------+---------+------+-----+---------+-------+
3 rows in set (0.00 sec)

mysql> SELECT * FROM myjunk1;
+------+--------+--------+
| name | number | newCol |
+------+--------+--------+
| A    |     11 |    123 |
| B    |     22 |    123 |
| C    |     33 |    123 |
+------+--------+--------+
3 rows in set (0.00 sec)

You can also CAST / CONVERT the data type. For example.

mysql> CREATE TABLE myjunk1 AS
    -> SELECT name, number, CAST(123 AS char) AS newCol
    -> FROM myjunk;
Query OK, 3 rows affected (0.12 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> DESC myjunk1;
+--------+------------+------+-----+---------+-------+
| Field  | Type       | Null | Key | Default | Extra |
+--------+------------+------+-----+---------+-------+
| name   | char(1)    | NO   |     | NULL    |       |
| number | int(11)    | NO   |     | 0       |       |
| newCol | varchar(3) | YES  |     | NULL    |       |
+--------+------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

I’m pretty sure that SAS doesn’t work that way but I’m not ready to write that today as I’m still learning that side of the house.  Another day, another post.

25 May

SAS: INSERT / VALUES

Originally, I’d planned this post to be about using CASE in an UPDATE.  However, strangely, unexpectedly,  it turned out that the big 3 all used the same syntax.  So, this will just be a quick overview of the differences between SAS and MySQL / SQL Server using INSERT.

Here’s some SAS code that creates a table, inserts some data and then updates it.

proc sql;
CREATE TABLE work.myjunk
(
	name CHAR(1),
	number INT
);
INSERT INTO work.myjunk VALUES ('A',1)
                        VALUES ('B',2)
                        VALUES ('C',3);
UPDATE work.myjunk
SET number =
CASE WHEN name = 'A' then 11
     WHEN name = 'B' then 22
	 ELSE 33
	 END;
SELECT * FROM work.myjunk;
quit;

name    number
--------------
A           11
B           22
C           33

There are two things worth noting in this code. First the INT data type. SAS accepts it, 9 others according to the Study Guide which it will then convert to one of the two SAS data types: number and character. In this case, INT is converted to a number.

I, finally, got smart and created a work database for MySQL and work schema for SQL Server.  It should help my code look a bit more consistent, even if work. means one thing in SAS, another in MySQL and still another in SQL Server.

The other thing, which is what led me to write this post, is that SAS uses a VALUES clause before each row in an INSERT. As you’ll see in the MySQL code (it’s the same with SQL Server) we only use a single VALUES clause.

mysql> CREATE TABLE work.myjunk
    -> (
    ->  name char(1),
    ->  number int
    -> );
Query OK, 0 rows affected (0.06 sec)

mysql> INSERT INTO work.myjunk VALUES ('A',1),('B',2),('C',3);
Query OK, 3 rows affected (0.02 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> UPDATE work.myjunk
    -> SET number =
    -> CASE WHEN name = 'A' then 11
    ->      WHEN name = 'B' then 22
    ->   ELSE 33
    ->   END;
Query OK, 3 rows affected (0.02 sec)
Rows matched: 3  Changed: 3  Warnings: 0

mysql> SELECT * FROM work.myjunk;
+------+--------+
| name | number |
+------+--------+
| A    |     11 |
| B    |     22 |
| C    |     33 |
+------+--------+
3 rows in set (0.00 sec)

Hopefully, this makes sense.  You use a single VALUES clause for SQL Server / MySQL and a VALUES clause for each row with SAS.  And, just be aware that you can use a CASE statement in an UPDATE.

24 May

INSERT with a SET

This is something I did not know you could do, but you can use SET for an INSERT statement.  I guess I’m a noob.  Anyways, here is how it works in the three SQL variants that I write about.

SQL Server

INSERT INTO myjunk1
SET id = 1;

Msg 156, Level 15, State 1, Line 2
Incorrect syntax near the keyword 'set'.

I think it’s time I deleted the tables I’ve named junk. Anyways, no surprise that SQL Server doesn’t like it.

MySQL

mysql> INSERT INTO myjunk
    -> SET name = 'E',
    -> number = 6;
Query OK, 1 row affected (0.04 sec)

Hmmm, another junk. Anyways, it works fine here. I’ll give more detail in the next section as I’m really writing this article for SAS.

SAS

proc sql;
CREATE TABLE work.myjunk
(
	id num
);
INSERT INTO myjunk
SET id = 2;
SELECT * 
FROM work.myjunk;
quit;

      id
--------
       2

You can also add multiple values to the table as follows:

proc sql;
CREATE TABLE work.myjunk
(
	id num
);
INSERT INTO myjunk
SET id = 3
SET id = 4
SET id = 5;
SELECT * 
FROM work.myjunk;
quit;

      id
--------
       3
       4
       5

Interestingly, you can recreate a table, albeit this is a temporary table, by running a CREATE TABLE on top of it, which is why there are only 3 values in this table. If you have multiple columns in the table, you treat them exactly like you would an UPDATE statement.

proc sql;
CREATE TABLE work.myjunk
(
	id num,
	id1 num
);
INSERT INTO myjunk
SET id = 3,
   id1 = 13
SET id = 4,
   id1 = 14
SET id = 5,
   id1 = 15;
SELECT * 
FROM work.myjunk;
quit;

      id       id1
------------------
       3        13
       4        14
       5        15

Anyways, that’s pretty much it for INSERT / SET. It’ll probably come up on the SAS Advanced test, assuming I get there, and I have to admit that I’d have blown this one.

23 May

SET Operators: The SAS Special

SAS has a couple of twists worth mentioning in a bit more detail.

ALL

The difference between ALL, and an operator without it, is that ALL returns all records and does not uniquify (my spell check says that isn’t a word, but, we know better) the result set.  This includes duplicate records in either result set that it applies to.

It cannot be used with an OUTER UNION.

CORR / CORRESPONDING

Corr is something I’d never seen before and depending on the operator it behaves differently.

  • EXCEPT, UNION and INTERSECT: It removes any column that does not have the same name.
  • OUTER UNION: It overlays columns with the same name.

Corr will also use an alias when it’s assigned to a column.

Here are some examples of CORR in action.

This is a UNION and CORR removes both the age and the sex columns from the report.

proc sql;
SELECT name, age
FROM mysql.admit
WHERE sex = 'F'
UNION CORR
SELECT name, sex
FROM mysql.admit
WHERE sex = 'F';
quit;

Name
--------------
Almers, C     
Bonaventure, T
Eberhardt, S  
Ivan, H       
Johnson, R    
Nunnelly, A   
Oberon, M     
Quigley, M    
Reberson, P   
Takahashi, Y  
Wilcox, E

CORR can be used with ALL. Here, because we use CORR the age and sex columns are removed.

proc sql;
SELECT name, age
FROM mysql.admit
WHERE sex = 'F'
UNION ALL CORR
SELECT name, sex
FROM mysql.admit
WHERE sex = 'F';
quit;

Name
--------------
Almers, C     
Bonaventure, T
Eberhardt, S  
Ivan, H       
Johnson, R    
Nunnelly, A   
Oberon, M     
Quigley, M    
Reberson, P   
Takahashi, Y  
Wilcox, E  
Almers, C     
Bonaventure, T
Eberhardt, S  
Ivan, H       
Johnson, R    
Nunnelly, A   
Oberon, M     
Quigley, M    
Reberson, P   
Takahashi, Y  
Wilcox, E

Now, so it’s clear, here is an OUTER JOIN with CORR, written as CORRESPONDING.

proc sql;
SELECT name, age
FROM mysql.admit
WHERE sex = 'F'
OUTER UNION CORRESPONDING
SELECT name, sex
FROM mysql.admit
WHERE sex = 'F';
quit;

Name                 Age  Sex
-----------------------------
Almers, C             34     
Bonaventure, T        31     
Johnson, R            43     
Reberson, P           32     
Eberhardt, S          49     
Nunnelly, A           44     
Oberon, M             28     
Quigley, M            40     
Takahashi, Y          43     
Ivan, H               22     
Wilcox, E             41     
Almers, C              .  F  
Bonaventure, T         .  F  
Johnson, R             .  F  
Reberson, P            .  F  
Eberhardt, S           .  F  
Nunnelly, A            .  F  
Oberon, M              .  F  
Quigley, M             .  F  
Takahashi, Y           .  F  
Ivan, H                .  F  
Wilcox, E              .  F

Here is a link that goes into a lot more detail on the keywords and SET operators from the SAS support site.

http://support.sas.com/documentation/cdl/en/sqlproc/62086/HTML/default/viewer.htm#a001361224.htm

22 May

SET Operators: Intersect

INTERSECT, is essentially the reverse of EXCEPT. What it does is return unique records that match in the two tables that intersect. For example,

SELECT name, sex, age
FROM junk
INTERSECT
SELECT name, sex, age
FROM allMale

name	        sex	age
Cameron, L	M	47
Derber, B	M	25
Jones, M	M	29
King, E	        M	35
LaMance, K	M	51
Murray, W	M	27
Peterson, V	M	30
Pitts, D	M	34
Underwood, K	M	60
Warren, C	M	54

In reality it’s simply an inner join. In fact, that’s one way we would write this query in MySQL.

mysql> SELECT j.name, j.sex, j.age
    -> FROM junk j
    -> INNER JOIN allMale a
    -> USING (name, sex, age);
+--------------+------+------+
| name         | sex  | age  |
+--------------+------+------+
| Murray, W    | M    |   27 |
| LaMance, K   | M    |   51 |
| Jones, M     | M    |   29 |
| King, E      | M    |   35 |
| Pitts, D     | M    |   34 |
| Peterson, V  | M    |   30 |
| Cameron, L   | M    |   47 |
| Underwood, K | M    |   60 |
| Derber, B    | M    |   25 |
| Warren, C    | M    |   54 |
+--------------+------+------+
10 rows in set (0.00 sec)

That can also be written a couple of other ways.

mysql> SELECT j.name, j.sex, j.age
    -> FROM junk j, allMale a
    -> WHERE j.name = a.name AND j.sex = a.sex AND j.age = a.age;
+--------------+------+------+
| name         | sex  | age  |
+--------------+------+------+
| Murray, W    | M    |   27 |
| LaMance, K   | M    |   51 |
| Jones, M     | M    |   29 |
| King, E      | M    |   35 |
| Pitts, D     | M    |   34 |
| Peterson, V  | M    |   30 |
| Cameron, L   | M    |   47 |
| Underwood, K | M    |   60 |
| Derber, B    | M    |   25 |
| Warren, C    | M    |   54 |
+--------------+------+------+
10 rows in set (0.00 sec)

Or,

mysql> SELECT DISTINCT j.name, j.sex, j.age
    -> FROM junk j
    -> WHERE EXISTS (SELECT name FROM allMale WHERE j.name = name AND j.sex = sex 
       and j.age = age);
+--------------+------+------+
| name         | sex  | age  |
+--------------+------+------+
| Murray, W    | M    |   27 |
| LaMance, K   | M    |   51 |
| Jones, M     | M    |   29 |
| King, E      | M    |   35 |
| Pitts, D     | M    |   34 |
| Peterson, V  | M    |   30 |
| Cameron, L   | M    |   47 |
| Underwood, K | M    |   60 |
| Derber, B    | M    |   25 |
| Warren, C    | M    |   54 |
+--------------+------+------+
10 rows in set (0.00 sec)

One other note, because SAS, always is just so cute. You can write an INTERSECT with SAS as follows:

proc sql;
SELECT name, age
FROM mysql.admit
WHERE sex = 'M'
INTERSECT ALL
SELECT name, age
FROM mysql.admit;
quit;

Name                 Age
------------------------
Cameron, L            47
Derber, B             25
Jones, M              29
King, E               35
LaMance, K            51
Murray, W             27
Peterson, V           30
Pitts, D              34
Underwood, K          60
Warren, C             54

“We’re developing a new citizenry. One that will be very selective about cereals and automobiles, but won’t be able to think.”
Rod Serling

SAS is particularly vague about what ALL does in the case of INTERSECT but it’s essentially the same as UNION and UNION ALL. In other words, it returns all records that match both tables in the INTERSECT rather than only unique records which is what a simple INTERSECT would do.

SQL Server does not support INTERSECT ALL and MySQL does not support INTERSECT at all.

I hope this is clear to anyone reading this. I always found SET operators really clean. Unfortunately, we’re about to enter the Twilight Zone as SAS has one that is unique to them, but then, that’s “situation normal” for SAS.