01 Jan

Sticky: The SAS Section

I’ve had the SAS Base certification study guide for more than a year but I didn’t go “all the way” with it. It’s not a hard language but it is abstract and I’m not certain I’ll ever use it.  However, I’m burned out on SSIS writing, and by extension the next 2 projects on the list (SSAS and SSRS) so I’m going to slay the SAS Base beast, as a break.  Since the “write all kinds of blog posts” worked so well for SSIS that’s what I’m going to do here.  If everything goes well, I’ll be in and out in less than 2-weeks.

These posts should be shorter than what I did in the SSIS section.  SAS, at least at this level, doesn’t need nearly as much step-by-step explanation as SSIS.  I can imagine doubling my post count, yet, reducing my word count  by half.  At least I hope that is how it works.

Also, because I’m writing this as I go through the Study Guide, it will tend to follow that sequence.  As such I may skip things that you feel deserve more detail or even repeat some topics.

Finally, this is test preparation, so I’m not attempting to delve into every detail of a topic.  If it helps you, great, I hope it does, but, if it’s not as deep as you’d like, there are other options available.

04 Jun

SAS: Macro Variables

I’m going to commit the cardinal sin and admit that my heart isn’t hugely into writing this.  I’m plowing the SAS Advanced Study Guide, I’m at page 300, and I’m exactly a third of the way through it, and that is the easy part.  And the truth, the hard truth, is that I should be plowing SSRS and SSAS over this.  But, I’m in, and so we continue the process.

Macro variables allow you to dynamically write code in your programs.  For instance, you can use them to define a WHERE clause in a PROC SQL statement or even define the name of a data set.  SAS likes to demo them for writing footers and titles.

There are two kinds of macro variables: automatic and user-defined.  Automatic are system defined and do things like return the date, time, SAS release or other system information.

In order to use a macro variable you define it with a %LET statement and then you use it by preceding it with a & symbol (this is actually more involved, but that’s in another article).

  • They are character data, always.
  • Because they are characters, expressions, too, are characters, and, as such, are not evaluated.
  • Case is retained.
  • Quotation marks are retained.
  • Leading and trailing blanks are removed.

So, when you defined a macro variable, these are all perfectly acceptable.

LET %myvar = 123; /* 123 */
LET %myvar = abraham lincoln; /* abraham lincoln */
LET %myvar = Abraham Lincoln; /* Abraham Lincoln */
LET %myvar = Abraham's horse; /* Abraham's horse */
LET %myvar = Abraham + Horse * Cow; /* Abraham + Horse * Cow */
LET %myvar = 1 + 2 + 3 / 4; /* 1 + 2 + 3 / 4*/

You can also define a macro variable with a macro variable.

LET %var = "My Variable"; /* "My Variable" */
LET %var1 = &var; /* "My Variable" */

LET %var = var1; /* var1 */
LET &var = Billy Bob; /* var1 will equal Billy Bob */

So, you can not only assign a variable the contents in a variable, you can create a variable based on the value in a variable.

Here is a relatively simple example (that somehow vanished from WordPress) of macro variables.

%LET mysex = F;

proc sql outobs=5;
SELECT name, age, sex, weight, "&sysdate"
FROM mysql.admit
WHERE sex = '&mysex";
quit;

Name                 Age  Sex    Weight  
------------------------------------------------
Almers, C             34  F         152  20APR13
Bonaventure, T        31  F         123  20APR13
Johnson, R            43  F         137  20APR13
Reberson, P           32  F         151  20APR13
Eberhardt, S          49  F         172  20APR13

Here is what the code does. It creates a macro variable name mysex and sets its values to F. In the SELECT statement, the automatic variable &sysdate is used to generate the column with all the 20APR13 values. Then, finally, the WHERE clause is set to WHERE sex = F.

One thing to note, the examples in the book all show a single-quote, macro variable, double-quote in the code samples. Neither that, or single-quotes worked for me. I had to use double-quotes to get macro variables to work. I can’t guarantee that this is correct. It might be some quirk of where I’m writing this code that I don’t know about. Anyways, my examples will all use double-quotes.

I’m going to stop at this point. The second half of this second lists a number of options and they make a good post by themselves. Fortunately, most of them should be familiar from SAS BASE.

At page 315, only 591 more to go. Sheesh!

Oh, and the best part of this post, WordPress ate my sample code the first time I posted it.  Yee haa!

03 Jun

SAS: Last Post For A While

This will, probably, be the last SAS post, for a bit, maybe a long bit.  I’m going to back away from the SAS certification.

I’m doing this for the following reasons:

  1. I don’t see a lot of SAS programming in the Bay Area.  It’s out there, even here, but there isn’t a lot.
  2. The work I do see often comes with the caveats of wanting more SAS experience than I’m going to have, and, they often want experience in an industry.  The industry is almost always health care  / clinical trials.
  3. The SAS jobs almost always, at least around here, come attached with a statistics requirement.  I could probably get around that but it would force me into a direction that I’m not quite ready to pursue.

So, I’m going to keep SAS on the back burner, maybe finish it off in the future, but for now, it’ll be something else.

So, if by some chance any of this was helping someone learn SAS, I wish you luck, but for now I’ve going to redirect my efforts.

03 Jun

SAS: PROC SQL, A Few Options

PROC SQL has a handful of options which apparently we should know so I’m going to roll through them.

INOBS vs. OUTOBS

These limit your source data or your output.  One works when reading data and the other works when outputting data.  I’ll let you guess which one is which.  You use them as follows:

proc sql inobs=5 outobs=10;
	SELECT *
	mysql.admit;
quit;

                                                                   Act
ID    Name            Sex       Age      Date    Height    Weight  Level      Fee
---------------------------------------------------------------------------------
2458  Murray, W       M          27         1        72       168  HIGH     85.20
2462  Almers, C       F          34         3        66       152  HIGH    124.80
2501  Bonaventure, T  F          31        17        61       123  LOW     149.75
2523  Johnson, R      F          43        31        63       137  MOD     149.75
2539  LaMance, K      M          51         4        71       158  LOW     124.80

This is another of those award-winning examples that I give. This query would be the same, regardless of which one I set, however, it is useful, I think, to show that both can be used in a PROC SQL statement. This is, exactly, the kind of thing SAS will test for.

Here’s a more interesting example:

proc sql inobs=3 outobs=4;
	SELECT name, age, sex
	FROM mysql.admit
	WHERE sex = 'M'
	UNION
	SELECT name, age, sex
	FROM mysql.admit
	WHERE sex='F';
quit;

 Name                 Age  Sex
 -----------------------------
 Almers, C             34  F  
 Bonaventure, T        31  F  
 Johnson, R            43  F  
 Jones, M              29  M

What happened is that SAS read 3 rows from the two union statements, totaling 6 records, however because the output was limited to only 4 records that’s all you get. Just to prove that I’ll change the outobs to 6.

proc sql inobs=3 outobs=6;
	SELECT name, age, sex
	FROM mysql.admit
	WHERE sex = 'M'
	UNION
	SELECT name, age, sex
	FROM mysql.admit
	WHERE sex='F';
quit;

Name                 Age  Sex
-----------------------------
Almers, C             34  F  
Bonaventure, T        31  F  
Johnson, R            43  F  
Jones, M              29  M  
LaMance, K            51  M  
Murray, W             27  M

PROMPT vs. NOPROMPT

I'm looking for the code, that should be here, but isn't, because SAS flaked.

I’m looking for the code, that should be here, but isn’t.

If you add the PROMPT option to a PROC SQL statement, it should, because it doesn’t work for me, prompt you to verify that you are limiting the records. In other words, it will allow you to bypass the limits you specify.  The default, which worked really well on my system is NOPROMPT.

DOUBLE vs NODOUBLE

All this does is fluff up your output by double-spacing it.  The default is NODOUBLE and it only works in LIST output.

NUMBER vs NONUMBER

This is another of those exciting options.  It adds row numbers to your queries.  The default, is NONUMBER and just because I haven’t done an example in 100 words or so, here you go. Oh, and I gave you bonus content, free, see how generous I am, and added a DOUBLE to it.

proc sql outobs=5 double number;
	SELECT name, age, sex
	FROM mysql.admit
	WHERE sex = 'M';
quit;

   Row  Name                 Age  Sex
-------------------------------------
     1  Murray, W             27  M  

     2  LaMance, K            51  M  

     3  Jones, M              29  M  

     4  King, E               35  M  

     5  Pitts, D              34  M

STIMER vs NOSTIMER

The default is NOSTIMER.  All this does is write query processing time to the log. It must be enabled as a system option.

It looks something like this,

NOTE: SQL Statement used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds

I tried, but was unable to, turn it off.

FLOW

FLOW is another one of those, or these, things that impact only list output. Here’s to the good examples that I’m not going to be able to provide.

Let’s look at each option.

NOFLOW, which is the default, means that if you generate a result, that is longer than the page width, that it overflows into the next line.

FLOW = x, means that a columns maximum width is x. In other words, if x=15, that a column will wrap when 15 characters are in a column.

FLOW = x y, means that a column will have a width between x and y. SAS, specifically uses the word, BETWEEN, so I don’t know if they really mean inclusive of x and y, or greater than and less than. For example, FLOW = 10, 15 means that a columns length will be between 10 and 15 characters.

FLOW, means, FLOW = 12 200.

Essentially FLOW is telling SAS to not flow into the next line, and what to do when that might happen.

I really need to get some samples for this but I probably won’t.

You can reset options by using the RESET command.  It works as follows:

proc sql double outobs=5;
	SELECT name, age, sex
	FROM mysql.admit
	WHERE sex = 'F';

	reset outobs= nodouble;

	SELECT name, age, sex
	FROM mysql.admit
	WHERE sex = 'F';
quit;

Name                 Age  Sex
-----------------------------
Almers, C             34  F  

Bonaventure, T        31  F  

Johnson, R            43  F  

Reberson, P           32  F  

Eberhardt, S          49  F  

Name                 Age  Sex
-----------------------------
Almers, C             34  F  
Bonaventure, T        31  F  
Johnson, R            43  F  
Reberson, P           32  F  
Eberhardt, S          49  F  
Nunnelly, A           44  F  
Oberon, M             28  F  
Quigley, M            40  F  
Takahashi, Y          43  F  
Ivan, H               22  F  
Wilcox, E             41  F

I almost skipped this one but I’m glad I didn’t. I never would have expected that you reset the outobs as outobs=. Of course, if you wanted to you could simply invoke a second PROC SQL statement, with the options you wish, but RESET allows you to do it within a single PROC SQL, which probably makes things run a bit faster and keeps the code a bit cleaner.

Finally, you can view “system” option settings with the PROC OPTIONS statement. Here’s a quick example. Note the word system.

proc options option=number value;
run;

In the log,

Option Value Information For SAS Option NUMBER
    Option Value: NUMBER
    Option Scope: Default 
    How option value set:  Shipped Default

If you don’t include the values keyword this is all you get,

proc options option=number;
run;

 NUMBER            Print page number on each page of SAS output

I think two things. One, this post went longer than I had planned. I say that because I know anyone reading something like this wants to hear me bitch about the writing I’m doing. Two, I think I’m just about done with PROC SQL.

02 Jun

SAS: PROC SQL, Dictionary Tables And sashelp

There are, a lot, of dictionary tables. Here’s three queries that list a few of them.

proc sql;
	SELECT *
	FROM dictionary.options;
	SELECT *
	FROM dictionary.tables;
	SELECT *
	FROM dictionary.extfiles;
quit;

You don’t want to see the results, it’s really long, but the queries would list the current system options, table list and the currently assigned filerefs. You can see what the tables look like with the following queries

DESCRIBE TABLE dictionary.options;

create table DICTIONARY.OPTIONS
  (
   optname char(32) label='Option Name',
   opttype char(8) label='Option type',
   setting char(1024) label='Option Setting',
   optdesc char(160) label='Option Description',
   level char(8) label='Option Location',
   group char(32) label='Option Group'
  );

You can also access the same information with the following code.

proc print data=sashelp.voption;
run;

Note that this one is voption, because it’s a view, while the other variant is options. +1 for SAS consistency with the “s”. The result is also really long so I’ll spare you that one also.

The link below has more information.

http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a002300185.htm

01 Jun

SAS: Views

This post will be similar to the earlier one on indexes.  Views look fairly clean in SAS and, for the most part, I think I know what is going on here, so I’m only going to hit parts where I feel some doubt, however small.  As such, this won’t be the most useful post for someone, who isn’t me, to read.

Creating a view in SAS is pretty much like every other view you’ve created.

libname mysql '....';

ods listing;

proc sql;
CREATE VIEW myview AS
	SELECT name, age, sex, height, weight
        FROM mysql.admit;
quit;

proc sql outobs=5;
SELECT * FROM myview;
quit;

Name                 Age  Sex    Height    Weight
-------------------------------------------------
Murray, W             27  M          72       168
Almers, C             34  F          66       152
Bonaventure, T        31  F          61       123
Johnson, R            43  F          63       137
LaMance, K            51  M          71       158

You can display the definition of a view by using DESCRIBE VIEW which will write the view’s definition to the log.

proc sql;
DESCRIBE VIEW myview;
quit;

28         DESCRIBE VIEW myview;
NOTE: SQL view WORK.MYVIEW is defined as:

        select name, age, sex, height, weight
          from MYSQL.ADMIT;

SAS offers up a handful of suggestions on how, and when to use a view.

  • They recommend against using an ORDER BY (you can do it) because other users of the view may want to sort it differently.
  • If the data is reused, a lot, they recommend creating a table instead of a view.
  • If the table structure changes a view can become unusable.

You may have noticed that I did not use a LIBREF in the view examples I created earlier.  SAS, actually, recommends that you do that if the view resides in the same resides in the same library as the table.  In my case, they’re both in WORK.

You can assign a LIBREF in the view and not use a separate LIBREF statement.  For example:

proc sql;
CREATE VIEW myview AS
	SELECT name, age, sex, height, weight
    FROM mysql.admit
	USING LIBNAME mysql '....';
quit;

I think just knowing that, that can be done, would be enough for the test.

You can, just like in a regular database, UPDATE, DELETE or INSERT into a view, in the right situations. Per SAS,

  • You can only update a single table in a view.  You cannot update a query that has a join or a subquery.
  • You cannot update a derived column.
  • You can update a view with a WHERE clause.  The book doesn’t state this but I would assume that inserts would also be restricted by the where clause, which will act like a constraint in a view.
  • You cannot update a view that has a HAVING, ORDER BY or GROUP BY clause.

You get rid of a view the old-fashioned way.

25         DROP VIEW myview;
NOTE: View WORK.MYVIEW has been dropped.

One last note, you cannot join a view.

I’m kind of scared when it comes to views because the book’s chapter on them is really short. I’m afraid that I’m missing things. At the same time, this book has a different feel to it compared to the SAS Base exam. Base felt really petty, even in the book’s sample questions, but I get the sense that the scope here is large enough that they “may” not hammer as much minutiae on this exam.

I guess I’ll have to take it to find out.

31 May

SAS: Indexes

This site needed a photo..

The site needed a photo..

This is a very broad overview of indexes in SAS.  Honestly, for the most part, indexes seem to work like you would expect indexes to work and they aren’t complicated at least in the sense that I’m looking at them.

And, just a reminder, I write these as Study Guides for me, that means that I won’t, usually, go step-by-step in these things.  That will, especially, be the case here.

SAS has a handful of guidelines for creating indexes.  They are:

  • Use fewer indexes to save on storage and update costs.
  • Don’t create indexes for small tables.  In those cases, sequential access, otherwise known as a table scan, is faster.
  • Create indexes on columns that have a large number of distinct values.
  • Use indexes when a relatively small number of rows will be returned.  They specifically state “less than 15%”.

Creating indexes is, almost, straight-forward.  The syntax is

CREATE  INDEX  ON table (column(s))

The only hiccup with indexes is that an index must be named the same as the column, if, you only index a single column. In other words, you can name composite indexes but not a simple index. For example:

CREATE INDEX idx_myjunk ON work.myjunk (number);

ERROR: For a simple index, the index name must be the same as the column name.

On the other hand, here are 3 successfully created indexes.  It’s very, very, very, unlikely that you would create all 3 of these indexes on a table.  However, I haven’t been able to determine if SAS will use a partial composite index, say, if you queried the number1 column.  My guess, based on what I’m reading, and the way I think things work, is that it would not.  But, don’t take that as gospel.  I guess all of that means that I could see you doing it in a specific scenario of some sort.

proc sql;
CREATE TABLE work.myjunk
(
	number int,
	number1 int
);
CREATE INDEX number ON work.myjunk (number);
CREATE UNIQUE INDEX number1 ON work.myjunk(number1);
CREATE INDEX idx_composite ON work.myjunk(number, number1);
DESCRIBE TABLE work.myjunk;
quit;

From the log.

28         DESCRIBE TABLE work.myjunk;
NOTE: SQL table WORK.MYJUNK was created like:

create table WORK.MYJUNK( bufsize=131072 )
  (
   number num,
   number1 num
  );
create index idx_composite on WORK.MYJUNK(number,number1);
create unique index number1 on WORK.MYJUNK(number1);
create index number on WORK.MYJUNK(number);

That is a big values-add on my part.  It’s also OK to create a composite index with the same name as a simple index, however, you cannot name a composite index with the same name as a column in the index.

You can tell if SAS is using an index by using the MSGLEVEL= option. You use it as follows:

options msglevel= i or n;

Yes, it’s a global option and it will stay in effect for the session. There are two setting for message level:

  • N: Displays notes, errors and warnings.  This is the default.
  • I:  Displays additional information regarding index usage, sorts and merges.

You can force SAS to use a specific index or to process indexes in a specific fashion.  To tell SAS to use, or not use an index by using IDXWHERE.  It has two options:

  • IDXWHERE=No: Tells SAS to not use an index and instead to do a sequential scan.
  • IDXWHERE=Yes: Tells SAS to use an index and to not consider a sequential scan.

You use IDXWHERE as follows:

proc sql;
SELECT *
FROM work.myjunk (idxwhere=no)
WHERE number < 5;
quit;

You can force SAS to use a specific index with the IDXNAME= option. You simply name the index and use it the same place you use IDXWHERE as follows:

proc sql;
SELECT *
FROM work.myjunk (idxname=number)
WHERE number < 5;
quit;

And, once again, I’d like to point out that I bring incredible value to the process because the WHERE clause would pretty much guarantee that SAS used that index.

One other note on IDXNAME is that the index you specify must exist, and, it must have its first, or only, column match the condition in the WHERE clause.

SAS indexes are like a clustered index, in a very loose sense, in that they are sorted, in ascending order. However, you cannot create a SAS index in descending order, and unlike a true clustered index, because you can have multiple indexes on a table, the underlying data is not sorted as you would have in a clustered index. Frankly, they aren’t a clustered index, I just wanted to mention them in this article because I think the cool kids would do that.

Finally, you can delete an index as follows:

28         DROP INDEX number, number1 ON myjunk;
NOTE: Index number has been dropped.
NOTE: Index number1 has been dropped.

I figured it would be easier to post the log on this one. As you can see it’s acceptable to drop multiple indexes on a table. You just separate each index name with a comma.

I probably skipped a few things but it looks like, at least at a basic level, that indexes are pretty straight-forward in SAS.

30 May

SAS: Constraint Messages

SAS allows you to create custom messages when a constraint occurs.  There are two options for a constraint:

  • MESSAGE=:  This is a custom message up to 250 characters.
  • MSGTYPE=:  This has two options.  NEWLINE means that it is displayed in addition to the original error message.  USER means that only the custom message is displayed.

Here’s a couple of examples that should make it clear.

proc sql UNDO_POLICY=REQUIRED;
CREATE TABLE work.myjunk
(
	number int,
	number1 int CHECK (number < 5) Message='***This is my error***' Msgtype=USER
);
INSERT INTO work.myjunk
   VALUES (6,6);
SELECT * FROM work.myjunk;
quit;

ERROR: ***This is my error*** 
NOTE: This insert failed while attempting to add data from VALUES clause 1 to the data set.

proc sql UNDO_POLICY=REQUIRED;
CREATE TABLE work.myjunk
(
	number int,
	number1 int CHECK (number < 5) Message='***This is my error***' Msgtype=NEWLINE
);
INSERT INTO work.myjunk
   VALUES (6,6);
SELECT * FROM work.myjunk;
quit;

ERROR: ***This is my error*** Add/Update failed for data set WORK.MYJUNK because data value(s) do not comply with integrity
constraint _CK0001_.

NEWLINE is the default option.

29 May

SAS: UNDO_POLICY

This option tells the proc sql statement what to do if an INSERT or UPDATE fails for some reason. Usually this would be a constraint violation but it could also occur because of data type mismatch or probably something else that I’m not thinking about right this second.

This option has 3 settings: REQUIRED, OPTIONAL and NONE

REQUIRED

Proc sql will undo an INSERT / UPDATE if it fails.

This is the default option.

NONE

Skips INSERT / UPDATE records that fail.  In other words, if you have 3 statements, and the second fails, statements one and three will still run.

When this happens a warning is written to the log.

OPTIONAL

In this case, if SAS can undo the statement, it will.  If, for some reason, it cannot then it will not.

There are two cases where UNDO is not reliable, which I don’t really understand, at this time.  For now, it appears they have something to do with SAS/SHARE and SAS/ACCESS.

SAS likens UNDO_POLICY to a rollback in a transaction, which is probably an OK comparison but this isn’t a rollback and SAS doesn’t support the ROLLBACK statement.

Here’s an example:

proc sql UNDO_POLICY=OPTIONAL;
CREATE TABLE work.myjunk
(
	number int,
	number1 int CHECK (number < 5)
);
INSERT INTO work.myjunk
   VALUES (1,1)
   VALUES (2,2)
   VALUES (6,6);
SELECT * FROM work.myjunk;
quit;

ERROR: Add/Update failed for data set WORK.MYJUNK because data value(s) do not comply with 
integrity constraint _CK0001_.
NOTE: This insert failed while attempting to add data from VALUES clause 3 to the data set.
NOTE: Deleting the successful inserts before error noted above to restore table to a consistent 
state.

proc sql UNDO_POLICY=NONE;
CREATE TABLE work.myjunk
(
	number int,
	number1 int CHECK (number < 5)
);
INSERT INTO work.myjunk
   VALUES (1,1)
   VALUES (2,2)
   VALUES (6,6);
SELECT * FROM work.myjunk;
quit;

ERROR: Add/Update failed for data set WORK.MYJUNK because data value(s) do not comply with 
integrity constraint _CK0001_.
NOTE: This insert failed while attempting to add data from VALUES clause 3 to the data set.
NOTE: 3 rows were inserted into WORK.MYJUNK -- of these 1 row was rejected as an ERROR, leaving 
2 rows that were inserted successfully.

proc sql UNDO_POLICY=REQUIRED;
CREATE TABLE work.myjunk
(
	number int,
	number1 int CHECK (number < 5)
);
INSERT INTO work.myjunk
   VALUES (1,1)
   VALUES (2,2)
   VALUES (6,6);
SELECT * FROM work.myjunk;
quit;

ERROR: Add/Update failed for data set WORK.MYJUNK because data value(s) do not comply with 
integrity constraint _CK0001_.
NOTE: This insert failed while attempting to add data from VALUES clause 3 to the data set.
NOTE: Deleting the successful inserts before error noted above to restore table to a consistent 
state.

As you can see they all fail, but, the one where the UNDO_POLICY was set to NONE did not rollback the two inserts that were successful.

28 May

SAS: Constraints

This will be ugly.  I’m looking at the Study Guide and it does a terrible job on constraints.  For instance, it mentions referential integrity (foreign keys) but it does not show an example.  However, it also lists several keywords that would apply to it.  So, do I need to know it for the test?

There are about 6 pages on the guide so I’m going to try and dig out what I think I would need to know.  Since I already know most of this, from other databases, I’ll surely skip useful parts.  And, no way am I doing each database version. This post is only about SAS.

Anyways, if it helps, great,  If it doesn’t, well, you got what you paid for, I guess.

CHECK

A check constraint is similar to a WHERE clause but it’s applied at the column level.  For example,

proc sql;
CREATE TABLE work.myjunk
(
	name char(10) CHECK (name IN ('Billy','Bob','Thornton')),
	number int CHECK (number < 10)
);
DESCRIBE TABLE CONSTRAINTS work.myjunk;
quit;     Integrity              Where
#    Constraint    Type     Clause
----------------------------------------------------------------
1    _CK0001_      Check    name in ('Billy', 'Bob', 'Thornton')
2    _CK0002_      Check    number

This syntax creates two CHECK constraints:

  • name: It must be one of Billy, Bob or Thornton.
  • number: It must be less than 10.

Because we didn’t name the constraints, SAS assigned a default name to them. I’m not going to list them as they’ll be in the sample code for each constraint. You can also name a constraint through the ALTER TABLE statement or adding the constraint to the end of the CREATE TABLE syntax as follows:

proc sql;
CREATE TABLE work.myjunk
(
	name char(10),
	number int,
	CONSTRAINT mycheck CHECK (number < 10)
);

DESCRIBE TABLE CONSTRAINTS work.myjunk;
quit;
     Integrity               Where
#    Constraint     Type     Clause
-----------------------------------------------------------------
1    mycheck        Check    number

I won’t do this every time. This is already going to be a very long, and it’s already disorganized enough, but I’ll try to sprinkle the variations throughout the post so you are comfortable seeing them both ways.

A CHECK constraint can also apply to a different column in the same row.

proc sql;
CREATE TABLE work.myjunk
(
	number int,
	number1 int CHECK (number1 < number)
);
DESCRIBE TABLE CONSTRAINTS work.myjunk;
quit;

     Integrity              Where
#    Constraint    Type     Clause
------------------------------------------
1    _CK0001_      Check    number1<number

This is precisely the type of thing I would expect SAS to ask you on a test.

NOT NULL

NOT NULL simply says no NULL, or missing (this is SAS) values, in the column. You write the constraint as follows:

proc sql;
CREATE TABLE work.myjunk
(
	name char(10) NOT NULL,
	number int,
	CONSTRAINT myconstraint NOT NULL (number)
);
DESCRIBE TABLE CONSTRAINTS work.myjunk;
quit;

     Integrity
#    Constraint      Type        Variables
------------------------------------------
1    _NM0001_        Not Null    name     
2    myconstraint    Not Null    number

I thought you could add this with an ALTER TABLE / MODIFY statement but apparently not.

proc sql;
CREATE TABLE work.myjunk
(
	name char(10) NOT NULL,
	number int
);
ALTER TABLE work.myjunk MODIFY number int NOT NULL;
DESCRIBE TABLE CONSTRAINTS work.myjunk;
quit;

     Integrity
#    Constraint    Type        Variables
----------------------------------------
1    _NM0001_      Not Null    name

To be honest, I was sure it, would work. In MySQL:

mysql> CREATE TABLE work.myjunk
    -> (
    ->  name char(10) NOT NULL,
    ->  number int
    -> );
Query OK, 0 rows affected (0.11 sec)

mysql> ALTER TABLE work.myjunk MODIFY number int NOT NULL;
Query OK, 0 rows affected (0.28 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> desc work.myjunk;
+--------+----------+------+-----+---------+-------+
| Field  | Type     | Null | Key | Default | Extra |
+--------+----------+------+-----+---------+-------+
| name   | char(10) | NO   |     | NULL    |       |
| number | int(11)  | NO   |     | NULL    |       |
+--------+----------+------+-----+---------+-------+

The Null column shows NO for Nulls so it worked just fine on that side of the house.  I guess the answer is to only define a NOT NULL constraint in the CREATE TABLE statement for SAS.  You should probably also make a mental note of _NM0001_.  The automatic naming policy of SAS uses NM, not NN, for NOT NULL.  Yeah, that’s logical.  I suppose that’s how they keep the trick questions flowing.

UNIQUE

This constraint simply specifies that all values in the column must be unique. The study guide doesn’t mention this but UNIQUE usually allows for a NULL / missing value, but, only one as two would violate the constraint.

proc sql;
CREATE TABLE work.myjunk
(
	name char(10) UNIQUE,
	number int
);
DESCRIBE TABLE CONSTRAINTS work.myjunk;
quit;

     Integrity
#    Constraint    Type      Variables
--------------------------------------
1    _UN0001_      Unique    name

The constraint seems to work similarly to the NOT NULL constraint in terms of how you define it, however, you may want to apply it to multiple columns as follows:

proc sql;
CREATE TABLE work.myjunk
(
	name char(10),
	number int,
	CONSTRAINT myconstraint unique (number, name)
);
DESCRIBE TABLE CONSTRAINTS work.myjunk;
quit;
     Integrity
#    Constraint      Type      Variables
------------------------------------------
1    myconstraint    Unique    number name

PRIMARY KEY

SAS defines the PRIMARY KEY as a combination of the UNIQUE and NOT NULL constraints. I’m assuming, and have not verified this, yet, that unlike SQL Server, or INNODB tables in MySQL that it is not clustered. However, I haven’t hit the index part of the Study Guide so SAS may do something like that.

There can only be one primary key on a table. ALL of the following can be used to create a primary key.

proc sql;
CREATE TABLE work.myjunk
(
	name char(10),
	number int PRIMARY KEY
);
DESCRIBE TABLE CONSTRAINTS work.myjunk;
quit;

     Integrity
#    Constraint    Type           Variables
-------------------------------------------
1    _PK0001_      Primary Key    number

proc sql;
CREATE TABLE work.myjunk
(
name char(10),
number int,
CONSTRAINT mypk PRIMARY KEY (number)
);
DESCRIBE TABLE CONSTRAINTS work.myjunk;
quit;

Integrity
# Constraint Type Variables
-------------------------------------------
1 mypk Primary Key number

proc sql;
CREATE TABLE work.myjunk
(
name char(10),
number int,
CONSTRAINT mypk PRIMARY KEY (name,number)
);
DESCRIBE TABLE CONSTRAINTS work.myjunk;
quit;

     Integrity
#    Constraint    Type           Variables
---------------------------------------------
1    mypk          Primary Key    name number

The last version specifies that both the name and number fields are part of the primary key and that the restrictions for a primary key apply to both columns. For instance, the combination of the two columns must be unique, but you can repeat a value in a column. For example, this is ok,

name  number
A     1
A     2

But this would not be,

name  number
A     1
A     1

This also applies to a unique constraint that isn’t a primary key.

I’m going to stop now. I find myself getting a bit mixed up on the syntax with all the copy/pasting I’m doing and I fear I’m going to blow it. I think, however, this should give a decent starting point to constraints even if it feels woefully incomplete to me.

27 May

Inserting with a SELECT

Honestly, I don’t think I’ve ever done this outside of a book, but, it’s something that comes up so it’s worth mentioning.  Fortunately, everyone happily works the same way so I’m just going to do this in MySQL.

Suppose we have two identical tables and we want to copy data from one to another. We could use an external tool, like BCP, or mysqldump, or we could just use a INSERT / SELECT as follows;

mysql> DESC myjunk;
+——–+———+——+—–+———+——-+
| Field | Type | Null | Key | Default | Extra |
+——–+———+——+—–+———+——-+
| name | char(1) | NO | | NULL | |
| number | int(11) | NO | | 0 | |
+——–+———+——+—–+———+——-+
2 rows in set (0.00 sec)

mysql> DESC myjunk1;
+——–+———+——+—–+———+——-+
| Field | Type | Null | Key | Default | Extra |
+——–+———+——+—–+———+——-+
| name | char(1) | NO | | NULL | |
| number | int(11) | NO | | 0 | |
+——–+———+——+—–+———+——-+
2 rows in set (0.00 sec)

mysql> SELECT * FROM myjunk;
+——+——–+
| name | number |
+——+——–+
| A | 11 |
| B | 22 |
| C | 33 |
+——+——–+
3 rows in set (0.00 sec)

mysql> SELECT * FROM myjunk1;
Empty set (0.00 sec)

mysql> INSERT INTO myjunk1
-> SELECT * FROM myjunk;
Query OK, 3 rows affected (0.02 sec)
Records: 3 Duplicates: 0 Warnings: 0

mysql> SELECT * FROM myjunk1;
+——+——–+
| name | number |
+——+——–+
| A | 11 |
| B | 22 |
| C | 33 |
+——+——–+
3 rows in set (0.00 sec)

This, thankfully, is super clean and for the most part it just works. However, there are a couple of tweaks to it.

mysql> INSERT INTO myjunk1
    -> SELECT number, name FROM myjunk;
ERROR 1406 (22001): Data too long for column 'name' at row 1

This was caused because the Name column is a char(1) so it can’t handle the longer number types.

mysql> ALTER TABLE myjunk1 MODIFY name varchar(10);
Query OK, 0 rows affected (0.17 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> INSERT INTO myjunk1
    -> SELECT number, name FROM myjunk;
ERROR 1366 (HY000): Incorrect integer value: 'A' for column 'number' at row 1

It would have been nice if it had dropped this error first, before I modified the column, but MySQL isn’t know for solid/clean errors. Anyways, just make sure that if you are switching columns around, for some reason, that you don’t bonk the data types.

Also, you must have an equal number of columns. For example, this will not work.

mysql> desc myjunk1;
+--------+-------------+------+-----+---------+-------+
| Field  | Type        | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| name   | varchar(10) | YES  |     | NULL    |       |
| number | int(11)     | NO   |     | 0       |       |
| newCol | varchar(10) | YES  |     | NULL    |       |
+--------+-------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

mysql> desc myjunk;
+--------+---------+------+-----+---------+-------+
| Field  | Type    | Null | Key | Default | Extra |
+--------+---------+------+-----+---------+-------+
| name   | char(1) | NO   |     | NULL    |       |
| number | int(11) | NO   |     | 0       |       |
+--------+---------+------+-----+---------+-------+
2 rows in set (0.00 sec)

mysql> INSERT INTO myjunk1
    -> SELECT * FROM myjunk;
ERROR 1136 (21S01): Column count doesn't match value count at row 1

This, correctly, tossed an error in SQL Server, SAS and MySQL.

Good luck, and hope it helps someone, someday.