11 May

SAS: Nearing The Finish Line #3

First, a table comparing the various data combining methods.

 DescriptionFormatTotal SetsOptions
One-To-OneSecond set variable overwrites first when matched. New variables added. Smallest set wins for total observations in new set.set mydata1;
set mydata2;
ConcatenatingAdds second set to the end of the first. No data is overwritten. Matching variables are merged together and new not matched variables are added to the new data set.set mydata1 mydata2;
AppendingAdds the second set to the first. Only like information is appended. Variable length is determined by the base set. If their type is different, base survives and data is set to missing.PROC APPEND base= data = ;2Must use FORCE option when variables in base and data not the same, in general.
InterleavingData must be sorted.set mydata mydata1;
by ;
Match MergingData sets are merged on a common variable. BY variable must be same in all data sets. Data sets must be sorted. All variables are included in the new data set.merge mydata mydata1;
by var;
Descending

RENAME

Can be used to rename a variable:

data mydata(rename=(oldvar=newvar));

data mydata;
   set moredata(rename=(oldvar=newvar));

proc print data=mydata (rename=(oldvar=newvar));

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000131169.htm

But wait, there’s more. This also works.

data mydata;
	infile '....';
	input name $9. earnings 10-13;
	rename name=myname;
run;

IN

IN creates a temporary variable that you use to verify if a record is in a data set, specifically with a merge.

WHERE

Does, not, perform automatic type conversion.

Numeric Automatic Conversion

  • Assigned to a numeric variable.
  • Used in arithmetic operation.
  • Specified in a function that requires numeric data.

iNput

Converts character data to numeric.  Uses an INformat.  Spelling intentional.

Character Automatic Conversion

  • Assigned to a character value.
  • Use a numeric value in an operator that requires a character value (concat).
  • Specified in a function that requires character data.

INTCK

Difference between.  Time interval.

INTNX

Adds interval to a date.

PUT

Converts numeric data to character.

SCAN

Returns specific word number based on a delimiter.

Assigns length of 200.

TRIM

Does not alter original variable.

FIND vs INDEX

Find, left only.

Index can go left or right and can have a modifier specifying lowercase (i).

ROUND

Is weird, so weird I won’t forget it.

DO WHILE

May not execute as it is evaluated at the top of the loop.

DO UNTIL

Will always execute at least once as it evaluates at bottom of loop.

DIM

Counts the elements in an array.

Character Array

When created you must specify dollar($) sign.

Initial array elements need parenthesis when assigned as a group.

DLM

Used in INFILE to specify delimiter.  Can also be used with the FILE statement to write data to a file.

semicolon

Used to read nonstandard data longer than 8 characters.

Ampersand

Reads data with spaces in it.

MMDDYY

Can be separated by spaces or other delimiters.

10 May

PROC FREQ vs. PROC MEANS

I was going back over these 2 and I realized I wasn’t 100% clear on the two.  I’m probably not but it would be easy to forget, for instance, that one uses TABLES, and the other uses VAR.  So I made a quick table to, I hope, remember, them.

And, for the record, this is post 200.  I don’t even want to know how many of those I wrote in March, but by the time the weekend is over, it might be more than 150.

 PROC MEANSPROC FREQ
What it does.Calculates descriptive statistics.Calculates frequency statistics.
Format.PROC MEANS data='....' options;
var .....;
class ....;
run;
PROC FREQ data=....;
TABLES .....;
run;
Specify Statistics.In the PROC MEANS statement, or, any output statements used.
Options.MAXDEC, NOPRINT suppresses default output.NOFREQ, NOPERCENT, NOROW, NOCOL
Restricting variables.VAR.TABLES.
GROUP Processing.BY: data must be sorted or indexed. CLASS: same idea but doesn't separate by variable into a page format, and doesn't need to be sorted.TABLES row * col.
OtherCan add _TYPE_ and _FREQ_ to output./LIST everything in a row.
/CROSSLIST similar to list but adds totals.
09 May

SAS: Nearing The Finish Line #2

Continuing my final push. As before, it’s random stuff that I want to nail down for test preparation.

RETAIN

Can be used to keep character data as well as numeric data.

DATA

The DATA statement can be used to create multiple data sets at the same time.

data mydata mydata1;
input name $6. weight 7-10;
datalines;
Joe    195
Joe    180
Bill  2250
Jack   180
;
run;

proc print data=mydata1;
run;

Obs    name    weight
1     Joe       195 
2     Joe       180 
3     Bill     2250 
4     Jack      180

You can also conditionally output data to a data set.  For example,

data my1099 no1099;
input name $6. earnings 7-10;
if earnings >= 600 then output my1099;
else output no1099;
datalines;
Joe    195
Joe    700
Bill  2250
Jack   400
;
run;

proc print data=my1099;
run;

proc print data=no1099;
run;

Obs    name    earnings

 1     Joe        700  
 2     Bill      2250  

Obs    name    earnings

 1     Joe        195  
 2     Jack       400

This probably deserves its own article but I’m pretty burned out on writing and I knew this without having to do much research.

http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a001343763.htm

PROC EXPORT

I think I’m getting loopy writing all of this stuff.  I had a post where I linked to a file saying I’d worked with PROC EXPORT, only, no PROC EXPORT, just date _null_.  Since PROC EXPORT is pretty simply I’ll add it here.

proc export data=mydata outfile='....';
run;

There are multiple options. The main one’s to note are DELIMITER (which does what you think it does) and DBMS. DBMS specifies the type of file you are writing. CSV to write a CSV file, for instance. You can read more here:

http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#n0ku4pxzx3d2len10ozjgyjbrpl9.htm

Note that the delimiter option requires, I think, DBMS=DLM and the DELIMITER statement goes in the body of the PROC EXPORT.

Running Totals

This generates a running total:

tot + earnings;

This, does not.

tot = tot + earnings;

It will return a missing value because tot is not defined.

This, however, will generate a running total.

retain tot 0;
tot = tot + earnings;

Date Constants

Don’t forget the ‘d’. And, this will error.

var = '122112'd;

You need to use.

var = '21DEC2012'd;

Year can be 2 or 4 digits.

PROC FORMAT

Character formats begin with a $.  To clarify, that means when they are applied to character data.  You can apply character formats (large, small, medium, etc.) to numeric data but they do not need a $.

Cannot end in a number.  Sheesh!

-< To specify a range.

LOW, HIGH and OTHER can be be used as a range.

Numeric format names can be 32 characters long.  Character format names can only be 31 characters.  Don’t ask why, I don’t know.

The semicolon is used at the end of the format statement.  Unlike other statements, IF, for instance, it’s not applied for each statement.

NOCUM

NOCUM is a TABLES option in PROC FREQ that eliminates the cumulative totals from the PROC FREQ statement.

08 May

SAS: MISSOVER, TRUNCOVER, STOPOVER, FLOWOVER and SCANOVER

These deserve their own post, especially since, for the most part, I completely missed them. I’m not sure how much of a part they play on the test. The Study Guide barely mentions them, but, it does mention them. Some data:

Joe     100
Sam
Bill    700
Mike   1000

MISSOVER

MISSOVER applies the missing record value when a line in the data source short.

data mytest;
infile '....' missover;
input name $7. earnings 8-11;
run;

Obs    name    earnings
 1     Joe        100  
 2     Sam          .  
 3     Bill       700  
 4     Mike      1000

TRUNCOVER

TRUNCOVER tells SAS to stop reading when it hits the end of a line, even a short line. However, without any other options specified it returns the same results as MISSOVER.

data mytest;
infile '....' truncover;
input name $7. earnings 8-11;
run;

Obs    name    earnings
 1     Joe        100  
 2     Sam          .  
 3     Bill       700  
 4     Mike      1000

FLOWOVER

I should really show this first, as it’s the default. It simply says to roll on through to the next line when you are missing a value. It’s a way to skip missing records.

data mytest;
infile '....' flowover;
input name $7. earnings 8-11;
run;

or

data mytest;
infile '....';
input name $7. earnings 8-11;
run;

Obs    name    earnings
 1     Joe        100  
 2     Bill       700  
 3     Mike      1000

STOPOVER

STOPOVER tells SAS to stop reading when you find a missing record.

data mytest;
infile '....' stopover;
input name $7. earnings 8-11;
run;

ERROR: INPUT statement exceeded record length. INFILE .... OPTION STOPOVER specified.

SCANOVER
SCANOVER, is a bit weird at first glance. What it does is position the pointer to a search condition in the file.  It’s very similar to @ and even uses the same symbol.   Then, once positioned, it continues reading as normal.  I’m going to need a different set of data to demo this.

name: Joe     100
name: Sam        
Bill          700
name: Mike   1000
data mytest;
infile '....' scanover;
input @'name:' name $13. earnings 14-17;
run;

Obs      name       earnings
1     name: Sam        .

I’m going all out and making a prediction that this isn’t what you’d want to happen. This is probably more like it:

data mytest;
infile '....' scanover truncover;
input @'name: ' name $7. earnings 14-17;
run;

Obs    name    earnings
1     Joe        100  
2     Sam          .  
3     Mike      1000

Essentially what we’re doing here is having SAS hunt for ‘name:’, then when it finds it reads 7 characters for the name value. I kept trying to do it by reading 13 characters, which didn’t work too well.

That’s all I’ve got here. I would like to bring your attention to my crack use of the observation numbers for one section, and not the next. That’s 100% due to my attention to detail and approach to learning. Or, it could be that I’m NOOBing it up and forget to NOOBS them.

Here’s a PDF that goes into a lot more detail than I have, or ever will.

http://www.nesug.org/proceedings/nesug01/at/at1sa1.pdf

07 May

SAS: Nearing The Finish Line #1

This will be a bit random, but I wanted to refresh some of the things I missed in the first go round.

PROC CONTENTS vs PROC DATASETS

proc contents data=mystuff;

proc datasets;    contents = mystuff;

Essentially the same, except, contents defaults to work/user, datasets to procedure input library.  Both of these will run without an error.

proc contents;
run;

proc datasets;
run;

PAGESIZE = Length, LINESIZE = width

FIRSTOBS and OBS

Can be used in a PROC or as an option.

ID

Replaces the observation number with the values in the ID statement..  The observation number can also be removed by calling the proc print statement a bunch of noobs.

PROC SORT

You can set a sort as NOTSORTED.  I don’t know what it does but this guy wrote a whole paper on it.

http://www.nesug.org/proceedings/nesug06/po/po14.pdf

DESCENDING is only good for the record which follows it.  If you want to sort two variables in descending order then you will need to use the DESCENDING statement twice.

Unless,

proc sort data=mystuff;
	by descending _ALL_;
run;

Which works for the first variable, but not the second. However, this will work for the second, and not the first.

proc sort data=mystuff;
	by descending _NUMERIC_;
run;

_character_ will sort the first character variable.

However, if we forget about descending, this will sort all variables.

proc sort data=mystuff;
	by _all_;
run;

LABEL

Needs an equal sign and it must be set as an option in the proc print statement.

SPLIT

Split, splits labels.  The best way is an example.

data mystuff ;
input name $6. weight 7-9;
datalines;
Joe   195
Joe   180
Bill  225
Jack  180
;
run;

proc print data=mystuff noobs label split='*';
label weight="my*weight*is*too*fat";
run;
          my
         weight
          is
          too
          fat
Joe       195 
Joe       180 
Bill      225 
Jack      180

vs.

proc print data=mystuff noobs label split='*';
label weight="my weight is too fat";
run;
         my weight
name    is too fat

Joe         195   
Joe         180   
Bill        225   
Jack        180,

I thought this would be longer but I’ve done hit the end of my momentum on this for the day. There will be more.

06 May

SAS: Comments

I missed this completely, and, I probably would have gotten one of these wrong on a question.  SAS, has two forms for commenting your code.  One is fairly common, and the other unexpected.  You comment your code as follows:

/* This is a comment */

* This is also a comment;

The second one must have the semicolon at the end.

The manual also makes a note that some operating systems may have trouble with /* in the first or second column.

Honestly, I wouldn’t even blink doing that. It’s how I’ve done it a million times. The one that would scare me is dropping a * in the middle of some code.

05 May

SAS: Repeating Data

Tuco: I’m looking for the owner of that horse. He’s tall, blonde, he smokes a cigar, and he’s a pig!
The Good, the Bad and the Ugly – 1996

 can’t remember running into this kind of data in a long time.  I know it happens, and it’s out there, but it would be rare for most people.  However, this is SAS, and it’s a chapter unto itself, so here goes.

Suppose you have some data that looks like this, including the line break:

122112 500 122212 600 122312 700
122412 800 122512 900 122612 1000

What you have is repeating data that runs across multiple lines and, unfortunately, the techniques I’ve written about earlier won’t work here. Here we have 6 distinct records that span 6 lines. What we want to see is:

122112 500 
122212 600 
122312 700
122412 800 
122512 900 
122612 1000

So, how do we get that? Well, in this specific case we use the @ symbol after the input data. The @@ symbol tells SAS to hold its position in the file through multiple iterations of the data step. Something like this:

data mydata;
input mydate : mmddyy6. mysales @@;

datalines;
122112 500 122212 600 122312 700
122412 800 122512 900 122612 1000
;
run;

mydate    mysales
 19348       500 
 19349       600 
 19350       700 
 19351       800 
 19352       900 
 19353      1000

A few notes about @@ symbol:

  • Should not be used with the MISSOVER option.
  • Should not be used with column input
  • Should not be used with the @ symbol.

The @@ symbol is very straight-forward.  As a side note, I’ve got to spend time with the colon before I test.  It still feels unnatural to me.

SAS also has the @ symbol.  The @ symbol is similar to the @@ symbol, except that it doesn’t hold the INPUT statement across data step iterations.  Using the earlier example:

data mydata;
input mydate : mmddyy6. mysales @;

datalines;
122112 500 122212 600 122312 700
122412 800 122512 900 122612 1000
;
run;

mydate    mysales
19348      500  
19351      800

It’s useful, on the other hand, to be able to read a fixed number of records per line. My example has 3 records on each line so if we wanted to have it duplicate the earlier example with @@, we would do the following:

data mydata;

do ctr = 1 to 3;
    input mydate : mmddyy6. mysales @;
end;

datalines;
122112 500 122212 600 122312 700
122412 800 122512 900 122612 1000
;
run;
ctr    mydate    mysales
4      19350       700 
4      19353      1000

Despite what that looks like, it actually did work because the record it returned was the last one in the row. That ctr = 4 is also an indication of that. However, that isn’t terrible useful so lets add an OUTPUT statement, to force SAS to write data each time the loop processes.

data mydata;

do ctr = 1 to 3;
    input mydate : mmddyy6. mysales @;
	output;
end;

datalines;
122112 500 122212 600 122312 700
122412 800 122512 900 122612 1000
;
run;

ctr    mydate    mysales
 1      19348       500 
 2      19349       600 
 3      19350       700 
 1      19351       800 
 2      19352       900 
 3      19353      1000

Much, better.

You can also use multiple INPUT statements with the @ symbol to read a single variable then process the rest of a line.

data mydata;

input id @;
do ctr = 1 to 3;
    input mydate : mmddyy6. mysales @;
	output;
end;
format mydate mmddyy8.;

datalines;
100 122112 500 122212 600 122312 700
101 122112 800 122212 900 122312 1000
;
run;

 id    ctr      mydate    mysales
100     1     12/21/12       500 
100     2     12/22/12       600 
100     3     12/23/12       700 
101     1     12/21/12       800 
101     2     12/22/12       900 
101     3     12/23/12      1000

Finally, what happens when you have missing or empty columns? This requires some special processing steps.

Note: This is pretty much identical to the Study Guide. I’m using their data source, and well, it’s going to look the same, or even closer than normal, to the Study Guide.

First, some data.

data mydata;
infile '....' firstobs=5 obs=10;
input id (sales1-sales4) (comma9.);
run;

 id      sales1     sales2     sales3     sales4
1190    2189.84    5023.57    2794.67    4243.35
1382    3456.34    2065.83    3139.08    6503.49
1734    2345.83    3423.32    1034.43    1942.28
1824    1323.34    2472.85        .          .  
1943    1908.34        .          .          .  
2046    1423.52    1673.46    3276.65        .

If we run the same code I ran earlier, albeit changed slightly for this file, it will run.

data mydata;
infile '....' firstobs=5 obs=10;

input id @;
do ctr = 1 to 4;
	input sales : comma9. @;
	output;
end;

run;

 id     ctr     sales

1190     1     2189.84
1190     2     5023.57
1190     3     2794.67
1190     4     4243.35
1382     1     3456.34
1382     2     2065.83
1382     3     3139.08
1382     4     6503.49
1734     1     2345.83
1734     2     3423.32
1734     3     1034.43
1734     4     1942.28
1824     1     1323.34
1824     2     2472.85
1824     3         .  
1824     4     1943.00
2046     1     1423.52
2046     2     1673.46
2046     3     3276.65
2046     4         .

This is a nice result. And if you weren’t careful, it could fool you. If you look at the results of the code just above this exercise, you’ll see a lot more missing values. And, in fact, it completely skipped the 1943 sequence. We prevent that by using the MISSOVER statement in the INFILE statement. I hope I didn’t blow this in an earlier demo.

data mydata;
infile '....' firstobs=5 obs=10 MISSOVER;

input id @;
do ctr = 1 to 4;
	input sales : comma9. @;
	output;
end;

run;

 id     ctr     sales
1190     1     2189.84
1190     2     5023.57
1190     3     2794.67
1190     4     4243.35
1382     1     3456.34
1382     2     2065.83
1382     3     3139.08
1382     4     6503.49
1734     1     2345.83
1734     2     3423.32
1734     3     1034.43
1734     4     1942.28
1824     1     1323.34
1824     2     2472.85
1824     3         .  
1824     4         .  
1943     1     1908.34
1943     2         .  
1943     3         .  
1943     4         .  
2046     1     1423.52
2046     2     1673.46
2046     3     3276.65
2046     4         .

That’s better, and it’s correct. But we can clean it up a bit more. We don’t really need to know those values are missing. Really all we want are the records that have values. You can do that by rewriting the DO loop to use a condition and not a fixed ctr value. For example:

data mydata;
infile ‘….’ firstobs=5 obs=10 MISSOVER;

input id sales : comma9. @;
do while (sales ne .);
output;
input sales : comma9. @;
end;

id sales
1190 2189.84
1190 2189.84
1190 5023.57
1190 2794.67
1190 4243.35
1382 3456.34
1382 3456.34
1382 2065.83
1382 3139.08
1382 6503.49
1734 2345.83
1734 2345.83
1734 3423.32
1734 1034.43
1734 1942.28
1824 1323.34
1824 1323.34
1824 2472.85
1943 1908.34
1943 1908.34
2046 1423.52
2046 1423.52
2046 1673.46
2046 3276.65

I can guarantee that is confusing, and I screwed up the formatting, so here is what happened.

First, you create an input statement that reads the first id column and the first sales column. This seems unnatural at first but it makes sense because of the next step.

Second, you create a DO/WHILE loop that runs while sales is not equal to missing. Because it’s a DO/WHILE loop it will never execute if there are no sales values. This is why you read the first sales column in the first INPUT record.

Third, you immediately fire an OUTPUT statement to write the data to the data set.

Fourth, you use an INPUT to read the file for the next sales value.

Finally, the loop repeats until you read a missing record or it reads the end-of-line marker which also returns a missing record. I don’t think I did that justice. Let me try it one other way.

  1. input id sales : comma9. @; – Read the data file.
  2. do while (sales ne .); – Tests if the sales record is missing, which also happens at the end of the line.
  3. output; – Writes data to the data set.
  4. input sales : comma9. @; – Reads the next record in the file.
  5. Repeat until sales equals missing.

This is logical, after a bit, and very old school, but it makes sense after you look at it a bit.

Well, this is close to the end of my SAS BASE writing.  The book goes through an additional chapter but it’s really just a dissection of earlier stuff.  I’ll probably write a couple more posts that cover the topic, specifically areas that I don’t own completely, which is more than I’d like, actually, and I’ll be done.

Of course, by the time this goes live, I’ll have long since completed this test, and hopefully, passed.

04 May

SAS: Reading Multiple Lines

Finally, something that will be easy and clean.  Sometimes, usually with addresses, you will find data where a record spans multiple lines.  Something like this.

data mydata;
datalines;
The White House
1600 Pennsylvania Ave NW  
Washington, DC 20500
;
run

This data has a lot of problems.

  1. It spans multiple lines.
  2. Parts of it are delimited and parts aren’t.
  3. There’s a comma after the city.

As this is a post on multiple lines we’re going to focus on that.  However, the code will clean up the other problems.  Let’s go through this step-by-step.

Step 1: Read the name field.

There are several ways to do this but the easiest is to treat it as a fixed range.  This is what I did.

data mydata;
input name $ 1-30;

datalines;
The White House
1600 Pennsylvania Ave NW  
Washington DC 20500
;
run;

name
The White House         
1600 Pennsylvania Ave NW
Washington DC 20500

Now, that worked for reading the file but SAS still thinks there are 3 records where there is one. We need to fix that.

Step 2: Combine 3 lines into a single line.

SAS allows you to treat multiple lines as a single INPUT statement by using the forward-slash(/). The forward slash simply says to continue to the next line and to continue processing the INPUT statement. You use it like this.

data mydata;
input name $ 1-30 /
address $ 1-30 /
citystatezip $ 1-30;

datalines;
The White House
1600 Pennsylvania Ave NW  
Washington DC 20500
;
run;
name                  address                 citystatezip
The White House    1600 Pennsylvania Ave NW    Washington DC 20500

It works with multiple lines.

data mydata;
input name $ 1-30 /
address $ 1-30 /
citystatezip $ 1-30;

datalines;
The White House
1600 Pennsylvania Ave NW  
Washington, DC 20500
Washington Monument 2 15th St NW Washington, DC 20007
;
run;
name                   address                   citystatezip
The White House       1600 Pennsylvania Ave NW   Washington, DC 20500   
Washington Monument   2 15th St NW               Washington, DC 20007

 

Step 3: Separate the City, State and Zip fields.

I touched on this in an earlier post, and I’m already forgetting it, so it’s good practice. And it’s such a big part of everything going on in this test. I think you couldn’t begin to do enough with informats for this test. I’ll probably have to write a post that does everything weird I can think of with them.

Here is how you separate the fields.

input name $ 1-30 /
address $ 1-30 /
city $20. state $ zip $;

datalines;
The White House
1600 Pennsylvania Ave NW  
Washington, DC 20500
Washington Monument
2 15th St NW  
Washington, DC 20007
;
run;

name                   address                       city      state     zip
The White House        1600 Pennsylvania Ave NW    Washingt     DC      20500
Washington Monument    2 15th St NW                Washingt     DC      20007

Well, they are separated.

Step 4: Lets clean them up.

First, lets set the city to the proper length, and remove the comma from it.

data mydata;
length city $20.;
input name $ 1-30 /
address $ 1-30 /
city $ state $ zip $;

city = tranwrd(city, ',','');

datalines;
The White House
1600 Pennsylvania Ave NW  
Washington, DC 20500
Washington Monument
2 15th St NW  
Washington, DC 20007
;
run;
city              name               address                     state     zip
Washington    The White House        1600 Pennsylvania Ave NW     DC      20500
Washington    Washington Monument    2 15th St NW                 DC      2000

Now, with that done, there is one other trick to reading multiple line data, and that is the # symbol, which says to go to a specific line. In other words, we could have written the earlier code as follows:

data mydata;
length city $20.;
input 
#1 name $ 1-30
#2 address $ 1-30
#3 city $ state $ zip $;

city = tranwrd(city, ',','');

datalines;
The White House
1600 Pennsylvania Ave NW  
Washington, DC 20500
Washington Monument
2 15th St NW  
Washington, DC 20007
;
run;

You also do not need to write them in order. This is also valid,

input 
#3 city $ state $ zip $
#2 address $ 1-30
#1 name $ 1-30;

You can also mix and match the two options. For instance,

#1 name $ 1-30
#2 address $ 1-30 / 
city $ state $ zip $;

However, the following doesn’t work. You can’t #, then /, then #, backwards.

input 
#2 address $ 1-30 / 
city $ state $ zip $
#1 name $ 1-30;
ERROR: Old line 27 wanted but SAS is at line 29. 
       Use: INFILE N=X; , with a suitable value of x.

On the other hand, you can do this,

input 
#1 name $ 1-30 /
address $ 1-30 
#3 city $ state $ zip $;

and this,

input 
#1 name $ 1-30 
#2 address $ 1-30 
#1 name1 $ 1-30
#3 city $ state $ zip $; 

name                 address                   name1               city      state zip
The White House      1600 Pennsylvania Ave NW  The White House     Washingt     DC 20500
Washington Monument  2 15th St NW              Washington Monument Washingt     DC 20007

But, as you can see the results, will be erratic. I also pulled the LENGTH statement from the code so the city variable is defaulting to 8 characters. It’s always good to be reminded about that.

And I think this is it. For once, I didn’t swear much while writing one of these.

03 May

SAS: Additional Date Notes

Be very careful with date difference questions. This is a SAS trick that they pulled in the Study Guide.

Lets say you started something yesterday and you finished it today. Your first inclination, because the question will imply that you are calculating the difference, is to subtract yesterday’s date from today’s date which will yield 1. But not here. You started yesterday and ended today. That, my friend, means 2 is the correct answer: yesterday + today = 2. Of course, if you started at 3:00 and ended at noon, theoretically, you didn’t even work 1 day. But SAS would probably expect that answer to be 2.

Just be careful. I missed this in the practice test because I went straight to the difference calculation out of habit.

And for the record, I think it’s a nonsensical approach. There are a million better ways to approach this question than what they’ve done in the Study Guide. But, the Study Guide is what they want, so it’s what you want.

There are also some quirky things that dates can do.

data mydata;
input @1 mydate mmddyy8.;
mydate1 = mydate;
format mydate1 mmddyy10.;
datalines;
12/30/1999
;
run;

mydate       mydate1
-14612    12/30/1919

Because I read the date with mmddyy8 the date only saw the first 19, and with YEARCUTOFF set to 1910, you get the year 1919.

data mydata;
input @1 mydate mmddyy8.;
mydate1 = mydate;
format mydate1 time5.;
datalines;
12/30/1999
;
run;

mydate    mydate1
-14612     -4:03

Using the date value for a time produced 4:03 or -14612 / 3600.

Finally, I’m stating to think I’ll need to memorize these. There aren’t many here so it shouldn’t be

02 May

SAS: Dates

SAS stores dates as numbers.  January 1, 1960 is 0.  Time is the number of seconds from midnight, which is 0.

The following apply to SAS Dates.

  • The are valid from 1582 AD through 20000 AD.
  • SAS adjusts for leap years.
  • SAS does not adjust for daylight savings time.

The Study Guide lists a number of date informats.

MMDDYYw

If there are delimiters they must be consistent.

MMDDYY6: 122199

MMDDYY8: 12 21 99 or 12/21/99 or some variant of that.

MMDDYY10: 12 21 1999 or 12/21/1999 or some variant of that.

DATEw

DATE7: 21Jan99

Date9: 21Jan1999

Date11: 21-Jan-1999

DATETIMEw

Reads values that include a date and time.

Here are a few sample dates;

data mydata;
infile '....';
input name $ @10 mydate : mmddyy8. @10 myweekdate17 : mmddyy8. @10 myworddate14 : mmddyy8.;
format myweekdate17 weekdate17.;
format myworddate14 worddate14.;
run;

 name      mydate         myweekdate17      myworddate14
Akron       14339     Mon, Apr 5, 1999       Apr 5, 1999
Brown       14346    Mon, Apr 12, 1999      Apr 12, 1999
Carnes      14361    Tue, Apr 27, 1999      Apr 27, 1999
Denison     14345    Sun, Apr 11, 1999      Apr 11, 1999
Fields      14349    Thu, Apr 15, 1999      Apr 15, 1999
Jamison     14350    Fri, Apr 16, 1999      Apr 16, 1999

Remarkably, there isn’t much else in the data section.