This is another one of those areas that most of us who have written code would probably skip. It seems basic, but it’s the type of thing that can yield a question or two on a test. So, let’s take a look at the DATA step. I’m probably skipping steps here.
The SAS DATA step is processed in two phases: the compilation phase and the execution phase. Each phase has multiple steps. Let’s go through both.
Compilation Phase
- Input Buffer: Created to hold a record from an external file. It is created only when raw data is read, not when a SAS data set is read.
- Program Data Vector: Holds a single observation in memory while it is being built. It also contains 2 automatic variables, which are not written to the output data set:
  - _N_: Counts the number of times the DATA step has begun to execute.
  - _ERROR_: Set to 1 if an error occurs. The default is 0.
- Syntax Checking
- Data Set Variables: As the INPUT statement is compiled, a slot is added to the program data vector for each variable.
- Descriptor: The descriptor portion of the data set (variable names, types, lengths and other attributes) is created.
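To see what the compilation phase leaves behind, here is a minimal sketch that reads two inline records and then prints the descriptor portion with PROC CONTENTS. The data set name work.compile_demo and the variables name and score are made up for this example.

```sas
/* Hypothetical data set and variable names, for illustration only. */
data work.compile_demo;
  /* At compile time, this INPUT statement adds two slots to the   */
  /* program data vector: name (character) and score (numeric).    */
  input name $ score;
  datalines;
Ann 90
Bob 85
;
run;

/* Prints the descriptor portion: variable names, types and lengths. */
proc contents data=work.compile_demo;
run;
```

The PROC CONTENTS output should list name and score but not _N_ or _ERROR_, since the automatic variables live only in the program data vector.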
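Execution Phase
- Initializing Variables: _N_ is set to 1 and _ERROR_ is set to 0. Variables that are not retained are set to missing.
- The INFILE and INPUT statements are processed: a record is read into the input buffer and its values are moved into the program data vector.
- The remaining statements are executed in sequence.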
End Of Data Step
- The observation in the program data vector is written to the output data set.
- Control returns to the top of the DATA step and _N_ increments.
- _ERROR_ is reset to 0 if necessary.
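The loop described above can be watched in the log with a couple of PUT statements. This is only a sketch: the data set name work.iter_demo and the variables x and y are invented for the example.

```sas
/* Hypothetical data set and variable names, for illustration only. */
data work.iter_demo;
  /* Runs at the top of every iteration, before any record is read. */
  put 'Top:    ' _n_= x= y=;
  input x;
  y = x * 2;
  /* Runs after the record has been read and y has been computed.   */
  put 'Bottom: ' _n_= x= y=;
  /* No OUTPUT statement: the observation is written automatically  */
  /* when the end of the step is reached.                           */
  datalines;
10
20
;
run;
```

Each "Top" line should show x and y as missing, and _N_ should go up by one per pass. Expect one more "Top" line than "Bottom" lines, because the step starts a third iteration before INPUT finds there are no more records and the step stops.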
My guess is that the most important parts of all this are the following:
- _N_ starting at 1 and incrementing each time the DATA step iterates.
- _ERROR_ defaulting to 0 and being set to 1 when there is an error.
- Variables are set to missing at the start of each iteration of the DATA step, unless they are retained or read with a SET statement.
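For the _ERROR_ point, here is a small sketch that forces an error by putting text where a number is expected. The data set name work.err_demo and the variables id and score are made up for the example.

```sas
/* Hypothetical data set and variable names, for illustration only. */
data work.err_demo;
  input id score;
  /* Write the automatic variables next to the values just read. */
  put _n_= id= score= _error_=;
  datalines;
1 90
2 abc
3 75
;
run;
```

On the second record SAS cannot read abc as a number, so it writes an invalid data note to the log, leaves score missing and sets _ERROR_ to 1. On the third record _ERROR_ should be back to 0.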