31 Mar

SAS: Anatomy of the DATA step

This is another one of those areas that most of us who have written code would probably skip. It just seems so basic but it’s the type of thing that can yield a question or two on a test. So, lets take a look at the DATA step.  I’m probably skipping steps here.

The SAS DATA step consists of two phases: the compilation phase and the execution phase. Within each of these phases there are multiple steps. Lets go through both.

Compilation Phase

  • Input Buffer: Created to hold a record from an external file.  It is created only when raw data is read, not when a SAS data set is read.
  • Program Data Vector: Holds a single observation in memory and creates 2 automatic variables.
    • _N_: Counts the number of times the data step begins to execute.
    • _ERROR_: If there is an error it is set to 1.  The default is 0.
  • Syntax Checking
  • Data Set Variables: As the INPUT statement is generated, a slot is added to the program data vector for each variable.
  • Descriptor: The descriptor part of the data set is created.

Execution Phase

  • Initializing Variables: _N_ set to 1 and _ERROR_ set to 0.
  • INFILE and INPUT statements processed.
  • Lines executed in sequence.

End Of Data Step

  • Data is written to the output data set.
  • Control returns to the top of the data set and _N_ increments.
  • _ERROR_ is reset to zero if necessary.

My guess is that the most important parts of all this are the following:

  • _N_ defaulting to 1 and incrementing each time the data step is processed.
  • _ERROR_ defaulting to 0 and moving to 1 when there is an error.
  • Variables are set to missing at the start of each data step.

http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000695104.htm

10 Jan

My Goal Is To Figure Out Who Built The Goal Table In Your Database

One of the first jobs that rolled my way in 2005 was to set up a method for tracking sales goals and the sales relative to them. They also wanted to present it to users in our Horde portal. I did some digging, found that the ERP had a small little application that tracked goals and sales, so I figured that all I would have to do is find the table where they stored the goals, extract the data, and we would be good.

So, I found the table and fired a basic query against it and here’s roughly what came back for a single sales rep.

Year Sales
2005  
2005 300000
2005 300000
2005 300000
2005 375000
2005 375000
2005 375000
2005 425000
2005 425000
2005 425000
2005 375000
2005 375000
2005 375000

Anyone, whose done any of this kind of work can spot the problem in half a second, “how do you know which of those records apply to which month?” Well, the answer is that the second record was January, and the third February, and so on and so forth. Yes, somewhere, along the way in the development process, someone, you’d think, would have said “wut”?

In order to use it, over on the database side, I had to get tricky and make sure the data export sorted correctly (something that shouldn’t have mattered) and then put counters in the code so I knew what was what. Had to make sure that the second record was January and all that. I managed to do it and it wound up working well enough.

Honestly, I have no idea why someone would build something like this except for one possible clue. In Progress there is an array data type. I really know nothing about it, except that it exists, and these values were in fact using that data type. In Cyberquery, in order to access January’s goal data you would write something like:

WHERE
slm_goal:year = 2005

LIST
slm_goal:sales[1]

Of course, you’d expect to write the WHERE clause as

WHERE
slm_date = 1.2005

But, I digress.

And there you had it, sales data for January 2005. Of course that’s still almost impossible to work with for anything meaningful outside of the system. But, Progress is a funky thing. As I understand it, and really I don’t, the programming language is also tightly integrated with the database. It may be that somewhere in that mess, someone saw this as an easy way to deal with some problem. It’s also possible that this piece of work was never intended to go into the mainstream as it was well-hidden and our main ERP support person didn’t even know about it. Still, it was sitting there on our system.

And the final straw, that ERP, wasn’t something you bought off newegg.com unless they’re selling $500K+ products these days. It was a legitimate ERP that was in a decent chunk of the industry it specialized in.