Tag Archive for programming

Using Data Step to Create a Dataset Template from a Dataset in SAS

Recently, I wanted to make sure that some temporary datasets that were being created during data processing in a dataset creation program weren’t truncating values or differed from the variable lengths in the original. It was then that a brainwave struck me: create an empty dataset shell using data step and use that set all the variable lengths for me when the new datasets were concatenated to it. The code turned out to be very simple and here is an example of how it looked:

data shell;
stop;
set example;
run;

The STOP statement, prevents the data step from reading in any of the values in the template dataset and just its header is written out to another (empty) dataset that can be used to set things up as you would want them to be. It certainly was a quick solution in my case.

On Making PROC REPORT Work Harder

In the early years of my SAS programming career, there seemed to be just the one procedure to use if you wanted to create a summary table. That was TABULATE and it was great for generating columns according to the value of a variable such as the treatment received by a subject in a clinical study. To a point, it could generate statistics for you too and I often used it to sum frequency and percentage variables. Since then, it seems to have been enhanced a little and it surprised me with the statistics it could produce when I had a recent play. Here’s the code:

proc tabulate data=sashelp.class;
class sex;
var age;
table age*(n median*f=8. mean*f=8.1 std*f=8.1 min*f=8. max*f=8. lclm*f=8.1 uclm*f=8.1),sex / misstext=”0″;
run;

When you compare that with the idea of creating one variable per column and then defining them in PROC REPORT as many do, it has to look more elegant and the results aren’t bad either though they can be tweaked further from the quick example that I generated. That last comment brings me to the point that PROC REPORT seems to have taken over from TABULATE wherever I care to look these days and I do ask myself if it is the right tool for that for which it is being used or if it is being used in the best way.

Using Data Step to create one variable per column in a PROC REPORT output doesn’t strike me as the best way to write reusable code but there are ways to make REPORT do more for you. For example, by defining GROUP, ACROSS and ANALYSIS columns in an output, you can persuade the procedure to do the summarising for you and there’s some example code below with the comma nesting height under sex in the resulting table. Sums are created by default if you do this and forgoing an analysis column definition means that you get a frequency table, not at all a useless thing in many cases.

proc report data=sashelp.class nowd missing;
columns age sex,height;
define age / group “Age”;
define sex / across “Sex”;
define height / analysis mean f=missing. “Mean Height”;
run;

For those times when you need to create more heavily formatted statistics (summarising range as min-max rather showing min and max separately, for example), you might feel that the GROUP/ACROSS set-up’s non-display of character values puts a stop to using that approach. However, I found that making every value combination unique and attaching a cell ID helps to work around the problem. Then, you can create a format control data set from the data like in the code below and create a format from that which you can apply to the cell ID’s to display things as you need them. This method does make things more portable from situation to situation than adding or removing columns depending on the values of a classification variable.

proc sql noprint;
create table cntlin as
select distinct  “fmtname” as fmtname, cellid as start, cellid as end, decode as label
from report;
quit;

proc format lib=work cntlin=cnlin;
run;

A look at Emacs

It’s amazing what work can bring your way in terms of technology. For me, (GNU) Emacs Has proved to be such a thing recently. It may have been around since 1975, long before my adventures in computing ever started in fact, but I am asking myself why I never really have used it much. There are vague recollections of my being aware of its existence in the early days of my using UNIX over a decade ago. Was it a shortcut card with loads of seemingly esoteric keyboard shortcuts and commands that put me off it back then? The truth may have been that I got bedazzled with the world of Microsoft Windows instead and so began a distraction that lingered until very recently. As unlikely as it looks now, Word and Office would have been part of the allure of what some consider as the dark side these days. O how OpenOffice.org and their ilk have changed that state of affairs…

The unfortunate part of the Emacs story might be that its innovations were never taken up as conventions by mainstream computing. If its counterparts elsewhere used the same keyboard shortcuts, it would feel like learning such an unfamiliar tool. Still, it’s not as if there isn’t logic behind it because it will work both in a terminal session (where I may have met it for the first time) and a desktop application GUI. The latter is the easier to learn and the menus list equivalent keyboard shortcuts for many of their entries too. For a fuller experience though, I can recommend the online manual and you can buy it in paper form too if you prefer.

One thing that I discovered recently is that external factors can sour the impressions of a piece of software.For instance, I was using a UNIX session where the keyboard mapping weren’t optimal. There’s nothing like unfamiliar behaviour for throwing you off track because you felt your usual habits were being obstructed. For instance, finding that a Backspace key is behaving like a Delete one is such an obstruction. It wasn’t the fault of Emacs and I have found that using Ctrl+K (C-k in the documentation) to delete whole lines is invaluable.

Apart from keyboard mapping niggles, Emacs has to be respected as a powerful piece of software in its own right. It may not have the syntax highlighting capabilities of some, like gedit or NEdit for instance, but I have a hunch that a spot of Lisp programming would address that need. What you get instead is support for version control systems like RCS or CVS along with integration with GDB for debugging programs written in a number of languages. Then, there are features like file management, email handling, newsgroup browsing, a calendar and calculator that make you wonder if they tried to turn a text editor into something like an operating system. With Google trying to use Chrome as the basis of one, it almost feels as is Emacs was ahead of its time though that may have been more due to its needing work within a UNIX shell in those far-off pre-GUI days. It really is saying something that it has stood the test of time when so much has fallen by the wayside. Like Vi, it looks as if the esteemable piece of software is showing no signs of going away just yet. Maybe it was well designed in the beginning and the thing certainly seems more than a text editor with its extras. Well, it has offer a good reason for making its way into Linux too…

SAS9 SQL Constraints

With SAS 9, SAS Institute have introduced the sort sort of integrity constraints that have been bread and butter for relational database SQL programs but some SAS programmers may find them more restrictive than they might like. The main one that comes to my mind is the following:

proc sql noprint;
create table a as select a.*,b.var from a left join b on a.index=b.index;
quit;

Before SAS 9, that worked merrily with nary a comment but you now will see a warning like this:

WARNING: This CREATE TABLE statement recursively references the target table. A consequence of this is a possible data integrity problem.

In data step, the following still runs without a complaint:

data a;
merge a b(keep=index var);
by index;
run;

On the surface of it this does look inconsistent. From a database programmer’s point of view having to use different source and target datasets is no hardship but seems a little surplus to requirements for a SAS programmer trained to keep down the number of temporary datasets in an effort to reduce I/O and keep things tidy, an academic concept perhaps in these days of high processing power and large disks. Adding UNDO_POLICY=NONE to the PROC SQL line does make everything consistent again but I see this as being anathema to a database programming type. I do admit to indulging in the override for personal quick and dirty purposes but abiding by the constraint is how I do things for formal purposes like inclusion in an application.

  • As is commonly the case with places like these, all the views that you find expressed on here in postings and articles are mine alone and not those of any organisation with which I have any association, through work or otherwise. With regards to any comments left on the site, I reserve the right to reject any that are inappropriate. Otherwise, whatever is said is the sole responsibility of whoever is leaving the comment.