Tag Archive for Data Step

Dealing with variable length warnings in SAS 9.2

A habit of mine is to put a LENGTH or ATTRIB statement between DATA and SET statements in a SAS data step to reset variable lengths. By default, it seems that this can triggers truncation warnings in SAS 9.2 or SAS 9.3 when it didn’t in previous versions. SAS 9.1.3, for instance, allowed you have something like the following for shortening a variable length without issuing any messages at all:

data b;
length x $100;
set a;
run;

In this case, x could have a length of 200 previously and SAS 9.1.3 wouldn’t have complained. Now, SAS 9.2 and 9.3 will issue a warning if the new length is less than the old length. This can be useful to know but it can be changed using the VARLENCHK system option. The default value is WARN but it can be set to ERROR if you really want to ensure that there is no chance of truncation. Then, you get error messages and the program fails where it normally would run with warnings. Setting the value of the option to NOWARN restores the type of behaviour seen in SAS 9.1.3 and versions prior to that.

The SAS documentation says that the ability to change VARLENCHK can be restricted by an administrator so you might need to deal with this situation in a more locked down environment. Then, one option would be to do something like the following:

data b;
drop x;
rename _x=x;
set a;
length _x $100;
_x=strip(x);
run;

It’s a bit more laborious than setting to the VARLENCHK option to NOWARN but the idea is that you create a new variable of the right length and replace the old one with it. That gets rid of warnings or errors in the log and resets the variable length as needed. Of course, you have to ensure that there is no value truncation with either remedy. If any is found, then the dataset specification probably needs updating to accommodate the length of the values in the data. After all, there is no substitute for getting to know your data and doing your own checking should you decide to take matters into your hands.

There is a use for the default behaviour though. If you use a specification to specify a shell for a dataset, then you will be warned when the shell shortens variable lengths. That allows you to either adjust the dataset or your program. Also, it gives more information when you get variable length mismatch warnings when concatenating or merging datasets. There was a time when SAS wasn’t so communicative in these situations and some investigation was needed to establish which variable was affected. Now, that has changed without leaving the option to work differently if you so do desire. Sometimes, what can seem like an added restriction can have its uses.

Using Data Step to Create a Dataset Template from a Dataset in SAS

Recently, I wanted to make sure that some temporary datasets that were being created during data processing in a dataset creation program weren’t truncating values or differed from the variable lengths in the original. It was then that a brainwave struck me: create an empty dataset shell using data step and use that set all the variable lengths for me when the new datasets were concatenated to it. The code turned out to be very simple and here is an example of how it looked:

data shell;
stop;
set example;
run;

The STOP statement, prevents the data step from reading in any of the values in the template dataset and just its header is written out to another (empty) dataset that can be used to set things up as you would want them to be. It certainly was a quick solution in my case.

Using ODS Graphics to Create Plots Using PROC LIFETEST

One of the nice things about SAS 9.2 is that creation of statistical graphics is enhanced using ODS. One of the beneficiaries of this is PROC LIFETEST, a procedure that gained a lot when data sets could be created from it using ODS OUTPUT  statements. Before that, it was a matter of creating text output and converting it to a SAS data set using Data Step and that was a nuisance on a system that attached special significance to output destinations set up using PROC PRINTTO. What you’ll find below is a sample of the type of code for creating a Kaplan-Meier survival plot for time to adverse events resulting in discontinuation of study treatment with actual and censored times. The IMAGENAME parameter on the ODS GRAPHICS statement line controls the name of the file and it is possible to change the type using the IMAGEFMT parameter too.

ods graphics on / imagename=”fig5″;
proc lifetest data=km3 method=km plots=survival;
time timetoae*cens_ae(0);
run;
ods graphics off;

On Making PROC REPORT Work Harder

In the early years of my SAS programming career, there seemed to be just the one procedure to use if you wanted to create a summary table. That was TABULATE and it was great for generating columns according to the value of a variable such as the treatment received by a subject in a clinical study. To a point, it could generate statistics for you too and I often used it to sum frequency and percentage variables. Since then, it seems to have been enhanced a little and it surprised me with the statistics it could produce when I had a recent play. Here’s the code:

proc tabulate data=sashelp.class;
class sex;
var age;
table age*(n median*f=8. mean*f=8.1 std*f=8.1 min*f=8. max*f=8. lclm*f=8.1 uclm*f=8.1),sex / misstext=”0″;
run;

When you compare that with the idea of creating one variable per column and then defining them in PROC REPORT as many do, it has to look more elegant and the results aren’t bad either though they can be tweaked further from the quick example that I generated. That last comment brings me to the point that PROC REPORT seems to have taken over from TABULATE wherever I care to look these days and I do ask myself if it is the right tool for that for which it is being used or if it is being used in the best way.

Using Data Step to create one variable per column in a PROC REPORT output doesn’t strike me as the best way to write reusable code but there are ways to make REPORT do more for you. For example, by defining GROUP, ACROSS and ANALYSIS columns in an output, you can persuade the procedure to do the summarising for you and there’s some example code below with the comma nesting height under sex in the resulting table. Sums are created by default if you do this and forgoing an analysis column definition means that you get a frequency table, not at all a useless thing in many cases.

proc report data=sashelp.class nowd missing;
columns age sex,height;
define age / group “Age”;
define sex / across “Sex”;
define height / analysis mean f=missing. “Mean Height”;
run;

For those times when you need to create more heavily formatted statistics (summarising range as min-max rather showing min and max separately, for example), you might feel that the GROUP/ACROSS set-up’s non-display of character values puts a stop to using that approach. However, I found that making every value combination unique and attaching a cell ID helps to work around the problem. Then, you can create a format control data set from the data like in the code below and create a format from that which you can apply to the cell ID’s to display things as you need them. This method does make things more portable from situation to situation than adding or removing columns depending on the values of a classification variable.

proc sql noprint;
create table cntlin as
select distinct  “fmtname” as fmtname, cellid as start, cellid as end, decode as label
from report;
quit;

proc format lib=work cntlin=cnlin;
run;

ERROR 22-322: Syntax error, expecting one of the following: a name, *.

This is one of the classic SAS errors that you can get from PROC SQL and it can be thrown by a number of things. Missing out a comma in a list of variables on a SELECT statement is one situation that will do it, as will having an extraneous one. As I discovered recently, an ill-defined SAS function nesting like LEFT(TRIM(PERIOD,BEST.)) will have the same effect; notice the missing PUT function in the example. The latter surprised me because I might have expected something more descriptive for this as would be the case in data step code. In the event, it took some looking before the problem hit me because it’s amazing how blind you can become to things that are staring you in the face. Familiarity really can make you pay less attention.

ERROR: Invalid value for width specified – width out of range

This could be the beginning of a series on error messages from PROC SQL that may appear unclear to a programmer more familiar with Data Step. The cause of my getting the message that heads this posting is that there was a numeric variable with a length less that the default of 8, not the best of situations. Sadly, the message doesn’t pin point the affected variable so it took some commenting out of pieces of code before I found the cause of the problem. That’s never to say that PROC SQL does not have debugging functionality in the form of FEEDBACK, NOEXEC, _METHOD and _TREE options on the PROC SQL line itself or the validation statement but neither of these seemed to help in this instance. Still, they’re worth keeping in mind for the future as is SAS Institute’s own page on SQL query debugging. Of course, now that I know what might be the cause, a simple PROC SQL report using the dictionary tables should help. The following code should do the needful:

proc sql;
select memname, name, type, length from dictionary.columns where libname=”DATA” and type=”num” and length ne 8;
quit;

  • As is commonly the case with places like these, all the views that you find expressed on here in postings and articles are mine alone and not those of any organisation with which I have any association, through work or otherwise. With regards to any comments left on the site, I reserve the right to reject any that are inappropriate. Otherwise, whatever is said is the sole responsibility of whoever is leaving the comment.