Tag Archive for programming

ERROR: Ambiguous reference, column xx is in more than one table.

Sometimes, SAS messages are not all that they seem and a number of them are issued from PROC SQL when something goes awry with your code. In fact, I got a message like the above when ordering the results of the join using a variable that didn’t exist in either of the datasets that were joined. This type of thing has been around for a while (I have been using SAS since version 6.11 and it was there then) and it amazes me that we haven’t seen a better message in more recent versions of SAS; it was SAS 9.2 where I saw it most recently.

proc sql noprint;
select a.yy, a.yyy, b.zz
from a left join b
on a.yy=b.yy
order by xx;
quit;

Creating placeholder graphics in SAS using PROC GSLIDE for when no data are available

Recently, I found myself with a plot to produce but there were no data to be presented so a placeholder output is needed. For a lisitng or a table, this is a matter of detecting if there are observations to be listed or summarised and then issuing a placeholder lisitng using PROC REPORT if there are no data available. Using SAS/GRAPH, something similar can be acheived using one of its curiosities.

In the case of SAS/GRAPH, PROC GSLIDE looks like the tool to user for the same purpose. The procedure does get covered as part of a SAS Institute SAS/GRAPH training course but they tend to gloss over it. After all, there is little reason to go creating presentations in SAS when PowerPoint and its kind offer far more functionality. However, it would make an interesting tale to tell how GSLIDE became part of SAS/GRAPH in the first place. Its existence makes me wonder if it pre-exists the main slideshow production tools that we use today.

The code that uses PROC GSLIDE to create a placeholder graphic is as follows (detection of the number of observations in a SAS dataset is another entry on here):

proc gslide;
note height=10;
note j=center “No data are available”;
run;
quit;

PROC GSLIDE is one of those run group procedures in SAS so a QUIT statement is needed to close it. The NOTE statements specify the text to be added to the graphic. The first of these creates a blank line of the required height for placing the main text in the middle of the graphic. It is the second one that adds the centred text that tells users of the generated output what has happened.

Creating waterfall plots in SAS using PROC GCHART

Recently, I needed to create a waterfall plot couldn’t use PROC SGPLOT since it was incompatible with publishing macros that use PROC GREPLAY on the platform that I was using; SGPLOT doesn’t generate plots in SAS catalogs but directly creates graphics files instead. Therefore, I decided that PROC GCHART needed to be given a go and it delivered what was needed .

The first step is to get the data into the required sort order:

proc sort data=temp;
by descending result;
run;

Then, it is time to add an ID variable for use in the plot’s X-axis (or midpoint axis in PROC GCHART) using an implied value retention to ensure that every record in the dataset had a unique identifier:

data temp;
set temp;
id+1;
run;

After that, axes have to be set up as needed. For instance, the X-axis (the axis2 statement below) needs to be just a line with no labels or tick marks on there and the Y-axis was fully set up with these, turning the label from vertical to horizontal as needed with the ANGLE option controlling the overall angle of the word(s) and the ROTATE option dealing with the letters, and a range declaration using the ORDER option.

axis1 label=none major=none minor=none value=none;
axis2 label=(rotate=0 angle=90 “Result”) order=(-50 to 80 by 10);

With the axis statements declared, the GCHART procedure can be defined. Of this, the VBAR statement is the engine of the plot creation with the ID variable used for the midpoint axis and the result variable used as the summary variable for the Y-axis. The DISCRETE keyword is needed to produce a bar for every value of the ID variable or GCHART will bundle them by default. Next, references for the above axis statements (MAXIS option for midpoint axis and AXIS option for Y-axis) are added and the plot definition is complete. One thing that has to be remembered is that GCHART uses run group processing so a QUIT statement is needed at the end to close it at execution time. This feature has its uses and appears in other procedures too though SAS procedures generally are concluded by a RUN statement.

proc gchart data=temp;
vbar id / sumvar=result discrete axis=axis2 maxis=axis1;
run;
quit;

Smoother use of more than one SAS DMS session at a time

Unless you have access to SAS Enterprise Guide, being able to work on one project at a time can be a little inconvenient. It is possible to open up more than one Display Manager System (DMS, the traditional SAS programming interface) session at a time but you can get a pop up window for SAS documentation for second and subsequent sessions and you don’t get your settings shared across them either.

The cause of both of the above is the locking of the SASUSER directory files by the first SAS session. However, it is possible to set up a number of directories and set the -sasuser option to point at different ones for different sessions.

On Windows, the command in the SAS shortcut becomes:

C:\Program Files\SAS\SAS 9.1\sas.exe -sasuser “c:\sasuser\session 1\”

On UNIX or Linux, it would look similar to this:

sas -sasuser “~/sasuser/session1/”

The “session1″ in the folder paths above can be replaced with whatever you need and you can have as many as you want too. It might not seem much of a need but synchronising the SASUSER folders every now and again can give you a more consistent set of settings across each and you don’t get intrusive pop up boxes or extra messages in the log either.

Using Data Step to Create a Dataset Template from a Dataset in SAS

Recently, I wanted to make sure that some temporary datasets that were being created during data processing in a dataset creation program weren’t truncating values or differed from the variable lengths in the original. It was then that a brainwave struck me: create an empty dataset shell using data step and use that set all the variable lengths for me when the new datasets were concatenated to it. The code turned out to be very simple and here is an example of how it looked:

data shell;
stop;
set example;
run;

The STOP statement, prevents the data step from reading in any of the values in the template dataset and just its header is written out to another (empty) dataset that can be used to set things up as you would want them to be. It certainly was a quick solution in my case.

On Making PROC REPORT Work Harder

In the early years of my SAS programming career, there seemed to be just the one procedure to use if you wanted to create a summary table. That was TABULATE and it was great for generating columns according to the value of a variable such as the treatment received by a subject in a clinical study. To a point, it could generate statistics for you too and I often used it to sum frequency and percentage variables. Since then, it seems to have been enhanced a little and it surprised me with the statistics it could produce when I had a recent play. Here’s the code:

proc tabulate data=sashelp.class;
class sex;
var age;
table age*(n median*f=8. mean*f=8.1 std*f=8.1 min*f=8. max*f=8. lclm*f=8.1 uclm*f=8.1),sex / misstext=”0″;
run;

When you compare that with the idea of creating one variable per column and then defining them in PROC REPORT as many do, it has to look more elegant and the results aren’t bad either though they can be tweaked further from the quick example that I generated. That last comment brings me to the point that PROC REPORT seems to have taken over from TABULATE wherever I care to look these days and I do ask myself if it is the right tool for that for which it is being used or if it is being used in the best way.

Using Data Step to create one variable per column in a PROC REPORT output doesn’t strike me as the best way to write reusable code but there are ways to make REPORT do more for you. For example, by defining GROUP, ACROSS and ANALYSIS columns in an output, you can persuade the procedure to do the summarising for you and there’s some example code below with the comma nesting height under sex in the resulting table. Sums are created by default if you do this and forgoing an analysis column definition means that you get a frequency table, not at all a useless thing in many cases.

proc report data=sashelp.class nowd missing;
columns age sex,height;
define age / group “Age”;
define sex / across “Sex”;
define height / analysis mean f=missing. “Mean Height”;
run;

For those times when you need to create more heavily formatted statistics (summarising range as min-max rather showing min and max separately, for example), you might feel that the GROUP/ACROSS set-up’s non-display of character values puts a stop to using that approach. However, I found that making every value combination unique and attaching a cell ID helps to work around the problem. Then, you can create a format control data set from the data like in the code below and create a format from that which you can apply to the cell ID’s to display things as you need them. This method does make things more portable from situation to situation than adding or removing columns depending on the values of a classification variable.

proc sql noprint;
create table cntlin as
select distinct  “fmtname” as fmtname, cellid as start, cellid as end, decode as label
from report;
quit;

proc format lib=work cntlin=cnlin;
run;

  • As is commonly the case with places like these, all the views that you find expressed on here in postings and articles are mine alone and not those of any organisation with which I have any association, through work or otherwise. With regards to any comments left on the site, I reserve the right to reject any that are inappropriate. Otherwise, whatever is said is the sole responsibility of whoever is leaving the comment.