Technology Tales

Adventures & experiences in contemporary technology

Creating a Data Set Containing Confidence Intervals Using PROC UNIVARIATE

5th September 2010

While you could generate data sets containing means and confidence intervals using PROC SUMMARY or PROC MEANS, curiosity and the need to verify a program using a different technique drove me to consider using PROC UNIVARIATE for the task. For the record, the PROC SUMMARY code is below; the only difference between it and MEANS is that SUMMARY doesn’t produce printed output by default, something that’s not needed in this case anyway. Quite why there are two SAS procedures doing exactly the same thing is beyond me, though I do wonder if the NOPRINT option was a later addition than the two procedures themselves. The LCLM and UCLM keywords are what trigger the calculation of confidence limits, and the ALPHA option controls the confidence interval used: 0.05 specifies a 95% interval, 0.1 a 90% one, and so on.

proc summary data=sashelp.class mean lclm uclm alpha=0.05;
var age;
output out=sasuser.lims mean=mean lclm=lclm uclm=uclm;
run;

Given that I have had PROC UNIVARIATE producing statistics that MEANS/SUMMARY didn’t in previous versions of SAS (I believe it was the standard deviation that was absent from MEANS/SUMMARY), I might have expected the calculation and export of confidence limits to a data set to be straightforward. Sadly, it’s not a case of simply adding LCLM and UCLM keywords to the OUTPUT statement for the procedure; ODS OUTPUT is needed to create the data set instead. An ODS SELECT statement picks out the BasicIntervals output object (UNIVARIATE creates quite a few, it seems), which is created through specification of the CIBASIC and ALPHA (performing the same role as it does for PROC MEANS/SUMMARY) options on the PROC UNIVARIATE statement. The reason for the ODS LISTING and ODS RTF statements below is to stop output being sent to the output window in a standard SAS session. For some reason, output needs to be sent to one of the LISTING, HTML or RTF destinations or there will be no data in the data set; I met the same behaviour when using ODS PS, an ODS PRINTER destination. The data set will contain statistics for the mean, standard deviation and variance, which is why there is a WHERE clause on the ODS OUTPUT statement.

ods listing close;
ods rtf body="c:\temp\uni_eg.doc";
ods select BasicIntervals;
ods output BasicIntervals=sasuser.stats(where=(lowcase(parameter)="mean") );

proc univariate cibasic alpha=0.05 data=sashelp.class;
var age;
run;

ods output close;
ods rtf close;
ods listing;
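Since verification was the point of the exercise, a quick PROC COMPARE run can confirm that the two approaches agree. The sketch below assumes that the BasicIntervals data set stores its numbers in variables named Estimate, LowerCL and UpperCL, so check the names in sasuser.stats before relying on it:

proc compare base=sasuser.lims(keep=mean lclm uclm)
             compare=sasuser.stats(keep=estimate lowercl uppercl
             rename=(estimate=mean lowercl=lclm uppercl=uclm));
run;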

Using ODS Graphics to Create Plots Using PROC LIFETEST

3rd September 2010

One of the nice things about SAS 9.2 is that the creation of statistical graphics is enhanced using ODS. One of the beneficiaries of this is PROC LIFETEST, a procedure that gained a lot when data sets could be created from it using ODS OUTPUT statements. Before that, it was a matter of creating text output and converting it to a SAS data set using a Data step, a nuisance on a system that attached special significance to output destinations set up using PROC PRINTTO. What you’ll find below is a sample of the type of code for creating a Kaplan-Meier survival plot for time to adverse events resulting in discontinuation of study treatment, with actual and censored times. The IMAGENAME parameter on the ODS GRAPHICS statement controls the name of the image file, and it is possible to change the file type using the IMAGEFMT parameter too.

ods graphics on / imagename="fig5";
proc lifetest data=km3 method=km plots=survival;
time timetoae*cens_ae(0);
run;
ods graphics off;
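On that last point, switching the image type is just a matter of adding the IMAGEFMT parameter; for instance, the following should request a PNG file instead of the default:

ods graphics on / imagename="fig5" imagefmt=png;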

On Making PROC REPORT Work Harder

1st September 2010

In the early years of my SAS programming career, there seemed to be just the one procedure to use if you wanted to create a summary table. That was TABULATE and it was great for generating columns according to the value of a variable such as the treatment received by a subject in a clinical study. To a point, it could generate statistics for you too and I often used it to sum frequency and percentage variables. Since then, it seems to have been enhanced a little and it surprised me with the statistics it could produce when I had a recent play. Here’s the code:

proc tabulate data=sashelp.class;
class sex;
var age;
table age*(n median*f=8. mean*f=8.1 std*f=8.1 min*f=8. max*f=8. lclm*f=8.1 uclm*f=8.1),sex / misstext="0";
run;

When you compare that with the idea of creating one variable per column and then defining them in PROC REPORT, as many do, it has to look more elegant, and the results aren’t bad either, though they can be tweaked further from the quick example that I generated. That last comment brings me to the point that PROC REPORT seems to have taken over from TABULATE wherever I care to look these days, and I do ask myself whether it is the right tool for the jobs it is being given, or whether it is being used in the best way.

Using a Data step to create one variable per column in a PROC REPORT output doesn’t strike me as the best way to write reusable code, but there are ways to make REPORT do more for you. For example, by defining GROUP, ACROSS and ANALYSIS columns in an output, you can persuade the procedure to do the summarising for you; there’s some example code below where the comma in the COLUMNS statement nests height under sex in the resulting table. Sums are created by default if you do this, and forgoing an analysis column definition means that you get a frequency table instead, not at all a useless thing in many cases.

proc report data=sashelp.class nowd missing;
columns age sex,height;
define age / group "Age";
define sex / across "Sex";
define height / analysis mean f=8.1 "Mean Height";
run;
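As for the frequency table remark above, dropping the analysis column definition leaves REPORT counting observations instead; a minimal sketch:

proc report data=sashelp.class nowd missing;
columns age sex;
define age / group "Age";
define sex / across "Sex";
run;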

For those times when you need to create more heavily formatted statistics (summarising a range as min-max rather than showing min and max separately, for example), you might feel that the GROUP/ACROSS set-up’s non-display of character values puts a stop to using that approach. However, I found that making every value combination unique and attaching a cell ID helps to work around the problem. Then, you can create a format control data set from the data, like in the code below, and create a format from that which you can apply to the cell IDs to display things as you need them. This method makes things more portable from situation to situation than adding or removing columns depending on the values of a classification variable.

proc sql noprint;
create table cntlin as
select distinct "fmtname" as fmtname, cellid as start, cellid as end, decode as label
from report;
quit;

proc format lib=work cntlin=cntlin;
run;
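To close the loop, the format created above (named FMTNAME here because of the string literal in the first column of the control data set, and assuming a numeric cell ID) would be applied like this hypothetical snippet:

proc print data=report;
format cellid fmtname.;
run;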

A look at Emacs

10th August 2010

It’s amazing what work can bring your way in terms of technology. For me, (GNU) Emacs has proved to be such a thing recently. It may have been around since the mid-1970s, long before my adventures in computing ever started, in fact, but I am asking myself why I never really used it much. There are vague recollections of my being aware of its existence in the early days of my using UNIX over a decade ago. Was it a shortcut card with loads of seemingly esoteric keyboard shortcuts and commands that put me off it back then? The truth may be that I got bedazzled by the world of Microsoft Windows instead, and so began a distraction that lingered until very recently. As unlikely as it looks now, Word and Office would have been part of the allure of what some consider the dark side these days. Oh, how OpenOffice.org and its ilk have changed that state of affairs…

The unfortunate part of the Emacs story might be that its innovations were never taken up as conventions by mainstream computing. If its counterparts elsewhere used the same keyboard shortcuts, it wouldn’t feel like learning such an unfamiliar tool. Still, it’s not as if there isn’t logic behind it, because it will work both in a terminal session (where I may have met it for the first time) and as a desktop GUI application. The latter is the easier to learn, and the menus list equivalent keyboard shortcuts for many of their entries too. For a fuller experience though, I can recommend the online manual, and you can buy it in paper form too if you prefer.

One thing that I discovered recently is that external factors can sour your impressions of a piece of software. For instance, I was using a UNIX session where the keyboard mappings weren’t optimal. There’s nothing like unfamiliar behaviour for throwing you off track, because your usual habits feel as if they are being obstructed. Finding that a Backspace key behaves like a Delete one is just such an obstruction. It wasn’t the fault of Emacs, and I have since found that using Ctrl+K (C-k in the documentation) to delete whole lines is invaluable.

Apart from keyboard mapping niggles, Emacs has to be respected as a powerful piece of software in its own right. It may not have the syntax highlighting capabilities of some, like gedit or NEdit, but I have a hunch that a spot of Lisp programming would address that need. What you get instead is support for version control systems like RCS or CVS, along with integration with GDB for debugging programs written in a number of languages. Then, there are features like file management, email handling, newsgroup browsing, a calendar and a calculator that make you wonder if someone tried to turn a text editor into something like an operating system. With Google trying to use Chrome as the basis of one, it almost feels as if Emacs was ahead of its time, though that may have been more due to its needing to work within a UNIX shell in those far-off pre-GUI days. It really is saying something that it has stood the test of time when so much else has fallen by the wayside. Like Vi, it looks as if this estimable piece of software is showing no signs of going away just yet. Maybe it was well designed in the beginning, and it certainly seems more than a text editor with its extras. Well, it has to offer a good reason for making its way into Linux distributions too…

ERROR: Invalid value for width specified – width out of range

8th June 2010

This could be the beginning of a series on error messages from PROC SQL that may appear unclear to a programmer more familiar with the Data step. The cause of my getting the message that heads this posting was a numeric variable with a length less than the default of 8, not the best of situations. Sadly, the message doesn’t pinpoint the affected variable, so it took some commenting out of pieces of code before I found the cause of the problem. That’s not to say that PROC SQL lacks debugging functionality, in the form of the FEEDBACK, NOEXEC, _METHOD and _TREE options on the PROC SQL statement itself or the VALIDATE statement, but none of these seemed to help in this instance. Still, they’re worth keeping in mind for the future, as is SAS Institute’s own page on SQL query debugging. Of course, now that I know what might be the cause, a simple PROC SQL report using the dictionary tables should help. The following code should do the needful:

proc sql;
select memname, name, type, length
from dictionary.columns
where libname="DATA" and type="num" and length ne 8;
quit;
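Once the offending variable is identified, a Data step with a LENGTH statement restores the full 8 bytes and clears the error; the data set and variable names below are hypothetical:

data data.fixed;
length aval 8; /* restore the default numeric length of 8 bytes */
set data.example;
run;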

Reading data into SAS using the EXCEL and PCFILES library engines

4th March 2010

Recently, I had the opportunity to look at the Excel library engine again because I needed to read Excel data into SAS. You need SAS/ACCESS for PC Files licensed for it to work, but it does simplify the process of getting data from spreadsheets into SAS. It all revolves around setting up a library pointing at the Excel file using the Excel engine. The result is that every worksheet in the file is treated like a SAS data set, even if their names contain characters that SAS considers invalid for data set names. The way around that is to enclose the worksheet name in single quotes with the letter n straight after the closing quote, much in the same way as you’d read in text strings as SAS date values (’04MAR2010’d, for example). In order to make all of this more concrete, I have added some example code below.

libname testxl excel 'c:\test.xls';

data test;
set testxl.'sheet1$'n;
run;

All of the above applies to SAS on Windows (I have used it successfully in 9.1.3 and 9.2), but there seems to be a way of doing the same kind of thing on UNIX too. Again, SAS/ACCESS for PC Files is needed, as well as a SAS PC Files Server on an available Windows machine, and it is the PCFILES engine that is specified. While I cannot say that I have had the chance to see it working in practice, seeing it described in the SAS Online Documentation corrected my previous misimpressions about the UNIX variant of SAS and its ability to read in Excel or Access data. Well, you learn something new every day.
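For the record, the LIBNAME statement described there looks something like the line below, though I present it only as an untested sketch; the server name is a placeholder for an actual PC Files Server machine:

libname testxl pcfiles server=winbox path='c:\test.xls';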

%sysfunc and missing spaces

10th June 2009

Recently, I was trying something like this and noted some odd behaviour:

data _null_;
file fileref;
put "text %sysfunc(pathname(work)) more text";
run;

This is the kind of thing that I was getting:

text c:\sasworkmore text

In other words, the space after the %sysfunc call was being ignored and, since I was creating and executing a Windows batch file using SAS 8.2, the command line action wasn’t doing what was expected. The fix was simple, but I reckoned that I’d share what I saw anyway, in case it helps anyone else:

data _null_;
file fileref;
x="text %sysfunc(pathname(work))"||" more text";
put x;
run;
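Another fix that should do the same job, though I offer it only as an untested sketch, is to protect the space with %STR so that it isn’t swallowed during macro resolution:

data _null_;
file fileref;
put "text %sysfunc(pathname(work))%str( )more text";
run;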

AND & OR, a cautionary tale

27th March 2009

The inspiration for this post is a situation where having the string "OR" or "AND" as an input to a piece of SAS Macro code broke a program that I had written. Here is a simplified example of what I was doing:

%macro test;
%let doms=GE GT NE LT LE AND OR;
%let lv_count=1;
%do %while (%scan(&doms,&lv_count,' ') ne );
%put &lv_count;
%let lv_count=%eval(&lv_count+1);
%end;
%mend test;

%test;

The loop proceeds well until the string "AND" is met, and "OR" has the same effect. The result is that the following messages appear in the log:

ERROR: A character operand was found in the %EVAL function or %IF condition where a numeric operand is required. The condition was: %scan(&doms,&lv_count,' ') ne
ERROR: The condition in the %DO %WHILE loop, , yielded an invalid or missing value, . The macro will stop executing.
ERROR: The macro TEST will stop executing.

Both AND & OR (case doesn’t matter, but I am sticking with upper case for the sake of clarity) seem to be reserved words in a macro %DO %WHILE loop, while equality mnemonics like GE cause no problem. Perhaps the fact that an equality operator is already in the expression helps. Regardless, the fix is a simple one:

%macro test;
%let doms=GE GT NE LT LE AND OR;
%let lv_count=1;
%do %while ("%scan(&doms,&lv_count,' ')" ne "");
%put &lv_count;
%let lv_count=%eval(&lv_count+1);
%end;
%mend test;

%test;

Now none of the strings extracted from the macro variable &DOMS will appear as bare words and confuse the SAS Macro processor, but you do have to make sure that you are testing for the null string ("" in this case) or you’ll send your program into an infinite loop, always a potential problem with %DO %WHILE loops, so they need to be used with care. All in all, an odd-looking message gets an easy solution without recourse to macro quoting functions like %NRSTR or %SUPERQ.

Working with the ODS templates and styles when batch processing

8th December 2008

I ran into some trouble with creating new templates and styles while running a SAS job in batch mode. The cause was the use of the -RSASUSER switch on the SAS command. This sets the SASUSER library to be read-only, and that is where new templates and styles are stored by default. The fix is to switch these to another library to which there is write access, WORK, for example. Here’s the line of code that achieves the manoeuvre:

ODS PATH work.templat(update) sasuser.templat(read) sashelp.tmplmst(read);

Apparently, the same change might be needed for stored processes too, so it’s one to keep in mind.
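For context, the sort of batch invocation that brings -RSASUSER into play looks something like the line below, with hypothetical file paths:

sas -sysin c:\jobs\batch_job.sas -log c:\jobs\batch_job.log -rsasuser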

SAS Macro and Dataline/Cards Statements in Data Step

28th October 2008

Recently, I tried code like this in a SAS macro:

data sections;
infile datalines dlm=",";
input graph_table_number $15. text_line @1 @;
datalines;
"11.1           ,Section 11.1",
"11.2           ,Section 11.2",
"11.3           ,Section 11.3"
;
run;

While it works in its own right, including it as part of a macro yielded this type of result:

ERROR: The macro X generated CARDS (data lines) for the DATA step, which could cause incorrect results.  The DATA step and the macro will stop executing.

A bit of googling landed me on SAS-L, where I spotted a solution like this one that didn’t involve throwing everything out:

filename temp temp;

data _null_;
file temp;
put;
run;

data sections;
length graph_table_number $15 text_line $100;
infile temp dlm=",";
input @;
do _infile_=
"11.1           ,Section 11.1",
"11.2           ,Section 11.2",
"11.3           ,Section 11.3"
;
input graph_table_number $15. text_line @1 @;
output;
end;
run;

filename temp clear;

The FILENAME statement and the ensuing Data step create a dummy file in the SAS work area, which gets cleared at the end of every session. That seems to fool the macro engine into thinking that input is from a file and not from the CARDS/DATALINES method to which it takes grave exception. The trailing @ signs hold an input record for the execution of the next INPUT statement within the same iteration of the Data step, so that the automatic variable _infile_ can be fed as part of the input process in a DO block, with the OUTPUT statement ensuring that new records from the input buffer reach the data set being created.

This method does work, but I would like to know the underlying reason why SAS Macro won’t play well with data entry using DATALINES or CARDS statements in a Data step, particularly when it allows other methods such as SQL INSERT statements or standard variable assignment in a Data step, one of which is sketched below. I find it such curious behaviour that I remain on the lookout for an explanation of why it is like this.
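For comparison, here is a hedged sketch of the SQL INSERT alternative that the macro processor does tolerate, recreating the same data set:

proc sql;
create table sections
  (graph_table_number char(15), text_line char(100));
insert into sections
  values("11.1", "Section 11.1")
  values("11.2", "Section 11.2")
  values("11.3", "Section 11.3");
quit;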
