TOPIC: SAS
Expanding the coding toolkit: Adding R and Python in a changing landscape
10th April 2021
Over the years, I have taught myself a number of computing languages, with some coming in useful for professional work while others came in handy for website development and maintenance. The collection has grown to include HTML, CSS, XML, Perl, PHP and UNIX Shell Scripting. The ongoing pandemic allowed me to add two more to the repertoire: R and Python.
My interest in these arose from my work as an information professional concerned with standardisation and automation of statistical results delivery. To date, the main focus has been on clinical study data, but ongoing changes in the life sciences sector could mean that I may need to look further afield, so having extra knowledge never hurts. Though I have been a SAS programmer for more than twenty years, its predominance in the clinical research field is not what it was, which causes me to rethink things.
As it happens, I would like to continue working with SAS since it does so much, and thoughts of leaving it behind bring sadness. It also helps to know what the alternatives might be and to push back on some management hopes about any newcomers, especially regarding the amount of code being produced and the quality of graphs being created. Use cases need to be assessed dispassionately, even when emotions loom behind the scenes.
Since both R and Python bring large scripting ecosystems with active communities, the attraction of their adoption makes a deal of sense. SAS is comparable in the scale of its own ecosystem, though there are considerable differences and the platform is catching up when it comes to Data Science. While the aforementioned open-source languages may have had a head start, it appears that others are not standing still either. It is a time to have wider awareness, and online conference attendance helps with that.
The breadth of what is available for any programming language more than stymies any attempt to create a truly all-encompassing starting point, and I have abandoned thoughts of doing anything like that for R. Similarly, I will not even try such a thing for Python. Consequently, my sharing of anything learned will be in the form of discrete postings from time to time, especially given how easy it is to collect numerous website links for sharing.
The learning has been facilitated by ongoing pandemic restrictions, though things are opening up a little now. The pandemic also has given us public data that can be used for practice, since much can be gained from having one's own project instead of completing exercises from a book. Having an interesting data set with which to work is a must, and COVID-19 data contain a certain self-interest as well, while one remains mindful of the suffering and loss of life that has been happening since the pandemic first took hold.
Generating PNG files in SAS using ODS Graphics
21st December 2019
Recently, I had someone ask me how to create PNG files in SAS using ODS Graphics, so I sought out the answer for them. Normally, the suggestion would have been to create RTF or PDF files instead, but a specific need called for a different approach. Adding something like the following lines before an SGPLOT, SGPANEL or SGRENDER procedure should do the needful:
ods listing gpath='E:\';
ods graphics / imagename="test" imagefmt=png;
Here, the ODS LISTING statement declares the destination for the desired graphics file, while the ODS GRAPHICS statement defines the file name and type. In the above example, the file test.png would be created in the root of the E drive of a Windows machine. However, this also works with Linux or UNIX directory paths.
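To put those two statements in context, here is a minimal sketch of a complete run. It assumes that the SASHELP.CLASS sample data set is available and that the output folder exists; the folder, the variables plotted and the use of RESET=INDEX are my own illustrative choices rather than part of the original answer.

```sas
/* Hypothetical output folder: point this at a directory that exists */
ods listing gpath='E:\';

/* RESET=INDEX restarts the numeric suffix that SAS appends to image
   names when the same name is reused within a session */
ods graphics / reset=index imagename="test" imagefmt=png;

/* Any SGPLOT, SGPANEL or SGRENDER step placed here writes its image
   to the GPATH location */
proc sgplot data=sashelp.class;
  scatter x=height y=weight / group=sex;
run;
```

On Linux or UNIX, only the GPATH value should need to change to a directory path that is valid there.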
Using NOT IN operator type functionality in SAS Macro
9th November 2018
For as long as I have been programming with SAS, it has been possible to test whether a variable does or does not take one of a list of values in data step IF clauses, and in WHERE clauses in both the data step and most, if not all, procedures. It was only within the last decade that its Macro language got similar functionality, with one caveat that I recently uncovered: you cannot have a NOT IN construct. To get that, you need to go about things differently.
In the example below, you see the NOT operator being placed before the IN operator component that is enclosed in parentheses. If this is not done, SAS produces the error messages that caused me to look at SAS Usage Note 31322. Once I followed that approach, I was able to do what I wanted without resorting to older, more long-winded coding practices.
options minoperator;
%macro inop(x);
%if not (&x in (a b c)) %then %do;
%put Value is not included;
%end;
%else %do;
%put Value is included;
%end;
%mend inop;
%inop(a);
While running the above code should produce a similar result to that from another post on here about the IN operator, the logic is reversed. There are times when such an approach is needed, with one being where a few possibilities are to be excluded from a larger number of them. Listing the handful of exclusions is then shorter and easier to maintain than listing everything that is allowed.
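As an illustration of that exclusion case, here is a small sketch that skips a few named values and processes everything else. The macro name and the values DM, SV and TA are made up for the example; only the NOT ( ... IN ... ) pattern comes from the code above.

```sas
options minoperator;

%macro process(domain);
  /* DM, SV and TA are hypothetical values to be excluded;        */
  /* %UPCASE makes the comparison insensitive to the case passed  */
  %if not (%upcase(&domain) in (DM SV TA)) %then %do;
    %put NOTE: Processing &domain;
  %end;
  %else %do;
    %put NOTE: Skipping &domain;
  %end;
%mend process;

%process(ae);  /* not in the exclusion list, so it is processed */
%process(dm);  /* in the exclusion list, so it is skipped       */
```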
Preventing PROC SGPLOT PNG file clutter by changing the working directory in a SAS session
12th August 2014
It appears that PROC SGPLOT along with other statistical graphics procedures creates image files, even if you are creating RTF or PDF files. By default, these are PNG files, but there are other possibilities. When working with PC SAS, I have seen them written to the current working directory and that could clutter up your folder structure, especially if they are unwanted.
Being unable to track down a setting that controls this behaviour, I resolved to find a way around it by sending the files to the SAS work directory so they are removed when a SAS session is ended. One option is to set the session's working directory to be the SAS work one, which can be done in SAS code without needing to use the user interface. As a result, you get some automation.
The method is indirect, though, in that you need to use an X statement to tell the operating system to change the folder for you. Here is the line of code that I have used:
x "cd %sysfunc(pathname(work))";
The X statement passes commands to an operating system's command line, and they are enclosed in quotes. %sysfunc is a macro function that allows certain data step functions or call routines, as well as some SCL functions, to be executed. An example of the latter is pathname, which resolves library or file references. Here, it is interrogating the location of the SAS work library so that it can be passed to the operating system's cd (change directory) command for processing. Since this method works on Windows and UNIX, Linux should be covered too. It also offers a certain amount of automation: because the name of the SAS work folder changes with every session, you would otherwise have to specify its location each time.
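If you want to confirm that the change has taken effect, one possible check is sketched below. It assumes that X commands are permitted in your session (some server setups disable them with the NOXCMD option), and the fileref name HERE is simply one that I picked for the purpose.

```sas
/* Move the session's working directory to the WORK library location */
x "cd %sysfunc(pathname(work))";

/* A fileref assigned to "." resolves to the current directory,
   so PATHNAME reports where SAS now considers itself to be */
filename here ".";
%put NOTE: Current directory is %sysfunc(pathname(here));
filename here clear;
```

The path written to the log should match the one returned by %sysfunc(pathname(work)).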
Of course, if someone were to tell me of another way to declare the location of the generated PNG files that works with RTF and PDF ODS destinations, then I would be all ears. Direct output without image file creation would be even better. Until then, though, the above will do nicely.
Presenting more than one plot on a page using SAS ODS PDF
12th November 2012
If you had asked me about getting two or more graphs on a page using SAS/GRAPH procedures, I might have suggested PROC GREPLAY as the means to achieve it. However, I recently came across another way to do the same thing by using ODS. It helped that the graphs were being produced using the PDF destination because I doubt that what follows will work with the RTF one.
For this three-plots-on-a-page example, I first set the orientation to landscape so that the plots can be arranged side by side in a single row:
options orientation=landscape;
Next, the PDF destination was opened with page breaks turned off for the required output file using the STARTPAGE option:
ods pdf file="c:\test.pdf" startpage=off;
The listing destination was turned off as well since it is not needed:
ods listing close;
With that complete, a page or region break gets inserted. While this could have been repeated before every procedure to get it popped into the next region on the page, that is the default behaviour for any extra procedural step, so it wasn't needed.
ods pdf startpage=now;
Then, the ODS LAYOUT feature is started so that the layout can be defined on the page:
ods layout start;
For the first plot, the one at the left of the triptych, a region was defined absolutely using ODS REGION (grid layouts are available, but I didn't make use of them here). Since all plots were to be of the same size, the width was defined as being a third of the page, and the starting position of the region (x=0pct, y=0pct) was set to the origin of the plot area on the page. The height was defined as the full possible plot height, though a smaller value could have a use if I wanted more than one row of graphs on a page, with a trellis plot being an example of that sort of arrangement. Titles and footnotes usefully lie outside this region in the way that SAS arranges things, so there is no further messing. With the region defined, it's a matter of running the required SAS/GRAPH procedure. In my case, this was GPLOT, but I am certain that others would work as well.
ods region x=0pct y=0pct width=33pct height=100pct;
<< SAS/GRAPH Procedure >>
For the middle plot, the starting position is moved a third of the way along the page, while the section area has the same dimensions as before. Using percentages in these definitions does make their usage easier.
ods region x=33pct y=0pct width=33pct height=100pct;
<< SAS/GRAPH Procedure >>
Lastly, the right-hand plot has a starting position two-thirds of the width of the page and the other dimensions are as per the other panels:
ods region x=66pct y=0pct width=33pct height=100pct;
<< SAS/GRAPH Procedure >>
With the last graph created, it is time to close ODS LAYOUT and the PDF destination. Then, the listing destination is reopened.
ods layout end;
ods pdf close;
ods listing;
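Putting the steps together, the whole program can be sketched as follows. The ODS statements are the ones shown above, while the three PROC GPLOT steps on SASHELP.CLASS are placeholders of my own choosing, standing in for whatever SAS/GRAPH procedures you need. How ODS LAYOUT behaves can depend on the SAS version in use, so treat this as a starting point.

```sas
options orientation=landscape;

/* Open the PDF file with automatic page breaks turned off */
ods pdf file="c:\test.pdf" startpage=off;
ods listing close;

/* Force a single break, then start the layout */
ods pdf startpage=now;
ods layout start;

/* Left-hand third of the page */
ods region x=0pct y=0pct width=33pct height=100pct;
proc gplot data=sashelp.class;
  plot height*age;
run;
quit;

/* Middle third of the page */
ods region x=33pct y=0pct width=33pct height=100pct;
proc gplot data=sashelp.class;
  plot weight*age;
run;
quit;

/* Right-hand third of the page */
ods region x=66pct y=0pct width=33pct height=100pct;
proc gplot data=sashelp.class;
  plot weight*height;
run;
quit;

/* Tidy up: close the layout and the PDF, then restore the listing */
ods layout end;
ods pdf close;
ods listing;
```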
Update 2012-12-08: Since writing the above, I have learned that ODS LAYOUT and ODS REGION have yet to become production features of SAS with 9.3 as the latest version. However, I have encountered an alternative that uses the STARTPAGE=NEVER ODS PDF option to turn off page breaks and GOPTIONS statements to control the regions occupied by plots. It's Sample 48569 on the SAS website. Having a production equivalent is better, since pre-production features are best avoided in production code. If I had realised the status, I would have used PROC GREPLAY to achieve what I needed to do.
Using the IN operator in SAS Macro programming
8th October 2012
This useful addition came in SAS 9.2, and I am amazed that it isn't enabled by default. To enable it, you need to set the MINOPERATOR option, unless someone has done it for you in the SAS AUTOEXEC or another configuration program. Thus, the safety-first approach is to have code like the following:
options minoperator;
%macro inop(x);
%if &x in (a b c) %then %do;
%put Value is included;
%end;
%else %do;
%put Value not included;
%end;
%mend inop;
%inop(a);
Also, the default delimiter is the space, so if you need to change that, then the MINDELIMITER option needs setting. Adjusting the above code so that the delimiter now is the comma character gives us the following:
options minoperator mindelimiter=",";
%macro inop(x);
%if &x in (a,b,c) %then %do;
%put Value is included;
%end;
%else %do;
%put Value not included;
%end;
%mend inop;
%inop(a);
Without any of the above, the only approach is to have the following, and that is what we had to do for SAS versions before 9.2:
%macro inop(x);
%if &x=a or &x=b or &x=c %then %do;
%put Value is included;
%end;
%else %do;
%put Value not included;
%end;
%mend inop;
%inop(a);
While it may be clunky, it does work and remains a fallback in newer versions of SAS. Saying that, having the IN operator available makes writing SAS Macro code that little bit more swish, so it's a good thing to know.
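One further variation is worth a sketch: MINOPERATOR and MINDELIMITER can also be given as options on the %MACRO statement itself, which keeps the behaviour local to one macro instead of depending on session-wide settings. The macro below is the same illustrative one as before, rewritten on that assumption.

```sas
/* Options after the slash apply to this macro only */
%macro inop(x) / minoperator mindelimiter=',';
  %if &x in (a,b,c) %then %put Value is included;
  %else %put Value not included;
%mend inop;

%inop(b);  /* b is in the list     */
%inop(z);  /* z is not in the list */
```

Going this way means that the macro does not rely on whoever calls it having set the global option beforehand.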
WARNING: Engine XPORT does not support SORTEDBY operations. SORTEDBY information cannot be copied.
24th July 2012
When recently creating a transport file using PROC COPY and the XPORT library engine, I found the above message in the log. The code used was similar to the following:
libname tran xport "c:\temp\tran.xpt";
proc copy in=data out=tran;
run;
When I went seeking out the cause on the web, I discovered this SAS Note that dates from before the release of SAS 6.12, putting the issue at more than ten years old. My take on its continuing existence is that we still use a transport file format that was introduced in SAS 5.x for the sake of interoperability, both between SAS versions and across alternatives to the platform.
The SORTEDBY flag in a dataset header holds the keys used to sort the data, and it isn't being copied into the XPORT transport files, hence the warning. To get rid of it, you need to remove the information manually in the data step using the SORTEDBY option on the DATA statement or using PROC DATASETS, which avoids rewriting the entire data set.
First up is the data step option:
data test(sortedby=_null_);
set sashelp.class;
run;
Then, there's PROC DATASETS:
proc datasets;
modify test(sortedby=_null_);
run;
quit;
It might seem counterproductive to exclude the information, but it makes no sense to keep what's being lost anyway. So long as the actual sort order is unchanged, and I believe that the code above will not alter it, we can live with its documentation in a specification until transport files created using PROC CPORT are as portable as those from PROC COPY.
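For completeness, here is a sketch of the whole sequence from sorted data set to transport file. The data set name CLASS, the sort key and the file location are all illustrative assumptions; the SORTEDBY=_NULL_ step is the PROC DATASETS approach described above.

```sas
/* Create a sorted copy so that the SORTEDBY information gets set */
proc sort data=sashelp.class out=work.class;
  by name;
run;

/* Remove the sort information without rewriting the data */
proc datasets library=work nolist;
  modify class(sortedby=_null_);
run;
quit;

/* Hypothetical transport file location */
libname tran xport "c:\temp\tran.xpt";

/* Copy only the one data set into the transport file */
proc copy in=work out=tran memtype=data;
  select class;
run;

libname tran clear;
```

With the sort information removed beforehand, the warning should no longer appear in the log for the PROC COPY step.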
ERROR: Ambiguous reference, column xx is in more than one table.
5th May 2012
Sometimes, SAS messages are not all that they seem, and a number of them are issued from PROC SQL when something goes awry with your code. In fact, I got a message like the above when ordering the results of the join using a variable that didn't exist in either of the datasets that were joined. This type of thing has been around for a while (I have been using SAS since version 6.11, and it was there then) and it amazes me that we haven't seen a better message in more recent versions of SAS; it was SAS 9.2 where I saw it most recently.
proc sql noprint;
select a.yy, a.yyy, b.zz
from a left join b
on a.yy=b.yy
order by xx;
quit;
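A hedged sketch of how one might track this down follows. The two small data sets are invented so that the example is self-contained; they mirror the column names used in the query above. The first query checks whether the suspect column exists at all, and the second orders by a column that does exist, qualified with its table alias.

```sas
/* Made-up data sets with the column names used in the query above */
data a;
  input yy yyy $;
  datalines;
1 x
2 y
;
run;

data b;
  input yy zz $;
  datalines;
1 p
2 q
;
run;

proc sql;
  /* Look for the suspect column: no rows means that it is in neither table */
  select memname, name
  from dictionary.columns
  where libname='WORK' and memname in ('A','B') and upcase(name)='XX';

  /* Ordering by an existing, qualified column avoids the error */
  select a.yy, a.yyy, b.zz
  from a left join b
  on a.yy=b.yy
  order by a.yy;
quit;
```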
Creating placeholder graphics in SAS using PROC GSLIDE for when no data are available
18th March 2012
Recently, I found myself with a plot to produce, but there were no data to be presented, so a placeholder output was needed. For a listing or a table, this is a matter of detecting if there are observations to be listed or summarised and then issuing a placeholder listing using PROC REPORT if there are no data available. Using SAS/GRAPH, something similar can be achieved using one of its curiosities.
In the case of SAS/GRAPH, PROC GSLIDE looks like the tool to use for the same purpose. The procedure does get covered as part of a SAS Institute SAS/GRAPH training course, but they tend to gloss over it. After all, there is little reason to go creating presentations in SAS when PowerPoint and its kind offer far more functionality. However, it would make an interesting tale to tell how GSLIDE became part of SAS/GRAPH in the first place. Its existence makes me wonder if it predates the main slideshow production tools that we use today.
The code that uses PROC GSLIDE to create a placeholder graphic is as follows (detection of the number of observations in a SAS dataset is another entry on here):
proc gslide;
note height=10;
note j=center "No data are available";
run;
quit;
PROC GSLIDE is one of those run group procedures in SAS so a QUIT statement is needed to close it. The NOTE statements specify the text to be added to the graphic. The first of these creates a blank line of the required height for placing the main text in the middle of the graphic. It is the second one that adds the centred text that tells users of the generated output what has happened.
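To show how the placeholder might be wired into a program, here is a sketch that counts observations and then chooses between a real plot and the GSLIDE output. The macro name, the plotted variables and the use of the OPEN and ATTRN functions for the count are my own assumptions for illustration, and the count may not be available for views.

```sas
%macro plot_or_placeholder(ds);
  %local dsid nobs rc;
  %let nobs = 0;

  /* Open the data set and read the number of undeleted observations */
  %let dsid = %sysfunc(open(&ds));
  %if &dsid %then %do;
    %let nobs = %sysfunc(attrn(&dsid, nlobs));
    %let rc = %sysfunc(close(&dsid));
  %end;

  %if &nobs > 0 %then %do;
    /* Hypothetical plot: replace with the one that is really needed */
    proc gplot data=&ds;
      plot height*age;
    run;
    quit;
  %end;
  %else %do;
    /* No observations (or no count available): issue the placeholder */
    proc gslide;
      note height=10;
      note j=center "No data are available";
    run;
    quit;
  %end;
%mend plot_or_placeholder;

/* An empty copy of SASHELP.CLASS to exercise the placeholder branch */
data work.empty;
  set sashelp.class(obs=0);
run;

%plot_or_placeholder(sashelp.class);
%plot_or_placeholder(work.empty);
```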
Smoother use of more than one SAS DMS session at a time
11th March 2012
Unless you have access to SAS Enterprise Guide, being able to work on only one project at a time can be a little inconvenient. It is possible to open up more than one Display Manager System (DMS, the traditional SAS programming interface) session at a time, only to get a pop-up window for SAS documentation for the second and subsequent sessions. You don't get your settings shared across them, either, while also losing any changes to session options after shutdown.
The cause of both of the above is the locking of the SASUSER directory files by the first SAS session. However, it is possible to set up a number of directories and set the -sasuser option to point at different ones for different sessions.
On Windows, the command in the SAS shortcut becomes:
"C:\Program Files\SAS\SAS 9.1\sas.exe" -sasuser "c:\sasuser\session1\"
On UNIX or Linux, it would look similar to this:
sas -sasuser "~/sasuser/session1/"
Since the "session1" in the folder paths above can be replaced with whatever you need, you can have as many as you want too. It might not seem much of a need, but synchronising the SASUSER folders every now and again can give you a more consistent set of settings across each session, all without intrusive pop-up boxes or extra messages in the log.
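To check which SASUSER directory a given session has picked up, something like the following sketch could be run inside it. It uses the PATHNAME and GETOPTION functions through %SYSFUNC and writes the results to the log; the wording of the messages is my own.

```sas
/* Location of the SASUSER library as this session sees it */
%put NOTE: SASUSER library path is %sysfunc(pathname(sasuser));

/* Value of the SASUSER system option supplied at invocation */
%put NOTE: SASUSER option value is %sysfunc(getoption(sasuser));
```

Running this in each open session should show a different folder for each one if the shortcuts or commands have been set up as described.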