Technology Tales

Adventures & experiences in contemporary technology

Creating a data-driven informat in SAS

27th September 2019

Recently, I needed to create some example data with an extra numeric identifier variable that would be assigned according to the value of a character identifier variable. Not wanting to add another dataset merge or join to the code, I decided to create an informat from data. Initially, I looked into creating a format instead but it not accomplish what I wanted to do.

data patient;
keep fmtname start end label type;
set test.dm;
by subject;
fmtname="PATIENT";
start=subject;
end=start;
label=patient;
type="I";
run;

The input data needed a little processing as shown above. The format name was added in the variable FMTNAME and TYPE variable was assigned a value of I to make this a numeric informat; to make character equivalent, a value of J is assigned. The START and END variables declare the value range associated with the value of LABEL that would become the actual value of the numeric identifier variable. The variable names are fixed because the next step will not work with different ones.

proc format lib=work cntlin=patient;
run;
quit;

To create the actual informat, the dataset is read by a FORMAT procedure with the CNTLIN parameter specifying the name of the input dataset and LIB defining the library where the format catalog is stored. When this in complete, the informat is available for use with an input function as shown in the code excerpt below.

data ae1;
set ae;
patient=input(subject,patient.);
run;

Compressing an entire dataset library using SAS

26th September 2019

Turning dataset compression for SAS datasets can produce quite a reduction in size so it often is standard practice to do just this. It can be set globally with a single command and many working systems do this for you:

options compress=yes;

It also can be done on a dataset by dataset option by adding (compress=yes) beside the dataset name on the data line of a data step. The size reduction can be gained retrospectively too but only if you create additional copies of the datasets. The following code uses the DATASETS procedure to accomplish this end at the time of copy the datasets from one library to another:

proc datasets lib=adam nolist;
copy inlib=adam outlib=adamc noclone datecopy memtype=data;
run;
quit;

The NOCLONE option on the COPY statement allows the compression status to be changed while the DATECOPY one keeps the same date/time stamp on the new file at the operating system level. Lastly, the MEMTYPE=DATA setting ensures that only datasets are processed while the NOLIST option on the PROC DATASETS line suppresses listing of dataset information.

It may be possible to do something like this with PROC COPY too but I know from use that the option presented here works as expected. It also does not involve much coding effort, which helps a lot.

Using NOT IN operator type functionality in SAS Macro

9th November 2018

For as long as I have been programming with SAS, there has been the ability to test if a variable does or does not have one value from a list of values in data step IF clauses or WHERE clauses in both data step and most if not all procedures. It was only within the last decade that its Macro language got similar functionality with one caveat that I recently uncovered: you cannot have a NOT IN construct. To get that, you need to go about things in a different way.

In the example below, you see the NOT operator being placed before the IN operator component that is enclosed in parentheses. If this is not done, SAS produces the error messages that caused me to look at SAS Usage Note 31322. Once I followed that approach, I was able to do what I wanted without resorting to older more long-winded coding practices.

options minoperator;

%macro inop(x);

%if not (&x in (a b c)) %then %do;
%put Value is not included;
%end;
%else %do;
%put Value is included;
%end;

%mend inop;

%inop(a);

Running the above code should produce a similar result to another featured on here in another post but the logic is reversed. There are times when such an approach is needed. One is where a small number of possibilities is to be excluded from a larger number of possibilities. Programming often involves more inventive thinking and this may be one of those.

WARNING: No bars were drawn. This could have been caused by ORDER= on the AXIS statement. You might wish to use the MIDPOINTS= option on the VBAR statement instead.

25th September 2015

What you see above is a an error issued by a SAS program like what a colleague at work recently found. The following code will reproduce this so let us walk through the steps to explain a possible cause for this.

The first stage is to create a test dataset containing variables y and x, for the vertical and midpoint axes, respectively, and populating these using a CARDS statement in a data step:

data a;
input y x;
cards;
1 5
3 9
;
run;

Now, we define an axis with tick marks for particular values that will be used as the definition for the midpoint or horizontal axis of the chart:

axis1 order=(1 3);

Then, we try creating the chart using the GCHART procedure that comes with SAS/GRAPH and this is what results in the error message being issued in the program log:

proc gchart data=a;
vbar x / freq=y maxis=axis1;
run;
quit;

The cause is that the midpoint axis tick marks are no included in the data so changing these to the actual values of the x variable removes the message and allows the creation of the required chart. Thus, the AXIS1 statement needs to become the following:

axis1 order=(5 9);

Another solution is to remove the MAXIS option from the VBAR statement and let GCHART be data driven. However, if requirements do not allow this, create a shell dataset with all expected values for the midpoint axis with y set 0 since that is used for presenting frequencies as per the FREQ option in the VBAR statement.

Changing the working directory in a SAS session

12th August 2014

It appears that PROC SGPLOT and other statistical graphics procedures create image files even if you are creating RTF or PDF files. By default, these are PNG files but there are other possibilities. When working with PC SAS , I have seen them written to the current working directory and that could clutter up your folder structure, especially if they are unwanted.

Being unable to track down a setting that controls this behaviour, I resolved to find a way around it by sending the files to the SAS work directory so they are removed when a SAS session is ended. One option is to set the session’s working directory to be the SAS work one and that can be done in SAS code without needing to use the user interface. As a result, you get some automation.

The method is implicit though in that you need to use an X statement to tell the operating system to change folder for you. Here is the line of code that I have used:

x "cd %sysfunc(pathname(work))";

The X statement passes commands to an operating system’s command line and they are enclosed in quotes. %sysfunc then is a macro command that allows certain data step functions or call routines as well as some SCL functions to be executed. An example of the latter is pathname and this resolves library or file references and it is interrogating the location of the SAS work library here so it can be passed to the operating systems cd (change directory) command for processing. This method works on Windows and UNIX so Linux should be covered too, offering a certain amount of automation since you don’t have to specify the location of the SAS work library in every session due to the folder name changing all the while.

Of course, if someone were to tell me of another way to declare the location of the generated PNG files that works with RTF and PDF ODS destinations, then I would be all ears. Even direct output without image file creation would be even better. Until then though, the above will do nicely.

Some SAS Macro code for detecting the presence or absence of a variable in a dataset

4th December 2013

Recently, I needed to put in place some code to detect the presence or absence of a variable in a dataset and I chose SAS Macro programming as the way to do what I wanted. The logic was based on a SAS sample that achieved the same result in a data step and some code that I had for detecting the presence or absence of a dataset. Mixing the two together gave me something like the following:

%macro testvar(ds=,var=);

%let dsid=%sysfunc(open(&ds,in));
%let varexist=%sysfunc(varnum(&dsid,&var));
%if &dsid > 0 %then %let rc=%sysfunc(close(&dsid));

%if &varexist gt 0 %then %put Info: Variable &var is in the &ds dataset;
%else %put Info: Variable &var is not in the &ds dataset;

%mend testvar;

%testvar(ds=dataset,var=var);

What this does is open up a dataset and look for the variable number in the dataset. In datasets, variables are numbered from left to right with 1 for the first one, 2 for the second and so on. If the variable is not in the dataset, the result is 0 so you know that it is not there. All of this is what the VARNUM SCL function within the SYSFUNC macro function does. In the example, this resolves to %sysfunc(varnum(&dsid,var)) with no quotes around the variable name like you would do in data step programming. Once you have the variable number or 0, then you can put in place some conditional logic that makes use of the information like what you see in the above simple example. Of course, that would be expanded to something more useful in real life but I hope it helps to show you the possibilities here.

Presenting more than one plot on a page using SAS ODS PDF

12th November 2012

If you had asked me about getting two or more graphs on a page using SAS/GRAPH procedures, I might have suggested PROC GREPLAY as the means to achieve it. However, I recently came across another way to do the same thing by using ODS. It helped that the graphs were being produced using the PDF destination because I don’t think that what follows will work with the RTF one.

For this three plots on a page example, I first set the orientation to landscape so that the plots can be arranged side by side in a single row:

options orientation=landscape;

Next, the PDF destination was opened with page breaks turned off for the required output file using the STARTPAGE option:

ods pdf file="c:\test.pdf" startpage=off;

The listing destination was turned off as well since it is not needed:

ods listing close;

With that complete, a page or region break gets inserted. This could have been repeated before every procedure to get it popped into the next region on the page but that is the default behaviour for any extra procedural step so it wasn’t needed.

ods pdf startpage=now;

Then, the ODS LAYOUT feature is started so that the layout can be defined on the page:

ods layout start;

For the first plot and the one at the left of the triptych, a region was defined absolutely (grid layouts are available but I didn’t make use of them here) using ODS REGION. Since all plots were to be of the same size, the width was defined as being a third of the page and the bottom left hand corner of the region defined to be the same as that of the plot area on the page. Titles and footnotes usefully lie outside this region in the way that SAS arranges things so there is no further messing. With the region define, it’s a matter of running the required SAS/GRAPH procedure. In my case, this was GPLOT but I am certain that others would work as well. The height was defined as the full possible plot height. This could have a use if I wanted more than one row of graphs on a page with a trellis plot being an example of that sort of arrangement.

ods region x=0pct y=0pct width=33pct height=100pct;

<< SAS/GRAPH Procedure >>

For the middle plot, the starting position is moved a third of the way along the page while the section area has the same dimensions as before. Using percentages in these definitions does make their usage easier.

ods region x=33pct y=0pct width=33pct height=100pct;

<< SAS/GRAPH Procedure >>

Lastly, the right-hand plot has a starting position two-thirds of the width of the page and the other dimensions are as per the other panels:

ods region x=66pct y=0pct width=33pct height=100pct;

<< SAS/GRAPH Procedure >>

With the last graph created, it is time to close ODS LAYOUT and the PDF destination. Then, the listing destination is reopened again.

ods layout end;
ods pdf close;
ods listing;

Update 2012-12-08: Since writing the above, I have learned that ODS LAYOUT and ODS REGION have yet to become production features of SAS with 9.3 as the latest version. However, I have encountered an alternative that uses the STARTPAGE=NEVER ODS PDF option to turn off page breaks and GOPTIONS statements to control the regions occupied by plots. It’s Sample 48569 on the SAS website. Having a production equivalent is better since pre-production features are best avoided in production code. If I had realised the status, I would have used PROC GREPLAY to achieve what I needed to do.

Using the IN operator in SAS Macro programming

8th October 2012

This useful addition came in SAS 9.2 and I am amazed that it isn’t enabled by default. To accomplish that, you need to set the MINOPERATOR option unless someone has done it for you in the SAS AUTOEXEC or another configuration program. Thus, the safety first approach is to have code like the following:

options minoperator;

%macro inop(x);

%if &x in (a b c) %then %do;
%put Value is included;
%end;
%else %do;
%put Value not included;
%end;

%mend inop;

%inop(a);

Also, the default delimiter is the space, so if you need to change that, then the MINDELIMITER option needs setting. Adjusting the above code so that the delimiter now is the comma character gives us the following:

options minoperator mindelimiter=",";

%macro inop(x);

%if &x in (a,b,c) %then %do;
%put Value is included;
%end;
%else %do;
%put Value not included;
%end;

%mend inop;

%inop(a);

Without any of the above, the only approach is to have the following and that is what we had to do for SAS versions prior to 9.2:

%macro inop(x);

%if &x=a or &x=b or &x=c %then %do;
%put Value is included;
%end;
%else %do;
%put Value not included;
%end;

%mend inop;

%inop(a);

It may be clunky but it does work and remains a fallback in newer versions of SAS. Saying that, having the IN operator available makes writing SAS Macro code that little bit more swish so it’s a good thing to know.

ERROR: This range is repeated, or values overlap: – .

15th September 2012

This is another posting in an occasional series on SAS error and warning messages that aren’t as clear as they’d need to be. What produced the message was my creation of a control data set that I then wished to use to create a data-driven (in)format. It was the PROC FORMAT step that issued the message and I got no (in)format created. However, there were no duplicate entries in the control data set as the message suggested to me so a little more investigation was needed.

What that revealed was that there might be one variable missing from the data set that I needed to have. The SAS documentation has FMTNAME, START and LABEL as compulsory variables with they containing the following: format name, initial value and displayed value. My intention was to create a numeric code variable for one containing character strings using my data-driven format with then numbers specified within a character variable as it should be. What was missing then was TYPE.

This variable can be one of the following values: C for character formats, I for numeric informats, J for character informats, N for numeric formats and P for picture formats. Due to it being a conversion from character values to numeric ones, I set the values of TYPE to I and used an input function to do the required operations. The code for successfully creating the informat is below:

proc sql noprint;
create table tpts as
select distinct "_vstpt" as fmtname,
"I" as type,
vstpt as start,
vstpt as end,
strip(put(vstptnum,best.)) as label
from test
where not missing(vstptnum);
quit;

proc format library=work cntlin=tpts;
run;
quit;

Though I didn’t need to do it, I added an END variable too for sake of completeness. In this case, the range is such that its start and end are the same and there are cases where that will not be the case though I am not dwelling on those.

WARNING: Engine XPORT does not support SORTEDBY operations. SORTEDBY information cannot be copied.

24th July 2012

When recently creating a transport file using PROC COPY and the XPORT library engine, I found the above message in the log. The code used was similar to the following:

libname tran xport "c:\temp\tran.xpt";

proc copy in=data out=tran;
run;

When I went seeking out the cause on the web, I discovered this SAS Note that dates from before the release of SAS 6.12, putting the issue at more than ten years old. My take on its continuing existence is that we still to use a transport file format that was introduced in SAS 5.x for sake of interoperability, both between SAS versions and across alternatives to the platform.

The SORTEDBY flag in a dataset header holds the keys used to sort the data and it isn’t being copied into the XPORT transport files, hence the warning. To get rid of it, you need to remove the information manually in data step using the SORTEDBY option on the DATA statement or using PROC DATASETS, which avoids rewriting the entire data set.

First up is the data step option:

data test(sortedby=_null_);
set sashelp.class;
run;

Then, there’s PROC DATASETS:

proc datasets;
modify test(sortedby=_null_);
run;
quit;

It might seem counterproductive to exclude the information but it makes no sense to keep what’s being lost anyway. So long as the actual sort order is unchanged, and I believe that the code that that below will not alter it, we can live with its documentation in a specification until transport files created using PROC CPORT are as portable as those from PROC COPY.

  • All the views that you find expressed on here in postings and articles are mine alone and not those of any organisation with which I have any association, through work or otherwise. As regards editorial policy, whatever appears here is entirely of my own choice and not that of any other person or organisation.

  • Please note that everything you find here is copyrighted material. The content may be available to read without charge and without advertising but it is not to be reproduced without attribution. As it happens, a number of the images are sourced from stock libraries like iStockPhoto so they certainly are not for abstraction.

  • With regards to any comments left on the site, I expect them to be civil in tone of voice and reserve the right to reject any that are either inappropriate or irrelevant. Comment review is subject to automated processing as well as manual inspection but whatever is said is the sole responsibility of the individual contributor.