Technology Tales

Adventures in consumer and enterprise technology

TOPIC: SAS INSTITUTE

Transferring data between SAS and R

5th June 2008

A question regarding the ability to transfer of data between SAS and R set me off on a spot of investigation a while back, and I have always planned to share the results of my labours. Once I managed to locate the required documentation, things became clearer with further inspection. Functions from the foreign package seem to offer the most from the data import and export point of view, so they're what I'll be featuring in this posting.

Here, I am starting with importing, and using the read.ssd function makes life so much easier for getting SAS data into R. When I discovered that the foreign package may not be loaded by default, that could be determined easily using the following command:

search()

If package:foreign isn't in the list, then you need to issue the following function call:

library(foreign)

Of course, if the foreign package isn't installed, none of this will work. It should live in the library sub-folder of the main R installation directory, but if it isn't there, then downloading the relevant binary package from CRAN is in order. Assuming that all is installed, then a command like the following will perform the needful:

read.ssd("c:/data","data1",sascmd="C:/Program Files/SAS Institute/SAS/V8/sas.exe")

This creates a temporary SAS program that converts the SAS data set into a transport file for reading by another R function that is called in the background, read.xport. From my experience, it all seems to work fairly seamlessly.

To get data out of R and into SAS is a multi-stage process, even with the foreign package. While there are other ways, using the write.foreign seems more useful than most. Here is an example function call:

write.foreign(data1,"C:/test.txt","C:/test.sas",package="SAS",dataname="data1",validvarname="V7")

While no SAS data sets are created at this stage, a text file is generated along with a SAS program for converting it into a data set. Running the SAS program is a separate step that follows the creation of the two files. Even if it is less streamlined than read.ssd, write.foreign does make it easier to transfer data into SAS than having to write a program from scratch to read in write.table output.

In summary, R can neither read nor write SAS data sets by itself, so you need SAS installed to really make things happen. SAS gets called by read.ssd and I feel that it would be better if was called by write.foreign also rather than a SAS program generated for execution later on. Even so, it is good to see some custom functionality being provided that makes life easier. There's also the hmisc package, but my experiences while working with that on S-Plus have been such that it compares less favourably with foreign on the reliability front. Saying that, things may have changed since I last tried it.

New version of SAS on the way

16th January 2008

This is something of a newsflash posting, but this morning's issue of the SAS Tech Report newsletter has said at last when SAS 9.2 is expected to be released. Though SAS has been talking a bit about 9.2, dates were elusive and, to a point, they still are. Nevertheless, hearing the Q1 of this year is the time slot for the unveiling is better than knowing nothing at all. Am I alone in wondering if it is coming later than was planned?

Controlling what the wpgm command calls in Windows SAS

30th November 2007

I was setting up a key mapping in SAS 8.1 such that the log and output windows are cleared and a SAS program run in the most recently used program editor window. The idea was that debugging would be easier, and the command was what you see below:

log; clear; output; clear; wpgm; submit

I was having trouble getting SAS to pick up the most recently used Enhanced Editor window, and it was opening up an old style Program Editor window in its place. If I had wanted to use that, I would have used pgm and not wpgm. What was conspiring against me was a pesky system option. Pottering over to Tools > Options > Preferences and navigating to the Edit tab brought me to the cause of the problem: the Use Enhanced Editor check box was in the clear, and fixing that set me on my way. SAS 9 could also be afflicted by the same irritation and that is where I got the screenshot that you see below where everything is hunky-dory.

SAS 9 Edit Preferences

Append or update?

25th November 2007

SAS can generate many types of output: plain text, XML, PDF, RTF, Excel, etc. With all of these and the SAS procedures like PROC REPORT, PROC TABULATE and so on, it might seem surprising for me to say that I have been generating output with data step PUT and FILE statements. There was, of course, a reason for this: creating text files for loading into a new database-driven software application. At one stage, I also did some data interleaving at the output stage and that's when I discovered that the default behaviour for SAS FILE statements is to completely overwrite a file unless the MOD option was specified. Adding that switches on APPEND behaviour. The code below adds a header in one step, while adding data below it in another. While I know that there are better ways to achieve this, like setting up your data as you want it or using _N_ to ensure that something only appears once, here's another way. As per the Perl, there's often more than one way to do something with SAS.

data _null_;
    file ds_data;
    put "fieldtype;datasetname;datasetlabel;datasetlayout;datasetclass;datasetstandardversion";
run;
data _null_;
    set ds_ispec;
    file ds_data mod;
    line="datasetstandard;"||trim(memname)||";"||trim(memlabel)||";;;"||trim(memver);
    put line;
run;

Porting SAS files to other platforms and versions

1st October 2007

SAS uses its transport file format to port files between operating and, where the need arises, different software versions. As with many things, there is more than one method to create these transport files: PROC CPORT/CIMPORT and PROC COPY with the XPORT engine. The former method is for within version transfer of SAS files between different operating systems (UNIX to Windows, for instance) and the latter is for cross-version transfer (SAS9 to SAS 8, for example). SAS Institute has a page devoted to this subject which may share more details.

SAS Institute enters the blogosphere

19th September 2007

To get to the blogs hosted by SAS Institute, all you need to do is go here. I have to say that there is quite a spread of subject matter ranging from the high-level business strategy offerings through to detailed snippets for SAS programmers. There appears to be a lot here for anyone interested in SAS and business intelligence. I must take a longer look.

Update: I have since discovered a central listing of SAS Institute RSS feeds. The list is well worth your perusal.

SAS9 SQL Constraints

23rd July 2007

With SAS 9, SAS Institute has introduced the sort of integrity constraints that have been bread and butter for relational database SQL programs, but some SAS programmers may find them more restrictive than they might like. The main one that comes to my mind is the following:

proc sql noprint;
    create table a as select a.*,b.var from a left join b on a.index=b.index;
quit;

Before SAS 9, that worked merrily with nary a comment, only for you now to see a warning like this:

WARNING: This CREATE TABLE statement recursively references the target table. A consequence of this is a possible data integrity problem.

In the data step, the following still runs without a complaint:

data a;
    merge a b(keep=index var);
    by index;
run;

On the surface of it, this does look inconsistent. From a database programmer's point of view having to use different source and target datasets is no hardship but seems a little surplus to requirements for a SAS programmer trained to keep down the number of temporary datasets to reduce I/O and keep things tidy, an academic concept perhaps in these days of high processing power and large disks. While adding UNDO_POLICY=NONE to the PROC SQL line does make everything consistent again, I see this as being anathema to a database programming type. Though I do admit to indulging in the override for personal quick and dirty purposes, abiding by the constraint is how I do things for formal purposes like inclusion in an application to a regulatory authority like FDA.

WARNING: The quoted string currently being processed has become more than 262 characters long…

20th June 2007

This is a SAS error that can be seen from time to time:

WARNING: The quoted string currently being processed has become more than 262 characters long. You may have unbalanced quotation marks.

In the days before SAS version 8, this was something that needed to be immediately corrected. In these days of SAS character variables extending beyond 200 characters in length, it becomes a potential millstone around a SAS programmer's neck. If you run a piece of code like this:

data _null_;
    x="aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
run;

What you get back is the warning message at the heart of the matter. While the code is legitimate and works fine, the spurious error is returned because SAS hasn't found a closing quote by the required position and the 262-character limit is a hard constraint that cannot be extended. There is another way, though: the new QUOTELENMAX option in SAS9. Setting it as follows removes the messages in most situations (yes, I did find one where it didn't play ball):

options noquotelenmax;

This does, however, beg the question as to how you check for unbalanced quotes in SAS logs these days; clearly, looking for a closing quote is an outmoded approach. Thanks to code highlighting, it is far easier to pick them out before the code gets submitted. The other question that arises is why you would cause this to happen anyway, but there are occasions where you assign the value of a macro variable to a data set one and the string is longer than the limit set by SAS. Here's some example code:

data _null_;
    length y $400;
    y=repeat("f",400);
    call symput("y",y)
run;
data _null_;
    x="&y";
run;

My own weakness is where I use PROC SQL to combine strings into a macro variable, a lazy man's method of combining all distinct values for a variable into a delimited list like this:

proc sql noprint;
    select distinct compress(string_var) into :vals separated by " " from dataset;
quit;

Of course, creating a long delimited string using the CATX (new to SAS9) function avoids the whole situation and there are other means, but there may be occasions, like the use of system macro variables, where it is unavoidable and NOQUOTELENMAX makes a much better impression when these arise.

A peculiarity with PROC EXPORT

10th June 2007

I have just encountered an issue with PROC EXPORT that I did not expect to see: it needs to run in a windowing environment. The way that I found this was that I was running a SAS macro as part of a batch job in a headless UNIX session and my program stopped dead with the job needing to be killed; that returned a message containing something about SAS/FSP and SAS/AF which does explain things. Still, this was not something that I would have expected with an export to a CSV file; the behaviour sounds more what you see with the likes of PROC GPLOT or PROC REPORT. As it happened, adding the -noterminal option to the batch command line sorted things out.

Using alternative editors for SAS programming

5th June 2007

When it comes to writing SAS programs, most use the tools that SAS gives us, be it Enterprise Guide, the Enhanced Editor or the Program Editor. While Enterprise Guide can work with UNIX SAS as the processing engine, it is very much a Windows tool and the Enhanced Editor functionality is provided through Windows-only programming (ActiveX, I seem to recall). However, that means that creature comforts are left behind you if you turn to writing SAS code using UNIX SAS; you have only got the good old-fashioned Program Editor supplied by SAS itself. However, there is a trick that you can use to make life more comfortable: SAS does allow you to submit the contents of your paste buffer (or clipboard) using the command SUBMIT BUFFER=DEFAULT and this can be assigned to a function key for ease of use (I use the same key to clear the log and output screens at the same time). In the Windows, you may need to explicitly copy the code to do this but, in UNIX, merely highlighting a section of code with an editor like NEdit will do the trick and, given that NEdit is reasonably pleasant tool for code cutting (the ability to define its macros with a spot of scripting is a definite plus point), this makes life more comfortable again.

  • The content, images, and materials on this website are protected by copyright law and may not be reproduced, distributed, transmitted, displayed, or published in any form without the prior written permission of the copyright holder. All trademarks, logos, and brand names mentioned on this website are the property of their respective owners. Unauthorised use or duplication of these materials may violate copyright, trademark and other applicable laws, and could result in criminal or civil penalties.

  • All comments on this website are moderated and should contribute meaningfully to the discussion. We welcome diverse viewpoints expressed respectfully, but reserve the right to remove any comments containing hate speech, profanity, personal attacks, spam, promotional content or other inappropriate material without notice. Please note that comment moderation may take up to 24 hours, and that repeatedly violating these guidelines may result in being banned from future participation.

  • By submitting a comment, you grant us the right to publish and edit it as needed, whilst retaining your ownership of the content. Your email address will never be published or shared, though it is required for moderation purposes.