Technology Tales

Adventures & experiences in contemporary technology

Transferring data between SAS and R

5th June 2008

A question regarding the transfer of data between SAS and R set me off on a spot of investigation a while back and I have always planned to share the results of my labours. Once I managed to locate the required documentation, things became clearer with further inspection. Functions from the foreign package seem to offer the most from the data import and export point of view so they’re what I’ll be featuring in this posting.

I’ll start with importing, where using the read.ssd function makes life so much easier for getting SAS data into R. I discovered that the foreign package may not be loaded by default but you can determine this easily by issuing the following command:

search()

If “package:foreign” isn’t in the list, then you need to issue the following function call:

library(foreign)

Of course, if the foreign package isn’t installed, none of this will work. It should live in the library sub-folder of the main R installation directory but if it isn’t there, then downloading the relevant binary package from CRAN is in order. Assuming that all is installed, then a command like the following will perform the needful:

data1 <- read.ssd("c:/data","data1",sascmd="C:/Program Files/SAS Institute/SAS/V8/sas.exe")

This creates a temporary SAS program that converts the SAS data set into a transport file for reading by another R function that is called in the background, read.xport. From my experience, it all seems to work fairly seamlessly.

To get data out of R and into SAS is a multi-stage process, even with the foreign package. There are other ways but using write.foreign seems more useful than most and here’s an example function call:

write.foreign(data1,"C:/test.txt","C:/test.sas",package="SAS",dataname="data1",validvarname="V7")

No SAS data sets are created at this stage but a text file is generated along with a SAS program for converting it into a data set. Running the SAS program is a separate step that follows the creation of the two files. Even if it is less streamlined than read.ssd, write.foreign does make it easier to transfer data into SAS than having to write a program from scratch to read in write.table output.
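As a rough sketch of that second step, the generated program can be pulled into a SAS session with %INCLUDE. The paths and the data set name come from the write.foreign call above; the PROC CONTENTS check is my own addition and assumes that the generated code creates data1 in the WORK library.

```sas
/* Run the program that write.foreign generated; it reads C:/test.txt */
%include "C:/test.sas";

/* Optional check (an assumption, not part of the generated code):
   confirm that the data set arrived with the expected variables */
proc contents data=work.data1;
run;
```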

In summary, R can neither read nor write SAS data sets by itself so you need SAS installed to really make things happen. SAS gets called by read.ssd and I feel that it would be better if it was called by write.foreign also rather than having a SAS program generated for execution later on. Even so, it is good to see some custom functionality being provided that makes life easier. There’s also the Hmisc package but my experiences while working with that on S-Plus have been such that it compares less favourably with foreign on the reliability front. Saying that, things may have changed since I last tried it.

SAS Data Step Hash Objects and Memory

3rd June 2008

Using hash objects in SAS data step code offers some great advantages from the speed point of view; having a set of data in memory rather than on disk makes things much faster. However, that means that you need to keep more of an eye on the amount of memory that’s being used. The first thing is to work out how much memory is available and it’s not necessarily the total amount installed on the system or, for that matter, the amount of memory per processor on a multi-processor system. What you really need is the number, in bytes, that is stored in the XMRLMEM system option and here’s a piece of code that’ll do just that:

data _null_;
  /* GETOPTION returns the option value as a character string */
  mem=getoption('xmrlmem');
  /* Write the value to the log */
  put mem;
run;

XMRLMEM is itself an option that you can only declare in the system call that starts SAS up in the first place and there are advantages to keeping it under control, particularly on large multi-user servers. However, if your hash objects start to exceed what is available, here’s the sort of thing that you can expect to see:

ERROR: Hash object added 49136 items when memory failure occurred.
FATAL: Insufficient memory to execute data step program. Aborted during the EXECUTION phase.
NOTE: The SAS System stopped processing this step because of insufficient memory.
NOTE: SAS set option OBS=0 and will continue to check statements. This may cause NOTE: No observations in data set.

Those messages are a cue for you to keep those hash objects small and to only ever make them as large as your memory settings will allow. Another thing to note is that hash objects are best retained for rather fixed data volumes instead of ones that could outgrow their limits. There’s a certain amount of common sense in operation here but it may be that promoters of hash objects don’t mention their limitations as much as they should. If you want to find out more, SAS have a useful paper on their website and their Knowledge Base has more on the error messages that you can get.
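One way of keeping a hash object small is to load only the variables that are actually needed. What follows is a minimal sketch with made-up names (work.lookup, work.main, id and label are all hypothetical); it reduces what gets held in memory but is no guarantee of staying within what XMRLMEM reports.

```sas
data matched;
  /* Define id and label in the program data vector without reading rows */
  if 0 then set work.lookup(keep=id label);

  if _n_ = 1 then do;
    /* Load only the key and the one data variable into memory */
    declare hash h(dataset: "work.lookup(keep=id label)");
    h.defineKey("id");
    h.defineData("label");
    h.defineDone();
  end;

  set work.main;

  /* FIND returns 0 when the key exists; label is then filled in */
  if h.find() = 0 then output;
run;
```

Using KEEP= inside the DATASET argument means that unwanted columns never make it into the hash object in the first place.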

New version of SAS on the way

16th January 2008

This is something of a newsflash posting but this morning’s issue of the SAS Tech Report newsletter has said at last when SAS 9.2 is expected to be released. SAS have been talking a bit about 9.2 but dates were elusive and, to a point, they still are. Nevertheless, hearing that Q1 of this year is the time slot for the unveiling is better than knowing nothing at all. Am I alone in wondering if it is coming later than was planned?

Why I’ll be keeping Windows close to hand for a while to come

2nd December 2007

Even though I have moved to Linux and it has been fulfilling nearly all of my home computing needs, I retain access to Windows courtesy of virtualisation technology and plan to continue doing so. Keeping current with the world of the ever pervasive Windows is one motivation but there are others. In fact, now that Windows is more of a sideline, I may even get my hands on Vista at some point to take a further in-depth look at it, hopefully without having to suffer the consequences of my curiosity.

Talking of other reasons for hanging onto Windows, listening to music secured by DRM does come to mind. DRM is seen in a negative light by many in the open source world so Linux remains unencumbered by the beast. That isn’t necessarily a bad thing and the whole furore about Vista and DRM earlier this year had me wondering about a Linux future. However, I have been known to buy music from iTunes and would like to continue doing so. WINE might be one way to achieve this but retaining Windows seems a sounder option. That way, I am saved from having to convert my protected music files into either Ogg Vorbis or FLAC; the latter involves a lossless compression unlike the former so the files are bigger with the additional quality that an audiophile would seek. MP3 is another option but there are those in the Linux world who frown upon anything patented. That makes getting MP3 support an additional task for those of us wanting it.

In my wisdom, I have succumbed to the delights of expensive web development tools like Altova’s XMLSpy and Adobe’s Dreamweaver. While I have found a way to get Quanta Plus to edit files on the web server directly and code hacking is my main way to improve my websites, I still will be having a bimble into Dreamweaver from time to time. I have yet to see XMLSpy’s grid view replicated in the open source world so that should remain a key tool in my arsenal. While I haven’t been looking too hard at open source XML editors recently, there remains unexplored functionality in XMLSpy that I should really explore to see if it could be harnessed.

I have included implicit references to this already but keeping Windows around also allows you to continue using familiar software. For some, this might be Microsoft Office but OpenOffice and Evolution have usurped this in my case. Photoshop Elements is a better example for me. Digital transfers from scanners and DSLRs will stay in the world of Linux but virtualisation allows me to process the images whatever way I want and I might just stick with the familiar for now before jumping ship to GIMP at some point in the future. With all that is written on Photoshop, having it there for learning new things seems a very sensible idea.

While open source software can conceivably address every possible need, there are bound to be niches that remain outside of its reach. I use mapping software from Anquet when planning hillwalking excursions. It seems very much to be a Windows only offering and I have already downloaded a good amount of mapping so Windows has to stay if I need to use this and the routes that I have plotted out before now. Another piece of software that finds its way into this bracket is my copy of SAS Learning Edition; there are times when a spot of learning at home goes a long way at work.

So, in summary, my reasons for keeping Windows around are as follows:

  • Learning new things about the thing since I am unlikely to escape its influence in the world of work
  • Using iTunes to download new music and to continue to listen to what I have already
  • Using and learning about industry standard web development tools like Dreamweaver and XMLSpy
  • Easing the transition, by continuing to use Photoshop Elements for example
  • Using niche software like Anquet mapping

I suppose that many will relate to the above but Linux still has plenty to take over some of the above. In time, DRM may disappear from the music scene and not before time; accountants and shareholders may need to learn to trust customers. NVu and Quanta Plus could yet usurp Dreamweaver and there may be an open source alternative to XMLSpy like there is for so many other areas. The Photoshop versus GIMP choice will continue to present itself and all that is written about the former makes it seem silly to throw it away, however good the latter is. Even with changing over to Linux equivalents of applications fulfilling standard needs, it still leaves niche applications like hillwalking mapping and that, together with the need to know what Windows might offer in the enterprise space, could be the enduring reasons for keeping it near to hand. That said, I can now go through whole days without firing a Windows VM up and that is a big change from how it was a few months ago. I suppose that it’s all too easy to stick with using one operating system at a time and that is Linux for me these days.

Controlling what the wpgm command calls in Windows SAS

30th November 2007

I was setting up a key mapping in SAS 8.1 such that the log and output windows are cleared and a SAS program run in the most recently used program editor window. The idea was that debugging would be easier and the command was what you see below:

log; clear; output; clear; wpgm; submit

I was having trouble getting SAS to pick up the most recently used Enhanced Editor window and it was opening up an old style Program Editor window in its place. If I had wanted to use that, I would have used pgm and not wpgm. What was conspiring against me was a pesky system option. Pottering over to Tools > Options > Preferences and navigating to the Edit tab brought me to the cause of the problem: the Use Enhanced Editor check box was in the clear and fixing that set me on my way. SAS 9 could also be afflicted by the same irritation and that is where I got the screenshot that you see below where everything is hunky dory.

[Screenshot: SAS 9 Edit Preferences]

Append or update?

25th November 2007

SAS can generate many types of output: plain text, XML, PDF, RTF, Excel, etc. With all of these and the SAS procedures like PROC REPORT, PROC TABULATE and so on, it might seem surprising for me to say that I have been generating output with data step PUT and FILE statements. There was, of course, a reason for this: creating text files for loading into a new database-driven software application. At one stage, I also did some data interleaving at the output stage and that’s when I discovered that the default behaviour for SAS FILE statements is to completely overwrite a file unless the MOD option is specified. Adding that switches on append behaviour. The code below adds a header in one step while adding data below it in another. I know that there are slicker ways to achieve this like setting up your data as you want it or using _N_ to ensure that something only appears once but here’s another way. As with Perl, there’s often more than one way to do something with SAS.

data _null_;
file ds_data;
put "fieldtype;datasetname;datasetlabel;datasetlayout;datasetclass;datasetstandardversion";
run;

data _null_;
set ds_ispec;
file ds_data mod;
line="datasetstandard;"||trim(memname)||";"||trim(memlabel)||";;;"||trim(memver);
put line;
run;
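For comparison, here is a sketch of the _N_ approach mentioned above, using the same ds_data fileref and ds_ispec data set. It is an alternative to the two steps and not a replacement that I have run against the original data; with no MOD option, the file gets overwritten each time that the step runs.

```sas
data _null_;
  file ds_data;
  /* Write the header once, before the first observation is read */
  if _n_=1 then put "fieldtype;datasetname;datasetlabel;datasetlayout;datasetclass;datasetstandardversion";
  set ds_ispec;
  line="datasetstandard;"||trim(memname)||";"||trim(memlabel)||";;;"||trim(memver);
  put line;
run;
```

Putting the header PUT statement ahead of the SET statement means that the header still gets written when ds_ispec has no observations.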

About workspaces…

16th November 2007

One of the nice things about the world of Linux and UNIX is the availability of multiple workspaces. In Windows, you only ever get one and the likes of me can easily fill up that task bar. So the idea of parceling off different applications to different screens is useful from a housekeeping point of view so long as icons only appear in the task bar for the open workspace; Ubuntu respects this but openSUSE doesn’t, a possible source of irritation.

However, a case can be made that UNIX/Linux needs workspaces more than Windows because of the multi-window interfaces of some of the software applications. The trouble with each of these sub-windows is that an entry appears in the task bar for each of them, creating a mess very quickly. And it can also be an issue working out which window closes the lot.

Examples of the above that come to my mind include GIMP, XSane and SAS. The Windows version of the latter’s DMS is confined to a single application window while the UNIX incarnation is composed of a window each for individual components like program editor, log, output, etc. Typing "bye" in the command line of the program editor is enough to dispatch the GUI. With GIMP, Ctrl+Q will close it down in any window apart from the "Tip of the Day" one that pops up when GIMP is first started. The same sort of behaviour seems to dispatch XSane too.

Switching from one workspace to another is as easy as clicking the relevant icon in the task bar in all of the UNIX variants that I have used. Switching an application from one workspace to another has another common thread: finding the required entry in the application window menu.

In Ubuntu, I have seen other ways of working with workspaces. In the interface with visual effects turned off, hovering over the workspace icons in the task bar allows you to move from one to another with the wheel of your mouse. Moving an application between workspaces can be done as simply as dragging boxes from one task bar icon to another. Turning on the visual effects changes things, though. It might appear that the original functionality still works but that seems not to be the case: a matter for Canonical to resolve, perhaps?

The visual effects do provide other ways around this though. Keeping all your application windows minimised means that you can run through workspaces themselves with your wheel mouse. Moving applications between workspaces becomes as simple as grabbing the title bar and pulling the window left or right until it changes workspace. Be careful that you do the job fully though or you could have an application sitting astride two workspaces. It would appear that ideas from the sharing of a desktop across multiple monitors have percolated through to workspace behaviour.

Aside (regarding Ubuntu visual effects): I don’t know who came up with the idea of having windows wobble when they’re being moved around but it certainly is unusual, as is seeing what happens when you try prising a docked window from its mooring (particularly when you’re pulling it up from the bottom task bar). The sharper font display and bevelled screen furniture make more sense to me though; they certainly make a UI more appealing and modern.

Escaping brackets in SAS macro language

14th November 2007

Rendering opening and closing brackets as pieces of text in SAS macro language programming caused me a bit of grief until I got it sorted a few months back. All of the usual suspects for macro quoting (or escaping in other computer languages) let me down: even the likes of %SUPERQ or %NRBQUOTE didn’t do the trick. The honours were left to %NRQUOTE(%(), which performed what was required very respectably indeed. The second "%" escapes the bracket for %NRQUOTE to do the rest.
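By way of illustration, here is a small sketch of how that might be put to use. The macro variable names and the sample text are made up, and I am assuming that a closing bracket can be handled in the same way as an opening one.

```sas
/* %( and %) mark the unmatched brackets so that %NRQUOTE can mask them */
%let open=%nrquote(%();
%let close=%nrquote(%));

/* Hypothetical usage: wrap some text in the masked brackets */
%put &open.some text&close;
```

If all goes to plan, the %PUT statement should write the text to the log with brackets around it.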

A throwback to the past: an appearance of MACROGEN

4th October 2007

Recently, I was reviewing a log of a program being run by SAS 9.1.3 on a Solaris system and spotted lines like the following:

MACROGEN(MACRO1):   OPTIONS NOMPRINT NOMPRINTNEST

NOTE: PROCEDURE DISPLAY used (Total process time):
real time           0.73 seconds
cpu time            0.50 seconds

MPRINT(MACRO1):   SOURCE SOURCE2 NOTES;

The appearance of the word MACROGEN made me wonder if there was another system option that I had missed. A quick search of the SAS website threw up a support note that shed some light on the situation. Apparently, MACROGEN is the SAS v5 forebear of today’s MPRINT, MLOGIC, and SYMBOLGEN options and would seem to be obsolete these days. Having started programming SAS in the days of version 6, I had missed out on MACROGEN and so use its replacements instead, hence my never coming across the option. Quite what it’s doing showing up in a SAS 9 log is another story: and there I was thinking that SAS 9 was the result of a full rewrite… Now, I am not so sure but at least I know what MACROGEN is if someone ever takes the time to ask me.

Porting SAS files to other platforms and versions

1st October 2007

SAS uses its transport file format to port files between operating systems and, where the need arises, different software versions. As with a lot of things, there is more than one method to create these transport files: PROC CPORT/CIMPORT and PROC COPY with the XPORT engine. The former method is for within version transfer of SAS files between different operating systems (UNIX to Windows, for instance) and the latter is for cross-version transfer (SAS 9 to SAS 8, for example). SAS Institute have a page devoted to this subject which may share more details.
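The two methods might look something like the sketch below. All of the librefs, paths and the data1 data set name are hypothetical, and the transport file would need to be moved between systems by whatever means is available.

```sas
/* All librefs and paths here are hypothetical */
libname src "/data/source";

/* Same version, different operating system: PROC CPORT on the source... */
proc cport library=src file="/data/transfer/src.cpt" memtype=data;
run;

/* ...and PROC CIMPORT on the target system */
libname tgt "/data/target";
proc cimport library=tgt infile="/data/transfer/src.cpt";
run;

/* Across versions: PROC COPY with the XPORT engine */
libname xpt xport "/data/transfer/data1.xpt";
proc copy in=src out=xpt memtype=data;
  select data1;
run;
```

Files made by PROC CPORT and those made by the XPORT engine are not interchangeable, so whichever method creates the file has to be matched by its counterpart when reading it back in.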
