SAS Data Step Hash Objects and Memory
3rd June 2008
Using hash objects in SAS data step code offers great advantages from the speed point of view: holding a set of data in memory rather than on disk makes lookups much faster. However, it also means that you need to keep a closer eye on the amount of memory that is being used. The first thing is to work out how much memory is available, and that is not necessarily the total amount installed on the system or, for that matter, the amount of memory per processor on a multiprocessor system. What you really need is the number, in bytes, that is stored in the XMRLMEM system option, and here's a piece of code that writes it to the log:
data _null_;
    /* GETOPTION returns the option value as a character string */
    mem=getoption('xmrlmem');
    /* Write the value, in bytes, to the log */
    put mem;
run;
XMRLMEM is itself an option that you can only set in the system call that starts SAS up in the first place, and there are advantages to keeping it under control, particularly on large multi-user servers. However, if your hash objects start to exceed what is available, here's the sort of thing that you can expect to see in the log:
ERROR: Hash object added 49136 items when memory failure occurred.
FATAL: Insufficient memory to execute data step program. Aborted during the EXECUTION phase.
NOTE: The SAS System stopped processing this step because of insufficient memory.
NOTE: SAS set option OBS=0 and will continue to check statements. This may cause NOTE: No observations in data set.
Those messages are a cue for you to learn to keep those hash objects in check and to only ever make them as large as your memory settings will allow. Another thing to note is that hash objects are best reserved for data volumes that stay fairly fixed rather than ones that could outgrow their limits. There's a certain amount of common sense in operation here, but it may be that promoters of hash objects don't mention their limitations as much as they should. If you want to find out more, SAS has a useful paper on their website and their Knowledge Base has more on the error messages that you can get.
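As a rough illustration of sizing a hash object before filling it, the sketch below multiplies the number of observations in a lookup data set by the hash object's ITEM_SIZE attribute and compares the result with XMRLMEM. The data set WORK.LOOKUP and its variables ID and VALUE are hypothetical names used only for illustration. The sketch also assumes a SAS release that supports ITEM_SIZE (9.2 onwards, as far as I know) and a data set engine that can report an observation count through NOBS=. Since ITEM_SIZE covers only the items themselves and not the hash object's own overhead, treat the estimate as a lower bound rather than an exact figure.

```sas
data _null_;
    /* Pick up variable attributes and the observation count at compile time; */
    /* no rows are actually read. WORK.LOOKUP is a hypothetical data set.     */
    if 0 then set work.lookup nobs=n_obs;

    /* Declare an empty hash object with the intended key and data layout */
    declare hash h();
    rc = h.defineKey('id');
    rc = h.defineData('id', 'value');
    rc = h.defineDone();

    /* Bytes per item, excluding the hash object's own overhead */
    item_bytes = h.item_size;
    est_bytes = n_obs * item_bytes;

    /* GETOPTION returns a character string, so convert it to a number */
    avail_bytes = input(getoption('xmrlmem'), 20.);

    put 'Estimated hash size (bytes): ' est_bytes comma20.;
    put 'XMRLMEM (bytes): ' avail_bytes comma20.;
    if est_bytes > avail_bytes then
        put 'WARNING: the lookup data may not fit in memory.';

    /* No SET statement ever executes, so end the step explicitly */
    stop;
run;
```

If the estimate comes anywhere near the available figure, that is a hint to trim the variables held in the hash object, or to fall back on a disk-based approach such as a sort and merge.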