TOPIC: HASH-BASED DATA STRUCTURES
Using associative arrays in scripting for BASH version 4 and above
29th April 2023
Associative arrays go by different names in different computing languages: dictionaries, hash tables and so on. What they have in common is that they are essentially lists of key-value pairs. In the case of BASH, you need at least version 4 to make use of this facility. In Linux Mint, I get 5.1.16, but macOS users apparently are still on BASH 3, so this post may not help them.
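Since everything here depends on the BASH version, a script can check this before it goes any further. What follows is only a minimal sketch: the variable name assoc_supported is my own invention for illustration, while BASH_VERSINFO and BASH_VERSION are set by BASH itself.

```shell
#!/usr/bin/env bash
# BASH_VERSINFO[0] holds the major version number of the running shell.
if (( BASH_VERSINFO[0] >= 4 )); then
    assoc_supported=yes   # hypothetical flag name, used only in this sketch
else
    assoc_supported=no
fi
echo "BASH ${BASH_VERSION}: associative arrays supported: ${assoc_supported}"
```

A real script might exit with an error message at this point instead of setting a flag.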
To declare an associative array in a later version of BASH, the following command gets issued:
declare -A hashtable
The code to add a key value pair then takes the following form:
hashtable[key1]=value1
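Once a pair has been added, it can be read back, counted, tested for and removed. The following is a small sketch of those operations rather than anything definitive; the variable names first, count, has_key3 and remaining are only there for illustration.

```shell
declare -A hashtable            # the array must be declared before use
hashtable[key1]=value1
hashtable[key2]=value2

# Read one value back by its key.
first="${hashtable[key1]}"

# ${#hashtable[@]} gives the number of entries.
count="${#hashtable[@]}"

# ${hashtable[key]+set} expands to "set" only when the key exists.
if [[ -n "${hashtable[key3]+set}" ]]; then has_key3=yes; else has_key3=no; fi

# unset removes a single entry; the quotes stop the brackets acting as a glob pattern.
unset 'hashtable[key2]'
remaining="${#hashtable[@]}"

echo "first=$first count=$count has_key3=$has_key3 remaining=$remaining"
```

The existence test matters because looking up a missing key simply gives an empty string, which cannot be told apart from a key that holds an empty value.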
Several values can be added to an empty array like this:
hashtable=( ["key1"]="value1" ["key2"]="value2" )
Declaration and instantiation of an associative array can be done in the same line as follows:
declare -A hashtable=( ["key1"]="value1" ["key2"]="value2" )
Handily, it is possible to loop through the entries in an associative array. It is possible to do this for keys and for values, once you expand out the appropriate list. The following expands a list of values:
"${hashtable[@]}"
Expanding a list of keys needs something like this:
"${!hashtable[@]}"
Looping through a list of values needs something like the following:
for val in "${hashtable[@]}"; do echo "$val"; done;
The above has been placed on a single line with semicolon delimiters for brevity, but this can be put on several lines with no semicolons for added clarity. Indentation is optional in BASH, though it does help readability. It is also possible to similarly loop through a list of keys:
for key in "${!hashtable[@]}"; do echo "key: $key, value ${hashtable[$key]}"; done;
For the example associative array declared earlier, the last line produces this output, resolving the value using the supplied key:
key: key2, value value2
key: key1, value value1
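Note that key2 comes out before key1 above: BASH makes no promise about the order in which the entries of an associative array are returned. If a predictable order matters, one option is to sort the keys first. The following sketch does that and also shows the loop spread over several lines; sorted_keys is a name that I made up for the example, and the approach assumes that no key contains a newline.

```shell
declare -A hashtable=( ["key1"]="value1" ["key2"]="value2" )

# Sort the keys first so that the output order is predictable.
# mapfile (BASH 4 and above) reads the sorted lines into an indexed array.
mapfile -t sorted_keys < <(printf '%s\n' "${!hashtable[@]}" | sort)

for key in "${sorted_keys[@]}"
do
    echo "key: $key, value ${hashtable[$key]}"
done
```

This time, the key1 line is printed before the key2 one.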
All of this found a use in a script that I created for adding new Markdown files to a Hugo instance: with more than one content directory in use, there was more than one shortcode that I wished to apply.
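That script is not reproduced here, so the following is only a guess at the general shape of such a lookup and not the real thing. The directory names, shortcode names and the shortcode_for function are all made up for illustration; only the double-brace shortcode syntax comes from Hugo.

```shell
# Hypothetical mapping from content directory to shortcode name.
declare -A shortcodes=( ["posts"]="post-footer" ["notes"]="note-footer" )

# Hypothetical helper: print the shortcode call for a content directory,
# or return non-zero when no shortcode is registered for it.
shortcode_for() {
    local section="$1"
    [[ -n "${shortcodes[$section]+set}" ]] || return 1
    printf '{{< %s >}}\n' "${shortcodes[$section]}"
}

shortcode_for "posts"
shortcode_for "gallery" || echo "no shortcode for gallery"
```

Keeping the mapping in one associative array means that adding another content directory only needs one more key-value pair and no extra branching logic.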
SAS Data Step Hash Objects and Memory
3rd June 2008
Using hash objects in SAS data step code offers some great advantages from the speed point of view; having a set of data in memory rather than on disk makes things much faster. However, that means that you need to keep more of an eye on the amount of memory that's being used. The first thing is to work out how much memory is available, and it's not necessarily the total amount installed on the system or, for that matter, the amount of memory per processor on a multiprocessor system. What you really need is the number, in bytes, that is stored in the XMRLMEM system option and here's a piece of code that'll do just that:
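data _null_;
    /* GETOPTION returns the current setting of the XMRLMEM system option */
    mem=getoption('xmrlmem');
    /* Write that setting to the log */
    put mem;
run;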
XMRLMEM is itself an option that you can only declare in the system call that starts SAS up in the first place, and there are advantages to keeping it under control, particularly on large multi-user servers. However, if your hash objects start to exceed what is available, here's the sort of thing that you can expect to see:
ERROR: Hash object added 49136 items when memory failure occurred.
FATAL: Insufficient memory to execute data step program. Aborted during the EXECUTION phase.
NOTE: The SAS System stopped processing this step because of insufficient memory.
NOTE: SAS set option OBS=0 and will continue to check statements. This may cause NOTE: No observations in data set.
Those messages are a cue for you to keep those hash objects in check and to only ever make them as large as your memory settings will allow. Another thing to note is that hash objects are best reserved for fairly fixed data volumes instead of ones that could outgrow their limits. There's a certain amount of common sense in operation here, but it may be that promoters of hash objects don't mention their limitations as much as they should. If you want to find out more, SAS has a useful paper on their website and their Knowledge Base has more on the error messages that you can get.