Technology Tales

Adventures in consumer and enterprise technology

TOPIC: PROGRAMMING LANGUAGE COMPARISONS

Performing parallel processing in Perl scripting with the Parallel::ForkManager module

30th September 2019

In a previous post, I described how to add Perl modules in Linux Mint, while mentioning that I hoped to add another that discusses the use of the Parallel::ForkManager module. This is that second post, and I am going to keep things as simple and generic as they can be. There are other articles, like one on the Perl Maven website, that go into more detail.

The first thing to do is ensure that the Parallel::ForkManager module is loaded by your script; having the line of code presented below near the top of the file will do just that. Without this step, the script will not load the required module by itself and errors will be generated.

use Parallel::ForkManager;

Then, the maximum number of child processes needs to be specified (Parallel::ForkManager forks separate processes rather than creating threads). While that can be achieved using a simple variable declaration, the following line reads the number from the command used to invoke the script. It even tells a forgetful user what they need to do in its own terse manner. Here $0 is the name of the script and N is the maximum number of child processes. This is an upper limit: no more than N children run at any one time, which means that there is less chance of overwhelming a CPU.

my $forks = shift or die "Usage: $0 N\n";

Once the maximum number of child processes is known, the next step is to instantiate the Parallel::ForkManager object as follows to manage them:

my $pm = Parallel::ForkManager->new($forks);

With the Parallel::ForkManager object available, it is now possible to use it as part of a loop. A foreach loop works well, though it can only iterate over a single array, so hashes are needed when other collections need interrogation. Two extra statements are needed: one to start a child process and another to end it.

foreach my $t (@array) {
    my $pid = $pm->start and next;
    # Other code to be processed goes here
    $pm->finish;
}
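One way to read the point about hashes is that the loop walks a single array of keys and looks up anything else that it needs in a hash. The following is a minimal sketch under that assumption: the %size hash, its contents and the cap of two child processes are all made up for illustration, and the final line is there only so that the script does not exit before its children do.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

# Hypothetical data: names as keys, sizes as values.
my %size = (alpha => 3, beta => 5, gamma => 2);

# The single array that the loop iterates over.
my @array = sort keys %size;

# A cap of two simultaneous child processes, chosen arbitrarily here.
my $pm = Parallel::ForkManager->new(2);

foreach my $t (@array) {
    # The parent gets a non-zero PID back and moves on to the next item;
    # the child gets 0 and carries on into the rest of the loop body.
    my $pid = $pm->start and next;

    # The child looks up the extra information in the hash.
    print "$t has size $size{$t}\n";

    # End the child process.
    $pm->finish;
}

# Stop the parent from exiting before the children are done.
$pm->wait_all_children;
```

Because the children run independently, the order of the printed lines is not guaranteed.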

Since a script often performs other processing, and it is possible to have several parallel loops in one script, there needs to be a way of getting the parent process to wait until all the child processes have completed before moving from one step to another in the main script. That is what the following statement does. In short, it adds more control.

$pm->wait_all_children;
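Pulling the pieces above together gives something like the following sketch. The list of items and the print statements are placeholders invented for illustration; everything else comes from the lines already shown. Saved under a hypothetical name like parallel.pl, it would be run as perl parallel.pl 4 to allow up to four child processes at once.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;

# Maximum number of simultaneous child processes, read from the command line.
my $forks = shift or die "Usage: $0 N\n";

# Placeholder work items; a real script would build this list from its own data.
my @array = (1 .. 8);

my $pm = Parallel::ForkManager->new($forks);

foreach my $t (@array) {
    # In the parent, start returns the PID of the new child, so the loop
    # skips to the next item; in the child, it returns 0 and execution
    # continues below.
    my $pid = $pm->start and next;

    # Placeholder work, performed in the child process only.
    print "Child $$ processed item $t\n";

    # End the child process; nothing after this line runs in the child.
    $pm->finish;
}

# Block until every child has exited before carrying on in the parent.
$pm->wait_all_children;
print "All children finished\n";
```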

To close, there needs to be a comment on the advantages of parallel processing. Modern multicore processors often get used for single-threaded operations, which leaves most of their capacity unused. Utilising this extra power can shorten processing times markedly. To give you an idea of what can be achieved, I had a single script taking around 2.5 minutes to complete in single-threaded mode, while setting the maximum number of child processes to 24 reduced this to just over half a minute while taking up 80% of the processing capacity. This was with an AMD Ryzen 7 2700X CPU with eight cores and a maximum of 16 processor threads. Surprisingly, using 16 as the maximum only used half the processor capacity, so it seems to be a matter of performing one's own measurements when making these decisions.
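For anyone wanting to make such measurements, a rough timing harness might look like the sketch below. The busy_work and time_run subroutines, the job count, the job size and the list of limits to try are all hypothetical; Time::HiRes comes with Perl and supplies a finer-grained time function. It only measures elapsed time, so processor usage would need watching separately with a system monitor.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);
use Parallel::ForkManager;

# Hypothetical CPU-bound task standing in for real work.
sub busy_work {
    my ($n) = @_;
    my $sum = 0;
    $sum += sqrt($_) for 1 .. $n;
    return $sum;
}

# Run the same batch of jobs under a given cap on child processes
# and return the elapsed wall-clock time in seconds.
sub time_run {
    my ($forks, $jobs, $size) = @_;
    my $pm    = Parallel::ForkManager->new($forks);
    my $begin = time();
    for my $job (1 .. $jobs) {
        $pm->start and next;    # parent moves on; child falls through
        busy_work($size);
        $pm->finish;
    }
    $pm->wait_all_children;
    return time() - $begin;
}

# Try a few limits; the values here are arbitrary starting points.
for my $forks (1, 2, 4, 8) {
    printf "%2d child processes: %.2f s\n", $forks, time_run($forks, 16, 200_000);
}
```

The timings will vary from machine to machine and from run to run, so repeating each measurement a few times is advisable.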

Using NOT IN operator type functionality in SAS Macro

9th November 2018

For as long as I have been programming with SAS, there has been the ability to test if a variable does or does not have one value from a list of values in data step IF clauses or WHERE clauses in both data step and most if not all procedures. It was only within the last decade that its Macro language got similar functionality, with one caveat that I recently uncovered: you cannot have a NOT IN construct. To get that, you need to go about things differently.

In the example below, you see the NOT operator being placed before the IN operator component that is enclosed in parentheses. If this is not done, SAS produces the error messages that caused me to look at SAS Usage Note 31322. Once I followed that approach, I was able to do what I wanted without resorting to older, more long-winded coding practices.

options minoperator;

%macro inop(x);
    %if not (&x in (a b c)) %then %do;
        %put Value is not included;
    %end;
    %else %do;
        %put Value is included;
    %end;
%mend inop;

%inop(a);

Running the above code should produce a similar result to that from the code featured in another post on here, except that the logic is reversed. There are times when such an approach is needed, such as when a few possibilities are to be excluded from a larger number of them. Programming often calls for inventive thinking, and this may be one of those occasions.

AND & OR, a cautionary tale

27th March 2009

The inspiration for this post is a situation where having the string "OR" or "AND" as an input to a piece of SAS Macro code broke a program that I had written. Here is a simplified example of what I was doing:

%macro test;
    %let doms=GE GT NE LT LE AND OR;
    %let lv_count=1;
    %do %while (%scan(&doms,&lv_count,' ') ne );
        %put &lv_count;
        %let lv_count=%eval(&lv_count+1);
    %end;
%mend test;

%test;

The loop proceeds well until the string "AND" is met, and "OR" has the same effect. The result is that the following messages appear in the log:

ERROR: A character operand was found in the %EVAL function or %IF condition where a numeric operand is required. The condition was: %scan(&doms,&lv_count,' ') ne
ERROR: The condition in the %DO %WHILE loop, , yielded an invalid or missing value, . The macro will stop executing.
ERROR: The macro TEST will stop executing.

Both AND & OR (case doesn't matter, but I am sticking with upper case for the sake of clarity) seem to be treated as reserved words in the condition of a macro %DO %WHILE loop, which the error message shows to be evaluated in the same way as a %EVAL or %IF expression, while comparison mnemonics like GE cause no problem. Perhaps the fact that a comparison operator is already in the expression helps. Regardless, the fix is simple:

%macro test;
    %let doms=GE GT NE LT LE AND OR;
    %let lv_count=1;
    %do %while ("%scan(&doms,&lv_count,' ')" ne "");
        %put &lv_count;
        %let lv_count=%eval(&lv_count+1);
    %end;
%mend test;

%test;

Now none of the strings extracted from the macro variable &DOMS will appear as bare words and confuse the SAS Macro processor. However, you do have to make sure that you are testing against the quoted null string ("" here, to match the double quotes placed around %SCAN) or you'll send your program into an infinite loop. That is always a potential problem with DO WHILE loops, so they need to be used with care. All in all, an odd-looking message gets an easy solution without recourse to macro quoting functions like %NRSTR or %SUPERQ.
