TOPIC: COMPUTER ARCHITECTURE
Performing parallel processing in Perl scripting with the Parallel::ForkManager module
30th September 2019
In a previous post, I described how to add Perl modules in Linux Mint, while mentioning that I hoped to write another that discusses the use of the Parallel::ForkManager module. This is that second post, and I am going to keep things as simple and generic as they can be. There are other articles, such as one on the Perl Maven website, that go into more detail.
The first thing to do is ensure that the Parallel::ForkManager module is loaded by your script; having the line of code presented below near the top of the file will do just that. Without this step, the script cannot use the module and will fail with an error.
use Parallel::ForkManager;
Then, the maximum number of child processes needs to be specified. Although this post sometimes calls them threads, Parallel::ForkManager works by forking separate processes. While the limit can be set using a simple variable declaration, the following line reads it from the command used to invoke the script. It even tells a forgetful user what they need to do in its own terse manner. Here, $0 is the name of the script and N is the maximum number of child processes. The module never runs more than N children at once, and the operating system schedules them across the available cores, so a sensible limit reduces the chance of overwhelming a CPU.
my $forks = shift or die "Usage: $0 N\n";
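The one-liner above accepts any true value, so a mistyped argument such as abc would be passed straight to the module. As a minimal sketch, assuming that only positive whole numbers make sense here, a small helper could check the argument first. The name parse_forks is my own and is not part of Parallel::ForkManager.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Parallel::ForkManager): returns the
# requested number of child processes, or dies with the usage message
# when the argument is missing or is not a positive whole number.
sub parse_forks {
    my ($arg) = @_;
    die "Usage: $0 N\n"
        unless defined $arg && $arg =~ /\A[1-9][0-9]*\z/;
    return $arg + 0;    # return a number rather than a string
}

# In a real script, this call would replace the one-liner above:
# my $forks = parse_forks(shift @ARGV);
```

Keeping the check in a subroutine means that the usage message stays in one place if more arguments get added later.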
Once the maximum number of child processes is known, the next step is to instantiate the Parallel::ForkManager object as follows, so that it can manage those child processes:
my $pm = Parallel::ForkManager->new($forks);
With the Parallel::ForkManager object available, it is now possible to use it as part of a loop. A foreach loop works well, though it iterates over only a single array; if other collections need to be consulted for each item, hashes keyed on the loop variable can supply the extra values. Two extra statements are needed: one to start a child process and another to end it.
foreach my $t (@array) {
    my $pid = $pm->start and next; # the parent moves on to the next item
    # ... other code to be processed by the child ...
    $pm->finish; # the child process ends here
}
Since a script often performs other processing, and can contain more than one parallel loop, there needs to be a way of making the parent process wait until all the child processes have completed before moving from one step to the next. That is what the following statement does. In short, it adds more control.
$pm->wait_all_children;
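Putting the pieces described so far together, the following is a minimal sketch of a complete script and not a definitive implementation. The subroutine name square_in_parallel, the squaring task and the use of temporary files are all assumptions made for illustration. The files are there because a child process does not share its variables with the parent, so each child needs to leave its answer somewhere that the parent can read once all the children have finished.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Parallel::ForkManager;

# Hypothetical helper: squares each number in its own child process and
# writes the answer to a file in a temporary directory.
sub square_in_parallel {
    my ($forks, @numbers) = @_;
    my $dir = tempdir(CLEANUP => 1);
    my $pm  = Parallel::ForkManager->new($forks);

    foreach my $n (@numbers) {
        $pm->start and next;    # parent: move on to the next item

        # Everything from here to finish runs in the child only.
        open my $out, '>', "$dir/$n.txt" or die "Cannot write $dir/$n.txt: $!";
        print {$out} $n * $n, "\n";
        close $out or die "Cannot close $dir/$n.txt: $!";

        $pm->finish;            # child: exit here
    }

    $pm->wait_all_children;     # parent: block until every child is done

    # Back in the parent: gather what the children wrote.
    my %square_of;
    foreach my $n (@numbers) {
        open my $in, '<', "$dir/$n.txt" or die "Cannot read $dir/$n.txt: $!";
        chomp(my $line = <$in>);
        close $in;
        $square_of{$n} = $line;
    }
    return %square_of;
}

my %result = square_in_parallel(4, 1 .. 10);
print "$_ squared is $result{$_}\n" for sort { $a <=> $b } keys %result;
```

Without the call to wait_all_children, the parent could start reading the files before some of the children had written them.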
To close, a comment on the advantages of parallel processing is in order. Modern multicore processors often get used for single-threaded operations, which leaves most of their capacity unused. Using this extra power can shorten processing times markedly. To give you an idea of what can be achieved, I had a single script taking around 2.5 minutes to complete in single-threaded mode, while setting the maximum number of child processes to 24 reduced this to just over half a minute while taking up 80% of the processing capacity. This was with an AMD Ryzen 7 2700X CPU with eight cores and a maximum of 16 processor threads. Surprisingly, using 16 as the maximum only used half the processor capacity, so it seems to be a matter of performing one's own measurements when making these decisions.
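For anyone wanting to perform such measurements, here is a rough sketch of a timing harness. The busy_work task, the list of limits and the number of tasks are all made up for illustration, so they would need replacing with a real workload before the figures meant anything. It reports wall-clock time only, so processor usage would still need watching with a separate system monitor.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);
use Parallel::ForkManager;

# Hypothetical CPU-bound task standing in for real work.
sub busy_work {
    my ($iterations) = @_;
    my $sum = 0;
    $sum += sqrt($_) for 1 .. $iterations;
    return $sum;
}

# Runs $tasks copies of the task with at most $forks children at once
# and returns the elapsed wall-clock time in seconds.
sub time_run {
    my ($forks, $tasks, $iterations) = @_;
    my $pm    = Parallel::ForkManager->new($forks);
    my $start = time();
    for my $task (1 .. $tasks) {
        $pm->start and next;
        busy_work($iterations);
        $pm->finish;
    }
    $pm->wait_all_children;
    return time() - $start;
}

# Try a range of limits and compare the timings.
for my $forks (1, 2, 4, 8, 16, 24) {
    printf "%2d forks: %.2f s\n", $forks, time_run($forks, 24, 200_000);
}
```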
Interrogating Solaris hardware for installed CPU and memory resources
2nd October 2008
There are times when working with a Solaris server when you need to know a little more about the hardware configuration. Knowing how much memory you have and how many processors there are can be very useful if you are not to hog such resources.
The command for revealing how much memory has been installed is:
prtconf -v
Since memory is often allocated to individual CPUs, knowing how many are on the system is a must. This command will give you the bare number of physical processors:
psrinfo -p
The following variant provides the full detail that you see below it:
psrinfo -v
Output:
Status of virtual processor 0 as of: 10/06/2008 16:47:54
on-line since 09/13/2008 14:47:52.
The sparcv9 processor operates at 1503 MHz,
and has a sparcv9 floating point processor.
Status of virtual processor 1 as of: 10/06/2008 16:47:54
on-line since 09/13/2008 14:47:49.
The sparcv9 processor operates at 1503 MHz,
and has a sparcv9 floating point processor.
For a level of detail between the two extremes, try this to get what you see below it:
psrinfo -vp
Output:
The physical processor has 1 virtual processor (0)
UltraSPARC-IIIi (portid 0 impl 0x16 ver 0x34 clock 1503 MHz)
The physical processor has 1 virtual processor (1)
UltraSPARC-IIIi (portid 1 impl 0x16 ver 0x34 clock 1503 MHz)
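If these figures are needed inside a script, the output of psrinfo -vp can be parsed. What follows is only a sketch based on the sample output above: the subroutine name parse_psrinfo_vp is hypothetical, and the regular expressions assume that other machines print the same layout, which may not hold for every processor type or Solaris release.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the text printed by psrinfo -vp and returns
# one hash reference per physical processor.
sub parse_psrinfo_vp {
    my ($text) = @_;
    my @cpus;
    foreach my $line (split /\n/, $text) {
        if ($line =~ /physical processor has (\d+) virtual processors? \(([^)]*)\)/) {
            # First line of a pair: virtual processor count and IDs.
            push @cpus, { virtual => $1, ids => $2 };
        }
        elsif (@cpus && $line =~ /^\s*(\S+) \(.*clock (\d+) MHz\)/) {
            # Second line of a pair: model name and clock speed.
            @{ $cpus[-1] }{qw(model mhz)} = ($1, $2);
        }
    }
    return @cpus;
}

# Sample text copied from the output above. On a Solaris host, it could
# come from the command itself: my $text = `psrinfo -vp`;
my $sample = <<'END';
The physical processor has 1 virtual processor (0)
UltraSPARC-IIIi (portid 0 impl 0x16 ver 0x34 clock 1503 MHz)
The physical processor has 1 virtual processor (1)
UltraSPARC-IIIi (portid 1 impl 0x16 ver 0x34 clock 1503 MHz)
END

my @cpus    = parse_psrinfo_vp($sample);
my $virtual = 0;
$virtual += $_->{virtual} for @cpus;
printf "%d physical, %d virtual, %s at %d MHz\n",
    scalar @cpus, $virtual, $cpus[0]{model}, $cpus[0]{mhz};
# prints "2 physical, 2 virtual, UltraSPARC-IIIi at 1503 MHz"
```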