For loop | Technology Tales

Adding visual appeal to bash command line scripts with colour variables on Linux

23rd November 2025

While I was updating some scripts to improve their functionality, I made some unexpected discoveries. One involved adding some colour to the output, and a second will come up later. The colours can be defined as values of variables, as you can see below:

# Colours
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # no colour

In all cases, \033 is the octal code for the escape character, while [ completes the control sequence introducer and m closes the sequence for colour definitions like we have here. A numeric value of 0 resets things to the default, which is how it is used in the no colour (NC) case that we have above to ensure that the colouration does not overflow beyond the intended text. Otherwise, 31 specifies red, 32 specifies green and 33 specifies yellow (the 1 that precedes it in the YELLOW definition adds bold or bright rendering), giving options to use later on in the code. All of this is in line with the ANSI standard.

This is how these colour variables get used:

echo -e "\n${YELLOW}$(printf '*%.0s' {1..40}) All done $(printf '*%.0s' {1..40})${NC}\n"

The above is an example with yellow text produced using the ${YELLOW} segment after the newline sequence (\n) that is activated by the -e switch passed to the echo command. This is turned off by the ${NC} portion at the end of the text, again before a terminating newline sequence. One extra addition here is the part that outputs forty asterisks: $(printf '*%.0s' {1..40}). The %.0s sequence consumes one argument while printing none of its characters, so the format string gets reused once for each of the forty numbers and the asterisk is output forty times. A plain $(printf '*' {1..40}) might look simpler, but bash prints a format string without any conversion specification only once, so a single asterisk is all that you would get.
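
To reuse the idea across several scripts, the colour variables and the repeated asterisks can be wrapped in small functions. What follows is only a minimal sketch: the function names stars and banner are hypothetical and not taken from the original scripts, and seq stands in for brace expansion because {1..N} does not expand when N is held in a variable.

```bash
#!/usr/bin/env bash

# Colour variables as defined earlier
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # no colour

# stars (hypothetical helper): print N asterisks.
# %.0s consumes one argument but prints none of it, so the
# format string is reused once for every number that seq emits.
stars() {
    printf '*%.0s' $(seq 1 "$1")
}

# banner (hypothetical helper): print a message between two rows
# of asterisks in the given colour, resetting the colour afterwards.
# %b expands the backslash escapes held in the colour variables.
banner() {
    local colour="$1" message="$2" width="${3:-40}"
    printf '%b%s %s %s%b\n' "$colour" "$(stars "$width")" "$message" "$(stars "$width")" "$NC"
}

banner "$YELLOW" "All done"
banner "$GREEN" "Backup finished" 10
```

Using printf with %b instead of echo -e is a design choice here: it keeps the escape handling in one place and avoids differences in how echo behaves between shells.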

Taking control of Ruff checks on Python scripts

22nd October 2025

Positron is becoming my tool of choice for developing Python code. Aside from using a Python console like a REPL environment, it also includes Ruff for checking code compliance. One of its rules (E402) is that module imports must be placed at the top of a file. However, I want to use some code that checks for the presence of any modules used in a script, installing those that are missing. This means that import statements appear later in a script than Ruff recommends, making me wish for a way to turn off that check since things run well anyway. The chosen solution is to create a file called pyproject.toml in the directory where my scripts are stored and add the following lines in there to accomplish what I want:

[tool.ruff]
ignore = ["E402"]

Here, it helps if you open a folder in Positron, achieving the same outcome as you would in the VSCode on which the IDE is based. While I have only listed one check here, you also can have a comma-delimited list of quoted strings if you need to switch off more than one rule at once.

PandasGUI: A simple solution for Pandas DataFrame inspection from within VSCode

2nd September 2025

One of the things that I miss about Spyder when running Python scripts is the ability to look at DataFrames easily. Recently, I was checking a VAT return only for tmux to truncate how much of the DataFrame I could see in output from the print function. While closing tmux might have been an idea, I sought the DataFrame windowing alternative. That led me to the pandasgui package, which did exactly what I needed, apart from pausing the script execution to show me the data. The installation was done using pip:

pip install pandasgui

Once that completed, I could use the following code construct to accomplish what I wanted:

import pandasgui

pandasgui.show(df)

In my case, there were several lines between the two lines above. Nevertheless, the first line made the pandasgui package available to the script, while the second one displayed the DataFrame in a GUI with scrollbars and cells, among other things. That was close enough to what I wanted to leave me able to complete the task that was needed of me.

String replacement in BASH scripting

28th April 2023

During creation of new posts for a Hugo deployed website, I found myself using the same directories again and again. Since I invariably ended up making typing mistakes when I did so, I fancied the idea of using shortcodes instead.

Because I wanted to turn the shortcode into the actual directory name, I chose the use of text replacement in BASH scripting. Thankfully, this is simple and avoids the use of regular expressions, which can bring their own problems. The essential syntax is as follows:

variable="${variable/search text/replacement}"

For the variable, the search text is substituted with the replacement straightforwardly. It is even possible to include the search and replacement text in variables. In the example below, this is achieved using variables called original and replacement.

variable="${variable/$original/$replacement}"

Doing this got me my translatable shortcodes and converted them into actual directory names for the hugo command to process. There may be other uses yet.
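
As a concrete illustration of the substitution syntax, here is a minimal sketch. The shortcode OUT, the directory name outdoors and the file paths are all hypothetical values chosen for the example, not those used on the actual website.

```bash
#!/usr/bin/env bash

# Hypothetical shortcode and directory name, for illustration only
original="OUT"
replacement="outdoors"

# A single slash after the variable name replaces the first match only
target="content/OUT/new-post.md"
target="${target/$original/$replacement}"
echo "$target"

# A double slash replaces every match
title="a-walk-in-the-hills"
spaced="${title//-/ }"
echo "$spaced"
```

One thing to bear in mind is that the search text is treated as a glob-style pattern and not as a regular expression, so characters like * and ? act as wildcards there. Quoting the variable inside the expansion, as in ${target/"$original"/$replacement}, makes bash match it literally.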

Changing the number of lines produced by the tail command in a Linux or UNIX session

25th April 2023

Since I often use the tail command to look at the end of a log file and occasionally in combination with the watch command for constant updates, I got to wondering if the number of lines issued by the tail command could be changed. That is a simple thing to do with the -n switch. All you need is something like the following:

tail -n 20 logfile.log

Here the value of 20 is the number of lines produced when it would be 10 by default, and logfile.log gets replaced by the path and name of what you are examining. One thing to watch is that your terminal emulator can show all the lines being displayed. If you find that you are not seeing all the lines that you expect, then that might be the cause.
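
To see the switch in action without needing a real log file, the following sketch builds a throwaway file of thirty numbered lines. The file and the variable names are assumptions made for the illustration, while the watch line is left commented out because it runs until interrupted.

```bash
#!/usr/bin/env bash

# Build a throwaway 30-line file to stand in for a real log
logfile=$(mktemp)
seq 1 30 > "$logfile"

# Last five lines instead of the default ten
last_five=$(tail -n 5 "$logfile")
echo "$last_five"

# With a plus sign, output starts at that line number instead
from_line_28=$(tail -n +28 "$logfile")
echo "$from_line_28"

# For constant updates, watch can rerun the command every two seconds:
# watch -n 2 tail -n 20 "$logfile"

# Tidy up the temporary file
rm -f "$logfile"
```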

While you could find this by looking through the documentation, things do not always register with you during dry reading of something laden with lists of parameters or switches. That is an affliction with tools that do a lot and/or allow a lot of customisation.

Using multi-line commenting in Perl to inactivate blocks of code during testing

26th December 2019

Recently, I needed to inactivate blocks of code in a Perl script while doing some testing. Since this is something that I often do in other computing languages, I sought the same in Perl. To accomplish that, I needed to use the POD methodology. This meant enclosing the code as follows.

=start

<< Code to be inactivated by inclusion in a comment >>

=cut

While the =start line could use any word after the equality sign, it seems that =cut is required to close the multi-line comment. If this was actual programming documentation, then the comment block should include some meaningful text for use with perldoc. However, that was not a concern here because the commenting statements would be removed afterwards anyway. It also is good practice not to leave commented code in a production script or program to avoid any later confusion.

In my case, this facility allowed me to isolate the code that I had to alter and test before putting everything back as needed. It also saved time since I did not need to individually comment out every executable line because multiple lines could be inactivated at a time.

Performing parallel processing in Perl scripting with the Parallel::ForkManager module

30th September 2019

In a previous post, I described how to add Perl modules in Linux Mint, while mentioning that I hoped to add another that discusses the use of the Parallel::ForkManager module. This is that second post, and I am going to keep things as simple and generic as they can be. There are other articles like one on the Perl Maven website that go into more detail.

The first thing to do is ensure that the Parallel::ForkManager module is called by your script; having the line of code presented below near the top of the file will do just that. Without this step, the script will not be able to find the required module by itself and errors will be generated.

use Parallel::ForkManager;

Then, the maximum number of threads needs to be specified. While that can be achieved using a simple variable declaration, the following line reads this from the command used to invoke the script. It even tells a forgetful user what they need to do in its own terse manner. Here $0 is the name of the script and N is the number of threads. Not all these threads will get used and processing capacity will limit how many actually are in use, which means that there is less chance of overwhelming a CPU.

my $forks = shift or die "Usage: $0 N\n";

Once the maximum number of available threads is known, the next step is to instantiate the Parallel::ForkManager object as follows to use these child processes:

my $pm = Parallel::ForkManager->new($forks);

With the Parallel::ForkManager object available, it is now possible to use it as part of a loop. A foreach loop works well, though only a single array can be used, with hashes being needed when other collections need interrogation. Two extra statements are needed, with one to start a child process and another to end it.

foreach my $t (@array) {
    my $pid = $pm->start and next;
    << Other code to be processed >>
    $pm->finish;
}

Since a script often performs other processing, and it is possible to have multiple parallelised loops in one, there needs to be a way of getting the parent process to wait until all the child processes have completed before moving from one step to another in the main script. That is what the following statement does. In short, it adds more control.

$pm->wait_all_children;

To close, there needs to be a comment on the advantages of parallel processing. Modern multicore processors often get used in single threaded operations, which leaves most of the capacity unused. Utilising this extra power then shortens processing times markedly. To give you an idea of what can be achieved, I had a single script taking around 2.5 minutes to complete in single threaded mode, while setting the maximum number of threads to 24 reduced this to just over half a minute while taking up 80% of the processing capacity. This was with an AMD Ryzen 7 2700X CPU with eight cores and a maximum of 16 processor threads. Surprisingly, using 16 as the maximum thread number only used half the processor capacity, so it seems to be a matter of performing one's own measurements when making these decisions.

Creating a data-driven informat in SAS

27th September 2019

Recently, I needed to create some example data with an extra numeric identifier variable that would be assigned according to the value of a character identifier variable. Not wanting to add another dataset merge or join to the code, I decided to create an informat from data. Initially, I looked into creating a format instead, but it did not accomplish what I wanted to do.

data patient;
    keep fmtname start end label type;
    set test.dm;
    by subject;
    fmtname="PATIENT";
    start=subject;
    end=start;
    label=patient;
    type="I";
run;

The input data needed a little processing as shown above. The informat name was defined in the variable FMTNAME and the TYPE variable was assigned a value of I to make this a numeric informat; to make the character equivalent, a value of J would be assigned instead. The START and END variables declare the value range associated with the value of the LABEL variable that would become the actual value of the numeric identifier variable. The variable names are fixed because the next step will not work with different ones.

proc format lib=work cntlin=patient; run; quit;

To create the actual informat, the dataset is read by the FORMAT procedure with the CNTLIN option specifying the name of the input dataset and LIB defining the library where the format catalogue is stored. When this is complete, the informat is available for use with an input function as shown in the code excerpt below.

data ae1;
    set ae;
    patient=input(subject,patient.);
run;

Using NOT IN operator type functionality in SAS Macro

9th November 2018

For as long as I have been programming with SAS, there has been the ability to test if a variable does or does not have one value from a list of values in data step IF clauses or WHERE clauses in both data step and most if not all procedures. It was only within the last decade that its Macro language got similar functionality, with one caveat that I recently uncovered: you cannot have a NOT IN construct. To get that, you need to go about things differently.

In the example below, you see the NOT operator being placed before the IN operator component that is enclosed in parentheses. If this is not done, SAS produces the error messages that caused me to look at SAS Usage Note 31322. Once I followed that approach, I was able to do what I wanted without resorting to older, more long-winded coding practices.

options minoperator;

%macro inop(x);
    %if not (&x in (a b c)) %then %do;
        %put Value is not included;
    %end;
    %else %do;
        %put Value is included;
    %end;
%mend inop;

%inop(a);

While running the above code should produce a similar result to the example featured in another post on here, the logic is reversed. There are times when such an approach is needed. One is where a few possibilities are to be excluded from a larger number of possibilities. Since programming often involves more inventive thinking, this may be one of those.

ERROR: This range is repeated, or values overlap: - .

15th September 2012

This is another posting in an occasional series on SAS error and warning messages that aren't as clear as they'd need to be. What produced the message was my creation of a control data set that I then wished to use to create a data-driven (in)format. It was the PROC FORMAT step that issued the message, and I got no (in)format created. However, there were no duplicate entries in the control data set as the message suggested to me, so a little more investigation was needed.

What that revealed was that there might be one variable missing from the data set that I needed to have. The SAS documentation has defined FMTNAME, START and LABEL as compulsory variables, with each of them containing the following: format name, initial value and displayed value. My intention was this: to create a numeric code variable for one containing character strings using my data-driven format, with the numbers specified within a character variable as they should be. What was missing then was TYPE.

This variable can be one of the following values: C for character formats, I for numeric informats, J for character informats, N for numeric formats and P for picture formats. Due to it being a conversion from character values to numeric ones, I set the values of TYPE to I and used an input function to do the required operations. The code for successfully creating the informat is below:

proc sql noprint;
    create table tpts as
        select distinct "_vstpt" as fmtname,
                        "I" as type,
                        vstpt as start,
                        vstpt as end,
                        strip(put(vstptnum,best.)) as label
            from test
                where not missing(vstptnum);
quit;

proc format library=work cntlin=tpts; run; quit;

Though I didn't need to do it, I added an END variable too for the sake of completeness. In this case, the start and end of each range are the same. There are cases where they will differ, though I am not dwelling on those.
