Technology Tales

Adventures in consumer and enterprise technology

TOPIC: SCRIPTING LANGUAGES

Expanding the coding toolkit: Adding R and Python in a changing landscape

10th April 2021

Over the years, I have taught myself a number of computing languages, with some coming in useful for professional work while others came in handy for website development and maintenance. The collection has grown to include HTML, CSS, XML, Perl, PHP and UNIX Shell Scripting. The ongoing pandemic allowed to me add two more to the repertoire: R and Python.

My interest in these arose from my work as an information professional concerned with standardisation and automation of statistical results delivery. To date, the main focus has been on clinical study data, but ongoing changes in the life sciences sector could mean that I may need to look further afield, so having extra knowledge never hurts. Though I have been a SAS programmer for more than twenty years, its predominance in the clinical research field is not what it was, which causes me to rethink things.

As it happens, I would like to continue working with SAS since it does so much and thoughts of leaving it after me bring sadness. It also helps to know what the alternatives might be and to reject some management hopes about any newcomers, especially regarding the amount of code being produced and the quality of graphs being created. Use cases need to be assessed dispassionately, even when emotions loom behind the scenes.

Since both R and Python bring large scripting ecosystems with active communities, the attraction of their adoption makes a deal of sense. SAS is comparable in the scale of its own ecosystem, though there are considerable differences and the platform is catching up when it comes to Data Science. While the aforementioned open-source languages may have had a head start, it appears that others are not standing still either. It is a time to have wider awareness, and online conference attendance helps with that.

The breadth of what is available for any programming language more than stymies any attempt to create a truly all encompassing starting point, and I have abandoned thoughts of doing anything like that for R. Similarly, I will not even try such a thing for Python. Consequently, this means that my sharing of anything learned will be in the form of discrete postings from time to time, especially given ho easy it is to collect numerous website links for sharing.

The learning has been facilitated by ongoing pandemic restrictions, though things are opening up a little now. The pandemic also has given us public data that can be used for practice, since much can be gained from having one's own project instead of completing exercises from a book. Having an interesting data set with which to work is a must, and COVID-19 data contain a certain self-interest as well, while one remains mindful of the suffering and loss of life that has been happening since the pandemic first took hold.

Using multi-line commenting in Perl to inactivate blocks of code during testing

26th December 2019

Recently, I needed to inactivate blocks of code in a Perl script while doing some testing. Since this is something that I often do in other computing languages, I sought the same in Perl. To accomplish that, I need to use the POD methodology. This meant enclosing the code as follows.

=start

<< Code to be inactivated by inclusion in a comment >>

=cut

While the =start line could use any word after the equality sign, it seems that =cut is required to close the multi-line comment. If this was actual programming documentation, then the comment block should include some meaningful text for use with perldoc. However, that was not a concern here because the commenting statements would be removed afterwards anyway. It also is good practice not to leave commented code in a production script or program to avoid any later confusion.

In my case, this facility allowed me to isolate the code that I had to alter and test before putting everything back as needed. It also saved time since I did not need to individually comment out every executable line because multiple lines could be inactivated at a time.

Creating a data-driven informat in SAS

27th September 2019

Recently, I needed to create some example data with an extra numeric identifier variable that would be assigned according to the value of a character identifier variable. Not wanting to add another dataset merge or join to the code, I decided to create an informat from data. Initially, I looked into creating a format instead, but it did not accomplish what I wanted to do.

data patient;
    keep fmtname start end label type;
    set test.dm;
    by subject;
    fmtname="PATIENT";
    start=subject;
    end=start;
    label=patient;
    type="I";
run;

The input data needed a little processing as shown above. The format name was defined in the variable FMTNAME and the TYPE variable was assigned a value of I to make this a numeric informat; to make character equivalent, a value of J was assigned. The START and END variables declare the value range associated with the value of the LABEL variable that would become the actual value of the numeric identifier variable. The variable names are fixed because the next step will not work with different ones.

proc format lib=work cntlin=patient;
run;
quit;

To create the actual informat, the dataset is read by the FORMAT procedure with the CNTLIN parameter specifying the name of the input dataset and LIB defining the library where the format catalogue is stored. When this in complete, the informat is available for use with an input function as shown in the code excerpt below.

data ae1;
    set ae;
    patient=input(subject,patient.);
run;

Searching file contents using PowerShell

25th October 2018

Having made plenty of use of grep on the Linux/UNIX command and findstr on the legacy Windows command line, I wondered if PowerShell could be used to search the contents of files for a text string. Usefully, this turns out to be the case, but I found that the native functionality does not use what I have used before. The form of the command is given below:

Select-String -Path <filename search expression> -Pattern "<search expression>" > <output file>

While you can have the output appear on the screen, it always seems easier to send it to a file for subsequent use, and that is what I am doing above. The input to the -Path switch can be a filename or a wildcard expression, while that to the -Pattern can be a text string enclosed in quotes or a regular expression. Given that it works well once you know what to do, here is an example:

Select-String -Path *.sas -Pattern "proc report" > c:\temp\search.txt

The search.txt file then includes both the file information and the text that has been found for the sake of checking that you have what you want. What you do next is up to you.

Reloading .bashrc within a BASH terminal session

3rd July 2016

BASH is a command-line interpreter that is commonly used by Linux and UNIX operating systems. Chances are that you will find yourself in a BASH session if you start up a terminal emulator in many of these, though there are others like KSH and SSH too.

BASH comes with its own configuration files and one of these is located in your own home directory, .bashrc. Among other things, it can become a place to store command shortcuts or aliases. Here is an example:

alias us='sudo apt-get update && sudo apt-get upgrade'

Such a definition needs there to be no spaces around the equals sign, and the actual command to be declared in single quotes. Doing anything other than this will not work, as I have found. Also, there are times when you want to update or add one of these and use it without shutting down a terminal emulator and restarting it.

To reload the .bashrc file to use the updates contained in there, one of the following commands can be issued:

source ~/.bashrc

. ~/.bashrc

Both will read the file and execute its contents so you get those updates made available so you can continue what you are doing. There appears to be a tendency for this kind of thing in the world of Linux and UNIX because it also applies to remounting drives after a change to /etc/fstab and restarting system services like Apache, MySQL or Nginx. The command for the former is below:

sudo mount -a

Often, the means for applying the sorts of in-situ changes that you make are simple ones too, and anything that avoids system reboots has to be good since you have less work interruptions.

Smarter file renaming using PowerShell

14th November 2014

It appears that the Rename-Item commandlet in PowerShell is a very useful tool when it comes to smarter renaming of files. Even text substitution is a possibility, and what follows is an example that takes the output of the Dir command for listing the files in a directory and replaces hyphens with underscores in each one.

Dir | Rename-Item –NewName { $_.name –replace “-“,”_” }

The result is that something like the-file.txt becomes the_file.txt. This behaviour is reminiscent of the rename command found on Linux and UNIX systems, where regular expressions can be used, like in the following example that has the same result as the above:

rename 's/-/_/g' *

In both cases, you do need to be careful as to what files are in a directory for this, though the wildcard syntax on Linux or UNIX will be more familiar to anyone who has worked with files via almost any command line. Another thing to watch in the UNIX world is that * parses the whole directory structure, and that could be something that is not wanted for much of the time.

All of this is a far cry from the capabilities of the ren or rename command used in the days of MS-DOS and what has become the legacy Windows command line. Apart from simple renaming, any attempt at tweaking a filename through substitution ended up with the extra string getting appended to filenames when I tried it. Thus, the PowerShell option looks better in comparison.

Changing file timestamps using Windows PowerShell

29th October 2014

Recently, a timestamp got changed on an otherwise unaltered file on me and I needed to change it back. Luckily, I found an answer on the web that used PowerShell to do what I needed, and I am recording it here for future reference. The possible commands are below:

$(Get-Item temp.txt).creationtime=$(Get-Date "27/10/2014 04:20 pm")
$(Get-Item temp.txt).lastwritetime=$(Get-Date "27/10/2014 04:20 pm")
$(Get-Item temp.txt).lastaccesstime=$(Get-Date "27/10/2014 04:20 pm")

The first of these did not interest me, since I wanted to leave the file creation date as it was. The last write and access times were another matter because these needed altering. The Get-Item commandlet brings up the file, so its properties can be set. Here, these include creationtime, lastwritetime and lastaccesstime. The Get-Date commandlet reads in the provided date and time for use in the timestamp assignment. While PowerShell itself is case-insensitive, I have opted to show the camel case that is produced when you are tabbing through command options for the sake of clarity.

The Get-Item and Get-Date have aliases of gi and gd, respectively, and the Get-Alias commandlet will show you a full list while Get-Command (gcm) gives you a list of commandlets. Issuing the following gets you a formatted list that is sent to a text file:

gcm | Format-List > temp2.txt

There is some online help, but it is not quite as helpful as it ought to be, so I have popped over to Microsoft Learn whenever I needed extra enlightenment. Here is a command that pops the full thing into a text file:

Get-Help Format-List -full > temp3.txt

In fact, getting a book might be the best way to find your way around PowerShell because of all its commandlets and available objects.

For now, other commands that I have found useful include the following:

Get-Service | Format-List
New-Item -Name test.txt -ItemType "file"

The first of these gets you a list of services, while the second creates a new blank text file for you, and it can create new folders for you too. Other useful commandlets are below:

Get-Location (gl)
Set-Location (sl)
Copy-Item
Remove-Item
Move-Item
Rename-Item

The first of the above is like the cwd or pwd commands that you may have seen elsewhere, in that the current directory location is given. Then, the second will change your directory location for you. After that, there are commandlets for copying, deleting, moving and renaming files. These also have aliases, so users of the legacy Windows command line or a UNIX or Linux shell can use something that is familiar to them.

Little fixes like the one with which I started this piece are all good to know, but it is in scripting that PowerShell really is said to show its uses. Having seen the usefulness of such things in the world on Linux and UNIX, I cannot disagree with that, and PowerShell has its own IDE too. That may be just as well, given how much there is to learn. That especially is the case when you might need to issue the following command in a PowerShell session opened using the Run as Administrator option just to get the execution as you need it:

Set-ExecutionPolicy RemoteSigned

Issuing Get-ExecutionPolicy will show you if this is needed when the response is: Restricted. A response of RemoteSigned shows you that all is in order, though you need to check that any script you then run has no nasty payload in there, which is why execution is restrictive in the first place. This sort of thing is yet another lesson to be learnt with PowerShell.

Saving Windows Command Prompt & Powershell command history to a file for later useage

15th May 2013

It's remarkable what ideas Linux gives that you wouldn't encounter that clearly in the world of Windows. One of these is output and command line history, so a script can be created. In the Windows world, this would be called a batch file. Linux usefully has the history command, and it does the needful for taking a snapshot like so:

history > ~/commands.sh

All the commands stored in a terminal's command history get stored in the commands.sh in the user's home area. The command for doing the same thing from the Windows command line is not as obvious because it uses the doskey command that is intended for command line macro writing and execution. Usefully, it has a history option that tells it to output all the commands issued in a command line session. Unless, you create a file with them in there, there appears to be no way to store all those commands across sessions, unlike UNIX and Linux. Therefore, a command like the following is a partial solution that is more permanent than using the F7 key on your keyboard:

doskey /history > c:\commands.bat

Windows PowerShell has something similar too, and it even has aliases of history and even h. All PowerShell scripts have file extensions of ps1 and the example below follows that scheme:

get-history > c:\commands.ps1

However, I believe that even PowerShell doesn't carry over command history between sessions, though Microsoft is working on adding this useful functionality. While they could co-opt Cygwin of course, that doesn't seem to be their way of going about things.

Renaming multiple files in Linux

19th August 2012

The Linux and UNIX command mv has a number of limitations, such as not overwriting destination files and not renaming multiple files using wildcards. The only solution to the first that I can find is one that involves combining the cp and rm commands. For the second, there's another command: rename. Here's an example like what I used recently:

rename s/fedora/fedora2/ fedora.*

The first argument in the above command is a regular expression much like what Perl is famous for implementing; in fact, it is Perl-compatible ones (PCRE) that are used. The s before the first slash stands for substitute, with fedora being the string that needs to be replaced and fedora2 being what replaces it. The third command is the file name glob that you want to use, fedora.* in this case. Therefore, all files in a directory named fedora will be renamed fedora2 regardless of the file type. The same sort of operation can be performed for all files with the same extension when it needs to be changed, htm to html, for instance. Of course, there are other uses, but these are handy ones to know.

Shell swapping in Windows: PowerShell and the legacy command prompt

28th April 2010

Until the advent of PowerShell, Windows had been the poor relation when it came to working from the command line when compared with UNIX, Linux and so on. A recent bit of fiddling had me trying to run FTP from the legacy command prompt when I ran into problems with UNC address resolution (it's unsupported by the old technology) and mapping of network drives. It turned out that my error 85 was being caused by an unavailable drive letter that the net use command didn't reveal as being in use. Reassuringly, this wasn't a Vista issue that I couldn't circumvent.

During this spot of debugging, I tried running batch files in PowerShell and discovered that you cannot run them there like you would from the old command prompt. In fact, you need a line like the following:

cmd /c script.bat

In other words, you have to call cmd.exe like perl.exe, wscript.exe and cscript.exe for batch files to execute. If I had time, I might have got to exploring the use ps1 files for setting up PowerShell commandlets, but that is something that needs to wait until another time. What I discovered though is that UNC addressing can be used with PowerShell without the need for drive letter mappings, not a bad development at all. While on the subject of discoveries, I discovered that the following command opens up a command prompt shell from PowerShell without any need to resort to the Start Menu:

cmd /k

Entering the exit command returns you to the PowerShell command line again, and entering cmd /? reveals the available options for the command, so you need never be constrained by your own knowledge or its limitations.

  • The content, images, and materials on this website are protected by copyright law and may not be reproduced, distributed, transmitted, displayed, or published in any form without the prior written permission of the copyright holder. All trademarks, logos, and brand names mentioned on this website are the property of their respective owners. Unauthorised use or duplication of these materials may violate copyright, trademark and other applicable laws, and could result in criminal or civil penalties.

  • All comments on this website are moderated and should contribute meaningfully to the discussion. We welcome diverse viewpoints expressed respectfully, but reserve the right to remove any comments containing hate speech, profanity, personal attacks, spam, promotional content or other inappropriate material without notice. Please note that comment moderation may take up to 24 hours, and that repeatedly violating these guidelines may result in being banned from future participation.

  • By submitting a comment, you grant us the right to publish and edit it as needed, whilst retaining your ownership of the content. Your email address will never be published or shared, though it is required for moderation purposes.