High-level programming languages

Expanding the coding toolkit: Adding R and Python in a changing landscape

10^th April 2021

Over the years, I have taught myself a number of computing languages, with some coming in useful for professional work while others came in handy for website development and maintenance. The collection has grown to include HTML, CSS, XML, Perl, PHP and UNIX Shell Scripting. The ongoing pandemic allowed to me add two more to the repertoire: R and Python.

My interest in these arose from my work as an information professional concerned with standardisation and automation of statistical results delivery. To date, the main focus has been on clinical study data, but ongoing changes in the life sciences sector could mean that I may need to look further afield, so having extra knowledge never hurts. Though I have been a SAS programmer for more than twenty years, its predominance in the clinical research field is not what it was, which causes me to rethink things.

As it happens, I would like to continue working with SAS since it does so much and thoughts of leaving it after me bring sadness. It also helps to know what the alternatives might be and to reject some management hopes about any newcomers, especially regarding the amount of code being produced and the quality of graphs being created. Use cases need to be assessed dispassionately, even when emotions loom behind the scenes.

Since both R and Python bring large scripting ecosystems with active communities, the attraction of their adoption makes a deal of sense. SAS is comparable in the scale of its own ecosystem, though there are considerable differences and the platform is catching up when it comes to Data Science. While the aforementioned open-source languages may have had a head start, it appears that others are not standing still either. It is a time to have wider awareness, and online conference attendance helps with that.

The breadth of what is available for any programming language more than stymies any attempt to create a truly all encompassing starting point, and I have abandoned thoughts of doing anything like that for R. Similarly, I will not even try such a thing for Python. Consequently, this means that my sharing of anything learned will be in the form of discrete postings from time to time, especially given ho easy it is to collect numerous website links for sharing.

The learning has been facilitated by ongoing pandemic restrictions, though things are opening up a little now. The pandemic also has given us public data that can be used for practice, since much can be gained from having one's own project instead of completing exercises from a book. Having an interesting data set with which to work is a must, and COVID-19 data contain a certain self-interest as well, while one remains mindful of the suffering and loss of life that has been happening since the pandemic first took hold.

Using multi-line commenting in Perl to inactivate blocks of code during testing

26^th December 2019

Recently, I needed to inactivate blocks of code in a Perl script while doing some testing. Since this is something that I often do in other computing languages, I sought the same in Perl. To accomplish that, I need to use the POD methodology. This meant enclosing the code as follows.

=start

<< Code to be inactivated by inclusion in a comment >>

=cut

While the =start line could use any word after the equality sign, it seems that =cut is required to close the multi-line comment. If this was actual programming documentation, then the comment block should include some meaningful text for use with perldoc. However, that was not a concern here because the commenting statements would be removed afterwards anyway. It also is good practice not to leave commented code in a production script or program to avoid any later confusion.

In my case, this facility allowed me to isolate the code that I had to alter and test before putting everything back as needed. It also saved time since I did not need to individually comment out every executable line because multiple lines could be inactivated at a time.

Using the IN operator in SAS Macro programming

8^th October 2012

This useful addition came in SAS 9.2, and I am amazed that it isn’t enabled by default. To accomplish that, you need to set the MINOPERATOR option, unless someone has done it for you in the SAS AUTOEXEC or another configuration program. Thus, the safety first approach is to have code like the following:

options minoperator;

%macro inop(x);
    %if &x in (a b c) %then %do;
        %put Value is included;
    %end;
    %else %do;
        %put Value not included;
    %end;
%mend inop;

%inop(a);

Also, the default delimiter is the space, so if you need to change that, then the MINDELIMITER option needs setting. Adjusting the above code so that the delimiter now is the comma character gives us the following:

options minoperator mindelimiter=",";

%macro inop(x);
    %if &x in (a b c) %then %do;
        %put Value is included;
    %end;
    %else %do;
        %put Value not included;
    %end;
%mend inop;

%inop(a);

Without any of the above, the only approach is to have the following, and that is what we had to do for SAS versions before 9.2:

%macro inop(x);
    %if &x=a or &x=b or &x=c %then %do;
        %put Value is included;
    %end;
    %else %do;
        %put Value not included;
    %end;
%mend inop;

%inop(a);

While it may be clunky, it does work and remains a fallback in newer versions of SAS. Saying that, having the IN operator available makes writing SAS Macro code that little bit more swish, so it's a good thing to know.

Understanding Perl binding operators for pattern matching

20^th May 2009

While this piece is as much an aide de memoire for myself as anything else, putting it here seems worthwhile if it answers questions for others. The binding operators, =~ or !~, come in handy when you are framing conditional statements in Perl using Regular Expressions, for example, testing whether x =~ /\d+/ or not. The =~ variant is also used for changing strings using the s/[pattern1]/[pattern2]/ regular expression construct (here, s stands for "substitute"). What has brought this to mind is that I wanted to ensure that something was done for strings that did not contain a certain pattern, and that's where the !~ binding operator came in useful; ^~ might have come to mind for some reason, but it wasn't what I needed.

Controlling the post revision feature in WordPress 2.6

21^st July 2008

While this may seem esoteric for some, I like to be in charge of the technology that I use. So, when Automattic included post revision retention to WordPress 2.6, I had my reservations about how much it would clutter my database with things that I didn't need. Thankfully, there is a way to control the feature, but you won't find the option in the administration screens (they seem to view this as an advanced setting and so don't want to be adding clutter to the interface for the sake of something that only a few might ever use); you have to edit wp-config.php yourself to add it. Here are the lines that can be added and the effects that they have:

Code: define('WP_POST_REVISIONS','0');

Effect: turns off post revision retention

Code: define('WP_POST_REVISIONS','-1');

Effect: turns it on (the default setting)

Code: define('WP_POST_REVISIONS','2');

Effect: only retains two previous versions of a post (the number can be whatever you want, so long as it's an integer with a value more than zero).

Update (2008-07-23):

There is now a plugin from Dion Hulse that does the above for you and more.

java.net.MalformedURLException: unknown protocol: j

15^th December 2007

While I know that there are better things to call a blog post than to use part of an error message that I got from Saxonica's Saxon when I was converting XML files into PHP equivalents for the visitor information section of my main website, it is handy for anyone else needing to look up a solution when they encounter it. In my case, I use the open source Saxon-B rather than the commercial Saxon-SA, and it fulfils all of my needs. Version 8 and later (it has now reached 9.0.0.2) handle the XSLT 2.0 features that I need to make the transformations really clever.

Also, because Saxon is available as a jar file, it is cross-platform so long as you have Java on board. There are, however, some slight differences in behaviour. Now, I run the thing on Linux, where any Windows-style file locations are not recognised. When I had the file path in a DTD declaration starting with J:\, that was thought to be a protocol like file, http, https, ftp and so on because of the colon. Since there's no j protocol, Java gets confused, issuing the rather obscure error that titles this post. Otherwise, the migration of the Perl script that creates XSLT files and fires off the required XML to PHP transformations was a fairly straightforward exercise once file locations and shebang line were set right.

Using SAS FILE statements with the MOD option to append output to plain text files

25^th November 2007

SAS can generate many types of output: plain text, XML, PDF, RTF, Excel, etc. With all of these and the SAS procedures like PROC REPORT, PROC TABULATE and so on, it might seem surprising for me to say that I have been generating output with data step PUT and FILE statements. There was, of course, a reason for this: creating text files for loading into a new database-driven software application. At one stage, I also did some data interleaving at the output stage and that's when I discovered that the default behaviour for SAS FILE statements is to completely overwrite a file unless the MOD option was specified. Adding that switches on APPEND behaviour. The code below adds a header in one step, while adding data below it in another. While I know that there are better ways to achieve this, like setting up your data as you want it or using _N_ to ensure that something only appears once, here's another way. As per the Perl, there's often more than one way to do something with SAS.

data _null_;
    file ds_data;
    put "fieldtype;datasetname;datasetlabel;datasetlayout;datasetclass;datasetstandardversion";
run;

data _null_;
    set ds_ispec;
    file ds_data mod;
    line="datasetstandard;"||trim(memname)||";"||trim(memlabel)||";;;"||trim(memver);
    put line;
run;

Setting up a test web server on Ubuntu

1^st November 2007

Installing all the bits and pieces is painless enough so long as you know what's what; Synaptic does make it thus. Interestingly, Ubuntu's default installation is a lightweight affair with the addition of any additional components involving downloading the packages from the web. The whole process is all very well integrated and doesn't make you sweat every time you need to install additional software. In fact, it resolves any dependencies for you so that those packages can be put in place too; it lists them, you select them and Synaptic does the rest.

Returning to the job in hand, my shopping list included Apache, Perl, PHP and MySQL, the usual suspects in other words. Perl was already there, as it is on many UNIX systems, so installing the appropriate Apache module was all that was needed. PHP needed the base installation as well as the additional Apache module. MySQL needed the full treatment too, though its being split up into different pieces confounded things a little for my tired mind. Then, there were the MySQL modules for PHP to be set in place too.

The addition of Apache preceded all of these, but I have left it until now to describe its configuration, something that took longer than for the others; the installation itself was as easy as it was for the others. However, what surprised me were the differences in its configuration set up when compared with Windows. There are times when we get the same software but on different operating systems, which means that configuration files get set up differently. The first difference is that the main configuration file is called apache2.conf on Ubuntu rather than httpd.conf as on Windows. Like its Windows counterpart, Ubuntu's Apache does use subsidiary configuration files. However, there is an additional layer of configurability added courtesy of a standard feature of UNIX operating systems: symbolic links. Rather than having a single folder with the all configuration files stored therein, there are two pairs of folders, one pair for module configuration and another for site settings: mods-available/mods-enabled and sites-available/sites-enabled, respectively. In each pair, there is a folder with all the files and another containing symbolic links. It is the presence of a symbolic link for a given configuration file in the latter that activates it. I learned all this when trying to get mod_rewrite going and changing the web server folder from the default to somewhere less susceptible to wrecking during a re-installation or, heaven forbid, a destructive system crash. It's unusual, but it does work, even if it takes that little bit longer to get things sorted out when you first meet up with it.

Apart from the Apache set up and finding the right things to install, getting a test web server up and running was a fairly uneventful process. All's working well now, and I'll be taking things forward from here; making website Perl scripts compatible with their new world will be one of the next things that need to be done.

Numeric for loops in Korn shell scripting: from ksh88 to ksh93

18^th October 2007

The time-honoured syntax for a for loop in a UNIX script is what you see below, and that is what works with the default shell in Sun's Solaris UNIX operating system, ksh88.

for i in 1 2 3 4 5 6 7 8 9 10
do
    if [[ -d dir$i ]]
    then
        :
    else
        mkdir dir$i
    fi
done

However, there is a much nicer syntax supported since the advent of ksh93. It follows C language conventions found in all sorts of places like Java, Perl, PHP and so on. Here is an example:

for (( i=1; i<11; i++ ))
do
    if [[ -d dir$i ]]
    then
        :
    else
        mkdir dir$i
    fi
done

Perl vs. PHP: A Personal Experience

11^th June 2007

Ever since I converted it from a client-side JavaScript-powered affair, my online photo gallery has been written in Perl. There have been some challenges along the way, figuring out how to use hash tables has been one, but everything has worked as expected. However, I am now wondering if it is better to write things in PHP for the sake of consistency with the rest of the website. I had a go a rewriting the random photo page and, unless I have been missing something in the Perl world, things do seem more succinct with PHP. For instance, actions that formerly involved several lines of code can now be achieved in one. Reading the contents of a file into an array and stripping HTML/XML tags from a string fall into this category, and seeing the number of lines of code halving is a striking observation. I am not going to completely abandon Perl, it's a very nice language, but I do rather suspect that there is now an increased chance of my having a website whose server-side processing needs are served entirely by PHP.

« Older Entries « » Newer Entries »