Scripting languages | Technology Tales

Creating a data-driven informat in SAS

27^th September 2019

Recently, I needed to create some example data with an extra numeric identifier variable that would be assigned according to the value of a character identifier variable. Not wanting to add another dataset merge or join to the code, I decided to create an informat from data. Initially, I looked into creating a format instead, but it did not accomplish what I wanted to do.

data patient;
    keep fmtname start end label type;
    set test.dm;
    by subject;
    fmtname="PATIENT";
    start=subject;
    end=start;
    label=patient;
    type="I";
run;

The input data needed a little processing as shown above. The format name was defined in the variable FMTNAME and the TYPE variable was assigned a value of I to make this a numeric informat; to make character equivalent, a value of J was assigned. The START and END variables declare the value range associated with the value of the LABEL variable that would become the actual value of the numeric identifier variable. The variable names are fixed because the next step will not work with different ones.

proc format lib=work cntlin=patient; run; quit;

To create the actual informat, the dataset is read by the FORMAT procedure with the CNTLIN parameter specifying the name of the input dataset and LIB defining the library where the format catalogue is stored. When this in complete, the informat is available for use with an input function as shown in the code excerpt below.

data ae1;
    set ae;
    patient=input(subject,patient.);
run;

Searching file contents using PowerShell

25^th October 2018

Having made plenty of use of grep on the Linux/UNIX command and findstr on the legacy Windows command line, I wondered if PowerShell could be used to search the contents of files for a text string. Usefully, this turns out to be the case, but I found that the native functionality does not use what I have used before. The form of the command is given below:

Select-String -Path <filename search expression> -Pattern "<search expression>" > <output file>

While you can have the output appear on the screen, it always seems easier to send it to a file for subsequent use, and that is what I am doing above. The input to the -Path switch can be a filename or a wildcard expression, while that to the -Pattern can be a text string enclosed in quotes or a regular expression. Given that it works well once you know what to do, here is an example:

Select-String -Path *.sas -Pattern "proc report" > c:\temp\search.txt

The search.txt file then includes both the file information and the text that has been found for the sake of checking that you have what you want. What you do next is up to you.

Reloading .bashrc within a BASH terminal session

3^rd July 2016

BASH is a command-line interpreter that is commonly used by Linux and UNIX operating systems. Chances are that you will find yourself in a BASH session if you start up a terminal emulator in many of these, though there are others like KSH and SSH too.

BASH comes with its own configuration files and one of these is located in your own home directory, .bashrc. Among other things, it can become a place to store command shortcuts or aliases. Here is an example:

alias us='sudo apt-get update && sudo apt-get upgrade'

Such a definition needs there to be no spaces around the equals sign, and the actual command to be declared in single quotes. Doing anything other than this will not work, as I have found. Also, there are times when you want to update or add one of these and use it without shutting down a terminal emulator and restarting it.

To reload the .bashrc file to use the updates contained in there, one of the following commands can be issued:

source ~/.bashrc

. ~/.bashrc

Both will read the file and execute its contents so you get those updates made available so you can continue what you are doing. There appears to be a tendency for this kind of thing in the world of Linux and UNIX because it also applies to remounting drives after a change to /etc/fstab and restarting system services like Apache, MySQL or Nginx. The command for the former is below:

sudo mount -a

Often, the means for applying the sorts of in-situ changes that you make are simple ones too, and anything that avoids system reboots has to be good since you have less work interruptions.

Smarter file renaming using PowerShell

14^th November 2014

It appears that the Rename-Item commandlet in PowerShell is a very useful tool when it comes to smarter renaming of files. Even text substitution is a possibility, and what follows is an example that takes the output of the Dir command for listing the files in a directory and replaces hyphens with underscores in each one.

Dir | Rename-Item –NewName { $_.name –replace “-“,”_” }

The result is that something like the-file.txt becomes the_file.txt. This behaviour is reminiscent of the rename command found on Linux and UNIX systems, where regular expressions can be used, like in the following example that has the same result as the above:

rename 's/-/_/g' *

In both cases, you do need to be careful as to what files are in a directory for this, though the wildcard syntax on Linux or UNIX will be more familiar to anyone who has worked with files via almost any command line. Another thing to watch in the UNIX world is that * parses the whole directory structure, and that could be something that is not wanted for much of the time.

All of this is a far cry from the capabilities of the ren or rename command used in the days of MS-DOS and what has become the legacy Windows command line. Apart from simple renaming, any attempt at tweaking a filename through substitution ended up with the extra string getting appended to filenames when I tried it. Thus, the PowerShell option looks better in comparison.

Changing file timestamps using Windows PowerShell

29^th October 2014

Recently, a timestamp got changed on an otherwise unaltered file on me and I needed to change it back. Luckily, I found an answer on the web that used PowerShell to do what I needed, and I am recording it here for future reference. The possible commands are below:

$(Get-Item temp.txt).creationtime=$(Get-Date "27/10/2014 04:20 pm") $(Get-Item temp.txt).lastwritetime=$(Get-Date "27/10/2014 04:20 pm") $(Get-Item temp.txt).lastaccesstime=$(Get-Date "27/10/2014 04:20 pm")

The first of these did not interest me, since I wanted to leave the file creation date as it was. The last write and access times were another matter because these needed altering. The Get-Item commandlet brings up the file, so its properties can be set. Here, these include creationtime, lastwritetime and lastaccesstime. The Get-Date commandlet reads in the provided date and time for use in the timestamp assignment. While PowerShell itself is case-insensitive, I have opted to show the camel case that is produced when you are tabbing through command options for the sake of clarity.

The Get-Item and Get-Date have aliases of gi and gd, respectively, and the Get-Alias commandlet will show you a full list while Get-Command (gcm) gives you a list of commandlets. Issuing the following gets you a formatted list that is sent to a text file:

gcm | Format-List > temp2.txt

There is some online help, but it is not quite as helpful as it ought to be, so I have popped over to Microsoft Learn whenever I needed extra enlightenment. Here is a command that pops the full thing into a text file:

Get-Help Format-List -full > temp3.txt

In fact, getting a book might be the best way to find your way around PowerShell because of all its commandlets and available objects.

For now, other commands that I have found useful include the following:

Get-Service | Format-List New-Item -Name test.txt -ItemType "file"

The first of these gets you a list of services, while the second creates a new blank text file for you, and it can create new folders for you too. Other useful commandlets are below:

Get-Location (gl) Set-Location (sl) Copy-Item Remove-Item Move-Item Rename-Item

The first of the above is like the cwd or pwd commands that you may have seen elsewhere, in that the current directory location is given. Then, the second will change your directory location for you. After that, there are commandlets for copying, deleting, moving and renaming files. These also have aliases, so users of the legacy Windows command line or a UNIX or Linux shell can use something that is familiar to them.

Little fixes like the one with which I started this piece are all good to know, but it is in scripting that PowerShell really is said to show its uses. Having seen the usefulness of such things in the world on Linux and UNIX, I cannot disagree with that, and PowerShell has its own IDE too. That may be just as well, given how much there is to learn. That especially is the case when you might need to issue the following command in a PowerShell session opened using the Run as Administrator option just to get the execution as you need it:

Set-ExecutionPolicy RemoteSigned

Issuing Get-ExecutionPolicy will show you if this is needed when the response is: Restricted. A response of RemoteSigned shows you that all is in order, though you need to check that any script you then run has no nasty payload in there, which is why execution is restrictive in the first place. This sort of thing is yet another lesson to be learnt with PowerShell.

Saving Windows Command Prompt & Powershell command history to a file for later useage

15^th May 2013

It's remarkable what ideas Linux gives that you wouldn't encounter that clearly in the world of Windows. One of these is output and command line history, so a script can be created. In the Windows world, this would be called a batch file. Linux usefully has the history command, and it does the needful for taking a snapshot like so:

history > ~/commands.sh

All the commands stored in a terminal's command history get stored in the commands.sh in the user's home area. The command for doing the same thing from the Windows command line is not as obvious because it uses the doskey command that is intended for command line macro writing and execution. Usefully, it has a history option that tells it to output all the commands issued in a command line session. Unless, you create a file with them in there, there appears to be no way to store all those commands across sessions, unlike UNIX and Linux. Therefore, a command like the following is a partial solution that is more permanent than using the F7 key on your keyboard:

doskey /history > c:\commands.bat

Windows PowerShell has something similar too, and it even has aliases of history and even h. All PowerShell scripts have file extensions of ps1 and the example below follows that scheme:

get-history > c:\commands.ps1

However, I believe that even PowerShell doesn't carry over command history between sessions, though Microsoft is working on adding this useful functionality. While they could co-opt Cygwin of course, that doesn't seem to be their way of going about things.

Renaming multiple files in Linux

19^th August 2012

The Linux and UNIX command mv has a number of limitations, such as not overwriting destination files and not renaming multiple files using wildcards. The only solution to the first that I can find is one that involves combining the cp and rm commands. For the second, there's another command: rename. Here's an example like what I used recently:

rename s/fedora/fedora2/ fedora.*

The first argument in the above command is a regular expression much like what Perl is famous for implementing; in fact, it is Perl-compatible ones (PCRE) that are used. The s before the first slash stands for substitute, with fedora being the string that needs to be replaced and fedora2 being what replaces it. The third command is the file name glob that you want to use, fedora.* in this case. Therefore, all files in a directory named fedora will be renamed fedora2 regardless of the file type. The same sort of operation can be performed for all files with the same extension when it needs to be changed, htm to html, for instance. Of course, there are other uses, but these are handy ones to know.

Shell swapping in Windows: PowerShell and the legacy command prompt

28^th April 2010

Until the advent of PowerShell, Windows had been the poor relation when it came to working from the command line when compared with UNIX, Linux and so on. A recent bit of fiddling had me trying to run FTP from the legacy command prompt when I ran into problems with UNC address resolution (it's unsupported by the old technology) and mapping of network drives. It turned out that my error 85 was being caused by an unavailable drive letter that the net use command didn't reveal as being in use. Reassuringly, this wasn't a Vista issue that I couldn't circumvent.

During this spot of debugging, I tried running batch files in PowerShell and discovered that you cannot run them there like you would from the old command prompt. In fact, you need a line like the following:

cmd /c script.bat

In other words, you have to call cmd.exe like perl.exe, wscript.exe and cscript.exe for batch files to execute. If I had time, I might have got to exploring the use ps1 files for setting up PowerShell commandlets, but that is something that needs to wait until another time. What I discovered though is that UNC addressing can be used with PowerShell without the need for drive letter mappings, not a bad development at all. While on the subject of discoveries, I discovered that the following command opens up a command prompt shell from PowerShell without any need to resort to the Start Menu:

cmd /k

Entering the exit command returns you to the PowerShell command line again, and entering cmd /? reveals the available options for the command, so you need never be constrained by your own knowledge or its limitations.

About Perl's Binding Operator

20^th May 2009

While this piece is as much an aide de memoire for myself as anything else, putting it here seems worthwhile if it answers questions for others. The binding operators, =~ or !~, come in handy when you are framing conditional statements in Perl using Regular Expressions, for example, testing whether x =~ /\d+/ or not. The =~ variant is also used for changing strings using the s/[pattern1]/[pattern2]/ regular expression construct (here, s stands for "substitute"). What has brought this to mind is that I wanted to ensure that something was done for strings that did not contain a certain pattern, and that's where the !~ binding operator came in useful; ^~ might have come to mind for some reason, but it wasn't what I needed.

Are ten seconds enough?

27^th April 2009

Fasthosts, the hosting provider for what you find here, has, in their wisdom, decided to limit the execution time for ASP scripts to 15 seconds and 10 seconds for any others. I haven't used Perl sufficiently in this shared hosting set up to determine how that is affected. In contrast, I can share my experiences on the PHP side and you may have noticed occasional glitches. They have also disabled the set_time_limit PHP function, so you cannot easily address the matter yourself where you need to do it. You almost get the feeling that they don't trust the abilities, actions and oversight of their users. Personally, I reckon that the ten-second limit is too short and that something of the order of 20 or 30 seconds would be better. If it all gets too restrictive, I suppose that there are other providers, though I think that I would avoid resellers after a previous less than glorious experience. There's the dedicated server option too if I was feeling flush, not so likely given the economic times in which we live.

« Older Entries « » Newer Entries »